swish-e-2.4.7/
swish-e-2.4.7/conf/
swish-e-2.4.7/conf/Makefile.am
exampledir = $(datadir)/doc/$(PACKAGE)/examples/conf

conf_dir = \
    stopwords/dutch.txt \
    stopwords/english.txt \
    stopwords/german.txt \
    stopwords/spanish.txt \
    README \
    example1.config \
    example2.config \
    example3.config \
    example4.config \
    example5.config \
    example6.config \
    example7.config \
    example8.config \
    example9.config \
    example9.pl

nobase_example_DATA = $(conf_dir)

EXTRA_DIST = $(conf_dir)

swish-e-2.4.7/conf/example1.config
# ----- Example 1 - limit by extension -------------
#
#  Please see the swish-e documentation for
#  information on configuration directives.
#  Documentation is included with the swish-e
#  distribution, and also can be found on-line
#  at http://swish-e.org
#
#
#  This example demonstrates how to limit
#  indexing to just .htm and .html files.
#
#---------------------------------------------------

# By default, swish creates an index file in the current directory
# called "index.swish-e" (and swish uses this name by default when
# searching).  This is convenient, but not always desired.

IndexFile /home/indexfiles/docs.index

# Although you can specify which files or directories to
# index on the command line with -i, it's common to specify
# them here.  Note that these are relative to the current directory.

# Index two directories, "docs" (below current directory) and
# "/home/otherdocs", and within those directories (and all sub
# directories) index only files ending in .html and .htm.

IndexDir docs /home/otherdocs
IndexOnly .htm .html

# If you wish to follow symbolic links use the following.
# Note that the default is "no".
# If you are indexing many files, and you do not have any symbolic
# links, you may still want to set this to "yes".  This will avoid
# an extra lstat system call for every file while indexing.

FollowSymLinks yes

# end of example

swish-e-2.4.7/conf/example6.config
# ----- Example 6 - Spider using "prog" feature -------
#
#  Please see the swish-e documentation for
#  information on configuration directives.
#  Documentation is included with the swish-e
#  distribution, and also can be found on-line
#  at http://swish-e.org
#
#
#  This example demonstrates how to use the
#  new (as of 2.2) "prog" document source feature
#  to spider a webserver.
#
#  The "prog" document source feature allows
#  an external program to feed documents to
#  swish, one after another.  This allows you
#  to index documents from any source (e.g. web, DBMS)
#  and to filter and adjust the content before swish
#  indexes the content.
#
#  This example uses the provided spider.pl program
#  to spider a remote web server.  This spider offers
#  more features than the "http" spider method shown
#  in example7.config.
#
#  ** Please don't test with this exact config **
#     spider your own web server
#
#  Indexing (spidering) is started with the following
#  command issued from the "conf" directory:
#
#     swish-e -S prog -c example6.config
#
#  Note: You should have the current Bundle::LWP bundle
#  of perl modules installed.  This was tested with:
#     libwww-perl-5.53
#  Run "perldoc spider.pl" in the prog-bin directory for
#  more information.
#
#  ** Do not spider a web server without permission **
#
#---------------------------------------------------

# Include our site-wide configuration settings:
IncludeConfigFile example4.config

# Specify the program to run
IndexDir ../prog-bin/spider.pl

# When running under the "prog" document source method you can
# pass a list of parameters to the program (specified with -i or IndexDir).
# If a parameter is passed to spider.pl, it will use that as the configuration
# file.
# As a special case, you can pass the word "default" followed by one or more URLs.
# In this case the spider will use default settings to spider the provided URLs.

SwishProgParameters default http://swish-e.org

# Note: the default configuration file used by spider.pl is SwishSpiderConfig.pl.
# See prog-bin/SwishSpiderConfig.pl for examples
# that include filtering PDF and MS Word documents.

# Tell swish how to parse the content
DefaultContents HTML
IndexContents HTML .htm .html
IndexContents TXT .txt .conf

# Just to make it interesting, let's modify the URL that gets indexed:
#   replace http://swish-e.org/ => http://localhost/

ReplaceRules replace swish-e.org localhost

# end of example

swish-e-2.4.7/conf/README
This "conf" directory contains example swish-e configuration settings.

In the "stopwords" directory are files that contain lists of common
stopwords for a few languages.  They can be selected with the IgnoreWords
configuration directive.  See SWISH-CONFIG.pod for more information.

Configuration examples:

Note: In many cases you may not need a configuration file at all when
indexing with swish.  The configuration defaults should get you started.
The configuration defaults can also be set when compiling swish-e by
adjusting the settings in config.h.

For example, you can index a directory (and sub directories) simply by
calling swish as:

    swish-e -i .

In general, though, you will use a config file to specify the
configuration parameters to use while indexing:

    swish-e -c mysettings.config

If you are having problems indexing some files, you can specify a single
file on the command line which will override the IndexDir configuration
setting in your config file:

    swish-e -c mysettings.config -i test.html -f other.index

which will index with your settings, but only index one file, and
write the index to the file specified with the -f option.
The files included here are examples.  It is recommended that you create
your own configuration file as needed, only adding additional directives
when you need to change the default behavior.

If you are generating a number of indexes, then consider moving common
configuration directives into a single file, and then including that
configuration file in other configuration files.  See the IncludeConfigFile
directive in the SWISH-CONFIG.pod manual page.

Examples:

    example1.config - index only html files, plus add labels to the index file
    example2.config - include metanames in your index
    example3.config - add descriptive tags to your index
    example4.config - site-wide settings
    example5.config - Using "FileRules" to control what gets indexed
    example6.config - spider using "prog" feature
    example7.config - spider using the "http" method of indexing
    example8.config - using "filters" to convert PDF files.
    example9.config - using the "prog" method for filtering
    example9.pl

swish-e-2.4.7/conf/example3.config
# ----- Example 3 - Descriptive Index Files -------
#
#  Please see the swish-e documentation for
#  information on configuration directives.
#  Documentation is included with the swish-e
#  distribution, and also can be found on-line
#  at http://swish-e.org
#
#
#  This example demonstrates how to add
#  descriptive names to your index file
#
#---------------------------------------------------

# Define header values for the index -- this might be
# helpful if you have a common front end program that
# reads more than one index file.  These headers can be
# returned when running swish (see the -H switch).

IndexName "Test index"
IndexDescription "This is an index of our two document directories."
IndexPointer "http://someplace"
IndexAdmin "Doc Manager (dmanager@foo.invalid)"

# From previous examples:

# What to index
IndexFile /home/indexfiles/docs.index
IndexDir docs /home/otherdocs
IndexOnly .htm .html
FollowSymLinks yes

# Index meta tags
MetaNames author description
UndefinedMetaTags ignore

# end of example

swish-e-2.4.7/conf/example5.config
# ----- Example 5 - Using FileRules -------
#
#  Please see the swish-e documentation for
#  information on configuration directives.
#  Documentation is included with the swish-e
#  distribution, and also can be found on-line
#  at http://swish-e.org
#
#
#  This is a simple example of how to use FileRules
#  to limit what is indexed.
#
#---------------------------------------------------

# Include our site-wide configuration settings:
IncludeConfigFile example4.config

# Index the current directory
IndexDir .

# Now let's index only these example scripts.
# It would be easier to use IndexOnly, true,
# but this is just an example...

# Don't index the stopwords directory
FileRules pathname contains stopwords CVS

# And don't index example6.spider or any index files
FileRules filename contains .spider index.

# end of example

swish-e-2.4.7/conf/example7.config
# ----- Example 7 - Spider using "http" method -------
#
#  Please see the swish-e documentation for
#  information on configuration directives.
#  Documentation is included with the swish-e
#  distribution, and also can be found on-line
#  at http://swish-e.org
#
#
#  This example demonstrates how to use
#  the "http" method of spidering.
#
#  Indexing (spidering) is started with the following
#  command issued from the "conf" directory:
#
#     swish-e -S http -c example7.config
#
#  Note: You should have the current Bundle::LWP bundle
#  of perl modules installed.
#  This was tested with:
#     libwww-perl-5.53
#
#  ** Do not spider a web server without permission **
#
#---------------------------------------------------

# Include our site-wide configuration settings:
IncludeConfigFile example4.config

# Specify the URL (or URLs) to index:
IndexDir http://swish-e.org

# If a server goes by more than one name you can use this directive:
EquivalentServer http://swish-e.org http://www.swish-e.org

# This defines how many links the spider should
# follow before stopping.  A value of 0 configures the spider to
# traverse all links.  The default is 5.
# The idea is to limit spidering, but this seems of questionable use
# since depth may not be related to anything useful.
MaxDepth 10

# The number of seconds to wait between issuing
# requests to a server.  The default is 60 seconds.
Delay 1

# (default /var/tmp)  The location of a writeable temp directory
# on your system.  The HTTP access method tells the Perl helper to place
# its files there.  The default is defined in src/config.h and depends on
# the current OS.
TmpDir .

# The "http" method uses a perl helper program called "swishspider" to
# fetch each document from the web.  It is included in the src directory
# of the swish-e distribution.
SpiderDirectory ../src

# end of example

swish-e-2.4.7/conf/Makefile.in
# Makefile.in generated by automake 1.9.6 from Makefile.am.
# @configure_input@

# Copyright (C) 1994, 1995, 1996, 1997, 1998, 1999, 2000, 2001, 2002,
# 2003, 2004, 2005  Free Software Foundation, Inc.
# This Makefile.in is free software; the Free Software Foundation
# gives unlimited permission to copy and/or distribute it,
# with or without modifications, as long as this notice is preserved.

# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY, to the extent permitted by law; without
# even the implied warranty of MERCHANTABILITY or FITNESS FOR A
# PARTICULAR PURPOSE.
@SET_MAKE@ srcdir = @srcdir@ top_srcdir = @top_srcdir@ VPATH = @srcdir@ pkgdatadir = $(datadir)/@PACKAGE@ pkglibdir = $(libdir)/@PACKAGE@ pkgincludedir = $(includedir)/@PACKAGE@ top_builddir = .. am__cd = CDPATH="$${ZSH_VERSION+.}$(PATH_SEPARATOR)" && cd INSTALL = @INSTALL@ install_sh_DATA = $(install_sh) -c -m 644 install_sh_PROGRAM = $(install_sh) -c install_sh_SCRIPT = $(install_sh) -c INSTALL_HEADER = $(INSTALL_DATA) transform = $(program_transform_name) NORMAL_INSTALL = : PRE_INSTALL = : POST_INSTALL = : NORMAL_UNINSTALL = : PRE_UNINSTALL = : POST_UNINSTALL = : build_triplet = @build@ host_triplet = @host@ subdir = conf DIST_COMMON = README $(srcdir)/Makefile.am $(srcdir)/Makefile.in ACLOCAL_M4 = $(top_srcdir)/aclocal.m4 am__aclocal_m4_deps = $(top_srcdir)/config/acinclude.m4 \ $(top_srcdir)/configure.in am__configure_deps = $(am__aclocal_m4_deps) $(CONFIGURE_DEPENDENCIES) \ $(ACLOCAL_M4) mkinstalldirs = $(SHELL) $(top_srcdir)/config/mkinstalldirs CONFIG_HEADER = $(top_builddir)/src/acconfig.h CONFIG_CLEAN_FILES = SOURCES = DIST_SOURCES = am__vpath_adj_setup = srcdirstrip=`echo "$(srcdir)" | sed 's|.|.|g'`; am__vpath_adj = case $$p in \ $(srcdir)/*) f=`echo "$$p" | sed "s|^$$srcdirstrip/||"`;; \ *) f=$$p;; \ esac; am__strip_dir = `echo $$p | sed -e 's|^.*/||'`; am__installdirs = "$(DESTDIR)$(exampledir)" nobase_exampleDATA_INSTALL = $(install_sh_DATA) DATA = $(nobase_example_DATA) DISTFILES = $(DIST_COMMON) $(DIST_SOURCES) $(TEXINFOS) $(EXTRA_DIST) ACLOCAL = @ACLOCAL@ ALLOCA = @ALLOCA@ AMDEP_FALSE = @AMDEP_FALSE@ AMDEP_TRUE = @AMDEP_TRUE@ AMTAR = @AMTAR@ AR = @AR@ AS = @AS@ AUTOCONF = @AUTOCONF@ AUTOHEADER = @AUTOHEADER@ AUTOMAKE = @AUTOMAKE@ AWK = @AWK@ BTREE_OBJS = @BTREE_OBJS@ BUILDDOCS_FALSE = @BUILDDOCS_FALSE@ BUILDDOCS_TRUE = @BUILDDOCS_TRUE@ CC = @CC@ CCDEPMODE = @CCDEPMODE@ CFLAGS = @CFLAGS@ CPP = @CPP@ CPPFLAGS = @CPPFLAGS@ CXX = @CXX@ CXXCPP = @CXXCPP@ CXXDEPMODE = @CXXDEPMODE@ CXXFLAGS = @CXXFLAGS@ CYGPATH_W = @CYGPATH_W@ DEFS = @DEFS@ DEPDIR = 
@DEPDIR@ DLLTOOL = @DLLTOOL@ ECHO = @ECHO@ ECHO_C = @ECHO_C@ ECHO_N = @ECHO_N@ ECHO_T = @ECHO_T@ EGREP = @EGREP@ EXEEXT = @EXEEXT@ F77 = @F77@ FFLAGS = @FFLAGS@ INSTALLDOCS_FALSE = @INSTALLDOCS_FALSE@ INSTALLDOCS_TRUE = @INSTALLDOCS_TRUE@ INSTALL_DATA = @INSTALL_DATA@ INSTALL_PROGRAM = @INSTALL_PROGRAM@ INSTALL_SCRIPT = @INSTALL_SCRIPT@ INSTALL_STRIP_PROGRAM = @INSTALL_STRIP_PROGRAM@ LARGEFILES_MACROS = @LARGEFILES_MACROS@ LDFLAGS = @LDFLAGS@ LIBOBJS = @LIBOBJS@ LIBS = @LIBS@ LIBTOOL = @LIBTOOL@ LIBXML2_CFLAGS = @LIBXML2_CFLAGS@ LIBXML2_LIB = @LIBXML2_LIB@ LIBXML2_OBJS = @LIBXML2_OBJS@ LIBXML_REQUIRED_VERSION = @LIBXML_REQUIRED_VERSION@ LN_S = @LN_S@ LTLIBOBJS = @LTLIBOBJS@ MAINT = @MAINT@ MAINTAINER_MODE_FALSE = @MAINTAINER_MODE_FALSE@ MAINTAINER_MODE_TRUE = @MAINTAINER_MODE_TRUE@ MAKEINFO = @MAKEINFO@ OBJDUMP = @OBJDUMP@ OBJEXT = @OBJEXT@ PACKAGE = @PACKAGE@ PACKAGE_BUGREPORT = @PACKAGE_BUGREPORT@ PACKAGE_NAME = @PACKAGE_NAME@ PACKAGE_STRING = @PACKAGE_STRING@ PACKAGE_TARNAME = @PACKAGE_TARNAME@ PACKAGE_VERSION = @PACKAGE_VERSION@ PATH_SEPARATOR = @PATH_SEPARATOR@ PCRE_CFLAGS = @PCRE_CFLAGS@ PCRE_CONFIG = @PCRE_CONFIG@ PCRE_LIBS = @PCRE_LIBS@ PCRE_REQUIRED_VERSION = @PCRE_REQUIRED_VERSION@ PERL = @PERL@ POD2MAN = @POD2MAN@ RANLIB = @RANLIB@ SET_MAKE = @SET_MAKE@ SHELL = @SHELL@ STRIP = @STRIP@ SWISH_WEB = @SWISH_WEB@ VERSION = @VERSION@ XML2_CONFIG = @XML2_CONFIG@ Z_CFLAGS = @Z_CFLAGS@ Z_LIBS = @Z_LIBS@ ac_ct_AR = @ac_ct_AR@ ac_ct_AS = @ac_ct_AS@ ac_ct_CC = @ac_ct_CC@ ac_ct_CXX = @ac_ct_CXX@ ac_ct_DLLTOOL = @ac_ct_DLLTOOL@ ac_ct_F77 = @ac_ct_F77@ ac_ct_OBJDUMP = @ac_ct_OBJDUMP@ ac_ct_RANLIB = @ac_ct_RANLIB@ ac_ct_STRIP = @ac_ct_STRIP@ am__fastdepCC_FALSE = @am__fastdepCC_FALSE@ am__fastdepCC_TRUE = @am__fastdepCC_TRUE@ am__fastdepCXX_FALSE = @am__fastdepCXX_FALSE@ am__fastdepCXX_TRUE = @am__fastdepCXX_TRUE@ am__include = @am__include@ am__leading_dot = @am__leading_dot@ am__quote = @am__quote@ am__tar = @am__tar@ am__untar = @am__untar@ bindir = @bindir@ build = 
@build@ build_alias = @build_alias@ build_cpu = @build_cpu@ build_os = @build_os@ build_vendor = @build_vendor@ datadir = @datadir@ exec_prefix = @exec_prefix@ host = @host@ host_alias = @host_alias@ host_cpu = @host_cpu@ host_os = @host_os@ host_vendor = @host_vendor@ includedir = @includedir@ infodir = @infodir@ install_sh = @install_sh@ libdir = @libdir@ libexecdir = @libexecdir@ localstatedir = @localstatedir@ mandir = @mandir@ mkdir_p = @mkdir_p@ oldincludedir = @oldincludedir@ prefix = @prefix@ program_transform_name = @program_transform_name@ sbindir = @sbindir@ sharedstatedir = @sharedstatedir@ sysconfdir = @sysconfdir@ target_alias = @target_alias@ exampledir = $(datadir)/doc/$(PACKAGE)/examples/conf conf_dir = \ stopwords/dutch.txt \ stopwords/english.txt \ stopwords/german.txt \ stopwords/spanish.txt \ README \ example1.config \ example2.config \ example3.config \ example4.config \ example5.config \ example6.config \ example7.config \ example8.config \ example9.config \ example9.pl nobase_example_DATA = $(conf_dir) EXTRA_DIST = $(conf_dir) all: all-am .SUFFIXES: $(srcdir)/Makefile.in: @MAINTAINER_MODE_TRUE@ $(srcdir)/Makefile.am $(am__configure_deps) @for dep in $?; do \ case '$(am__configure_deps)' in \ *$$dep*) \ cd $(top_builddir) && $(MAKE) $(AM_MAKEFLAGS) am--refresh \ && exit 0; \ exit 1;; \ esac; \ done; \ echo ' cd $(top_srcdir) && $(AUTOMAKE) --foreign conf/Makefile'; \ cd $(top_srcdir) && \ $(AUTOMAKE) --foreign conf/Makefile .PRECIOUS: Makefile Makefile: $(srcdir)/Makefile.in $(top_builddir)/config.status @case '$?' 
in \ *config.status*) \ cd $(top_builddir) && $(MAKE) $(AM_MAKEFLAGS) am--refresh;; \ *) \ echo ' cd $(top_builddir) && $(SHELL) ./config.status $(subdir)/$@ $(am__depfiles_maybe)'; \ cd $(top_builddir) && $(SHELL) ./config.status $(subdir)/$@ $(am__depfiles_maybe);; \ esac; $(top_builddir)/config.status: $(top_srcdir)/configure $(CONFIG_STATUS_DEPENDENCIES) cd $(top_builddir) && $(MAKE) $(AM_MAKEFLAGS) am--refresh $(top_srcdir)/configure: @MAINTAINER_MODE_TRUE@ $(am__configure_deps) cd $(top_builddir) && $(MAKE) $(AM_MAKEFLAGS) am--refresh $(ACLOCAL_M4): @MAINTAINER_MODE_TRUE@ $(am__aclocal_m4_deps) cd $(top_builddir) && $(MAKE) $(AM_MAKEFLAGS) am--refresh mostlyclean-libtool: -rm -f *.lo clean-libtool: -rm -rf .libs _libs distclean-libtool: -rm -f libtool uninstall-info-am: install-nobase_exampleDATA: $(nobase_example_DATA) @$(NORMAL_INSTALL) test -z "$(exampledir)" || $(mkdir_p) "$(DESTDIR)$(exampledir)" @$(am__vpath_adj_setup) \ list='$(nobase_example_DATA)'; for p in $$list; do \ if test -f "$$p"; then d=; else d="$(srcdir)/"; fi; \ $(am__vpath_adj) \ echo " $(nobase_exampleDATA_INSTALL) '$$d$$p' '$(DESTDIR)$(exampledir)/$$f'"; \ $(nobase_exampleDATA_INSTALL) "$$d$$p" "$(DESTDIR)$(exampledir)/$$f"; \ done uninstall-nobase_exampleDATA: @$(NORMAL_UNINSTALL) @$(am__vpath_adj_setup) \ list='$(nobase_example_DATA)'; for p in $$list; do \ $(am__vpath_adj) \ echo " rm -f '$(DESTDIR)$(exampledir)/$$f'"; \ rm -f "$(DESTDIR)$(exampledir)/$$f"; \ done tags: TAGS TAGS: ctags: CTAGS CTAGS: distdir: $(DISTFILES) $(mkdir_p) $(distdir)/stopwords @srcdirstrip=`echo "$(srcdir)" | sed 's|.|.|g'`; \ topsrcdirstrip=`echo "$(top_srcdir)" | sed 's|.|.|g'`; \ list='$(DISTFILES)'; for file in $$list; do \ case $$file in \ $(srcdir)/*) file=`echo "$$file" | sed "s|^$$srcdirstrip/||"`;; \ $(top_srcdir)/*) file=`echo "$$file" | sed "s|^$$topsrcdirstrip/|$(top_builddir)/|"`;; \ esac; \ if test -f $$file || test -d $$file; then d=.; else d=$(srcdir); fi; \ dir=`echo "$$file" | sed -e 
's,/[^/]*$$,,'`; \ if test "$$dir" != "$$file" && test "$$dir" != "."; then \ dir="/$$dir"; \ $(mkdir_p) "$(distdir)$$dir"; \ else \ dir=''; \ fi; \ if test -d $$d/$$file; then \ if test -d $(srcdir)/$$file && test $$d != $(srcdir); then \ cp -pR $(srcdir)/$$file $(distdir)$$dir || exit 1; \ fi; \ cp -pR $$d/$$file $(distdir)$$dir || exit 1; \ else \ test -f $(distdir)/$$file \ || cp -p $$d/$$file $(distdir)/$$file \ || exit 1; \ fi; \ done check-am: all-am check: check-am all-am: Makefile $(DATA) installdirs: for dir in "$(DESTDIR)$(exampledir)"; do \ test -z "$$dir" || $(mkdir_p) "$$dir"; \ done install: install-am install-exec: install-exec-am install-data: install-data-am uninstall: uninstall-am install-am: all-am @$(MAKE) $(AM_MAKEFLAGS) install-exec-am install-data-am installcheck: installcheck-am install-strip: $(MAKE) $(AM_MAKEFLAGS) INSTALL_PROGRAM="$(INSTALL_STRIP_PROGRAM)" \ install_sh_PROGRAM="$(INSTALL_STRIP_PROGRAM)" INSTALL_STRIP_FLAG=-s \ `test -z '$(STRIP)' || \ echo "INSTALL_PROGRAM_ENV=STRIPPROG='$(STRIP)'"` install mostlyclean-generic: clean-generic: distclean-generic: -test -z "$(CONFIG_CLEAN_FILES)" || rm -f $(CONFIG_CLEAN_FILES) maintainer-clean-generic: @echo "This command is intended for maintainers to use" @echo "it deletes files that may require special tools to rebuild." 
clean: clean-am clean-am: clean-generic clean-libtool mostlyclean-am distclean: distclean-am -rm -f Makefile distclean-am: clean-am distclean-generic distclean-libtool dvi: dvi-am dvi-am: html: html-am info: info-am info-am: install-data-am: install-nobase_exampleDATA install-exec-am: install-info: install-info-am install-man: installcheck-am: maintainer-clean: maintainer-clean-am -rm -f Makefile maintainer-clean-am: distclean-am maintainer-clean-generic mostlyclean: mostlyclean-am mostlyclean-am: mostlyclean-generic mostlyclean-libtool pdf: pdf-am pdf-am: ps: ps-am ps-am: uninstall-am: uninstall-info-am uninstall-nobase_exampleDATA .PHONY: all all-am check check-am clean clean-generic clean-libtool \ distclean distclean-generic distclean-libtool distdir dvi \ dvi-am html html-am info info-am install install-am \ install-data install-data-am install-exec install-exec-am \ install-info install-info-am install-man \ install-nobase_exampleDATA install-strip installcheck \ installcheck-am installdirs maintainer-clean \ maintainer-clean-generic mostlyclean mostlyclean-generic \ mostlyclean-libtool pdf pdf-am ps ps-am uninstall uninstall-am \ uninstall-info-am uninstall-nobase_exampleDATA # Tell versions [3.59,3.63) of GNU make to not export all variables. # Otherwise a system limit (for SysV at least) may be exceeded. 
.NOEXPORT: swish-e-2.4.7/conf/stopwords/0000777000077100017500000000000011166013170013077 500000000000000swish-e-2.4.7/conf/stopwords/german.txt0000664000077100017500000000176711166010103015033 00000000000000# Some Stopwords for German Language # -- for swish-e (V2.0 or higher) # -- Rainer.Scherg (Rainer.Scherg@t-online.de) # # ab aber als am an auch auf aus bei beim bin bis bist da dadurch daher dann darum das dass daß dein deine dem den der deren des deshalb die dies dieser dieses doch dort du durch ein eine einem einen einer eines er es euer eure euren für fuer haben hatte hatten hattest hattet hätte haette hätten hätten hier hinten ich ihr ihre im in ist ja jede jedem jeden jeder jedes jener jenes jetzt kann kannst können könnt könnte konnte koennen koennte könnten koennten machen mein meine mit muss muß musst mußt muessen müssen muesst müßt muesste müßte nach nachdem nein nicht noch nun oder seid sein seine seit sich sie sind soll sollen sollst sollt solltest sonst soweit sowie und unser unsere unter über ueber vom von vor wann warum war waere wäre was weiter weitere welcher welcher wenn wer werde werden werdet weshalb wie wieder wieso wir wird wirst wo wurde zu zum zur zurück swish-e-2.4.7/conf/stopwords/dutch.txt0000664000077100017500000000125411166010103014660 00000000000000# Some Stopwords for the Dutch Language # -- for swish-e (V2.0 or higher) # -- Bas Meijer (bas@antraciet.nl) # aan achter af alle als behalve ben bent bij daar daardoor daarom dan dat de derhalve deze die echter een elk elke en enige haar had hadden heb hebben hebt heeft hen het hier hij hun ieder iedere iemand ik in is ja jij jouw jullie kan kunnen maak maakt maar maken met mijn mits moest moet moeten na nee niemand niet noch nog nu of omdat onder ons onze ook over overige sinds tot totdat tussen u uw van vanaf verder voor voorts waar waarom wanneer wat wederom weer welk welke werd werden wie wij word worden wordt zal zich zij zijn zoals zou zouden zullen 
swish-e-2.4.7/conf/stopwords/english.txt0000664000077100017500000000433311166010103015203 00000000000000# Stopwords for English Language # as taken from http://sunsite.berkeley.edu/SWISH-E/stopwords.txt # -- for swish-e (V2.0 or higher) # -- Jose Ruiz (jmruiz@boe.es) # a above according across actually adj after afterwards again against all almost alone along already also although always among amongst an and another any anyhow anyone anything anywhere are aren aren't around as at be became because become becomes becoming been before beforehand begin beginning behind being below beside besides between beyond billion both but by can can't cannot caption co could couldn couldn't did didn didn't do does doesn doesn't don don't down during each eg eight eighty either else elsewhere end ending enough etc even ever every everyone everything everywhere except few fifty first five for former formerly forty found four from further had has hasn hasn't have haven haven't he hence her here hereafter hereby herein hereupon hers herself him himself his how however hundred ie i.e. if in inc inc. 
indeed instead into is isn isn't it its itself last later latter latterly least less let like likely ll ltd made make makes many maybe me meantime meanwhile might million miss more moreover most mostly mr mrs much must my myself namely neither never nevertheless next nine ninety no nobody none nonetheless noone nor not nothing now nowhere of off often on once one only onto or others otherwise our ours ourselves out over overall own per perhaps rather re recent recently same seem seemed seeming seems seven seventy several she should shouldn shouldn't since six sixty so some somehow someone something sometime sometimes somewhere still stop such taking ten than that the their them themselves then thence there thereafter thereby therefore therein thereupon these they thirty this those though thousand three through throughout thru thus to together too toward towards trillion twenty two under unless unlike unlikely until up upon us used using ve very via was wasn we we well were weren weren't what whatever when whence whenever where whereafter whereas whereby wherein whereupon wherever whether which while whither who whoever whole whom whomever whose why will with within without won would wouldn wouldn't yes yet you your yours yourself yourselves swish-e-2.4.7/conf/stopwords/spanish.txt0000664000077100017500000000143311166010103015215 00000000000000# Some Stopwords for Spanish Language # -- for swish-e (V2.0 or higher) # -- Jose Ruiz (jmruiz@boe.es) # # a al ante atras asi aun aunque aqui alli bajo cabe con contra cada casi como cual cuales cualquiera cuando como cuanto cuantos de desde del demas donde en entre el este esta esto estos estas ese esa eso esos esas aquel aquella aquello aquellos aquellas ella ello ellos ellas etc hacia hasta la lo los las mas me mio mia mios mias mi mis no ni ningun ninguno ninguna nosotros nosotras nunca nuestro nuestra nuestros nuestras para por porque pero pues que quiza quizas segun si so sobre se sin sino suyo suyas sus tras tu te tal 
tales tambien tampoco tan tanto tantos tanta tantas tuyo tuyos tus un uno unos unas vosotros vosotras vuestro vuestra vuestros vuestras y yo ya swish-e-2.4.7/conf/example2.config0000775000077100017500000000170011166010103013647 00000000000000# ----- Example 2 - Include MetaNames ------------- # # Please see the swish-e documentation for # information on configuration directives. # Documentation is included with the swish-e # distribution, and also can be found on-line # at http://swish-e.org # # # This example demonstrates how to include # MetaNames in your index. # The metanames can be used when searching: # # swish-e -w foo author=shakespeare # #--------------------------------------------------- # Specify what you want to be indexed. # (see example1.config) IndexFile /home/indexfiles/docs.index IndexDir docs /home/otherdocs IndexOnly .htm .html FollowSymLinks yes # Now, specify which meta name to include in the index. MetaNames author description # By default, undefined meta names are indexed as plain text # This feature can change this behaviour. Here we say # don't index text in metatags unless defined in MetaNames UndefinedMetaTags ignore # end of example swish-e-2.4.7/conf/example9.pl0000775000077100017500000000314311166010103013027 00000000000000#!/usr/local/bin/perl -w use strict; # This is a short example that basically does the same # thing as the default file system access method by # recursing directories, but also shows how to process different # file types -- in this example pdf is converted to xml for indexing. # in this example, only .pdf and .config files are indexed. 
# the pdf2xml module is in the prog-bin directory of the swish-e distribution
use lib '../prog-bin';

use File::Find;  # for recursing a directory tree
use pdf2xml;     # example module for pdf to xml conversion
                 # Note that you need IndexContents XML .pdf in the
                 # swish-e config file

# See perldoc File::Find for information on following symbolic links

use constant DEBUG => 0;

# See if a directory was passed in via the SwishProgParameters swish
# directive
my $dir = shift || '.';

find(
    {
        wanted   => \&wanted,
        no_chdir => 1,
    },
    $dir,
);

sub wanted {
    return if -d;

    if ( /\.pdf$/ ) {
        print STDERR "Indexing pdf $File::Find::name\n" if DEBUG;
        print ${ pdf2xml( $File::Find::name ) };

    } elsif ( /\.config$/ ) {
        print STDERR "Indexing $File::Find::name\n" if DEBUG;
        print ${ get_content( $File::Find::name ) };

    } else {
        print STDERR "Skipping $File::Find::name\n" if DEBUG;
    }
}

sub get_content {
    my $path = shift;

    my ( $size, $mtime ) = ( stat $path )[7,9];
    open FH, $path or die "$path: $!";

    # Document headers for the "prog" interface, followed by a blank line
    my $content = <<EOF;
Content-Length: $size
Last-Mtime: $mtime
Path-Name: $path

EOF
    # Slurp the whole file and append it after the headers
    local $/ = undef;
    $content .= <FH>;
    close FH;
    return \$content;
}

swish-e-2.4.7/conf/example4.config
# ----- Example 4 - Site-wide settings -------
#
#  Please see the swish-e documentation for
#  information on configuration directives.
#  Documentation is included with the swish-e
#  distribution, and also can be found on-line
#  at http://swish-e.org
#
#
#  This example demonstrates how to define
#  settings that change the way swish indexes.
#  Since you will probably want consistent
#  settings for all your indexes, you can
#  create one file, and include it in other
#  config files.
#
#  Once you define a common configuration file you
#  can include it in other configuration files.  For
#  example, if this file was saved as "common.config"
#  you can include it in other configuration files
#  with the following directive:
#
#     ...
#     IncludeConfigFile /home/swish/common.config
#     ...
#
#---------------------------------------------------

# These settings tell swish what defines a word.
# We only index words that include letters, numbers, a dash,
# or a period.  (Not very realistic)

# These are the characters that are allowed in a "word".
# i.e. words are split on any character NOT found in WordCharacters
WordCharacters abcdefghijklmnopqrstuvwxyz0123456789.-

# We allow a period and a dash within words, but strip them
# from the beginning or end of a word.  This is done after
# WordCharacters above is used to split words.
IgnoreFirstChar .-
IgnoreLastChar .-

# Finally, resulting words must begin/end with one
# of the characters listed here
BeginCharacters abcdefghijklmnopqrstuvwxyz0123456789
EndCharacters abcdefghijklmnopqrstuvwxyz0123456789

# Turn this on for a slight performance improvement
#FollowSymLinks yes

# This is how detailed you want reporting.  You can specify numbers
# 0 to 3 - 0 is totally silent, 3 is the most verbose.
# 4 is debugging.  Can be overridden with -v on the command line
IndexReport 2

# Set the stopwords (words to ignore when searching and when indexing)
# Carefully think about this feature before using a list of stopwords

# You can list the words here:
# IgnoreWords of or and the a to

# Or you can use the compiled in defaults:
# IgnoreWords SwishDefault

# Or you can use a file that includes your own words:
IgnoreWords file: stopwords/my_stopwords.txt

# Another option is to use the IgnoreLimit directive, and
# swish will determine what stopwords to use.  But please
# read the documentation before using the IgnoreLimit directive.
# It can be slow, and may not work with other options.

# Since we are using such a restrictive WordCharacters setting, we
# want to map eight-bit characters to ascii.
# For example, "resumé" will be indexed and searched as "resume".
# See docs for more info.
TranslateCharacters :ascii7:

# We don't want phrase searches to work across sentences, plus
# we use the pipe "|" to force a break in phrases when indexing.
BumpPositionCounterCharacters |.
# end of example swish-e-2.4.7/conf/example8.config0000775000077100017500000000241511166010103013661 00000000000000# ----- Example 8 - Filtering PDF files ------- # # Please see the swish-e documentation for # information on configuration directives. # Documentation is included with the swish-e # distribution, and also can be found on-line # at http://swish-e.org # # # This example demonstrates how to use swish's # "filter" feature to index PDF documents. # # Filters can be used to filter PDF or MS Word documents # to uncompress gzipped files, or to modify content # before indexing. # # You will need the xpdf package installed to use # this filter. # # See filter-bin/_pdf2html.pl for more information. # # Please see the documentation on File Filters in # the SWISH-CONFIG.pod manual page. # # Note: # If you are filtering many documents and/or using # a perl script to filter, see example9.config for # perhaps a faster way to filter. # #--------------------------------------------------- # Include our site-wide configuration settings: IncludeConfigFile example4.config # Index the example config files and .pdf files # in the current directory (and sub directories) IndexDir . IndexOnly .config .pdf # Assign the pdf2text.pl filter to .pdf files # Please see docs on what data can be passed to the filter. FileFilter .pdf ../filter-bin/_pdf2html.pl # end of example swish-e-2.4.7/conf/example9.config0000775000077100017500000000524011166010103013661 00000000000000# ----- Example 9 - Filtering PDF with "prog" ------- # # Please see the swish-e documentation for # information on configuration directives. # Documentation is included with the swish-e # distribution, and also can be found on-line # at http://swish-e.org # # # This example demonstrates how to use swish's # "prog" document source feature to filter documents. # # The "prog" document source feature allows # an external program to feed documents to # swish, one after another. 
This allows you # to index documents from any source (e.g. web, DBMS) # and to filter and adjust the content before swish # indexes the content. # # Using the "prog" method to filter documents requires more # work to set up than using the "filters" described in # example8.config because you must write a program to retrieve # the documents and feed them to swish. # # On the otherhand, the "prog" method should be faster than the # filter method in example8.config because swish doesn't need to fork # itself and run an external program for each document to filter. # This can be significant if you are using a perl script as a filter since # the perl script must be compiled each time it is run. This "prog" method # avoides that overhead. # # This example uses the example9.pl program. This program # is very similar to the included DirTree.pl program found in # the prog-bin directory. This program simple reads files from the # file system, and passes their content onto swish if they are the correct # type. PDF files are converted by the prog-bin/pdf2xml.pm module. # # The PDF info fields (e.g. author) are placed in xml tags # which allows indexing the PDF info as MetaNames. # By specifying metanemes you can limit searches by this PDF info. # # For this example, you will need the xpdf package. # Type "perldoc pdf2xml" from the prog-bin directory for # more information. # # Run this example as: # # swish-e -S prog -c example9.config # #--------------------------------------------------- # Include our site-wide configuration settings: IncludeConfigFile example4.config # Define the program to run IndexDir ./example9.pl # Pass in the top-level directory to index # (here we specify the current directory) SwishProgParameters . # Swish can index a number of different types of documents. 
# .config are text, and .pdf are converted (filtered) to xml: IndexContents TXT .config IndexContents XML .pdf # Since the pdf2xml module generates xml for the PDF info fields and # for the PDF content, let's use MetaNames # Instead of specifying each metaname, let's let swish do it automatically. UndefinedMetaTags auto # Show what's happening IndexReport 3 # end of example swish-e-2.4.7/Makefile.am0000664000077100017500000000472511166010113012062 00000000000000AUTOMAKE_OPTIONS = foreign SUBDIRS = filters prog-bin conf filter-bin example html man src tests pod docdir = $(datadir)/doc/$(PACKAGE) # Install these three in the doc directory # INSTALL and README are built at make time from .pod source doc_DATA = \ $(srcdir)/INSTALL \ $(srcdir)/README \ README.cvs # These create REAME and INSTALL in the top level *source* # directory for the distribution. Created at "make" time. $(srcdir)/INSTALL: $(top_srcdir)/pod/INSTALL.pod -rm -f $(top_srcdir)/INSTALL -pod2text $(top_srcdir)/pod/INSTALL.pod > $(top_srcdir)/INSTALL $(srcdir)/README: $(top_srcdir)/pod/README.pod -rm -f $(top_srcdir)/README -pod2text $(top_srcdir)/pod/README.pod > $(top_srcdir)/README config_dir = \ config/config.guess \ config/config.sub \ config/install-sh \ config/ltmain.sh \ config/missing \ config/mkinstalldirs perl_dir = \ perl/Changes \ perl/MANIFEST \ perl/README \ perl/Makefile.PL \ perl/Makefile.mingw \ perl/API.pm \ perl/API.xs \ perl/typemap \ perl/t/test.t \ perl/t/dummy.t \ perl/t/test.conf \ perl/t/first.html \ perl/t/second.html \ perl/t/third.html vms_dir = \ src/vms/acconfig.h_vms \ src/vms/build_swish-e.com \ src/vms/config.h \ src/vms/descrip_axp.mms \ src/vms/descrip_libxml2.mms \ src/vms/descrip_vax.mms \ src/vms/libtest.opt \ src/vms/readme_vms.txt \ src/vms/regex.c \ src/vms/regex.h \ src/vms/regexpr.h \ src/vms/swish.opt win32_dir = \ src/win32/acconfig.h \ src/win32/dirent.c \ src/win32/dirent.h \ src/win32/libswishe.dsp \ src/win32/libswishindex.dsp \ src/win32/swishe.dsp \ 
src/win32/swishe.dsw \ src/win32/release.nsi \ src/win32/filebase.nsh \ src/win32/fixperl.pl \ src/win32/build-perl.bat \ src/win32/build.sh \ src/win32/dist.sh \ src/worddata.c \ src/worddata.h rpm_dir = \ rpm/swish-e.spec.in \ rpm/swish-e.xpm debian_dir = \ debian/README.Debian \ debian/changelog \ debian/compat \ debian/control \ debian/copyright \ debian/files \ debian/rules \ debian/swish-e.doc-base \ debian/swish-e.substvars bin_SCRIPTS = swish-config pkgconfigdir = $(libdir)/pkgconfig pkgconfig_DATA = swish-e.pc EXTRA_DIST = \ $(config_dir) $(perl_dir) \ $(vms_dir) $(win32_dir) $(rpm_dir) \ $(doc_DATA) swish-config.in swish-e.pc.in\ $(debian_dir) .PHONEY: test test: check swish-e-2.4.7/html/0000777000077100017500000000000011166013170011052 500000000000000swish-e-2.4.7/html/swish-config.html0000644000077100017500000035072211166010463014267 00000000000000 Swish-e :: SWISH-CONFIG - Configuration File Directives

SWISH-CONFIG - Configuration File Directives

Swish-e version 2.4.7


OVERVIEW

This document lists the configuration directives available in Swish-e.

CONFIGURATION FILE

A configuration file controls which files Swish-e indexes, how they are indexed, and where the index is written.

The configuration file is a text file composed of comments, blank lines, and configuration directives. The order of the directives is not important. Some directives may be used more than once in the configuration file, while others can only be used once (i.e. a later occurrence overwrites the preceding one). Case of the directive is not important -- you may use upper, lower, or mixed case.

A comment is any line that begins with a "#".

    # This is a comment

As of 2.4.3 lines may be continued by placing a backslash as the last character on the line:

    IgnoreWords \
        am \
        the \
        foo

Directives may take more than one parameter. Enclose single parameters that include whitespace in quotes (single or double). Inside of quotes the backslash escapes the next character.

    ReplaceRules append "foo bar"   <- define "foo bar" as a single parameter

If you need to include a quote character in the value either use a backslash to escape it, or enclose it in quotes of the other type.

Backslashes also have special meaning in regular expressions.

    FileFilterMatch pdftotext "'%p' -" /\.pdf$/

This says that the dot is a real dot (instead of matching any character). If you place the regular expression in quotes then you must use double-backslashes.

    FileFilterMatch pdftotext "'%p' -" "/\\.pdf$/"

Swish-e will convert the double backslash into a single backslash before passing the parameter to the regular expression compiler.
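The continuation and quoting rules above can be sketched in a few lines of Python. This is only an illustration of the rules as described here, not Swish-e's actual parser; the function names join_continuations and split_params are hypothetical.

```python
def join_continuations(lines):
    """Merge each line that ends in a backslash with the line that follows.

    Hypothetical helper: mirrors the continuation rule described above.
    """
    joined, pending = [], ""
    for raw in lines:
        line = raw.rstrip("\n")
        if line.endswith("\\"):
            pending += line[:-1]      # drop the trailing backslash
            continue
        joined.append(pending + line)
        pending = ""
    if pending:
        joined.append(pending)
    return joined


def split_params(line):
    """Split one directive line into parameters.

    Assumptions of this sketch: whitespace separates parameters, single or
    double quotes group a parameter, and a backslash escapes the next
    character only inside quotes (outside quotes it is kept literally).
    """
    params, current, quote, started = [], [], None, False
    i = 0
    while i < len(line):
        ch = line[i]
        if quote:
            if ch == "\\" and i + 1 < len(line):
                current.append(line[i + 1])   # escaped character
                i += 1
            elif ch == quote:
                quote = None                  # closing quote
            else:
                current.append(ch)
        elif ch in "'\"":
            quote, started = ch, True         # opening quote
        elif ch.isspace():
            if started:
                params.append("".join(current))
                current, started = [], False
        else:
            current.append(ch)
            started = True
        i += 1
    if started:
        params.append("".join(current))
    return params


if __name__ == "__main__":
    config = ["IgnoreWords \\", "    am \\", "    the \\", "    foo"]
    for directive in join_continuations(config):
        print(split_params(directive))
```

Under these assumptions the quoted pattern "/\\.pdf$/" and the unquoted pattern /\.pdf$/ both reach the regular expression compiler with a single backslash.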

Commented example configuration files are included in the conf directory of the Swish-e distribution.

Some command line arguments can override directives specified in the configuration file. See also SWISH-RUN for instructions on running Swish-e, and SWISH-SEARCH for information and examples on how to search your index.

The configuration file is specified to Swish-e by the -c switch. For example,

    swish-e -c myconfig.conf

You may also split your directives up into different configuration files. This allows you to have a master configuration file used for many different indexes, and smaller configuration files for each separate index. You can specify the different configuration files when running from the command line with the -c switch (see SWISH-RUN), or you may include another configuration file with the IncludeConfigFile directive below.

Typically, in a configuration file the directives are grouped together in some logical order -- that is, directives that control the source of the documents would be grouped together first, and directives that control how each document is filtered or how its words are indexed would form another group. (The directives listed below are grouped in this order.)

The configuration file directives are listed below in these groups:

Alphabetical Listing of Directives

Directives that Control Swish

These configuration directives control the general behavior of Swish-e.

  • IncludeConfigFile *path to config file*

    This directive can be used to include configuration directives located in another file.

        IncludeConfigFile /usr/local/swish/conf/site_config.config
  • IndexReport [0|1|2|3]

    This is how detailed you want reporting while indexing. You can specify numbers 0 to 3. 0 is totally silent, 3 is the most verbose. The default is 1.

    This may be overridden from the command line via the -v switch (see SWISH-RUN).

  • ParserWarnLevel [0|1|2|3]

    Sets the error level when using the libxml2 parser for XML and HTML. libxml2 will point out structural errors in your documents.

        0 = no report
        1 = fatal errors
        2 = errors
        3 = warnings

    Currently (as of 2.4.4 - early 2005) libxml2 only reports errors at level 2. The default as of 2.4.4 is "2" which should report any errors that might indicate a problem parsing a document.

    The exception is that UTF-8 to Latin-1 conversion errors are reported at level 3 (changed from 1 in 2.4.4). Although these errors indicate a problem indexing text, they are only reported at level 3 because they can be very common.

    It is recommended that you index at ParserWarnLevel 3 when first starting out to see what errors and warnings are reported. Then reduce the level when you understand what documents are causing parsing problems and why.

  • IndexFile *path*

    IndexFile specifies the location of the generated index file. If not specified, Swish-e will create the file index.swish-e in the current directory.

        IndexFile /usr/local/swish/site.index
  • obeyRobotsNoIndex [yes|NO]

    When enabled, Swish-e will not index any HTML file that contains:

        <meta name="robots" content="noindex">

    The default is to ignore these meta tags and index the document. This tag is described at http://www.robotstxt.org/wc/exclusion.html.

    Note: This feature is only available with the libxml2 HTML parser.

    Also, if you are using the libxml2 parser (HTML2 and XML2) then you can use the following comments in your documents to prevent indexing:

           <!-- SwishCommand noindex -->
           <!-- SwishCommand index -->

    and/or these may be used also:

           <!-- noindex -->
           <!-- index -->

    For example, these are very helpful to prevent indexing of common headers, footers, and menus.
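As a rough illustration of the effect of these comments, the Python sketch below removes everything between a noindex comment and the next index comment before the text would be handed to an indexer. It is not how Swish-e or libxml2 implements the feature; the name strip_noindex is hypothetical, and treating an unclosed noindex block as running to the end of the document is an assumption.

```python
import re

# Matches "<!-- noindex -->" or "<!-- SwishCommand noindex -->" up to the
# next "index" comment. Assumption of this sketch: an unclosed block runs
# to the end of the document.
_BLOCK = re.compile(
    r"<!--\s*(?:SwishCommand\s+)?noindex\s*-->"
    r".*?"
    r"(?:<!--\s*(?:SwishCommand\s+)?index\s*-->|\Z)",
    re.DOTALL,
)


def strip_noindex(html):
    """Return html with every noindex ... index region removed."""
    return _BLOCK.sub("", html)


if __name__ == "__main__":
    page = (
        "<p>article text</p>"
        "<!-- noindex --><div>site menu</div><!-- index -->"
        "<p>more article text</p>"
    )
    print(strip_noindex(page))
```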

NOTE: The following items are currently not available. They require Swish-e to parse the configuration file while searching.

  • EnableAltSearchSyntax [yes|NO]

    NOTE: This item is currently not available.

    Enable alternate search syntax. Allows the usage of a basic "Altavista(c)", "Lycos(c)", etc. like search syntax. This means a search query can contain "+" and "-" as syntax parameter.

    Example:

        swish-e -w "+word1 +word2 -word3  word4 word5"
        "+"  = following word has to be in all found documents
        "-"  = following word may not be in any document found
        " "  = following word will be searched in documents
  • SwishSearchOperators <and-word> <or-word> <not-word>

    NOTE: This item is currently not available.

    Using this config directive you can change the boolean search operators of Swish-e, e.g. to adapt these to your language. The default is: AND OR NOT

    Example (german):

        SwishSearchOperators   UND  ODER  NICHT
  • SwishSearchDefaultRule [<AND-WORD>|<or-word>]

    NOTE: This item is currently not available.

    SwishSearchDefaultRule defines the default Boolean operator to use if none is specified between words or phrases. The default is AND.

    The word you specify must match one of the available SwishSearchOperators.

    Example:

        SwishSearchOperators   UND  ODER  NICHT
        # Make it act like a web search engine
        SwishSearchDefaultRule ODER
  • ResultExtFormatName name -x format string

    NOTE: This item is currently not available.

    The output of Swish-e can be defined by specifying a format string with the -x command line argument. Using ResultExtFormatName you can assign a predefined format string to a name.

    Examples:

        ResultExtFormatName  moreinfo   "%c|%r|%t|%p|<author>|<publishyear>\n"

    Then when searching you can specify the format string's name

        swish-e   ...  -x moreinfo  ...

    See the -x switch in SWISH-RUN for more information about output formats.

Administrative Headers Directives

Swish-e stores configuration information in the header of the index file. This information can be retrieved while searching or by functions in the Swish-e C library. There are a number of fields available for your own use. None of these fields are required:

  • IndexName *text*
  • IndexDescription *text*
  • IndexPointer *text*
  • IndexAdmin *text*

    These variables specify information that goes into index files to help users and administrators. IndexName should be the name of your index, like a book title. IndexDescription is a short description of the index or a URL pointing to a more full description. IndexPointer should be a pointer to the original information, most likely a URL. IndexAdmin should be the name of the index maintainer and can include name and email information. These values should not be more than 70 or so characters and should be contained in quotes. Note that the automatically generated date in index files is in D/M/Y and 24-hour format.

    Examples:

        IndexName "Linux Documentation"
        IndexDescription "This is an index of /usr/doc on our Linux machine." 
        IndexPointer http://localhost/swish/linux/index.html
        IndexAdmin webmaster

Document Source Directives

These directives control what documents are indexed and how they are accessed. See also Directives for the File Access method only and Directives for the HTTP Access Method Only for directives that are specific to those access methods.

  • IndexDir [directories or files|URL|external program]

    IndexDir defines the source of the documents for Swish-e. Swish-e currently supports three file access methods: File system, HTTP (also called spidering), and prog for reading files from an external program.

    The -S command line argument is used to select the file access method.

        swish-e -c swish.config -S fs    - file system
        swish-e -c swish.config -S http  - internal http spider
        swish-e -c swish.config -S prog  - external program of any type

    For the fs method of access IndexDir is a space-separated list of files and directories to index. Use a forward slash as the path separator in MS Windows.

    For the http method the IndexDir setting is a list of space-separated URLs.

    For the prog method the IndexDir setting is a list of space-separated programs to run (which generate documents for swish to index).

    You may specify more than one IndexDir directive.

    Any sub-directories of any listed directory will also be indexed.

    Note: While processing directories, Swish-e will ignore any files or directories that begin with a dot ("."). You may index files or directories that begin with a dot by specifying their name with IndexDir or -i.

    Examples:

        # Index this directory and any subdirectories
        IndexDir /usr/local/home/http
    
        # Index the docs directory in current directory
        IndexDir ./docs
    
        # Index these files in the current directory
        IndexDir ./index.html ./page1.html ./page2.html
        # and index this directory, too
        IndexDir ../public_html

    For the HTTP method of access specify the URLs from which you want the spidering to begin.

    Example:

        IndexDir http://www.my-site.com/index.html
        IndexDir http://localhost/index.html

    Obviously, using the HTTP method to index is much slower than indexing local files. Be well aware that some sites do not appreciate spidering and may block your IP address. You may wish to contact the remote site before spidering their web site. More information about spidering can be found in Directives for the HTTP Access Method Only below.

    For the prog method of access IndexDir specifies the path to the program(s) to execute. The external program must correctly format the documents being passed back to Swish-e. Examples of external programs are provided in the prog-bin directory.

        IndexDir ./myprogram.pl

    See prog for details.

    Note: Not all directives work with all methods.

  • NoContents *list of file suffixes*

    Files with these suffixes will not have their contents indexed, but will have their path name (file name) indexed instead.

    If the file's type is HTML or HTML2 (as set by IndexContents or DefaultContents) then the file will be parsed for a HTML title and that title will be indexed. Note that you must set the file's type with IndexContents or DefaultContents: .html and .htm are NOT type HTML by default. For example:

       IndexContents HTML* .htm .html

    If a title is found, it will still be checked for FileRules title, and the file will be skipped if a match is found. See FileRules.

    If the file's type is not HTML, or it is HTML and no title is found, then the file's path will be indexed.

    For example, this will allow searching by image file name.

        NoContents .gif .xbm .au .mov .mpg .pdf .ps

    Note: Using this directive will not cause files with those suffixes to be indexed. That is, if you use IndexOnly to limit the types of files that are indexed, then you must specify in IndexOnly the same suffixes listed in NoContents.

    This does not work:

        # Wrong!
        IndexOnly .htm .html
        NoContents .gif .xbm .au .mov .mpg .pdf .ps
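The interaction between IndexOnly and NoContents can be pictured as a small decision function. The Python sketch below only restates the rule described above; the name classify and the three return labels are hypothetical, and plain suffix matching is an assumption.

```python
def classify(path, index_only, no_contents):
    """Say how a file would be treated under the rule described above.

    index_only  -- suffixes from IndexOnly (empty list = no restriction)
    no_contents -- suffixes from NoContents
    Returns "skipped", "path only", or "contents" (labels are hypothetical).
    """
    if index_only and not path.endswith(tuple(index_only)):
        return "skipped"        # IndexOnly filters first
    if path.endswith(tuple(no_contents)):
        return "path only"      # file name (or HTML title) is indexed
    return "contents"


if __name__ == "__main__":
    no_contents = [".gif", ".xbm", ".au", ".mov", ".mpg", ".pdf", ".ps"]

    # The "Wrong!" configuration: .gif never gets past IndexOnly.
    print(classify("logo.gif", [".htm", ".html"], no_contents))

    # Listing the NoContents suffixes in IndexOnly as well fixes it.
    print(classify("logo.gif", [".htm", ".html"] + no_contents, no_contents))
```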

    A -S prog program may set the No-Contents: header to enable this feature for a specific document (although it would be smarter for the -S prog program to simply send only the pathname or title to be indexed).

  • ReplaceRules [replace|remove|prepend|append|regex]

    ReplaceRules allows you to make changes to file pathnames before they're indexed. These changed file names or URLs will be returned in search results.

    For example, you may index your files locally (with the File system indexing method), yet return a URL in search results. This directive can be used to map the file names to their respective URLs on your web server.

    There are five operations you can specify: replace, append, remove, prepend, and regex. They are applied to the pathname in the order in which you list them.

    This directive uses C library regex.h regular expressions.

       replace "the string you want replaced" "what to change it to"
       remove "a string to remove"   
       prepend "a string to add before the result"
       append "a string to add after the result"
       regex  "/search string/replace string/options"

    Remember, quotes are needed if an expression contains white space, and backslashes have special meaning.

    Regex is an Extended Regular Expression. The first character found is the delimiter (but it's not smart enough to use matched chars such as [], (), and {}).

    The replace string may use substitution variables:

        $0      the entire matched (sub)string
        $1-$9   returns patterns captured in "(" ")" pairs
        $`      the string before the matched pattern
        $'      the string after the matched pattern

    The options change the behavior of expression:

        i       ignore the case when matching
        g       repeat the substitution for the entire pattern

    Examples:

        ReplaceRules replace testdir/ anotherdir/
        ReplaceRules replace [a-z_0-9]*_m.*\.html index.html
    
        ReplaceRules remove testdir/
    
        ReplaceRules prepend http://localhost/
        ReplaceRules append .html
    
        ReplaceRules regex  !^/web/([^/]+)/!http://$1.domain.com/!
        replaces a file path:
            /web/search/foo/index.html
        with
            http://search.domain.com/foo/index.html
    
        ReplaceRules regex  #^#http://localhost/www#
        ReplaceRules prepend http://localhost/www  (same thing)
    
        # Remove all extensions from C source files
        ReplaceRules remove .c     # ERROR! That "." is *any char*
        ReplaceRules remove \.c    # much better...
    
        ReplaceRules remove "\\.c" # if in quotes you need double-backslash!  
        ReplaceRules remove "\.c"  # ERROR! "\." -> "." and is *any char*
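To see how such rules transform a pathname, here is a minimal Python sketch that mimics the five operations. It is not Swish-e's implementation: Swish-e uses the C library regex.h, while this sketch uses Python's re module, whose matching rules differ in detail. The function name apply_rule is hypothetical. The sketch assumes that replace and remove act on every match and supports only $0 to $9 in the regex replacement string. The pattern uses ([^/]+) for the first path component so that the capture cannot span a slash.

```python
import re


def apply_rule(path, op, *args):
    """Apply one ReplaceRules-style operation to a pathname (sketch only)."""
    if op == "prepend":
        return args[0] + path
    if op == "append":
        return path + args[0]
    if op == "remove":
        # Assumption: every match of the pattern is removed.
        return re.sub(args[0], "", path)
    if op == "replace":
        # The lambda keeps the replacement text literal.
        return re.sub(args[0], lambda m: args[1], path)
    if op == "regex":
        spec = args[0]
        delim = spec[0]                       # first character is the delimiter
        pattern, replacement, options = spec[1:].split(delim)
        flags = re.IGNORECASE if "i" in options else 0
        count = 0 if "g" in options else 1    # 0 means "replace all" in re.sub
        # Translate $0..$9 into Python's \g<0>..\g<9> group references.
        template = re.sub(r"\$(\d)", r"\\g<\1>", replacement)
        return re.sub(pattern, template, path, count=count, flags=flags)
    raise ValueError("unknown operation: " + op)


if __name__ == "__main__":
    path = "/web/search/foo/index.html"
    print(apply_rule(path, "regex", "!^/web/([^/]+)/!http://$1.domain.com/!"))
    print(apply_rule("/index.html", "regex", "#^#http://localhost/www#"))
```

Several rules can be chained by feeding the result of one call into the next, in the order the directives are listed.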
  • IndexContents [TXT|HTML|XML|TXT2|HTML2|XML2|TXT*|HTML*|XML*] *file extensions*

    The IndexContents directive assigns one of Swish-e's document parsers to a document, based on its extension. Swish-e currently knows how to parse TXT, HTML, and XML documents.

    The XML2, HTML2, and TXT2 parsers are currently only available when Swish-e is configured to use libxml2.

    You may use XML*, HTML*, and TXT* to select the parser automatically. If libxml2 is installed then it will be used to parse the content. Otherwise, Swish-e's internal parsers will be used.

    Documents that are not assigned a parser with IndexContents will, by default, use the HTML2 parser if libxml2 is installed, otherwise will use Swish-e's internal HTML parser. The DefaultContents directive may be used to assign a parser to documents that do not match a file extension defined with the IndexContents directive.

    Example:

        IndexContents HTML* .htm .html .shtml
        IndexContents TXT*  .txt .log .text
        IndexContents XML*  .xml

    HTML* is the default type for all files, unless otherwise specified (this default can be changed by the DefaultContents directive). Swish-e parses titles from HTML files, if available, and keeps track of the context of the text for context searching (see -t in SWISH-RUN).

    If using filters (with the FileFilter directive) to convert documents you should include those extensions, too. For example, if using a filter to convert .pdf to .html, you need to tell Swish-e that .pdf should be indexed by the internal HTML parser:

        FileFilter  .pdf   pdf2html
        IndexContents  HTML  .pdf

    See also Document Filter Directives.

    Note: Some of this may be changed in the future to use content-types instead of file extensions. See SWISH-3.0

  • DefaultContents [TXT|HTML|XML|TXT2|HTML2|XML2|TXT*|HTML*|XML*]

    This sets the default parser for documents that are not specified in IndexContents. If not specified the default is HTML.

    The XML2, HTML2, and TXT2 parsers are currently only available when Swish-e is configured to use libxml2.

    You may use XML*, HTML*, and TXT* to select the parser automatically. If libxml2 is installed then it will be used to parse the content. Otherwise, Swish-e's internal parsers will be used.

    Example:

        DefaultContents HTML

    The DefaultContents directive should be used when spidering, as HTML files may be returned without a file extension (such as when requesting a directory and the default index.html is returned).
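The way IndexContents, DefaultContents, and the "*" parser names combine can be sketched as follows. This is an illustration of the rules described above, not Swish-e's code; choose_parser and its parameters are hypothetical names, and the have_libxml2 flag stands in for whether Swish-e was built with libxml2.

```python
def choose_parser(path, index_contents, default="HTML*", have_libxml2=True):
    """Pick a parser name for a document (sketch of the rules above).

    index_contents -- list of (parser, suffixes) pairs, one per IndexContents
    default        -- value of DefaultContents ("HTML*" assumed when unset)
    """
    parser = default
    for name, suffixes in index_contents:
        if path.endswith(tuple(suffixes)):
            parser = name
            break
    if parser.endswith("*"):
        # "*" selects the libxml2 variant when available, else the internal one.
        parser = parser[:-1] + ("2" if have_libxml2 else "")
    return parser


if __name__ == "__main__":
    rules = [
        ("HTML*", [".htm", ".html", ".shtml"]),
        ("TXT*", [".txt", ".log", ".text"]),
        ("XML*", [".xml"]),
    ]
    print(choose_parser("notes.txt", rules))
    print(choose_parser("notes.txt", rules, have_libxml2=False))
    # A spidered URL without an extension falls back to the default parser.
    print(choose_parser("http://localhost/", rules, default="HTML"))
```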

  • FileInfoCompression [yes|NO]

    ** This directive is currently not supported **

    Setting FileInfoCompression to yes will compress the index file to save disk space. This may result in longer indexing times. The default is no.

    Also see the -e switch in SWISH-RUN for saving RAM during indexing.

Document Contents Directives

These directives control what information is extracted from your source documents, and how that information is made available during searching.

  • ConvertHTMLEntities [YES|no]

    ASCII entities can be converted automatically while indexing documents of type HTML (not for HTML2). For performance reasons you may wish to set this to no if your documents do not contain HTML entities. The default is yes.

    If ConvertHTMLEntities is set to no the entities will be indexed without conversion.

    NOTE: Entities within XML files and files parsed with libxml2 (HTML2) are converted regardless of this setting.

  • MetaNames *list of names*

    META names are a way to define "fields" in your XML and HTML documents. You can use the META names in your queries to limit the search to just the words contained in that META name of your document. For example, you might have a META tagged field in your documents called subjects and then you can search your documents for the word "foo" but only return documents where "foo" is within the subjects META tag.

        swish-e -w subjects=foo

    (See also the -t switch in SWISH-RUN for information about context searching in HTML documents.)

    The MetaNames directive is a space separated list. For example:

        MetaNames meta1 meta2 keywords subjects

    You may also use UndefinedMetaTags to specify automatic extraction of meta names from your HTML and XML documents, and also to ignore indexing content of meta tags.

    META tags can have two formats in your HTML source documents:

        <META NAME="meta1" CONTENT="some content">

    and (if using the HTML2/libxml2 parser)

        <meta1>
            some content
        </meta1>

    But this second version is invalid HTML, and will generate a warning if ParserWarnLevel is set (libxml2 only).

    And in XML documents, use the format:

        <meta1>
            Some Content
        </meta1>

    Then you can limit your search to just META meta1 like this:

        swish-e -w 'meta1=(apples or oranges)'

    You may nest the XML and the start/end tag versions:

        <keywords>
            <tag1>
                some content
            </tag1>
            <tag2>
                some other content
            </tag2>
        </keywords>

    Then you can search in both tag1 and tag2 with:

        swish-e -w 'keywords=(query words)'

    Swish-e indexes all text as some metaname. The default is swishdefault, so these two queries are the same:

        swish-e -w foo
        swish-e -w swishdefault=foo

    When indexing HTML Swish-e indexes the HTML title as default text, so when searching Swish-e will find matches in both the HTML body and the HTML title. Swish also, by default, indexes content of meta tags. So:

        swish-e -w foo

    will find "foo" in the body, the title, or any meta tags.

    Currently, there's no way to prevent Swish-e from indexing the title contents along with the body contents, but see UndefinedMetaTags for how to control the indexing of meta tags.

    If you would like to search just the title text, you may use:

        MetaNames swishtitle

    This will index the title text separately under the built-in swish internal meta name "swishtitle". You may then search like

        swish-e -w foo  -- search for "foo" in title, body (and undefined meta tags)
        swish-e -w swishtitle=foo -- search for "foo" in title only

    In addition to swishtitle, you can limit searches to documents' path with:

       MetaNames swishdocpath

    Then to search for "foo" but also limit searches to documents that include "manual" or "tutorial" in their path:

       swish-e -w 'foo swishdocpath=(manual or tutorial)'

    See also ExtractPath.
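To make the idea of META "fields" concrete, the Python sketch below collects the content of <META NAME=... CONTENT=...> tags for a given list of names, using the standard html.parser module. It only illustrates the first tag format shown above and says nothing about how Swish-e stores or ranks the words; the class name MetaCollector is hypothetical, and matching names case-insensitively is an assumption.

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collect CONTENT values of META tags whose NAME is in meta_names."""

    def __init__(self, meta_names):
        super().__init__()
        # Assumption: meta names are compared case-insensitively.
        self.meta_names = {name.lower() for name in meta_names}
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names for us.
        if tag != "meta":
            return
        attr = dict(attrs)
        name = (attr.get("name") or "").lower()
        if name in self.meta_names:
            self.fields.setdefault(name, []).append(attr.get("content") or "")


if __name__ == "__main__":
    collector = MetaCollector(["meta1", "subjects"])
    collector.feed(
        '<META NAME="meta1" CONTENT="some content">'
        '<META NAME="subjects" CONTENT="apples oranges">'
        '<META NAME="other" CONTENT="not listed in MetaNames">'
    )
    print(collector.fields)
```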

  • MetaNameAlias *meta name* *list of aliases*

    MetaNameAlias assigns aliases for a meta name. For example, if your documents contain meta tags "description", "summary", and "overview" that all give a summary of your documents you could do this:

        MetaNames summary
        MetaNameAlias summary description overview

    Then all three tags will get indexed as meta tag "summary". You can then search all the fields as:

        -w summary=foo

    Aliases work at search time, too. So these will also limit the search to the "summary" meta name.

        -w description=foo
        -w overview=foo
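Alias handling amounts to a lookup table that is consulted both when indexing and when searching. A minimal Python sketch, with hypothetical names build_alias_map and resolve:

```python
def build_alias_map(alias_directives):
    """Build {alias: meta name} from (meta name, aliases) pairs.

    Each pair corresponds to one MetaNameAlias directive.
    """
    mapping = {}
    for meta_name, aliases in alias_directives:
        for alias in aliases:
            mapping[alias] = meta_name
    return mapping


def resolve(name, mapping):
    """Return the real meta name for name, or name itself if not an alias."""
    return mapping.get(name, name)


if __name__ == "__main__":
    # MetaNameAlias summary description overview
    aliases = build_alias_map([("summary", ["description", "overview"])])
    for field in ("summary", "description", "overview"):
        print(field, "->", resolve(field, aliases))
```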
  • MetaNamesRank integer *list of meta names*

    You can assign a bias to metanames that will affect how ranking is calculated. The range of values is from -10 to +10, with zero being no bias.

        MetaNamesRank 4 subject
        MetaNamesRank 3 swishdefault
        MetaNamesRank 2 author publisher
        MetaNamesRank -5 wrongwords

    This feature is still considered experimental. If you use it, please send feedback to the discussion list.

  • HTMLLinksMetaName *metaname*

    Allows indexing of HTML links. Normally, HTML links (href tags) are not indexed by Swish-e. This directive defines a metaname, and links will be indexed under this meta name.

    Example:

        HTMLLinksMetaName links

    Now, to limit searches to files with a link to "home.html" do this:

        -w links='"home.html"'

    The double quotes force a phrase search.

    To make Swish-e index links as normal text, you may use:

        HTMLLinksMetaName swishdefault

    This feature is only available with the libxml2 HTML parser.

  • ImageLinksMetaName *metaname*

    Allows indexing of image links under a metaname. Normally, image URLs are not indexed.

    Example:

        ImageLinksMetaName images

    Now, if you would like to find pages that include a nice image of a beach:

        -w images='beach'

    To make Swish-e index image links as normal text, you may use:

        ImageLinksMetaName swishdefault

    This feature is only available with the libxml2 HTML parser.

  • IndexAltTagMetaName *tagname*|as-text

    Allows indexing of the ALT text of <IMG> tags. Specify either a tag name which will be used as a metaname, or the special text "as-text" which says to index the ALT text as if it were plain text at the current location.

    For example, by specifying a tag name:

       IndexAltTagMetaName bar

    would make this markup:

        <foo>
            <img src="/someimage.png" alt="Alt text here">
        </foo>

    appear like

        <foo>
            <bar>Alt text here</bar>
        </foo>

    Then the normal rules (MetaNames and PropertyNames) apply to how that text is indexed.

    If you use the special tag "as-text" then

        <foo>
            <img src="/someimage.png" alt="Alt text here">
        </foo>

    simply becomes

        <foo>
            Alt text here
        </foo>

    This feature is only available when using the libxml2 parser (HTML2 and XML2).

  • AbsoluteLinks [yes|NO]

    If this is set true then Swish-e will attempt to convert relative URIs extracted from HTML documents for use with HTMLLinksMetaName and ImageLinksMetaName into absolute URIs. Swish-e will use any <BASE> tag found in the document, otherwise it will use the file's pathname. The pathname used will be the pathname *after* ReplaceRules has been applied to the document's pathname.

    For example, say you wish to index image links under the metaname "images".

        ImageLinksMetaName images

    If AbsoluteLinks is set to no, then an image within the document http://localhost/vacations/france/index.html:

        <img src="beach.jpeg">

    will be indexed only as "beach.jpeg".

    But, if you want more detail when searching, you can enable AbsoluteLinks and Swish-e will index "http://localhost/vacations/france/beach.jpeg". You can then look for images of beaches, but only in France:

        -w images=(beach and france)

    This also means you can search for any images within France:

        -w images=(france)

    This feature is only available with the libxml2 HTML parser.
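
    The resolution step described above (prefer a <BASE> href, otherwise use the document's path) can be sketched in Python with the standard urljoin function. This is only an illustration of the idea, not Swish-e's own code; the function name absolute_link and the base_href parameter are made up for this sketch.

```python
from urllib.parse import urljoin


def absolute_link(doc_path, link, base_href=None):
    """Resolve a link against the <BASE> href if given, else the document path."""
    return urljoin(base_href or doc_path, link)


doc = "http://localhost/vacations/france/index.html"
print(absolute_link(doc, "beach.jpeg"))
```

    With the absolute form indexed, both "beach" and "france" are present in the text stored under the images metaname.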

  • UndefinedMetaTags [error|ignore|INDEX|auto]

    This directive defines the behavior of Swish-e during indexing when a meta name is found but is not listed in MetaNames. There are four choices:

    • error

      If a meta name is found that is not listed in MetaNames then indexing will be halted and an error reported.

    • ignore

      The contents of the meta tag are ignored and not indexed unless a metaname has been defined with the MetaNames directive.

    • index

      The contents of the meta tag are indexed, but placed in the main index unless there's an enclosing metatag already in force. This is the default.

    • auto

      This method creates meta tags automatically for HTML meta names and XML elements. Using this is the same as specifying all the meta names explicitly in a MetaNames directive.

  • UndefinedXMLAttributes [DISABLE|error|ignore|index|auto]

    This is similar to UndefinedMetaTags, but only applies to XML documents (parsed with libxml2). This allows indexing of attribute content, and provides a way to index the content under a metaname. For example, UndefinedXMLAttributes can make

        <person age="23">
              John Doe
        </person>

    look like the following to swish:

        <person>
            <person.age>
                23
            </person.age>
            John Doe
        </person>

    What happens to the text "23" will depend on the setting of UndefinedXMLAttributes:

    • disable

      XML attributes are not parsed and not indexed. This is the default.

    • error

      If the concatenated meta name (e.g. person.age) is not listed in MetaNames then indexing will be halted and an error reported.

    • ignore

      The contents of the meta tag are ignored and not indexed unless a metaname has been defined with the MetaNames directive.

    • index

      The contents of the meta tag are indexed, but placed in the main index unless there's an enclosing metatag already in force.

    • auto

      This method will create meta tags from the combined element and attribute names (and XML class name). This option should be used with caution as it can generate a lot of metaname entries.

      See also the example below under XMLClassAttributes.

  • XMLClassAttributes *list of XML attribute names*

    Combines an XML class name with the element name to make up a metaname. For example:

        XMLClassAttributes class
    
        <person class="first">
            John
        </person>
        <person class="last">
            Doe
        </person>

    Will appear to Swish-e as:

        <person>
            <person.first>
            John
            </person.first>
        </person>
        <person>
            <person.last>
            Doe
            </person.last>
        </person>

    How the data is indexed depends on MetaNames and UndefinedMetaTags.

    Here's an example using the following configuration which combines the two directives XMLClassAttributes and UndefinedXMLAttributes.

        XMLClassAttributes class
        UndefinedMetaTags auto
        UndefinedXMLAttributes auto
        IndexContents XML2 .xml

    The source XML file looks like:

        <xml> <person class="student" phone="555-1212" age="102"> John </person>
        <person greeting="howdy">Bill</person> </xml>

    Swish-e parses as:

        ./swish-e -c 2 -i 1.xml -T parsed_tags  parsed_text  -v 0
        Indexing Data Source: "File-System"
    
        <xml> (MetaName)
    
            <person> (MetaName)
                <person.student> (MetaName)
                    <person.student.phone> (MetaName)
                        555-1212
                    </person.student.phone> 
                    <person.student.age> (MetaName)
                        102
                    </person.student.age> 
                    John
            </person> 
    
            <person> (MetaName)
                <person.greeting> (MetaName)
                    howdy
                </person.greeting> 
                Bill
            </person> 
    
        </xml> 
        Indexing done!

    One thing to note is that the first <person> block finds a class name "student", so all metanames that are created from attributes use the combined name "person.student". The second <person> block doesn't contain a "class", so the attribute name is combined directly with the element name (e.g. "person.greeting").
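
    The naming rule just described can be sketched in Python. This is a rough model of how the dotted names are built, not Swish-e's parser; the function name attribute_metanames is hypothetical, and the root element is renamed to <people> for the sketch.

```python
import xml.etree.ElementTree as ET


def attribute_metanames(elem, class_attrs=("class",)):
    """Map each non-class attribute to its dotted metaname and value."""
    prefix = elem.tag
    # A class attribute, when present, is appended to the element name.
    for name in class_attrs:
        if name in elem.attrib:
            prefix = f"{prefix}.{elem.attrib[name]}"
            break
    return {
        f"{prefix}.{attr}": value
        for attr, value in elem.attrib.items()
        if attr not in class_attrs
    }


SOURCE = """<people>
<person class="student" phone="555-1212" age="102"> John </person>
<person greeting="howdy">Bill</person>
</people>"""

for person in ET.fromstring(SOURCE):
    print(attribute_metanames(person))
```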

  • ExtractPath *metaname* [replace|remove|prepend|append|regex]

    This directive can be used to index extracted parts of a document's path. A common use would be to limit searches to specific areas of your file tree.

    The extracted string will be indexed under the specified meta name.

    See ReplaceRules for a description of the various pattern replacement methods, but you will use the regex method.

    For example, say your file system (or web tree) was organized into departments:

        /web/sales/foo...
        /web/parts/foo...
        /web/accounting/foo...

    And you wanted a way to limit searches to just documents under "sales".

        ExtractPath department regex !^/web/([^/]+)/.*$!$1!

    Which says, extract out the department name (as substring $1) and index it as meta name department. Then to limit a search to the sales department:

        swish-e -w foo AND department=sales

    Note that the regex method uses a substitution pattern, so to index only a sub-string match the entire document path in the regular expression, as shown above. Otherwise any part that is not matched will end up in the substitution pattern.

    See the ExtractPathDefault option for a way to set a value if no patterns match.

    Although unlikely, you may use more than one ExtractPath directive. More than one directive of the same meta name will operate successively (in the order listed in the configuration file) on the path. This allows you to use regular expressions on the results of the previous pattern substitution (as if piping the output from one expression to the pattern of the next).

        ExtractPath foo regex !^(...).+$!$1!
        ExtractPath foo regex !^.+(.)$!$1!

    So, the third letter is indexed as meta name "foo" if both patterns match.

        ExtractPath foo regex !^X(...).+$!$1!
        ExtractPath foo regex !^.+(.)$!$1!

    Now (note the "X"), if the first pattern doesn't match, the last character of the path name is indexed. You must be clear on this behavior if you are using more than one ExtractPath directive with the same metaname.

    The document path operated on is the real path swish used to access the document. That is, the ReplaceRules directive has no effect on the path used with ExtractPath.

    The full path is used for each meta name if more than one ExtractPath directive is used. That is, changes to the path used in ExtractPath foo do not affect the path used by ExtractPath bar.
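
    The successive behavior of several ExtractPath directives for one metaname can be modeled with a short Python sketch. It assumes, as described above, that a pattern that does not match leaves the string unchanged; the function extract_path and the sample path are hypothetical, and Python writes the substitution as \1 where the configuration file uses $1.

```python
import re


def extract_path(path, rules):
    """Apply (pattern, replacement) rules in order; None if nothing matched."""
    value, matched = path, False
    for pattern, replacement in rules:
        if re.search(pattern, value):
            # Each rule works on the result of the previous substitution.
            value = re.sub(pattern, replacement, value, count=1)
            matched = True
    return value if matched else None


path = "/web/sales/foo.html"
print(extract_path(path, [(r"^/web/([^/]+)/.*$", r"\1")]))
print(extract_path(path, [(r"^(...).+$", r"\1"), (r"^.+(.)$", r"\1")]))
print(extract_path(path, [(r"^X(...).+$", r"\1"), (r"^.+(.)$", r"\1")]))
```

    The second call yields the third character of the path, while the third call (where the "X" pattern fails) yields the last character.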

  • ExtractPathDefault *metaname* default_value

    This can be used with ExtractPath to set a default string to index under the given metaname if none of the ExtractPath patterns match.

    For example, say you want to index each document with a metaname "department" based on the following path examples:

        /web/sales/foo...
        /web/parts/foo...
        /web/accounting/foo...

    But you are also indexing documents that do not follow that pattern and you want to search those separately, too.

        ExtractPath department regex !^/web/([^/]+)/.*$!$1!
        ExtractPathDefault department other

    Now, you may search like this:

        -w foo department=(sales)      - limit searches to the sales documents
        -w foo department=(parts)      - limit searches to the parts documents
        -w foo department=(accounting) - limit searches to the accounting documents
        -w foo department=(other)      - everything but sales, parts, and accounting.

    This basically is a shortcut for:

        -w foo not department=(sales or parts or accounting)

    but you don't need to keep track of what was extracted.
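
    As a small illustration of the fallback, the following Python sketch assigns a department to a path and uses a default when the pattern does not match. It is a model of the effect only; department_for and the sample paths are invented for the example.

```python
import re

DEPARTMENT = re.compile(r"^/web/([^/]+)/.*$")


def department_for(path, default="other"):
    """Return the extracted department, or the default if the path does not match."""
    match = DEPARTMENT.match(path)
    return match.group(1) if match else default


for sample in ("/web/sales/foo.html", "/archive/2001/report.html"):
    print(sample, "->", department_for(sample))
```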

  • PropertyNames *list of meta names*
  • PropertyNamesCompareCase *list of meta names*
  • PropertyNamesIgnoreCase *list of meta names*

    Swish-e allows you to specify certain META tags that can be used as document properties. The contents of any META tag that has been identified as a document property can be returned as part of the search results along with the rank, file name, title, and document size (see the -p and -x switches in SWISH-RUN).

    Properties are useful for returning additional data from documents in search results -- this saves the effort of reading and parsing the source files while reading Swish-e search results, and is especially useful when the source documents are no longer available or slow to access (e.g. over http).

    Another feature of properties is that Swish-e can use the PropertyNames for sorting the search results (see the -s switch).

        PropertyNames author subjects

    Two variations are available. PropertyNamesCompareCase and PropertyNamesIgnoreCase. These tell Swish-e to either ignore or compare case when sorting results. The default for PropertyNames is to ignore the case.

        PropertyNamesIgnoreCase subject
        PropertyNamesCompareCase keyword

    The defaults for "internal" properties are:

        swishtitle          --  ignore the case
        swishdocpath        --  compare case
        swishdescription    --  compare case

    These can be overridden with PropertyNamesCompareCase and PropertyNamesIgnoreCase.

        PropertyNamesCompareCase swishtitle    

    Use of PropertyNames will increase the size of your index files, sometimes significantly. Properties will be compressed if Swish-e is compiled with zlib as described in the INSTALL manual page.

    If Swish-e finds more than one property of the same name in a document, the property's contents will be concatenated for strings, and a warning issued for numeric (or date) properties.

  • PropertyNamesNoStripChars

    PropertyNamesNoStripChars specifies that the listed properties should not have strings of low ASCII characters replaced with a space character. Properties will be stored as found in the document.

    When printing properties with the swish-e binary newlines are replaced with a space character. Use the swish-e library (or SWISH::API perl module) to fetch properties without newlines replaced.

  • PropertyNamesNumeric

    This directive is similar to PropertyNames, but it flags the property as being a string of digits (integer value) that will be stored as binary data instead of a string. This allows sorting with -s and limiting with -L to treat the property as a number rather than as text.

    Swish-e uses strtoul(3) to convert the string into an unsigned long integer. Therefore, only positive integers can be stored.

    Future versions of Swish-e may be able to store different property types (such as negative integers and real numbers). This directive may change in future releases of Swish.

  • PropertyNamesDate

    This directive is exactly like PropertyNamesNumeric, but it also flags the number as a machine timestamp (seconds since Epoch), and will print a formatted date when returning this property. See -x in SWISH-RUN.

    Swish-e will not parse dates when indexing; you must use a timestamp.
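
    Since the timestamp must be produced before Swish-e sees the document, a program that prepares documents might convert dates along these lines. This is a sketch that assumes dates are given in UTC in YYYY-MM-DD form; the helper name to_epoch is hypothetical.

```python
import calendar
import time


def to_epoch(date_string, fmt="%Y-%m-%d"):
    """Convert a UTC date string to whole seconds since the Epoch."""
    return calendar.timegm(time.strptime(date_string, fmt))


print(to_epoch("1999-04-25"))
```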

  • PropertyNameAlias *property name* *list of aliases*

    This allows aliases for a property name. For example, if you are indexing HTML files, plus XML files that are written in English, German, and Spanish and thus use the tags "title", "titel", and "título" you can use:

        PropertyNameAlias swishtitle title titel título titulo

    Note that "swishtitle" is the built-in property used to store the title of a document, and therefore you do not need to specify it as a PropertyName before use.

  • PropertyNamesMaxLength integer *list of meta names*

    This option will set the max length of the text stored in a property. You must specify a number between 0 and the max integer size on your platform, and a list of properties. The properties specified must not be aliases.

    If any of the property names do not exist they will be created (e.g. you do not need to define the property with PropertyNames first).

    In general, this feature will only be useful when parsing HTML or XML with the libxml2 parser.

    For example:

        PropertyNamesMaxLength 1000 swishdescription
        PropertyNameAlias swishdescription body

    Is somewhat like

        StoreDescription HTML <body> 1000
        StoreDescription XML <body> 1000
        StoreDescription HTML2 <body> 1000
        StoreDescription XML2 <body> 1000

    but StoreDescription allows setting the tag for each parser type.

        PropertyNamesMaxLength 1000 headings
        PropertyNameAlias headings h1 h2 h3 h4

    collects all the heading text into a single property called "headings", not to exceed 1000 characters.

  • PropertyNamesSortKeyLength integer *list of meta names*

    Sets the length of the string used when sorting. The default is 100 characters. The -T metanames debugging option will list the current values for an index.

    This setting is used when sorting during indexing, and perhaps when sorting while searching. It also affects the order when limiting to a range of values with the -L option.

  • PreSortedIndex *list of property names*

    By default Swish-e generates presorted tables while indexing for each property name. This allows faster sorting when generating results. On large document collections this presorting may add to the indexing time, and also adds to the total size of the index. This directive can be used to customize exactly which properties will be presorted.

    If PreSortedIndex is not present in the config file (default action), all the properties will be presorted at indexing time. If it is present without any parameter, no properties will be presorted. Otherwise, only the property names specified will be presorted.

    For example, if you only wish to sort results by a property called title:

        PropertyNames title age time
        PreSortedIndex  title

  • StoreDescription [XML <tag> size|HTML <meta> size|TXT size]

    StoreDescription allows you to store a document description in the index file. This description can be returned in your search results when the -x switch is used to include the swishdescription for extended results, or by using -p swishdescription.

    The document type (XML, HTML and TXT) must match the document type currently being indexed as set by IndexContents or DefaultContents. See those directives for possible values. A common problem is using StoreDescription yet not setting the document's type with IndexContents or DefaultContents. Another problem is different types:

        IndexContents HTML2 .html
        StoreDescription HTML <body>

    Then .html documents are assigned a type of HTML2 (and parsed by the libxml2 parser), but the description will not be stored since it is type HTML instead of HTML2.

    For text documents you specify the type TXT (or TXT2 or TXT*) and the number of characters to capture.

        StoreDescription TXT 20

    The above stores only the first twenty characters from the text file in the Swish-e index file.

    For HTML and XML file types, specify the tag to use for the description, and optionally the number of characters to capture. If the size is not specified, Swish-e will capture the entire contents of the tag.

        StoreDescription HTML <body> 20000
        StoreDescription XML  <desc> 40

    Again, note that documents must be assigned a document type with IndexContents or DefaultContents to use this feature.

    Swish-e will compress the descriptions (or any other large property) if compiled to use zlib (see INSTALL). This is recommended when using StoreDescription and a large number of documents. Compression of 30% to 50% is not uncommon with HTML files.

  • PropCompressionLevel [0-9]

    This directive sets the compression level used when storing properties to disk. A setting of zero is no compression, and a setting of nine is the most compression.

    The default depends on the default setting compiled with zlib, but is typically six.

    This option is useful when using StoreDescription to store a large amount of text in properties (or if using PropertyNames with large property sizes).

    Properties must be over a value defined in config.h (100 is the default) before compression will be attempted. Swish-e will never store the results of the compression if the compressed data is larger than the original data.

    This option is only available when Swish-e is compiled with zlib support.
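
    The two rules above (skip small properties, and keep the original when compression does not help) can be sketched with Python's zlib module. This is an illustration only: MIN_COMPRESS_SIZE stands in for the config.h value, store_property is a made-up name, and the sketch also keeps the original when the compressed size is merely equal.

```python
import zlib

MIN_COMPRESS_SIZE = 100  # assumed threshold; the real value is set in config.h


def store_property(data, level=6):
    """Return (stored_bytes, was_compressed) for a property value."""
    if len(data) <= MIN_COMPRESS_SIZE:
        return data, False          # too small to bother compressing
    packed = zlib.compress(data, level)
    if len(packed) >= len(data):
        return data, False          # compression did not save space
    return packed, True


stored, compressed = store_property(b"swish " * 100)
print(len(stored), compressed)
```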

  • TruncateDocSize *number of characters*

    TruncateDocSize limits the number of bytes read from a document while indexing and/or using filters. If a document is larger than the specified size, only the specified number of bytes is read.

    Example:

        TruncateDocSize    10000000

    The default is zero, which means read all data.

    Warning: If you use TruncateDocSize, use it with care! TruncateDocSize is only a safety belt, for example to limit filter output when accessing databases, or to limit "runaway" filters. Truncating document input may destroy document structures for Swish-e (e.g. swish may miss closing tags for XML or HTML documents).

    TruncateDocSize does not currently work with the prog input source method.

  • FuzzyIndexingMode NONE|Stemming|Soundex|Metaphone|DoubleMetaphone

    Selects the type of index to create. Only one type of index may be created.

    It's a good idea to create both a normal index and a fuzzy index and allow your search interface to select which index to use. Many people find the fuzzy searches to be too fuzzy.

    The available fuzzy indexing options can be displayed by running

       swish-e -T LIST_FUZZY_MODES

    Available options include:

    • None

      Words are stored in the index without any conversion. This is the default.

    • Stemming_*

      This option uses one of the installed Snowball stemmers (http://snowball.tartarus.org/).

      The installed stemmers can be viewed by running

         swish-e -T LIST_FUZZY_MODES

      For example, to use the Spanish stemming module:

         FuzzyIndexingMode Stemming_es

    • Stem or Stemming_en

      **This option is no longer supported.**

      Selects the legacy Swish-e English stemmer.

      This is deprecated in favor of the Snowball English stemmer Stemming_en1.

      Words are converted using the Porter stemming algorithm.

      From: http://www.tartarus.org/~martin/PorterStemmer/

          The Porter stemming algorithm (or Porter stemmer) is a
          process for removing the commoner morphological and inflexional
          endings from words in English. Its main use is as part of a
          term normalisation process that is usually done when setting up
          Information Retrieval systems.

      This will help a search for "running" to also find "run" and "runs", for example.

      The stemming function does not convert words to their root, rather programmatically removes endings on words in an attempt to make similar words with different endings stem to the same string of characters. It's not a perfect system, and searches on stemmed indexes often return curious results. For example, two entirely different words may stem to the same word.

      Stemming also can be confusing when used with a wildcard (truncation). For example, you might expect to find the word "running" by searching for "runn*". But this fails when using a stemmed index, as "running" stems to "run", yet searching for "runn*" looks for words that start with "runn".

    • Soundex

      Soundex was developed in the 1880s so records for people with similar sounding names could be found more readily. Soundex is a coded surname based on the way a surname sounds rather than spelling. Surnames that sound similar, like Smith and Smyth, are filed together under the same Soundex code. This is mostly useful for US English.

      Soundex should not be used to search for sound-alike words. Metaphone would be more appropriate for generic sound matching of words. Soundex should only be used where you need to search multiple documents for proper names which sound similar. This is primarily used for indexing genealogical records. This may be useful for indexing other collections of data consisting mostly of names. Many common name variations are matched by Soundex. The only notable exception is the first letter of the name. The first letter is not matched for sound.

    • Metaphone and DoubleMetaphone

      Words are transformed into a short series of letters representing the sound of the word (in English). Metaphone algorithms are often used for looking up mis-spelled words in dictionary programs.

      From: http://aspell.sourceforge.net/metaphone/

          Lawrence Philips' Metaphone Algorithm is an algorithm which returns
          the rough approximation of how an English word sounds.

      The DoubleMetaphone mode will sometimes generate two different metaphones for the same word. This is supposed to be useful when a word may be pronounced more than one way.

      A metaphone index should give results somewhere in between Soundex and Stemming.
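
    To make the Soundex description above concrete, here is a compact Python sketch of the commonly published American Soundex rules. It is not taken from the Swish-e source, whose implementation may differ in detail; the function name soundex is chosen for the example.

```python
CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for letter in letters:
        CODES[letter] = digit


def soundex(name):
    """Return the four-character Soundex code for a name."""
    letters = [ch for ch in name.lower() if "a" <= ch <= "z"]
    if not letters:
        return ""
    code = letters[0].upper()           # the first letter is kept as-is
    previous = CODES.get(letters[0], "")
    for ch in letters[1:]:
        digit = CODES.get(ch, "")
        if digit and digit != previous:
            code += digit
        if ch not in "hw":              # h and w do not separate equal codes
            previous = digit
    return (code + "000")[:4]


print(soundex("Smith"), soundex("Smyth"))
```

    Because the first letter is copied rather than coded, names such as "Kant" and "Cant" receive different codes even though they sound alike.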

  • UseStemming [yes|NO]

    Set to yes to apply the word stemming algorithm during indexing, otherwise set to no.

        UseStemming no
        UseStemming yes

    When UseStemming is set to yes every word is stemmed before placing it into the index.

    This option is deprecated. It has been superseded by FuzzyIndexingMode.

  • UseSoundex [yes|NO]

    When UseSoundex is set to yes every word is converted to a Soundex code before placing it into the index.

    This option is deprecated. It has been superseded by FuzzyIndexingMode.

  • IgnoreTotalWordCountWhenRanking [YES|no]

    Set to yes to ignore the total number of words in the file when calculating ranking. This is often better with merges and small files. The default is yes.

        IgnoreTotalWordCountWhenRanking no

    The default was changed from no to yes in version 2.2.

    NOTE: must be set to no if you intend to use the -R 1 option when searching.

  • MinWordLimit *integer*

    Set the minimum length of a word. Shorter words will not be indexed. The default is 1 (as defined in src/config.h).

        MinWordLimit 5

  • MaxWordLimit *integer*

    Set the maximum length of an indexable word. Longer words will not be indexed. The default is 40 (as defined in src/config.h).

  • WordCharacters *string of characters*
  • IgnoreFirstChar *string of characters*
  • IgnoreLastChar *string of characters*
  • BeginCharacters *string of characters*
  • EndCharacters *string of characters*

    These settings define what a word consists of to the Swish-e indexing engine. Compiled in defaults are in src/config.h.

    When indexing, Swish-e uses WordCharacters to split up the document into words. Words are defined by any string of non-blank characters that contain only the characters listed in WordCharacters. If a string of characters includes a character that is not in WordCharacters then the word will be split into two or more separate words.

    For example:

        WordCharacters abde

    Would turn "abcde" into two words "ab" and "de".

    Next, of these words, any characters defined in IgnoreFirstChar are stripped off the start of the word, and IgnoreLastChar characters are stripped off the end of the word. This allows, for example, periods within a word (www.slashdot.com), but not at the end of a word. Characters in IgnoreFirstChar and IgnoreLastChar must be in WordCharacters.

    Finally, the resulting words MUST begin with one of the characters listed in BeginCharacters and end with one of the characters listed in EndCharacters. BeginCharacters and EndCharacters must be a subset of the characters in WordCharacters. Often, WordCharacters, BeginCharacters and EndCharacters will all be the same.

    Note that the same process applies to the query while searching.

    Getting these settings correct will take careful consideration and practice. It's helpful to create an index of a single test file, and then look at the words that are placed in the index (see the -v 4, -D and -k searching switches).

    Currently there is only support for eight-bit characters.

    Example:

        WordCharacters  .abcdefghijklmnopqrstuvwxyz
        BeginCharacters abcdefghijklmnopqrstuvwxyz
        EndCharacters   abcdefghijklmnopqrstuvwxyz
        IgnoreFirstChar .
        IgnoreLastChar  .

    So the string

        Please visit http://www.example.com/path/to/file.html.

    will be indexed as the following words:

        please
        visit
        http
        www.example.com
        path
        to
        file.html

    Which means that you can search for www.example.com as a single word, but searching for just example will not find the document.

    Note: when indexing HTML documents HTML entities are converted to their character equivalents before being processed with these directives. This is a change from previous versions of Swish-e where you were required to include the characters 0123456789&#; to index entities. See also ConvertHTMLEntities.
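
    The three steps above (split on WordCharacters, strip IgnoreFirstChar and IgnoreLastChar, then check BeginCharacters and EndCharacters) can be modeled in a few lines of Python. This is a simplified sketch rather than Swish-e's tokenizer; split_words is a hypothetical name, and the sketch lowercases the text first as an assumption.

```python
import re


def split_words(text, word_chars, begin_chars, end_chars,
                ignore_first="", ignore_last=""):
    """Return the words that survive the WordCharacters-style rules."""
    words = []
    # Step 1: runs of WordCharacters become candidate words.
    for candidate in re.findall("[" + re.escape(word_chars) + "]+", text.lower()):
        # Step 2: strip ignored characters from each end.
        word = candidate.lstrip(ignore_first).rstrip(ignore_last)
        # Step 3: enforce the begin and end character sets.
        if word and word[0] in begin_chars and word[-1] in end_chars:
            words.append(word)
    return words


LETTERS = "abcdefghijklmnopqrstuvwxyz"
SENTENCE = "Please visit http://www.example.com/path/to/file.html."
print(split_words(SENTENCE, "." + LETTERS, LETTERS, LETTERS, ".", "."))
```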

  • Buzzwords [*list of buzzwords*|File: path]

    The Buzzwords option allows you to specify words that will be indexed regardless of WordCharacters, BeginCharacters, EndCharacters, stemming, soundex and many of the other checks done on words while indexing.

    Buzzwords are case insensitive.

    Buzzwords should be separated by spaces and may span multiple directives. If the special format File:filename is used then the Buzzwords will be read from an external file during indexing.

    Examples:

        Buzzwords C++ TCP/IP
    
        Buzzwords File: ./buzzwords.lst

    If a Buzzword contains search operator characters they must be backslashed when searching. For example:

        Buzzwords C++ TCP/IP web=http
    
        ./swish-e -w 'web\=http'

    Buzzwords are found by splitting the text on whitespace, removing IgnoreFirstChar and IgnoreLastChar characters from the word, and then comparing with the list of Buzzwords. Therefore, if adding Buzzwords to an index you will probably want to define IgnoreFirstChar and IgnoreLastChar settings.

    Note: Buzzwords specific settings for IgnoreFirstChar and IgnoreLastChar may be used in the future.
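
    The matching procedure described above can be sketched in Python as follows. It only models the description (split on whitespace, strip the ignore characters, compare without regard to case); find_buzzwords and the sample sentence are invented for the example.

```python
def find_buzzwords(text, buzzwords, ignore_first="", ignore_last=""):
    """Return the buzzwords found in text, lowercased, in order of appearance."""
    wanted = {word.lower() for word in buzzwords}
    found = []
    for token in text.split():
        word = token.lstrip(ignore_first).rstrip(ignore_last).lower()
        if word in wanted:
            found.append(word)
    return found


SAMPLE = "We ship C++ and (TCP/IP)."
print(find_buzzwords(SAMPLE, ["C++", "TCP/IP"], ignore_first="(", ignore_last=".)"))
```

    Without the ignore characters, "(TCP/IP)." would not compare equal to the buzzword, which is why those settings matter here.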

  • CompressPositions [yes|NO]

    This option enables zlib compression for individual word data in the index file. The default is NO, that is the index word data is not compressed by default.

    Enabling this option can reduce the size of the index file, but at the expense of slower wildcard search times.

    The default changed from YES to NO starting with version 2.4.3.

  • IgnoreWords [*list of stop words*|File: path]

    The IgnoreWords option allows you to specify words to ignore, called stopwords. The default is to not use any stopwords.

    Words should be separated by spaces and may span multiple directives. If the special format File:filename is used then the stop words will be read from an external file during indexing.

    In previous versions of Swish-e you could use the directive

        IgnoreWords swishdefault - obsolete!

    to include a default list of compiled in stopwords. This keyword is no longer supported.

    Examples:

        IgnoreWords www http a an the of and or
    
        IgnoreWords File: ./stopwords.de

  • UseWords [*list of words*|File: path]

    UseWords defines the words that Swish-e will index. Only the words listed will be indexed.

    You can specify a list of words following the directive (you may specify more than one UseWords directive in a config file), and/or use the File: form to specify a path to a file containing the words:

        UseWords perl python pascal fortran basic cobol php
        UseWords File: /path/to/my/wordlist

    Please drop the Swish-e list a note if you actually use this feature. It may be removed from future versions.

  • IgnoreLimit *integer integer*

    This automatically omits words that appear too often in the files (these words are called stopwords). Specify a whole percentage and a number, such as "80 256". This omits words that occur in over 80% of the files and appear in over 256 files. Comment out to turn off auto-stopwording.

        IgnoreLimit 50 1000

    Swish-e must do extra processing to adjust the entire index when this feature is used. Instead of relying on this feature, it is recommended that you decide which words are stopwords and add them to IgnoreWords in your configuration file. To do this, use IgnoreLimit one time and note the stop words that are found while indexing. Add this list to IgnoreWords, and then remove IgnoreLimit from the configuration file.
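
    The two thresholds of IgnoreLimit can be illustrated with a small Python sketch that counts, for each word, the number of files containing it. This models the rule as described (over the percentage and over the file count); auto_stopwords and the sample documents are hypothetical, and words are split on whitespace only.

```python
def auto_stopwords(documents, percent, min_files):
    """Return words found in over `percent`% of files and in over `min_files` files."""
    file_counts = {}
    for text in documents:
        for word in set(text.lower().split()):
            file_counts[word] = file_counts.get(word, 0) + 1
    total = len(documents)
    return {
        word
        for word, count in file_counts.items()
        if count > min_files and count * 100 > percent * total
    }


DOCS = ["the cat sat", "the dog ran", "the bird flew", "a fish swam"]
print(auto_stopwords(DOCS, 50, 2))
```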

  • IgnoreMetaTags *list of names*

    IgnoreMetaTags defines a list of metatags to ignore while indexing XML files (and HTML files if using libxml2 for parsing HTML). All text within the tags will be ignored -- both for indexing (MetaNames) and properties (PropertyNames). To still parse properties, yet not index the text, see UndefinedMetaTags.

    This option is useful to avoid indexing specific data from a file. For example:

        <person>
            <first_name>
                William
            </first_name> <last_name>
                Shakespeare
            </last_name> <updated_date>
                April 25, 1999
            </updated_date>
        </person>

    In the above example you might not want to index the updated date, and therefore prevent finding this record by searching

        -w 'person=(April)'

    This is solved by:

        IgnoreMetaTags updated_date

    See also UndefinedMetaTags.

  • IgnoreNumberChars *list of characters*

    Experimental Feature

    This experimental feature can be used to define a set of characters that describe a number. If a word is found to contain only those characters it will not be indexed. The characters listed must be part of WordCharacters settings. In other words, the "word" checked is a word that Swish-e would otherwise index.

    For example,

        IgnoreNumberChars 0123456789$.,

    Then Swish-e would not index the following:

        123
        123,456.78
        $123.45

    You might be tempted to avoid indexing hex numbers with:

        IgnoreNumberChars 0123456789abcdef

    which will not index 0D31, but will also not index the word "bad".

    This is an experimental feature that may change in future versions. One possible change is to use regular expressions instead.
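
    The check itself is simple, which is also why the hex example above backfires. A Python sketch of the rule (is_ignored_number is a hypothetical name, and the lowercasing is an assumption of the sketch):

```python
def is_ignored_number(word, number_chars):
    """True if every character of the word is in the number character set."""
    return bool(word) and all(ch in number_chars for ch in word.lower())


for word in ("123,456.78", "$123.45", "room101"):
    print(word, is_ignored_number(word, "0123456789$.,"))
```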

  • IndexComments [NO|yes]

    This option lets you decide whether to index the contents of HTML comments. Default is no. Set to yes if comment indexing is required.

        IndexComments yes

    Note: This default differs from the default behavior of versions prior to 2.2.

  • TranslateCharacters [*string1 string2*|:ascii7:]

    The TranslateCharacters directive maps the characters in string1 to the characters listed in string2.

    For example:

        # This will index a_b as a-b and ámo as amo
        TranslateCharacters _á -a

    TranslateCharacters :ascii7: is a predefined set of characters that will translate eight bit characters to ascii7 characters. Using the :ascii7: rule will translate "Ääç" to "aac". This means: searching "Çelik", "çelik" or "celik" will all match the same word.

    TranslateCharacters is done early in the indexing process, after converting HTML entities but before splitting the input text into words based on WordCharacters. So characters you are translating from do not need to be listed in word characters.

    The same character translations take place when searching.
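
    The string1 to string2 mapping behaves like a character-for-character translation table. The sketch below uses Python's str.maketrans to show the idea with the example values from above. It assumes the two strings have the same length, and it does not attempt to reproduce the :ascii7: table.

```python
def make_translator(string1, string2):
    """Build a table mapping each character of string1 to the one at the same index in string2."""
    # str.maketrans raises ValueError if the two strings differ in length.
    return str.maketrans(string1, string2)


# Same mapping as "TranslateCharacters _á -a" above.
table = make_translator("_á", "-a")

print("a_b".translate(table))
print("ámo".translate(table))
```

    Because the translation runs on single characters before the text is split into words, the same table can be applied to both the indexed text and the search words.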

  • BumpPositionCounterCharacters *string*

    When indexing, Swish-e assigns a word position to each word. This enables phrase searching. There may be cases where you would like to prevent phrase matching. The BumpPositionCounterCharacters directive allows you to specify a set of characters that, when found in the text, will increment the word position -- effectively preventing phrase matches across that character.

    For example, if you have a tag:

        <subjects>
            computer programming | apple computers
        </subjects>

    You might want to prevent matching "programming apple" in that meta name.

        BumpPositionCounterCharacters |

    There is no default, and you may list a string of characters.
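
    To see why bumping the position counter blocks a phrase match, here is a small Python model of word positions. It is a sketch of the idea only: the tokenizer, the function names assign_positions and phrase_matches, and the rule that each bump character adds exactly one to the counter are assumptions made for illustration.

```python
import re


def assign_positions(text, bump_chars=""):
    """Return (word, position) pairs for text."""
    pairs = []
    position = 0
    # A token is either a run of word characters or one punctuation character.
    for token in re.findall(r"\w+|[^\w\s]", text):
        if token[0].isalnum() or token[0] == "_":
            position += 1
            pairs.append((token.lower(), position))
        elif token in bump_chars:
            # A bump character consumes a position without storing a word.
            position += 1
    return pairs


def phrase_matches(pairs, phrase):
    """True when the words of phrase occur at consecutive positions."""
    words = phrase.lower().split()
    if not words:
        return False
    where = {}
    for word, position in pairs:
        where.setdefault(word, set()).add(position)
    return any(
        all(start + offset in where.get(word, ()) for offset, word in enumerate(words))
        for start in where.get(words[0], ())
    )


TEXT = "computer programming | apple computers"
print(phrase_matches(assign_positions(TEXT), "programming apple"))
print(phrase_matches(assign_positions(TEXT, "|"), "programming apple"))
print(phrase_matches(assign_positions(TEXT, "|"), "apple computers"))
```

    Without a bump character "programming" and "apple" sit at adjacent positions and the phrase matches. With "|" listed there is a gap between them, while phrases on either side of the bar still match.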

  • DontBumpPositionOnEndTags *list of names*
  • DontBumpPositionOnStartTags *list of names*

    Since metatags are typically separate data fields, the word position counter is automatically bumped between metatags (actually, bumped when a start tag is found and when an end tag is found). This prevents matching a phrase that spans more than one metaname. DontBumpPositionOnEndTags and DontBumpPositionOnStartTags disable this feature for the listed metanames.

    For example,

        <person>
            <first_name>
                William
            </first_name>
            <last_name>
                Shakespeare
            </last_name>
            <updated_date>
                April 25, 1999
            </updated_date>
        </person>

    In the configuration file:

        DontBumpPositionOnEndTags first_name
        DontBumpPositionOnStartTags last_name

    This configuration allows this phrase search

        -w 'person=("william shakespeare")'

    but this phrase search will fail

        -w 'person=("shakespeare april")'

Directives for the File Access method only

Some directives have different uses depending on the source of the documents. These directives are only valid when using the File system method of indexing.

  • IndexOnly *list of file suffixes*

    This directive specifies the allowable file suffixes (extensions) while indexing. The default is to index all files specified in IndexDir.

        # Only index .html .htm and .q files
        IndexOnly .html .htm .q

    IndexOnly checks that the file name ends in the characters listed. It does not check "extensions". IndexOnly is tested right before FileRules is processed.
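
    The difference between "ends in the characters listed" and "has that extension" is easy to miss. The following Python sketch models the test as a plain suffix comparison. The function name index_only_allows is hypothetical, and the sketch assumes the comparison is case sensitive.

```python
def index_only_allows(path, suffixes):
    """True when path ends with any of the listed character strings."""
    # Assumption: a plain, case-sensitive comparison of the trailing characters.
    return path.endswith(tuple(suffixes))


suffixes = [".html", ".htm", ".q"]
for path in ("docs/index.html", "docs/notes.txt", "docs/query.q"):
    print(path, index_only_allows(path, suffixes))

# A suffix without the leading dot also accepts names that merely end in it.
print(index_only_allows("docs/readmehtml", ["html"]))
```

    Including the dot in each suffix, as the example above does, is what makes the test behave like an extension check.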

  • FollowSymLinks [yes|NO]

    Set to "yes" to follow symbolic links while indexing, otherwise "no". Default is no.

        FollowSymLinks no
        FollowSymLinks yes

    Note that when set to no, extra stat(2) system calls must be made for each file. For a large number of files you may see a small reduction in indexing time by setting this to yes.

    See also the -l switch in SWISH-RUN.

  • FileRules [type] [contains|is|regex] *regular expression*
  • FileMatch [type] [contains|is|regex] *regular expression*

    FileRules and FileMatch are used to, respectively, exclude and include files and directories to index. Since, by default, Swish-e indexes all files and recurses all directories (but see also FollowSymLinks) you will typically only use FileRules to exclude files or directories. FileMatch is useful in a few cases, for example, to override the behavior of IndexOnly. Some examples are included below.

    Except for FileRules title ..., this feature is only available for file access method (-S fs), which is the default indexing mode. Also, any pathname modification with ReplaceRules happens after the check for FileRules. (It's unlikely that you would exclude files with FileRules based on text you added with ReplaceRules!)

    The regular expression is a C regex.h extended regular expression. You may supply more than one regular expression per line, or use separate directives. Preceding the regular expression with the word "not" negates the match.

    The regular expression is compared against [type] as described below.

    For historical reasons, you can specify contains or is. is simply forces the regular expression to match at the start and end of the string (by internally prepending "^" and appending "$" to the regular expression).

    The regex option requires delimiter characters:

        FileRules title regex /^private/i

    The only advantage of regex is if you want to do case insensitive matches, or simply like your regular expressions to look like perl regular expressions. You must use matching delimiters; (), {}, and [], are not currently supported for no good reason other than laziness.

    Use quotes (" or ') around a pattern if it contains any white space. Note that the backslash character becomes the escape character within quotes.

    For example, these sets generate the same regular expressions.

        FileRules title is hello
        FileRules title contains ^hello$
        FileRules title regex /^hello$/

    These all need quotes due to the included space character

        FileRules title is "hello there"
        FileRules title contains "^hello there$"
        FileRules title regex "!^hello there$!"

    These show how the backslash must be doubled inside of quotes. Swish-e converts a double-backslash into a single backslash, and then passes that single onto the regular expression compiler.

        FileRules filename regex /\.pdf/
        FileRules filename regex "/\\.pdf/"
    
        FileRules filename regex !hello\\there!     # need double for real backslash 
        FileRules filename regex "!hello\\\\there!" # need double-double inside of quotes
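
    The relationship between is, contains and regex can be sketched in Python. This is an illustration only: Python's re module stands in for the C regex.h library (the two differ in some syntax details), the function name compile_rule is hypothetical, and only the "i" option is handled.

```python
import re


def compile_rule(kind, pattern):
    """Turn an is/contains/regex pattern into a compiled regular expression."""
    if kind == "contains":
        return re.compile(pattern)
    if kind == "is":
        # "is" anchors the pattern at both ends.
        return re.compile("^" + pattern + "$")
    if kind == "regex":
        # The first character is the delimiter; options follow the last one.
        delimiter = pattern[0]
        end = pattern.rfind(delimiter)
        if end == 0:
            raise ValueError("missing closing delimiter in " + pattern)
        body, options = pattern[1:end], pattern[end + 1:]
        flags = re.IGNORECASE if "i" in options else 0
        return re.compile(body, flags)
    raise ValueError("unknown match kind: " + kind)


# These three produce the same regular expression.
for kind, pattern in (("is", "hello"), ("contains", "^hello$"), ("regex", "/^hello$/")):
    print(kind, compile_rule(kind, pattern).pattern)

# The "i" option makes the match case insensitive.
print(bool(compile_rule("regex", "/^private/i").search("Private notes")))
```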

    Matching Types

    The following types of match strings may be supplied:

        FileRules pathname
        FileRules dirname
        FileRules filename
        FileRules directory
        FileRules title
    
        FileMatch pathname
        FileMatch filename
        FileMatch dirname
        FileMatch directory

    pathname matches the regular expression against the current pathname. The pathname may or may not be absolute depending on what you supplied to IndexDir.

    Example:

        # Don't index paths that contain private or hidden
        FileRules pathname contains (private|hidden)
    
        # Same thing
        FileRules pathname regex /(private|hidden)/
    
        # Don't index exe files
        FileRules pathname contains \.exe$

    dirname and filename split the path name by the last delimiter character into a directory name, and a file name. Then these are compared against the patterns supplied. Directory names do not have a trailing slash. All path names use the forward slash as a delimiter within Swish-e.

    Example:

        # Same as last example - don't index *.exe files.
        FileRules filename contains \.exe$
    
        # Don't index any file called test.html
        FileRules filename contains ^test\.html$
    
        # Same thing
        FileRules filename is test\.html
    
        # Don't index any directories that contain "old"  (/usr/local/myold/docs)
        FileRules dirname contains old
    
        # Don't index any directories that contain the path segment "old" (/usr/local/old/foo)
        FileRules dirname contains /old/  
    
        # Index only .htm, .html, plus any all-digit file names
        IndexOnly .htm .html
        FileMatch filename contains ^\d+$
    
        # Same as previous, but maybe a little slower
        FileRules filename regex not !\.(htm|html)$!
        FileMatch filename contains ^\d+$

    Swish-e checks these settings in the order of pathname, dirname, and filename, and FileMatch patterns are checked before FileRules, in general. This allows you to exclude most files with FileRules, yet allow in a few special cases with FileMatch. For example:

        # Exclude all files of .exe, .bin, and .bat
        FileRules filename contains \.(exe|bin|bat)$
        # But, let these two in
        FileMatch filename is baseball\.bat incoming_mail\.bin
    
        # Same, but as a single pattern
        FileMatch filename is (baseball\.bat|incoming_mail\.bin)

    The directory type is somewhat unique. When Swish-e recurses into a directory it will compare all the files in the directory with the pattern and then decide if that entire directory should or should not be indexed (or recursed). Note that you are matching against file names in a directory -- and some of those names may be directory names.

    A FileRules directory match will cause Swish-e to ignore all files and sub-directories in the current directory.

    Warning: A match with FileMatch directory says to index everything in the *current* directory and ignore any FileRules for this directory.

    Example:

        # Don't index any directories (and sub directories) that contain
        # a file (or sub-directory) called "index.skip"
        FileRules directory contains ^index\.skip$
    
        # Don't index directories that contain a .htaccess file.
        FileRules directory contains ^\.htaccess

    Note: While processing directories, Swish-e will ignore any files or directories that begin with a dot ("."). You may index files or directories that begin with a dot by specifying their name with IndexDir or -i.

    title checks for a pattern match in an HTML title.

    Example:

        FileRules title contains construction example pointers
    
        # This example says to ignore case
        FileRules title regex "/^Internal document/i"

    Note: FileRules title works for any input method (fs, prog, or http) that is parsed as HTML, and where a title was found in the document.

    In case all this seems a bit confusing, processing a directory happens in the following order.

    First the directory name is checked:

        FileRules dirname - reject entire directory if matches

    Next the directory is scanned and each file name (which might be the name of a sub-directory) is checked:

        FileRules directory - reject entire dir if *any* files match
        FileMatch directory - accept entire dir if *any* files match

    Then, unless FileMatch directory matched, each file is tested with FileMatch. A match says to index the file without further testing (i.e. overrides FileRules and IndexOnly):

        FileMatch pathname  \
        FileMatch dirname   - file is accepted if any match
        FileMatch filename  /

    otherwise

        IndexOnly - file is checked for the correct file extension
    
        FileRules pathname  \
        FileRules dirname   - file is rejected if any match
        FileRules filename  /

    finally, the file is indexed.
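
    The per-file part of this order (FileMatch, then IndexOnly, then FileRules) can be written as a short decision function. The Python below is a sketch under stated assumptions: the rules are passed in as plain dictionaries of regular expression strings, Python's re module stands in for regex.h, and the names split_path, any_rule_matches and should_index are hypothetical. The directory-level checks are left out.

```python
import re


def split_path(pathname):
    """Split on the last "/" into (dirname, filename); dirname has no trailing slash."""
    dirname, _, filename = pathname.rpartition("/")
    return dirname, filename


def any_rule_matches(rules, pathname):
    """rules maps "pathname", "dirname" or "filename" to a list of regex strings."""
    dirname, filename = split_path(pathname)
    targets = (("pathname", pathname), ("dirname", dirname), ("filename", filename))
    return any(
        re.search(pattern, text)
        for kind, text in targets
        for pattern in rules.get(kind, ())
    )


def should_index(pathname, file_match, index_only, file_rules):
    """Apply FileMatch, then IndexOnly, then FileRules, in that order."""
    if any_rule_matches(file_match, pathname):
        return True   # accepted without further testing
    if index_only and not pathname.endswith(tuple(index_only)):
        return False  # wrong suffix
    if any_rule_matches(file_rules, pathname):
        return False  # rejected by a FileRules pattern
    return True


# The .exe/.bin/.bat example from above, with "is" written as ^...$.
file_rules = {"filename": [r"\.(exe|bin|bat)$"]}
file_match = {"filename": [r"^baseball\.bat$", r"^incoming_mail\.bin$"]}
for path in ("games/baseball.bat", "games/setup.bat", "docs/readme.txt"):
    print(path, should_index(path, file_match, [], file_rules))
```

    Because FileMatch is consulted first, baseball.bat is indexed even though the FileRules pattern would reject it.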

    Files (not directories) listed with IndexDir or -i are processed in a similar way:

        FileMatch pathname  \
        FileMatch dirname   - file is accepted if any match
        FileMatch filename  /

    otherwise, the file is rejected if it doesn't have the correct extension or a FileRules matches.

        IndexOnly - file is checked for the correct file extension
    
        FileRules pathname  \
        FileRules dirname   - file is rejected if any match
        FileRules filename  /

    Note: If things are not indexing as you expect, create a directory with some test files and use the -T regex trace option to see how file names are checked. Start with very simple tests!

Directives for the HTTP Access Method Only

The HTTP Access method is enabled by the "-S http" switch when indexing. It works by running a Perl program called SwishSpider which fetches documents from a web server.

By default, only text files (content-type of "text/*") are indexed with the HTTP Access Method. Other document types (e.g. PDF or MSWord) may be indexed as well if they can be converted: the SwishSpider will attempt to make use of the SWISH::Filter module (included with the Swish-e distribution) to convert documents into a format that Swish-e can index.

Note: The -S prog method of spidering (using spider.pl) can be a replacement for the -S http method. It offers more configuration options and better spidering speed.

The directives below are available when using the HTTP Access Method of indexing.

  • MaxDepth *integer*

    MaxDepth defines how many links the spider should follow before stopping. A value of 0 configures the spider to traverse all links. The default is MaxDepth 0.

        MaxDepth 5

    Note: The default was changed from 5 to 0 in release 2.4.0
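
    As a rough model of what a depth limit does, the Python sketch below walks a small in-memory link graph breadth first. It is not how the spider is implemented. It assumes that depth is counted in link hops from the starting URL, and the function name crawl_order and the sample graph are made up for the example.

```python
from collections import deque


def crawl_order(links, start, max_depth=0):
    """Return the pages visited from start; max_depth 0 means no limit."""
    seen = {start}
    order = []
    queue = deque([(start, 0)])
    while queue:
        url, depth = queue.popleft()
        order.append(url)
        if max_depth and depth >= max_depth:
            continue  # do not follow links found on this page
        for target in links.get(url, ()):
            if target not in seen:
                seen.add(target)
                queue.append((target, depth + 1))
    return order


# Hypothetical site: each page links to the next one.
links = {"/": ["/a"], "/a": ["/b"], "/b": ["/c"]}
print(crawl_order(links, "/", max_depth=1))
print(crawl_order(links, "/", max_depth=0))
```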

  • Delay *seconds*

    The number of seconds to wait between issuing requests to a server. This setting allows for more friendly spidering of remote sites. The default is 5 seconds.

        Delay 1

    Note: The default was changed from 60 to 5 seconds in release 2.4.0

  • TmpDir *path*

    The location of a writable temp directory on your system. The HTTP access method tells the Perl helper to place its files in this location, and the -e switch causes Swish-e to use this directory while indexing. There is no default.

        TmpDir /tmp/swish

    If this directory does not exist or is not writable Swish-e will fail with an error during indexing.

    Note, the environment variables of TMPDIR, TMP, and TEMP (in that order) will override this setting.

  • SpiderDirectory *path*

    The location of the Perl helper script called swishspider. If you use a relative directory, it is relative to your directory when you run Swish-e, not to the directory that Swish-e is in. The default is the location swishspider was installed. Normally this does not need to be set.

        SpiderDirectory /usr/local/swish

  • EquivalentServer *server alias*

    Often the same site may be referred to by different names. A common example is that http://www.some-server.com and http://some-server.com are frequently the same. Each line should have a list of all the method/names that should be considered equivalent. Multiple EquivalentServer directives may be used. Each directive defines its own set of equivalent servers.

        EquivalentServer http://library.berkeley.edu http://www.lib.berkeley.edu
        EquivalentServer http://sunsite.berkeley.edu:2000 http://sunsite.berkeley.edu
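
    One way to picture equivalent servers is as a table that rewrites every alias to a single name, so the same document is not seen twice under different URLs. The Python sketch below shows that idea with the two example lines above. Treating the first name on each line as the canonical one is an assumption made for the example, and the function names are hypothetical.

```python
def build_alias_map(equivalent_sets):
    """Map every server name to the first name listed in its set."""
    alias_map = {}
    for servers in equivalent_sets:
        canonical = servers[0]
        for server in servers:
            alias_map[server] = canonical
    return alias_map


def canonical_url(url, alias_map):
    """Rewrite url so that equivalent servers share one name."""
    # Try longer names first so "http://host:2000" is not mistaken for "http://host".
    for server in sorted(alias_map, key=len, reverse=True):
        if url == server or url.startswith(server + "/"):
            return alias_map[server] + url[len(server):]
    return url


alias_map = build_alias_map([
    ["http://library.berkeley.edu", "http://www.lib.berkeley.edu"],
    ["http://sunsite.berkeley.edu:2000", "http://sunsite.berkeley.edu"],
])
print(canonical_url("http://www.lib.berkeley.edu/docs/a.html", alias_map))
print(canonical_url("http://sunsite.berkeley.edu/x", alias_map))
```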

Directives for the prog Access Method Only

This section details the directives that are only available for the "prog" document source feature of Swish-e. The "prog" access method runs an external program that "feeds" documents to Swish-e. This allows indexing and filtering of documents from any source.

See prog - general purpose access method in the SWISH-RUN man page for more information.

A number of example programs for use with the "prog" access method are provided in the prog-bin directory. Please see those examples if you have questions about implementing a "prog" input program.

  • SwishProgParameters *list of parameters*

    This is a list of parameters that will be sent to the external program when running with the "prog" document source method.

        SwishProgParameters /path/to/config hello there
        IndexDir /path/to/program.pl

    Then running:

        swish-e -c config -S prog

    Swish-e will execute /path/to/program.pl and pass /path/to/config hello there as three command line arguments to the program. This directive makes it easy to pass settings from the Swish-e configuration file to the external program.

    For example, the spider.pl program (included in the prog-bin directory) uses the SwishProgParameters to specify what file to read for configuration information.

        SwishProgParameters spider.config
        IndexDir ./spider.pl

    The spider.pl program also has a default action so you can avoid using a configuration file:

        SwishProgParameters default http://www.swishe.org/ http://some.other.site/
        IndexDir ./spider.pl

    And the spider program will use default settings for spidering those sites.

    Swish-e can read documents from standard input, so another way to run an external program with parameters is:

        ./spider.pl spider.conf | ./swish-e -S prog -i stdin

Notes when using MS Windows

You should use unix style path separators to specify your external program. Swish will convert forward slashes to backslashes before calling the external program. This is only true for the program name specified with IndexDir or the -i command line option.

In addition, Swish-e will make sure the program specified actually exists, which means you need to use the full name of the program.

For example, to run the perl spider program spider.pl you would need a Swish-e configuration file such as:

    IndexDir e:/perl/bin/perl.exe
    SwishProgParameters prog-bin/spider.pl default http://swish-e.org

and run indexing with the command:

    swish-e -c swish.cfg -S prog -v 9

The IndexDir command tells Swish-e the name of the program to run. Under unix you can just specify the name of the script, since unix will figure out the program from the first line of the script.

The SwishProgParameters are the parameters passed to the program specified by IndexDir (perl.exe in this case). The first parameter is the perl script to run (prog-bin/spider.pl). Perl passes the rest of the parameters directly to the perl script. The second parameter default tells the spider.pl program to use default settings for spidering (or you could specify a spider config file -- see perldoc spider.pl for details), and lastly, the URL is passed into the spider program.

Document Filter Directives

Internally, Swish-e knows how to parse only text, HTML, and XML documents. With "filters" you can index other types of documents. For example, if all your web pages are in gzip format a filter can uncompress these on the fly for indexing.

You may wish to read the Swish-e FAQ question on filtering, "How Do I filter documents?", before continuing here.

There are two suggested methods for filtering.

Filtering with SWISH::Filter

The Swish-e distribution includes a Perl module called SWISH::Filter and individual filters located in the filters directory. This system uses plug-in filters to extend the types of documents that Swish-e can index. The plug-in filters do not actually do the filtering, but rather provide a standard interface for accessing programs that can filter or convert documents. The programs that do the filtering are not part of the Swish-e distribution; they must be downloaded and installed separately.

The advantage of this method is that new filtering methods can be installed easily.

This system is designed to work with the -S http and -S prog methods, but may also be used with the FileFilter feature and -S fs indexing method. See $prefix/share/doc/swish-e/examples/filter-bin/swish_filter.pl for an example.

See the filters/README file for more information.

Filtering with the FileFilter feature

A filter is an external program that Swish-e executes while processing a document of a given type. Swish-e will execute the filter program for each file that matches the file suffix (extension) set in the FileFilter or FileFilterMatch directives. FileFilterMatch matches using regular expressions and is described below.

Filters may be used with any type of input method (i.e. -S fs, -S http, or -S prog).

Swish-e calls the external program passing as default arguments:

  • $0

    the name of the filter program

  • $1

    the physical path name of the file to read. This may be a temporary file location if indexing by the http method.

  • $2

    When indexing under the file system this will be the same as $1 (the path to the source file), but when indexing under the http method this will be the URL of the source document.

Swish-e can also pass other parameters to the filter program. These parameters can be defined using the FileFilter or FileFilterMatch directives. See Filter Options below.

The filter program must open the file, process its contents, and return it to Swish-e by printing to STDOUT.

Note that this can add a significant amount of time to the indexing process if your external program is a perl or shell script. If you have many files to filter you should consider writing your filter in C instead of a shell or perl script, or using the "prog" Access Method along with SWISH::Filter.
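
A filter program only has to read the file named by its first argument and print the converted content to STDOUT. As a minimal sketch, here is a gzip filter written in Python. The script itself and the function name filter_gzip are hypothetical and not part of the Swish-e distribution. As noted above, a script like this pays an interpreter start-up cost for every document.

```python
import gzip
import sys


def filter_gzip(work_path, out):
    """Decompress the file at work_path and write the bytes to out."""
    with gzip.open(work_path, "rb") as source:
        for chunk in iter(lambda: source.read(65536), b""):
            out.write(chunk)


# With the default arguments, argv[1] is the file to read ($1) and argv[2]
# is the document path or URL ($2), which this filter does not need.
if __name__ == "__main__" and len(sys.argv) == 3:
    filter_gzip(sys.argv[1], sys.stdout.buffer)
```

Writing to sys.stdout.buffer keeps the output as raw bytes, so the filter does not have to know the character encoding of the document.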

  • FilterDir *path-to-directory*

    Deprecated.

    This is the path to a directory where the filter programs are stored. Swish-e looks in this directory to find the filter specified in the FileFilter directive.

    This directive is not needed if the filter program can be found in your system's path. Even if your filter is not in your system's path you can specify the full path to the filter in the FileFilter or FileFilterMatch directives.

    Example:

        FilterDir /usr/local/swish/filters

  • FileFilter *suffix* "filter-prog" ["filter-options"]

    This maps file suffix (extension) to a filter program. If filter-prog starts with a directory delimiter (absolute path), Swish-e doesn't use the FilterDir settings, but uses the given filter-prog path directly.

    On systems that have a working fork(2) system call the filter program is run by forking swish then executing the filter. This means the shell is not used for running the filter and no arguments are passed through the shell.

    On other systems (e.g. Windows) the arguments are double-quoted and popen(3) is used to run the program. This does pass arguments through the shell and may be a security concern depending on the abilities of the shell.

    Filter options:

    Filter options are a string passed as arguments to the filter-prog. Filter options can contain variables, replaced by Swish-e. If you omit filter-options Swish-e will use default parameters for the options listed above.

        Default:      %p %P
        Which means:  pass   "workfile path" and "documentfile path" to filter.

    Variables in filter options:

        %%   =  %
        %P   =  Full document pathname (e.g. URL, or path on filesystem)  
        %p   =  Full pathname to work file (maybe a tmpfile or the real document path on filesystem)
        %F   =  Filename stripped from full document pathname
        %f   =  Filename stripped from "work" pathname
        %D   =  Directoryname stripped from full document pathname
        %d   =  Directoryname stripped from full "work" pathname

    Examples of strings passed:

        %P =  document pathname:  http://myserver/path1/mydoc.txt
        %p =  work pathname:      /tmp/tmp.1234.mydoc.txt
        %F =     mydoc.txt
        %f =     tmp.1234.mydoc.txt
        %D =     http://myserver/path1
        %d =     /tmp
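
    The variable table above can be checked with a small Python sketch that expands an option string for a given work path and document path. It illustrates the substitutions only. The function names are hypothetical, and the way Swish-e splits the expanded string into separate arguments is not modeled.

```python
import re


def split_last(path):
    """Split path on its last "/" into (directory, name)."""
    directory, _, name = path.rpartition("/")
    return directory, name


def expand_filter_options(options, work_path, document_path):
    """Replace %P %p %F %f %D %d and %% in options."""
    doc_dir, doc_name = split_last(document_path)
    work_dir, work_name = split_last(work_path)
    values = {
        "%": "%",
        "P": document_path,
        "p": work_path,
        "F": doc_name,
        "f": work_name,
        "D": doc_dir,
        "d": work_dir,
    }
    # One pass over the string, so "%%" is never expanded a second time.
    return re.sub(r"%([%PpFfDd])", lambda match: values[match.group(1)], options)


WORK = "/tmp/tmp.1234.mydoc.txt"
DOCUMENT = "http://myserver/path1/mydoc.txt"
print(expand_filter_options("%p %P", WORK, DOCUMENT))
print(expand_filter_options("-d %d -example -url %P %f", WORK, DOCUMENT))
```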

    Notes when using MS Windows

    Windows uses double quotes to escape shell metacharacters, so if you need to use quotes then use single quotes around the entire option string.

        FileFilter .mydoc mydocfilter.exe '--title "text with spaces"'

    You can specify the filter program using forward slashes (unix style). Swish will convert the slashes to backslashes before running your program.

        FileFilter .mydoc     c:/some/path/mydocfilter.exe  '-d "%d" -example -url "%P" "%f"'

    Examples of filters:

        FileFilter .doc       /usr/local/bin/catdoc "-s8859-1 -d8859-1 %p"
        FileFilter .pdf       pdftotext   "%p -"
        FileFilter .html.gz   gzip  "-c %p"
        FileFilter .mydoc     "/some/path/mydocfilter"  "-d %d -example -url %P %f"

    The above examples are running a binary filter program. For more complicated filtering needs you may use a scripting language such as Perl or a shell script. Here are some examples of calling a shell and perl script:

        FileFilter .pdf       pdf2html.sh
        FileFilter .ps        ghostscript-filter.pl

    Using a scripting language (or any language that has a large startup cost) can greatly increase the indexing time. For small indexing jobs, this may not be an issue, but for large collections of files that require processing by a scripting language, you may be better off using the -S prog access method where the script will only be compiled once, instead of for each document.

    Filters are probably easier to write than a -S prog program. Which you decide to use depends on your requirements. Examples of filter scripts can be found in the filter-bin directory, and examples of -S prog programs can be found in the prog-bin directory.

  • FileFilterMatch *filter-prog* *filter-options* *regex* [*regex* ...]

    This is similar to FileFilter except that it uses regular expressions to match against the file name. *filter-prog* is the path to the program. Unlike FileFilter this does not use the FilterDir option. Also unlike FileFilter you must specify the *filter-options*.

    Examples:

        FileFilterMatch ./pdftotext "%p -" /\.pdf$/

    Note that this will also match a file called ".pdf", so you may want to use something that requires a filename that has more than just an extension. For example:

        FileFilterMatch ./pdftotext "%p -" /.\.pdf$/

    To specify more than one extension:

        FileFilterMatch ./check_title.pl "%p" /\.html$/  /\.htm$/

    Or a few ways to do the same thing:

        FileFilterMatch ./check_title.pl %p /\.(html|htm)$/
        FileFilterMatch ./check_title.pl %p /\.html?$/

    And to ignore case:

        FileFilterMatch ./check_title.pl %p /\.html?$/i

    You may also precede an expression with "not" to negate the regular expressions that follow. For example, to match files that do not have an extension:

        FileFilterMatch ./convert "%p %P" not /\..+$/
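
    The patterns used in these examples can be tried out with Python's re module, which agrees with regex.h for expressions this simple. The sketch below is an illustration only. The function name filter_applies is hypothetical, and treating "not" as negating the whole list of patterns is an assumption.

```python
import re


def filter_applies(filename, patterns, negate=False, ignore_case=False):
    """True when the filter would be run for filename."""
    flags = re.IGNORECASE if ignore_case else 0
    matched = any(re.search(pattern, filename, flags) for pattern in patterns)
    return not matched if negate else matched


# /\.pdf$/ also matches a file called ".pdf"; /.\.pdf$/ does not.
print(filter_applies(".pdf", [r"\.pdf$"]), filter_applies(".pdf", [r".\.pdf$"]))

# /\.html?$/i matches both extensions in any case.
print(filter_applies("INDEX.HTM", [r"\.html?$"], ignore_case=True))

# not /\..+$/ selects names without an extension.
print(filter_applies("README", [r"\..+$"], negate=True))
```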

Document Info

$Id: SWISH-CONFIG.pod 1846 2006-10-20 20:18:30Z whmoseley $

swish-e-2.4.7/html/swish.css0000644000077100017500000002327311166010467012652 00000000000000 /* Swish-e CSS based on: */ /************************************************* * TITLE: Prosimii Alternative Screen Stylesheet * * URI : prosimii/prosimii-screen-alt.css * * MODIF: 2004-Apr-28 21:56 +0800 * *************************************************/ .clearfix:after { content: "."; display: block; height: 0; font-size: 0; clear: both; visibility: hidden; } .clearfix {display: inline-block;} body { font-family: arial, helvetica, sans-serif; font-size: 100.1%; margin: 0; padding: 0; } /* Hides from IE5/Mac \*/ * html .clearfix {height: 1px;} .clearfix {display: block;} /* End hide from IE5/Mac */ /* ----- layout -------- */ /* wraps side-bar and content-area */ #body-area { width: 100%; margin: 0; padding: 0; margin-top: 1em; } /* wraps main-copy */ #content-area { float: right; width: 80%; margin: 0; padding: 0; } #side-bar { float: right; width: 19%; margin: 0; padding: 0; } #main-copy { margin-left: 1em; margin-right: 1em; } /* ------------------ */ acronym, .titleTip { border-bottom: 1px dotted rgb(61,92,122); cursor: help; margin: 0; padding: 0 0 0.4px 0; } a { color: rgb(61,92,122); background-color: transparent; text-decoration: none; margin: 0; padding: 0 1px 2px 1px; } a:hover { color: rgb(117,144,174); text-decoration: underline; } ol { margin: 1em 0 1.5em 0; padding: 0; } ul { list-style-type: square; margin: 1em 0 1.5em 0; padding: 0; } dl { margin: 1em 0 0.5em 0; padding: 0; } ul li { line-height: 1.5em; margin: 1.25ex 0 0 1.5em; padding: 0; } ol li { line-height: 1.5em; margin: 1.25ex 0 0 2em; padding: 0; } dt { font-weight: bold; margin: 0; padding: 0 0 1ex 0; } dd { line-height: 1.75em; margin: 0 0 1.5em 1.5em; padding: 0; } .doNotDisplay { display: none !important; } /* #### search form #### */ .srchform { margin: 0px; } input.button { /* font-size: 80%; */ color: rgb(61,92,135); border-style: outset; border-width: 1px; border-color: #004186; 
background-color: white; } input.button:hover { /* font-weight: bold; */ color: rgb(193,102,90); /* font-size:9px; */ background-color: #e5ecf9; border-style: inset; /* letter-spacing: -0.5px */ } /* ##### Header ##### */ .superHeader { color: rgb(130,128,154); background-color: rgb(33,50,66); text-align: right; margin: 0; padding: 0.5ex 10px; } .superHeader span { color: rgb(195,196,210); background-color: transparent; font-weight: bold; text-transform: uppercase; } .superHeader a { color: rgb(195,196,210); background-color: transparent; text-decoration: none; margin: 0; padding: 0 0.25ex 0 0; } .superHeader a:hover { color: rgb(193,102,90); background-color: transparent; text-decoration: none; } .midHeader { color: white; background-color: rgb(61,92,135); margin: 0; padding: 0.26ex 10px; } .headerTitle { font-size: 400%; margin: 0; padding: 0; } .headerSubTitle { font-size: 151%; font-weight: normal; font-style: italic; margin: 0 0 1ex 0; padding: 0; } .headerLinks { text-align: right; margin: 0; padding: 0 0 2ex 0; position: absolute; right: 1.5em; top: 3.5em; } .headerLinks a { color: white; background-color: transparent; text-decoration: none; margin: 0; padding: 0 0 0.5ex 0; display: block; } .headerLinks a:hover { color: rgb(195,196,210); background-color: transparent; text-decoration: underline; } .subHeader { color: white; background-color: rgb(117,144,174); margin: 0; padding: 0.5ex 10px; } .subHeader a, .subHeader .highlight { color: white; background-color: transparent; font-size: 110%; font-weight: bold; text-decoration: none; margin: 0; padding: 0 0.25ex 0 0; } .subHeader a:hover, .subHeader .highlight { /*color: rgb(255,204,0); */ color: rgb(195,196,210); background-color: transparent; text-decoration: underline; } /* ##### Side Menu ##### */ #side-bar ul { color: rgb(204,204,204); background-color: transparent; list-style-type: square; list-style-position: inside; border: 1px solid rgb(204,204,204); margin: 0 5px 15px 0; } #side-bar ul ul { border: 
none; } /* floating turned off for documentation pages */ /* NOTE this doesn't work in MSIE */ /* .floating { position: fixed !important; } */ /* disabled for now */ #side-bar a { text-decoration: none; } #side-bar:hover { color: rgb(117,144,174); background-color: transparent; border-color: rgb(117,144,174); } #side-bar li { margin: 0; /* padding: 0.75ex 0 1ex 1.75ex; */ /* esp with documentation sublinks, this gets too long */ padding: 0 0 0 1ex; } #side-bar li:hover { color: rgb(61,92,122); background-color: transparent; } #side-bar li a:hover { text-decoration: underline; } /* indicate the parent of a submenu when not active */ /* play with these at your whimsey */ #side-bar li.menuparent > a { font-style: italic; /* font-weight: bold; */ } #side-bar li.menuparent > a.thisfile { font-style: normal; font-weight: normal; } /* turn 'off' link for current page */ #side-bar li a.thisfile { text-decoration: none; color: black; } #side-bar li a.thisfile:hover { text-decoration: none; color: black; } /* documentation submenu */ ul.submenu { /* font-size: 80%; */ padding: 0 0 0 0; margin: 0; list-style-type: circle; } li.submenu { /* font-size: 80%; */ text-indent: 4pt; padding: 0 0 0 0; margin: 0; } /* TOC at top of each POD page */ ul.toc li { padding: 0 0 0 0; margin-left: 12pt; margin-top: 0; margin-bottom: 0; } ul.toc li a { text-decoration: none; } ul.toc li a:hover { text-decoration: underline; } /* ##### Documentation #### */ /* the Doc defs were originally for a new 'make html' build from POD. Keeping here for reference. 
*/ div.doc { /* mostly to give us a sane margin on left/right */ text-align: left; margin: 8px; } div.navbar { text-align: center; } /* ##### Main Copy content ##### */ #main-copy h1 { color: rgb(117,144,174); background-color: transparent; font-family: arial, helvetica, sans-serif; font-size: 186%; margin: 0; padding: 1.5ex 0 0 0; } #main-copy h2 { color: rgb(61,92,122); background-color: transparent; font-family: arial, helvetica, sans-serif; font-weight: normal; font-size: 151%; margin: 0; padding: 1ex 0 0 0; } #main-copy p { line-height: 1.5em; margin: 1em 0 1.5em 0; padding: 0; } #main-copy p.quote { line-height: 1em; padding: 1ex; margin: 0; padding: 1em 2em; background-color: rgb(239,239,239);; } .more { text-align: right; margin: 0; padding: 0.5em 0; } .more a { color: rgb(61,92,122); background-color: transparent; /* font-size: 92%; */ text-decoration: underline; margin: 0; padding: 0.25ex 0.75ex; } .more a:hover { color: rgb(117,144,174); text-decoration: none; } /* ##### Footer ##### */ #footer { color: rgb(51,51,102); background-color: rgb(239,239,239); font-size: 87%; text-align: center; line-height: 1.25em; padding: 1ex 10px; clear: both; } #footer a { color: rgb(0,68,204); background-color: transparent; text-decoration: underline; } #footer a:hover { text-decoration: none; } /* graphics page */ div.graphtitle { text-align: center; margin: 10px 10px 10px 10px; padding: 10px 10px 10px 10px; } /* For header on file lists (like download and swish-daily ) */ .dirHeader { color: white; background-color: rgb(117,144,174); padding: 0.5ex 10px; margin-bottom: 10ex; /* How do I get this to work??? */ } /* For
pre sections on the pod docs */

pre  {
  background: #eeeeee;
  border: 1px solid #888888;
  color: black;
  padding: 1em;
  white-space: pre;
}

/* ## Search Results ## */

form#searchform {
    margin-bottom: 2em;
}

div.search-page {  /* enclosing div */
    margin: 2em 2em 0 0em;
    padding: 3px 3px 3px 3px;
    /* background-color:  #eeeeee; */
    /* border: 1px solid black; */
}

.search-message { /* error messages, and "No Results" */
    color: red;
    text-align: center;
    font-size: 150%;
}

.search-header { /* Results for "foo"  page 1 */
    font-size: 130%;
    background-color: #EEEEEE;
    padding-left: 5px;
}


.search-results { /* contains all results */
    margin-top: 2em;
}

.search-result {  /* each result */
    margin-bottom: 2em;
}

/* search links -- is this the right format ?? */
div.search-result a {
    text-decoration: none;
}
div.search-result a:hover {
    text-decoration: underline;
}

h2#result-heading {
    font-size: 100%;
    border-bottom: 1px solid gray;
}
h2#result-heading a:hover {
    text-decoration: none;
}

#showform, #hideform {
    text-decoration: underline;
    font-size: 80%;
    color: rgb(61,92,122);
}

.search-rank {
    color: gray;
    font-size: 85%;
}

.search-description {
    margin-top: 1em;
    margin-bottom: 1em;
    margin-left: 2em;
    max-width: 700px; /* not supported by IE */
    
    /* what kind of black magic is this? */
    /* width:expression(document.body.clientWidth > 600? "600px": "auto" ); */
}

.search-metadata { 
    margin-left: 2em;
    font-size: 0.8em;
    color: green;
}
.search-metadata a { 
    text-decoration: none; 
    color: green;
}
.highlight { 
    background : #FFFF99;
    font-weight: bold;
}

span.indxtype {
/* for ID'ing the search results according to index */
    color: gray;
    font-size: 90%;
}

#archive {
    margin-left: 1em;
    margin-right: 1em;
    font-size: .85em;
}

#archive ul li {
    margin: 0 0 0 3em;
}

#archive tr:hover {
    background: #eee;
}

#archive #listheader, #archive .head {
    border-bottom: 1px solid gray;
    margin-bottom: 1em;
}


swish-e-2.4.7/html/Makefile.am0000664000077100017500000000133511166010112013017 00000000000000# $id$
#
# Conditionally install the html documentation


# Where docs are installed
htmldir = $(datadir)/doc/$(PACKAGE)/html

if BUILDDOCS

# build the docs from website src

$(html_files):
	$(SWISH_WEB) -swishsrc $(top_srcdir) -poddest . -v -all

DISTCLEANFILES = \
	$(html_DATA)

endif

if INSTALLDOCS

html_DATA = \
	$(html_files)

endif



html_files = \
    api.html \
    changes.html \
    filter.html \
    index.html \
    install.html \
    readme.html \
    search.cgi.html \
    spider.html \
    swish-3.0.html \
    swish-bugs.html \
    swish.cgi.html \
    swish-config.html \
    swish.css \
    swish-faq.html \
    swish-library.html \
    swish-run.html \
    swish-search.html

EXTRA_DIST = \
    $(html_DATA)

swish-e-2.4.7/html/swish-bugs.html0000644000077100017500000002354011166010464013756 00000000000000





  
    
    
    

    
        
    
        
    
        
    
        
    
        
    
        
    

Swish-e :: SWISH-BUGS - List of bugs known in Swish-e

SWISH-BUGS - List of bugs known in Swish-e

Swish-e version 2.4.7


DESCRIPTION

This file contains a list of bugs reported or known in Swish-e. If you find a bug listed here you do not need to report it as a bug. But feel free to bug the developers about it on the Swish-e discussion list.

Note that this list is incomplete and may not be up to date.

Bugs in Swish-e version 2.4

  • Stopwords not removed from query with Soundex

    In development version 2.5.2, stopwords were found not to be removed from the query when using Soundex. The plan is to rewrite the parser soon... (July 2004)

  • Wild card searching can be very slow

    Wild card searching needs to be optimized.

    Here's a three letter search:

      $ swish-e -w 'tra*' -m1
      # Number of hits: 99952
      # Search time: 5.424 seconds

    Two letters:

      $ swish-e -w 'tr*' -m1
      # Number of hits: 100000
      # Search time: 10.563 seconds

    Single letter search:

      $ swish-e -w 't*' -m1
      # Number of hits: 100000
      # Search time: 510.939 seconds

    and used about 280MB of RAM.

    This opens the door to a denial-of-service (DoS) attack. If you have a large index you may wish to filter out single character wild cards.

  • Character Encodings

    The XML parser (Expat) returns UTF-8 data to swish-e. Therefore, the XML parser should only be used for parsing US-ASCII encoded text.

    The XML2 & HTML2 parsers (Libxml2) convert characters from UTF-8 to 8859-1 encodings before indexing and writing properties. Indexing non-8859-1 data may result in invalid character mappings.

    These issues will be resolved soon.

  • Phrase search fails with DoubleMetaphone

    DoubleMetaphone searching can produce two search words for a single query word. The words are expanded to (word1 OR word2), but that fails in a phrase query: "some phrase (word1 OR word2) here"

    The Swish-e query parser is due for a rewrite, and this could be resolved then.

        Reported: August 20, 2002 - moseley
  • Merging

    merge.c does not check for matching stopwords or buzzwords in each index.

    History:

        Reported: September 3, 2002 - moseley
  • ResultSortOrder

    ResultSortOrder is not used (and is not documented). The problem is that Compare_Properties() does not have access to the ResultSortOrder table.

    History:

        Reported: September 3, 2002 - moseley
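
For the wild card issue listed above, one way to filter out very short wild card terms before they reach Swish-e is a small pre-check in the script that accepts user queries. The following is only a sketch: the names MIN_PREFIX and is_safe_query are hypothetical and not part of Swish-e, and the check splits the query on whitespace only, so it does not understand the full query syntax.

```shell
#!/bin/sh
# Hypothetical pre-filter for user-supplied queries: refuse any term whose
# wild card ("*") is preceded by fewer than MIN_PREFIX characters.
# MIN_PREFIX and is_safe_query are illustrative names, not part of Swish-e.
MIN_PREFIX=3

is_safe_query() (
    set -f    # stop the shell from expanding "*" against file names
    for term in $1; do
        case $term in
            *\**)
                # Text before the first "*", minus any leading "metaname=" or "(".
                prefix=$(printf '%s\n' "${term%%\**}" | sed 's/.*[=(]//')
                [ "${#prefix}" -ge "$MIN_PREFIX" ] || exit 1
                ;;
        esac
    done
    exit 0
)
```

A calling script might then run `is_safe_query "$query" && swish-e -w "$query" -m1` and show an error message otherwise. The threshold of three characters is an assumption taken from the timings above; pick a value that suits the size of your index.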

Document Info

$Id: SWISH-BUGS.pod 1613 2005-02-02 22:53:39Z whmoseley $


swish-e-2.4.7/html/install.html0000644000077100017500000021434611166010461013334 00000000000000 Swish-e :: INSTALL - Swish-e Installation Instructions

INSTALL - Swish-e Installation Instructions

Swish-e version 2.4.7


OVERVIEW

This document describes how to download, build, and install Swish-e from source. Also found below is a basic overview of using Swish-e to index documents, with pointers to other, more advanced examples.

This document also provides instructions on how to get help installing and using Swish-e (and the important information you should provide when asking for help). Please read these instructions before requesting help on the Swish-e discussion list. See "QUESTIONS AND TROUBLESHOOTING".

Although building from source is recommended, some OS distributions (e.g., Debian) provide pre-compiled binaries. Check with your distribution for available packages. Build from source if your distribution does not offer the current version of Swish-e.

Also, please read the Swish-e FAQ (SWISH-FAQ), as it answers many frequently-asked questions.

Swish-e knows how to index HTML, XML, and plain text documents. Helper applications and other tools are used to convert documents such as PDF or MS Word into a format that Swish-e can index. These additional applications and tools (listed below) must be installed separately. The process of converting documents is called "filtering".

NOTE: Swish-e version 2.4.0 installs a lot more files when running "make install". Be aware that the Swish-e documentation may thus include errors about where files are located. Please notify the Swish-e discussion list of any documentation errors.

Upgrading from previous versions of Swish-e

If you are upgrading from a previous version of Swish-e, read the CHANGES page first. The Swish-e index format may have changed and existing indexes may not work with the newer version of Swish-e.

If you have existing indexes, you may need to re-index your data before running the "make install" step described below. Swish-e may be run from the build directory after compiling, but before installation.

Windows Users

A Windows binary version is available as a separate download from the Swish-e site (http://swish-e.org). Many of the installation instructions below will not apply to Windows users; the Windows version is pre-compiled and includes libxml2, zlib, xpdf, and catdoc.

A number of Perl modules may also be needed. These can be installed with ActiveState's PPM utility.

   libwww-perl   - the LWP modules (for spidering)
   HTML-Tagset   - used by web spider
   HTML-Parser   - used by web spider
   MIME-Types    - used for filtering documents when not spidering
   HTML-Template - formatting output from swish.cgi (optional)
   HTML-FillInForm (if HTML-Template is used)

Building from CVS

Please refer to the README.cvs file found in the documentation directory $prefix/share/doc/swish-e.

SYSTEM REQUIREMENTS

Swish-e makes use of a number of libraries and tools that are not distributed with Swish-e. Some libraries need to be installed before building Swish-e from source; other tools can be installed at any time. See below for details.

Software Requirements

Swish-e is written in C. It has been tested on a number of platforms, including Sun/Solaris, Dec Alpha, BSD, Linux, Mac OS X, and Open VMS.

The GNU C compiler (gcc) and GNU make are strongly recommended. Repeat: you will find life easier if you use the GNU tools.

Optional but Recommended Packages

Most of the packages listed below are available as easily installable packages. Check with your operating system vendor or install them from source. Most are very common packages that may already be installed on your computer.

As noted below, some packages need to be installed before building Swish-e from source, while others may be added after Swish-e is installed.

  • Libxml2

    libxml2 is very strongly recommended. It is used for parsing both HTML and XML files. Swish-e can be built and installed without libxml2, but the HTML parser that is built into Swish-e is not as accurate as libxml2.

        http://xmlsoft.org/

    libxml2 must be installed before Swish-e is built, or it will not be used.

    If libxml2 is installed in a non-standard location (e.g., libxml2 is built with --prefix $HOME/local), make sure that you add the bin directory to your $PATH before building Swish-e. Swish-e's configure script uses a program created by libxml2 (xml2-config) to find the location of libxml2. Use which xml2-config to verify that the program can be found where expected.

  • Zlib Compression

    The Zlib compression library is commonly installed on most systems and is recommended for use with Swish-e. Zlib is used for compressing text stored in the Swish-e index.

        http://www.gzip.org/zlib/

    Zlib must be installed before building Swish-e.

  • Perl Modules

    Although Swish-e is a compiled C program, many support features use Perl. For example, both the web spiders and modules to help with filtering documents are written in Perl.

    The following Perl modules may be required. Check your current Perl installation, as many may already be installed.

        LWP
        URI
        HTML::Parser
        HTML::Tagset
        MIME::Types (optional)

    Note that installing Bundle::LWP with the CPAN module

        perl -MCPAN -e 'install Bundle::LWP'

    will install many of the above modules.

    If you wish to use HTML-Template with swish.cgi to generate output, install:

        HTML::Template
        HTML::FillInForm

    If you wish to use Template-Toolkit with swish.cgi to generate output, install:

        Template

    Questions about installing these modules may be sent to the Swish-e discussion list.

    The search.cgi example script requires both Template-Toolkit and HTML::FillInForm.

  • Indexing PDF Documents

    Indexing PDF files requires the xpdf package. This is a common package, available with most operating systems and often provided as an add-on package.

        http://www.foolabs.com/xpdf/

    Xpdf may be added after Swish-e is installed.

  • Indexing MS Word Documents

    Indexing MS Word documents requires the Catdoc program.

        http://www.wagner.pp.ru/~vitus/software/catdoc/

    Catdoc may be added after Swish-e is installed.

  • Indexing MP3 ID3 Tags

    Indexing MP3 ID3 Tags requires the MP3::Tag Perl module. See http://search.cpan.org. MP3::Tag may be installed after Swish-e is installed.

  • Indexing MS Excel Files

    Indexing MS Excel files is supported by the following Perl modules, also available at http://search.cpan.org.

        Spreadsheet::ParseExcel
        HTML::Entities

    These Perl modules may be installed after Swish-e is installed.

INSTALLATION

Here are brief installation instructions that should work in most cases. Following this section are more detailed instructions and examples.

Building Swish-e

Download Swish-e using your favorite web browser or a utility such as wget, lynx, or lwp-download. Unpack and build the distribution, using the following steps:

Note: "swish-e-2.4.0" is used as an example. Download the most current available version and adjust the commands below! Also, if you are running Debian, see the notes below on building a .deb package from the Swish-e source package.

Pay careful attention to the "prompt" character used on the following command lines. A "$" prompt indicates steps run as an unprivileged user. A "#" indicates steps run as the superuser (root).

    $ wget http://swish-e.org/Download/swish-e-2.4.0.tar.gz
    $ gzip -dc swish-e-2.4.0.tar.gz | tar xof -
    $ cd swish-e-2.4.0  (this directory will depend on the version of Swish-e)

    $ ./configure
    $ make
    $ make check
    ...
    ==================
    All 3 tests passed
    ==================

    $ su root  (or use sudo)
    (enter password)

    # make install
    # exit
    $ swish-e -V
    SWISH-E 2.4.0

IMPORTANT: Once Swish-e is installed, do not run it as the superuser (root) -- root is only required during the installation step, when installing into system directories. Please do not break this rule.

NOTE: If you are upgrading from an older version of Swish-e, be sure to review the CHANGES file. Old index files may not be compatible with newer versions of Swish-e. After building Swish-e (but before running "make install"), Swish-e can be run from the build directory:

    $ src/swish-e -V

To minimize downtime, create new index files before running "make install", by using Swish-e from the build directory. Then, copy the index files to the live location and run "make install":

    $ src/swish-e -c /path/to/config -f index.new

Keep in mind that the location you index from may affect the paths stored in the index file.
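
The reindex-then-install sequence described above could be wrapped in a small helper, sketched here. The names NEW_SWISH and reindex_then_swap and all paths are placeholders. The sketch also assumes that Swish-e 2.4 writes a companion ".prop" file next to each index; check your own index directory before relying on that.

```shell
#!/bin/sh
# Sketch: index with the newly built binary, then move the result into place.
# NEW_SWISH, reindex_then_swap and all paths are placeholders.
NEW_SWISH=${NEW_SWISH:-src/swish-e}

reindex_then_swap() {
    config=$1    # configuration file to index with
    live=$2      # path of the live index file
    # Build the new index next to the live one, under a temporary name.
    "$NEW_SWISH" -c "$config" -f "$live.new" || return 1
    # Assumption: properties are stored in a companion ".prop" file.
    mv "$live.new.prop" "$live.prop" && mv "$live.new" "$live"
}
```

Run it from the build directory, for example `reindex_then_swap /path/to/config /path/to/index.swish-e`, and then run "make install". Note that the two mv calls are not atomic, so searches issued in between may briefly see mismatched files.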

Installing without root access

Here's another installation example. This might be used if you do not have root access or you wish to install Swish-e someplace other than /usr/local.

This example also shows building Swish-e in a "build" directory that is separate from where the source files are located. This is the recommended way to build Swish-e, but it requires GNU Make. Without GNU Make, you will likely need to build from within the source directory, as shown in the previous example.

    $ tar zxof swish-e-2.4.0.tar.gz  (GNU tar with "z" option)
    $ mkdir build
    $ cd build

Note that the current directory is not where Swish-e was unpacked.

Swish-e uses a configure script. configure has many options, but it uses reasonable and standard defaults. Running

    $ ../swish-e-2.4.0/configure --help

will display the options.

Two options are of common interest: --prefix sets the top-level installation directory; --disable-shared will link Swish-e statically, which may be needed on some platforms (Solaris 2.6, perhaps).

Platforms may require varying link instructions when libraries are installed in non-standard locations. Swish-e uses the GNU autoconf tools for building the package. autoconf is good at building and testing, but still requires you to provide information appropriate for your platform. This may mean reading the manual page for your compiler and linker to see how to specify non-standard file locations.

For most Unix-type platforms, you can use LDFLAGS and CPPFLAGS environment variables to specify paths to "include" (header) files and to libraries that are not in standard locations.

In this example, we do not have root access. We have installed libxml2 and libz in $HOME/local. Swish-e will also be installed in $HOME/local (by using the --prefix setting).

In this case, you would need to add $HOME/local/bin to the start of your shell's $PATH setting. This is required because libxml2 installs a program that is used when running the configure script. Before running configure, type:

    $ which xml2-config

It should list $HOME/local/bin/xml2-config.
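
The PATH adjustment and the check just described might look like the following in a Bourne-style shell. LOCAL_BIN and find_xml2_config are illustrative names, and $HOME/local/bin is simply the location used in this example.

```shell
#!/bin/sh
# Sketch: put a private bin directory first in PATH, then report which
# xml2-config the configure script would find. LOCAL_BIN is a placeholder.
LOCAL_BIN=${LOCAL_BIN:-$HOME/local/bin}

PATH=$LOCAL_BIN:$PATH
export PATH

find_xml2_config() {
    # "command -v" is the POSIX counterpart of "which".
    command -v xml2-config || {
        echo "xml2-config not found in PATH" >&2
        return 1
    }
}
```

Calling find_xml2_config prints the full path of the program that would be used, or an error if none is found.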

Now run configure (remember, we are in a separate "build" directory):

    $ ../swish-e-2.4.0/configure \
        --prefix=$HOME/local \
        CPPFLAGS=-I$HOME/local/include \
        LDFLAGS="-R$HOME/local/lib -L$HOME/local/lib"

    $ make >/dev/null  (redirect output to only see warnings and errors)

    $ make check
    ...
    ==================
    All 3 tests passed
    ==================

    $ make install
    $ $HOME/local/bin/swish-e -V 
    SWISH-E 2.4.0

Note the use of double quotes in the LDFLAGS line above. This allows $HOME to be expanded within the text string.

Run-time paths

The -R option says to add a specified path (or paths) to those that are used to find shared libraries at run time. These paths are stored in the Swish-e binary. When Swish-e is run, it will look in these directories for shared libraries.

Some platforms may not support the -R option. In this event, set the LD_RUN_PATH environment variable before running make.
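
A minimal sketch of the LD_RUN_PATH approach follows. The function name build_with_run_path and the MAKE override are hypothetical. Whether the linker honors LD_RUN_PATH depends on your platform and on whether a run-time path is also passed on the link line, so treat this as a starting point only.

```shell
#!/bin/sh
# Sketch: record a run-time library path via LD_RUN_PATH instead of -R.
# build_with_run_path and the MAKE override are illustrative names.
build_with_run_path() {
    LD_RUN_PATH=$1
    export LD_RUN_PATH
    # The variable must be set in the environment of the make (link) step.
    ${MAKE:-make}
}
```

For the example in this section, the call would be `build_with_run_path "$HOME/local/lib"`, issued from the build directory after running configure.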

Some systems, such as Redhat, do not look in /usr/local/lib for libraries. In these cases, you can either use -R, as above, when building Swish-e or add /usr/local/lib to /etc/ld.so.conf and run ldconfig as root.

If all else fails, you may need to actually read the man pages for your platform.

Building a Debian Package

The Swish-e distribution includes the files required to build a Debian package.

    $ tar zxof swish-e-2.4.0.tar.gz  (GNU tar with "z" option)
    $ cd swish-e-2.4.0
    $ fakeroot debian/rules binary
    [lots of output]
    dpkg-deb: building package `swish-e' in `../swish-e_2.4.0-0_i386.deb'.
    $ su
    # dpkg -i ../swish-e_2.4.0-0_i386.deb

What's installed

Swish-e installs a number of files. By default, all files are installed below /usr/local, but this can be changed by setting --prefix when running configure (as shown above). Individual paths may also be set. Run configure --help for details.

   $prefix/bin/swish-e         The Swish-e binary program
   $prefix/share/doc/swish-e/  Full documentation and examples
   $prefix/lib/libswish-e      The Swish-e C library
   $prefix/include/swish-e.h   The library header file
   $prefix/man/man1/           Documentation as manual pages
   $prefix/lib/swish-e/        Helper programs (spider.pl, swishspider, swish.cgi)
   $prefix/lib/swish-e/perl/   Perl helper modules

Note that the Perl modules are not installed in the system Perl library. Swish-e and the Perl scripts that require the modules know where to find the modules, but the perldoc program (used for reading documentation) does not. This can be corrected by adding $prefix/lib/swish-e and $prefix/lib/swish-e/perl to the PERL5LIB environment variable.
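
As a sketch, the PERL5LIB setting could be added to a shell startup file like this. SWISH_PREFIX is a placeholder for whatever --prefix was used when running configure (/usr/local is assumed if it is unset).

```shell
#!/bin/sh
# Sketch: make the Swish-e Perl modules visible to perldoc.
# SWISH_PREFIX is a placeholder for the --prefix used at configure time.
SWISH_PREFIX=${SWISH_PREFIX:-/usr/local}

# Prepend both directories, keeping any existing PERL5LIB entries.
PERL5LIB=$SWISH_PREFIX/lib/swish-e:$SWISH_PREFIX/lib/swish-e/perl${PERL5LIB:+:$PERL5LIB}
export PERL5LIB
```

With this in place, perldoc should be able to locate the documentation for the helper scripts and modules in those two directories.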

Documentation

Documentation can be found in the $prefix/share/doc/swish-e directory. Documentation is in html format at $prefix/share/doc/swish-e/html and can also be read on-line at the Swish-e web site:

    http://swish-e.org/

The Swish-e documentation as man(1) pages

Running "make install" installs some of the Swish-e documentation as man pages. The following man pages are installed:

    SWISH-FAQ(1)
    SWISH-CONFIG(1)
    SWISH-RUN(1)
    SWISH-LIBRARY(1)

The man pages are installed, by default, in the system man directory. This directory is determined when configure is run; it can be set by passing a directory name to configure.

For example,

    ./configure --mandir=/usr/local/doc/man

The man directory is specified relative to the --prefix setting. If you use --prefix, you do not normally need to also specify --mandir.

Information on running configure can be found by typing:

    ./configure --help

Join the Swish-e discussion list

The final step, when installing Swish-e, is to join the Swish-e discussion list.

The Swish-e discussion list is the place to ask questions about installing and using Swish-e, see or post bug fixes or security announcements, and offer help to others. Please do not contact the developers directly.

The list is typically very low traffic, so it won't overload your inbox. Please take the time to subscribe. See http://Swish-e.org.

If you are using Swish-e on a public site, please let the list know, so that your URL can be added to the list of sites that use Swish-e!

Please review the next section before posting questions to the Swish-e list.

QUESTIONS AND TROUBLESHOOTING

Support for installation, configuration, and usage is available via the Swish-e discussion list. Visit http://swish-e.org for information. Do not contact developers directly for help -- always post your question to the list.

It's very important to provide the right information when asking for help.

Please search the Swish-e list archive before posting a question. Also, check the SWISH-FAQ to see if your question has already been asked and answered.

Before posting, use the available tools to narrow down the problem.

Swish-e has several switches (e.g., -T, -v, and -k) that may help you resolve issues. These switches are described on the SWISH-RUN page. For example, if you cannot find a document by a keyword that you believe should be indexed, try indexing just that single file and use the -T INDEXED_WORDS option to see if the word is actually being indexed. First, try it without any changes to default settings:

    swish-e -i testdoc.html -T indexed_words | less

If that works, add in your configuration file:

    swish-e -i testdoc.html -c swish.conf -T indexed_words | less

If it still isn't working as you expect, try to reduce the test document to a very small example. This will be very helpful to your readers, when you are asking for help.

Another useful trick is to use -H9 when searching, to display full headers in search results. Look at the "Parsed Words" header to see what words Swish-e is searching for.

When posting, please provide the following information:

Use these guidelines when asking for help. The most important tip is to provide the least amount of information that can be used to reproduce your problem. Do not paraphrase output -- copy-and-paste -- but trim text that is not necessary.

  • The exact version of Swish-e that you are using. Running Swish-e with the -V switch will print the version number. Also, supply the output from uname -a or similar command that identifies the operating system you are running on. If you are running an old version of swish, be prepared for a response of "upgrade" to your question.

  • A summary of the problem. This should include the commands issued (e.g. for indexing or searching) and their output, along with an explanation of why you don't think it's working correctly. Please copy-and-paste the exact commands and their output, instead of retyping, to avoid errors.

  • Include a copy of the configuration file you are using, if any. Swish-e has reasonable defaults, so in many cases you can run it without using a configuration file. But, if you need to use a configuration file, reduce it down to the absolute minimum number of commands that is required to demonstrate your problem. Again, copy-and-paste.

  • A small copy of a source document that demonstrates the problem.

    If you are having problems spidering a web server, use lwp-download or wget to copy the file locally, then make sure you can index the document using the file system method. This will help you determine if the problem is with spidering or indexing.

    If you expect help with spidering, don't post fake URLs, as it makes it impossible to test. If you don't want to expose your web page to the people on the Swish-e list, find some other site to test spidering on. If that works, but you still cannot spider your own site, you may need to request help from others. If so, you must post your real URL or make a test document available via some other source.

  • If you are having trouble building Swish-e, please copy-and-paste the output from make (or from ./configure, if that's where the problem is).

The key is to provide enough information so that others may reproduce the problem.
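
The basic facts requested above can be gathered in one step, so that they can be copied and pasted rather than retyped. This is only a sketch: SWISH_BIN and collect_report are hypothetical names, and the output still needs to be trimmed by hand before posting.

```shell
#!/bin/sh
# Sketch: gather the basics requested above into one report.
# SWISH_BIN and collect_report are illustrative names.
SWISH_BIN=${SWISH_BIN:-swish-e}

collect_report() {
    echo "== Swish-e version =="
    "$SWISH_BIN" -V 2>&1 || echo "(could not run $SWISH_BIN)"
    echo "== Operating system =="
    uname -a
    # Optional first argument: the configuration file in use.
    if [ -n "${1:-}" ] && [ -f "$1" ]; then
        echo "== Configuration file: $1 =="
        cat "$1"
    fi
}
```

For example, `collect_report swish.conf > report.txt` writes the version, system, and configuration details to report.txt.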

ADDITIONAL INSTALLATION OPTIONS

These steps are not required for normal use of Swish-e.

The SWISH::API Perl Module

The Swish-e distribution includes a module that provides a Perl interface to the Swish-e C library. This module provides a way to search a Swish-e index without running the Swish-e program. Searching an index will be many times faster when running under a persistent environment such as Apache/mod_perl with the SWISH::API module.

See the perl/README file for information on installing and using the SWISH::API Perl module.

GENERAL CONFIGURATION AND USAGE

This section should give you a basic overview of indexing and searching with Swish-e. Other examples can be found in the conf directory; these will step you through a number of different configurations. Also, please review the SWISH-FAQ.

Swish-e is a command-line program. The program is controlled by passing switches on the command line. A configuration file may be used, but often is not required. Swish-e does not include a graphical user interface. Example CGI scripts are provided in the distribution, but they require additional setup to use.

Introduction to Indexing and Searching

Swish-e can index files that are located on the local file system. For example, running:

     swish-e -i /var/www/htdocs

will index all files in the /var/www/htdocs directory. You may specify one or more files or directories with the -i option. By default, this will create an index called index.swish-e in the current directory.

To search the resulting index for a given word, try:

     swish-e -w apache

This will find the word "apache" in the body or title of the indexed documents.

As mentioned above, Swish-e will index all files in a directory, unless instructed otherwise. So, if /var/www/htdocs contains non-HTML files, you will need a configuration file to limit the files that Swish-e indexes. Create a file called swish.conf:

    # Example configuration file

    # Tell Swish-e what to index (same as -i switch above)
    IndexDir /var/www/htdocs

    # Only index HTML and text files
    IndexOnly .htm .html .txt

    # Tell Swish-e that .txt files are to use the text parser.
    IndexContents TXT* .txt

    # Otherwise, use the HTML parser
    DefaultContents HTML*

    # Ask libxml2 to report any parsing errors and warnings or 
    # any UTF-8 to 8859-1 conversion errors
    ParserWarnLevel 9

After saving the configuration file, reindex:

    swish-e -c swish.conf

The Swish-e configuration settings are described in the SWISH-CONFIG manual page. The order of statements in the configuration file is typically not important, although some statements depend on previously set statements. There are many possible settings. Good advice is to use as few settings as possible when first starting out with Swish-e.

The runtime options (switches) are described in the SWISH-RUN manual page. You may also see a summary of options by running:

    swish-e -h

Swish-e has two other methods for reading input files. One method uses a Perl helper script and the LWP Perl library to spider remote web sites:

    swish-e -S http -i http://localhost/index.html -v2

This will spider the web server running on the local host. The -S option defines the input source method to be "http", -i specifies the URL to spider, and -v sets the verbose level to two. There are a number of configuration options that are specific to the -S http input source. See SWISH-CONFIG. Note that only files of Content-Type text/* will be indexed.

The -S http method is deprecated, however, in favor of a variation on the following input method.

There is a general-purpose input method wherein Swish-e reads input from a program that produces documents in a special format. The program might read and format data stored in a database, or parse and format messages in a mailing list archive, or run a program that spiders web sites (like the previous method).

The Swish-e distribution includes a spider program that uses this method of input. This spider program is much more configurable and feature-rich than the previous (-S http) method.

To duplicate the previous example, create a configuration file called swish2.conf:

    # Example for spidering
    # Use the "spider.pl" program included with Swish-e
    IndexDir spider.pl

    # Define what site to index
    SwishProgParameters default http://localhost/index.html

Then, create the index using the command:

    swish-e -S prog -c swish2.conf

This says to use the -S prog input source method. Note that, in this case, the IndexDir setting does not specify a file or directory to index, but a program name to be run. This program, spider.pl, does the work of fetching the documents from the web server and passing them to Swish-e for indexing.

The SwishProgParameters option is a special feature that allows passing command-line parameters to the program specified with IndexDir. In this case, we are passing the word default (which tells spider.pl to use default settings) and the URL to spider.

Running a script under Windows requires specifying the interpreter (e.g., perl.exe) and then using SwishProgParameters to specify the script and the script's parameters. See "Notes when using -S prog on MS Windows" on the SWISH-RUN page.

The advantage of the -S prog method of spidering (over the previous -S http method) is that the Perl code is only compiled once instead of once for every document fetched from the web server. In addition, it is a much more advanced spider with many, many features. Still, as used here, spider.pl will automatically index PDF or MS Word documents if (when) Xpdf and Catdoc are installed.

A special form of the -S prog input source method is:

    ./myprog --option | swish-e -S prog -i stdin -c config

This allows running Swish-e from a program (instead of running the external program from Swish-e). So, this also can be done as:

    ./myprog --option > outfile
    swish-e -S prog -i stdin -c config < outfile

or

    ./myprog --option > outfile
    cat outfile | swish-e -S prog -i stdin -c config

One final note about the -S prog input source method: the program specified with -i or IndexDir needs to be an absolute path. The exception is when the program is installed in the libexecdir directory. Then, a plain program name may be specified (as in the example showing spider.pl, above).

All three input source methods are described in more detail on the SWISH-RUN page.
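As a rough illustration of the "special format" that a -S prog program writes, the following Python sketch emits one document as a set of headers, a blank line, and the content. The Path-Name and Content-Length header names follow the description on the SWISH-RUN page; the function name, the sample path, and the sample content are hypothetical, so check SWISH-RUN before relying on the details.

```python
import sys


def format_prog_document(path, content):
    """Return one document in a headers / blank line / content layout.

    `content` must be bytes, because Content-Length counts bytes.
    """
    headers = "Path-Name: {}\nContent-Length: {}\n\n".format(path, len(content))
    return headers.encode("latin-1") + content


if __name__ == "__main__":
    # Hypothetical sample document; a real program would loop over its
    # own data source (database rows, mail messages, and so on).
    sample = b"<html><title>Hello</title><body>sample text</body></html>"
    sys.stdout.buffer.write(format_prog_document("/docs/hello.html", sample))
```

Saved under a name such as myprog.py (an assumed name), a script like this could stand in for ./myprog in the pipe examples above.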

Metanames and Properties

There are two key Swish-e concepts that you need to be familiar with: Metanames and Properties.

  • Metanames

    Swish-e creates a reverse (i.e., inverted) index. Just like an index in a book, you look up a word and it lists the pages (or documents) where that word can be found.

    Swish-e can create multiple index tables within the same index file. For example, you might want to create an index that only contains words in HTML titles, so that searches can be limited to title text. Or, you might have descriptive words that you would like to search, stored in a meta tag called "keywords".

    Some database systems might call these different "fields" or "columns", but Swish-e calls them MetaNames (as a result of its first indexing HTML "meta" tags).

    To find documents containing "foo" in their titles, you might run:

        swish-e -w swishtitle=foo

    or, a more advanced example:

        swish-e -w swishtitle=(foo or bar) or swishdefault=(baz)

    The Metaname "swishdefault" is the name that is used by Swish-e if no other name is specified. The following two searches are thus equivalent:

        swish-e -w foo
        swish-e -w swishdefault=foo

    When indexing HTML documents, Swish-e indexes words in the body and title under the Metaname "swishdefault".

  • Properties

    Swish-e's search result is a list of files -- actually, Swish-e uses file numbers internally. Data can be associated with each file number when indexing. For example, by default Swish-e associates the file's name, title, last modified date, and size with the file number. These items can be printed in search results.

    In Swish-e, this associated data is called a file's Properties. Properties can be any data you wish to associate with a document -- in fact, the entire text of the document can be stored in the index. What data is stored as a Property is controlled by the PropertyNames (and other) configuration directives.

    What properties are printed with search results depends on the -x or -p switches. By default, Swish-e returns the rank, path/URL, title, and file size in bytes for each result.

Getting Started With Swish-e

Swish-e reads a configuration file (see SWISH-CONFIG) for directives that control whether and how Swish-e indexes files. Swish-e is also controlled by command-line arguments (see SWISH-RUN). Many of the command-line arguments have equivalent configuration directives (e.g., -i and IndexDir).

Swish-e does not require a configuration file, but most people change its default behavior by placing settings in a configuration file.

To try the examples below, go to the tests subdirectory of the distribution. The tests will use the *.html files in this directory when creating the test index. You may wish to review these *.html files to get an idea of the various native file formats that Swish-e supports.

You may also use your own test documents. It's recommended to use small test documents when first using Swish-e.

Step 1: Create a Configuration File

The configuration file controls what and how Swish-e indexes. The configuration file consists of directives, comments, and blank lines. The configuration file can have any name you like.

This example will work with the documents in the tests directory. You may wish to review the tests/test.config configuration file used for the make test tests.

For example, a simple configuration file (swish-e.conf):

    # Example Swish-e Configuration file

    # Define *what* to index
    # IndexDir can point to directories and/or files
    # Here it's pointing to the current directory
    # Swish-e will also recurse into sub-directories.
    IndexDir .

    # But only index the .html files
    IndexOnly .html

    # Show basic info while indexing
    IndexReport 1

And that's a simple configuration file. It says to index all the .html files in the current directory and sub-directories, if any, and provide some basic output while indexing.

As mentioned above, the complete list of all configuration file directives is detailed in SWISH-CONFIG.

Step 2: Index your Files

Run Swish-e, using the -c switch to specify the name of the configuration file.

    swish-e -c swish-e.conf

    Indexing Data Source: "File-System"
    Indexing "."
    Removing very common words...
    no words removed.
    Writing main index...
    Sorting words ...
    Sorting 55 words alphabetically
    Writing header ...
    Writing index entries ...
      Writing word text: Complete
      Writing word hash: Complete
      Writing word data: Complete
    55 unique words indexed.
    4 properties sorted.                                              
    5 files indexed.  1252 total bytes.  140 total words.
    Elapsed time: 00:00:00 CPU time: 00:00:00
    Indexing done!

This created the index file index.swish-e. This is the default index file name, unless the IndexFile directive is specified in the configuration file:

    IndexFile ./website.index

You may use the -f switch to specify an index file at indexing time. The -f option overrides any IndexFile setting that may be in the configuration file.

Step 3: Search

You specify your search terms with the -w switch. For example, to find the files that contain the word sample, you would issue the command:

    swish-e -w sample

This example assumes that you are in the tests directory. Swish-e returns the following, in response to this command:

    swish-e -w sample

    # SWISH format: 2.4.0
    # Search words: sample
    # Number of hits: 2
    # Search time: 0.000 seconds
    # Run time: 0.005 seconds
    1000 ./test_xml.html "If you are seeing this, the METATAG XML search was successful!" 159
    1000 ./test.html "If you are seeing this, the test was successful!" 437
    .

So, the word sample was found in two documents. The first number shown is the relevance (or rank) of the search term, followed by the file containing the search term, the title of the document, and finally, the length of the document (in bytes).

The period ("."), sitting alone at the end, marks the end of the search results.
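When the default output is consumed by another program, each result line can be split into the four fields just described. The sketch below is one possible way to do that in Python; it assumes the default output layout shown above and paths without spaces, and the function name is hypothetical.

```python
import re

# rank, path (no spaces assumed), quoted title, size in bytes
RESULT_LINE = re.compile(r'^(\d+) (\S+) "(.*)" (\d+)$')


def parse_results(output):
    """Parse default swish-e search output into a list of dicts."""
    results = []
    for line in output.splitlines():
        line = line.strip()
        if line == ".":            # a lone period ends the result list
            break
        if not line or line.startswith("#"):
            continue               # skip blank lines and header comments
        match = RESULT_LINE.match(line)
        if match:
            rank, path, title, size = match.groups()
            results.append({"rank": int(rank), "path": path,
                            "title": title, "size": int(size)})
    return results
```

Lines that do not match the pattern (for example, error messages) are skipped rather than reported, which is a simplification.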

Much more information may be retrieved while searching, by using the -x and -H switches (see SWISH-RUN) and by using Document Properties (see SWISH-CONFIG).

Phrase Searching

To search for a phrase in a document, use double-quotes to delimit your search terms. (The default phrase delimiter is set in src/swish.h.)

You must protect the quotes from the shell.

For example, under Unix:

    swish-e -w '"this is a phrase" or (this and that)'
    swish-e -w 'meta1=("this is a phrase") or (this and that)'

Or, under the Windows command.com shell:

    swish-e -w \"this is a phrase\" or (this and that)

The phrase delimiter can be set with the -P switch.

Boolean Searching

You can use the Boolean operators and, or, or not in searching. Without these Boolean operators, Swish-e will assume you're anding the words together.

Here are some examples:

    swish-e -w 'apples oranges'
    swish-e -w 'apples and oranges'  ( Same thing )

    swish-e -w 'apples or oranges'

    swish-e -w 'apples or oranges not juice' -f myIndex 

The last example first retrieves the files that contain either of the words "apples" or "oranges"; then, among those, it selects the ones that do not contain the word "juice".

A few other examples to ponder:

    swish-e -w 'apples and oranges or pears'
    swish-e -w '(apples and oranges) or pears'  ( Same thing )
    swish-e -w 'apples and (oranges or pears)'  ( Not the same thing )

Swish processes the query left to right.

See SWISH-SEARCH for more information.
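To see why grouping matters when a query is processed left to right, here is a toy Python model. It is not Swish-e's query parser: it handles only and, or, and implicit and between adjacent words (no parentheses, no not), and the tiny word-to-document index in the usage note is made up for illustration.

```python
def search_left_to_right(query, index):
    """Evaluate a flat and/or query strictly left to right.

    `index` maps each word to the set of document ids containing it.
    """
    result = None
    operator = "and"
    for token in query.lower().split():
        if token in ("and", "or"):
            operator = token
            continue
        hits = index.get(token, set())
        if result is None:
            result = set(hits)
        elif operator == "and":
            result &= hits
        else:
            result |= hits
        operator = "and"   # adjacent words are implicitly anded
    return result if result is not None else set()
```

With a made-up index of apples in documents 1, 2, 3, oranges in 2, 3, 4, and pears in 3, 5, the query 'apples and oranges or pears' yields documents 2, 3, 5, whereas 'apples and (oranges or pears)' would yield only 2 and 3.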

Context Searching

The -t option in the search command line allows you to search for words that exist only in specific HTML tags. This option takes a string of characters as its argument. Each character represents a different tag in which the word is searched; that is, you can use any combinations of the following characters:

    H search in all <HEAD> tags
    B search in the <BODY> tags
    t search in <TITLE> tags
    h is <H1> to <H6> (header) tags
    e is emphasized tags (this may be <B>, <I>, <EM>, or <STRONG>)
    c is HTML comment tags (<!-- ... -->)

For example:

    # Find only documents with the word "linux" in the <TITLE> tags.
    swish-e -w linux -t t

    # Find the word "apple" in titles or comments
    swish-e -w apple -t tc
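A front end that builds -t arguments may want to check them first. The Python sketch below simply encodes the table of context letters listed above; the function name is hypothetical and the descriptions are paraphrased from that table.

```python
# Context letters accepted by -t, as listed in the table above.
CONTEXT_TAGS = {
    "H": "<HEAD>",
    "B": "<BODY>",
    "t": "<TITLE>",
    "h": "<H1> to <H6>",
    "e": "emphasized tags",
    "c": "HTML comments",
}


def describe_context(letters):
    """Return the contexts named by a -t argument such as 'tc'."""
    unknown = [ch for ch in letters if ch not in CONTEXT_TAGS]
    if unknown:
        raise ValueError("unknown context letter(s): " + "".join(unknown))
    return [CONTEXT_TAGS[ch] for ch in letters]
```

Note that the letters are case-sensitive: H and h name different contexts.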

META Tags

As mentioned above, Metanames are a way to define "fields" in your documents. You can use the Metanames in your queries to limit the search to just the words contained in that META name of your document. For example, you might have a META-tagged field called subjects in your documents. This would let you search your documents for the word "foo", but only return documents where "foo" is within the subjects META tag.

Document Properties are somewhat related: Properties allow the content of a META tag in a source document to be stored within the index, and that text to be returned along with search results.

META tags can have two formats in your documents.

    <META NAME="keyName" CONTENT="some Content">

And in XML format:

    <keyName>
        Some Content
    </keyName>

If using libxml, you can optionally use a non-HTML tag as a metaname:

    <html>
        <body>
            Hello swish users!
            <keyName>
                this is meta data
            </keyName>
        </body>
    </html>

This, of course, is invalid HTML.

To continue with our sample swish-e.conf file, add the following lines:

    # Define META tags
    MetaNames meta1 meta2 meta3

Reindex to include the changes:

    swish-e -c swish-e.conf

Now search, but this time limit your search to META tag meta1:

    swish-e -w 'meta1=metatest1'

Again, please see SWISH-RUN and SWISH-CONFIG for complete documentation of the various indexing and searching options.
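Before adding a MetaNames directive, it can help to check which META names your documents actually contain. The following Python sketch collects name/content pairs from HTML-style META tags using the standard html.parser module. It only illustrates the pairing described above and is not how Swish-e itself parses documents; the class and function names are hypothetical.

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collect <meta name="..." content="..."> pairs from HTML."""

    def __init__(self):
        super().__init__()
        self.metas = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        if name is not None:
            self.metas[name] = attributes.get("content") or ""


def extract_metas(html):
    collector = MetaCollector()
    collector.feed(html)
    collector.close()
    return collector.metas
```

For the test documents, a name such as meta1 found this way is what would be listed in MetaNames.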

Spidering and Searching with a Web Form

This example demonstrates how to spider a web site and set up the included CGI script to provide a web-based search page. This example uses Perl programs that are included in the Swish-e distribution: spider.pl will be used for reading files from the web server; swish.cgi will provide the web search form and display results.

As an example, we will index the Apache Web Server documentation, installed on the local computer at http://localhost/apache_docs/index.html.

  1. Make a Working Directory

    Create a directory to store the Swish-e configuration and the Swish-e index.

        ~$ mkdir web_index
        ~$ cd web_index/
        ~/web_index$

  2. Create a Swish-e Configuration file

        ~/web_index$ cat swish.conf
        # Swish-e config to index the Apache documentation
        #
        # Use spider.pl for indexing (location of spider.pl set at installation time)
        IndexDir spider.pl
    
        # Use spider.pl's default configuration and specify the URL to spider
        SwishProgParameters default http://localhost/apache_docs/index.html
    
        # Allow extra searching by title, path
        Metanames swishtitle swishdocpath
    
        # Set StoreDescription for each parser
        #  to display context with search results
        StoreDescription TXT* 10000
        StoreDescription HTML* <body> 10000

  3. Generate the Index

    Now, run Swish-e to create the index:

        ~/web_index$ swish-e -S prog -c swish.conf 
    
        Indexing Data Source: "External-Program"
        Indexing "spider.pl"
        /usr/local/lib/swish-e/spider.pl: Reading parameters from 'default'
    
        Summary for: http://localhost/apache_docs/index.html
            Duplicates:     4,188  (349.0/sec)
        Off-site links:       276  (23.0/sec)
               Skipped:         1  (0.1/sec)
           Total Bytes: 2,090,125  (174177.1/sec)
            Total Docs:       147  (12.2/sec)
           Unique URLs:       149  (12.4/sec)
        Removing very common words...
        no words removed.
        Writing main index...
        Sorting words ...
        Sorting 7736 words alphabetically
        Writing header ...
        Writing index entries ...
          Writing word text: Complete
          Writing word hash: Complete
          Writing word data: Complete
        7736 unique words indexed.
        5 properties sorted.                                              
        147 files indexed.  2090125 total bytes.  200783 total words.
        Elapsed time: 00:00:13 CPU time: 00:00:02
        Indexing done!

    The above output is actually a mix of output from both Swish-e and spider.pl. spider.pl reports the "Summary for: http://localhost/apache_docs/index.html".

    Also note that Swish-e knows to find spider.pl at /usr/local/lib/swish-e/spider.pl. The script installation directory (called libexecdir) is set at configure time. You can see your setting by running swish-e -h:

        ~/web_index$ swish-e -h | grep libexecdir
         Scripts and Modules at: (libexecdir) = /usr/local/lib/swish-e

    This directory will be needed in the next step, when setting up the CGI script.

    Finally, verify that the index can be searched from the command line:

        ~/web_index$ swish-e -w installing -m3
        # SWISH format: 2.4.0
        # Search words: installing
        # Removed stopwords: 
        # Number of hits: 17
        # Search time: 0.018 seconds
        # Run time: 0.050 seconds
        1000 http://localhost/apache_docs/install.html "Compiling and Installing Apache" 17960
        718 http://localhost/apache_docs/install-tpf.html "Installing Apache on TPF" 25734
        680 http://localhost/apache_docs/windows.html "Using Apache with Microsoft Windows" 27165
        .

    Now, try limiting the search to the title:

        ~/web_index$ swish-e -w swishtitle=installing -m3 
        # SWISH format: 2.3.5
        # Search words: swishtitle=installing
        # Removed stopwords: 
        # Number of hits: 2
        # Search time: 0.018 seconds
        # Run time: 0.048 seconds
        1000 http://localhost/apache_docs/install-tpf.html "Installing Apache on TPF" 25734
        1000 http://localhost/apache_docs/install.html "Compiling and Installing Apache" 17960
        .

    Note that the above can also be done using the -t option:

        ~/web_index$ swish-e -w installing -m3 -tH

  4. Set up the CGI script

    Swish-e does not include a web server. So, you must use your locally installed web server. Apache is highly recommended, of course.

    Locate your web server's CGI directory. This may be a cgi-bin directory in your home directory or a central cgi-bin directory set up by the web server administrator. Once this is located, copy the swish.cgi script into the cgi-bin directory.

    Where CGI scripts can be located depends completely on the web server that is being used and how it has been configured. See your web server's documentation or your site's administrator for additional information.

    This example will use a site cgi-bin directory, located at /usr/lib/cgi-bin. Copy the swish.cgi script into the cgi-bin directory. Again, we will need the location of the libexecdir directory:

        ~/web_index$ swish-e -h | grep libexecdir
         Scripts and Modules at: (libexecdir) = /usr/local/lib/swish-e
    
        ~/web_index$ cd /usr/lib/cgi-bin
        /usr/lib/cgi-bin$ su
        Password: 
        /usr/lib/cgi-bin# cp /usr/local/lib/swish-e/swish.cgi .

    If your operating system supports symbolic links and your web server allows programs to be symbolic links, then you may wish to create a link to the swish.cgi program, instead.

        /usr/lib/cgi-bin# ln -s /usr/local/lib/swish-e/swish.cgi

    We need to tell the swish.cgi script where to look for the index created in the previous step. It's also recommended to enter the path to the swish-e binary. Otherwise, the swish.cgi script will look for the binary in the PATH, and that may change when running under the CGI environment.

    Here's the configuration file:

        /usr/lib/cgi-bin# cat .swishcgi.conf 
        return {
            title        => 'Search Apache Documentation',
            swish_binary => '/usr/local/bin/swish-e',
            swish_index  => '/home/moseley/web_index/index.swish-e',
        }

    Now, test the script from the command line (as a normal user!):

        /usr/lib/cgi-bin# exit
        exit
    
        /usr/lib/cgi-bin$  ./swish.cgi | head
        Content-Type: text/html; charset=ISO-8859-1
    
        <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
        <html>
            <head>
               <title>
                  Search Apache Documentation
               </title>
            </head>
            <body>

    Notice that the CGI script returns the HTTP header (Content-Type) and the body of the web page, just like a well-behaved CGI script should.

    Now, test using the web server (this step depends on the location of your cgi-bin directory). This example uses the "GET" command that is part of the LWP Perl library, but any web browser can run this test.

        /usr/lib/cgi-bin$ GET http://localhost/cgi-bin/swish.cgi | head
        <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
        <html>
            <head>
               <title>
                  Search Apache Documentation
               </title>
            </head>
            <body>
                <h2>

    The script reports errors to stderr, so consult the web server's error log if problems occur. The message "Service currently unavailable", reported by running swish.cgi, typically indicates a configuration error; the exact problem will be listed in the web server's error log.

    Detailed instructions on using the swish.cgi script and debugging tips can be found by running:

        $ perldoc swish.cgi

    while in the cgi-bin directory where swish.cgi was copied.

    The spider program spider.pl also has a large number of configuration options.

    Documentation is also available in the directory $prefix/share/doc/swish-e or at http://swish-e.org.

    Note: Also check out the search.cgi script, found at the same location as the swish.cgi script. This is more of a skeleton script, for those that want to create a custom search script.

Now you are ready to search.

Indexing Other Types of Documents - Filtering

Swish-e can only index HTML, XML, and text documents. In order to index other documents, such as PDF or MS Word documents, you must use a utility to convert or "filter" those documents.

How documents are filtered with Swish-e has changed over time. This has resulted in a bit of confusion. It's also a somewhat complex process, as different programs need to communicate with each other.

You may wish to read the Swish-e FAQ question on filtering, "How do I filter documents?", before continuing here.

Filtering Overview

There are two ways to filter documents with Swish-e. Both are described in the SWISH-CONFIG man page. They use the FileFilter directive and the SWISH::Filter Perl module.

The FileFilter directive is a general-purpose method of filtering. It allows running of an external program for each document processed (based on file extension), and requires one or more external programs. These programs open an input file, convert as needed, and write their output to standard output.

Previous versions of Swish-e (before 2.4.0) used a collection of filter programs for converting files such as PDF or MS Word documents. These filter programs call other programs to do the work of filtering (e.g. pdftotext to extract the contents from PDF files). Although these filter programs are still included with the Swish-e distribution as examples, it is recommended to use the SWISH::Filter method, instead.

One disadvantage of using FileFilter is that the filter program is run once for every document that needs to be filtered. This can slow down the indexing process substantially.

The SWISH::Filter Perl module works very much like the old system and uses the same helper programs. Conveniently, however, it provides a single interface for filtering all types of documents. The primary advantage of SWISH::Filter is that it is built into the program used for spidering web sites (spider.pl), so all that's required is installing the filter programs that do the actual work of filtering (e.g. catdoc, xpdf). (The Windows binary includes some of the filter programs.)

But, Swish-e will not use SWISH::Filter by default when using the file system method of indexing. To use SWISH::Filter when indexing by file system method (-S fs), you can use a FileFilter directive with the swish_filter.pl filter (which is just a program that uses SWISH::Filter) or use the -S prog method of indexing and use the DirTree.pl program for fetching documents.

DirTree.pl is included with the Swish-e distribution and is designed to work with SWISH::Filter. Using DirTree.pl will likely be a faster way to index, since the SWISH::Filter set of modules does not need to be compiled for every document that needs to be filtered.

See the contents of swish_filter.pl and DirTree.pl for specifics on their use.

Filtering Examples

The FileFilter directive can be used in your config file to convert documents, based on their extensions. This is the old way of filtering, but provides an easy way to add filters to Swish-e.

For example:

    FileFilter .pdf  pdftotext   "'%p' -"
    IndexContents TXT* .pdf

will cause all .pdf files to be filtered through the pdftotext program (part of the Xpdf package) and to parse the resulting output (from pdftotext) with the text ("TXT") parser.
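A filter used with FileFilter only has to read the file it is given and write the converted text to standard output. As a minimal sketch of that contract, the hypothetical Python filter below decompresses a gzip-compressed text file. The script name gunzip_filter.py is an assumption, as is the idea of passing the file path as the first argument with an option string such as "'%p'"; see SWISH-CONFIG for how FileFilter actually passes parameters.

```python
import gzip
import sys


def convert(data):
    """Convert the raw file bytes into indexable text bytes."""
    return gzip.decompress(data)


if __name__ == "__main__":
    # Assumed calling convention: the path to the file is argv[1].
    if len(sys.argv) > 1:
        with open(sys.argv[1], "rb") as handle:
            sys.stdout.buffer.write(convert(handle.read()))
```

Because the interpreter and this script are started once per matching document, a filter like this shares the per-document startup cost described above.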

The other way to filter documents is to use a -S prog program and convert the documents before passing them on to Swish-e.

For example, spider.pl makes use of the SWISH::Filter Perl module, included with the Swish-e distribution. SWISH::Filter is passed a document and the document's content type; it looks for modules and utilities to convert the document into one of the types that Swish-e can index.

Swish-e comes ready to index PDF, MS Word, MP3 ID3 tags, and MS Excel file types. But these filters need extra modules or tools to do the actual conversion.

For example, the Swish-e distribution includes a module called SWISH::Filter::Pdf2HTML that uses the pdftotext and pdfinfo utilities provided by the Xpdf package.

This means that if you are using spider.pl to spider your web site and you wish to index PDF documents, all that is needed is to install the Xpdf package and Swish-e (with the help of spider.pl) will begin indexing your PDF files.

Ok, so what does all that mean? For a very simple site, you should be able to run this:

    $ /usr/local/lib/swish-e/spider.pl default http://localhost/ | swish-e -S prog -i stdin

which is running the spider with default spider settings, indexing the Web server on localhost, and piping its output into Swish-e (using the default indexing settings). Documents will be filtered automatically, if you have the required helper applications installed.

Most people will not want to just use the default settings (for one thing, the spider will take a while because its default is to delay a few seconds between every request). So, read the documentation for spider.pl, to learn how to use a spider config file. Also read SWISH-CONFIG to learn about what configuration options can be used with Swish-e.

The SWISH::Filter documentation provides more details on filtering and hints for debugging problems when filtering.

Document Info

$Id: INSTALL.pod 1978 2007-12-08 01:59:17Z karpet $


spider.pl - Example Perl program to spider web servers

Swish-e version 2.4.7


SYNOPSIS

    spider.pl [<spider config file>] [<URL> ...]

    # Spider using some common defaults and capture the output
    # into a file

    ./spider.pl default http://myserver.com/ > output.txt

    # or using a config file

    spider.config:
    @servers = (
        {
            base_url    => 'http://myserver.com/',
            email       => 'me@myself.com',
            # other spider settings described below
        },
    );

    ./spider.pl spider.config > output.txt

    # or using the default config file SwishSpiderConfig.pl
    ./spider.pl > output.txt

    # using with swish-e

    ./spider.pl spider.config | swish-e -c swish.config -S prog -i stdin

    # or in two steps
    ./spider.pl spider.config > output.txt
    swish-e -c swish.config -S prog -i stdin < output.txt

    # or with compression
    ./spider.pl spider.config | gzip > output.gz
    gzip -dc output.gz | swish-e -c swish.config -S prog -i stdin

    # or having swish-e call the spider directly using the
    # spider config file SwishSpiderConfig.pl:
    swish-e -c swish.config -S prog -i spider.pl

    # or the above but passing a parameter to the spider:
    echo "SwishProgParameters  spider.config" >> swish.config
    echo "IndexDir spider.pl" >> swish.config
    swish-e -c swish.config -S prog

    Note: When running on some versions of Windows (e.g. Win ME and Win 98 SE)
    you may need to tell Perl to run the spider directly:

        perl spider.pl | swish-e -S prog -c swish.conf -i stdin

    This pipes the output of the spider directly into swish.

DESCRIPTION

spider.pl is a program that fetches documents from a web server and outputs them to STDOUT in a special format designed to be read by Swish-e.

The spider can index non-text documents such as PDF and MS Word by use of filter (helper) programs. These programs are not part of the Swish-e distribution and must be installed separately. See the section on filtering below.

A configuration file is normally used to control what documents are fetched from the web server(s). The configuration file and its options are described below. There is also a "default" config suitable for spidering.

The spider is designed to spider web pages and fetch documents from one host at a time -- offsite links are not followed. But, you can configure the spider to spider multiple sites in a single run.

spider.pl is distributed with Swish-e and is installed in the swish-e library directory at installation time. This directory (libexecdir) can be seen by running the command:

    swish-e -h

Typically on unix-type systems the spider is installed at:

    /usr/local/lib/swish-e/spider.pl

This spider stores all links in memory while processing and does not do parallel requests.

Running the spider

The output from spider.pl can be captured to a temporary file which is then fed into swish-e:

    ./spider.pl > docs.txt
    swish-e -c config -S prog -i stdin < docs.txt

or the output can be passed to swish-e via a pipe:

   ./spider.pl | swish-e -c config -S prog -i stdin

or Swish-e can run the spider directly:

   swish-e -c config -S prog -i spider.pl

One advantage of having Swish-e run spider.pl is that Swish-e knows where to locate the program (based on libexecdir compiled into swish-e).

When running the spider without any parameters it looks for a configuration file called SwishSpiderConfig.pl in the current directory. The spider will abort with an error if this file is not found.

A configuration file can be specified as the first parameter to the spider:

    ./spider.pl spider.config > output.txt

If running the spider via Swish-e (i.e. Swish-e runs the spider) then use the Swish-e config option SwishProgParameters to specify the config file:

In swish.config:

    # Use spider.pl as the external program:
    IndexDir spider.pl
    # And pass the name of the spider config file to the spider:
    SwishProgParameters spider.config

And then run Swish-e like this:

    swish-e -c swish.config -S prog

Finally, by using the special word "default" on the command line the spider will use a default configuration that is useful for indexing most sites. It's a good way to get started with the spider:

    ./spider.pl default http://my_server.com/index.html > output.txt

There's no "best" way to run the spider. I like to capture to a file and then feed that into Swish-e.

The spider does require Perl's LWP library and a few other reasonably common modules. Most well maintained systems should have these modules installed. See "REQUIREMENTS" below for more information. It's a good idea to check that you are running a current version of these modules.

Note: the "prog" document source in Swish-e bypasses many Swish-e configuration settings. For example, you cannot use the IndexOnly directive with the "prog" document source. This is by design to limit the overhead when using an external program for providing documents to swish; after all, with "prog", if you don't want to index a file, then don't give it to swish to index in the first place.

So, for spidering, if you do not wish to index images, for example, you will need to either filter by the URL or by the content-type returned from the web server. See "CALLBACK FUNCTIONS" below for more information.

Robots Exclusion Rules and being nice

By default, this script will not spider files blocked by robots.txt. In addition, the script will check for <meta name="robots"..> tags, which allow finer control over what files are indexed and/or spidered. See http://www.robotstxt.org/wc/exclusion.html for details.

This spider provides an extension to the <meta> tag exclusion, by adding a NOCONTENTS attribute. This attribute turns on the no_contents setting, which asks swish-e to only index the document's title (or file name if no title is found).

For example:

      <META NAME="ROBOTS" CONTENT="NOCONTENTS, NOFOLLOW">

says to just index the document's title, but don't index its contents, and don't follow any links within the document. Granted, it's unlikely that this feature will ever be used...
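The effect of such a CONTENT value can be pictured as a set of flags. The Python sketch below splits the value on commas and derives three decisions from it. It is a simplified model of the behavior described here, not the spider's own code, and the function and key names are hypothetical.

```python
def robots_decisions(content):
    """Map a robots META CONTENT value to index/contents/follow flags."""
    directives = {part.strip().lower()
                  for part in content.split(",") if part.strip()}
    return {
        "index": "noindex" not in directives,        # index the document at all
        "contents": "nocontents" not in directives,  # index body, not just title
        "follow": "nofollow" not in directives,      # follow links in the document
    }
```

For the example above, robots_decisions("NOCONTENTS, NOFOLLOW") leaves "index" on while turning "contents" and "follow" off.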

If you are indexing your own site, and know what you are doing, you can disable robot exclusion by the ignore_robots_file configuration parameter, described below. This disables both robots.txt and the meta tag parsing. You may disable just the meta tag parsing by using ignore_robots_headers.

This script only spiders one file at a time, so load on the web server is not that great. And with libwww-perl-5.53_91, HTTP/1.1 keep-alive requests can reduce the load on the server even more (and potentially reduce spidering time considerably).

Still, discuss spidering with a site's administrator before beginning. Use the delay_sec setting to adjust how fast the spider fetches documents. Consider running a second web server with a limited number of children if you really want to fine tune the resources used by spidering.

Duplicate Documents

The spider program keeps track of URLs visited, so a document is only indexed one time.

The Digest::MD5 module can be used to create a "fingerprint" of every page indexed and this fingerprint is used in a hash to find duplicate pages. For example, MD5 will prevent indexing these as two different documents:

    http://localhost/path/to/some/index.html
    http://localhost/path/to/some/

But note that this may have side effects you don't want. If you want this file indexed under this URL:

    http://localhost/important.html

But the spider happens to find the exact content in this file first:

    http://localhost/developement/test/todo/maybeimportant.html

Then only that URL will be indexed.
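
The "first URL wins" behaviour can be sketched with Digest::MD5 as follows. The helper name unique_by_md5 and the page bodies are made up for illustration; this is a minimal sketch of the idea, not the spider's own code.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Hypothetical helper: return the URLs that would be indexed, keeping only
# the first URL seen for each distinct body.
sub unique_by_md5 {
    my @pages = @_;    # list of [ url, content ] pairs
    my ( %seen, @indexed );
    for my $page (@pages) {
        my ( $url, $content ) = @$page;
        next if $seen{ md5_hex($content) }++;    # fingerprint already seen
        push @indexed, $url;
    }
    return @indexed;
}

# Made-up pages, in the order the spider happens to find them.
my @found = unique_by_md5(
    [ 'http://localhost/developement/test/todo/maybeimportant.html', 'same body' ],
    [ 'http://localhost/important.html',                             'same body' ],
    [ 'http://localhost/other.html',                                 'other body' ],
);
print "$_\n" for @found;
```

Because the fingerprint depends only on the content, the order in which URLs are discovered decides which URL ends up in the index.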

Broken relative links

Sometimes web page authors use too many /../ segments in relative URLs which reference documents above the document root. Some web servers such as Apache will return a 400 Bad Request when requesting a document above the root. Other web servers such as Microsoft IIS/5.0 will try to "correct" these errors. This correction will lead to loops when spidering.

The spider can fix these above-root links by placing the following in your spider config:

    remove_leading_dots => 1,

It is not on by default so that the spider can report the broken links (as 400 errors on sane webservers).
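
To illustrate what removing the leading dots means for a path, here is a minimal sketch. The helper strip_leading_dots is hypothetical and only shows the effect on the path string; the spider's own implementation may differ.

```perl
use strict;
use warnings;

# Hypothetical helper: drop any "/.." segments at the very start of a path.
sub strip_leading_dots {
    my ($path) = @_;
    $path =~ s{^(?:/\.\.)+(?=/|$)}{};
    return length $path ? $path : '/';
}

for my $path ( '/../../images/logo.gif', '/docs/../index.html', '/..' ) {
    printf "%-25s => %s\n", $path, strip_leading_dots($path);
}
```

Only segments at the start of the path are touched; a "/../" in the middle of a path is an ordinary relative reference and is left alone.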

Compression

If the Perl module Compress::Zlib is installed the spider will send the

   Accept-Encoding: gzip x-gzip

header and uncompress the document if the server returns the header

   Content-Encoding: gzip
   Content-Encoding: x-gzip

If the Perl distribution IO-Compress-Zlib is installed the spider will use this module to uncompress "gzip" (x-gzip) and also "deflate" compressed documents.

The "compress" method is not supported.

See RFC 2616 section 3.5 for more information.
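
The decoding step can be sketched with Compress::Zlib's memGzip and memGunzip functions. The helper decode_body is hypothetical, and the sketch handles only the gzip and x-gzip encodings described above.

```perl
use strict;
use warnings;
use Compress::Zlib ();    # memGzip and memGunzip are called by full name

# Hypothetical helper: return the decoded body for a given Content-Encoding.
sub decode_body {
    my ( $encoding, $body ) = @_;
    return $body unless defined $encoding && length $encoding;
    if ( $encoding =~ /^(?:x-)?gzip$/i ) {
        my $plain = Compress::Zlib::memGunzip($body);
        die "gunzip failed\n" unless defined $plain;
        return $plain;
    }
    die "unsupported Content-Encoding: $encoding\n";
}

# Round trip on made-up content standing in for a server response.
my $html       = '<html><title>Hello</title></html>';
my $compressed = Compress::Zlib::memGzip($html);
print decode_body( 'x-gzip', $compressed ) eq $html ? "round trip ok\n" : "mismatch\n";
```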

MD5 checksums are done on the compressed data.

MD5 may slow down indexing a tiny bit, so test with and without if speed is an issue (which it probably isn't since you are spidering in the first place). This feature will also use more memory.

REQUIREMENTS

Perl 5 (hopefully at least 5.00503) or later.

You must have the LWP Bundle on your computer. Load Bundle::LWP via the CPAN.pm shell, or download libwww-perl-x.xx from CPAN (or via ActiveState's ppm utility). Also required is the HTML-Parser-x.xx bundle of modules, also from CPAN (and from ActiveState for Windows).

    http://search.cpan.org/search?dist=libwww-perl
    http://search.cpan.org/search?dist=HTML-Parser

You will also need Digest::MD5 if you wish to use the MD5 feature. HTML::Tagset is also required. Other modules may be required (for example, the pod2xml.pm module has its own requirements -- see perldoc pod2xml for info).

The spider.pl script, like everyone else, expects perl to live in /usr/local/bin. If this is not the case then either add a symlink at /usr/local/bin/perl to point to where perl is installed or modify the shebang (#!) line at the top of the spider.pl program.

Note that the libwww-perl package does not support SSL (Secure Sockets Layer) (https) by default. See README.SSL included in the libwww-perl package for information on installing SSL support.

CONFIGURATION FILE

The spider configuration file is read by the script as Perl code. This makes the configuration a bit more complex than simple text config files, but allows the spider to be configured programmatically.

For example, the config file can contain logic for testing URLs against regular expressions or even against a database lookup while running.

The configuration file sets an array called @servers. This array can contain one or more hash structures of parameters. Each hash structure is a configuration for a single server.

Here's an example:

    my %main_site = (
        base_url   => 'http://example.com',
        same_hosts => 'www.example.com',
        email      => 'admin@example.com',
    );

    my %news_site = (
        base_url   => 'http://news.example.com',
        email      => 'admin@example.com',
    );

    @servers = ( \%main_site, \%news_site );
    1;

The above defines two Perl hashes (%main_site and %news_site) and then places a *reference* (the backslash before the name of the hash) to each of those hashes in the @servers array. The "1;" at the end is required at the end of the file (Perl must see a true value at the end of the file).
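
The need for the trailing true value can be seen in a small sketch of how a Perl-code config file might be loaded. It assumes the file is read with Perl's "do FILE" (an assumption; the loading mechanism is not spelled out here) and writes a throwaway config to a temporary file so the example is self-contained.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

our @servers;    # the config file assigns to this package variable

# Write a small config file to a temporary location (made-up contents).
my ( $fh, $config ) = tempfile( SUFFIX => '.pl', UNLINK => 1 );
print {$fh} q{
my %main_site = ( base_url => 'http://example.com',      email => 'admin@example.com' );
my %news_site = ( base_url => 'http://news.example.com', email => 'admin@example.com' );
@servers = ( \%main_site, \%news_site );
1;
};
close $fh or die "close failed: $!";

# "do FILE" returns the value of the last statement evaluated in the file,
# which is why the file ends with a true value.
do $config or die "failed to read config '$config': ", ( $@ || $! );

print scalar(@servers), " server(s) configured\n";
print "  $_->{base_url}\n" for @servers;
```

The hashes declared with "my" inside the config file stay alive after loading because @servers holds references to them.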

The config file path is the first parameter passed to the spider script.

    ./spider.pl <config>

If you do not specify a config file then the spider will look for the file SwishSpiderConfig.pl in the current directory.

The Swish-e distribution includes a SwishSpiderConfig.pl file with a few example configurations. This example file is installed in the prog-bin/ documentation directory (on unix often this is /usr/local/share/swish-e/prog-bin).

When the special config file name "default" is used:

    SwishProgParameters default http://www.mysite/index.html [<URL>] [...]

Then a default set of parameters is used with the spider. This is a good way to start using the spider before attempting to create a configuration file.

The default settings skip any URLs that look like images (well, .gif .jpeg .png), and attempt to filter PDF and MS Word documents IF you have the required filter programs installed (which are not part of the Swish-e distribution). The spider will follow "a" and "frame" type of links only.

Note that if you do use a spider configuration file, the default configuration will NOT be used (unless you set the "use_default_config" option in your config file).

CONFIGURATION OPTIONS

This describes the required and optional keys in the server configuration hash, in random order...

  • base_url

    This required setting is the starting URL for spidering.

    This sets the first URL the spider will fetch. It does NOT limit spidering to URLs at or below the level of the directory specified in this setting. For that feature you need to use the test_url callback function.

    Typically, you will just list one URL for the base_url. You may specify more than one URL as a reference to a list and each will be spidered:

        base_url => [qw! http://swish-e.org/ http://othersite.org/other/index.html !],

    but each site will use the same config options. If you want to index two separate sites you will likely prefer to add an additional configuration to the @servers array.

    You may specify a username and password:

        base_url => 'http://user:pass@swish-e.org/index.html',

    If a URL is protected by Basic Authentication you will be prompted for a username and password. The parameter credential_timeout controls how long to wait for user entry before skipping the current URL. See also credentials below.

  • same_hosts

    This optional key sets equivalent authority name(s) for the site you are spidering. For example, if your site is www.mysite.edu but also can be reached by mysite.edu (with or without www) and also web.mysite.edu then:

    Example:

        $serverA{base_url} = 'http://www.mysite.edu/index.html';
        $serverA{same_hosts} = ['mysite.edu', 'web.mysite.edu'];

    Now, if a link is found while spidering of:

        http://web.mysite.edu/path/to/file.html

    it will be considered on the same site, and will actually be spidered and indexed as:

        http://www.mysite.edu/path/to/file.html

    Note: This should probably be called same_host_port because it compares the URI host:port against the list of host names in same_hosts. So, if you specify a port in base_url you will want to specify the same port in the list of hosts in same_hosts:

        my %serverA = (
            base_url    => 'http://mytest.site.invalid:4444/',
            same_hosts  => [ qw/www.mytest.site.invalid:4444/ ],
            email       => 'my@email.address',
        );
  • email

    This required key sets the email address for the spider. Set this to your email address.

  • agent

    This optional key sets the name of the spider.

  • link_tags

    This optional key is a reference to an array of tags. Only links found in these tags will be extracted. The default is to only extract links from <a> tags.

    For example, to extract links from a tags and from frame tags:

        my %serverA = (
            base_url    => 'http://mytest.site.invalid:4444/',
            same_hosts  => [ qw/www.mytest.site.invalid:4444/ ],
            email       => 'my@email.address',
            link_tags   => [qw/ a frame /],
        );
  • use_default_config

    This option is new for Swish-e 2.4.3.

    The spider has a hard-coded default configuration that's available when the spider is run with the configuration file listed as "default":

        ./spider.pl default <url>

    This default configuration skips urls that match the regular expression:

        /\.(?:gif|jpeg|png)$/i

    and the spider will attempt to use the SWISH::Filter module for filtering non-text documents. (You still need to install programs to do the actual filtering, though).

    Here's the basic config for the "default" mode:

        @servers = (
        {
            email               => 'swish@user.failed.to.set.email.invalid',
            link_tags           => [qw/ a frame /],
            keep_alive          => 1,
            test_url            => sub {  $_[0]->path !~ /\.(?:gif|jpeg|png)$/i },
            test_response       => $response_sub,
            use_head_requests   => 1,  # Due to the response sub
            filter_content      => $filter_sub,
        } );

    The filter_content callback will be used if SWISH::Filter was loaded and ready to use. This doesn't mean that filtering will work automatically -- you will likely need to install additional programs for filtering (like Xpdf or Catdoc).

    The test_response callback will be set to test if a given content type can be filtered by SWISH::Filter (if SWISH::Filter was loaded), otherwise, it will check for content-type of text/* -- any text type of document.

    Normally, if you specify your own config file:

        ./spider.pl my_own_spider.config

    then you must set up those features available in the default setting in your own config file. But, if you wish to build upon the "default" config file then set this option.

    For example, to use the default config but specify your own email address:

        @servers = (
            {
                email               => 'my@email.address',
                use_default_config  => 1,
                delay_sec           => 0,
            },
        );
        1;

    What this does is "merge" your config file with the default config file.
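
    The merge can be pictured as a plain hash merge in which your keys win over the defaults. This is a minimal sketch under that assumption: the helper merge_with_default is hypothetical, and %default_config is an abbreviated stand-in for the built-in settings.

    ```perl
    use strict;
    use warnings;

    # Stand-in for the built-in defaults (abbreviated for illustration).
    my %default_config = (
        email      => 'swish@user.failed.to.set.email.invalid',
        link_tags  => [qw/ a frame /],
        keep_alive => 1,
    );

    # Hypothetical helper: keys set by the user override the defaults.
    sub merge_with_default {
        my ($user) = @_;
        return { %default_config, %$user };
    }

    my $merged = merge_with_default(
        { email => 'my@email.address', use_default_config => 1, delay_sec => 0 } );

    print "email:      $merged->{email}\n";         # the user's value
    print "keep_alive: $merged->{keep_alive}\n";    # inherited from the defaults
    print "delay_sec:  $merged->{delay_sec}\n";     # only set by the user
    ```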

  • delay_sec

    This optional key sets the delay in seconds to wait between requests. See the LWP::RobotUA man page for more information. The default is 5 seconds. Set to zero for no delay.

    When using the keep_alive feature (recommended) the delay will be used only where the previous request returned a "Connection: close" header.

  • delay_min (deprecated)

    Set the delay to wait between requests in minutes. If both delay_sec and delay_min are defined, delay_sec will be used.

  • max_wait_time

    This setting is the number of seconds to wait for data to be returned from the request. Data is returned in chunks to the spider, and the timer is reset each time a new chunk is reported. Therefore, documents (requests) that take longer than this setting should not be aborted as long as some data is received every max_wait_time seconds. The default is 30 seconds.

    NOTE: This option has no effect on Windows.

  • max_time

    This optional key will set the max minutes to spider. Spidering for this host will stop after max_time minutes, and move on to the next server, if any. The default is to not limit by time.

  • max_files

    This optional key sets the max number of files to spider before aborting. The default is to not limit by number of files. This is the number of requests made to the remote server, not the total number of files to index (see max_indexed). This count is displayed at the end of indexing as Unique URLs.

    This feature can (and perhaps should) be used when spidering a web site where dynamic content may generate unique URLs, to prevent run-away spidering.

  • max_indexed

    This optional key sets the max number of files that will be indexed. The default is to not limit. This is the number of files sent to swish for indexing (and is reported by Total Docs when spidering ends).

  • max_size

    This optional key sets the max size of a file read from the web server. This defaults to 5,000,000 bytes. If the size is exceeded the resource is skipped and a message is written to STDERR if the DEBUG_SKIPPED debug flag is set.

    Set max_size to zero for unlimited size. If the server returns a Content-Length header then that will be used. Otherwise, the document will be checked for size limitation as it arrives. That's a good reason to have your server send Content-Length headers.

    See also use_head_requests below.

  • keep_alive

    This optional parameter will enable keep alive requests. This can dramatically speed up spidering and reduce the load on the server being spidered. The default is to not use keep alives, although enabling it will probably be the right thing to do.

    To get the most out of keep alives, you may want to set up your web server to allow a lot of requests per single connection (e.g. MaxKeepAliveRequests on Apache). Apache's default is 100, which should be good.

    When a connection is not closed the spider does not wait the "delay_sec" time when making the next request. In other words, there is no delay in requesting documents while the connection is open.

    Note: try to filter as many documents as possible before making the request to the server. In other words, use test_url to look for files ending in .html instead of using test_response to look for a content type of text/html if possible. Do note that aborting a request from test_response will break the current keep alive connection.

    Note: you must have at least libwww-perl-5.53_90 installed to use this feature.

  • use_head_requests

    This option is new as of swish-e 2.4.3 and can affect the speed of spidering and the load of the web server.

    To understand this you will likely need to read about the "CALLBACK FUNCTIONS" below -- specifically about the test_response callback function. This option is also only used when keep_alive is also enabled (although it could be debated that it's useful without keep alives).

    This option tells the spider to use http HEAD requests before each request.

    Normally, the spider simply does a GET request and after receiving the first chunk of data back from the web server calls the test_response callback function (if one is defined in your config file). The test_response callback function is a good place to test the content-type header returned from the server and reject types that you do not want to index.

    Now, *if* you are using the keep_alive feature then rejecting a document will often (always?) break the keep alive connection.

    So, what the use_head_requests option does is issue a HEAD request for every document, check for a Content-Length header (to check if the document is larger than max_size), and then call your test_response callback function. If your callback function returns true then a GET request is used to fetch the document.

    The idea is that by using HEAD requests instead of GET requests a false return from your test_response callback function (i.e. rejecting the document) will not break the keep alive connection.

    Now, don't get too excited about this. Before using this think about the ratio of rejected documents to accepted documents. If you reject no documents then using this feature will double the number of requests to the web server -- which will also double the number of connections to the web server. But, if you reject a large percentage of documents then this feature will help maximize the number of keep alive requests to the server (i.e. reduce the number of separate connections needed).

    There's also another problem with using HEAD requests. Some broken servers may not respond correctly to HEAD requests (some issue a 500 error), but respond fine to a normal GET request. This is something to watch out for.

    Finally, if you do not have a test_response callback AND max_size is set to zero then setting use_head_requests will have no effect.

    And, with all other factors involved you might find this option has no effect at all.

  • skip

    This optional key can be used to skip the current server. Its only purpose is to make it easy to disable a specific server hash in a configuration file.

  • debug

    Set this item to a comma-separated list of debugging options.

    Options are currently:

        errors, failed, headers, info, links, redirect, skipped, url

    Here are basically the levels:

        errors      =>   general program errors (not used at this time)
        url         =>   prints out every URL processed
        headers     =>   prints the response headers
        failed      =>   failed to return a 200
        skipped     =>   didn't index for some reason
        info        =>   a little more verbose
        links       =>   prints links as they are extracted
        redirect    =>   prints out redirected URLs

    Debugging can also be set by an environment variable SPIDER_DEBUG when running spider.pl. You can specify any of the above debugging options, separated by a comma.

    For example with Bourne type shell:

        SPIDER_DEBUG=url,links spider.pl [....]

    Before Swish-e 2.4.3 you had to use the internal debugging constants or'ed together like so:

        debug => DEBUG_URL | DEBUG_FAILED | DEBUG_SKIPPED,

    You can still do this, but the string version is easier. In fact, if you want to turn on debugging dynamically (for example in a test_url() callback function) then you currently *must* use the DEBUG_* constants. The string is converted to a number only once, at the start of spidering; after that the debug parameter is expected to be a number.
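
    The conversion from a comma-separated string to a number can be sketched as a simple bitmask. The bit values in %DEBUG_BIT and the helper debug_mask are made up for illustration; the real DEBUG_* constants are defined in spider.pl and may use different values.

    ```perl
    use strict;
    use warnings;

    # Illustrative bit values only; not the values used by spider.pl.
    my %DEBUG_BIT = (
        errors => 1,  failed   => 2,  headers => 4,  info => 8,
        links  => 16, redirect => 32, skipped => 64, url  => 128,
    );

    # Hypothetical helper: "url,links" -> numeric mask.
    sub debug_mask {
        my ($spec) = @_;
        my $mask = 0;
        for my $name ( split /\s*,\s*/, lc $spec ) {
            next unless length $name;
            die "unknown debug option '$name'\n" unless exists $DEBUG_BIT{$name};
            $mask |= $DEBUG_BIT{$name};    # or the option's bit into the mask
        }
        return $mask;
    }

    my $mask = debug_mask('url,links');
    print "links debugging is on\n"    if $mask & $DEBUG_BIT{links};
    print "headers debugging is off\n" unless $mask & $DEBUG_BIT{headers};
    ```

    Once the setting is a number, an option is turned on at run time by or'ing in another bit, which is why a callback has to work with constants rather than names.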

  • quiet

    If this is true then normal, non-error messages will be suppressed. Quiet mode can also be set by setting the environment variable SPIDER_QUIET to any true value.

        SPIDER_QUIET=1
  • max_depth

    The max_depth parameter can be used to limit how deeply to recurse a web site. The depth is just a count of levels of web pages descended, and not related to the number of path elements in a URL.

    A max_depth of zero says to only spider the page listed as the base_url. A max_depth of one will spider the base_url page, plus all links on that page, and no more. The default is to spider all pages.
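
    The depth rule can be sketched on a made-up link graph. The breadth-first walk below is an assumption chosen for clarity, and pages_within_depth is a hypothetical helper; the point is only that depth counts link hops from base_url, not path segments.

    ```perl
    use strict;
    use warnings;

    # Made-up link graph: page => pages it links to.
    my %links = (
        '/'            => [ '/a.html', '/b.html' ],
        '/a.html'      => ['/a/deep.html'],
        '/b.html'      => ['/'],
        '/a/deep.html' => [],
    );

    # Walk the graph, never following links from a page at $max_depth
    # (undef means no limit).
    sub pages_within_depth {
        my ( $start, $max_depth ) = @_;
        my %seen  = ( $start => 1 );
        my @queue = ( [ $start, 0 ] );
        my @visited;
        while ( my $item = shift @queue ) {
            my ( $page, $depth ) = @$item;
            push @visited, $page;
            next if defined $max_depth && $depth >= $max_depth;
            for my $next ( @{ $links{$page} || [] } ) {
                push @queue, [ $next, $depth + 1 ] unless $seen{$next}++;
            }
        }
        return @visited;
    }

    print "max_depth 0: @{[ pages_within_depth( '/', 0 ) ]}\n";
    print "max_depth 1: @{[ pages_within_depth( '/', 1 ) ]}\n";
    print "no limit:    @{[ pages_within_depth( '/', undef ) ]}\n";
    ```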

  • ignore_robots_file

    If this is set to true then the robots.txt file will not be checked when spidering this server. Don't use this option unless you know what you are doing.

  • use_cookies

    If this is set then a "cookie jar" will be maintained while spidering. Some (poorly written ;) sites require cookies to be enabled on clients.

    This requires the HTTP::Cookies module.

  • use_md5

    If this setting is true, then a MD5 digest "fingerprint" will be made from the content of every spidered document. This digest number will be used as a hash key to prevent indexing the same content more than once. This is helpful if different URLs generate the same content.

    An obvious example is that these two URLs will only be indexed one time:

        http://localhost/path/to/index.html
        http://localhost/path/to/

    This option requires the Digest::MD5 module. Spidering with this option might be a tiny bit slower.

  • validate_links

    Just a hack. If you set this true the spider will do HEAD requests on all links (e.g. off-site links), just to make sure that all your links work.

  • credentials

    You may specify a username and password to be used automatically when spidering:

        credentials => 'username:password',

    A username and password supplied in a URL will override this setting. This username and password will be used for every request.

    See also the get_password callback function below. get_password, if defined, will be called when a page requires authorization.

  • credential_timeout

    Sets the number of seconds to wait for user input when prompted for a username or password. The default is 30 seconds.

    Set this to zero to wait forever. Probably not a good idea.

    Set to undef to disable asking for a password.

        credential_timeout => undef,
  • remove_leading_dots

    Removes leading dots from URLs that might reference documents above the document root. The default is to not remove the dots.

CALLBACK FUNCTIONS

Callback functions can be defined in your parameter hash. These optional settings are callback subroutines that are called while processing URLs.

A little perl discussion is in order:

In perl, a scalar variable can contain a reference to a subroutine. The config example above shows that the configuration parameters are stored in a perl hash.

    my %serverA = (
        base_url    => 'http://mytest.site.invalid:4444/',
        same_hosts  => [ qw/www.mytest.site.invalid:4444/ ],
        email       => 'my@email.address',
        link_tags   => [qw/ a frame /],
    );

There are two ways to add a reference to a subroutine to this hash:

    sub foo { return 1; }

    my %serverA = (
        base_url    => 'http://mytest.site.invalid:4444/',
        same_hosts  => [ qw/www.mytest.site.invalid:4444/ ],
        email       => 'my@email.address',
        link_tags   => [qw/ a frame /],
        test_url    => \&foo,  # a reference to a named subroutine
    );

Or the subroutine can be coded right in place:

    my %serverA = (
        base_url    => 'http://mytest.site.invalid:4444/',
        same_hosts  => [ qw/www.mytest.site.invalid:4444/ ],
        email       => 'my@email.address',
        link_tags   => [qw/ a frame /],
        test_url    => sub { return 1; },
    );

The above example is not very useful as it just creates a user callback function that always returns a true value (the number 1). But, it's just an example.

The function calls are wrapped in an eval, so calling die (or doing something that dies) will just cause that URL to be skipped. If you really want to stop processing you need to set $server->{abort} in your subroutine (or send a kill -HUP to the spider).

The first two parameters passed are a URI object (to have access to the current URL), and a reference to the current server hash. The server hash is just a global hash for holding data, and useful for setting flags as described below.

Other parameters may also be passed in depending on the callback function, as described below. In perl parameters are passed in an array called "@_". The first element (first parameter) of that array is $_[0], and the second is $_[1], and so on. Depending on how complicated your function is you may wish to shift your parameters off of the @_ list to make working with them easier. See the examples below.

To make use of these routines you need to understand when they are called, and what changes you can make in your routines. Each routine deals with a given step, and returning false from your routine will stop processing for the current URL.
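
The eval wrapping and the abort flag can be sketched as follows. The dispatcher run_callback is hypothetical, and plain strings stand in for the URI objects the spider would pass; it is meant only to show how die, a false return, and $server->{abort} differ.

```perl
use strict;
use warnings;

# Hypothetical dispatcher: run one callback for one URL.
sub run_callback {
    my ( $server, $name, @args ) = @_;
    my $callback = $server->{$name} or return 1;    # nothing defined: carry on
    my $ok = eval { $callback->(@args) };
    if ($@) {
        warn "$name died, skipping: $@";            # a die only skips this URL
        return 0;
    }
    return $ok ? 1 : 0;
}

my %server = (
    test_url => sub {
        my ( $url, $server ) = @_;
        die "bad url\n" if $url =~ /broken/;
        $server->{abort}++ if $url =~ /stop/;       # ask for a full stop
        return $url !~ /\.gif$/;
    },
);

for my $url (qw( /index.html /logo.gif /broken.html /stop.html /never.html )) {
    last if $server{abort};
    my $keep = run_callback( \%server, 'test_url', $url, \%server );
    print "$url: ", ( $keep ? 'process' : 'skip' ), "\n";
}
```

In this sketch /never.html is never examined, because the abort flag set while handling /stop.html ends the loop.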

  • test_url

    test_url allows you to skip processing of urls based on the url before the request to the server is made. This function is called for the base_url links (links you define in the spider configuration file) and for every link extracted from a fetched web page.

    This function is a good place to skip links that you are not interested in following. For example, if you know there's no point in requesting images then you can exclude them like:

        test_url => sub {
            my $uri = shift;
            return 0 if $uri->path =~ /\.(gif|jpeg|png)$/;
            return 1;
        },

    Or to write it another way:

        test_url => sub { $_[0]->path !~ /\.(gif|jpeg|png)$/ },

    Another feature would be if you were using a web server where path names are NOT case sensitive (e.g. Windows). You can normalize all links in this situation using something like

        test_url => sub {
            my $uri = shift;
            return 0 if $uri->path =~ /\.(gif|jpeg|png)$/;
    
            $uri->path( lc $uri->path ); # make all path names lowercase
            return 1;
        },

    The important thing about test_url (compared to the other callback functions) is that it is called while extracting links, not while actually fetching that page from the web server. Returning false from test_url simply says to not add the URL to the list of links to spider.

    You may set a flag in the server hash (second parameter) to tell the spider to abort processing.

        test_url => sub {
            my $server = $_[1];
            $server->{abort}++ if $_[0]->path =~ /foo\.html/;
            return 1;
        },

    You cannot use the server flags:

        no_contents
        no_index
        no_spider

    This is discussed below.
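
    Since base_url does not limit spidering to a directory, a test_url callback is the place to do that. Here is a minimal sketch, assuming a site rooted at www.example.com and a hypothetical callback named below_docs; it uses the URI module, which is installed along with libwww-perl.

    ```perl
    use strict;
    use warnings;
    use URI;

    # Hypothetical callback: keep only URLs at or below /docs/.
    sub below_docs {
        my ($uri) = @_;
        return $uri->path =~ m{^/docs/} ? 1 : 0;
    }

    my %serverA = (
        base_url => 'http://www.example.com/docs/index.html',
        email    => 'admin@example.com',
        test_url => \&below_docs,
    );

    # Simulate the spider calling test_url for two extracted links.
    for my $link (qw(
        http://www.example.com/docs/install.html
        http://www.example.com/news/today.html
    )) {
        my $keep = $serverA{test_url}->( URI->new($link), \%serverA );
        print "$link: ", ( $keep ? 'spider' : 'skip' ), "\n";
    }
    ```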

  • test_response

    This function allows you to filter based on the response from the remote server (such as by content-type).

    Web servers use a Content-Type: header to define the type of data returned from the server. On a web server you could have a .jpeg file be a web page -- file extensions may not always indicate the type of the file.

    If you enable use_head_requests then this function is called after the spider makes a HEAD request. Otherwise, this function is called while the web page is being fetched from the remote server, typically after just enough data has been returned to read the response from the web server.

    The test_response callback function is called with the following parameters:

        ( $uri, $server, $response, $content_chunk )

    The $response variable is a HTTP::Response object and provides methods of examining the server's response. The $content_chunk is the first chunk of data returned from the server (if not a HEAD request).

    When not using use_head_requests the spider requests a document in "chunks" of 4096 bytes. 4096 is only a suggestion of how many bytes to return in each chunk. The test_response routine is called when the first chunk is received only. This allows ignoring (aborting) reading of a very large file, for example, without having to read the entire file. Although not much use, a reference to this chunk is passed as the fourth parameter.

    If you are spidering a site with many different types of content that you do not wish to index (and cannot use a test_url callback to determine what docs to skip) then you will see better performance using both the use_head_requests and keep_alive features. (Aborting a GET request kills the keep-alive session.)

    For example, to only index true HTML (text/html) pages:

        test_response => sub {
            my $content_type = $_[2]->content_type;
            return $content_type =~ m!text/html!;
        },

    You can also set flags in the server hash (the second parameter) to control indexing:

        no_contents -- index only the title (or file name), and not the contents
        no_index    -- do not index this file, but continue to spider if HTML
        no_spider   -- index, but do not spider this file for links to follow
        abort       -- stop spidering any more files

    For example, to avoid indexing the contents of "private.html", yet still follow any links in that file:

        test_response => sub {
            my $server = $_[1];
            $server->{no_index}++ if $_[0]->path =~ /private\.html$/;
            return 1;
        },

    Note: Do not modify the URI object in this call back function.

  • filter_content

    This callback function is called right before sending the content to swish. Like the other callback functions, returning false will cause the URL to be skipped. Setting the abort server flag and returning false will abort spidering.

    You can also set the no_contents flag.

    This callback function is passed four parameters: the URI object, the server hash, the HTTP::Response object, and a reference to the content.

    You can modify the content as needed. For example you might not like upper case:

        filter_content => sub {
            my $content_ref = $_[3];
    
            $$content_ref = lc $$content_ref;
            return 1;
        },

    A more reasonable example would be converting PDF or MS Word documents for parsing by swish. Examples of this are provided in the prog-bin directory of the swish-e distribution.

    You may also modify the URI object to change the path name passed to swish for indexing.

        filter_content => sub {
            my $uri = $_[0];
            $uri->host('www.other.host') ;
            return 1;
        },

    Swish-e's ReplaceRules feature can also be used for modifying the path name indexed.

    Note: Swish-e now includes a method of filtering based on the SWISH::Filter Perl modules. See the SwishSpiderConfig.pl file for an example how to use SWISH::Filter in a filter_content callback function.

    If you use the "default" configuration (i.e. pass "default" as the first parameter to the spider) then SWISH::Filter is used automatically. This only adds code for calling the programs to filter your content -- you still need to install applications that do the hard work (like xpdf for pdf conversion and catdoc for MS Word conversion).

    The function included in spider.pl for calling SWISH::Filter when using the "default" config can also be used in your config file. There's a function called swish_filter() that returns a list of two subroutines. So in your config you could do:

        my ($filter_sub, $response_sub ) = swish_filter();
    
        @servers = ( {
            test_response   => $response_sub,
            filter_content  => $filter_sub,
            [...],
        } );

    The $response_sub is not required, but is useful if using HEAD requests (use_head_requests): it tests the content type from the server to see if there are any filters that can handle the document. The $filter_sub does all the work of filtering a document.

    Make sense? If not, then that's what the Swish-e list is for.

  • spider_done

    This callback is called after processing a server (after each server listed in the @servers array if more than one).

    This allows your config file to do any cleanup work after processing. For example, if you were keeping counts during, say, a test_response() callback function you could use the spider_done() callback to print the results.

  • output_function

    If defined, this callback function is called instead of printing the content and header to STDOUT. This can be used if you want to store the output of the spider before indexing.

    The output_function is called with the following parameters:

       ($server, $content, $uri, $response, $bytecount, $path);

    Here is an example that simply shows two of the params passed:

        output_function => sub {
            my ($server, $content, $uri, $response, $bytecount, $path) = @_;
            print STDERR  "passed: uri $uri, bytecount $bytecount...\n";
            # no output to STDOUT for swish-e
        }

    You can do almost the same thing with a filter_content callback.

  • get_password

    This callback is called when a HTTP password is needed (i.e. after the server returns a 401 error). The function can test the URI and Realm and then return a username and password separated by a colon:

        get_password => sub {
            my ( $uri, $server, $response, $realm ) = @_;
            if ( $uri->path =~ m!^/path/to/protected! && $realm eq 'private' ) {
                return 'joe:secret931password';
            }
            return;  # sorry, I don't know the password.
        },

    Use the credentials setting if you know the username and password and they will be the same for every request. That is, for a site-wide password.

Note that you can create your own counters to display in the summary list when spidering is finished by adding a value to the hash pointed to by $server->{counts}.

    test_response => sub {
        my $server = $_[1];
        if ( $_[0]->path =~ /private\.html$/ ) {
            $server->{no_index}++;                    # flags cannot be set in test_url
            $server->{counts}{'Private Files'}++;     # count only the matching files
        }
        return 1;
    },

Each callback function must return true to continue processing the URL. Returning false will cause processing of the current URL to be skipped.

More on setting flags

Swish (not this spider) has a configuration directive NoContents that will instruct swish to index only the title (or file name), and not the contents. This is often used when indexing binary files such as image files, but can also be used with html files to index only the document titles.

As shown above, you can turn this feature on for specific documents by setting a flag in the server hash passed into the test_response or filter_content subroutines. For example, in your configuration file you might have the test_response callback set as:

    test_response => sub {
        my ( $uri, $server, $response ) = @_;
        # tell swish not to index the contents if this is of type image
        $server->{no_contents} = $response->content_type =~ m[^image/];
        return 1;  # ok to index and spider this document
    }

The entire contents of the resource is still read from the web server, and passed on to swish, but swish will also be passed a No-Contents header which tells swish to enable the NoContents feature for this document only.

Note: Swish will index the path name only when NoContents is set, unless the document's type (as set by the swish configuration settings IndexContents or DefaultContents) is HTML and a title is found in the html document.

Note: In most cases you probably would not want to send a large binary file to swish, just to be ignored. Therefore, it would be smart to use a filter_content callback routine to replace the contents with a single character (you cannot use the empty string at this time).
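
A minimal sketch of that idea follows. It assumes the filter_content callback receives the URI, the server hash, the response object, and a reference to the content, in that order; check the callback's description before relying on this. Mock::Response is a hypothetical stand-in for the real response object.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the response object; only content_type() is used.
{
    package Mock::Response;
    sub new          { my ( $class, $type ) = @_; return bless( { type => $type }, $class ) }
    sub content_type { return $_[0]{type} }
}

my $filter_content = sub {
    my ( $uri, $server, $response, $content_ref ) = @_;   # assumed parameter order
    if ( $response->content_type =~ m[^image/] ) {
        $server->{no_contents} = 1;    # index the path name only
        $$content_ref = ' ';           # one character, since the empty string is not allowed
    }
    return 1;
};

# Exercise the callback with a fake 5 KB image.
my $server  = {};
my $content = "\x89PNG" . ( "\0" x 5000 );
$filter_content->( undef, $server, Mock::Response->new('image/png'), \$content );
printf "bytes passed on: %d, no_contents: %d\n", length $content, $server->{no_contents};
```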

A similar flag may be set to prevent indexing a document at all, but still allow spidering. In general, if you want to completely skip spidering a file you return false from one of the callback routines (test_url, test_response, or filter_content). Returning false from any of those three callbacks will stop processing of that file, and the file will not be spidered.

But there may be some cases where you still want to spider (extract links) yet not index the file. An example might be where you wish to index only PDF files, but you still need to spider all HTML files to find the links to the PDF files.

    $server{test_response} = sub {
        my ( $uri, $server, $response ) = @_;
        $server->{no_index} = $response->content_type ne 'application/pdf';
        return 1;  # ok to spider, but don't index
    };

So, the difference between no_contents and no_index is that no_contents will still index the file name, just not the contents. no_index will still spider the file (if it's text/html) but the file will not be processed by swish at all.

Note: If no_index is set in a test_response callback function then the document will not be filtered. That is, your filter_content callback function will not be called.

The no_spider flag can be set to avoid spidering an HTML file. The file will still be indexed unless no_index is also set. But if you do not want to index and spider, then simply return false from one of the three callback functions.
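
The flags described in this section can be combined in a single test_response callback. The sketch below uses a hypothetical policy table keyed on content type and a hypothetical Mock::Response object; it sets every flag explicitly on each call rather than assuming the spider resets them.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the response object; only content_type() is used.
{
    package Mock::Response;
    sub new          { my ( $class, $type ) = @_; return bless( { type => $type }, $class ) }
    sub content_type { return $_[0]{type} }
}

# Hypothetical policy: which flags to raise for which content type.
my %flags_for = (
    'text/html'       => { no_index    => 1 },    # follow links, do not index
    'application/pdf' => {},                      # index normally
    'image/png'       => { no_contents => 1 },    # index the path name only
);

my $test_response = sub {
    my ( $uri, $server, $response ) = @_;
    my $flags = $flags_for{ $response->content_type }
        or return 0;    # unknown type: skip the URL entirely
    $server->{$_} = $flags->{$_} ? 1 : 0 for qw( no_index no_contents no_spider );
    return 1;
};

# Show the outcome for a few content types.
my %seen;
for my $type ( 'text/html', 'application/pdf', 'image/png', 'video/mp4' ) {
    my $server = {};
    my $ok     = $test_response->( undef, $server, Mock::Response->new($type) );
    $seen{$type} = $ok ? join( ',', grep { $server->{$_} } sort keys %$server ) : 'skipped';
    printf "%-16s %s\n", $type, $seen{$type} || 'index and spider';
}
```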

SIGNALS

Sending a SIGHUP to the running spider will cause it to stop spidering. This is a good way to abort spidering, but let swish index the documents retrieved so far.
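
The general technique behind this kind of clean stop can be sketched as follows. This is not the spider's own code: the queue, the flag name, and the self-sent signal are assumptions for illustration, and the script expects a Unix-like system where SIGHUP exists.

```perl
use strict;
use warnings;

# Set a flag in the handler and let the main loop finish its current item.
my $abort = 0;
$SIG{HUP} = sub { $abort = 1 };

my @queue = map { "/page$_.html" } 1 .. 10;
my @fetched;
while ( !$abort && @queue ) {
    push @fetched, shift @queue;
    # Simulate "kill -HUP <pid>" arriving from another shell.
    kill 'HUP', $$ if @fetched == 3;
}
printf "stopped after %d of 10 URLs\n", scalar @fetched;
```

Because the loop only checks the flag between items, whatever has been fetched so far stays intact and can still be handed on.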

CHANGES

List of some of the changes

Thu Sep 30 2004 - changes for Swish-e 2.4.3

Code reorganization and a few new features. Updated docs a little tiny bit. Introduced a few spelling mistakes.

  • Config option: use_default_config

    It used to be that you could run the spider like:

        spider.pl default <some url>

    and the spider would use its own internal config. But if you used your own config file then the defaults were not used. This option allows you to merge your config with the default config, which makes small changes to the defaults easy.

  • Config option: use_head_requests

    Tells the spider to make a HEAD request before GET'ing the document from the web server. Useful if you use keep_alive and have a test_response() callback that rejects many documents (which breaks the connection).

  • Config option: spider_done

    Callback to tell you (or tell your config as it may be) that the spider is done. Useful if you need to do some extra processing when done spidering -- like record counts to a file.

  • Config option: get_password

    This callback is called when a document returns a 401 error needing a username and password. Useful if spidering a site protected with multiple passwords.

  • Config option: output_function

    If defined, spider.pl calls this instead of sending output to STDOUT.

  • Config option: debug

    Now you can use the words instead of or'ing the DEBUG_* constants together.

TODO

Add a "get_document" callback that is called right before making the "GET" request. This would make it easier to use cached documents. You can do that now in a test_url callback or in a test_response when using HEAD request.

Save state of the spider on SIGHUP so spidering could be restored at a later date.

COPYRIGHT

Copyright 2001 Bill Moseley

This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

SUPPORT

Send all questions to the SWISH-E discussion list.

See http://swish-e.org/


The Swish-e FAQ - Answers to Common Questions

Swish-e version 2.4.7


OVERVIEW

List of commonly asked and answered questions. Please review this document before asking questions on the Swish-e discussion list.

General Questions

What is Swish-e?

Swish-e is Simple Web Indexing System for Humans - Enhanced. With it, you can quickly and easily index directories of files or remote web sites and search the generated indexes for words and phrases.

So, is Swish-e a search engine?

Well, yes. Probably the most common use of Swish-e is to provide a search engine for web sites. The Swish-e distribution includes CGI scripts that can be used with it to add a search engine for your web site. The CGI scripts can be found in the example directory of the distribution package. See the README file for information about the scripts.

But Swish-e can also be used to index all sorts of data, such as email messages, data stored in a relational database management system, XML documents, or documents such as Word and PDF documents -- or any combination of those sources at the same time. Searches can be limited to fields or MetaNames within a document, or limited to areas within an HTML document (e.g. body, title). Programs other than CGI applications can use Swish-e, as well.

Should I upgrade if I'm already running a previous version of Swish-e?

A large number of bug fixes, feature additions, and logic corrections were made in version 2.2. In addition, indexing speed has been drastically improved (reports of indexing times changing from four hours to 5 minutes), and major parts of the indexing and search parsers have been rewritten. There are better debugging options, enhanced output formats, more document meta data (e.g. last modified date, document summary), options for indexing from external data sources, and faster spidering, just to name a few changes. (See the CHANGES file for more information.)

Since so much effort has gone into version 2.2, support for previous versions will probably be limited.

Are there binary distributions available for Swish-e on platform foo?

Foo? Well, yes there are some binary distributions available. Please see the Swish-e web site for a list at http://swish-e.org/.

In general, it is recommended that you build Swish-e from source, if possible.

Do I need to reindex my site each time I upgrade to a new Swish-e version?

At times it might not strictly be necessary, but since you don't really know if anything in the index has changed, it is a good rule to reindex.

What's the advantage of using the libxml2 library for parsing HTML?

Swish-e may be linked with libxml2, a library for working with HTML and XML documents. Swish-e can use libxml2 for parsing HTML and XML documents.

The libxml2 parser is a better parser than Swish-e's built-in HTML parser. It offers more features, and it does a much better job at extracting out the text from a web page. In addition, you can use the ParserWarningLevel configuration setting to find structural errors in your documents that could (and would with Swish-e's HTML parser) cause documents to be indexed incorrectly.

Libxml2 is not required, but is strongly recommended for parsing HTML documents. It's also recommended for parsing XML, as it offers many more features than the internal Expat xml.c parser.

The internal HTML parser will have limited support, and does have a number of bugs. For example, HTML entities may not always be correctly converted and properties do not have entities converted. The internal parser tends to get confused when invalid HTML is parsed where the libxml2 parser doesn't get confused as often. The structure is better detected with the libxml2 parser.

If you are using the Perl module (the C interface to the Swish-e library) you may wish to build two versions of Swish-e, one with the libxml2 library linked in the binary, and one without, and build the Perl module against the library without the libxml2 code. This is to save space in the library. Hopefully, the library will someday soon be split into indexing and searching code (volunteers welcome).

Does Swish-e include a CGI interface?

Yes. Kind of.

There are two example CGI scripts included, swish.cgi and search.cgi. Both are installed at $prefix/lib/swish-e.

Both require a bit of work to set up and use. Swish.cgi is probably what most people will want to use as it contains more features. Search.cgi is for those that want to start with a small script and customize it to fit their needs.

An example of using swish.cgi is given in the INSTALL man page, and in the swish.cgi documentation. As is often the case, it will be easier to use if you first read the documentation.

Please use caution about CGI scripts found on the Internet for use with Swish-e. Some are not secure.

The included example CGI scripts were designed with security in mind. Regardless, you are encouraged to have your local Perl expert review it (and all other CGI scripts you use) before placing it into production. This is just a good policy to follow.

How secure is Swish-e?

We know of no security issues with using Swish-e. Careful attention has been paid to common security problems such as buffer overruns when programming Swish-e.

The most likely security issue with Swish-e is when it is run via a poorly written CGI interface. This is not limited to CGI scripts written in Perl, as it's just as easy to write an insecure CGI script in C, Java, PHP, or Python. A good source of information is included with the Perl distribution. Type perldoc perlsec at your local prompt for more information. Another must-read document is located at http://www.w3.org/Security/faq/wwwsf4.html.

Note that there are many free yet insecure and poorly written CGI scripts available -- even some designed for use with Swish-e. Please carefully review any CGI script you use. Free is not such a good price when you get your server hacked...

Should I run Swish-e as the superuser (root)?

No. Never.

What files does Swish-e write?

Swish writes the index file, of course. This is specified with the IndexFile configuration directive or by the -f command line switch.

The index file is actually a collection of files, but all start with the file name specified with the IndexFile directive or the -f command line switch.

For example, the file ending in .prop contains the document properties.

When creating the index files Swish-e appends the extension .temp to the index file names. When indexing is complete Swish-e renames the .temp files to the index files specified by IndexFile or -f. This is done so that existing indexes remain untouched until it completes indexing.

Swish-e also writes temporary files in some cases during indexing (e.g. -S http, -S prog with filters), when merging, and when using -e. Temporary files are created with the mkstemp(3) function (with 0600 permission on unix-like operating systems).

The temporary files are created in the directory specified by the environment variables TMPDIR and TMP, in that order. If those are not set then swish uses the TmpDir configuration setting. Otherwise, the temporary file will be located in the current directory.
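
The lookup order can be written out as a short sketch. The function name pick_tmp_dir is hypothetical, and treating an empty value as "not set" is an assumption of this example rather than documented behavior.

```perl
use strict;
use warnings;

# $env is a hash ref such as \%ENV; $config_tmpdir is the TmpDir value from
# the configuration file, or undef when the directive is absent.
sub pick_tmp_dir {
    my ( $env, $config_tmpdir ) = @_;
    for my $name (qw( TMPDIR TMP )) {
        return $env->{$name} if defined $env->{$name} && length $env->{$name};
    }
    return $config_tmpdir if defined $config_tmpdir && length $config_tmpdir;
    return '.';    # fall back to the current directory
}

print pick_tmp_dir( { TMPDIR => '/tmp', TMP => '/var/tmp' }, '/scratch' ), "\n";
print pick_tmp_dir( { TMP => '/var/tmp' }, '/scratch' ), "\n";
print pick_tmp_dir( {}, '/scratch' ), "\n";
print pick_tmp_dir( {}, undef ), "\n";
```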

Can I index PDF and MS-Word documents?

Yes, you can use a Filter to convert documents while indexing, or you can use a program that "feeds" documents to Swish-e that have already been converted. See Indexing below.

Can I index documents on a web server?

Yes, Swish-e provides two ways to index (spider) documents on a web server. See Spidering below.

Swish-e can retrieve documents from a file system or from a remote web server. It can also execute a program that returns documents back to it. This program can retrieve documents from a database, filter compressed document files, convert PDF files, extract data from mail archives, or spider remote web sites.

Can I implement keywords in my documents?

Yes, Swish-e can associate words with MetaNames while indexing, and you can limit your searches to these MetaNames while searching.

In your HTML files you can put keywords in HTML META tags or in XML blocks.

META tags can have two formats in your source documents:

    <META NAME="DC.subject" CONTENT="digital libraries">

And in XML format (can also be used in HTML documents when using libxml2):

    <meta2>
        Some Content
    </meta2>

Then, to inform Swish-e about the existence of the meta name in your documents, edit the line in your configuration file:

    MetaNames DC.subject meta1 meta2

When searching you can now limit some or all search terms to that MetaName. For example, a query such as apple and DC.subject=(fruit or cooking) looks for documents that contain the word apple and also have either fruit or cooking in the DC.subject meta tag.

What are document properties?

A document property is typically data that describes the document. For example, properties might include a document's path name, its last modified date, its title, or its size. Swish-e stores a document's properties in the index file, and they can be reported back in search results.

Swish-e also uses properties for sorting. You may sort your results by one or more properties, in ascending or descending order.

Properties can also be defined within your documents. HTML and XML files can specify tags (see previous question) as properties. The contents of these tags can then be returned with search results. These user-defined properties can also be used for sorting search results.

For example, if you had the following in your documents

   <meta name="creator" content="accounting department">

and creator is defined as a property (see PropertyNames in SWISH-CONFIG) Swish-e can return accounting department with the result for that document.

    swish-e -w foo -p creator

Or for sorting:

    swish-e -w foo -s creator

What's the difference between MetaNames and PropertyNames?

MetaNames allow keyword searches in your documents. That is, you can use MetaNames to restrict searches to just parts of your documents.

PropertyNames, on the other hand, define text that can be returned with results, and can be used for sorting.

Both use meta tags found in your documents (as shown in the above two questions) to define the text you wish to use as a property or meta name.

You may define a tag as both a property and a meta name. For example:

   <meta name="creator" content="accounting department">

placed in your documents and then using configuration settings of:

    PropertyNames creator
    MetaNames creator

will allow you to limit your searches to documents created by accounting:

    swish-e -w 'foo and creator=(accounting)'

That will find all documents with the word foo that also have a creator meta tag that contains the word accounting. This is using MetaNames.

And you can also say:

    swish-e -w foo -p creator

which will return all documents with the word foo, but the results will also include the contents of the creator meta tag along with results. This is using properties.

You can use properties and meta names at the same time, too:

    swish-e -w 'creator=(accounting or marketing)' -p creator -s creator

That searches only in the creator meta name for either of the words accounting or marketing, prints out the contents of the creator property, and sorts the results by the creator property name.

(See also the -x output format switch in SWISH-RUN.)

Can Swish-e index multi-byte characters?

No. This will require much work to change. But Swish-e works with eight-bit characters, so many character sets can be used. Note that it does call the ANSI-C tolower() function, which depends on the current locale setting. See locale(7) for more information.

Indexing

How do I pass Swish-e a list of files to index?

Currently, there is not a configuration directive to include a file that contains a list of files to index. But, there is a directive to include another configuration file.

    IncludeConfigFile /path/to/other/config

And in /path/to/other/config you can say:

    IndexDir file1 file2 file3 file4 file5 ...
    IndexDir file20 file21 file22

You may also specify more than one configuration file on the command line:

    ./swish-e -c config_one config_two config_three

Another option is to create a directory with symbolic links of the files to index, and index just that directory.
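
If the list of files comes from somewhere else, a small script can generate the included configuration file. The sketch below is one way to do it: the function name and file names are hypothetical, and wrapping paths that contain white space in double quotes is an assumption about how the configuration parser treats quoted arguments.

```perl
use strict;
use warnings;

# Build IndexDir lines, a few paths per line, from a list of file names.
sub index_dir_lines {
    my ( $per_line, @files ) = @_;
    my @lines;
    while ( my @chunk = splice @files, 0, $per_line ) {
        # Quote any path containing white space (assumed to be honored).
        push @lines, join ' ', 'IndexDir', map { /\s/ ? qq{"$_"} : $_ } @chunk;
    }
    return @lines;
}

my @files = ( 'docs/a.html', 'docs/b.html', 'docs/c.html', 'notes/my file.html' );
print "$_\n" for index_dir_lines( 3, @files );
```

Redirect the output to a file and name that file in an IncludeConfigFile directive.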

How does Swish-e know which parser to use?

Swish can parse HTML, XML, and text documents. The parser is set by associating a file extension with a parser by the IndexContents directive. You may set the default parser with the DefaultContents directive. If a document is not assigned a parser it will default to the HTML parser (HTML2 if built with libxml2).
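
As an illustration, a configuration along these lines assigns parsers by extension. The extensions chosen are only examples, and the HTML2 and XML2 parser names assume Swish-e was built with libxml2:

```
# Assumes a build with libxml2 (HTML2 and XML2 parsers available)
IndexContents HTML2 .htm .html
IndexContents XML2  .xml
IndexContents TXT   .txt
DefaultContents HTML2
```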

You may use Filters or an external program to convert documents to HTML, XML, or text.

Can I reindex and search at the same time?

Yes. Starting with version 2.2 Swish-e indexes to temporary files, and then renames the files when indexing is complete. On most systems renames are atomic. But, since Swish-e also generates more than one file during indexing there will be a very short period of time between renaming the various files when the index is out of sync.

Settings in src/config.h control some options related to temporary files, and their use during indexing.
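
The write-then-rename technique described above can be sketched in a few lines. This illustrates the general idea only, not Swish-e's own implementation; the function name and file contents are hypothetical.

```perl
use strict;
use warnings;
use File::Temp qw( tempdir );
use File::Spec;

# Write the new data under a temporary name, then rename it into place.
sub replace_atomically {
    my ( $final, $data ) = @_;
    my $temp = "$final.temp";
    open my $out, '>', $temp or die "Cannot write $temp: $!";
    print {$out} $data;
    close $out or die "Cannot close $temp: $!";
    rename $temp, $final or die "Cannot rename $temp to $final: $!";
    return $final;
}

my $dir   = tempdir( CLEANUP => 1 );
my $index = File::Spec->catfile( $dir, 'index.swish-e' );

replace_atomically( $index, "old index\n" );
replace_atomically( $index, "new index\n" );    # a reader sees the old or the new file

open my $in, '<', $index or die "Cannot read $index: $!";
my $seen = do { local $/; <$in> };
close $in;
print $seen;
```

With more than one file per index, each rename is atomic on its own, which is why a short window remains in which the files do not match each other.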

Can I index phrases?

Phrases are indexed automatically. To search for a phrase simply place double quotes around the phrase.

For example:

    swish-e -w 'free and "fast search engine"'

How can I prevent phrases from matching across sentences?

Use the BumpPositionCounterCharacters configuration directive.

Swish-e isn't indexing a certain word or phrase.

There are a number of configuration parameters that control what Swish-e considers a "word" and it has a debugging feature to help pinpoint any indexing problems.

Configuration file directives (SWISH-CONFIG) WordCharacters, BeginCharacters, EndCharacters, IgnoreFirstChar, and IgnoreLastChar are the main settings that Swish-e uses to define a "word". See SWISH-CONFIG and SWISH-RUN for details.

Swish-e also uses compile-time defaults for many settings. These are located in src/config.h file.

The command line arguments -k, -v and -T are useful when debugging these problems. Using -T INDEXED_WORDS while indexing will display each word as it is indexed. You should specify one file when using this feature since it can generate a lot of output.

     ./swish-e -c my.conf -i problem.file -T INDEXED_WORDS

You may also wish to index a single file that contains words that are or are not indexing as you expect and use -T to output debugging information about the index. A useful command might be:

    ./swish-e -f index.swish-e -T INDEX_FULL

Once you see how Swish-e is parsing and indexing your words, you can adjust the configuration settings mentioned above to control what words are indexed.

Another useful command might be:

     ./swish-e -c my.conf -i problem.file -T PARSED_WORDS INDEXED_WORDS

This will show white-spaced words parsed from the document (PARSED_WORDS), and how those words are split up into separate words for indexing (INDEXED_WORDS).

How do I keep Swish-e from indexing numbers?

Swish-e indexes words as defined by the WordCharacters setting, as described above. So to avoid indexing numbers you simply remove digits from the WordCharacters setting.

There are also some settings in src/config.h that control what "words" are indexed. You can configure swish to never index words that are all digits, vowels, or consonants, or that contain more than some consecutive number of digits, vowels, or consonants. In general, you won't need to change these settings.

Also, there's an experimental feature called IgnoreNumberChars which allows you to define a set of characters that describe a number. If a word is made up of only those characters it will not be indexed.

Swish-e crashes and burns on a certain file. What can I do?

This shouldn't happen. If it does please post to the Swish-e discussion list the details so it can be reproduced by the developers.

In the meantime, you can use a FileRules directive to exclude the particular file name, or pathname, or its title. If there are serious problems in indexing certain types of files, they may not have valid text in them (they may be binary files, for instance). You can use NoContents to exclude that type of file.

Swish-e will issue a warning if an embedded null character is found in a document. This warning will be an indication that you are trying to index binary data. If you need to index binary files try to find a program that will extract out the text (e.g. strings(1), catdoc(1), pdftotext(1)).
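
To find such files before indexing, a quick check for null bytes is often enough. The sketch below is an assumption-laden helper, not part of Swish-e: the name looks_binary and the 8192-byte inspection limit are choices made for this example.

```perl
use strict;
use warnings;

# Return 1 if the first $limit bytes of the data contain a NUL byte.
sub looks_binary {
    my ( $data, $limit ) = @_;
    $limit = 8192 unless defined $limit;
    return index( substr( $data, 0, $limit ), "\0" ) >= 0 ? 1 : 0;
}

print looks_binary("<html><body>plain text</body></html>") ? "binary\n" : "text\n";
print looks_binary("%PDF-1.4\n\0\0\1\2")                    ? "binary\n" : "text\n";
```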

How do I prevent indexing of some documents?

When using the file system to index your files you can use the FileRules directive. Other than FileRules title, FileRules only works with the file system (-S fs) indexing method, not with -S prog or -S http.

If you are spidering a site you have control over, use a robots.txt file in your document root. This is a standard way to exclude files from search engines, and is fully supported by Swish-e. See http://www.robotstxt.org/

If spidering a website with the included spider.pl program then add any necessary tests to the spider's configuration file. Type <perldoc spider.pl> in the prog-bin directory for details or see the spider documentation on the Swish-e website. Look for the section on callback functions.

If using the libxml2 library for parsing HTML (which you probably are), you may also use the Meta Robots Exclusion in your documents:

    <meta name="robots" content="noindex">

See the obeyRobotsNoIndex directive.

How do I prevent indexing parts of a document?

If you want to prevent Swish-e from indexing a common header, footer, or navigation bar, and you are using libxml2 for parsing HTML, you may put a fake HTML tag around the text you wish to ignore and name that tag in the IgnoreMetaTags directive. This will generate an error message if ParserWarningLevel is set, since the fake tag is invalid HTML.

IgnoreMetaTags works with XML documents (and HTML documents when using libxml2 as the parser), but not with documents parsed by the text (TXT) parser.

If you are using the libxml2 parser (HTML2 and XML2) then you can use the following comments in your documents to prevent indexing:

       <!-- SwishCommand noindex -->
       <!-- SwishCommand index -->

and/or these may be used also:

       <!-- noindex -->
       <!-- index -->

How do I modify the path or URL of the indexed documents?

Use the ReplaceRules configuration directive to rewrite path names and URLs. If you are using -S prog input method you may set the path to any string.

How can I index data from a database?

Use the "prog" document source method of indexing. Write a program to extract out the data from your database, and format it as XML, HTML, or text. See the examples in the prog-bin directory, and the next question.
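
A minimal sketch of such a program is shown below. The rows are hard-coded stand-ins for a database query, and the record fields and path scheme are hypothetical; the headers are the same ones used by the prog example later in this FAQ.

```perl
use strict;
use warnings;

# Format one document for -S prog input: headers, a blank line, then content.
sub format_doc {
    my ( $path, $mtime, $content ) = @_;
    my $size = length $content;    # assumes $content holds bytes, not wide characters
    return "Path-Name: $path\n"
         . "Content-Length: $size\n"
         . "Last-Mtime: $mtime\n"
         . "\n"
         . $content;
}

# Hypothetical rows, standing in for the result of a database query.
my @rows = (
    { id => 1, title => 'First record',  body => 'Apples and pears.', updated => 1096502400 },
    { id => 2, title => 'Second record', body => 'Fast search.',      updated => 1096588800 },
);

for my $row (@rows) {
    my $html = "<html><head><title>$row->{title}</title></head>"
             . "<body>$row->{body}</body></html>\n";
    print format_doc( "/records/$row->{id}", $row->{updated}, $html );
}
```

Real data should be HTML-escaped before being wrapped in tags, and the program would be run with swish-e -S prog -i followed by the script's path.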

How do I index my PDF, Word, and compressed documents?

Swish-e can internally only parse HTML, XML and TXT (text) files by default, but can make use of filters that will convert other types of files such as MS Word documents, PDF, or gzipped files into one of the file types that Swish-e understands.

Please see SWISH-CONFIG and the examples in the filters and filter-bin directory for more information.

See the next question to learn about the filtering options with Swish-e.

How do I filter documents?

The term "filter" in Swish-e means the conversion of a document of one type (one that swish-e cannot index directly) into a type that Swish-e can index, namely HTML, plain text, or XML. To add to the confusion, there are a number of ways to accomplish this in Swish-e. So here's a bit of background.

The FileFilter directive was added to swish first. This feature allows you to specify a program to run for documents that match a given file extension. For example, to filter PDF files (files that end in .pdf) you can specify the configuration setting of:

    FileFilter .pdf pdftotext   "'%p' -"

which says to run the program "pdftotext" passing it the pathname of the file (%p) and a dash (which tells pdftotext to output to stdout). Then for each .pdf file Swish-e runs this program and reads in the filtered document from the output from the filter program.

This has the advantage that it is easy to set up -- a single line in the config file is all that is needed to add the filter into Swish-e. But it also has a number of problems. For example, if you use a Perl script to do your filtering it can be very slow since the filter script must be run (and thus compiled) for each processed document. This is exacerbated when using the -S http method since the -S http method also uses a Perl script that is run for every URL fetched. Also, when using -S prog method of input (reading input from a program) using FileFilter means that Swish-e must first read the file in from the external program and then write the file out to a temporary file before running the filter.

With -S prog it makes much more sense to filter the document in the program that is fetching the documents than to have swish-e read the file into memory, write it to a temporary file and then run an external program.

The Swish-e distribution contains a couple of example -S prog programs. spider.pl is a reasonably full-featured web spider that offers many more options than the -S http method. And it is much faster than running -S http, too.

The spider has a perl configuration file, which means you can add programming logic right into the configuration file without editing the spider program. One bit of logic that is provided in the spider's configuration file is a "call-back" function that allows you to filter the content. In other words, before the spider passes a fetched web document to swish for indexing the spider can call a simple subroutine in the spider's configuration file passing the document and its content type. The subroutine can then look at the content type and decide if the document needs to be filtered.

For example, when processing a document of type "application/msword" the call-back subroutine might call the doc2txt.pm perl module, and a document of type "application/pdf" could use the pdf2html.pm module. The prog-bin/SwishSpiderConfig.pl file shows this usage.

This system works reasonably well, but also means that more work is required to set up the filters. First, you must explicitly check for specific content types and then call the appropriate Perl module, and second, you have to know how each module must be called and how each returns the possibly modified content.
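
The per-type checking described above often ends up as a dispatch table. The sketch below shows the pattern only: the converters are hypothetical placeholders and do not reflect how doc2txt.pm or pdf2html.pm are actually called.

```perl
use strict;
use warnings;

# Hypothetical converters; real ones would call modules or external programs.
my %converter_for = (
    'application/msword' => sub { my ($doc) = @_; return ( 'text/plain', "word text: $doc" ) },
    'application/pdf'    => sub { my ($doc) = @_; return ( 'text/html',  "<html><body>$doc</body></html>" ) },
);

# Return the (possibly new) content type and document.
sub filter_document {
    my ( $content_type, $doc ) = @_;
    my $convert = $converter_for{$content_type}
        or return ( $content_type, $doc );    # nothing registered: pass through
    return $convert->($doc);
}

for my $type ( 'application/pdf', 'text/html' ) {
    my ( $new_type, $new_doc ) = filter_document( $type, 'sample' );
    print "$type -> $new_type\n";
}
```

Every new document type means another entry in the table and another calling convention to learn.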

In comes SWISH::Filter.

To make things easier the SWISH::Filter Perl module was created. The idea of this module is that there is one interface used to filter all types of documents. So instead of checking for specific types of content you just pass the content type and the document to the SWISH::Filter module and it returns a new content type and document if it was filtered. The filters that do the actual work are designed with a standard interface and work like filter "plug-ins". Adding new filters means just downloading the filter to a directory and no changes are needed to the spider's configuration file. Download a filter for Postscript and next time you run indexing your Postscript files will be indexed.

Since the filters are standardized, hopefully when you have the need to filter documents of a specific type there will already be a filter ready for your use.

Now, note that the perl modules may or may not do the actual conversion of a document. For example, the PDF conversion module calls the pdfinfo and pdftotext programs. Those programs (part of the Xpdf package) must be installed separately from the filters.

The SwishSpiderConfig.pl example spider configuration file shows how to use the SWISH::Filter module for filtering. This file is installed at $prefix/share/doc/swish-e/examples/prog-bin, where $prefix is normally /usr/local on unix-type machines.

The SWISH::Filter method of filtering can also be used with the -S http method of indexing. By default the swishspider program (the Perl helper script that fetches documents from the web) will attempt to use the SWISH::Filter module if it can be found in Perl's library path. This path is set automatically for spider.pl but not for swishspider (because it would slow down a method that's already slow and spider.pl is recommended over the -S http method).

Therefore, all that's required to use this system with -S http is setting the @INC array to point to the filter directory.

For example, if the swish-e distribution was unpacked into ~/swish-e:

   PERL5LIB=~/swish-e/filters swish-e -c conf -S http

will allow the -S http method to make use of the SWISH::Filter module.

Note that if you are not using the SWISH::Filter module you may wish to edit the swishspider program and disable the use of the SWISH::Filter module using this setting:

    use constant USE_FILTERS  => 0;  # disable SWISH::Filter

This prevents the program from attempting to use the SWISH::Filter module for every non-text URL that is fetched. Of course, if you are concerned with indexing speed you should be using the -S prog method with spider.pl instead of -S http.

If you are not spidering, but you still want to make use of the SWISH::Filter module for filtering you can use the DirTree.pl program (in $prefix/lib/swish-e). This is a simple program that traverses the file system and uses SWISH::Filter for filtering.

Here are two examples of how to run a filter program, one using Swish-e's FileFilter directive, another using a prog input method program. See the SwishSpiderConfig.pl file for an example of using the SWISH::Filter module.

These filters simply use the program /bin/cat as a filter and only index .html files.

First, using the FileFilter method, here's the entire configuration file (swish.conf):

    IndexDir .
    IndexOnly .html
    FileFilter .html "/bin/cat"   "'%p'"

and index with the command

    swish-e -c swish.conf -v 1

Now, the same thing using the -S prog document source input method and a Perl program called catfilter.pl. You can see that it's much more work than using the FileFilter method above, but it provides a place to do additional processing. In this example, the prog method is only slightly faster. But if you needed a perl script to run as a FileFilter then prog will be significantly faster.

    #!/usr/local/bin/perl -w
    use strict;
    use File::Find;  # for recursing a directory tree

    $/ = undef;
    find(
        { wanted => \&wanted, no_chdir => 1, },
        '.',
    );

    sub wanted {
        return if -d;
        return unless /\.html$/;

        my $mtime  = (stat)[9];

        my $child = open( FH, '-|' );
        die "Failed to fork $!" unless defined $child;
        exec '/bin/cat', $_ unless $child;

        my $content = <FH>;
        my $size = length $content;

        print <<EOF;
    Content-Length: $size
    Last-Mtime: $mtime
    Path-Name: $_

    EOF

        print $content;   # FH was already read to EOF into $content above
        close(FH);
    }

And index with the command:

    swish-e -S prog -i ./catfilter.pl -v 1

This example will probably not work under Windows due to the '-|' open. A simple piped open may work just as well.

That is, replace:

    my $child = open( FH, '-|' );
    die "Failed to fork $!" unless defined $child;
    exec '/bin/cat', $_ unless $child;

with this:

    open( FH, "/bin/cat $_ |" ) or die $!;

Perl will try to avoid running the command through the shell if meta characters are not passed to the open. See perldoc -f open for more information.
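
If your prog program is written in something other than Perl, the main thing to get right is the header block that catfilter.pl prints before each document. Here is a minimal sketch of that header in C, using the same three header names as the Perl example above. The function name format_prog_header and its signature are made up for illustration and are not part of Swish-e.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Format the header block that precedes one document in a -S prog stream.
 * The header names are the ones catfilter.pl prints; the function itself
 * is a hypothetical helper.
 * Returns the header length, or -1 if buf is too small. */
int format_prog_header(char *buf, size_t buflen,
                       const char *path, long mtime, size_t content_len)
{
    int n;

    assert(buf != NULL && path != NULL);
    assert(strlen(path) > 0);        /* every document needs a Path-Name */

    n = snprintf(buf, buflen,
                 "Content-Length: %zu\n"
                 "Last-Mtime: %ld\n"
                 "Path-Name: %s\n"
                 "\n",               /* a blank line ends the headers */
                 content_len, mtime, path);

    if (n < 0 || (size_t)n >= buflen)
        return -1;
    return n;
}
```

The caller would write the header to standard output, follow it with exactly content_len bytes of document content, and then repeat for the next file.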

Eh, but I just want to know how to index PDF documents!

See the examples in the conf directory and the comments in the SwishSpiderConfig.pl file.

See the previous question for the details on filtering. The method you decide to use will depend on how fast you want to index, and your comfort level with using Perl modules.

Regardless of the filtering method you use, you will need to install the Xpdf packages available from http://www.foolabs.com/xpdf/.

I'm using Windows and can't get Filters or the prog input method to work!

Both the -S prog input method and filters use the popen() system call to run the external program. If your external program is, for example, a perl script, you have to tell Swish-e to run perl, instead of the script. Swish-e will convert forward slashes to backslashes when running under Windows.

For example, you would need to specify the path to perl as (assuming this is where perl is on your system):

    IndexDir e:/perl/bin/perl.exe

Or run a filter like:

    FileFilter .foo e:/perl/bin/perl.exe 'myscript.pl "%p"'

It's often easier to just install Linux.

How do I index non-English words?

Swish-e indexes 8-bit characters only. This is the ISO 8859-1 Latin-1 character set, and includes many non-English letters (and symbols). As long as they are listed in WordCharacters they will be indexed.

Actually, you probably can index any 8-bit character set, as long as you don't mix character sets in the same index and don't use libxml2 for parsing (see below).

The TranslateCharacters directive (SWISH-CONFIG) can translate characters while indexing and searching. You may specify the mapping of one character to another character with the TranslateCharacters directive.

TranslateCharacters :ascii7: is a predefined set of characters that will translate eight-bit characters to ascii7 characters. Using the :ascii7: rule will, for example, translate "Ääç" to "aac". This means: searching "Çelik", "çelik" or "celik" will all match the same word.
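
To make the idea concrete, here is a small C sketch of this kind of folding on ISO 8859-1 bytes. It only knows a handful of letters and is not Swish-e's actual :ascii7: table; the function names are invented for this example. Lowercasing of plain ASCII letters is included as an assumption so that "Çelik" and "celik" end up as the same word, as described above.

```c
#include <assert.h>
#include <stddef.h>

/* Fold one ISO 8859-1 byte to a lowercase 7-bit ASCII letter.
 * Only a few Latin-1 letters are listed, for illustration; this is NOT
 * the real :ascii7: table. */
static unsigned char fold_latin1(unsigned char c)
{
    switch (c) {
    /* A-grave .. A-ring, upper and lower case */
    case 0xC0: case 0xC1: case 0xC2: case 0xC3: case 0xC4: case 0xC5:
    case 0xE0: case 0xE1: case 0xE2: case 0xE3: case 0xE4: case 0xE5:
        return 'a';
    /* C-cedilla, upper and lower case */
    case 0xC7: case 0xE7:
        return 'c';
    default:
        if (c >= 'A' && c <= 'Z')
            return (unsigned char)(c + ('a' - 'A'));
        return c;
    }
}

/* Translate a NUL-terminated Latin-1 string in place. */
void translate_ascii7(char *s)
{
    assert(s != NULL);
    for (; *s != '\0'; s++)
        *s = (char)fold_latin1((unsigned char)*s);
}
```

Because the same translation is applied while indexing and while searching, the query and the indexed word are folded to the same bytes before they are compared.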

Note: When using libxml2 for parsing, parsed documents are converted internally (within libxml2) to UTF-8. This is converted to ISO 8859-1 Latin-1 when indexing. In cases where a string cannot be converted from UTF-8 to ISO 8859-1 (because it contains non-8859-1 characters), the string will be sent to Swish-e in UTF-8 encoding. This will result in some words being indexed incorrectly. Setting ParserWarningLevel to 1 or more will display warnings when UTF-8 to 8859-1 conversion fails.

Can I add/remove files from an index?

Try building swish-e with the --enable-incremental option.

The rest of this FAQ applies to the default swish-e format.

Swish-e currently has no way to add or remove items from its index. But Swish-e indexes so quickly that it's often possible to reindex the entire document set when a file needs to be added, modified or removed. If you are spidering a remote site, consider caching compressed copies of the documents locally.

Incremental additions can be handled in a couple of ways, depending on your situation. It's probably easiest to create one main index every night (or every week), and then create an index of just the new files between main indexing jobs and use the -f option to pass both indexes to Swish-e while searching.

You can merge the indexes into one index (instead of using -f), but it's not clear that this has any advantage over searching multiple indexes.

How does one create the incremental index?

One method is by using the -N switch to pass a file path to Swish-e when indexing. It will only index files that have a last modification date newer than the file supplied with the -N switch.

This option has the disadvantage that Swish-e must process every file in every directory as if it were going to be indexed (the -N test is done last, right before indexing of the file contents begins and after all other tests on the file have been completed) -- all that just to find a few new files.

Also, if you use the Swish-e index file as the file passed to -N there may be files that were added after indexing was started, but before the index file was written. This could result in a file not being added to the index.

Another option is to maintain a parallel directory tree that contains symlinks pointing to the main files. When a new file is added (or changed) to the main directory tree you create a symlink to the real file in the parallel directory tree. Then just index the symlink directory to generate the incremental index.

This option has the disadvantage that you need to have a central program that creates the new files that can also create the symlinks. But, indexing is quite fast since Swish-e only has to look at the files that need to be indexed. When you run full indexing you simply unlink (delete) all the symlinks.

Both of these methods have issues where files could end up in both indexes, or files being left out of an index. Use of file locks while indexing, and hash lookups during searches can help prevent these problems.

I run out of memory trying to index my files.

It's true that indexing can take up a lot of memory! Swish-e is extremely fast at indexing, but that comes at the cost of memory.

The best answer is to install more memory.

Another option is to use the -e switch. This will require less memory, but indexing will take longer as not all data will be stored in memory while indexing. How much less memory and how much more time depends on the documents you are indexing, and the hardware that you are using.

Here's an example of indexing all .html files in /usr/doc on Linux. This first example is without -e and used about 84M of memory:

    270279 unique words indexed.
    23841 files indexed.  177640166 total bytes.
    Elapsed time: 00:04:45 CPU time: 00:03:19

This is with -e, and used about 26M of memory:

    270279 unique words indexed.
    23841 files indexed.  177640166 total bytes.
    Elapsed time: 00:06:43 CPU time: 00:04:12

You can also build a number of smaller indexes and then merge them together with -M. Using -e while merging will save memory.

Finally, if you do build a number of smaller indexes, you can specify more than one index when searching by using the -f switch. Sorting large result sets by a property will be slower when specifying multiple index files while searching.

"too many open files" when indexing with -e option

Some platforms report "too many open files" when using the -e economy option. The -e feature uses many temporary files (something like 377) plus the index files and this may exceed your system's limits.

Depending on your platform you may need to set "ulimit" or "unlimit".

For example, under Linux bash shell:

  $ ulimit -n 1024

Or under an old Sparc:

  % unlimit openfiles
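
A wrapper program that launches swish-e -e could check the limit before starting. The C sketch below uses the POSIX getrlimit() call to compare the soft limit on open files with a required number. The function name fd_limit_allows is hypothetical, and how many descriptors to ask for is up to you (the 377 quoted above is only a rough figure).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/resource.h>

/* Return 1 if the soft limit on open file descriptors is at least
 * `needed`, 0 if it is lower, and -1 if the limit cannot be read. */
int fd_limit_allows(unsigned long needed)
{
    struct rlimit rl;

    assert(needed > 0);

    if (getrlimit(RLIMIT_NOFILE, &rl) != 0)
        return -1;
    if (rl.rlim_cur == RLIM_INFINITY)
        return 1;                     /* no limit at all */
    return rl.rlim_cur >= (rlim_t)needed ? 1 : 0;
}
```

If the check returns 0, raise the limit with ulimit (or unlimit) in the shell that starts the indexing job, as shown above.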

My system admin says Swish-e uses too much of the CPU!

That's a good thing! That expensive CPU is supposed to be busy.

Indexing takes a lot of work -- to make indexing fast, much of the work is done in memory, which reduces the amount of time Swish-e is waiting on I/O. But there are two things you can try:

The -e option will run Swish-e in economy mode, which uses the disk to store data while indexing. This makes Swish-e run somewhat slower, but also uses less memory. Since it is writing to disk more often it will be spending more time waiting on I/O and less time in CPU. Maybe.

The other thing is to simply lower the priority of the job using the nice(1) command:

    nice -n 15 swish-e -c search.conf

If you are concerned about search time, make sure you are using the -b and -m switches to return only one page of results at a time. If you know that your result sets will be large, that you want to return results one page at a time, and that many pages of the same query will often be requested, it may be smart to request all the documents on the first request and cache the results in a temporary file. The Perl module File::Cache makes this very simple to accomplish.
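
For paging, a front-end has to turn a page number into a value for -b. Here is a tiny C sketch of that calculation. It assumes that results are numbered starting from 1 and that -m is set to the page size; check the -b and -m documentation for your version before relying on it. The function name is made up for this example.

```c
#include <assert.h>

/* Number of the first result on a given page, assuming results are
 * numbered from 1 (an assumption about -b, see the text above). */
long first_result_for_page(long page, long page_size)
{
    assert(page >= 1 && page_size >= 1);
    return (page - 1) * page_size + 1;
}
```

With a page size of 20, the third page would then be requested with -b 41 -m 20.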

Spidering

How can I index documents on a web server?

If possible, use the file system method -S fs of indexing to index documents in your web area of the file system. This avoids the overhead of spidering a web server and is much faster. (-S fs is the default method if -S is not specified.)

If this is impossible (the web server is not local, or documents are dynamically generated), Swish-e provides two methods of spidering. First, it includes the http method of indexing -S http. A number of special configuration directives are available that control spidering (see SWISH-CONFIG/"Directives for the HTTP Access Method Only"). A perl helper script (swishspider) is included in the src directory to assist with spidering web servers. There are example configurations for spidering in the conf directory.

As of Swish-e 2.2, there's a general purpose "prog" document source where a program can feed documents to it for indexing. A number of example programs can be found in the prog-bin directory, including a program to spider web servers. The provided spider.pl program is full-featured and is easily customized.

The advantage of the "prog" document source feature over the "http" method is that the program is only executed one time, whereas the swishspider program used in the "http" method is executed once for every document read from the web server. The forking of Swish-e and compiling of the Perl script can be quite expensive, time-wise.

The other advantage of the spider.pl program is that it's simple and efficient to add filtering (such as for PDF or MS Word docs) right into the spider.pl's configuration, and it includes features such as MD5 checks to prevent duplicate indexing, options to avoid spidering some files, or index but avoid spidering. And since it's a perl program there's no limit on the features you can add.

Why does swish report "./swishspider: not found"?

Does the file swishspider exist where the error message displays? If not, either set the configuration option SpiderDirectory to point to the directory where the swishspider program is found, or place the swishspider program in the current directory when running swish-e.

If you are running Windows, make sure "perl" is in your path. Try typing perl from a command prompt.

If you are not running Windows, make sure that the shebang line (the first line of the swishspider program that starts with #!) points to the correct location of perl. Typically this will be /usr/bin/perl or /usr/local/bin/perl. Also, make sure that you have execute and read permissions on swishspider.

The swishspider perl script is only used with the -S http method of indexing.

I'm using the spider.pl program to spider my web site, but some large files are not indexed.

The spider.pl program has a default limit of 5MB file size. This can be changed with the max_size parameter setting. See perldoc spider.pl for more information.

I still don't think all my web pages are being indexed.

The spider.pl program has a number of debugging switches and can be quite verbose in telling you what's happening, and why. See perldoc spider.pl for instructions.

Swish is not spidering Javascript links!

Swish cannot follow links generated by Javascript, as they are generated by the browser and are not part of the document.

How do I spider other websites and combine it with my own (filesystem) index?

You can either merge (-M) two indexes into a single index, or use -f to specify more than one index while searching.

You will have better results with the -f method.

Searching

How do I limit searches to just parts of the index?

If you can identify "parts" of your index by the path name you have two options.

The first option is to index the document path. Add this to your configuration:

    MetaNames swishdocpath

Now you can search for words or phrases in the path name:

    swish-e -w 'foo AND swishdocpath=(sales)'

So that will only find documents with the word "foo" and where the file's path contains "sales". That might not work as well as you like, though, as both of these paths will match:

    /web/sales/products/index.html
    /web/accounting/private/sales_we_messed_up.html

This can be solved by searching with a phrase (assuming "/" is not a WordCharacter):

    swish-e -w 'foo AND swishdocpath=("/web/sales/")'
    swish-e -w 'foo AND swishdocpath=("web sales")'  (same thing)

The second option is a bit more powerful. With the ExtractPath directive you can use a regular expression to extract out a sub-set of the path and save it as a separate meta name:

    MetaNames department
    ExtractPath department regex !^/web/([^/]+).+$!$1!

This says: match a path that starts with "/web/", capture everything after that up to, but not including, the next "/" in $1, and then match everything from that "/" onward. The entire matched string is then replaced with $1, and the result is indexed under the meta name "department".
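
If you want to see what that pattern captures before running a full index, you can try it outside of Swish-e. The C sketch below runs the same pattern through the POSIX regcomp() and regexec() functions as a stand-in; it is not how Swish-e itself applies ExtractPath, and the function name extract_department is invented for the example.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <regex.h>
#include <stddef.h>
#include <string.h>

/* Copy the first path component after /web/ into out.
 * Returns 1 on a match, 0 if the path does not match or out is too small. */
int extract_department(const char *path, char *out, size_t outlen)
{
    regex_t re;
    regmatch_t m[2];
    int found = 0;

    assert(path != NULL && out != NULL);

    /* Same pattern as the ExtractPath example, as a POSIX extended regex. */
    if (regcomp(&re, "^/web/([^/]+).+$", REG_EXTENDED) != 0)
        return 0;

    if (regexec(&re, path, 2, m, 0) == 0 && m[1].rm_so >= 0) {
        size_t len = (size_t)(m[1].rm_eo - m[1].rm_so);
        if (len < outlen) {
            memcpy(out, path + m[1].rm_so, len);   /* the $1 capture */
            out[len] = '\0';
            found = 1;
        }
    }
    regfree(&re);
    return found;
}
```

For /web/sales/products/index.html this yields "sales", and for /web/accounting/private/sales_we_messed_up.html it yields "accounting", which is why the extracted value avoids the false match shown earlier.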

Now you can search like:

    swish-e -w 'foo AND department=sales'

and be sure that you will only match the documents in the /web/sales/* path. Note that you can map completely different areas of your file system to the same metaname:

    # flag the marketing specific pages
    ExtractPath department regex !^/web/(marketing|sales)/.+$!marketing!
    ExtractPath department regex !^/internal/marketing/.+$!marketing!

    # flag the technical departments pages
    ExtractPath department regex !^/web/(tech|bugs)/.+$!tech!

Finally, if you have something more complicated, use -S prog and write a perl program or use a filter to set a meta tag when processing each file.

How is ranking calculated?

The swishrank property value is calculated based on which Ranking Scheme (or algorithm) you have selected. In this discussion, any time the word fancy is used, you should consult the actual code for more details. It is open source, after all.

Things you can do to affect ranking:

  • MetaNamesRank

    You may configure your index to bias certain metaname values more or less than others. See the MetaNamesRank configuration option in SWISH-CONFIG.

  • IgnoreTotalWordCountWhenRanking

    Set to 1 (default) or 0 in your config file. See SWISH-CONFIG. NOTE: You must set this to 0 to use the IDF Ranking Scheme.

  • structure

    Each term's position in each HTML document is given a structure value based on the context in which the word appears. The structure value is used to artificially inflate the frequency of each term in that particular document. These structural values are defined in config.h:

     #define RANK_TITLE		7
     #define RANK_HEADER		5
     #define RANK_META		3
     #define RANK_COMMENTS		1
     #define RANK_EMPHASIZED 	0

    For example, if the word foo appears in the title of a document, the Scheme will treat that document as if foo appeared 7 additional times.

All Schemes share the following characteristics:

  • AND searches

    The rank value is averaged for all AND'd terms. Terms within a set of parentheses () are averaged as a single term (this is an acknowledged weakness and is on the TODO list).

  • OR searches

    The rank value is summed and then doubled for each pair of OR'd terms. This results in higher ranks for documents that have multiple OR'd terms.

  • scaled rank

    After a document's raw rank score is calculated, a final rank score is calculated using a fancy log() function. All the documents are then scaled against a base score of 1000. The top-ranked document will therefore always have a swishrank value of 1000.
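
Read literally, the AND and OR rules above come down to the arithmetic in the following C sketch. Treat it as one reading of the description, not as the Swish-e implementation: the function names, the two-term signatures and the use of integer math are all assumptions, and the real code should be consulted for details.

```c
#include <assert.h>

/* AND: the ranks of the two terms are averaged (integer division). */
int and_rank(int rank_a, int rank_b)
{
    assert(rank_a >= 0 && rank_b >= 0);
    return (rank_a + rank_b) / 2;
}

/* OR: the ranks of the pair are summed and the sum is doubled. */
int or_rank(int rank_a, int rank_b)
{
    assert(rank_a >= 0 && rank_b >= 0);
    return (rank_a + rank_b) * 2;
}
```

Under this reading, a document matching both terms of an OR query with raw ranks 21 and 9 scores four times as much as the same document would for the AND query.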

Here is a brief overview of how the different Schemes work. The number in parentheses after the name is the value to invoke that scheme with swish-e -R or RankScheme().

  • Default (0)

    The default ranking scheme considers the number of times a term appears in a document (frequency), the MetaNamesRank and the structure value. The rank might be summarized as:

     DocRank = Sum of ( structure + metabias )

    Consider this output with the DEBUG_RANK variable set at compile time:

     Ranking Scheme: 0 
     Word entry 0 at position 6 has struct 7
     Word entry 1 at position 64 has struct 41
     Word entry 2 at position 71 has struct 9
     Word entry 3 at position 132 has struct 9
     Word entry 4 at position 154 has struct 9
     Word entry 5 at position 423 has struct 73
     Word entry 6 at position 541 has struct 73
     Word entry 7 at position 662 has struct 73
     File num: 1104.  Raw Rank: 21.  Frequency: 8 scaled rank: 30445
      Structure tally:
      struct 0x7 = count of 1 ( HEAD TITLE FILE ) x rank map of 8 = 8
    
      struct 0x9 = count of 3 ( BODY FILE ) x rank map of 1 = 3
    
      struct 0x29 = count of 1 ( HEADING BODY FILE ) x rank map of 6 = 6
    
      struct 0x49 = count of 3 ( EM BODY FILE ) x rank map of 1 = 3

    Every word instance starts with a base score of 1. Then for each instance of your word, a running sum is taken of the structural value of that word position plus any bias you've configured. In the example above, the raw rank is 1 + 8 + 3 + 6 + 3 = 21.

    Consider this line:

      struct 0x7 = count of 1 ( HEAD TITLE FILE ) x rank map of 8 = 8

    That means there was one instance of our word in the title of the file. Its context was in the <head> tagset, inside the <title>. The <title> is the most specific structure, so it gets the RANK_TITLE score: 7. The base rank of 1 plus the structure score of 7 equals 8. If there had been two instances of this word in the title, then the score would have been 8 + 8 = 16.

  • IDF (1)

    IDF is short for Inverse Document Frequency. That's fancy ranking lingo for taking into account the total frequency of a term across the entire index, in addition to the term's frequency in a single document. IDF ranking also uses the relative density of a word in a document to judge its relevancy. Words that appear more often in a doc make that doc's rank higher, and longer docs are not weighted higher than shorter docs.

    The IDF Scheme might be summarized as:

      DocRank = Sum of ( density * idf * ( structure + metabias ) )

    Consider this output from DEBUG_RANK:

     Ranking Scheme: 1 
     File num: 1104  Word Score: 1  Frequency: 8  Total files: 1451   
     Total word freq: 108   IDF: 2564  
     Total words: 1145877   Indexed words in this doc: 562   
     Average words: 789   Density: 1120    Word Weight: 28716   
     Word entry 0 at position 6 has struct 7
     Word entry 1 at position 64 has struct 41
     Word entry 2 at position 71 has struct 9
     Word entry 3 at position 132 has struct 9
     Word entry 4 at position 154 has struct 9
     Word entry 5 at position 423 has struct 73
     Word entry 6 at position 541 has struct 73
     Word entry 7 at position 662 has struct 73
     Rank after IDF weighting: 574321  
     scaled rank: 132609
      Structure tally:
      struct 0x7 = count of  1 ( HEAD TITLE FILE ) x rank map of 8 = 8
    
      struct 0x9 = count of  3 ( BODY FILE ) x rank map of 1 = 3
    
      struct 0x29 = count of  1 ( HEADING BODY FILE ) x rank map of 6 = 6
    
      struct 0x49 = count of  3 ( EM BODY FILE ) x rank map of 1 = 3

    It is similar to the default Scheme, but notice how the total number of files in the index and the total word frequency (as opposed to the document frequency) are both part of the equation.
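
The raw rank in the Default (0) walk-through above can be reproduced with a few lines of C. This sketch follows the arithmetic as the debug output presents it (a starting value of 1, plus count times rank map for each structure); it is not the actual Swish-e ranking code. The struct and function names are invented for illustration, and any MetaNamesRank bias is assumed to be zero.

```c
#include <assert.h>
#include <stddef.h>

/* One line of the "Structure tally" debug output. */
struct struct_tally {
    unsigned structure;   /* structure bits, e.g. 0x7 */
    int count;            /* occurrences with that structure */
    int rank_map;         /* per-occurrence score for that structure */
};

/* Raw rank as in the worked example: 1 + sum of (count x rank map). */
int raw_rank(const struct struct_tally *tally, size_t n)
{
    int rank = 1;         /* the leading 1 in "1 + 8 + 3 + 6 + 3" */
    size_t i;

    assert(tally != NULL || n == 0);

    for (i = 0; i < n; i++)
        rank += tally[i].count * tally[i].rank_map;
    return rank;
}
```

Feeding it the four tally lines from the debug output, (0x7, 1, 8), (0x9, 3, 1), (0x29, 1, 6) and (0x49, 3, 1), gives 1 + 8 + 3 + 6 + 3 = 21, which matches the reported Raw Rank.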

Ranking is a complicated subject. SWISH-E allows for more Ranking Schemes to be developed and experimented with, using the -R option (from the swish-e command) and the RankScheme (see the API documentation). Experiment and share your findings via the discussion list.

How can I limit searches to the title, body, or comment?

Use the -t switch.

I can't limit searches to title/body/comment.

Or, I can't search with meta names, all the names are indexed as "plain".

Check in the config.h file whether #define INDEXTAGS is set to 1. If it is, change it to 0, recompile, and index again. When INDEXTAGS is 1, ALL the tags are indexed as plain text (that is, you index "title", "h1", and so on) and they lose their indexing meaning. If INDEXTAGS is set to 0, you will still index meta tags and comments, unless you have indicated otherwise in the user config file with the IndexComments directive.

Also, check for the UndefinedMetaTags setting in your configuration file.

I've tried running the included CGI script and I get an "Internal Server Error"

Debugging CGI scripts is beyond the scope of this document. Internal Server Error basically means "check the web server's log for an error message", as it can mean a bad shebang (#!) line, a missing perl module, FTP transfer error, or simply an error in the program. The CGI script swish.cgi in the example directory contains some debugging suggestions. Type perldoc swish.cgi for information.

There are also many, many CGI FAQs available on the Internet. A quick web search should offer help. As a last resort you might ask your webadmin for help...

When I try to view the swish.cgi page I see the contents of the Perl program.

Your web server is not configured to run the program as a CGI script. This problem is described in perldoc swish.cgi.

How do I make Swish-e highlight words in search results?

Short answer:

Use the supplied swish.cgi or search.cgi scripts located in the example directory.

Long answer:

Swish-e can't because it doesn't have access to the source documents when returning results, of course. But a front-end program of your creation can highlight terms. Your program can open up the source documents and then use regular expressions to replace search terms with highlighted or bolded words.

But, that will fail with all but the most simple source documents. For HTML documents, for example, you must parse the document into words and tags (and comments). A word you wish to highlight may span multiple HTML tags, or be a word in a URL and you wish to highlight the entire link text.

Perl modules such as HTML::Parser and XML::Parser make word extraction possible. Next, you need to consider that Swish-e uses settings such as WordCharacters, BeginCharacters, EndCharacters, IgnoreFirstChar, and IgnoreLastChar to define a "word". That is, you can't assume that a string of characters with white space on each side is a word.

Then things like TranslateCharacters, and HTML Entities may transform a source word into something else, as far as Swish-e is concerned. Finally, searches can be limited by metanames, so you may need to limit your highlighting to only parts of the source document. Throw phrase searches and stopwords into the equation and you can see that it's not a trivial problem to solve.

All hope is not lost, though, as Swish-e does provide some help. Using the -H option it will return in the headers the current index (or indexes) settings for WordCharacters (and others) required to parse your source documents as it parses them during indexing, and will return a "Parsed Words:" header that will show how it parsed the query internally. If you use fuzzy indexing (word stemming, soundex, or metaphone) then you will also need to stem each word in your document before comparing with the "Parsed Words:" returned by Swish-e.

The Swish-e stemming code is available either by using the Swish-e Perl module (SWISH::API) or the C library (included with the swish-e distribution), or by using the SWISH::Stemmer module available on CPAN. Also on CPAN is the module Text::DoubleMetaphone. Using SWISH::API probably provides the best stemming support.
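
To illustrate why white space alone does not define a word, here is a small C sketch that splits text the way a WordCharacters setting would: a word is a maximal run of characters that appear in the given set. It ignores BeginCharacters, EndCharacters and the other settings listed above, and the function name count_words is invented for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Count the words in text, where a word is a maximal run of bytes that
 * all appear in wordchars (a simplified stand-in for WordCharacters). */
size_t count_words(const char *text, const char *wordchars)
{
    size_t words = 0;
    int in_word = 0;

    assert(text != NULL && wordchars != NULL);

    for (; *text != '\0'; text++) {
        if (strchr(wordchars, *text) != NULL) {
            if (!in_word) {          /* first character of a new word */
                words++;
                in_word = 1;
            }
        } else {
            in_word = 0;             /* any other byte ends the word */
        }
    }
    return words;
}
```

With only the letters a-z as word characters, "foo-bar baz" is three words; add "-" to the set and it becomes two. A highlighter has to split the source document with the same set that was used at indexing time, which is what the -H headers give you.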

Do filters affect the performance during search?

No. Filters (FileFilter or via "prog" method) are only used for building the search index database. During search requests there will be no filter calls.

I have read the FAQ but I still have questions about using Swish-e.

The Swish-e discussion list is the place to go; see http://swish-e.org/. Please do not email developers directly. The list is the best place to ask questions.

Before you post please read QUESTIONS AND TROUBLESHOOTING located in the INSTALL page. You should also search the Swish-e discussion list archive which can be found on the swish-e web site.

In short, be sure to include the following when asking for help:

  • The swish-e version (./swish-e -V)
  • What you are indexing (and perhaps a sample), and the number of files
  • Your Swish-e configuration file
  • Any error messages that Swish-e is reporting

Document Info

$Id: SWISH-FAQ.pod 2147 2008-07-21 02:48:55Z karpet $

swish-e-2.4.7/html/swish-library.html

SWISH-LIBRARY - Interface to the Swish-e C library

Swish-e version 2.4.7

OVERVIEW

The C library is an interface to the Swish-e search code. It provides a way to embed Swish-e into your applications. This API is based on Swish-e version 2.3.

Note: This is a NEW API as of Swish-e version 2.3. The C language interface has changed as has the perl interface to Swish-e. The new Perl interface is the SWISH::API module and is included with the Swish-e distribution. The old SWISHE perl module has been rewritten to work with the new API. The SWISHE perl module is no longer included with the Swish-e distribution, but can be downloaded from the Swish-e web site.

The advantage of the library is that the index file or files can be opened one time and many queries made on the open index. This saves the startup time required to fork and run the swish-e binary, and the expensive step of opening the index file. Some benchmarks have shown a threefold increase in speed.

The downside is that your program now has more code and data in it (the index tables can use quite a bit of memory), and if a fatal error happens in swish it will bring down your program. These are things to think about, especially if embedding swish into a web server such as Apache where there are many processes serving requests.

The best way to learn about the library is to look at two files included with the Swish-e distribution that make use of the library.

  • src/libtest.c

    This file gives a basic overview of linking a C program with the Swish-e library. Not all available functions are used in that example, but it should give you a good overview of building a C program with swish-e.

    To build and run libtest chdir to the src directory and run the commands:

        $ make libtest
        $ ./libtest [optional name of index file]

    You will be prompted for the search words. The default index used is index.swish-e. This can be overridden by placing a list of index files in a quote-protected string.

        $ ./libtest 'index1 index2 index3'

  • perl/API.xs

    The API.xs file is a Perl "xsub" interface to the C library and is part of the SWISH::API Perl module. This is an object-oriented interface to the Swish-e library and demonstrates how the various search "objects" are created by C calls and how they are destroyed when no longer needed.

Installing the Swish-e library

The Swish-e library is installed when you run "make install" when building Swish-e. No extra installation steps are required.

The library consists of a header file "swish-e.h" and a library "libswish-e.*" that can either be a static or shared library depending on your platform.

Library Overview

When you first attach to an index file (or index files) you are returned a "swish handle". From the handle you create one or more "search objects", which hold the parameters used to query the index, such as the query string, sort order, search phrase delimiter, limit parameters and HTML structure bits. The "object" is really just a pointer to a C structure, but it's helpful to think of it as an object that has data and functionality associated with it.

The search object is used to query the index. A query returns a "results object". The results object holds the number of hits, the parsed query per index, and the result set. The results object keeps track of the current position in the result set. You may "seek" to a specific record within the result set (useful for displaying a page of results).

Finally, a result object represents a single result from the result list. A result object provides access to the result's properties (such as file name, rank, etc.).

In addition to results, there are functions available to access the header values stored in the index file, functions to check and report errors, and a few utility functions.

Available Functions

Below is the list of available functions included in the Swish-e C language API.

These functions (and typedefs) are defined in the swish-e.h header file. The common objects (e.g. structures) used are:

    SW_HANDLE  - swish handle that associates with an index file
    SW_SEARCH  - search "object" that holds search parameters
    SW_RESULTS - results "object" that holds a result set
    SW_RESULT  - a single result used for accessing the result's properties
    SW_FUZZYWORD - used for fuzzy (stemming) word conversion    

Searching

  • SW_HANDLE SwishInit(char *IndexFiles);

    This function opens and reads the header info of the index files named in the IndexFiles string. The string should contain a space-separated list of index files.

        SW_HANDLE myhandle;
        myhandle = SwishInit("file1.idx");

    Typically you will open a handle at the beginning of your program and use it to make multiple queries on an index.

    This function will always return a swish handle. You must check for errors, and on error free the memory used by the handle, or abort.

    Here's an example of aborting:

        SW_HANDLE swish_handle;
        swish_handle = SwishInit("file1.idx file2.idx");
        if ( SwishError( swish_handle ) )
            SwishAbortLastError( swish_handle );

    And here's an example of catching the error:

        SW_HANDLE swish_handle;
        swish_handle = SwishInit("file1.idx file2.idx");
        if ( SwishError( swish_handle ) )
        {
            printf("Failed to connect to swish. %s\n", SwishErrorString( swish_handle ) );
            SwishClose( swish_handle );  /* free the memory used */
            return 0;
        }

    You may have more than one handle active at a time.

    Swish-e will not tell you if the index file changes on disk (such as after reindexing). In a persistent environment (e.g. mod_perl) the calling program should check to see if the index file has changed on disk. A common way to do this is to store the inode number before opening the index file(s), and then stat the file name every so often and reopen the index files if the inode number changes.

  • void SwishClose(SW_HANDLE handle);

    This function closes and frees the memory of a Swish handle. Every swish handle should be freed when done searching the index. Failing to close the handle will result in a memory leak.

  • SW_SEARCH New_Search_Object(SW_HANDLE handle, const char *query);

    Returns a new search "object". The search object holds the parameters used for searching an index. A single search object can be used to query the index multiple times. The available settings listed below are "sticky" in that they remain set on the search object until changed.

  • int SwishGetStructure( SW_SEARCH srch );

    Returns the "structure" flag of the search object passed or 0 if the search object is NULL.

  • void SwishPhraseDelimiter( SW_SEARCH srch, char delimiter );

    Sets the phrase delimiter character. The default is double-quotes.

  • char SwishGetPhraseDelimiter( SW_SEARCH srch );

    Returns the phrase delimiter character used in the search object or 0 if the search object is NULL.

  • void SwishSetStructure( SW_SEARCH srch, int structure );

    Sets the "structure" flag in the search object. The structure flag is used to limit searches to parts of HTML files (such as to the title or headers). The default is to not limit. This provides the functionality of the -H command line switch.
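
    The structure flag is a set of bits. As an illustration, the sketch below combines two of the bit values listed in the SWISH::API (Perl) documentation. The DEMO_ names are local stand-ins defined here only for the example; check the swish-e headers for the names the library actually provides.

```c
#include <assert.h>

/* Local copies of the bit values listed in the SWISH::API documentation.
 * The DEMO_ prefix marks them as illustrative only. */
enum demo_structure_bits {
    DEMO_IN_FILE       = 1,
    DEMO_IN_TITLE      = 2,
    DEMO_IN_HEAD       = 4,
    DEMO_IN_BODY       = 8,
    DEMO_IN_COMMENTS   = 16,
    DEMO_IN_HEADER     = 32,
    DEMO_IN_EMPHASIZED = 64,
    DEMO_IN_META       = 128
};

/* Build a structure value that limits a search to <title> and <h*> text. */
int demo_title_and_headings(void)
{
    return DEMO_IN_TITLE | DEMO_IN_HEADER;
}

/* Returns 1 if every bit in `wanted` is set in `structure`, 0 otherwise. */
int demo_structure_has(int structure, int wanted)
{
    assert(wanted != 0);
    return (structure & wanted) == wanted;
}
```

    The value returned by demo_title_and_headings() is the kind of integer that would be passed as the second argument of SwishSetStructure().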

  • void SwishSetSort( SW_SEARCH srch, char *sort );

    Sets the sort order of the results. This is the same as the -s switch used with the swish-e binary.

  • void SwishSetQuery( SW_SEARCH srch, char *query );

    Sets the query string in the search object. This typically is not needed since it can be set when creating the search object or when executing a query.

  • void SwishSetSearchLimit( SW_SEARCH srch, char *propertyname, char *low, char *hi);

    Sets the limit parameters for a search. Provides the same functionality as the -L command line switch. You may specify a range of property values that search results must be within. You may call SwishSetSearchLimit() only one time for each property (but can set limits on more than one property at a time).

    Unlike the other settings on the search object, once you run a query on the search object you must call SwishResetSearchLimit() to change or clear the limit parameters.
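
    Because the limits are passed as strings, numeric ranges have to be formatted first. The sketch below builds the low and hi strings for a range covering the last N hours, assuming the property holds Unix timestamps (as date properties such as swishlastmodified do). demo_last_hours_range() is a hypothetical helper, not a swish-e function.

```c
#include <assert.h>
#include <stdio.h>
#include <time.h>

/* Fill `low` and `hi` (each `size` bytes) with decimal Unix timestamps
 * covering the `hours` hours that end at `now`.
 * Returns 0 on success, -1 if a buffer is too small. */
int demo_last_hours_range(time_t now, long hours,
                          char *low, char *hi, size_t size)
{
    long long end, start;
    int n;

    assert(low != NULL && hi != NULL && hours >= 0);
    end = (long long)now;
    start = end - (long long)hours * 60 * 60;

    n = snprintf(low, size, "%lld", start);
    if (n < 0 || (size_t)n >= size)
        return -1;
    n = snprintf(hi, size, "%lld", end);
    if (n < 0 || (size_t)n >= size)
        return -1;
    return 0;
}
```

    The resulting strings could then be passed as the low and hi arguments of SwishSetSearchLimit() together with the name of a date property.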

  • void SwishResetSearchLimit( SW_SEARCH srch );

    Resets the limits set on a search object set by SwishSetSearchLimit().

  • void Free_Search_Object( SW_SEARCH srch );

    Frees the search object. This must be called when done with the search object. Generally, you can reuse a search object for multiple queries so typically you would call this right before calling SwishClose().

    You may free the search object before freeing any generated results objects.

  • SW_RESULTS SwishExecute( SW_SEARCH search, const char *query);

    Searches the index or indexes based on the parameters in the search object. Returns a results object. See below for functions to access the data stored in the results object.

    You should always check for errors after calling SwishExecute().

  • SW_RESULTS SwishQuery(SW_HANDLE, const char *words );

    This is a short-cut function that bypasses the creation of a search object (actually, bypasses the need to create and free a search object). This only allows passing in a query string; other search parameters cannot be set. The results are sorted by rank.

    You should always check for errors after calling SwishQuery().

Reading Results

  • int SwishHits( SW_RESULTS results );

    Returns the number of results in the results object.

  • SWISH_HEADER_VALUE SwishParsedWords( SW_RESULTS, const char *index_name );

    Returns the tokenized query. Words are split by WordCharacters and stopwords are removed. The parsed words are useful for highlighting search terms in your program.

    The "index_name" is the name of the index supplied in the SwishInit() function call.

    Returns a SWISH_HEADER_VALUE union of type SWISH_LIST which is a char **. See src/libtest.c for an example of accessing the strings in this list, but in general you may cast this to a (char **).
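
    Once the list has been cast to a (char **), it can be walked like any other array of strings. The sketch below assumes the array ends with a NULL pointer, which this section does not state explicitly; compare with src/libtest.c before relying on it. Both function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Count the strings in a NULL-terminated list. */
size_t demo_list_length(const char **list)
{
    size_t n = 0;

    assert(list != NULL);
    while (list[n] != NULL)
        n++;
    return n;
}

/* Returns 1 if `word` appears in the NULL-terminated list, 0 otherwise.
 * The comparison is exact (case-sensitive). */
int demo_list_contains(const char **list, const char *word)
{
    assert(list != NULL && word != NULL);
    for (; *list != NULL; list++)
        if (strcmp(*list, word) == 0)
            return 1;
    return 0;
}
```

    A highlighter could call demo_list_contains() for each word of a document summary to decide whether to mark it.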

  • SWISH_HEADER_VALUE SwishRemovedStopwords( SW_RESULTS, const char *index_name );

    Returns a list of stopwords removed from the input query.

    Returns a SWISH_HEADER_VALUE union of type SWISH_LIST which is a char **. See src/libtest.c for an example of accessing the strings in this list, but in general you may cast this to a (char **).

  • int SwishSeekResult( SW_RESULTS, int position );

    Sets the current seek position in the list of results, with position zero being the first record (unlike -b where one is the first result).

    Returns the position or a negative number on error.
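
    Since positions are zero-based, page numbers and -b style start values need a small conversion before seeking. A minimal sketch, with hypothetical helper names:

```c
#include <assert.h>

/* Convert a one-based page number to the zero-based position of the
 * first result on that page. */
int demo_page_to_position(int page, int page_size)
{
    assert(page >= 1 && page_size >= 1);
    return (page - 1) * page_size;
}

/* Convert a one-based "-b" style start value to a zero-based position. */
int demo_begin_to_position(int begin)
{
    assert(begin >= 1);
    return begin - 1;
}
```

    Either return value could be used as the position argument of SwishSeekResult(); the return value of that call should still be checked for a negative number.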

  • SW_RESULT SwishNextResult( SW_RESULTS );

    Returns the next result, or NULL if no more results are available.

    The result object returned does not need to be freed after use (unlike the swish handle, search object, and results object).

  • const char *SwishResultPropertyStr(SW_RESULT, char *propertyname);

    This function is mostly useful for testing as it returns odd results on errors.

    Aborts if called with a NULL SW_RESULT object.

    Returns a string value of the specified property.

    Returns the empty string "" if the current result does not have the specified property assigned.

    Returns the string "(null)" on invalid property name (i.e. property name is not defined in the index) and sets an error (see below) indicating the invalid property name.

    The string returned does not need to be freed, but is only valid for the current result. If you wish to save the string you must copy it locally.

    Dates are formatted using the hard-coded format string: "%Y-%m-%d %H:%M:%S" in localtime.
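
    If you fetch a date as a number and want the same layout, the standard strftime() call can produce it. A minimal sketch using the format string quoted above; demo_format_date() is a hypothetical name:

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Format a Unix timestamp in local time as "YYYY-MM-DD HH:MM:SS".
 * Returns the number of characters written (not counting the
 * terminating NUL), or 0 if the buffer is too small or the
 * conversion fails. */
size_t demo_format_date(time_t when, char *buf, size_t size)
{
    struct tm *tm;

    assert(buf != NULL);
    tm = localtime(&when);
    if (tm == NULL)
        return 0;
    return strftime(buf, size, "%Y-%m-%d %H:%M:%S", tm);
}
```

    For four-digit years the formatted string is 19 characters long, so a buffer of 20 bytes or more is enough.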

  • unsigned long SwishResultPropertyULong(SW_RESULT r, char *propertyname);

    Returns a numeric property as an unsigned long. Numeric properties are used for both PropertyNamesNumeric and PropertyNamesDate type of properties. Dates are returned as a unix timestamp as reported by the system when the index was created.

    Swish-e will abort if called with a NULL SW_RESULT object. Without the SW_RESULT object swish-e cannot set any error codes.

    On error returns ULONG_MAX, which is defined in limits.h. Check SwishError() (see below) for the type of error.

    If SwishError() returns false (zero) then it simply means that this result does not have any data for the specified property.

    If SwishError() returns true (non-zero) then either the propertyname specified is invalid, or the property requested is not a numeric (or date) property (e.g. it's a string property).

    See below on how to fetch the specific error message when SwishError() is true.
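
    The three outcomes described above can be told apart with a small helper. This sketch assumes the error sentinel is ULONG_MAX from limits.h and that error_code is the value returned by SwishError(), which is zero or negative; the enum and function names are hypothetical.

```c
#include <assert.h>
#include <limits.h>

enum demo_prop_status {
    DEMO_PROP_OK,       /* the value is usable */
    DEMO_PROP_NO_DATA,  /* this result has no data for the property */
    DEMO_PROP_ERROR     /* invalid name, or not a numeric/date property */
};

/* Classify the return value of a numeric property lookup, given the
 * error code reported for the same call. */
enum demo_prop_status demo_classify_ulong(unsigned long value, int error_code)
{
    assert(error_code <= 0);
    if (value != ULONG_MAX)
        return DEMO_PROP_OK;
    return error_code == 0 ? DEMO_PROP_NO_DATA : DEMO_PROP_ERROR;
}
```

    Only the DEMO_PROP_ERROR case calls for fetching the error message from the swish handle.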

  • PropValue *getResultPropValue (SW_RESULT r, char *propertyname, int ID );

    This is a low-level function to fetch a property regardless of type. This is likely the best function for accessing properties.

    Swish-e will abort if called with a NULL SW_RESULT object. Propertyname is the name of the property. ID is the id number of the property, if known. ID is not normally used in the API, but its purpose is to avoid looking up the property ID for every result displayed.

    The returned PropValue is a structure that contains a flag to indicate the type, and a union that holds the property value. The flags and structure are defined in swish-e.h.

    The property must be copied locally and the returned "PropValue" value must be freed by calling freeResultPropValue() to avoid a memory leak.

    On error returns NULL. Check SwishError() (see below) for the type of error.

    If the function returns NULL but SwishError() returns false (zero), this result simply does not have any data for the specified property.

    If SwishError() returns true (non-zero) then the property name specified is invalid (i.e. not defined for the index).

    See below on how to fetch the specific error message when SwishError() is true.

    See perl/API.xs for an example on using this function.

  • void freeResultPropValue( PropValue *pv );

    Frees the "PropValue" returned after calling getResultPropValue().

  • void Free_Results_Object( SW_RESULTS results );

    Frees the results object (frees the result set). This must be called when done reading the results and before calling SwishClose().

Accessing the Index Header Values

Each index file has associated header values that describe the index. These functions provide access to this data. The header data is returned as a SWISH_HEADER_VALUE union. You also pass in a pointer to a SWISH_HEADER_TYPE, and on return it indicates the type of data that was returned. See src/libtest.c and perl/API.xs for examples.
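
The pattern here is a union paired with a separate type tag that tells the caller which member to read. The sketch below shows that pattern with made-up types; the DEMO_ enum values and the union members are assumptions for illustration and are not copied from swish-e.h, so consult src/libtest.c for the real definitions.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-ins for a header type tag and value union. */
typedef enum { DEMO_NUMBER, DEMO_STRING, DEMO_LIST, DEMO_BOOL } demo_header_type;

typedef union {
    unsigned long  number;
    const char    *string;
    const char   **string_list;   /* NULL-terminated */
    int            boolean;
} demo_header_value;

/* Render a value into `buf` according to its type tag. Returns `buf`. */
char *demo_render_header(demo_header_value v, demo_header_type type,
                         char *buf, size_t size)
{
    size_t used = 0;

    assert(buf != NULL && size > 0);
    buf[0] = '\0';
    switch (type) {
    case DEMO_NUMBER:
        snprintf(buf, size, "%lu", v.number);
        break;
    case DEMO_STRING:
        snprintf(buf, size, "%s", v.string ? v.string : "");
        break;
    case DEMO_BOOL:
        snprintf(buf, size, "%s", v.boolean ? "yes" : "no");
        break;
    case DEMO_LIST:
        /* join the strings with single spaces, truncating if needed */
        for (const char **p = v.string_list; p && *p && used < size - 1; p++) {
            int n = snprintf(buf + used, size - used, "%s%s",
                             used ? " " : "", *p);
            if (n < 0)
                break;
            used += (size_t)n < size - used ? (size_t)n : size - used - 1;
        }
        break;
    }
    return buf;
}
```

Reading the wrong member for a given tag is undefined or meaningless, which is why the type is checked before the value is touched.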

  • const char **SwishHeaderNames( SW_HANDLE );

    Returns the list of possible header names. This list is the same for all index files of a given version of Swish-e. It provides a way to gain access to all headers without having to list them in your program.

  • const char **SwishIndexNames( SW_HANDLE );

    Returns a list of index files opened. This is just the list of index files specified in the SwishInit() call. You need the name of the index file to access a specific index's header values.

  • SWISH_HEADER_VALUE SwishHeaderValue( SW_HANDLE, const char *index_name, const char *cur_header, SWISH_HEADER_TYPE *type );

    Fetches the header value for the given index file and header name. The call sets the "type" passed in to the type of value returned.

    See src/libtest.c and perl/API.xs for examples.

  • SWISH_HEADER_VALUE SwishResultIndexValue( SW_RESULT, const char *name, SWISH_HEADER_TYPE *type );

    This is like SwishHeaderValue() above, but instead of supplying an index file name and a swish handle, supply a result object and the header value is fetched from the result's related index file.

Accessing Property Meta Data

In addition to the pre-defined standard properties, you have the option of adding additional "meta" properties to be indexed and/or added to the list of properties returned with each result. Consult the sections on the MetaNames and PropertyNames directives in the CONFIGURATION FILE for an explanation of how to do this.

These functions provide access to the meta data stored in an index. You can use them to determine what meta/property information is available for an index including all the pre-defined standard properties. See libtest.c for an example.

  • SWISH_META_LIST SwishMetaList( SW_HANDLE, const char *index_name );

    Returns the list of meta entries for the given index file as a null-terminated array of SW_META objects. Use the functions below to extract specific fields from the SW_META structure. Metas are distinct from properties.

  • SWISH_META_LIST SwishPropertyList( SW_HANDLE, const char *index_name );

    This function is the same as SwishMetaList() but it returns an array of properties as opposed to meta objects. Property attributes can be extracted in the same way as meta objects using the functions below.

  • SWISH_META_LIST SwishResultMetaList( SW_RESULT );

    This is like SwishMetaList() above but determines the index to use from a result object.

  • SWISH_META_LIST SwishResultPropertyList( SW_RESULT );

    This is like SwishPropertyList() above but like SwishResultMetaList() uses a result object instead of an index name.

  • const char *SwishMetaName( SW_META );

    Given a SW_META object returned by one of the above, this function will return the meta/property's name. You can use this name to access a property's value for a given result as described above.

  • int SwishMetaType( SW_META );

    Get the data type for the given meta/property. Known types are listed in swish-e.h.

  • int SwishMetaID( SW_META );

    Get the internal ID number for the given meta/property. These IDs are unique within an index file but are not unique across a set of results.

Checking for Errors

You should check for errors after all calls. The last error is stored in the swish handle object, and is only valid until the next operation (which resets the error flags).

Currently, some errors are flagged as "critical" errors. In these cases you should destroy (by calling the SwishClose() function ) the current swish handle. If you have other objects in scope (e.g. a search object or results object) destroy those first.

The types of errors that are critical can be seen in src/error.c. Currently the list includes:

    Could not open index file
    Unknown index file format
    Index file(s) is empty
    Index file error
    Invalid swish handle
    Invalid results object

  • int SwishError( SW_HANDLE );

    This returns true if an error condition exists. The value returned is the error number, which is an integer less than zero on error. This should be checked before calling any of the other error functions below.

  • const char *SwishErrorString( SW_HANDLE );

    This returns a general text description of the current error.

  • const char *SwishLastErrorMsg( SW_HANDLE );

    In some cases this will return a string with specifics about the current error. For example, SwishErrorString() may return "Unknown metaname", but SwishLastErrorMsg() will return a string with the name of the unknown metaname.

  • int SwishCriticalError( SW_HANDLE );

    Returns true if the current error condition is a critical error. On critical errors you should free up any current objects and call SwishClose() as swish may be in an unstable state.

  • void SwishAbortLastError( SW_HANDLE );

    This is a convenience function that will format and print the last error message, and then abort the program.

  • void set_error_handle( FILE *where );

    Sets where errors and warnings are printed (when printed by swish). For historical reasons, when swish-e first starts up errors and warnings are sent to stdout.

  • void SwishErrorsToStderr( void );

    A convenience method to send errors to stderr instead of stdout.

Utility Functions

  • const char *SwishWordsByLetter(SWISH * sw, char *indexname, char c);

    Returns all the words in the index "indexname" that begin with the letter passed in. Returns NULL if the name of the index file is invalid.

    This function may change in the future since only 8-bit chars can currently be used.

  • char * SwishStemWord( SW_HANDLE sw, char *in_word );

    Deprecated

    This can be used to convert a word to its stem. It uses only the original Porter Stemmer.

  • SW_FUZZYWORD SwishFuzzyWord( SW_RESULT r, char *word );

    Stems "word" based on the fuzzy mode selected during indexing.

    The fuzzy mode used during indexing is stored in the index file. Since each result is linked to a given index file this method allows stemming a word based on its index file.

    One possible use for this is to highlight search terms in a document summary, which would be based on a given result.

    The methods below can be used to access the data returned. The SW_FUZZYWORD object must be freed when done to avoid a memory leak.

  • const char **SwishFuzzyWordList( SW_FUZZYWORD fw );

    Returns a null terminated list of strings returned by the stemmer. In most cases this will be a single string.

    Here's an example:

        /* "result" is a SW_RESULT and "word" is the string to stem */
        SW_FUZZYWORD fuzzy_word = SwishFuzzyWord( result, word );
        const char **word_list = SwishFuzzyWordList( fuzzy_word );
        while ( *word_list )
        {
            printf("%s\n", *word_list );
            word_list++;
        }
        SwishFuzzyWordFree( fuzzy_word );  /* avoid a memory leak */

    If the stemmer does not convert the string (for example attempting to stem numeric data) the word_list will contain the original word. To tell if the stemmer actually stemmed the word check the return value with SwishFuzzyWordError().

  • int SwishFuzzyWordError( SW_FUZZYWORD fw );

    This returns zero if the stemming operation was successful, otherwise it returns a value indicating the reason the word was not stemmed. The return values are defined in the swish-e src/stemmer.h file.

    Not all stemmers set this value correctly. But since SwishFuzzyWordList() will return a valid string regardless of the return value, you can often just ignore this setting. That's what I do.

  • int SwishFuzzyWordCount( SW_FUZZYWORD fw );

    Returns the number of strings in the word list available by calling SwishFuzzyWordList().

    This is normally just one, but in the case of DoubleMetaphone it can be one or two (i.e. DoubleMetaphone can return one or two strings).

  • const char *SwishFuzzyMode( SW_RESULT r );

    Returns the name of the stemmer used for the given result (which is related to an index).

  • void SwishFuzzyWordFree( SW_FUZZYWORD fw );

    Frees the memory used by the SW_FUZZYWORD.

Bug-Reports

Please send bug reports to the Swish-e discussion group. Feel free to improve or enhance this feature as well.

Author

Original interface: Aug 2000 Jose Ruiz jmruiz@boe.es

Updated: Aug 22, 2002 - Bill Moseley

Interface redesigned for Swish-e version 2.3 Oct 17, 2002 - Bill Moseley

Document Info

$Id: SWISH-LIBRARY.pod 1906 2007-02-07 19:25:16Z moseley $

swish-e-2.4.7/html/index.html

Swish-e Documentation

Version 2.4.7

swish-e-2.4.7/html/api.html

SWISH::API - Perl interface to the Swish-e C Library

Swish-e version 2.4.7

SYNOPSIS

    use SWISH::API;

    my $swish = SWISH::API->new( 'index.swish-e' );

    $swish->abort_last_error
        if $swish->error;

    # A short-cut way to search

    my $results = $swish->query( "foo OR bar" );

    # Or more typically
    my $search = $swish->new_search_object;

    # then in a loop
    my $results = $search->execute( $query );

    # always check for errors (but aborting is not always necessary)

    $swish->abort_last_error
        if $swish->error;

    # Display a list of results

    my $hits = $results->hits;
    if ( !$hits ) {
        print "No Results\n";
        return;  # for example
    }

    print "Found ", $results->hits, " hits\n";

    # Seek to a given page - should check for errors
    $results->seek_result( ($page-1) * $page_size );

    while ( my $result = $results->next_result ) {
        printf("Path: %s\n  Rank: %lu\n  Size: %lu\n  Title: %s\n  Index: %s\n  Modified: %s\n  Record #: %lu\n  File   #: %lu\n\n",
            $result->property( "swishdocpath" ),
            $result->property( "swishrank" ),
            $result->property( "swishdocsize" ),
            $result->property( "swishtitle" ),
            $result->property( "swishdbfile" ),
            $result->result_property_str( "swishlastmodified" ),
            $result->property( "swishreccount" ),
            $result->property( "swishfilenum" )
        );
    }

    # display properties and metanames

    for my $index_name ( $swish->index_names ) {
        my @metas = $swish->meta_list( $index_name );
        my @props = $swish->property_list( $index_name );

        for my $m ( @metas ) {
            my $name = $m->name;
            my $id = $m->id;
            my $type = $m->type;
        }
        # (repeat above for @props)
    }

DESCRIPTION

This module provides a Perl interface to the Swish-e search engine. It embeds the swish-e search code in your application, which avoids the need to fork to run the swish-e binary and lets you keep an index file open while running multiple queries. This results in increased search performance.

DEPENDENCIES

You must have installed Swish-e version 2.4 before building this module. Download from:

    http://swish-e.org

OVERVIEW

This module includes a number of classes.

Searching consists of connecting to a swish-e index (or indexes), and then running queries against the open index. Connecting to the index creates a swish object blessed into the SWISH::API class.

A SWISH::API::Search object is created from the SWISH::API object. The SWISH::API::Search object can have associated parameters (e.g. result sort order).

The SWISH::API::Search object is used to query the associated index file or files. A query on a search object returns a results object of the class SWISH::API::Results. Then individual results of the SWISH::API::Result class can be fetched by calling a method of the results object.

Finally, a result's properties can be accessed by calling methods on the result object.

METHODS

SWISH::API - Swish Handle Object

To begin using Swish you must first create a Swish Handle object. This object makes the connection to one or more index files and is used to create objects used for searching the associated index files.

  • $swish = SWISH::API->new( $index_files );

    This method returns a swish handle object blessed into the SWISH::API class. $index_files is a space separated list of index files to open. This always returns an object, even on errors. Caller must check for errors (see below).

  • @indexes = $swish->index_names;

    Returns a list of index names associated with the swish handle. These were the indexes specified as a parameter on the SWISH::API->new call. This can be used in calls below that require specifying the index file name.

  • @header_names = $swish->header_names;

    Returns a list of possible header names. These can be used to look up header values. See the header_value method below.

  • @values = $swish->header_value( $index_file, $header_name );

    A swish-e index has data associated with it stored in the index header. This method provides access to that data.

    Returns the header value for the header and index file specified. Most headers are a single item, but some headers (e.g. "Stopwords") return a list.

    The list of possible header names can be obtained from the header_names method.

  • $swish->rank_scheme( 0|1 );

    Similar to the -R option with the swish-e command line tool. The default ranking scheme is 0. Set it to 1 to experiment with other ranking features. See the SWISH-CONFIG documentation for more on ranking schemes.

Error Handling

All errors are stored in and accessed via the SWISH::API object (the Swish Handle). That is, even an error that occurs when calling a method on a result (SWISH::API::Result) object will store the error in the parent SWISH::API object.

Check for errors after every method call. Some errors are critical errors and will require destruction of the SWISH::API object. Critical errors will typically only happen when attaching to the database and are errors such as an invalid index file name, permissions errors, or passing invalid objects to calls.

Typically, if you receive an error when attaching to an index file or files you should assume that the error is critical and let the swish object fall out of scope (and be destroyed). Otherwise, if an error is detected you should check if it is a critical error. If the error is not critical you may continue using the objects that have been created (for example, an invalid meta name will generate a non-critical error, so you may continue searching using the same search object).

Error state is cleared upon a new query.

Again, all error methods need to be called on the parent swish object:

  • $swish->error

    Returns true if an error occurred on the last operation. On errors the value returned is the internal Swish-e error number (which is less than zero).

  • $swish->critical_error

    Returns true if the last error was a critical error.

  • $swish->abort_last_error

    Aborts the running program and prints an error message to STDERR.

  • $str = $swish->error_string

    Returns the string description of the current error (based on the value returned by $swish->error). This is a generic error string.

  • $msg = $swish->last_error_msg

    Returns a string with specific information about the last error, if any. For example, given a query of:

        badmeta=foo

    where "badmeta" is an invalid metaname, $swish->error_string might return "Unknown metaname", but $swish->last_error_msg might return "badmeta".

Generating Search and Result Objects

  • $search = $swish->new_search_object( $query );

    This creates a new search object blessed into the SWISH::API::Search class. The optional $query parameter is a query string to store in the search object.

    See the section on SWISH::API::Search for methods available on the returned object.

    The advantage of this method is that a search object can be used for multiple queries:

        $search = $swish->new_search_object;
        while ( $query = next_query() ) {
            $results = $search->execute( $query );
            ...
        }

  • $results = $swish->query( $query );

    This is a short-cut which avoids the step of creating a separate search object. It returns a results object blessed into the SWISH::API::Results class described below.

    This method basically is the equivalent of

        $results = $swish->new_search_object->execute( $query );

SWISH::API::Search - Search Objects

A search object holds the parameters used to generate a list of results. These methods are used to adjust these parameters and to create the list of results for the current set of search parameters.

  • $search->set_query( $query );

    This will set (or replace) the query string associated with a search object. This method is typically not used as the query can be set when executing the actual query or when creating a search object.

  • $search->set_structure( $structure_bits );

    This method may change in the future.

    A "structure" is a bit-mapped flag used to limit search results to specific parts of an HTML document, such as the title or in H tags. The possible bits are:

        IN_FILE         = 1      This is the default
        IN_TITLE        = 2      In <title> tag
        IN_HEAD         = 4      In <head> tag
        IN_BODY         = 8      In <body>
        IN_COMMENTS     = 16     In html comments
        IN_HEADER       = 32     In <h*>
        IN_EMPHASIZED   = 64     In <em>, <b>, <strong>, <i>
        IN_META         = 128    In a meta tag (e.g. not swishdefault)

    So if you wish to limit your searches to words in heading tags (e.g. <H1>) or in the <title> tag use:

        $search->set_structure( IN_HEADER | IN_TITLE );

  • $search->phrase_delimiter( $char );

    Sets the character used as the phrase delimiter in searches. The default is double-quotes (").

  • $search->set_search_limit( $property, $low, $high );

    Sets a range from $low to $high inclusive that the given $property must be in to be selected as a result. Call multiple times to set more than one limit on different properties. Limits are ANDed, that is, a result must be within the range of all limits specified to be included in a list of results.

    For example to limit searches to documents modified in the last 48 hours:

        my $start = time - 48 * 60 * 60;
        $search->set_search_limit( 'swishlastmodified', $start, time() );

    An error will be set if the property has already been specified or if $high < $low.

    Other errors may not be reported until the query is run, such as an invalid property name, or a non-numeric $low or $high when the property specified is a numeric property.

    Once a query is run you cannot change the limit settings for the search object without calling the reset_search_limit method first.

  • $search->reset_search_limit;

    Clears the limit parameters for the given object. This must be called if the limit parameters need to be changed.

  • $search->set_sort( $sort_string );

    Sets the sort order of search results. The string is a space separated list of valid document properties. Each property may contain a qualifier that sets the direction of the sort.

    For example, to sort the results by path name in ascending order and by rank in descending order:

        $search->set_sort( 'swishdocpath asc swishrank desc' );

    The "asc" and "desc" qualifiers are optional, and if omitted ascending is assumed.

    Currently, errors (e.g. invalid property name) are not detected on this call, but rather when executing a query. This may change in the future.

SWISH::API::Results - Generating and accessing results

Searching generates a results object blessed into the SWISH::API::Results class.

  • $results = $search->execute( $query );

    Executes a query based on the parameters in the search object. $query is an optional query string to use for the search ($query replaces the set query string in the search object).

    A typical use would be to create a search object once and then call this method for each query using the same search object changing only the passed in $query.

    The caller should check for errors after making this call.

Results Methods

A query creates a results object that contains information about the query (e.g. number of hits) and access to the individual results.

  • $hits = $results->hits;

    Returns the number of results for the query. If zero and no errors were reported after calling $search->execute then the query returned zero results.

  • @parsed_words = $results->parsed_words( $index_name );

    Returns an array of tokenized words and operators with stopwords removed. This is the array of tokens used by swish for the query.

    $index_name must match one of the index files specified on the creation of the swish object (via the SWISH::API->new call).

    The parsed words are useful for highlighting search terms in associated documents.

  • @removed_stopwords = $results->removed_stopwords( $index_name );

    Returns an array of stopwords removed from a query, if any, for the index specified.

    $index_name must match one of the index files specified on the creation of the swish object (via the SWISH::API->new call).

  • $results->seek_result( $position );

    Seeks to the position specified in the result list. Zero is the first position and $results->hits-1 is the last position. Seeking past the end of results sets a non-critical error condition.

    Useful for seeking to a specific "page" of results.

  • $result = $results->next_result;

    Fetches the next result from the list of results. Returns undef if no more results are available. $result is an object blessed into the SWISH::API::Result class.

SWISH::API::Result - Result Methods

The following methods provide access to data related to an individual result.

  • $prop = $result->property( $prop_name );

    Fetches the property specified for the current result. An invalid property name will cause an exception (which can be caught by wrapping the call in an eval block).

    Can return undefined.

    Date properties are returned as a timestamp. Use something like Date::Format to format the strings (or just call scalar localtime( $prop ) ).

  • $prop = $result->result_property_str( $prop_name );

    Fetches and formats the property. Unlike above, invalid property names return the string "(null)" -- this will likely change to match the above (i.e. throw an exception).

    Undefined values are returned as the empty string ("").

  • $value = $result->result_index_value( $header_name );

    Returns the header value specified. This is similar to $swish->header_value(), but the index file is not specified (it is determined by the result).

Utility Methods

  • @metas = $swish->meta_list( $index_name );

    Swish-e has "MetaNames" which allow searching by fields in the index. This method returns information about the Metanames.

    Pass in the name of an open index file and the method returns a list of SWISH::API::MetaName objects. Three methods are currently defined on these objects:

        $meta->name;
        $meta->id;
        $meta->type;

    name returns the name of the meta as defined in the MetaNames config option when the index was created.

    The id is the internal ID number used to represent the meta name.

    type is the type of metaname. Currently only one type exists and its value is zero.

  • @props = $swish->property_list( $index_name );

    Swish-e can store content or "properties" in the index and return this data when running a query. A document's path, URL, title, size, date or summary are examples of properties. Each property is accessed via its PropertyName. This method returns information about the PropertyNames stored in the index.

    Pass in the name of an open index file and the method returns a list of SWISH::API::MetaName objects. Three methods are currently defined on these objects:

        $prop->name;
        $prop->id;
        $prop->type;

    name returns the name of the meta as defined in the MetaNames config option when the index was created.

    The id is the internal ID number used to represent the meta name.

    type is the type of metaname. Currently only one type exists and its value is zero.

  • @props = $result->property_list;
  • @meta = $result->meta_list;

    These also return a list of Property or Metaname description objects, but are accessed via a result record. Since the result comes from a specific index file there's no need to specify the index file name.

  • $stemmed_word = $swish->stem_word( $word );

    *Deprecated*

    Returns the stemmed version of the passed in word.

    Deprecated because it only stems using the original Porter Stemmer and uses a shared memory location in the SW_HANDLE object to store the stemmed word. See below for other stemming options.

  • $fuzzy_word = $swish->Fuzzify( $indexname, $word );

    Like stem_word() used to work, only it uses whatever stemmer is named in $indexname. Returns the same kind of fuzzy_word object as the fuzzy_word() method.

  • $mode_string = $result->fuzzy_mode;

    Returns the string (e.g. "Stemming_en", "Soundex", "None" ) indicating the stemming method used while indexing the given document.

  • $fuzzy_word = $result->fuzzy_word( $word );

    Converts $word using the same fuzzy mode used to index the $result. Returns a SWISH::API::fuzzy_word object. Methods on the object are used to access the converted words and other data as shown below.

  • $count = $fuzzy_word->word_count;

    Returns the number of output words. Normally this is the value one, but may be more depending on the stemmer used. DoubleMetaphone can return two strings for a single input string.

  • $status = $fuzzy_word->word_error;

    Returns any error code that the stemmer might set. Normally, this return value is zero, indicating that the stemming/fuzzy operation succeeded. The values returned are defined in the swish-e source file /src/stemmer.h.

  • @words = $fuzzy_word->word_list;

    Returns the converted words from the stemming/fuzzy operation. Normally, the array will contain a single element, although it may contain more (e.g. if DoubleMetaphone is used and the input word returns two strings).

    In the event that a word does not stem (e.g. trying to stem a number), this method will return the original input word specified when $result->fuzzy_word( $word ) was called.

  • @parsed_words = $swish->swish_words( $string, $index_file );

    * Not implemented *

    Splits up the input string into tokens of swish words and operators.

NOTES

Perl's garbage collection makes it easy to write code for searching with Swish-e, but care must be taken not to keep objects around too long which can use up memory.

Here's an example of a potential problem. Say you have a very large number of documents indexed and you want to find the first hit for a number of popular keywords (error checking omitted in this bad example):

    sub first_hit {
      my $query = shift;
      my $handle = SWISH::API->new( 'index.swish-e');
      my $results = $handle->query( $query );
      my $first_hit = $results->next_result;
      return $first_hit;
    }

    my @first_hit_list;
    for ( @keywords ) {
        push @first_hit_list, first_hit($_);
    }

The first_hit() subroutine is returning a SWISH::API::Result object. That makes it easy to access properties:

   # print file names
   for my $result ( @first_hit_list ) {
      print $result->property('swishdocpath'),"\n";
   }

But as long as a SWISH::API::Result object is around, so is the entire list of results generated by the $handle->query() call, and the index file is still open (because a SWISH::API::Result depends on a SWISH::API::Results object, which depends on a SWISH::API object).

In this case it would be better to return from first_hit() just the properties you need:

      ...
      my $first_hit = $results->next_result;
      return $first_hit->property('swishdocpath');
   }

Then when first_hit() sub ends the result list will be freed, and the index file closed, thanks to Perl's reference count tracking.

Note: the other problem with the above code is that the same index file is opened for each call to the function. Don't do that; instead, open the index file once and reuse the handle.
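The lifetime rule described in these notes can be observed with plain Perl objects. The sketch below does not use SWISH::API at all: Demo::Handle and Demo::Result are hypothetical stand-ins that only mimic the dependency of a result on its handle, so that the effect of reference counting is visible without an index.

```perl
use strict;
use warnings;

my @log;    # records when the stand-in handle is destroyed

# Hypothetical stand-in for a SWISH::API handle holding an open index
package Demo::Handle;
sub new     { my ($class) = @_; return bless {}, $class }
sub DESTROY { push @log, 'handle freed' }

# Hypothetical stand-in for a result; it holds a reference to its handle
package Demo::Result;
sub new {
    my ( $class, $handle ) = @_;
    return bless { handle => $handle, path => '/docs/a.html' }, $class;
}
sub path { my ($self) = @_; return $self->{path} }

package main;

# Returning the object keeps the handle alive as long as the caller holds it
sub first_hit_object { return Demo::Result->new( Demo::Handle->new ) }

# Returning only the needed string lets everything else be freed at once
sub first_hit_path { return Demo::Result->new( Demo::Handle->new )->path }

my $kept         = first_hit_object();
my $freed_before = scalar @log;    # handle still referenced by $kept
my $path         = first_hit_path();
my $freed_after  = scalar @log;    # temporary handle already destroyed

undef $kept;                       # dropping the result frees its handle
print "$path: $freed_before, $freed_after, ", scalar(@log), "\n";
```

The same reasoning applies to the real classes: a list of SWISH::API::Result objects pins every handle and result list behind it, while a list of property strings pins nothing.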

COPYRIGHT

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

AUTHOR

Bill Moseley moseley@hank.org. 2002/2003/2004

SUPPORT

Please contact the Swish-e discussion email list for support with this module or with Swish-e. Please do not contact the developers directly.

swish-e-2.4.7/html/filter.html

SWISH::Filter - Perl extension for filtering documents with Swish-e

Swish-e version 2.4.7


SYNOPSIS

  use SWISH::Filter;

  # load available filters into memory
  my $filter = SWISH::Filter->new;

  # convert a document

  my $doc = $filter->convert(
        document     => \$scalar_ref,  # path or ref to a doc
        content_type => $content_type, # content type if doc reference
        name         => $real_path,    # optional name for this file (useful for debugging)
        user_data    => $whatever,     # optional data to make available to filters
   );

  return unless $doc;  # empty doc, zero size, or no filters installed

  # Was the document converted by a filter?
  my $was_filtered = $doc->was_filtered;

  # Skip if the file is not text
  return if $doc->is_binary;

  # Print out the doc
  my $doc_ref = $doc->fetch_doc;
  print $$doc_ref;

  # Fetch the final content type of the document
  my $content_type = $doc->content_type;

  # Fetch Swish-e parser type (TXT*, XML*, HTML*, or undefined)
  my $doc_type = $doc->swish_parser_type;

DESCRIPTION

SWISH::Filter provides a unified way to convert documents into a type that Swish-e can index. Individual filters are installed as separate perl modules. For example, there might be a filter that converts from PDF format to HTML format.

Note that this is just a framework for filtering documents. Additional helper programs or Perl modules may need to be installed to use SWISH::Filter to filter documents. For example, to filter PDF documents you must install the Xpdf package.

The filters are automatically loaded when SWISH::Filter->new() is called. Filters define a type and priority that determines the processing order of the filter. Filters are processed in this sort order until a filter accepts the document for filtering. The filter uses the document's content type to determine if the filter should handle the current document. The content-type is determined by the file's suffix if not supplied by the calling program.

The individual filters are not designed to be used as separate modules. All access to the filters is through this SWISH::Filter module.

Normally, once a document is filtered processing stops. Filters can filter the document and then set a flag saying that filtering should continue (for example a filter that uncompresses a MS Word document before passing on to the filter that converts from MS Word to text). All this should be transparent to the end user. So, filters can be pipe-lined.

The idea of SWISH::Filter is that new filters can be created, and then downloaded and installed to provide new filtering capabilities. For example, if you needed to index MS Excel documents you might be able to download a filter from the Swish-e site, and the next time you run indexing MS Excel docs would be indexed.

The SWISH::Filter setup can be used with -S prog or -S http. It works best with the -S prog method because the filter modules only need to be loaded and compiled one time. The -S prog program spider.pl will automatically use SWISH::Filter when spidering with default settings (using "default" as the first parameter to spider.pl).

The -S http indexing method uses a Perl helper script called swishspider. swishspider has been updated to work with SWISH::Filter, but (unlike spider.pl) does not contain a "use lib" line to point to the location of SWISH::Filter. This means that by default swishspider will not use SWISH::Filter for filtering. The reason is that swishspider runs for every URL fetched, and loading the filters for each document can be slow. The recommended way of spidering is using -S prog with spider.pl, but if -S http is desired the way to enable SWISH::Filter is to set PERL5LIB before running swish so that swishspider will be able to locate the SWISH::Filter module. Here's one way to set PERL5LIB with the bash shell:

  $ export PERL5LIB=`swish-filter-test -path`

METHODS

  • $filter = SWISH::Filter->new()

    This creates a SWISH::Filter object. You may pass in options as a list or a hash reference.

SWISH::Filter->new Options

There is currently only one option that can be passed in to new():

  • ignore_filters

    Pass in a reference to a list of filter names to ignore. For example, if you have two filters installed "Pdf2HTML" and "Pdf2XML" and want to avoid using "Pdf2XML":

        my $filter = SWISH::Filter->new( ignore_filters => ['Pdf2XML'] );

  • $doc_object = $filter->convert();

    This method filters a document. Returns an object of the class SWISH::Filter::document or undefined if passed in an empty document, a filename that cannot be read off disk, or if no filters have been loaded.

    SWISH::Filter::document methods listed below can be called on the object to, for example, check if the document was filtered and to fetch the document content (filtered or not).

    You must pass in a hash (or hash reference) of parameters to the convert() method. The possible parameters are:

    • document

      This can be either a path to a file, or a scalar reference to a document in memory. This is required.

    • content_type

      The MIME type of the document. This is only required when passing in a scalar reference to a document. The content type string is what the filters use to match a document type.

      When passing in a file name and content_type is not set, then the content type will be determined from the file's extension by using the MIME::Types Perl module (available on CPAN).

    • name

      Optional name to pass in to filters that will be used in error and warning messages.

    • user_data

      Optional data structure that all filters may access. This can be fetched in a filter by:

          my $user_data = $doc_object->user_data;

      And used in the filter as:

          if ( ref $user_data && $user_data->{pdf2html}{title} ) {
             ...
          }

      It's up to the filter author to use a unique first-level hash key for a given filter.

    Example of using the convert() method:

        $doc_object = $filter->convert(
            document     => $doc_ref,
            content_type => 'application/pdf',
        );

  • $filter->mywarn()

    Internal function used for writing warning messages to STDERR if $ENV{FILTER_DEBUG} is set. Set the environment variable FILTER_DEBUG before running to see extra messages while processing.

  • @filters = $filter->filter_list;

    Returns a list of filter objects installed.

  • @filter = $filter->can_filter( $content_type );

    This is useful for testing to see if a mimetype might be handled by SWISH::Filter without having to pass in a document. Helpful if doing HEAD requests.

    Returns an array of filters that can handle this type of document.

WRITING FILTERS

Filters are standard perl modules that are installed into the SWISH::Filters name space. Filters are not complicated -- see the existing filters for examples.

Each filter defines the content-types (or mimetypes) that it can handle. These are specified as a list of regular expressions to match against the document's content-type. If one of the mimetypes of a filter matches the incoming document's content-type the filter is called. The filter can then either filter the content or return undefined indicating that it decided not to filter the document for some reason. If the document is converted the filter returns either a reference to a scalar of the content or a file name where the content is stored. The filter also must change the content-type of the document to reflect the new document.
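The matching step can be pictured with a few lines of plain Perl. This is a minimal sketch, not the module's own code: matching_pattern() is a hypothetical helper, and the two patterns are simply the MS Word patterns used elsewhere in this document.

```perl
use strict;
use warnings;

# Patterns a filter might list in its "mimetypes" attribute
my @mimetypes = (
    qr[application/(x-)?msword],
    qr[application/worddoc],
);

# Hypothetical helper: return the first pattern that matches the
# content type, or undef when the filter does not handle it
sub matching_pattern {
    my ($content_type) = @_;
    for my $pattern (@mimetypes) {
        return $pattern if $content_type =~ $pattern;
    }
    return;
}

for my $type ( 'application/x-msword', 'application/msword', 'text/html' ) {
    my $handled = defined matching_pattern($type) ? 'handled' : 'skipped';
    print "$type: $handled\n";
}
```

Because the patterns are not anchored, they match anywhere in the content-type string; a filter author who needs an exact match would anchor the pattern with ^ and $.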

Filters typically use external programs or modules to do the actual work of converting a document from one type to another. For example, programs in the Xpdf packages are used for converting PDF files. The filter can (and should) test for those programs in its new() method.

Filters also can define a type and priority. These attributes are used to set the order filters are tested for a content-type match. This allows you to have more than one filter that can work on the same content-type.

If a filter calls die() then the filter is removed from the chain and will not be called again during the same run. Calling die when running with -S http or -S fs has no effect since the program is run once per document.

Once a filter returns something other than undef no more filters will be called. If the filter calls $doc->set_continue then processing will continue as if the file was not filtered. For example, a filter can uncompress data and then call $doc->set_continue and let other filters process the document.

This is the list of methods the filter should or may define (as specified):

  • new() * required *

    This method returns either an object which provides access to the filter, or undefined if the filter is not to be used.

    The new() method is a good place to check for required modules or helper programs. Returning undefined prevents the filter from being included in the filter chain.

    The new method must return a blessed hash reference. The only required attribute is mimetypes. This attribute must contain a reference to an array of regular expressions used for matching the content-type of the document passed in.

    Example:

        sub new {
            my ( $class ) = @_;
    
            # List of regular expressions
            my @mimetypes = (
                qr[application/(x-)?msword],
                qr[application/worddoc],
            );
    
            my %settings = (
                mimetypes   => \@mimetypes,
    
                # Optional settings
                priority    => 20,
                type        => 2,
            );
    
            return bless \%settings, $class;
        }

    The attribute "mimetypes" returns an array reference to a list of regular expressions. Those patterns are matched against each document's content type.

  • filter() * required *

    This is the function that does the work of converting a document from one content type to another. The function is passed the document object. See document object methods listed below for what methods may be called on a document.

    The function can return undefined (or any false value) to indicate that the filter did not want to process the document. Other filters will then be tested for a content type match.

    If the document is filtered then the filter must set the new document's content type (if it changed) and return either a file name where the document can be found or a reference to a scalar containing the document.

  • type()

    Returns a number. Filters are sorted (for processing in a specific order) and this number is simply the primary key used in sorting. If not specified the filter's type used for sorting is 2.

    This is an optional method. You can also set the type in your new() constructor as shown above.

  • priority()

    Returns a number. Filters are sorted (for processing in a specific order) and this number is simply the secondary key used in sorting. If not specified the filter's priority is 50.

    This is an optional method. You can also set the priority in your new() constructor as shown above.

Again, the point of the type() and priority() methods is to allow setting the sort order of the filters. This is useful if you have two filters for filtering the same content-type, but prefer to use one over the other. Neither is required.
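The two-level ordering can be sketched as an ordinary Perl sort. The filter names and numbers below are made up for illustration, and the sketch assumes that lower numbers sort first for both keys, which is how priority is described in this document; it is not the module's actual sorting code.

```perl
use strict;
use warnings;

# Hypothetical filter descriptions; names and numbers are illustrative only
my @filters = (
    { name => 'Doc2txt',    type => 2, priority => 50 },
    { name => 'Decompress', type => 1, priority => 50 },
    { name => 'Doc2html',   type => 2, priority => 20 },
);

# Primary key: type. Secondary key: priority. Lower numbers come first.
my @ordered = sort {
         $a->{type}     <=> $b->{type}
      || $a->{priority} <=> $b->{priority}
} @filters;

my @names = map { $_->{name} } @ordered;
print join( ', ', @names ), "\n";
```

Here the type 1 filter runs before both type 2 filters, and within type 2 the filter with priority 20 is tried before the one with the default priority of 50.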

Here's a module to convert MS Word documents using the program "catdoc":

    package SWISH::Filters::Doc2txt;
    use vars qw/ $VERSION /;

    $VERSION = '0.02';

    sub new {
        my ( $class ) = @_;

        my $self = bless {
            mimetypes   => [ qr!application/(x-)?msword! ],
            priority    => 50,
        }, $class;

        # check for helpers
        return $self->set_programs( 'catdoc' );

    }

    sub filter {
        my ( $self, $doc ) = @_;

        my $content = $self->run_catdoc( $doc->fetch_filename ) || return;

        # update the document's content type
        $doc->set_content_type( 'text/plain' );

        # return the document
        return \$content;
    }
    1;

The new() constructor creates a blessed hash which contains an array reference of mimetypes patterns that this filter accepts. The priority sets this filter to run after any other filters that might handle the same type of content. The set_programs() function says that we need to call a program called "catdoc". The function either returns $self or undefined if catdoc could not be found. The set_programs() function creates a new method for running catdoc.

The filter function runs catdoc passing in the name of the file (if the file is in memory a temporary file is created). That run_catdoc() function was created by the set_programs() call above.

SWISH::Filter::document Methods

These methods are available to Filter authors, and also provide access to the document after calling the convert() method to end-users of SWISH::Filter.

End users of SWISH::Filter will use a subset of these methods. Mostly:

   $doc_object->fetch_doc      # an alias for fetch_doc_reference()
   $doc_object->was_filtered   # true if the document was filtered
   $doc_object->content_type   # document's current content type (mime type)
   $doc_object->swish_parser_type # returns a parser type to use with Swish-e -S prog method
   $doc_object->is_binary      # returns $content_type !~ m[^text/];

These methods are also available to the individual filter modules. The filter's "filter" function is also passed a SWISH::Filter::document object. Method calls may be made on this object to check the document's current content type, or to fetch the document as either a file name or a reference to a scalar containing the document content.

Methods used by end-users and filter authors

  • $doc_ref = $doc_object->fetch_doc_reference;

    Returns a scalar reference to the document. This can be used when the filter can operate on the document in memory (or if an external program expects the input to be from standard input).

    If the file is currently on disk then it will be read into memory. If the file was stored in a temporary file on disk the file will be deleted once read into memory. The file will be read in binmode if $doc->is_binary is true.

    Note that $doc_object->fetch_doc is an alias.

  • $was_filtered = $doc_object->was_filtered

    Returns true if some filter processed the document

  • $content_type = $doc_object->content_type;

    Fetches the current content type for the document.

    Example:

        return unless $doc_object->content_type =~ m!application/pdf!;

  • $type = $doc_object->swish_parser_type

    Returns a parser type based on the content type

  • $doc_object->is_binary

    Returns true if the document's content-type does not match "text/".

Methods used by filter authors

  • $file_name = $doc_object->fetch_filename;

    Returns a path to the document as stored on disk. This name can be passed to external programs (e.g. catdoc) that expect input as a file name.

    If the document is currently in memory then a temporary file will be created. Do not expect the file name passed to be the real path of the document.

    The file will be written in binmode if $doc->is_binary is true.

    This method is not normally used by end-users of SWISH::Filter.

  • $doc_object->set_continue;

    Processing will continue to the next filter if this is set to a true value. This should be set for filters that change encodings or uncompress documents.

  • $doc_object->set_content_type( $type );

    Sets the content type for a document.

  • $doc_object->name

    Fetches the name of the current file. This is useful for printing out the name of the file in an error message. This is the name passed in to the SWISH::Filter->convert method. It is optional and thus may not always be set.

        my $name = $doc_object->name || 'Unknown name';
        warn "File '$name': failed to convert -- file may be corrupt\n";
  • $doc_object->user_data

    Fetches the user_data passed in to the filter. This can be any data or data structure passed into SWISH::Filter->convert.

    This is an easy way to pass special parameters into your filters.

    Example:

        my $data = $doc_object->user_data;
        # see if a choice for the <title> was passed in
        if ( ref $data eq 'HASH' && $data->{pdf2html}{title_field} ) {
           ...
           ...
        }

SWISH::Filters::_BASE

Each filter is a subclass of SWISH::Filters::_BASE. A number of methods are available by default (and some can be overridden). Others are useful when writing your new() constructor.

  • $self->type

    This method fetches the type of the filter. The value returned sets the primary sort key for sorting the filters. You can override this in your filter, or just set it as an attribute in your object. The default is 2.

    The idea of the "type" is to create groups of filters, if needed. For example, you might have a set of filters that are used for uncompressing some documents before passing on to another group for filtering.

  • $self->priority

    This method fetches the priority of the filter. The value returned sets the secondary sort key for sorting the filters. You can override this in your filter, or just set it as an attribute in your object. The default method returns 50.

    The priority is useful if you have multiple filters for the same content type that use different methods for filtering (say one uses wvWare and another uses catdoc for filtering MS Word files). You might give the wvWare filter a lower priority number so it runs before the catdoc filter if both wvWare AND catdoc happen to be installed at the same time.

  • @types = $self->mimetypes

    Returns the list of mimetypes (as regular expressions) set for the filter.

  • $pattern = $self->can_filter_mimetype( $content_type )

    Returns true if the passed-in content type matches one of the filter's mimetypes. The true value returned is the pattern that matched.

  • mywarn( $message )

    Method for printing out a message if debugging is enabled.

  • $boolean = $self->set_programs( @program_list );

    Returns true if all the programs listed in @program_list are found and can be executed as the current user. Creates a method for each program with the "run_" prefix. Returns false if ANY program cannot be found.

    Actually, it returns $self, so you can make it the last statement in your constructor.

    So in your constructor you might do:

        return $self->set_programs( qw/ pdftotext pdfinfo / );

    Then in your filter() method:

        my $content = $self->run_pdfinfo( $doc->fetch_filename, [options] );
  • $path = $self->find_binary( $prog );

    Use in a filter's new() method to test for a necessary program located in $PATH. Returns the path to the program or undefined if not found or does not pass the -x file test.

  • $bool = $self->use_modules( @module_list );

    Attempts to load each of the modules listed and calls its import() method.

    Use to test and load required modules within a filter without aborting.

        return unless $self->use_modules( qw/ Spreadsheet::ParseExcel  HTML::Entities / );

    A warning message is displayed if the FILTER_DEBUG environment variable is true. Returns $self if no error.

  • $doc_ref = $self->run_program( $program, @args );

    Runs $program with @args. Must pass in @args.

    Under Windows calls IPC::Open2, which may pass data through the shell. Double-quotes are escaped (backslashed) and each parameter is wrapped in double-quotes.

    On other platforms a fork and exec is used to avoid passing any data through the shell. Returns a reference to a scalar containing the output from your program, or dies.

    This method is intended to read output from a program that converts one format into text. The output is read back in text mode -- on systems like Windows this means \r\n (CRLF) will be converted to \n.
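The Windows quoting rule described for run_program() (backslash each double-quote, then wrap each parameter in double-quotes) can be sketched as follows. quote_arg() is a hypothetical helper written for this illustration, and "converter" with its arguments is a made-up command; none of this is the module's actual implementation.

```perl
use strict;
use warnings;

# Hypothetical helper: backslash-escape double quotes, then wrap the
# whole argument in double quotes
sub quote_arg {
    my ($arg) = @_;
    $arg =~ s/"/\\"/g;
    return qq{"$arg"};
}

# Made-up program name and arguments, for illustration only
my @args    = ( 'plain', 'say "hi"', 'my file.doc' );
my $command = join ' ', 'converter', map { quote_arg($_) } @args;

print "$command\n";
```

On platforms where fork and exec are used no such quoting is needed, because the arguments are never interpreted by a shell.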

TESTING

Filters can be tested with the swish-filter-test program. Run:

   swish-filter-test -man

for documentation.

SUPPORT

Please contact the Swish-e discussion list. http://swish-e.org

Bugs, todo items, and other notes

TBD

AUTHOR

Bill Moseley

COPYRIGHT

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

swish-e-2.4.7/html/search.cgi.html

search.cgi -- Example Perl program for searching with Swish-e and SWISH::API

Swish-e version 2.4.7


DESCRIPTION

This is a very simple program that shows how to use the SWISH::API module in a CGI script or mod_perl handler using Template-Toolkit to generate output. This program is intended for programmers that want to create a custom search script.

Unlike swish.cgi this script does not have many features, and provides no external configuration (with the exception of a few config options under mod_perl). So don't ask why it doesn't do something. The point is that this script is used as a starting point that YOU customize.

REQUIREMENTS

You must have swish-e and the SWISH::API module installed. See the README and INSTALL documents in the swish-e distribution. As of this writing SWISH::API is part of the swish-e distribution, but in the future may be provided as a separate package (provided on the CPAN). In either case SWISH::API is a separate installation procedure from installing swish-e. The Storable module is also required if using mod_perl.

This program does require that some modules are installed from CPAN. You will need Template-Toolkit and HTML::FillInForm (which depends on HTML::Parser). How those are installed depends on your computer's packaging system.

You will need a web server, obviously. The discussion below assumes Apache is used. If you are using MS IIS take note that IIS works differently in a number of ways.

OVERVIEW

The search.cgi script and related templates are installed when swish-e is installed. search.cgi is installed in $prefix/lib/swish-e/ and templates are installed in $prefix/share/swish-e/templates/. $prefix is /usr/local by default but can be changed when running the swish-e configure script. Upon installation search.cgi is updated with correct paths to your perl binary and

When running as a CGI script search.cgi is copied or symlinked to the location of your CGI scripts (or any directory that allows CGI scripts). By default, the search.cgi script looks for the index index.swish-e in the current directory (that's what the web server considers the current directory). On Apache running mod-cgi that's the same place as the script. On IIS it's not. If your index is elsewhere you will need to modify the script.

The script works by parsing the query, calling SWISH::API to run the actual search, then calling Template-Toolkit to generate the output.

The script calls the search.tt template. This template generates the query form and the search results. The search.tt template uses a Template-Toolkit "WRAPPER" function to wrap the search form and results in your site's design. This design is in the page_layout template. The idea is if you use Template-Toolkit to manage your entire site then your entire site would be formatted by the same page_layout template. The page_layout template calls two other templates common_header and common_footer to generate a common header and footer for the site. Those are just demonstrating Template-Toolkit's features.

The page_layout page only defines the basic structure of the site. The true design of the site is managed by style sheets. style.css defines the basic layout and markup.css sets fonts and colors.

Note: these style sheets are included directly in the output of the CGI script. In production the style sheets would be stored as separate style sheet files and imported by the browser instead of directly included in the search results page.

See the section MOD_PERL below for more on templates.

Highlighting of search terms is provided by the SWISH::PhraseHighlight module. That is a very slow module, so you may wish to disable it if you expect a lot of traffic.

INSTALLATION EXAMPLE

Enough talking, sometimes it's nice to see a complete example. Below swish-e is installed in the default location (/usr/local). The "$" is a normal user prompt, where "#" is a root prompt. Use ./configure --prefix to install in another location (e.g. if you do not have root access).

Download and install swish-e

    $ wget -q http://swish-e.org/Download/latest.tar.gz
    $ tar zxf latest.tar.gz
    $ cd swish-e-2.x.x
    $ (./configure && make) >/dev/null
    $ make check
    $ su
    # make install
    # exit

Install SWISH::API

    $ cd perl
    $ perl Makefile.PL && make && make test
    $ su
    # make install
    # exit

Install required Perl modules. You can install via RPMs, Debs or directly from the CPAN or by using the CPAN shell.

    $ su
    # perl -MCPAN -e 'install Template'
    # perl -MCPAN -e 'install HTML::FillInForm'
    # exit

Now setup the script in someplace that allows CGI scripts.

    $ cd $HOME/apache
    $ ln -s /usr/local/lib/swish-e/search.cgi .
    $ cat .htaccess
    deny from all
    <files search.cgi>
        allow from all
        SetHandler cgi-script
        Options +ExecCGI
    </files>

Create an index

    $ cat swish.config
    IndexOnly .htm .html
    DefaultContents HTML*
    StoreDescription HTML* <body>
    metanames swishtitle swishdocpath

    $ swish-e -c swish.config -i /usr/share/doc/apache-doc/manual

Test the index and the CGI script:

    $ swish-e -w apache -m1 | grep hits
    # Number of hits: 152

    $ lynx -dump http://localhost/apache/search.cgi?query=apache | grep hits
        Showing page 1 (1 - 10 of 152 hits) [3]Next
              'hits' => 152,

Now, the above isn't very helpful because the Apache documentation indexed is not in the web space. You would likely index content available on your web site.

Using with SpeedyCGI

A Perl CGI script must be compiled for each request. SpeedyCGI is a tool to speed up scripts by running them persistently. To run search.cgi with SpeedyCGI install the program (you can Google, right?) and then change the first line of search.cgi to run the speedy program.

For example:

    #!/usr/bin/speedy -w

Using with MOD_PERL

This script can be run directly as a mod_perl handler, and the same code can be used to run multiple sites by using separate Location directives and passing in a "site id." The script caches in memory different configurations based on this site id.

Below is a complete httpd.conf file. It requires an Apache httpd that has mod_perl compiled in statically. It runs mod_perl on a high port (port 5000) listening to all interfaces.

For testing I put this config file in a directory along with search.cgi, but that's just done to make the example simple (i.e. so I don't have to show any absolute paths). Normally the httpd.conf and the search.cgi "module" would be in separate locations.

    # httpd.conf -- test file for search.cgi as mod_perl handler

    <IfModule mod_so.c>
        LoadModule mime_module /usr/lib/apache/1.3/mod_mime.so
    </IfModule>

    ErrorLog swish_error_log
    PidFile swish_httpd.pid

    Listen *:5000

    <perl>
        push @PerlSetVar, [
            index  => Apache->server_root_relative( 'index.swish-e'),
        ];
        $DocumentRoot =  Apache->server_root_relative;
        require "search.cgi";
    </perl>

    NameVirtualHost *:5000
    <VirtualHost *:5000>

        ServerName localhost

        <Location /search>
            SetHandler  perl-script
            PerlHandler SwishAPISearch
        </Location>

        <Location /othersite>
            SetHandler perl-script
            PerlHandler SwishAPISearch
            # Define this site
            PerlSetVar  site_id othersite
            PerlSetVar  title "Some other Site"
        </Location>

    </VirtualHost>

The server is started using this command:

    $ /usr/sbin/apache-perl -d $(pwd) -f $(pwd)/httpd.conf

which says to use the current directory as the ServerRoot. (See comments below.) Stop the server with:

    $ kill `cat swish_httpd.pid`

Then access either:

    http://localhost:5000/search
    http://localhost:5000/othersite
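
The kill command shown earlier complains if the pid file is missing (for example, if the server never started). Here is a minimal sketch of a more careful stop, assuming the PidFile name from the httpd.conf above. The function name stop_by_pidfile is made up for this example.

```shell
# stop_by_pidfile FILE: send the default TERM signal to the process whose
# PID is stored in FILE. Returns 1 if FILE is missing or empty.
# Hypothetical helper, not part of Swish-e or Apache.
stop_by_pidfile() {
    pidfile=$1
    if [ ! -s "$pidfile" ]; then
        echo "no pid file: $pidfile" >&2
        return 1
    fi
    kill "$(cat "$pidfile")"
}

# Usage, from the directory used as the ServerRoot:
#   stop_by_pidfile swish_httpd.pid
```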

A few notes:

I like test configurations to not care where things are located. Thus, the above httpd.conf does a few tricks in the "Perl Section" shown.

First, mod_perl, unlike CGI, doesn't set the working directory. So, the index file name must be absolute. This is accomplished by a PerlSetVar entry building the index file name from the ServerRoot.

Second, the DocumentRoot is set to the same as the ServerRoot. The DocumentRoot needs to be set so search.cgi can figure out the path to the script (for creating next and previous links).

Third, the script is loaded by a require statement. This works only because the current directory "." is in Perl's @INC path at Apache start up time and search.cgi is also in the current directory. Normally, set PERL5LIB on server startup or use a "use lib" line in your startup.pl file to point to the location of search.cgi.

The "PerlSetVar" lines pass config information into the script. Note that they can be set globally or specific to a given Location.

The following config options are currently available:

  • site_id

    The site_id option allows caching of configurations on a per-site basis. It's overkill in this example, but normally you might have expensive configuration processing that you want to do only once. Because the cache is keyed by this id, it's a good idea to set a site_id if you use more than one Location directive.

  • index

    This specifies the index file to use. The index file needs to be absolute as discussed above. Example:

        PerlSetVar index /usr/share/swish/site.index

  • title

    This option sets the title that's passed into the template.

  • template

    Sets the file name of the template used to generate the form. This might be useful if you want an "advanced" form, for example.

  • template_path

    This can be used to update the path where templates are searched. Useful if you wish to override templates.

  • page_size

    This allows changing the default number of results shown per page.
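
When several sites share one server, the Location sections differ only in a few PerlSetVar values. The sketch below prints one such section from shell arguments. It is only an illustration: the function name location_block is made up, it emits only directives that appear in the httpd.conf above, and it rejects a relative index path because mod_perl does not set the working directory.

```shell
# location_block PATH SITE_ID TITLE INDEX: print a <Location> section for
# httpd.conf on stdout. Hypothetical helper, not part of Swish-e.
location_block() {
    path=$1
    site_id=$2
    title=$3
    index=$4
    # The index file name must be absolute under mod_perl.
    case $index in
        /*) ;;
        *) echo "index must be an absolute path: $index" >&2; return 1 ;;
    esac
    cat <<EOF
<Location $path>
    SetHandler  perl-script
    PerlHandler SwishAPISearch
    PerlSetVar  site_id $site_id
    PerlSetVar  title "$title"
    PerlSetVar  index $index
</Location>
EOF
}

# Usage:
#   location_block /othersite othersite "Some other Site" /usr/share/swish/site.index
```

The output would still need to be pasted inside the VirtualHost section by hand; the function does not touch httpd.conf itself.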

SUPPORT

Not much support is provided. But what support is provided is ONLY provided via the Swish-e discussion list.

    http://swish-e.org/

AUTHOR

Bill Moseley

LICENSE

Copyright 2003, 2004 Bill Moseley. All rights reserved.

This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

SEE ALSO

SWISH::API, Template, HTML::FillInForm

swish-e-2.4.7/html/readme.html0000644000077100017500000004314111166010461013114 00000000000000 Swish-e :: The Swish-e README File

The Swish-e README File

Swish-e version 2.4.7

Upgrading?

If you are upgrading Swish-e, please review the CHANGES file before installation. The index format may change and existing indexes may need to be re-created before use.

OVERVIEW

Swish-e is Simple Web Indexing System for Humans - Enhanced. Swish-e can quickly and easily index directories of files or remote web sites and search the generated indexes.

Swish-e is extremely fast in both indexing and searching, highly configurable, and can be seamlessly integrated with existing web sites to maintain a consistent design. Swish-e can index web pages, but can just as easily index text files, mailing list archives, or data stored in a relational database.

Swish-e is designed to index small- to medium-sized collections of documents. Although a few users are indexing over a million documents, typical usage is more often in the tens of thousands. Currently, Swish-e only indexes eight-bit character encodings.

Swish-e version 2.2 was a major rewrite of the code that added many new features. Memory requirements for indexing have been reduced and indexing speed is significantly improved from previous versions. New features allow more control over indexing, better document parsing, improved indexing and searching logic, better filter code, and the ability to index from any data source.

Swish-e version 2.4 includes a major rewrite of the C API and a new Perl module for accessing the Swish-e C library. In addition, Swish-e 2.4 uses the GNU Auto Tools. The significant changes are where files are installed, and the use of Libtool to create the Swish-e library as a shared library on many platforms. Basically, installation is easier than previous versions, and more files are installed in "standard" locations (e.g. documentation is installed in $prefix/share/doc/swish-e).

Note: Due to the new build and installation system in Swish-e 2.4, some documentation may incorrectly list the location of files. Please report any documentation errors to the Swish-e Discussion list.

Swish-e is not a "turn-key" indexing and searching solution. The Swish-e distribution contains most of the parts to create such a system, but you need to put the parts together as best meets your needs. This gives you the power to index and search your documents the way you wish and to seamlessly integrate a search engine into your web site or application.

To use Swish-e, you will need to configure Swish-e to index your documents, create an index by running Swish-e, and set up an interface such as a CGI script (a script is included) to search the index and display results. Swish-e uses helper programs to index documents of types that it cannot natively index. These programs may need to be installed separately from Swish-e.
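
As a rough sketch of those three steps from the command line (searching with swish-e directly rather than through a CGI script), the following writes a two-line configuration file and shows the commands that would use it. The function name write_minimal_config and the ./docs directory are made up for this example; IndexDir and IndexOnly are standard configuration directives.

```shell
# write_minimal_config DOCDIR CONFIG: write a two-line Swish-e config that
# indexes only .htm and .html files under DOCDIR. Hypothetical helper.
write_minimal_config() {
    docdir=$1
    config=$2
    {
        echo "IndexDir $docdir"
        echo "IndexOnly .htm .html"
    } > "$config"
}

# The three steps, assuming swish-e is on the PATH and ./docs holds the files:
#   write_minimal_config ./docs swish.config    # 1. configure
#   swish-e -c swish.config                     # 2. create index.swish-e
#   swish-e -w apache -m 1                      # 3. search the index
```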

Swish-e is an Open Source (see: http://opensource.org ) program supported by developers and a large group of users. Please take time to join the Swish-e discussion list at http://Swish-e.org .

Key features

  • Quickly index a large number of documents in different formats including text, HTML, and XML.

  • Use "filters" to index other types of files such as PDF, gzip, or PostScript.

  • Includes a web spider for indexing remote documents over HTTP. Follows Robots Exclusion Rules (including META tags).

  • Can use an external program to supply documents to Swish-e, such as an advanced spider for your web server or a program to read and format records from a relational database.

  • Document "properties" (some subset of the source document, usually defined as META or XML elements) may be stored in the index and returned with search results.

  • Document summaries can be returned with each search.

  • Word stemming, soundex, metaphone, and double-metaphone indexing for "fuzzy" searching

  • Phrase searching and wildcard searching

  • Limit searches to HTML links.

  • Use powerful Regular Expressions to select documents for indexing or exclusion.

  • Easily limit searches to parts or all of your web site.

  • Results can be sorted by relevance or by any number of properties in ascending or descending order.

  • Limit searches to parts of documents such as certain HTML tags (META, TITLE, comments, etc.) or to XML elements.

  • Can report structural errors in your XML and HTML documents.

  • Index file is portable between platforms.

  • A Swish-e library is provided to allow embedding Swish-e into your applications for very fast searching. A Perl module is available that provides a standard API for accessing Swish-e.

  • Includes example search script with context summaries and search term and phrase highlighting. Can be used with popular Perl templating systems.

  • Swish-e is fast.

  • It's Open Source and FREE! You can customize Swish-e and you can contribute your fancy new features to the project.

  • Supported by on-line user and developer groups.

Where do I get Swish-e?

The current version of Swish-e can be found at:

http://Swish-e.org

Please make sure you use a current version of Swish-e.

Information about Windows binary distributions can also be found at this site.

How Do I Install Swish-e?

Read the INSTALL page.

Building from source is recommended. On most platforms, Swish-e should build without problems. A list of platforms where Swish-e has been built can be found in the INSTALL page. Information on building for VMS and Win32 can be found in sub-directories of the src directory. Check the Swish-e site for information about binary distributions (such as for Windows).

In addition to the INSTALL page, make sure you read the SWISH-FAQ page if you have any questions, or to get an idea of questions that you might someday ask.

Problems or questions about installing Swish-e should be directed to the Swish-e discussion list (see the Swish-e web site at http://Swish-e.org).

Please read Where do I get help with Swish-e? below before posting any questions to the Swish-e list.

The Swish-e Documentation

Documentation is provided as HTML pages installed in $prefix/share/doc/swish-e where $prefix is /usr/local if building from source, or /usr if installed as part of a package from your OS vendor. Under Windows $prefix is selected at installation time.

A subset of the documentation is installed as system man pages as well.

Documentation is also available on-line at http://swish-e.org.
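
If you are not sure which prefix applies on a given machine, a small sketch like this can check the two usual locations in order. The function name first_existing_dir is made up for this example, and the two paths in the usage line are simply the /usr/local and /usr prefixes mentioned above.

```shell
# first_existing_dir DIR...: print the first argument that is an existing
# directory; return 1 if none of them exists. Hypothetical helper.
first_existing_dir() {
    for dir in "$@"; do
        if [ -d "$dir" ]; then
            printf '%s\n' "$dir"
            return 0
        fi
    done
    return 1
}

# Usage: try the source-build prefix first, then the vendor-package prefix.
#   first_existing_dir /usr/local/share/doc/swish-e /usr/share/doc/swish-e
```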

Patches or updates to the documentation should be done against the POD files, located in the pod directory of the distribution, or (preferably) against the CVS repository.

Where do I get help with Swish-e?

If you need help with installing or using Swish-e, please subscribe to the Swish-e mailing list. Visit the Swish-e web site (listed above) for information on subscribing to the mailing list.

Before posting any questions, please read QUESTIONS AND TROUBLESHOOTING.

Speling mistakes

Please contact the Swish-e list with corrections to this documentation. Any help in cleaning up the docs will be appreciated!

Any patches should be made against the .pod files, not the .html files.

Swish-e Development

Swish-e is currently being developed as an Open-Source project on SourceForge http://sourceforge.net.

Contact the Swish-e list for questions about Swish-e development.

Swish-e's History

SWISH was created by Kevin Hughes, circa 1994, to fill the need of the growing number of Web administrators on the Internet - many of the indexing systems were not well documented, were hard to use and install, and were too complex for their own good. The system was widely used for several years, long enough to collect some bug fixes and requests for enhancements.

In Fall 1996, The Library of UC Berkeley received permission from Kevin Hughes to implement bug fixes and enhancements to the original binary. The result is Swish-enhanced or Swish-e, brought to you by the Swish-e Development Team.

Document Info

Each document in the Swish-e distribution contains this section. It refers only to the specific page it's located in, and not to the Swish-e program or the documentation as a whole.

$Id: README.pod 1663 2005-02-11 17:00:13Z whmoseley $

swish-e-2.4.7/html/Makefile.in0000664000077100017500000002605211166010112013033 00000000000000# Makefile.in generated by automake 1.9.6 from Makefile.am. # @configure_input@ # Copyright (C) 1994, 1995, 1996, 1997, 1998, 1999, 2000, 2001, 2002, # 2003, 2004, 2005 Free Software Foundation, Inc. # This Makefile.in is free software; the Free Software Foundation # gives unlimited permission to copy and/or distribute it, # with or without modifications, as long as this notice is preserved. # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY, to the extent permitted by law; without # even the implied warranty of MERCHANTABILITY or FITNESS FOR A # PARTICULAR PURPOSE. @SET_MAKE@ # $id$ # # Conditionally install the html documentation srcdir = @srcdir@ top_srcdir = @top_srcdir@ VPATH = @srcdir@ pkgdatadir = $(datadir)/@PACKAGE@ pkglibdir = $(libdir)/@PACKAGE@ pkgincludedir = $(includedir)/@PACKAGE@ top_builddir = .. am__cd = CDPATH="$${ZSH_VERSION+.}$(PATH_SEPARATOR)" && cd INSTALL = @INSTALL@ install_sh_DATA = $(install_sh) -c -m 644 install_sh_PROGRAM = $(install_sh) -c install_sh_SCRIPT = $(install_sh) -c INSTALL_HEADER = $(INSTALL_DATA) transform = $(program_transform_name) NORMAL_INSTALL = : PRE_INSTALL = : POST_INSTALL = : NORMAL_UNINSTALL = : PRE_UNINSTALL = : POST_UNINSTALL = : build_triplet = @build@ host_triplet = @host@ subdir = html DIST_COMMON = $(srcdir)/Makefile.am $(srcdir)/Makefile.in ACLOCAL_M4 = $(top_srcdir)/aclocal.m4 am__aclocal_m4_deps = $(top_srcdir)/config/acinclude.m4 \ $(top_srcdir)/configure.in am__configure_deps = $(am__aclocal_m4_deps) $(CONFIGURE_DEPENDENCIES) \ $(ACLOCAL_M4) mkinstalldirs = $(SHELL) $(top_srcdir)/config/mkinstalldirs CONFIG_HEADER = $(top_builddir)/src/acconfig.h CONFIG_CLEAN_FILES = SOURCES = DIST_SOURCES = am__vpath_adj_setup = srcdirstrip=`echo "$(srcdir)" | sed 's|.|.|g'`; am__vpath_adj = case $$p in \ $(srcdir)/*) f=`echo "$$p" | sed "s|^$$srcdirstrip/||"`;; \ *) f=$$p;; \ esac; 
am__strip_dir = `echo $$p | sed -e 's|^.*/||'`; am__installdirs = "$(DESTDIR)$(htmldir)" htmlDATA_INSTALL = $(INSTALL_DATA) DATA = $(html_DATA) DISTFILES = $(DIST_COMMON) $(DIST_SOURCES) $(TEXINFOS) $(EXTRA_DIST) ACLOCAL = @ACLOCAL@ ALLOCA = @ALLOCA@ AMDEP_FALSE = @AMDEP_FALSE@ AMDEP_TRUE = @AMDEP_TRUE@ AMTAR = @AMTAR@ AR = @AR@ AS = @AS@ AUTOCONF = @AUTOCONF@ AUTOHEADER = @AUTOHEADER@ AUTOMAKE = @AUTOMAKE@ AWK = @AWK@ BTREE_OBJS = @BTREE_OBJS@ BUILDDOCS_FALSE = @BUILDDOCS_FALSE@ BUILDDOCS_TRUE = @BUILDDOCS_TRUE@ CC = @CC@ CCDEPMODE = @CCDEPMODE@ CFLAGS = @CFLAGS@ CPP = @CPP@ CPPFLAGS = @CPPFLAGS@ CXX = @CXX@ CXXCPP = @CXXCPP@ CXXDEPMODE = @CXXDEPMODE@ CXXFLAGS = @CXXFLAGS@ CYGPATH_W = @CYGPATH_W@ DEFS = @DEFS@ DEPDIR = @DEPDIR@ DLLTOOL = @DLLTOOL@ ECHO = @ECHO@ ECHO_C = @ECHO_C@ ECHO_N = @ECHO_N@ ECHO_T = @ECHO_T@ EGREP = @EGREP@ EXEEXT = @EXEEXT@ F77 = @F77@ FFLAGS = @FFLAGS@ INSTALLDOCS_FALSE = @INSTALLDOCS_FALSE@ INSTALLDOCS_TRUE = @INSTALLDOCS_TRUE@ INSTALL_DATA = @INSTALL_DATA@ INSTALL_PROGRAM = @INSTALL_PROGRAM@ INSTALL_SCRIPT = @INSTALL_SCRIPT@ INSTALL_STRIP_PROGRAM = @INSTALL_STRIP_PROGRAM@ LARGEFILES_MACROS = @LARGEFILES_MACROS@ LDFLAGS = @LDFLAGS@ LIBOBJS = @LIBOBJS@ LIBS = @LIBS@ LIBTOOL = @LIBTOOL@ LIBXML2_CFLAGS = @LIBXML2_CFLAGS@ LIBXML2_LIB = @LIBXML2_LIB@ LIBXML2_OBJS = @LIBXML2_OBJS@ LIBXML_REQUIRED_VERSION = @LIBXML_REQUIRED_VERSION@ LN_S = @LN_S@ LTLIBOBJS = @LTLIBOBJS@ MAINT = @MAINT@ MAINTAINER_MODE_FALSE = @MAINTAINER_MODE_FALSE@ MAINTAINER_MODE_TRUE = @MAINTAINER_MODE_TRUE@ MAKEINFO = @MAKEINFO@ OBJDUMP = @OBJDUMP@ OBJEXT = @OBJEXT@ PACKAGE = @PACKAGE@ PACKAGE_BUGREPORT = @PACKAGE_BUGREPORT@ PACKAGE_NAME = @PACKAGE_NAME@ PACKAGE_STRING = @PACKAGE_STRING@ PACKAGE_TARNAME = @PACKAGE_TARNAME@ PACKAGE_VERSION = @PACKAGE_VERSION@ PATH_SEPARATOR = @PATH_SEPARATOR@ PCRE_CFLAGS = @PCRE_CFLAGS@ PCRE_CONFIG = @PCRE_CONFIG@ PCRE_LIBS = @PCRE_LIBS@ PCRE_REQUIRED_VERSION = @PCRE_REQUIRED_VERSION@ PERL = @PERL@ POD2MAN = @POD2MAN@ RANLIB = @RANLIB@ 
SET_MAKE = @SET_MAKE@ SHELL = @SHELL@ STRIP = @STRIP@ SWISH_WEB = @SWISH_WEB@ VERSION = @VERSION@ XML2_CONFIG = @XML2_CONFIG@ Z_CFLAGS = @Z_CFLAGS@ Z_LIBS = @Z_LIBS@ ac_ct_AR = @ac_ct_AR@ ac_ct_AS = @ac_ct_AS@ ac_ct_CC = @ac_ct_CC@ ac_ct_CXX = @ac_ct_CXX@ ac_ct_DLLTOOL = @ac_ct_DLLTOOL@ ac_ct_F77 = @ac_ct_F77@ ac_ct_OBJDUMP = @ac_ct_OBJDUMP@ ac_ct_RANLIB = @ac_ct_RANLIB@ ac_ct_STRIP = @ac_ct_STRIP@ am__fastdepCC_FALSE = @am__fastdepCC_FALSE@ am__fastdepCC_TRUE = @am__fastdepCC_TRUE@ am__fastdepCXX_FALSE = @am__fastdepCXX_FALSE@ am__fastdepCXX_TRUE = @am__fastdepCXX_TRUE@ am__include = @am__include@ am__leading_dot = @am__leading_dot@ am__quote = @am__quote@ am__tar = @am__tar@ am__untar = @am__untar@ bindir = @bindir@ build = @build@ build_alias = @build_alias@ build_cpu = @build_cpu@ build_os = @build_os@ build_vendor = @build_vendor@ datadir = @datadir@ exec_prefix = @exec_prefix@ host = @host@ host_alias = @host_alias@ host_cpu = @host_cpu@ host_os = @host_os@ host_vendor = @host_vendor@ includedir = @includedir@ infodir = @infodir@ install_sh = @install_sh@ libdir = @libdir@ libexecdir = @libexecdir@ localstatedir = @localstatedir@ mandir = @mandir@ mkdir_p = @mkdir_p@ oldincludedir = @oldincludedir@ prefix = @prefix@ program_transform_name = @program_transform_name@ sbindir = @sbindir@ sharedstatedir = @sharedstatedir@ sysconfdir = @sysconfdir@ target_alias = @target_alias@ # Where docs are installed htmldir = $(datadir)/doc/$(PACKAGE)/html @BUILDDOCS_TRUE@DISTCLEANFILES = \ @BUILDDOCS_TRUE@ $(html_DATA) @INSTALLDOCS_TRUE@html_DATA = \ @INSTALLDOCS_TRUE@ $(html_files) html_files = \ api.html \ changes.html \ filter.html \ index.html \ install.html \ readme.html \ search.cgi.html \ spider.html \ swish-3.0.html \ swish-bugs.html \ swish.cgi.html \ swish-config.html \ swish.css \ swish-faq.html \ swish-library.html \ swish-run.html \ swish-search.html EXTRA_DIST = \ $(html_DATA) all: all-am .SUFFIXES: $(srcdir)/Makefile.in: @MAINTAINER_MODE_TRUE@ 
$(srcdir)/Makefile.am $(am__configure_deps) @for dep in $?; do \ case '$(am__configure_deps)' in \ *$$dep*) \ cd $(top_builddir) && $(MAKE) $(AM_MAKEFLAGS) am--refresh \ && exit 0; \ exit 1;; \ esac; \ done; \ echo ' cd $(top_srcdir) && $(AUTOMAKE) --foreign html/Makefile'; \ cd $(top_srcdir) && \ $(AUTOMAKE) --foreign html/Makefile .PRECIOUS: Makefile Makefile: $(srcdir)/Makefile.in $(top_builddir)/config.status @case '$?' in \ *config.status*) \ cd $(top_builddir) && $(MAKE) $(AM_MAKEFLAGS) am--refresh;; \ *) \ echo ' cd $(top_builddir) && $(SHELL) ./config.status $(subdir)/$@ $(am__depfiles_maybe)'; \ cd $(top_builddir) && $(SHELL) ./config.status $(subdir)/$@ $(am__depfiles_maybe);; \ esac; $(top_builddir)/config.status: $(top_srcdir)/configure $(CONFIG_STATUS_DEPENDENCIES) cd $(top_builddir) && $(MAKE) $(AM_MAKEFLAGS) am--refresh $(top_srcdir)/configure: @MAINTAINER_MODE_TRUE@ $(am__configure_deps) cd $(top_builddir) && $(MAKE) $(AM_MAKEFLAGS) am--refresh $(ACLOCAL_M4): @MAINTAINER_MODE_TRUE@ $(am__aclocal_m4_deps) cd $(top_builddir) && $(MAKE) $(AM_MAKEFLAGS) am--refresh mostlyclean-libtool: -rm -f *.lo clean-libtool: -rm -rf .libs _libs distclean-libtool: -rm -f libtool uninstall-info-am: install-htmlDATA: $(html_DATA) @$(NORMAL_INSTALL) test -z "$(htmldir)" || $(mkdir_p) "$(DESTDIR)$(htmldir)" @list='$(html_DATA)'; for p in $$list; do \ if test -f "$$p"; then d=; else d="$(srcdir)/"; fi; \ f=$(am__strip_dir) \ echo " $(htmlDATA_INSTALL) '$$d$$p' '$(DESTDIR)$(htmldir)/$$f'"; \ $(htmlDATA_INSTALL) "$$d$$p" "$(DESTDIR)$(htmldir)/$$f"; \ done uninstall-htmlDATA: @$(NORMAL_UNINSTALL) @list='$(html_DATA)'; for p in $$list; do \ f=$(am__strip_dir) \ echo " rm -f '$(DESTDIR)$(htmldir)/$$f'"; \ rm -f "$(DESTDIR)$(htmldir)/$$f"; \ done tags: TAGS TAGS: ctags: CTAGS CTAGS: distdir: $(DISTFILES) @srcdirstrip=`echo "$(srcdir)" | sed 's|.|.|g'`; \ topsrcdirstrip=`echo "$(top_srcdir)" | sed 's|.|.|g'`; \ list='$(DISTFILES)'; for file in $$list; do \ case $$file in \ 
$(srcdir)/*) file=`echo "$$file" | sed "s|^$$srcdirstrip/||"`;; \ $(top_srcdir)/*) file=`echo "$$file" | sed "s|^$$topsrcdirstrip/|$(top_builddir)/|"`;; \ esac; \ if test -f $$file || test -d $$file; then d=.; else d=$(srcdir); fi; \ dir=`echo "$$file" | sed -e 's,/[^/]*$$,,'`; \ if test "$$dir" != "$$file" && test "$$dir" != "."; then \ dir="/$$dir"; \ $(mkdir_p) "$(distdir)$$dir"; \ else \ dir=''; \ fi; \ if test -d $$d/$$file; then \ if test -d $(srcdir)/$$file && test $$d != $(srcdir); then \ cp -pR $(srcdir)/$$file $(distdir)$$dir || exit 1; \ fi; \ cp -pR $$d/$$file $(distdir)$$dir || exit 1; \ else \ test -f $(distdir)/$$file \ || cp -p $$d/$$file $(distdir)/$$file \ || exit 1; \ fi; \ done check-am: all-am check: check-am all-am: Makefile $(DATA) installdirs: for dir in "$(DESTDIR)$(htmldir)"; do \ test -z "$$dir" || $(mkdir_p) "$$dir"; \ done install: install-am install-exec: install-exec-am install-data: install-data-am uninstall: uninstall-am install-am: all-am @$(MAKE) $(AM_MAKEFLAGS) install-exec-am install-data-am installcheck: installcheck-am install-strip: $(MAKE) $(AM_MAKEFLAGS) INSTALL_PROGRAM="$(INSTALL_STRIP_PROGRAM)" \ install_sh_PROGRAM="$(INSTALL_STRIP_PROGRAM)" INSTALL_STRIP_FLAG=-s \ `test -z '$(STRIP)' || \ echo "INSTALL_PROGRAM_ENV=STRIPPROG='$(STRIP)'"` install mostlyclean-generic: clean-generic: distclean-generic: -test -z "$(CONFIG_CLEAN_FILES)" || rm -f $(CONFIG_CLEAN_FILES) -test -z "$(DISTCLEANFILES)" || rm -f $(DISTCLEANFILES) maintainer-clean-generic: @echo "This command is intended for maintainers to use" @echo "it deletes files that may require special tools to rebuild." 
clean: clean-am clean-am: clean-generic clean-libtool mostlyclean-am distclean: distclean-am -rm -f Makefile distclean-am: clean-am distclean-generic distclean-libtool dvi: dvi-am dvi-am: html: html-am info: info-am info-am: install-data-am: install-htmlDATA install-exec-am: install-info: install-info-am install-man: installcheck-am: maintainer-clean: maintainer-clean-am -rm -f Makefile maintainer-clean-am: distclean-am maintainer-clean-generic mostlyclean: mostlyclean-am mostlyclean-am: mostlyclean-generic mostlyclean-libtool pdf: pdf-am pdf-am: ps: ps-am ps-am: uninstall-am: uninstall-htmlDATA uninstall-info-am .PHONY: all all-am check check-am clean clean-generic clean-libtool \ distclean distclean-generic distclean-libtool distdir dvi \ dvi-am html html-am info info-am install install-am \ install-data install-data-am install-exec install-exec-am \ install-htmlDATA install-info install-info-am install-man \ install-strip installcheck installcheck-am installdirs \ maintainer-clean maintainer-clean-generic mostlyclean \ mostlyclean-generic mostlyclean-libtool pdf pdf-am ps ps-am \ uninstall uninstall-am uninstall-htmlDATA uninstall-info-am # build the docs from website src @BUILDDOCS_TRUE@$(html_files): @BUILDDOCS_TRUE@ $(SWISH_WEB) -swishsrc $(top_srcdir) -poddest . -v -all # Tell versions [3.59,3.63) of GNU make to not export all variables. # Otherwise a system limit (for SysV at least) may be exceeded. .NOEXPORT: swish-e-2.4.7/html/swish.cgi.html0000644000077100017500000016446411166010466013576 00000000000000 Swish-e :: swish.cgi -- Example Perl script for searching with the SWISH-E search engine.

swish.cgi -- Example Perl script for searching with the SWISH-E search engine.

Swish-e version 2.4.7

DESCRIPTION

swish.cgi is a CGI script for searching with the SWISH-E search engine version 2.1-dev and above. It returns results a page at a time, with matching words from the source document highlighted, showing a few words of content on either side of the highlighted word.

The script is highly configurable. Features include searching multiple (or selectable) indexes, limiting searches to a subset of documents, sorting by a number of different properties, and limiting results to a date range.

On unix type systems the swish.cgi script is installed in the directory $prefix/lib/swish-e, which is typically /usr/local/lib/swish-e. This can be overridden by the configure options --prefix or --libexecdir.

The standard configuration (i.e. not using a config file) should work with most swish index files. Customization of the parameters will be needed if you are indexing special meta data and want to search and/or display the meta data. The configuration can be modified by editing this script directly, or by using a configuration file (.swishcgi.conf by default). The script's configuration file is described below.

You are strongly encouraged to get the default configuration working before making changes. Most problems using this script are the result of configuration modifications.

The script is modular in design. Both the highlighting code and output generation are handled by modules, which are included in the example/modules distribution directory and installed in the $libexecdir/perl directory. This allows for easy customization of the output without changing the main CGI script.

Included with the Swish-e distribution is a module to generate standard HTML output. There are also modules and template examples for use with the popular Perl templating systems HTML::Template and Template-Toolkit. This is very useful if your site already uses one of these templating systems. The HTML::Template and Template-Toolkit packages are not distributed with Swish-e. They are available from the CPAN (http://search.cpan.org).

This script can also run basically unmodified as a mod_perl handler, providing much better performance than running as a CGI script. Usage under mod_perl is described below.

Please read the rest of the documentation. There's a DEBUGGING section, and a FAQ section.

This script should work on Windows, but security may be an issue.

REQUIREMENTS

A reasonably current version of Perl. 5.00503 or above is recommended (anything older will not be supported).

The Date::Calc module is required to use the date range feature of the script. The Date::Calc module is also available from CPAN.
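
Both requirements can be checked from the shell before going any further. This is only a sketch: the function names are made up for this example, and it assumes perl is on the PATH (if it is not, both checks simply fail).

```shell
# perl_is_recent: succeed if perl is at least 5.005_03. Perl's $] variable
# holds the version as a number (e.g. 5.006). Hypothetical helper name.
perl_is_recent() {
    perl -e 'exit($] >= 5.00503 ? 0 : 1)' 2>/dev/null
}

# have_perl_module NAME: succeed if perl can load the named module.
# Hypothetical helper name.
have_perl_module() {
    perl -M"$1" -e 1 2>/dev/null
}

# Usage:
#   perl_is_recent || echo "perl is missing or too old"
#   have_perl_module Date::Calc || echo "Date::Calc is not installed"
```

Date::Calc is needed only for the date range feature, so a failure of the second check is not fatal for a basic installation.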

INSTALLATION

Here's an example installation session under Linux. It should be similar for other operating systems.

For the sake of simplicity in this installation example all files are placed in web server space, including files such as swish-e index and configuration files that would normally not be made available via the web server. Access to these files should be limited once the script is running. Either move the files to other locations (and adjust the script's configuration) or use features of the web server to limit access (such as with .htaccess).

Please get a simple installation working before modifying the configuration file. Most problems reported for using this script have been due to improper configuration.

The script's default settings are setup for initial testing. By default the settings expect to find most files and the swish-e binary in the same directory as the script.

For security reasons, once you have tested the script you will want to change settings to limit access to some of these files by the web server (either by moving them out of web space, or using access control such as .htaccess). An example of using .htaccess on Apache is given below.

It's expected that swish-e has already been unpacked, the swish-e binary has been compiled from source, and "make install" has been run. If swish-e was installed from a vendor package (such as an RPM or Debian package), see that package's documentation for where files are installed.

Example Installation:

  1. Symlink or copy the swish.cgi.

    Symlink (or copy if your platform or webserver does not allow symlinks) the swish.cgi script from the installation directory to a local directory. Typically, this would be the cgi-bin directory or a location where CGI scripts are located. In this example a new directory is created and the script is symlinked.

        ~$ mkdir swishdir
        ~$ cd swishdir
        ~/swishdir$ ln -s /usr/local/lib/swish-e/swish.cgi

    The installation directory is set at configure time with the --prefix or --libexecdir options, but by default is in /usr/local/lib/swish-e.

  2. Create an index

    Use an editor to create a simple configuration file for indexing your files. In this example the Apache documentation is indexed. Finally, we run a simple query to test that the index works correctly.

        ~/swishdir$ cat swish.conf
        IndexDir /usr/local/apache/htdocs
        IndexOnly .html .htm
        DefaultContents HTML*
        StoreDescription HTML* <body> 200000
        MetaNames swishdocpath swishtitle
        ReplaceRules remove /usr/local/apache/

    If you do not have the Apache docs installed then pick another directory to index such as /usr/share/doc.

    Create the index.

        ~/swishdir$ swish-e -c swish.conf
        Indexing Data Source: "File-System"
        Indexing "/usr/local/apache/htdocs"
        Removing very common words...
        no words removed.
        Writing main index...
        Sorting words ...
        Sorting 7005 words alphabetically
        Writing header ...
        Writing index entries ...
          Writing word text: Complete
          Writing word hash: Complete
          Writing word data: Complete
        7005 unique words indexed.
        5 properties sorted.
        124 files indexed.  1485844 total bytes.  171704 total words.
        Elapsed time: 00:00:02 CPU time: 00:00:02
        Indexing done!

    Now, verify that the index can be searched:

        ~/swishdir$ swish-e -w install -m 1
        # SWISH format: 2.1-dev-25
        # Search words: install
        # Number of hits: 14
        # Search time: 0.001 seconds
        # Run time: 0.040 seconds
        1000 htdocs/manual/dso.html "Apache 1.3 Dynamic Shared Object (DSO) support" 17341
        .

    Let's see what files we have in our directory now:

        ~/swishdir$ ls -1
        index.swish-e
        index.swish-e.prop
        swish.cgi
        swish.conf

  3. Test the CGI script

    This is a simple step, but often overlooked. You should test from the command line instead of jumping ahead and testing with the web server. See the DEBUGGING section below for more information.

        ~/swishdir$ ./swish.cgi | head
        Content-Type: text/html; charset=ISO-8859-1
    
        <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
        <html>
            <head>
               <title>
                  Search our site
               </title>
            </head>
            <body>

    The above shows that the script can be run directly, and generates a correct HTTP header and HTML.

    If you run the above and see something like this:

        ~/swishdir$ ./swish.cgi
        bash: ./swish.cgi: No such file or directory

    then you probably need to edit the script to point to the correct location of your perl program. Here's one way to find out where perl is located (again, on unix):

        ~/swishdir$ which perl
        /usr/local/bin/perl
    
        ~/swishdir$ /usr/local/bin/perl -v
        This is perl, v5.6.0 built for i586-linux
        ...

    Good! We are using a reasonably current version of perl.

    Now that we know perl is at /usr/local/bin/perl we can adjust the "shebang" line in the perl script (e.g. the first line of the script):

        ~/swishdir$ pico swish.cgi
        (edit the #! line)
        ~/swishdir$ head -1 swish.cgi
        #!/usr/local/bin/perl -w

  4. Test with the web server

    How you do this is completely dependent on your web server, and you may need to talk to your web server admin to get this working. Often files with the .cgi extension are automatically set up to run as CGI scripts, but not always. In other words, this step is really up to you to figure out!

    This example shows creating a symlink from the web server space to the directory used above. This will only work if the web server is configured to follow symbolic links (the default for Apache).

    This operation requires root access:

        ~/swishdir$ su -c "ln -s $HOME/swishdir /usr/local/apache/htdocs/swishdir"
        Password: *********

    If your account is on an ISP and your web directory is ~/public_html then you might just move the entire directory:

        mv ~/swishdir ~/public_html

    Now, let's make a real HTTP request:

        ~/swishdir$ GET http://localhost/swishdir/swish.cgi | head -3
        #!/usr/local/bin/perl -w
        package SwishSearch;
        use strict;

    Oh, darn. It looks like Apache is not running the script and instead returning it as a static page. Apache needs to be told that swish.cgi is a CGI script.

    .htaccess comes to the rescue:

        ~/swishdir$ cat .htaccess
    
        # Deny everything by default
        Deny From All
    
        # But allow just CGI script
        <files swish.cgi>
            Options ExecCGI
            Allow From All
            SetHandler cgi-script
        </files>

    That "Deny From All" prevents access to all files (such as config and index files); access is allowed only to the swish.cgi script.

    Let's try the request one more time:

        ~/swishdir >GET http://localhost/swishdir/swish.cgi | head
        <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
        <html>
            <head>
               <title>
                  Search our site
               </title>
            </head>
            <body>
                <h2>
                <a href="http://swish-e.org">

    That looks better! Now use your web browser to test.

    Now, you may note that the links are not valid on the search results page. The swish config file contained the line:

         ReplaceRules remove /usr/local/apache/

    To make those links work (and assuming your web server will follow symbolic links):

        ~/swishtest$ ln -s /usr/local/apache/htdocs

    BTW - "GET" used above is a program included with Perl's LWP library. If you do not have this you might try something like:

        wget -O - http://localhost/swishdir/swish.cgi | head

    and if nothing else, you can always telnet to the web server and make a basic request.

        ~/swishtest$ telnet localhost 80
        Trying 127.0.0.1...
        Connected to localhost.
        Escape character is '^]'.
        GET /swishtest/swish.cgi http/1.0
    
        HTTP/1.1 200 OK
        Date: Wed, 13 Feb 2002 20:14:31 GMT
        Server: Apache/1.3.20 (Unix) mod_perl/1.25_01
        Connection: close
        Content-Type: text/html; charset=ISO-8859-1
    
        <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
        <html>
            <head>
               <title>
                  Search our site
               </title>
            </head>
            <body>

    This may seem like a lot of work compared to using a browser, but browsers are a poor tool for basic CGI debugging.

If you have problems check the DEBUGGING section below.
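
The symptom seen in step 4, where Apache returned the script source instead of running it, can also be checked mechanically. The following is only a minimal sketch and not part of the swish-e distribution: the function name looks_like_source is made up for this example, and it assumes nothing more than that an unexecuted Perl script starts with a "#!" line.

```shell
#!/bin/sh
# Hypothetical helper (not part of swish-e): reads a response body on stdin
# and succeeds if the first line is a "#!" line, which suggests the web
# server returned the script source instead of executing it.
looks_like_source() {
    first_line=$(head -n 1)
    case "$first_line" in
        '#!'*) return 0 ;;
        *)     return 1 ;;
    esac
}

# Example with canned input; in practice pipe in the output of GET or wget.
if printf '%s\n' '#!/usr/local/bin/perl -w' 'package SwishSearch;' | looks_like_source
then
    echo "server returned script source (CGI not enabled?)"
fi
```

With a real server you might run something like "wget -O - http://localhost/swishdir/swish.cgi | looks_like_source" and look at the exit status.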

CONFIGURATION

If you want to change the location of the swish-e binary or the index file, use multiple indexes, add additional metanames and properties, change the default highlighting behavior, etc., you will need to adjust the script's configuration settings.

Again, please get a test setup working with the default parameters before making changes to any configuration settings. Better to debug one thing at a time...

In general, you will need to adjust the script's settings to match the index file you are searching. For example, if you are indexing a hypermail list archive you may want to make the script use metanames/properties of Subject, Author, and Email address. Or you may wish to provide a way to limit searches to subsets of documents (e.g. parts of your directory tree).

To make things somewhat "simple", the configuration parameters are included near the top of the swish.cgi program. That is the only place that the individual parameters are defined and explained, so you will need to open up the swish.cgi script in an editor to view the options. Further questions about individual settings should be referred to the swish-e discussion list.

The parameters are all part of a perl hash structure, and the comments at the top of the program should get you going. The perl hash structure may seem a bit confusing, but it makes it easy to create nested and complex parameters. Syntax is important, so cut-n-paste should be your best defense if you are not a perl programmer.

By the way, Perl has a number of quote operators. For example, to quote a string you might write:

    title => 'Search My Site',

Some options take more than one parameter, where each parameter must be quoted. For example:

    metanames => [ 'swishdefault', 'swishtitle',  'swishdocpath' ],

which assigns an array ( [...] ) of three strings to the "metanames" variable. Lists of quoted strings are so common in perl that there's a special operator called "qw" (quote word) to save typing all those quotes:

    metanames => [ qw/ swishdefault swishtitle swishdocpath / ],

or to use parentheses as the quote characters (you can pick any):

    metanames => [ qw( swishdefault swishtitle swishdocpath ) ],

There are two ways to change the configuration settings from their default values: edit the script directly, or use a separate configuration file. In either case, the configuration settings are a basic perl hash reference.

Using a configuration file is described below, but contains the same hash structure.

There are many configuration settings, and some of them are commented out either by using a "#" symbol, or by simply renaming the configuration directive (e.g. by adding an "x" to the parameter name).

A very basic configuration setup might look like:

    return {
        title           => 'Search the Swish-e list',   # Title of your choice.
        swish_binary    => 'swish-e',                   # Location of swish-e binary
        swish_index     => 'index.swish-e',             # Location of your index file
    };

Or if searching more than one index:

    return {
        title           => 'Search the Swish-e list',
        swish_binary    => 'swish-e',
        swish_index     => ['index.swish-e', 'index2'],
    };

Both of these examples return a reference to a perl hash ( return {...} ). In the second example, the multiple index files are set as an array reference.

Note that in the example above the swish-e binary file is relative to the current directory. If running under mod_perl you will need to use absolute paths.

The script can also use the SWISH::API perl module (included with the swish-e distribution in the perl directory) to access the swish-e index. The use_library option is used to enable the use of the SWISH::API module:

    return {
        title           => 'Search the Swish-e list',
        swish_index     => ['index.swish-e', 'index2'],
        use_library     => 1, # enable use of the SWISH::API module
    };

The module must be available via the @INC array, like all Perl modules.

Using the SWISH::API module avoids the need to fork and execute the swish-e program. Under mod_perl you may see a significant performance improvement when using the SWISH::API module. Under normal CGI usage you will probably not see any speed improvement.

Using A Configuration File

As mentioned above, configuration settings can be either set in the swish.cgi script, or set in a separate configuration file. Settings in a configuration file will override the settings in the script.

By default, the swish.cgi script will attempt to read settings from the file .swishcgi.conf. For example, you might only wish to change the title used in the script. Simply create a file called .swishcgi.conf in the same directory as the CGI script:

    > cat .swishcgi.conf
    # Example swish.cgi configuration script.
    return {
       title => 'Search Our Mailing List Archive',
    };

The settings you use will depend on the index you create with swish:

    return {
        title           => 'Search the Apache documentation',
        swish_binary    => 'swish-e',
        swish_index     => 'index.swish-e',
        metanames       => [qw/swishdefault swishdocpath swishtitle/],
        display_props   => [qw/swishtitle swishlastmodified swishdocsize swishdocpath/],
        title_property  => 'swishdocpath',
        prepend_path    => 'http://myhost/apachedocs',

        name_labels => {
            swishdefault        => 'Search All',
            swishtitle          => 'Title',
            swishrank           => 'Rank',
            swishlastmodified   => 'Last Modified Date',
            swishdocpath        => 'Document Path',
            swishdocsize        => 'Document Size',
        },

    };

The above configuration defines metanames to use on the form. Searches can be limited to these metanames.

"display_props" tells the script to display the property "swishlastmodified" (the last modified date of the file), the document size, and path with the search results.

The parameter "name_labels" is a hash (reference) that is used to give friendly names to the metanames.

Here's another example. Say you want to search either (or both) the Apache 1.3 documentation and the Apache 2.0 documentation indexed separately.

    return {
       title       => 'Search the Apache Documentation',
       date_ranges => 0,
       swish_index => [ qw/ index.apache index.apache2 / ],
       select_indexes  => {
            method  => 'checkbox_group',
            labels  => [ '1.3.23 docs', '2.0 docs' ],  # Must match up one-to-one to swish_index
            description => 'Select: ',
        },

    };

Now you can select either or both sets of documentation while searching.

All the possible settings are included in the default configuration located near the top of the swish.cgi script. Open the swish.cgi script with an editor to look at the various settings. Contact the Swish-e Discussion list for help in configuring the script.

DEBUGGING

Most problems with using this script have been a result of improper configuration. Please get the script working with default settings before adjusting the configuration settings.

The key to debugging CGI scripts is to run them from the command line, not with a browser.

First, make sure the program compiles correctly:

    $ perl -c swish.cgi
    swish.cgi syntax OK

Next, simply try running the program:

    $ ./swish.cgi | head
    Content-Type: text/html; charset=ISO-8859-1

    <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
    <html>
        <head>
           <title>
              Search our site
           </title>
        </head>
        <body>

Under Windows you will need to run the script as:

    C:\wwwroot\swishtest> perl swish.cgi
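
If running the script directly fails on unix with "No such file or directory" even though the file is there, the interpreter named on the script's first line is the usual suspect. Here is a rough sketch of that check. The function name check_shebang is hypothetical and not something shipped with swish-e.

```shell
#!/bin/sh
# Hypothetical helper (not part of swish-e): prints the interpreter named on
# the "#!" line of a script and fails if that interpreter is not executable.
check_shebang() {
    script=$1
    # Strip "#!", optional spaces, and anything after the interpreter path.
    interp=$(head -n 1 "$script" | sed -n 's/^#![[:space:]]*\([^[:space:]]*\).*/\1/p')
    if [ -z "$interp" ]; then
        echo "$script: no #! line found" >&2
        return 1
    fi
    if [ ! -x "$interp" ]; then
        echo "$script: interpreter $interp not found or not executable" >&2
        return 1
    fi
    echo "$interp"
}
```

Running "check_shebang swish.cgi" should print the perl path the script expects, or explain on stderr why the script cannot start.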

Now, you know that the program compiles and will run from the command line. Next, try accessing the script from a web browser.

If you see the contents of the CGI script instead of its output then your web server is not configured to run the script. With Apache look at settings like ScriptAlias, SetHandler, and Options.

If an error is reported (such as Internal Server Error or Forbidden) you need to locate your web server's error_log file and carefully read what the problem is. Contact your web administrator for help locating the web server's error log.

If you don't have access to the web server's error_log file, you can modify the script to report errors to the browser screen. Open the script and search for "CGI::Carp". (Author's suggestion is to debug from the command line -- adding the browser and web server into the equation only complicates debugging.)

The script does offer some basic debugging options that allow debugging from the command line. The debugging options are enabled by setting an environment variable "SWISH_DEBUG". How that is set depends on your operating system and the shell you are using. These examples are using the "bash" shell syntax.

Note: You can also use the "debug_options" configuration setting, but the recommended method is to set the environment variable.

You can list the available debugging options like this:

    $ SWISH_DEBUG=help ./swish.cgi >outfile
    Unknown debug option 'help'.  Must be one of:
           basic: Basic debugging
         command: Show command used to run swish
         headers: Show headers returned from swish
          output: Show output from swish
         summary: Show summary of results
            dump: Show all data available to templates

Debugging options may be combined:

    $ SWISH_DEBUG=command,headers,summary ./swish.cgi >outfile

You will be asked for an input query and the max number of results to return. You can use the defaults in most cases. It's a good idea to redirect output to a file. Any error messages are sent to stderr, so those will still be displayed (unless you redirect stderr, too).
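
The redirection described above can be tried out without swish.cgi at all. This sketch uses a stand-in function, fake_cgi, which is entirely hypothetical and only imitates the prompts shown in the examples below. It assumes that an empty answer accepts the default, as the "[all]" and "[1]" prompts suggest.

```shell
#!/bin/sh
# Stand-in for "./swish.cgi" so this sketch runs anywhere: it reads two
# answers from stdin, writes a page to stdout and diagnostics to stderr.
fake_cgi() {
    read -r query
    read -r max
    echo "Debug level set to: 1" >&2
    echo "<html>results for '${query:-all}' (max ${max:-1})</html>"
}

workdir=$(mktemp -d)
# Two empty lines accept both defaults; stdout and stderr go to separate files.
printf '\n\n' | SWISH_DEBUG=basic fake_cgi >"$workdir/outfile" 2>"$workdir/errfile"

cat "$workdir/errfile"    # diagnostics only, the page itself is in outfile
```

With the real script the same pattern would be "SWISH_DEBUG=basic ./swish.cgi >outfile 2>errfile", keeping the generated page and the debugging messages apart.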

Here are some examples:

    ~/swishtest$ SWISH_DEBUG=basic ./swish.cgi >outfile
    Debug level set to: 1
    Enter a query [all]:
    Using 'not asdfghjklzxcv' to match all records
    Enter max results to display [1]:

    ------ Can't use DateRanges feature ------------

    Script will run, but you can't use the date range feature
    Can't locate Date/Calc.pm in @INC (@INC contains: modules /usr/local/lib/perl5/5.6.0/i586-linux /usr/local/lib/perl5/5.6.0 /usr/local/lib/perl5/site_perl/5.6.0/i586-linux /usr/local/lib/perl5/site_perl/5.6.0 /usr/local/lib/perl5/site_perl/5.005/i586-linux /usr/local/lib/perl5/site_perl/5.005 /usr/local/lib/perl5/site_perl .) at modules/DateRanges.pm line 107, <STDIN> line 2.
    BEGIN failed--compilation aborted at modules/DateRanges.pm line 107, <STDIN> line 2.
    Compilation failed in require at ./swish.cgi line 971, <STDIN> line 2.

    --------------
    Can't exec "./swish-e": No such file or directory at ./swish.cgi line 1245, <STDIN> line 2.
    Child process Failed to exec './swish-e' Error: No such file or directory at ./swish.cgi line 1246, <STDIN> line 2.
    Failed to find any results

The above indicates two problems. The first problem is that the Date::Calc module is not installed. The Date::Calc module is needed to use the date limiting feature of the script.

The second problem is a bit more serious. It's saying that the script can't find the swish-e binary file. In this example it's specified as being in the current directory. Either correct the path to the swish-e binary, or make a local copy or symlink to the swish-e binary.

    ~/swishtest$ cat .swishcgi.conf
        return {
           title       => 'Search the Apache Documentation',
           swish_binary => '/usr/local/bin/swish-e',
           date_ranges => 0,
        };

Now, let's try again:

    ~/swishtest$ SWISH_DEBUG=basic ./swish.cgi >outfile
    Debug level set to: 1

    ---------- Read config parameters from '.swishcgi.conf' ------
    $VAR1 = {
              'date_ranges' => 0,
              'title' => 'Search the Apache Documentation'
            };
    -------------------------
    Enter a query [all]:
    Using 'not asdfghjklzxcv' to match all records
    Enter max results to display [1]:
    Found 1 results

    Can't locate SWISH::TemplateDefault.pm in @INC (@INC contains: modules /usr/local/lib/perl5/5.6.0/i586-linux /usr/local/lib/perl5/5.6.0 /usr/local/lib/perl5/site_perl/5.6.0/i586-linux /usr/local/lib/perl5/site_perl/5.6.0 /usr/local/lib/perl5/site_perl/5.005/i586-linux /usr/local/lib/perl5/site_perl/5.005 /usr/local/lib/perl5/site_perl .) at ./swish.cgi line 608.

This means that the swish.cgi script could not locate a required module. To correct this, locate where the SWISH::TemplateDefault module is installed and add a "use lib" line to your configuration file (or to the swish.cgi script):

    ~/swishtest$ cat .swishcgi.conf
    use lib '/home/bill/local/lib/perl';

    return {
       title       => 'Search the Apache Documentation',
       date_ranges => 0,
    };

    ~/swishtest$ SWISH_DEBUG=basic ./swish.cgi >outfile
    Debug level set to: 1

    ---------- Read config parameters from '.swishcgi.conf' ------
    $VAR1 = {
              'date_ranges' => 0,
              'title' => 'Search the Apache Documentation'
            };
    -------------------------
    Enter a query [all]:
    Using 'not asdfghjklzxcv' to match all records
    Enter max results to display [1]:
    Found 1 results

That is much better!

The "use lib" statement tells Perl where to look for modules by adding the path supplied to an array called @INC.

Note that most modules are in the SWISH namespace. For example, the default output module is called SWISH::TemplateDefault. When Perl is looking for that module it is looking for the file SWISH/TemplateDefault.pm. If the "use lib" statement is set as:

    use lib '/home/bill/local/lib/perl';

then Perl will look (among other places) for the file

    /home/bill/local/lib/perl/SWISH/TemplateDefault.pm

when attempting to load the SWISH::TemplateDefault module. Relative paths may also be used.

    use lib 'modules';

will cause Perl to look for the file:

    ./modules/SWISH/TemplateDefault.pm

relative to where the swish.cgi script is running. (This is not true when running under mod_perl).
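
The name-to-file mapping described above is easy to imitate when hunting for a module by hand. This is just a sketch of the idea: the helper names module_to_path and find_module are made up, and the directories you pass in stand for whatever your @INC contains.

```shell
#!/bin/sh
# Hypothetical helpers (not part of swish-e) that mimic how Perl maps a
# module name to a file and searches a list of directories for it.

# SWISH::TemplateDefault -> SWISH/TemplateDefault.pm
module_to_path() {
    printf '%s.pm\n' "$(printf '%s\n' "$1" | sed 's|::|/|g')"
}

# find_module MODULE DIR... : print the first matching file, fail if none.
find_module() {
    rel=$(module_to_path "$1")
    shift
    for dir in "$@"; do
        if [ -f "$dir/$rel" ]; then
            echo "$dir/$rel"
            return 0
        fi
    done
    return 1
}

module_to_path SWISH::TemplateDefault    # prints SWISH/TemplateDefault.pm
```

For example, "find_module SWISH::TemplateDefault modules /home/bill/local/lib/perl" would print the first of those two locations that actually holds the file.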

Here's another common problem. Everything checks out, but when you run the script you see the message:

    Swish returned unknown output

Ok, let's find out what output it is returning:

    ~/swishtest$ SWISH_DEBUG=headers,output ./swish.cgi >outfile
    Debug level set to: 13

    ---------- Read config parameters from '.swishcgi.conf' ------
    $VAR1 = {
              'swish_binary' => '/usr/local/bin/swish-e',
              'date_ranges' => 0,
              'title' => 'Search the Apache Documentation'
            };
    -------------------------
    Enter a query [all]:
    Using 'not asdfghjklzxcv' to match all records
    Enter max results to display [1]:
      usage: swish [-i dir file ... ] [-S system] [-c file] [-f file] [-l] [-v (num)]
      ...
    version: 2.0
       docs: http://sunsite.berkeley.edu/SWISH-E/

    *** 9872 Failed to run swish: 'Swish returned unknown output' ***
    Failed to find any results

Oh, looks like /usr/local/bin/swish-e is version 2.0 of swish. We need 2.1-dev and above!

Frequently Asked Questions

Here are some common questions and answers.

How do I change the way the output looks?

The script uses a module to generate output. By default it uses the SWISH::TemplateDefault.pm module. The module used is selected in the swish.cgi configuration file. Modules are located in the example/modules/SWISH directory in the distribution, but are installed in the $prefix/lib/swish-e/perl/SWISH/ directory.

To make simple changes you can edit the installed SWISH::TemplateDefault module directly, otherwise make a copy of the module and modify its package name. For example, change directories to the location of the installed module and copy the module to a new name:

    $ cp TemplateDefault.pm MyTemplateDefault.pm

Then at the top of the module adjust the "package" line to:

    package SWISH::MyTemplateDefault;

To use this module you need to adjust the configuration settings (either at the top of swish.cgi or in a configuration file):

        template => {
            package     => 'SWISH::MyTemplateDefault',
        },

The module does not need to be in the SWISH namespace, and can be stored in any location as long as the module can be found via the @INC array (i.e. modify the "use lib" statement in swish.cgi if needed).

How do I use a templating system with swish.cgi?

In addition to the TemplateDefault.pm module, the swish-e distribution includes two other Perl modules for generating output using the templating systems HTML::Template and Template-Toolkit.

Templating systems use template files to generate the HTML, and make maintaining the look of a large (or small) site much easier. HTML::Template and Template-Toolkit are separate packages and can be downloaded from the CPAN. See http://search.cpan.org.

Two basic templates are provided as examples for generating output using these templating systems. The example templates are located in the example directory. The module SWISH::TemplateHTMLTemplate uses the file swish.tmpl to generate its output, while the module SWISH::TemplateToolkit uses the swish.tt file. (Note: swish.tt was renamed from search.tt Jun 03, 2004.)

To use either of these modules you will need to adjust the "template" configuration setting. Examples for both templating systems are provided in the configuration settings near the top of the swish.cgi program.

Use of these modules is an advanced usage of swish.cgi, and they are provided as examples only.

All of the output generation modules are passed a hash with the results from the search, plus other data used to create the output page. You can see this hash by using the debugging option "dump" or by using the included SWISH::TemplateDumper module:

    ~/swishtest >cat .swishcgi.conf
        return {
           title       => 'Search the Apache Documentation',
           template => {
                package     => 'SWISH::TemplateDumper',
            },
        };

And run a query. For example:

    http://localhost/swishtest/swish.cgi?query=install

Why are there three different highlighting modules?

There are three highlighting modules included with the swish-e distribution. Each is a trade-off of speed vs. accuracy:

    SWISH::DefaultHighlight - reasonably fast, but does not highlight phrases
    SWISH::PhraseHighlight  - reasonably slow, but is reasonably accurate
    SWISH::SimpleHighlight  - fast, some phrases, but least accurate

Eh, the default is actually "PhraseHighlight". Oh well.

All of the highlighting modules slow down the script. Optimizations to these modules are welcome!

My ISP doesn't provide access to the web server logs

There are a number of options. One way is to use the CGI::Carp module. Search in the swish.cgi script for:

    use Carp;
    # Or use this instead -- PLEASE see perldoc CGI::Carp for details
    # use CGI::Carp qw(fatalsToBrowser warningsToBrowser);

And change it to look like:

    #use Carp;
    # Or use this instead -- PLEASE see perldoc CGI::Carp for details
    use CGI::Carp qw(fatalsToBrowser warningsToBrowser);

This should be used for debugging only; in production it may send ugly and confusing messages to your users' browsers.

Why does the output show (NULL)?

Swish-e displays (NULL) when attempting to display a property that does not exist in the index.

The most common reason for this message is that you did not use StoreDescription in your config file while indexing.

    StoreDescription HTML* <body> 200000

That tells swish to store the first 200,000 characters of text extracted from the body of each document parsed by the HTML parser. The text is stored as property "swishdescription".

The index must be recreated after changing the swish-e configuration.

Running:

    ~/swishtest > ./swish-e -T index_metanames

will display the properties defined in your index file.

This can happen with other properties, too. For example, this will happen when you are asking for a property to display that is not defined in swish.

    ~/swishtest > ./swish-e -w install -m 1 -p foo
    # SWISH format: 2.1-dev-25
    # Search words: install
    err: Unknown Display property name "foo"
    .

    ~/swishtest > ./swish-e -w install -m 1 -x 'Property foo=<foo>\n'
    # SWISH format: 2.1-dev-25
    # Search words: install
    # Number of hits: 14
    # Search time: 0.000 seconds
    # Run time: 0.038 seconds
    Property foo=(NULL)
    .

To check that a property exists in your index you can run:

    ~/swishtest > ./swish-e -w not dkdk -T index_metanames | grep foo
            foo : id=10 type=70  META_PROP:STRING(case:ignore) *presorted*

Ok, in this case we see that "foo" is really defined as a property. Now let's make sure swish.cgi is asking for "foo" (sorry for the long lines):

    ~/swishtest > SWISH_DEBUG=command ./swish.cgi > /dev/null
    Debug level set to: 3
    Enter a query [all]:
    Using 'not asdfghjklzxcv' to match all records
    Enter max results to display [1]:
    ---- Running swish with the following command and parameters ----
    ./swish-e  \
    -w  \
    'swishdefault=(not asdfghjklzxcv)'  \
    -b  \
    1  \
    -m  \
    1  \
    -f  \
    index.swish-e  \
    -s  \
    swishrank  \
    desc  \
    swishlastmodified  \
    desc  \
    -x  \
    '<swishreccount>\t<swishtitle>\t<swishdescription>\t<swishlastmodified>\t<swishdocsize>\t<swishdocpath>\t<fos>\t<swishrank>\t<swishdocpath>\n'  \
    -H  \
    9

If you look carefully you will see that the -x parameter has "fos" instead of "foo", so there's our problem.
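
Spotting a misspelled property by eye in a long -x string is tedious. The sketch below pulls the names out of a format string and reports any that are not in a list you supply. Both function names are hypothetical, and the list of known names is typed in by hand here. In practice it would come from the index_metanames listing, which this sketch does not try to parse.

```shell
#!/bin/sh
# Hypothetical helper (not part of swish-e): list the property names that
# appear as <name> in a swish-e -x format string, one per line.
format_props() {
    printf '%s\n' "$1" | tr '<' '\n' | sed -n 's/^\([A-Za-z0-9_]*\)>.*/\1/p'
}

# Report names in the format string that are not in the known list.
# Usage: unknown_props FORMAT KNOWN...
unknown_props() {
    fmt=$1
    shift
    for prop in $(format_props "$fmt"); do
        found=no
        for known in "$@"; do
            [ "$prop" = "$known" ] && found=yes
        done
        [ "$found" = yes ] || echo "$prop"
    done
}

# prints "fos", the one name that is not in the known list
unknown_props '<swishtitle>\t<fos>\t<swishdocpath>\n' swishtitle swishdocpath foo
```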

How do I use the SWISH::API perl module with swish.cgi?

Use the use_library configuration directive:

    use_library => 1,

This will only provide improved performance when running under mod_perl or other persistent environments.

Why does the "Run time" differ when using the SWISH::API module?

When using the SWISH::API module the run (and search) times are calculated within the script. When using the swish-e binary the swish-e program reports the times. The "Run time" may include the time required to load and compile the SWISH::API module.

MOD_PERL

This script can be run under mod_perl (see http://perl.apache.org). This will improve the response time of the script compared to running under CGI by loading the swish.cgi script into the Apache web server.

You must have a mod_perl enabled Apache server to run this script under mod_perl.

Configuration is simple. In your httpd.conf or your startup.pl file you need to load the script. For example, in httpd.conf you can use a perl section:

    <perl>
        use lib '/usr/local/apache/cgi-bin';  # location of the swish.cgi file
        use lib '/home/yourname/swish-e/example/modules';  # modules required by swish.cgi
        require "swish.cgi";
    </perl>

Again, note that the paths used will depend on where you installed the script and the modules. When running under mod_perl the swish.cgi script becomes a perl module, and therefore the script does not need to be installed in the cgi-bin directory. (But, you can actually use the same script as both a CGI script and a mod_perl module at the same time, read from the same location.)

The above loads the script into mod_perl. Then to configure the script to run add this to your httpd.conf configuration file:

    <location /search>
        PerlSetVar Swish_Conf_File /home/yourname/swish-e/myconfig.pl
        allow from all
        SetHandler perl-script
        PerlHandler SwishSearch
    </location>

Note that you use the "Swish_Conf_File" setting in httpd.conf to tell the script which config file to use. This means you can use the same script (and loaded modules) for different search sites (running on the same Apache server). You can just specify different config files for each Location and they can search different indexes and have a completely different look for each site, but all share the same code.

Note that the config files are cached in the swish.cgi script. Changes to the config file will require restarting the Apache server before they will be reloaded into the swish.cgi script. This avoids calling stat() for every request.

Unlike CGI, mod_perl does not change the current directory to the location of the script, so your settings for the swish binary and the path to your index files must be absolute paths (or relative to the server root).

Using the SWISH::API module with mod_perl will provide the most performance improvements. Use of the SWISH::API module can be enabled by the configuration setting use_library:

    use_library     => 1,

Without highlighting code enabled, using the SWISH::API module resulted in about 20 requests per second, where running the swish-e binary slowed the script down to about 8 requests per second.

Note that the highlighting code is slow. For the best search performance turn off highlighting. In your config file you can add:

    highlighting    => 0,  # disable highlighting

and the script will show the first 500 chars of the description (or whatever you set for "max_chars"). Without highlighting one test was processing about 20 requests per second. With the "PhraseHighlight" module that dropped to a little better than two requests per second, "DefaultHighlight" was about 2.3 requests per second, and "SimpleHighlight" was about 6 requests per second.

Experiment with different highlighting options when testing performance.

Please post to the swish-e discussion list if you have any questions about running this script under mod_perl.

Here are some general requests/second figures on an Athlon XP 1800+ with 1/2GB RAM, Linux 2.4.20.

                                 Highlighting Mode

                          None     Phrase    Default     Simple
   ----------------------------------------------------------------
   Using SWISH::API        45        1.5        2          12
   Using swish-e binary    12        1.3       1.8         7.5

As you can see the highlighting code is a limiting factor.
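
To put a number on that, the figures in the table above can be divided row by row. This is nothing more than arithmetic on the table. The function name speedups is made up for the example.

```shell
#!/bin/sh
# Ratio of SWISH::API to swish-e binary requests/second for each
# highlighting mode, using the numbers from the table above.
speedups() {
    awk '{ printf "%-8s %.2fx\n", $1, $2 / $3 }' <<'EOF'
None    45  12
Phrase  1.5 1.3
Default 2   1.8
Simple  12  7.5
EOF
}

speedups
```

With no highlighting the SWISH::API module comes out 3.75 times faster than running the binary, while with "PhraseHighlight" the ratio is only about 1.15, so most of the time is going into the highlighting code.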

SpeedyCGI

SpeedyCGI (also called PersistentPerl) is another way to run Perl scripts persistently. SpeedyCGI is good if you do not have mod_perl available or do not have root access. SpeedyCGI works on Unix systems by loading the script into a "back end" process and keeping it in memory between requests. New requests are passed to the back end processes which avoids the startup time required by a Perl CGI script.

Install SpeedyCGI from http://daemoninc.com/ (your OS may provide a packaged version of SpeedyCGI) and then change the first line of swish.cgi. For example, if the speedy binary is installed in /usr/bin/speedy, use the line:

    #! /usr/bin/speedy -w -- -t60

The -w option is passed to Perl, and all options following the double-dash are SpeedyCGI options.

Note that when using SpeedyCGI configuration data is cached in memory. If you change the swish.cgi configuration file (.swishcgi.conf) then touch the main swish.cgi script to force reloading of configuration data.
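
If you tend to forget the touch, it can be wrapped in a small helper. This is only a sketch: the function name reload_if_conf_changed is hypothetical and is not part of swish-e or SpeedyCGI. It relies on nothing beyond the file modification times.

```shell
#!/bin/sh
# Hypothetical helper (not part of swish-e or SpeedyCGI): touch the script
# when its configuration file has a newer modification time.
# Usage: reload_if_conf_changed SCRIPT CONF
reload_if_conf_changed() {
    script=$1
    conf=$2
    # "find -newer" prints CONF only if it was modified after SCRIPT.
    if [ -n "$(find "$conf" -newer "$script" 2>/dev/null)" ]; then
        touch "$script"
        echo "touched $script"
    fi
}
```

Run it as "reload_if_conf_changed swish.cgi .swishcgi.conf" after editing the configuration file.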

Spidering

There are two ways to spider with swish-e. One uses the "http" input method that uses code that's part of swish. The other way is to use the new "prog" method along with a perl helper program called spider.pl.

Here's an example of a configuration file for spidering with the "http" input method. You can see that the configuration is not much different than the file system input method. (But, don't use the http input method -- use the -S prog method shown below.)

    # Define what to index
    IndexDir http://www.myserver.name/index.html
    IndexOnly .html .htm

    IndexContents HTML* .html .htm
    DefaultContents HTML*
    StoreDescription HTML* <body> 200000
    MetaNames swishdocpath swishtitle

    # Define http method specific settings -- see swish-e documentation
    SpiderDirectory ../swish-e/src/
    Delay 0

You index with the command:

    swish-e -S http -c spider.conf

Note that this does take longer. For example, spidering the Apache documentation on a local web server with this method took over a minute, where indexing with the file system took less than two seconds. Using the "prog" method can speed this up.

Here's an example configuration file for using the "prog" input method:

    # Define the location of the spider helper program
    IndexDir ../swish-e/prog-bin/spider.pl

    # Tell the spider what to index.
    SwishProgParameters default http://www.myserver.name/index.html

    IndexContents HTML* .html .htm
    DefaultContents HTML*
    StoreDescription HTML* <body> 200000
    MetaNames swishdocpath swishtitle

Then to index you use the command:

    swish-e -c prog.conf -S prog -v 0

Spidering with this method took nine seconds.

Stemmed Indexes

Many people enable a feature of swish called word stemming to provide "fuzzy" search options to their users. The stemming code does not actually find the "stem" of a word; rather, it removes and/or replaces common endings on words. Stemming is far from perfect, and many words do not stem as you might expect. Plus, currently only English is supported. But, it can be a helpful tool for searching your site. You may wish to create both a stemmed and non-stemmed index, and provide a checkbox for selecting the index file.

To enable a stemmed index you simply add to your configuration file:

    UseStemming yes

If you want to use a stemmed index with this program and continue to highlight search terms you will need to install a perl module that will stem words. This section explains how to do this.

The perl module is included with the swish-e distribution. It can be found in the examples directory (where you found this file) and is called something like:

    SWISH-Stemmer-0.05.tar.gz

The module should also be available on CPAN (http://search.cpan.org/).

Here's an example session for installing the module. (There will be quite a bit of output when running make.)

    % gzip -dc SWISH-Stemmer-0.05.tar.gz |tar xof -
    % cd SWISH-Stemmer-0.05
    % perl Makefile.PL
    or
    % perl Makefile.PL PREFIX=$HOME/perl_lib
    % make
    % make test

    (perhaps su root at this point if you did not use a PREFIX)
    % make install
    % cd ..

Use the PREFIX if you do not have root access or you want to install the modules in a local library. If you do use a PREFIX setting, add a use lib statement to the top of this swish.cgi program.

For example:

    use lib qw(
        /home/bmoseley/perl_lib/lib/site_perl/5.6.0
        /home/bmoseley/perl_lib/lib/site_perl/5.6.0/i386-linux/
    );

Once the stemmer module is installed, and you are using a stemmed index, the swish.cgi script will automatically detect this and use the stemmer module.

DISCLAIMER

Please use this CGI script at your own risk.

This script has been tested and used without problem, but you should still be aware that any code running on your server represents a risk. If you have any concerns please carefully review the code.

See http://www.w3.org/Security/Faq/www-security-faq.html

Security on Windows is questionable.

SUPPORT

The SWISH-E discussion list is the place to ask for any help regarding SWISH-E or this example script. See http://swish-e.org.

Before posting please review:

    http://swish-e.org/2.2/docs/INSTALL.html#When_posting_please_provide_the_

Please do not contact the author or any of the swish-e developers directly.

LICENSE

    swish.cgi $Revision: 1830 $ Copyright (C) 2001 Bill Moseley search@hank.org
    Example CGI program for searching with SWISH-E

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

AUTHOR

Bill Moseley

swish-e-2.4.7/html/swish-3.0.html0000644000077100017500000003446211166010465013324 00000000000000 Swish-e :: Proposed changes for Swish-e 3.0

Proposed changes for Swish-e 3.0

Swish-e version 2.4.7


OVERVIEW

This page is intended to give users of Swish-e an idea of the changes to come, to foster discussion of the direction of Swish-e, and to provide a place where developers can map out new ideas.

None of this is written in stone. Any of the developers can write their ideas in this document, but that doesn't mean it will actually happen ;).

UTF-8 support

Supporting Unicode basically requires a full re-write of all the code.

drop expat-based parsers, require libxml2

This might simplify the code somewhat as well.

Support Incremental Indexing

The Swish-e index structure currently makes it difficult to do incremental indexing, range limiting, and presents limits to indexing due to memory requirements. A database may solve some of these issues, at possibly a cost of performance.

Swish-e has been linked with Berkeley DB. Although much slower in indexing, this may allow incremental indexing. Currently, the idea is to offer both database backends.

UPDATE: Mon Nov 8 15:07:59 CST 2004 (karman@cray.com)

This feature is in the 2.5 branch already. What kind of requirements do we have to label it 'stable'?

Split code into Search and Indexing code

There may be a small benefit from creating a smaller search-only program. CGI scripts may be faster, and the code would be smaller for those that want to embed Swish-e into other applications.

Currently, linking libswish-e into a program adds about 720K. That is not really significant, but it could be if a number of processes are running with Swish-e. Another option is to build libswish-e as a shared library.

UPDATE: Mon Nov 8 15:09:12 CST 2004 (karman@cray.com)

This seems done in the 2.4 release. Is that true?

Switch to Content-Types

Moseley: Dec 28, 2000

I'm wondering if it might be smart to switch from the current "Document Types" to Content-Types. Currently, Swish-e knows how to parse three types of documents: TXT, HTML, and XML. There are currently two new configuration directives, DefaultContents and IndexContents, that map file extensions to one of the three types. This doesn't really work when spidering, since it's the content-type that describes the document and not the file extension.

It's an issue that can wait, but I'm concerned about backward compatibility if people start using the IndexContents and DefaultContents config directives and then we change to content-type in the future. There are probably not that many people using those, but it might be worth noting in the documentation that it will change, if we agree.

The main reason to use content-type instead is for http processing where you can't depend on the file extension to determine the document type, so with http we have to use content-type to determine how to deal with the file. This is somewhat moot, as mapping can now be done with -S prog.

I'd propose that Swish-e uses a mime.types file to map from extension to content-type. You could add or override mappings in the config file:

   AddType text/plain .doc .log

   DefaultType text/html  # like DefaultContents currently

The file source "plug-in" (whatever that ends up being) would return a content-type, but if not returned then Swish-e would map the type from the file name using the mime.types file or any AddType directives.

Again, internally Swish-e only knows about text/[TXT|HTML|XML], so there should be a way to map other types, otherwise Swish-e might ignore the file. We could continue to use the three type names or switch completely to content-types.

For example, if we continued to use [TXT|HTML|XML]

    MapType TXT  text/directory text/logfile
    MapType HTML text/html

Or maybe just extend the current directives

    IndexContents HTML .htm .html text/html

Where the content-type would have precedence over the file extensions.

This would tell Swish-e that those types are handled by those internal handlers.
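
The lookup order proposed above can be sketched in Python as follows. This is only an illustration of the idea: the tables stand in for a mime.types file and for the proposed AddType, DefaultType and MapType directives, none of which exist in Swish-e today, and the function names are made up.

```python
import os

# Hypothetical stand-ins for mime.types and the proposed directives.
MIME_TYPES = {".html": "text/html", ".htm": "text/html",
              ".txt": "text/plain", ".xml": "text/xml"}
ADD_TYPE = {".doc": "text/plain", ".log": "text/plain"}   # AddType overrides
DEFAULT_TYPE = "text/html"                                 # DefaultType
MAP_TYPE = {"text/plain": "TXT", "text/html": "HTML", "text/xml": "XML"}


def resolve_content_type(path, reported_type=None):
    """A content-type reported by the source wins over the file extension."""
    if reported_type:
        # Drop parameters such as "; charset=iso-8859-1".
        return reported_type.split(";")[0].strip().lower()
    ext = os.path.splitext(path)[1].lower()
    return ADD_TYPE.get(ext) or MIME_TYPES.get(ext) or DEFAULT_TYPE


def pick_parser(path, reported_type=None):
    """Return TXT, HTML or XML, or None when no internal parser applies."""
    return MAP_TYPE.get(resolve_content_type(path, reported_type))


print(pick_parser("/docs/page.php", "text/html; charset=iso-8859-1"))
print(pick_parser("server.log"))
```

A result of None corresponds to the case where Swish-e "might ignore the file" unless a mapping or filter is configured.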

Then as I've mentioned before, you might specify filters as such

   FilterDocument application/msword /path/to/word-to-text

And word-to-text would convert to text and return one of the three content-types that Swish-e knows how to parse, or a different content type if we were to chain filters.
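
Chaining filters by content-type could look roughly like the Python sketch below. Everything here is hypothetical: the filter table plays the role of a set of FilterDocument lines, word_to_text is a stand-in that does no real conversion, and the step limit is an assumption added to guard against filters that loop.

```python
import gzip

PARSEABLE = {"text/plain", "text/html", "text/xml"}


def gunzip(data):
    # Hypothetical filter; assumes the compressed payload is a Word file.
    return "application/msword", gzip.decompress(data)


def word_to_text(data):
    # Stand-in for /path/to/word-to-text; a real filter would convert the file.
    return "text/plain", data


# content-type -> filter, as a set of FilterDocument lines might define it.
FILTERS = {"application/x-gzip": gunzip, "application/msword": word_to_text}


def run_filters(content_type, data, max_steps=5):
    """Apply filters until the type is parseable; (None, data) if that fails."""
    for _ in range(max_steps):
        if content_type in PARSEABLE:
            break
        handler = FILTERS.get(content_type)
        if handler is None:
            break
        content_type, data = handler(data)
    if content_type in PARSEABLE:
        return content_type, data
    return None, data


print(run_filters("application/x-gzip", gzip.compress(b"hello")))
```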

Enhance the PropertyNames directive

Moseley: Updated Jan 13, 2001

If the PropertyNames directive was enhanced to be able to limit the number of characters stored, optionally extract text from HTML, and was able to define what type of docs (text, XML, HTML) it applied to, then the existing PropertyNames feature would work like the new StoreDescription feature but be useful for more than just one use.

I'm not clear how to enhance the syntax of Properties and/or Metanames, but here are some ideas. Rainer suggested that an xml-type of format might be best and commonly understood. That's a good idea. Below are some older ideas that I had. But you will get the idea...

The metaname structure could have flags for properties:

    1 - limiting to a length
    2 - stripping HTML
    3 - encoding HTML entities on output

Oct 9, 2001 - The code is now in Swish-e to limit a string property to a length. The stripping of HTML is an issue for discussion. And encoding entities on output should be a result_output.c issue.
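
As an illustration of per-property flags, here is a small Python sketch. The flag names and bit values are assumptions made for this example (the list above numbers the ideas 1 to 3 but does not define bit values), and the tag stripping is deliberately crude. Following the note above, entity encoding is applied at output time rather than when the property is stored.

```python
import html
import re
from enum import IntFlag


class PropFlag(IntFlag):
    # Hypothetical bit values, one per idea in the list above.
    LIMIT_LENGTH = 1
    STRIP_HTML = 2
    ENCODE_ENTITIES = 4


def store_property(value, flags, max_len=0):
    """Apply the indexing-time flags: strip tags first, then truncate."""
    if flags & PropFlag.STRIP_HTML:
        value = re.sub(r"<[^>]+>", "", value)  # crude tag removal
    if flags & PropFlag.LIMIT_LENGTH and max_len > 0:
        value = value[:max_len]
    return value


def output_property(value, flags):
    """Apply the output-time flag: encode HTML entities."""
    if flags & PropFlag.ENCODE_ENTITIES:
        value = html.escape(value)
    return value


flags = PropFlag.STRIP_HTML | PropFlag.LIMIT_LENGTH | PropFlag.ENCODE_ENTITIES
stored = store_property("<b>Tom & Jerry</b> cartoons", flags, max_len=11)
print(output_property(stored, flags))
```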

UPDATE: Mon Nov 8 15:13:26 CST 2004 (karman@cray.com)

Is this fully supported in 2.4?

Apache/XML style configuration

This would allow some directives to be set per directory, or per file extension (or content-type).

Document Info

$Id: SWISH-3.0.pod 1613 2005-02-02 22:53:39Z whmoseley $


swish-e-2.4.7/html/swish-search.html0000644000077100017500000005715011166010464014267 00000000000000 Swish-e :: SWISH-SEARCH - Swish-e Searching Instructions

SWISH-SEARCH - Swish-e Searching Instructions

Swish-e version 2.4.7


OVERVIEW

This page describes the process of searching with Swish-e. Please see the SWISH-CONFIG page for information on the Swish-e configuration file directives, and SWISH-RUN for a complete list of command line arguments.

Searching a Swish-e index involves passing command line arguments to it that specify the index file to use, and the query (or search words) to locate in the index. Swish-e returns a list of file names (or URLs) that contain the matched search words. Perl is often used as a front-end to Swish-e, such as in CGI applications, and perl modules exist for interfacing with Swish-e.

Searching Syntax and Operations

The -w command line argument is used to specify the search query to Swish-e.

    swish-e -w airplane

will find all documents that contain the word airplane.

When running Swish-e from a shell prompt, be careful to protect your query from shell metacharacters and shell expansions. This often means placing single or double quotes around your query. See Searching with Perl if you plan to use Perl as a front end to Swish-e. In the examples below single quotes are used to protect the search from the shell.

The following section describes various aspects of searching with Swish-e.

Boolean Operators

You can use the Boolean operators and, or, near or not in searching. Without these Boolean operators Swish-e will assume you're and'ing the words together. The operators are not case sensitive. These three searches are the same:

    swish-e -w foo bar
    swish-e -w bar foo
    swish-e -w foo AND bar

[Note: you can change the default to oring by changing the variable DEFAULT_RULE in the config.h file and recompiling Swish-e.]

The not operator inverts the results of a search.

   swish-e -w not foo

finds all the documents that do not contain the word foo.

Parentheses can be used to group searches.

   swish-e -w 'not (foo and bar)'

The result is all documents that contain neither term or only one of them, but not both.
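
One way to picture these operators is as set operations on the lists of documents that contain each word. The Python sketch below uses a tiny made-up collection; it models the semantics only and says nothing about how Swish-e stores or evaluates its index.

```python
# Hypothetical collection: file name -> words found in that file.
DOCS = {
    "a.html": {"foo", "bar"},
    "b.html": {"foo"},
    "c.html": {"bar"},
    "d.html": {"baz"},
}
ALL = set(DOCS)


def having(word):
    """Names of all documents that contain the word."""
    return {name for name, words in DOCS.items() if word in words}


both = having("foo") & having("bar")               # foo and bar
either = having("foo") | having("bar")             # foo or bar
not_foo = ALL - having("foo")                      # not foo
not_both = ALL - (having("foo") & having("bar"))   # not (foo and bar)

print(sorted(not_both))
```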

To search for the words and, or, near or not, place them in double quotes. Remember to protect the quotes from the shell:

    swish-e -w '"not"'
    swish-e -w \"not\"

will search for the word "not".

Other examples:

     swish-e -w smilla or snow

retrieves files containing either of the words "smilla" or "snow".

     swish-e -w smilla snow not sense 
     swish-e -w '(smilla and snow) and not sense'  (same thing)

retrieves first the files that contain both the words "smilla" and "snow"; then among those the ones that do not contain the word "sense".

The near keyword is similar to and but implies a proximity between the words. The near keyword takes an integer argument as well, indicating the maximum distance between two words to consider a valid match.

Example:

 swish-e -w smilla near5 snow

would match the document if the words smilla and snow appeared within 5 positions of one another.

A near search with no argument or argument of 0 is the same as an and search.
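
The proximity test can be sketched in Python as below. This is a model of the behaviour described above, not Swish-e's implementation: it assumes word positions are simple whitespace-separated token numbers and that "within 5 positions" means a position difference of at most 5.

```python
def word_positions(text):
    """Map each lowercased word to the list of positions where it occurs."""
    positions = {}
    for index, word in enumerate(text.lower().split()):
        positions.setdefault(word, []).append(index)
    return positions


def near(positions, word_a, word_b, distance):
    """True if both words occur and some pair is at most `distance` apart."""
    a = positions.get(word_a, [])
    b = positions.get(word_b, [])
    if distance <= 0:
        # near with no argument, or 0, behaves like a plain "and".
        return bool(a) and bool(b)
    return any(abs(x - y) <= distance for x in a for y in b)


close = word_positions("smilla has a feeling for snow")
far = word_positions(
    "smilla walked slowly across the frozen harbour thinking about snow")
print(near(close, "smilla", "snow", 5), near(far, "smilla", "snow", 5))
```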

Wildcards

Two different wildcard characters are available, each evoking different behaviour.

The * means "match zero or more characters."

The ? means "match exactly one character."

The wildcard * may only be used at the end of a word. Otherwise * is considered a normal character (i.e. can be searched for if included in the WordCharacters directive).

Example:

    swish-e -w librarian

this query only retrieves files which contain the given word.

On the other hand:

    swish-e -w 'librarian*'

retrieves "librarians", "librarianship", etc. along with "librarian".

Note that wildcard searches combined with word stemming can lead to unexpected results. If stemming is enabled, a search term with a wildcard will be stemmed internally before searching. So searching for running* will actually be a search for run*, so running* would find runway. Also, searching for runn* will not find running as you might expect, since running stems to run in the index, and thus runn* will not find run.

The ? wildcard matches exactly one character, but may not be used at the start of a word.

Example:

    swish-e -w 's?ow'

will match snow, slow and show but not strow.

This:

    swish-e -w '?how'

will throw an error.
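
The wildcard rules above can be modelled with a regular expression, as in this Python sketch. It illustrates the rules only (trailing * matches zero or more characters, ? matches exactly one and may not start a word, * elsewhere is an ordinary character); the function names are made up and the error is a plain ValueError rather than Swish-e's own message.

```python
import re


def wildcard_to_regex(term):
    """Translate a search term with * and ? into a compiled regex."""
    if term.startswith("?"):
        raise ValueError("the ? wildcard may not start a word")
    trailing_star = term.endswith("*")
    body = term[:-1] if trailing_star else term
    # ? matches any single character; everything else (including a *
    # that is not at the end of the word) is matched literally.
    pattern = "".join("." if ch == "?" else re.escape(ch) for ch in body)
    if trailing_star:
        pattern += ".*"
    return re.compile(pattern)


def matches(term, word):
    return wildcard_to_regex(term).fullmatch(word) is not None


print(matches("s?ow", "snow"), matches("s?ow", "strow"))
print(matches("librarian*", "librarianship"))
```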

Order of Evaluation

In general, the order of evaluation is not important. Internally swish-e processes the search terms from left to right. Parentheses can be used to group searches together, effectively changing the order of evaluation. For example these three are the same:

    swish-e -w foo not bar baz
    swish-e -w not bar foo baz
    swish-e -w baz foo not bar

but these two are not the same:

    swish-e -w 'foo not bar baz'
    swish-e -w 'foo not (bar baz)'

The first finds all documents that contain both foo and baz, but do not contain bar. The second finds all documents that contain foo and do not contain both bar and baz; they may contain one of the two, or neither.

It is often helpful in understanding searches to use the boolean terms and parentheses. So the above two become:

    swish-e -w 'foo AND (not bar) AND baz'
    swish-e -w 'foo AND (not (bar AND baz))'
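
The difference between these two searches, and the fact that reordering AND-ed terms does not change the result, can be checked with a small set-based model in Python. The four documents are invented for the example; this shows the logic only, not Swish-e internals.

```python
from itertools import permutations

# Hypothetical collection: file name -> words found in that file.
DOCS = {
    "one.html": {"foo", "baz"},
    "two.html": {"foo", "bar", "baz"},
    "three.html": {"foo", "bar"},
    "four.html": {"foo"},
}
ALL = set(DOCS)


def having(word):
    return {name for name, words in DOCS.items() if word in words}


def without(word):
    return ALL - having(word)


# foo AND (not bar) AND baz
first = having("foo") & without("bar") & having("baz")
# foo AND (not (bar AND baz))
second = having("foo") & (ALL - (having("bar") & having("baz")))

# Every ordering of the three AND-ed terms gives the same set.
terms = [having("foo"), without("bar"), having("baz")]
orderings = {frozenset(a & b & c) for a, b, c in permutations(terms)}

print(sorted(first), sorted(second), len(orderings))
```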

These four examples are all the same search (assuming that AND is the default search type):

    swish-e -w 'juliet not ophelia and pac'
    swish-e -w '(juliet) AND (NOT ophelia) AND (pac)'
    swish-e -w 'juliet not ophelia pac'
    swish-e -w 'pac and juliet and not ophelia'

Looking at the first three searches, Swish-e first finds all the documents with "juliet". Then it finds all documents that do not contain "ophelia". Those two lists are then combined with the boolean AND operator, resulting in a list of documents that include "juliet" but not "ophelia". Finally, that list is ANDed with the list of documents that contain "pac", resulting in the final list of matches.

However, it is always possible to force the order of evaluation by using parentheses. For example:

    swish-e -w 'juliet not (ophelia and pac)'

retrieves files with "juliet" that do not contain both words "ophelia" and "pac".

Meta Tags

MetaNames are used to represent fields (called columns in a database) and provide a way to search in only parts of a document. See SWISH-CONFIG for a description of MetaNames, and how they are specified in the source document.

To limit a search to words found in a meta tag you prefix the keywords with the name of the meta tag, followed by the equal sign:

    metaname = word
    metaname = (this or that)
    metaname = ( (this or that) or "this phrase" )

It is not necessary to have spaces at either side of the "=", consequently the following are equivalent:

    swish-e -w "metaName=word"
    swish-e -w "metaName = word"
    swish-e -w "metaName= word"

To search on a word that contains a "=", precede the "=" with a "\" (backslash).

    swish-e -w "test\=3 = x\=4 or y\=5"

this query returns the files where the word "x=4" is associated with the metaName "test=3" or that contains the word "y=5" not associated with any metaName.
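
When a query is assembled by a program, the backslash escaping can be done with a small helper. The Python sketch below is only an illustration with made-up function names; it builds the query string and leaves shell quoting to the caller.

```python
def escape_equals(word):
    """Precede every = in a word with a backslash."""
    return word.replace("=", "\\=")


def meta_query(metaname, word):
    """Build 'metaname = word', escaping any = inside either part."""
    return f"{escape_equals(metaname)} = {escape_equals(word)}"


query = meta_query("test=3", "x=4") + " or " + escape_equals("y=5")
print(query)
```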

Queries can be also constructed using any of the usual search features, moreover metaName and plain search can be mixed in a single query.

     swish-e -w "metaName1 = (a1 or a4) not (a3 and a7)"

This query will retrieve all the files in which "a1" or "a4" is found in the META tag "metaName1" and that do not contain both of the words "a3" and "a7", where "a3" and "a7" are not associated with any meta name.

Phrase Searching

To search for a phrase in a document use double-quotes to delimit your search terms. (The phrase delimiter is set in src/swish.h.)

You must protect the quotes from the shell.

For example, under Unix:

    swish-e -w '"this is a phrase" or (this and that)'
    swish-e -w 'meta1=("this is a phrase") or (this and that)'

Or under Windows:

    swish-e -w \"this is a phrase\" or (this and that)

You can not use boolean search terms inside a phrase. That is:

    swish-e -w 'this and that'

finds documents with both words "this" and "that", but:

    swish-e -w '"this and that"'

finds documents that have the phrase "this and that". A phrase can consist of a single word, so this is how to search for the words used as boolean operators:

   swish-e -w 'this "and" that'

finds documents that contain all three words, but in any order.

You can use the -P switch to set the phrase delimiter character. See SWISH-RUN for examples.

Context

At times you might not want to search for a word in every part of your files since you know that the word(s) are present in a particular tag. The ability to search according to context greatly increases the chances that your hits will be relevant, and Swish-e provides a mechanism to do just that.

The -t option in the search command line allows you to search for words that exist only in specific HTML tags. Each character in the string you specify in the argument to this option represents a different tag in which the word is searched; that is you can use any combinations of the following characters:

    H means all <HEAD> tags
    B stands for <BODY> tags
    t is all <TITLE> tags
    h is <H1> to <H6> (header) tags
    e is emphasized tags (this may be <B>, <I>, <EM>, or <STRONG>)
    c is HTML comment tags (<!-- ... -->)

    # This search will look for files with these two words in their titles only.
    swish-e -w "apples oranges" -t t

    # This search will look for files with these words in comments only.
    swish-e -w "keywords draft release" -t c

    # This search will look for words in titles, headers, and emphasized tags.
    swish-e -w "world wide web" -t the
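
A front end that accepts a -t value from users may want to check it before passing it on. The Python sketch below simply restates the table above as a dictionary and rejects unknown characters; the function name and error handling are assumptions for this example.

```python
# The context characters listed above and the tags they select.
CONTEXT_TAGS = {
    "H": "<HEAD>",
    "B": "<BODY>",
    "t": "<TITLE>",
    "h": "<H1> to <H6>",
    "e": "emphasized tags",
    "c": "HTML comments",
}


def describe_context(spec):
    """Return the tag groups selected by a -t string such as 'the'."""
    unknown = [ch for ch in spec if ch not in CONTEXT_TAGS]
    if unknown:
        raise ValueError("unknown context character(s): " + "".join(unknown))
    return [CONTEXT_TAGS[ch] for ch in spec]


print(describe_context("the"))
```

Note that the characters are case sensitive: H and h select different tags.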

Searching with Perl

Perl ( http://www.perl.com/ ) is probably the most common programming language used with Swish-e, especially in CGI interfaces. Perl makes searching and parsing results with Swish-e easy, but if not done properly can leave your server vulnerable to attacks.

When designing your CGI scripts you should carefully screen user input, and include features such as paged results and a timer to limit time required for a search to complete. These are to protect your web site against a denial of service (DoS) attack.

Included with every distribution of Perl is a document called perlsec -- Perl Security. Please take time to read and understand that document before writing CGI scripts in perl.

Type at your shell/command prompt:

    perldoc perlsec

If nothing else, start every CGI program in perl as such:

    #!/usr/local/bin/perl -wT
    use strict;

That alone won't make your script secure, but may help you find insecure code.

CGI Danger!

There are many examples of CGI scripts on the Internet. Many are poorly written and insecure. A commonly seen way to execute Swish-e from a perl CGI script is with a piped open. For example, it is common to see this type of open():

    open(SWISH, "$swish -w $query -f $index|");

This open() gives shell access to the entire Internet! Often an attempt is made to strip $query of bad characters. But, this often fails since it's hard to guess what every bad character is. Would you have thought about a null? A better approach is to only allow in known safe characters.

Even if you can be sure that any user supplied data is safe, this piped open still passes the command parameters through the shell. If nothing else, it's just an extra unnecessary step to running Swish-e.

Therefore, the recommended approach is to fork and exec swish-e directly without passing through the shell. This process is described in the perl man page perlipc under the appropriate heading Safe Pipe Opens.

Type:

    perldoc perlipc

If all this sounds complicated you may wish to use a Perl module that does all the hard work for you.
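
The same two ideas (allow only known safe characters, and never pass the command through a shell) are shown below as a Python sketch, since they are not specific to Perl. The whitelist is an assumption chosen for the example and is deliberately strict; the paths in the usage line are hypothetical. Only the -w and -f options shown earlier are used.

```python
import re
import subprocess

# Hypothetical whitelist: letters, digits and spaces, at most 200 characters.
SAFE_QUERY = re.compile(r"[A-Za-z0-9 ]{1,200}")


def build_command(swish, index, query):
    """Return an argument list; no shell ever sees the query."""
    if not SAFE_QUERY.fullmatch(query):
        raise ValueError("query contains characters outside the whitelist")
    return [swish, "-f", index, "-w", query]


def run_search(swish, index, query, timeout=10):
    """Run swish-e directly and return its output lines."""
    result = subprocess.run(
        build_command(swish, index, query),
        capture_output=True,
        text=True,
        timeout=timeout,  # limits how long a search may take
        check=False,
    )
    return result.stdout.splitlines()


print(build_command("/usr/local/bin/swish-e", "docs.index", "smilla snow"))
```

Because the arguments are passed as a list, characters such as ; or | in a query would never be interpreted by a shell even if the whitelist were relaxed.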

Perl Modules

The Swish-e distribution includes a Perl module called SWISH::API. SWISH::API provides access to the Swish-e C Library.

The SWISH::API module is not installed by default.

The SWISH::API module will embed Swish-e into your perl program so that searching does not require running an external program. Embedding the Swish-e program into your perl program results in faster Swish-e searches, especially when running under a persistent environment like mod_perl, since it avoids the cost of opening the index file for every request (mod_perl is also much faster than CGI because it avoids the need to compile Perl code for every request).

See the README file in the perl directory of the Swish-e distribution for installation instructions. Documentation for the SWISH::API module is available at http://swish-e.org and is installed along with other HTML documentation on your computer.

Document Info

$Id: SWISH-SEARCH.pod 1815 2006-08-27 20:22:54Z karman $


swish-e-2.4.7/html/changes.html0000644000077100017500000022534611166010462013301 00000000000000 Swish-e :: CHANGES - List of revisions

CHANGES - List of revisions

Swish-e version 2.4.7


OVERVIEW

This document contains a list of bug fixes and feature additions to Swish-e.

Version 2.4.7 - 4 April 2009

  • Added ReturnRawRank for raw rank score

    Setting ReturnRawRank to a true value will return the rank score unscaled. Can be set with the -a command line option (mnemonic: "a"bsolute rank score).

  • Yanked setenv feature introduced in 2.4.6

    The ranking debugging feature using setenv introduced in 2.4.6 was yanked. Some platforms (notably HP-UX and Windows) lack the setenv feature, and the convenience of setting the env var was not worth the limitations.

Version 2.4.6 - 10 March 2008

  • MinWordLength respected in query parser

    Clark Vent reported that the query parser was not respecting MinWordLength settings. See http://dev.swish-e.org/changeset/2145

  • Patch to file.c.

    The file.c patch was in response to http://swish-e.org/archive/2007-03/11321.html although that user never responded about that patch.

  • SWISH_DEBUG_RANK env var now enables rank debugging

    Set SWISH_DEBUG_RANK to a true value to enable lots of rank debugging on stderr.

  • Perl Makefile.PL patched to fix MakeMaker issue

    Recent versions of ExtUtils::MakeMaker revealed a bug in Makefile.PL. Patch from mschwern via RT, report by mpeters.

  • LARGEFILE support detected automatically in configure

    jrobinson852@yahoo.com suggested that LARGEFILE support be auto-detected since it is needed so often on Linux systems.

  • New Snowball stemmers

    Trygve Falch contributed patches to update the Snowball stemmers, including new Hungarian and Romanian stemmers.

  • Patched leaks

    Antony Dovgal patched two leaks. In the first, the file name was not freed when a file failed to open.

    SwishSetSearchLimit() was nulling the search limits when an error was found in the parameters, but not freeing the existing limits.

  • Leak in SwishResetSearchLimit

    Fixed a leak if a limit was set and then reset but not prepared. Patch provided by Antony Dovgal.

  • New API functions added

    Added SwishGetStructure() and SwishGetPhraseDelimiter() functions which return relevant properties of the search object. Patch provided by Antony Dovgal.

Version 2.4.5 - 22 Jan 2007

  • Fixed 'deflate' handling in spider.pl

    spider.pl was using the wrong method to uncompress HTTP responses that were 'deflate' encoded. Also decode content based on the document's charset and encode back to charset before outputting.

  • re-indexing required

    The magic numbers in src/swish.h were changed to require re-indexing from version 2.4.4 indexes. This should have been done in 2.4.4 as well, and anytime the index format changes. -- karman

  • fixed stemmer bug introduced in 2.4.4

    stemmer.c had a mix up in the deprecated stemmer assignments for "Stemmer_en" and "Stem". Also fixed stemmer.h so that 2.4.3 indexes can be read correctly. -- karman

  • Now fork/exec to run filters

    FileFilter* was using popen to run the filter, which could pass user data through the shell. Now uses fork/exec if fork is available, which should be everywhere except Windows. On Windows popen is used but all parameters are double-quoted. -- moseley

  • fixed signed/unsigned warnings from gcc 4.x

    Cleaned up search.c to catch mismatched signedness warnings from newer GCC versions. This issue pre-existed 2.4.4 but the new wildcard features in search.c made for a lot more warnings. -- karman

  • Makefile.mingw included in distrib

    Modified root Makefile to include the perl/Makefile.mingw file. -- karman

Version 2.4.4 - 11 Oct 2006

  • Version 2.4.4 RC1

    Release Candidate 1 for 2.4.4, 2 Oct 2006.

  • quote fix for FileFilter config param

    Ludovic Drolez contributed a patch to fix a quoting issue with filenames. This affects non-Windows builds only.

  • SWISH::Filter now on CPAN

    SWISH::Filter is now available on http://cpan.org/. The version in the distribution is not kept in sync with the CPAN version. Install the CPAN version if you want the latest and greatest version.

  • SWISH::API updated to 0.04

    Added several fixes, including:

    • Perlish method names from mpeters@plusthree.com
    • switched to XSLoader with DynaLoader as fallback
    • added VERSION method to satisfy some versions of MakeMaker
    • Fuzzify() method now actually works as advertised

  • added proximity feature and single character wildcard with '?' instead of '*'

    Herman Knoops contributed these patches. See http://swish-e.org/archive/2006-05/10543.html

    Error messages were also changed to better reflect correct use of wildcards.

  • fixed bug when using DoubleMetaphone

    Fixed problem reported by Andreas Völter where a query that generated a two-word query with DoubleMetaphone fuzzy mode was not working.

  • fix sparc64 property issue

    Sorithy Seng (pourlassi@gmail.com) submitted a patch against docprop.c to fix an issue on sparc64 platforms. It is unknown whether this bug affected other 64-bit architectures.

  • fixed bug when StopWords resulted in no unique words

    Added check in db_native.c to check that some words exist before writing index.

  • updates to SWISH-RUN.1

    Added doc for -u and -r options.

  • filename only in SWISH::Filters

    added fix to SWISH::Filters::pp2html and SWISH::Filters::XLtoHTML to save only filename as title without full path

  • Removed Stem and Stemmer_en

    The legacy Porter stemmer was removed. This had been deprecated some time ago. A warning will issue if the old stemmer is indicated in config file, and Stemmer_en1 will be used instead.

  • GPL'd all the source files with the new Swish-e License

    After a source code review, the developers decided to put Swish-e under the GPL with a special exception for linking against libswish-e. See http://swish-e.org/license.html for the details.

  • Fixed Segfault with updating incremental index

    Dobrica Pavlinusic reported a segfault after updating an index multiple times. José provided updated worddata.c. - April 27, 2005

  • Fixed NOT check with incremental indexes

    Swish was returning results for deleted files when the NOT operator was used.

  • Fixed bug when using old parsers with zero length input

    Thomas Angst reported swish consuming memory when using -S prog to process large number of empty documents.

    When -S prog generated a zero length file the old parsers (e.g. TXT) would attempt to read in *all* content from the -S prog program into a buffer. The old parser incorrectly assumed it was reading from a filter and tried to read to eof().

  • Changes to ParserWarnLevel

    The default value for ParserWarnLevel was changed from zero to two.

    The ParserWarnLevel controls the error handling of the libxml2 parser. The higher the setting, the more verbose the output. The change to the default is to report when libxml2 has problems parsing a document (which often times results in processing only part of a document).

    To get the old behavior, either set ParserWarnLevel to zero in your config file, or use the new -W command line option to set the ParserWarnLevel at run time. If ParserWarnLevel is set in the config file, it will override the -W option.

    Also, to see UTF-8 to 8859-1 conversion errors set ParserWarnLevel to 3 or more. Previously, these warnings were issued at a ParserWarnLevel of one.

  • Documentation changes

    Removed all the target documentation (html, pdf, ps) from cvs. There's now a separate cvs module "swish_website" that is used to generate both the website and the html docs. If building swish-e from cvs please see the README.cvs file for instructions.

  • Fixed bug in pre-sorted indexes with USE_BTREE

    Gunnar Mätzler reported a problem with reading the pre-sorted property index tables when running with USE_BTREE (--enable-incremental). Not all entries were being written to disk. There was/is a question if the "array" code used for pre-sorted indexes with USE_BTREE would be slower. So, added a separate define USE_PRESORT_ARRAY to enable that code when USE_BTREE is set. This allows using the old integer arrays with USE_BTREE. Gunnar reported that this is working, but more testing is needed. Need to compare speed of the array code vs. the non-array code, and to verify the workings of USE_PRESORT_ARRAY code.

  • Add strcoll() usage for sorting properties

    Andreas Seltenreich provided a patch to use strcoll when sorting properties. strcoll is locale dependent.

  • Fix incremental indexing when adding back a file

    Jose fixed a problem with incremental indexing where a file could not be added back to the index once removed.

    Patch initially provided by Dobrica Pavlinusic:

        http://swish-e.org/Discussion/archive/2004-12/8694.html

  • Documentation correction

    A change in the default way the index is compressed was not documented in 2.4.3. The change resulted in larger indexes. See CompressPositions below and in SWISH-CONFIG.

  • libxml2 UTF-8 conversion failures

    Fixed issue where a UTF-8 to Latin1 encoding failure would skip more input than just the failed character. Libxml2 passes swish text that is not null terminated, but the libxml2 functions to skip UTF-8 chars expected a null-terminated string. Replace libxml2 call with fixed version.

Version 2.4.3 December 9, 2004

  • New config directive: CompressPositions

    This option enables zlib compression for word data in the index. Previously word data was always compressed but resulted in slower wildcard searches. The default now is to not compress the word data, but results in larger index files. Set to "YES" to get pre-2.4.3 index sizes.

    [This CHANGES entry was added after 2.4.3 was released]

  • Improved error messages when using incremental indexing

    There was a bit of confusion on how to use incremental indexing (still experimental) so added better logic for error messages.

    Also fixed a logic error when setting the incremental update mode. Caught by Paul Loner.

Version 2.4.3-pr1 - Wed Dec 1 09:52:50 PST 2004

  • "Fixed" libxml2's change in UTF8Toisolat1() return value

    Bernhard Weisshuhn supplied a patch to parser.c for checking the return value of UTF8Toisolat1(). Seems that libxml2 now returns the number of characters converted instead of zero for success.

       http://bugzilla.gnome.org/show_bug.cgi?id=153937

  • Added swish-config and pkg-config

    Swish now provides a swish-config script and config file for the pkg-config utility. These tools help when building programs that link with the swish-e library.

    The SWISH::API Makefile.PL program uses swish-config to locate the installation directory of swish-e. This should make building SWISH::API easier when swish-e is installed in a non-standard location.

  • Fixed rank bias in merge

    Peter van Dijk noticed that MetaNamesRank settings were not being copied to the output index when merging.

  • Added SwishFuzzy function

    SwishFuzzy function (SWISH::API::Fuzzy) lets you stem a word without first searching. This might be helpful for playing with queries prior to the search.

  • Fixed translate character table

    Michael Levy found an error in the table used to translate 8859-1 to ascii7. Luckily, it was an upper case translation and the table is only used on lower case characters.

  • MetaNamesRank documentation

    Changed the 'not yet implemented' caveat to 'implemented but experimental'.

  • Added Continuation option to config processing

    You can now use continuation lines in the config file:

        IgnoreWords \
            the \
            am \
            is \
            are \
            was

    There may not be any characters following the backslash.
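
    A minimal sketch of how continuation lines of this kind can be joined, assuming a line is continued only when the backslash is its very last character. This is an illustration in Python, not the swish-e config reader, and join_continuations is a hypothetical helper.

```python
def join_continuations(lines):
    """Join lines ending in a backslash with the line that follows."""
    out, pending = [], ""
    for line in lines:
        line = line.rstrip("\n")
        if line.endswith("\\"):
            pending += line[:-1] + " "
        else:
            out.append(pending + line)
            pending = ""
    if pending:
        out.append(pending.rstrip())
    return out


config = ["IgnoreWords \\", "    the \\", "    am \\", "    was", "IndexDir docs"]
# Collapse runs of whitespace so each directive ends up on one tidy line.
directives = [" ".join(line.split()) for line in join_continuations(config)]
```

    A backslash followed by a trailing space is not treated as a continuation in this sketch, which mirrors the rule above.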

  • Fixed Buzzwords (and other word lists entered in the config)

    Words entered in config were not converted to lower case before storing in the index.

  • Fixed metaname mapping problem in Merge

    Peter Karman found an error when merging indexes where the source indexes had the same metanames, but listed in a different order in their config files. Words would then be indexed under the wrong metaID number in the output index.
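
    The underlying idea is to map metaIDs by name rather than by position. A hedged sketch follows; the numbering from 1 and the helper name build_id_map are assumptions for illustration only.

```python
def build_id_map(source_names, output_names):
    """Map each source metaID to the output metaID with the same metaname."""
    out_ids = {name: i for i, name in enumerate(output_names, start=1)}
    return {i: out_ids[name] for i, name in enumerate(source_names, start=1)}


index_a = ["title", "author"]   # metanames in config order for the first index
index_b = ["author", "title"]   # same names, different order
output = ["title", "author"]

map_a = build_id_map(index_a, output)
map_b = build_id_map(index_b, output)
```

    With such a map, a word stored under metaID 1 ("author") in the second index is written under metaID 2 in the output index instead of being filed under "title".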

  • SWISH::Filters and spider.pl updates

    The web spider spider.pl was updated to work better with SWISH::Filter by default and also make it easier to use the spider default along with a spider config file. See spider.pl for details.

    SWISH::Filter was updated. The way filters are created has changed. If you created your own filters you will need to update them. Take a look at SWISH::Filter and the filters included in the distribution.

  • Updates to Documentation

    Richard Morin submitted formatting and punctuation updates to the README and INSTALL docs.

  • Added -R option to support IDF word weighting in ranking. (karman)

    Added Inverse Document Frequency calculation to the getrank() routine. This will allow the relative frequency of a word in relationship to other words in the query to impact the ranking of documents.

    Example: if 'foo' is present twice as often as 'bar' in the collection as a whole, a search for 'foo bar' will weight documents with 'bar' more heavily (i.e., higher rank) than those with 'foo'.

    The impact is greatest when OR'ing words in a query rather than AND'ing them (which is the default).

    Also added Rank discussion to the FAQ.
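
    To make the 'foo bar' example concrete, here is a sketch using the textbook IDF formula log(N/df). This is an assumption for illustration; the exact calculation in getrank() may differ, and the document counts are made up.

```python
import math


def idf(total_docs: int, docs_with_word: int) -> float:
    """Textbook inverse document frequency; rarer words score higher."""
    return math.log(total_docs / docs_with_word)


total = 1000                      # hypothetical collection size
weights = {
    "foo": idf(total, 200),       # 'foo' appears twice as often as 'bar'
    "bar": idf(total, 100),
}
```

    Under this formula the rarer word 'bar' receives the larger weight, which matches the behaviour described above.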

  • Updates to the example scripts

    Updated PhraseHighlight.pm as suggested by Bill Schell for an optimization when all words in a document are highlighted.

    Updated search.cgi and PhraseHighlight.pm to use the internal stemmers via the SWISH::API module as suggested by Jonas Wolf.

  • Leak when using C library

    David Windmueller found a memory leak when calling multiple searches on a swish handle. The problem was swish loading the pre-sorted property index on every search, even after the table had been loaded into memory.

  • Swish.cgi now kills swish-e on time out

    The example script swish.cgi uses an alarm (on platforms that support alarm) to abort processing after some number of seconds, but it was not killing the child process, swish-e. Bill Schell submitted a patch to kill the child when the alarm triggers.
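
    swish.cgi is a Perl script that uses alarm; the same pattern can be sketched in Python with a timeout on the child process. The helper name and the stand-in child command are hypothetical.

```python
import subprocess
import sys


def run_with_timeout(cmd, seconds):
    """Run cmd; if it exceeds the time limit, kill the child and report a timeout."""
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE)
    try:
        out, _ = proc.communicate(timeout=seconds)
        return proc.returncode, out
    except subprocess.TimeoutExpired:
        proc.kill()          # without this the child would keep running
        proc.communicate()   # reap the child and close the pipe
        return None, b""


# Stand-in for a long-running search process.
slow_child = [sys.executable, "-c", "import time; time.sleep(30)"]
status, _ = run_with_timeout(slow_child, 0.5)
```

    The essential step is the explicit kill: abandoning the wait alone leaves the child process running.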

  • The template search.tt was renamed to swish.tt

    The template was renamed because it's used by swish.cgi, not by search.cgi, which was confusing.

  • Updates to the search.cgi

    The example script search.cgi was updated to work better with mod_perl and to use external template files and style sheets.

  • New MS Word Filter

    James Job provided the SWISH::Filter::Doc2html filter that uses the wvWare (http://wvware.sourceforge.net/) program for filtering MS Word documents. If both catdoc and wvWare are installed then wvWare will be used.

    wvWare is reported to do a good job at converting MS Word docs to HTML. In a few tests it did work well, but other cases it failed to generate correct output. It was also much, much slower than catdoc. I tested with wvWare 0.7.3 on Debian Linux. Testing with both is recommended.

  • Change in way symbolic links are followed

    John-Marc Chandonia pointed out that if a symlink is skipped by FileRules, then the actual file/directory is marked as "already seen" and cannot be indexed by other links or directly.

    Now, files and directories are not marked "already seen" until after passing FileRules (i.e after a file is actually indexed or a directory is processed).
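
    The ordering matters: the "already seen" mark is recorded only after the rules have passed. A small sketch of that ordering, using made-up paths and a numeric id standing in for the underlying file:

```python
def walk(entries, passes_rules):
    """entries: (path, real_id) pairs; real_id identifies the underlying file."""
    seen, indexed = set(), []
    for path, real_id in entries:
        if real_id in seen:
            continue
        if not passes_rules(path):
            continue            # skipped: do NOT record real_id as seen
        seen.add(real_id)
        indexed.append(path)
    return indexed


entries = [("skip/link-to-doc", 42), ("docs/doc.html", 42)]
result = walk(entries, lambda path: not path.startswith("skip/"))
```

    Here the link is rejected by the rule, yet the real file is still indexed when it is reached directly.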

  • Could not set SwishSetSort() more than once

    David Windmueller found a problem when trying to set the sort order more than once on an existing search object. Memory was not correctly reset after clearing the previous sort values.

  • Access MetaNames and PropertyNames from API

    Patch provided by Jamie Herre to access the MetaNames and PropertyNames via the C API and to test via the testlib program. Swish::API also updated to access this data.

  • SwishResultPropertyULong() bug fixed

    David Windmueller reported that SwishResultPropertyULong() was returning ULONG_MAX on all calls. This was fixed.

  • Null written to wrong location in file.c

    Bill Schell with the help of valgrind found a null written past the end of a buffer in file.c in the code that supports the old parsers. This resulted in a segfault while indexing a large set of XML documents.

  • Fixed problem when indexing very large files

    Steve Harris reported a problem when indexing a very large document that caused an integer overflow. José Ruiz updated the code to use unsigned integers.

  • Bump word position on block tags with HTML2 parser

    Peter Karman pointed out that the libxml2 HTML parser was allowing phrase matches across block level html elements. Swish now bumps the word position on these elements.
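
    Why a position bump blocks cross-element phrase matches can be shown with a small sketch. It assumes a phrase matches when two words have consecutive positions; the gap size and the function names are illustrative, not taken from the swish-e source.

```python
BUMP = 1  # assumed gap added at each block boundary


def positions(blocks):
    """Assign word positions, leaving a gap between consecutive blocks."""
    pos, out = 0, []
    for block in blocks:
        for word in block.split():
            pos += 1
            out.append((word, pos))
        pos += BUMP
    return out


def phrase_match(indexed, first, second):
    """True if `second` directly follows `first` (positions differ by one)."""
    where = {}
    for word, pos in indexed:
        where.setdefault(word, set()).add(pos)
    return any(p + 1 in where.get(second, set()) for p in where.get(first, set()))


indexed = positions(["first paragraph ends", "next one starts"])
```

    In this sketch "ends next" no longer matches as a phrase because the two words are separated by the gap, while "next one" inside a single block still does.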

Version 2.4.2 - March 09, 2004

  • UseStemming didn't take no for an answer

    UseStemming was coded as an alias for FuzzyIndexingMode when Snowball was compiled in (the default), but "no" doesn't always mean no when the Norwegian stemmer is available.

  • Fixed problem building incremental version

    Fixed compile problem with building incremental indexing mode. This is an experimental option with swish-e to allow adding files to an index. See configure --help for build option. Incremental indexes are not compatible with standard indexes.

  • Updated build instructions in INSTALL

    Added a few comments about use of CPPFLAGS and LDFLAGS.

  • Updated the index_hypermail.pl

    Updated to work with latest version of hypermail (pre-2.1.9).

  • Time zone in ResultPropertyStr()

    Format string for generating the date did not include the time zone in location. Added a strftime format string to config.h.

  • Undefined and Blank Properties and (NULL)

    Fixed a few problems with printing properties:

    1) Using -p and -x showed different results if a bad property value was given:

        $ swish-e -w not dkdk -p badname -H0
        err: Unknown Display property name "badname"
        .
        $ swish-e -w not dkdk -x '<badname>\n' -H0
        (NULL)

    Now both return an error.

    2) Fixed bug where using a "fmt" string with -x output generated (bad) output if the result did not have the specified property.

        $ swish-e -w not dkdk -x '<somedate>\n' -H0  # undefined value
    
        $ swish-e -w not dkdk -x '<somedate fmt="%Y %B %d">\n' -H0
        %Y %B 1075353525

    Now nothing is printed if the property does not exist.

    3) Updated SWISH::API to croak() on invalid property names, and to return undefined values for missing properties.

    4) Updated swish.cgi and search.cgi to not generate warnings on undefined values returned as properties. Note that swish.cgi will now die on undefined properties. Previously it would just display (NULL).
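
    The behaviour in item 2 can be sketched as "format only when the property exists". The helper below is hypothetical and uses UTC for the conversion, which is an assumption.

```python
import time


def format_property(props, name, fmt=None):
    """Return the formatted property, or an empty string if it is missing."""
    value = props.get(name)
    if value is None:
        return ""            # print nothing rather than a half-expanded format
    if fmt is not None:
        return time.strftime(fmt, time.gmtime(value))
    return str(value)


result = {"somedate": 1075353525}
with_date = format_property(result, "somedate", "%Y-%m-%d")
missing = format_property({}, "somedate", "%Y %B %d")
```

    A result without the property yields an empty string instead of output such as "%Y %B 1075353525".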

  • Fixed segfault when generating warnings while parsing

    Parser.c was calling warning() incorrectly. And -Wall was not catching this!

  • Added check for internal property names.

    Parser was not checking for use of Swish-e reserved property names.

       <swishrank>foo</swishrank>

    This will now generate a warning.

Version 2.4.1 - December 17, 2003

  • Added new example CGI script

    search.cgi is a new skeleton CGI script that uses SWISH::API for searching. It is installed in the same location as swish.cgi.

  • Add Fuzzy access to C and Perl interfaces

    Added a number of functions to the C API (and SWISH::API) to access the stemmer used when indexing a given index.

  • Commas in numbers

    Added commas to summary display at end of indexing.

  • Insert whitespace between tags

    Parser.c was updated to flush the text buffer before and after every (non-inline HTML) tag.

    The problem was that:

        foo<tag>bar</tag>baz

    would index as a single word "foobarbaz".
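
    A rough sketch of the effect, treating every tag outside a small inline set as a word boundary. The inline set and the regular expression are assumptions for illustration and do not reflect how Parser.c is written.

```python
import re

INLINE = {"b", "i", "em", "strong", "span"}  # assumed inline set, for illustration


def words(markup):
    """Split text into words, treating non-inline tags as word boundaries."""
    def replace(match):
        name = match.group(1).lower()
        return "" if name in INLINE else " "

    text = re.sub(r"</?([A-Za-z][A-Za-z0-9]*)[^>]*>", replace, markup)
    return text.split()


split_words = words("foo<tag>bar</tag>baz")
joined_word = words("foo<b>bar</b>baz")
```

    With this rule the first input yields three words, while the inline case is still indexed as one word.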

  • DirTree.pl

    DirTree.pl was updated to work with SWISH::Filter and to work on Windows. DirTree.pl is a program to fetch files from the file system and works with the -S prog input method.

  • Problem with --enable-incremental option

    Fixed configure script to build incremental option. Note that this is still experimental. But testers are welcome.

  • headers.c bug

    Mark Fletcher with the help of valgrind found a bug in headers.c function SwishIndexHeaderNames used by the C API.

  • Clarify documentation regarding search order

    At the prompting of Doralyn Rossmann updated SEARCH.pod to try and make the explanation of searching clearer, and to fix an error in the description of nested searches.

Version 2.4.0 - October 27, 2003

  • Note: Different Index Format

    Swish-e version 2.4.0 has a different index file format from previous versions of Swish-e. Upgrading will require reindexing -- version 2.4.0 cannot read indexes created with previous versions.

Version 2.4.0 (Release Candidate 4) September 26, 2003

  • robots.txt not closed correctly

    When using -S http method robots.txt was not closed and that caused the (last) .contents file to not be unlinked under Windows. Windows seems to think filenames are related to files.

  • SWISH::Filter and locating programs on Windows

    SWISH::Filter now scans $libexecdir in addition to the PATH for programs (such as catdoc and pdftotext), and also checks for programs by adding the extensions ".exe" and ".bat" to the program name.
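
    SWISH::Filter is a Perl module; the lookup it describes can be sketched in Python as follows. The search order (extra directories before PATH) and the function name are assumptions.

```python
import os


def find_program(name, extra_dirs=(), path=None, extensions=("", ".exe", ".bat")):
    """Return the first existing file matching name (+extension) in the search dirs."""
    if path is None:
        path = os.environ.get("PATH", "")
    dirs = list(extra_dirs) + [d for d in path.split(os.pathsep) if d]
    for directory in dirs:
        for ext in extensions:
            candidate = os.path.join(directory, name + ext)
            if os.path.isfile(candidate):
                return candidate
    return None
```

    For example, find_program("catdoc", extra_dirs=[libexecdir]) would return the path of catdoc, catdoc.exe, or catdoc.bat if one of them exists, where libexecdir is whatever directory the installation uses.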

  • Install sample templates

    The sample templates included with swish.cgi are now installed in $pkgdatadir (typically /usr/local/share/swish-e).

Version 2.4.0 (Release Candidate 3) September 11, 2003

  • Fix parser bug meta=(foo*)

    Fixed bug in query parser caused by rc2's (pr2) attempt to catch wildcard errors.

Version 2.4.0 (Release Candidate 2) September 10, 2003

  • Indexing HTML title

    Fixed a problem when these were used in combination:

      MetaNames swishtitle
      MetaNameAlias swishtitle title

    That failed to correctly reset the metaname stack and indexed text under the wrong metaID.

  • Single Wildcards

    Due to the way the query parser "works" a search of

       "foo *"

    would result in a search of "foo*". Now that results in:

       err: Single wildcard not allowed as word

  • Fixed search parsing bug

    Brad Miele reported that the word "andes" was not being found. It was being stemmed to "and", which was then considered an operator. [moseley]

  • Add new directive PropertyNamesSortKeyLength

    PropertyNamesSortKeyLength sets the sort key length to use when sorting string properties. The default is 100 characters. There was a hard-coded 100 char limit before, but that was a problem where people were not building from source (Windows). The value of this is questionable -- it's intended to limit how much memory is used when sorting while indexing and searching. [moseley]

  • Fixed sorting issues with multiple indexes and reverse sorting

    Reworked much of the sorting code. Still to do is setting the character sort order. [moseley]

  • Fixed minor memory leak

    Fixed leak of not releasing memory of index file name and swish_handle destroy, and fixed SwishStemWord to default to the Stemmer_en. [moseley]

    Fixed libtest.c example program that was not cleaning up memory after an error condition.

  • Replaced Swish-e's Porter Stemmer with Snowball

    Swish-e now has support for Snowball stemmers (http://snowball.tartarus.org/). The stemmers are enabled for an index with FuzzyIndexingMode Stemming_* where "*" can be:

      de, dk, en1, en2, es, fi, fr, it, nl, no, pt, ru, se

    In addition, UseStemming yes or FuzzyIndexingMode Stemming_en will use the old stemmer.

Version 2.4.0 (Release Candidate 1) May 21, 2003

  • Security Fix: swish.cgi

    The swish.cgi script was not correctly escaping HTML when searching by the right combination of metanames and highlighting module. This could lead to cross-site scripting if indexing un-trusted documents. [moseley]
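
    The general defence is to escape untrusted text before adding any markup of your own. A minimal Python sketch of that ordering follows; it is not the swish.cgi code, and highlight is a hypothetical helper.

```python
from html import escape


def highlight(text, term):
    """Escape untrusted text first, then wrap the (escaped) term in markup."""
    safe_text = escape(text)
    safe_term = escape(term)
    return safe_text.replace(safe_term, "<b>" + safe_term + "</b>")


snippet = highlight('<script>alert("x")</script> swish', "swish")
```

    Only the highlighting tags added after escaping survive as markup; anything that came from the indexed document is rendered as text.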

  • Added Support for building a Debian Package

    To build as a .deb unpack the distribution and chdir then run

       $ fakeroot debian/build binary

    Then install the generated .deb file with dpkg -i

  • Use SWISH::Filter by default with spider.pl

    spider.pl is installed in the libexecdir directory as well as the SWISH::Filter modules. PDF, MS Word, MP3, and XML documents will be indexed automatically if the required helper applications (e.g. catdoc, pdftotext) or scripts (e.g. MP3::Tag) are installed.

    Swish also knows about libexecdir, so if you specify a relative path with -S prog swish-e will look for the program in libexecdir. This is mostly for spider.pl so indexing only requires:

        IndexDir spider.pl
        SwishProgParameters default http://localhost/index.html

    And swish-e will find spider.pl and SWISH::Filter will be used to convert docs.

  • Fixed Document-Type bug

    Document-Type was not being reset after being set by input from a -S prog program, causing the wrong parser to be used. [moseley]

  • New Directive: PropertyNamesNoStripChars

    Swish replaces all series of low ASCII chars with a single space character. This option instructs swish to store all chars in the property. [moseley]

  • Change HTTP access defaults

    Defaults used with -S http access method were changed.

    Delay was reduced from one minute between start of each request to five seconds between requests.

    MaxDepth was changed from five to zero, meaning there is no limit to depth indexed by default. [moseley]

  • swishspider location and SpiderDirectory

    The swishspider program is now installed in $prefix/lib/swish-e by default. This can be changed by the --libexecdir option to configure.

    The SpiderDirectory option now defaults to the value of libexecdir instead of the current directory. [moseley]

  • Added libtool and automake support

    Replaces the build system with Autotools. Now builds libswish-e as a shared library on systems that support shared libraries. The swish-e binary links against this shared library. Can also build outside the source tree on platforms with GNU make. [moseley]

  • Updates to installation

    Running "make install" now installs additional files. Files include the swish-e binary, the libswish-e search library, swish-e.h header, documentation files, the swishspider program, and Perl modules used for the example swish.cgi search script. Directories will be created if they do not already exist. Installation directories can be specified at build time.

  • Fixed bug when searching at end of inverted index

    Swish was not correctly detecting the end of the inverted index when searching a wildcard word that was past the last word in the index. Caught by Frank Heasley. [moseley]

  • Increase sort key length from 50 to 100 characters

    The setting MAX_SORT_STRING_LEN in src/config.h sets the max length used when sorting in swish-e. You may reduce this number to save memory while sorting, or increase it if you have very long properties to sort.

  • Remove &quot; entity from -p output

    The -p option to print properties was escaping double quotes in properties with the &quot; entity. -x does not do that, so the two were inconsistent. -p no longer converts double quotes. The user should pick a good delimiter with -d or preferably use the -x method for generating output.

  • XML parser and Windows

    The XML parser was being passed the incorrect buffer length when used on Windows platform causing the parser to abort with an error.

  • Version Numbering

    SWISH-E versions starting with 2.3.4 use kernel version numbering. Versions are in the form: Major.Minor.Build. Odd minor versions are development. Even minor versions are releases. 2.3.4 would be a development version. 2.4.0 would be a release version. 2.3.20 would be the 20th build of 2.3.
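
    The numbering rule is easy to check mechanically. A small sketch, with classify as a hypothetical helper:

```python
def classify(version):
    """Return (major, minor, build, kind) for a Major.Minor.Build string."""
    major, minor, build = (int(part) for part in version.split("."))
    kind = "development" if minor % 2 else "release"
    return major, minor, build, kind


examples = [classify(v) for v in ("2.3.4", "2.4.0", "2.3.20")]
```

    This gives "development" for 2.3.4 and 2.3.20 and "release" for 2.4.0, as described above.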

  • Added RPM support

    RPMs can be built with:

        ./configure
        make dist

    Copy the resulting tarball to RPM's SOURCES directory and then run as a superuser:

        rpmbuild -ba rpm/swish-e.spec

    You should have swish-e packages in your RPMS/$arch directory. [augur]

  • Changed default perl binary location

    Most perl scripts provided with SWISH-E now use /usr/bin/perl by default. Note that some scripts are generated at build time, so those will look in the path for the location of the perl binary.

  • New Feature: MetaNamesRank

    MetaNamesRank can be used to adjust the ranking for words based on the word's MetaName.

  • New Swish Library API and Perl Module

    The Swish-e C library interface was rewritten to provide better memory management and better separation of data. Most indexing related code has been removed from the library. A new header file is provided for the API: swish-e.h.

    The Perl module SWISHE was replaced with the SWISH::API module in the Swish-e distribution.

    Previous versions of the SWISHE module will not work with this version of Swish-e.

    If you are using the SWISHE module from a previous version of Swish then you must either rewrite your code to use the new SWISH::API module (highly recommended) or use the replacement SWISHE module. The replacement SWISHE module is a thin interface to the SWISH::API module. It can be downloaded from

        http://swish-e.org/Download/old/SWISHE-0.03.tar.gz

  • NoContents not working with libxml2 parser

    Corrected problem when using NoContents with binary files and the HTML2 parser.

    Trying to index image file names with:

        IndexOnly .gif .jpeg
        NoContents .gif .jpeg

    failed to index the path names because the default parser (HTML2 when libxml2 is linked with swish-e) was not finding any text in the binary files. [moseley]

  • Updates to swish.cgi

    The example/swish.cgi script can now use the SWISH::API module for searching an index. Combined with mod_perl this module can improve search performance considerably.

    The Perl modules used with the swish.cgi script have all been moved into the SWISH::* namespace. Hence, files in the modules directory were moved into the modules::SWISH directory.

Version 2.2.3 - December 11, 2002

Multiple -L options were ORing instead of ANDing. Catch by Patrick Mouret. [moseley]

Version 2.2.2 - November 14, 2002

Pass non- text/* files onto indexing code IF there is a FileFilter associated with the *extension* of the URL. Fixes the problem of not being able to index, say, pdf files by using the FileFilter configuration option.

Fixed bug where nulls were stripped when using FileFilter with -S prog. Catch by Greg Fenton. [moseley]

Version 2.2.1 - September 26, 2002

  • NoContents with -S prog

    Failed to use the correct default parser when using the No-Contents header and libxml2 linked in. [moseley]

  • Add tests for IRIX and sparc machines

    8-byte alignment in mem_zones is required for these machines [moseley]
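
    Rounding an allocation size up to an 8-byte boundary is usually done with a mask. The sketch below shows the arithmetic only; it is not the mem_zones code.

```python
ALIGNMENT = 8


def align_up(size, alignment=ALIGNMENT):
    """Round size up to the next multiple of alignment (a power of two)."""
    return (size + alignment - 1) & ~(alignment - 1)


sizes = [align_up(n) for n in (0, 1, 8, 13)]
```

    Each requested size is padded so that the next allocation starts on an aligned address.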

  • Fixed code when removing files

    Was not correctly removing words from index when parser aborted [jmruiz]

  • Merge segfault

    Fixed segfault caused by trying to print null dates while merging duplicate files. [moseley]

  • Documentation patches

    Spelling corrections to the SWISH-CONFIG pod page [Steve Eckert]

  • Configure corrections

    Fixed a zlib test error that used "==" in a test [Steve Eckert]

  • Updates to VMS build

    The VMS build was updated [Jean-François PIÉRONNE]

  • MANIFEST corrections

    Added missing filters and vms build file into MANIFEST [moseley]

Version 2.2 - September 18, 2002

  • Default parser

    Swish-e will now use the HTML2 (libxml2) parser by default if libxml2 is installed and DefaultContents or IndexContents is not used.

  • Selecting parsers

    Allow HTML*, XML*, and TXT* to automatically select the libxml2-based parsers if libxml2 is linked with Swish-e, otherwise fallback to the built-in parsers.

  • SwishSpider and Filters

    Filters (FileFilter directive) did not work correctly when spidering with the -S http method. A new filter system was developed and now filtering of documents (e.g. pdf->html or MSWord->text) is handled by the src/SwishSpider program.

    When indexing with the -S http method only documents of content-type "text/*" are indexed. Other documents must be converted to text by using the filter system.

  • Buffer overflow in xml.c

    Fixed bug in xml.c reported by Rodney Barnett when very long words were indexed. [moseley]

  • configure script updates

    Updated from _WIN32 checks to feature checks using autoconf [moseley, norris]

  • updates to run on Alpha (Linux 2.4 (Debian 3.0))

    Fixed a cast error when calling zlib, and the calls to read/write packed longs to disk. [jmruiz, moseley]

  • COALESCE_BUFFER_MAX_SIZE

    Some people were seeing the following error:

        err: Buffer too short in coalesce_word_locations.
        Increase COALESCE_BUFFER_MAX_SIZE in config.h and rebuild.

    This was due to indexing binary data or files with very large number of words. The best solution is to not index binary data or files with a very large number of words.

    Swish-e will now automatically reallocate the buffer as needed. [jmruiz]

Version 2.2rc1 - August 29, 2002

Many large changes were made internally in the code, some for performance reasons, some for feature changes and additions, and some to prepare for new features in later versions of Swish-e.

  • Documentation!

    Documentation is now included in the source distribution as .pod (perldoc) files, and as HTML files. In addition, the distribution can now generate PDF, postscript, and unix man pages from the source .pod files. See README for more information.

  • Indexing and searching speed

    The indexing process has been improved. Depending on a number of factors, you may see a significant improvement in indexing speed, especially if upgrading from version 1.x.

    Searching speed has also been improved. Properties are not loaded until results are displayed, and properties are pre-sorted during indexing to speed up sorting results by properties while searching.

  • Properties are written to a separate file

    Swish-e now stores document properties in a separate file. This means there are now two files that make up a Swish-e index. The default files are index.swish-e and index.swish-e.prop.

    This change frees memory while indexing, allowing larger collections to be indexed in memory.

  • Internal data stored as Properties

    Pre 2.2 some internal data was stored in fixed locations within the index, namely the file name, file size, and title. 2.2 introduced new internal data such as the last modified date, and document summaries. This data is considered meta data since it is data about a document.

    Instead of adding new data to the internal structure of the index file, it was decided to use the MetaNames and PropertyNames feature of Swish-e to store this meta information. This allows for new meta data to be added at a later time (e.g. Content-type), and provides an easy and customizable way to print results with the -p switch and the new -x switch. In addition, search results can now be sorted and limited by properties.

    For example, to sort by the rank and title:

        swish-e -w foo -s swishrank desc swishtitle asc

  • The header display has been slightly reorganized.

    If you are parsing output headers in a program then you may need to adjust your code. There's a new switch '-H' to control the level of header output when searching.

  • Results are now combined when searching more than one index.

    Swish-e now merges (and sorts) the results from multiple indexes when using -f to specify more than one index. This change affects the way maxhits (-m) works. Here's a summary of the way it works for the different versions.

        1.3.2 - MaxHits returns first N results starting from the first index.
                e.g. maxhits=20; 15 hits Index1, 40 hits Index2
                All 15 from Index1 plus first five from Index2 = 20 hits.
    
        2.0.0 - MaxHits returns first N results from each index.
                e.g. Maxhits=20; 15 hits Index1, 40 hits Index2
                All 15 from Index1 plus 15 from Index2.
    
        2.2.0 - Results are merged and first N results are returned.
                e.g. Maxhits=20; 15 hits Index1, 40 hits Index2
                Results are merged from each index and sorted
                (rank is the default sort) and only the first
                20 are returned.
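
    The 2.2.0 behaviour (merge, sort, then cut) can be sketched as follows. The rank values and document names are invented for the example, and the function is a hypothetical stand-in for the real merge code.

```python
def merged_maxhits(indexes, n):
    """Merge all results, sort by rank (descending), then keep the first n."""
    merged = [hit for hits in indexes for hit in hits]
    return sorted(merged, key=lambda hit: hit[0], reverse=True)[:n]


# (rank, document) pairs: 15 hits in the first index, 40 in the second.
index1 = [(1000 - i, "index1/doc%d" % i) for i in range(15)]
index2 = [(990 - i, "index2/doc%d" % i) for i in range(40)]

top = merged_maxhits([index1, index2], 20)
```

    How many of the 20 results come from each index now depends on rank alone, not on the order in which the indexes were listed.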

  • New prog document source indexing method

    You can now use -S prog to use an external program to supply documents to Swish-e. This external program can be used to spider web servers, index databases, or to convert any type of document into html, xml, or text, so it can be indexed by Swish-e. Examples are given in the prog-bin directory.

  • The indexing parser was rewritten to be more logical.

    TranslateCharacters now is done before WordCharacters is checked. For example,

        WordCharacters abcdefghijklmnopqrstuvwxyz
        TranslateCharacters ñ n

    Now El Niño will be indexed as El Nino (el and nino), even though ñ is not listed in WordCharacters.

    Previously, stopwords were checked after stemming and soundex conversions, as well as most of the other word checks (WordCharacters, min/max length and so on). This meant that the stopword list probably didn't work as expected when using stemming.
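
    The ordering in the El Niño example can be sketched like this. Lowercasing before translation is an assumption made for the illustration; the function is not the swish-e parser.

```python
import re

WORD_CHARACTERS = "abcdefghijklmnopqrstuvwxyz"
TRANSLATE = str.maketrans("ñ", "n")


def index_words(text):
    """Lowercase, translate, then split on anything outside WORD_CHARACTERS."""
    translated = text.lower().translate(TRANSLATE)
    return re.findall("[" + re.escape(WORD_CHARACTERS) + "]+", translated)


indexed = index_words("El Niño")
```

    If the translation were applied after the WordCharacters check, "Niño" would instead be broken at the ñ.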

  • The search parser was rewritten to be more logical

    The search parser was rewritten to correct a number of logic errors. Swish-e did not differentiate between meta names, Swish-e operators and search words when parsing the query. This meant, for example, that metanames might be broken up by the WordCharacters setting, and that they could be stemmed.

    Swish-e operator characters "*()= can now be searched by escaping with a backslash. For example:

        ./swish-e -w 'this\=odd\)word'

    will end up searching for the word this=odd)word. To search for a backslash character precede it with a backslash.

    Currently, searching for:

        ./swish-e -w 'this\*'

    is the same as a wildcard search. This may be fixed in the future.

    Searching for buzzwords with those characters will still require backslashing. This also may change to allow some un-escaped operator characters, but some will always need to be escaped (e.g. the double-quote phrase character).
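
    A simplified tokenizer shows how a backslash can turn an operator character into part of a word. It is a sketch, not the swish-e query parser, and unlike the current behaviour noted above it treats an escaped "*" as a literal character.

```python
OPERATOR_CHARS = set('"*()=')


def tokenize(query):
    """Split a query into ('word', text) and ('op', char) tokens."""
    tokens, word, i = [], "", 0
    while i < len(query):
        ch = query[i]
        if ch == "\\" and i + 1 < len(query):
            word += query[i + 1]      # escaped character is taken literally
            i += 2
            continue
        if ch in OPERATOR_CHARS or ch.isspace():
            if word:
                tokens.append(("word", word))
                word = ""
            if ch in OPERATOR_CHARS:
                tokens.append(("op", ch))
        else:
            word += ch
        i += 1
    if word:
        tokens.append(("word", word))
    return tokens


escaped = tokenize(r"this\=odd\)word")
plain = tokenize("title=(foo)")
```

    The escaped query produces a single word token, while the unescaped one is split into words and operators.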

  • Quotes and Backslash escapes in strings

    A bug was fixed in the parse_line() function (in string.c) where backslashes were not escaping the next character. parse_line() is used to parse a string of text into tokens (words). Normally splitting is done at whitespace. You may use quotes (single or double) to define a string (that might include whitespace) as a single parameter. The backslash can also be used to escape the following character when *within* quotes (e.g. to escape an embedded quote character).

        ReplaceRules append "foo bar"   <- define "foo bar" as a single word
        ReplaceRules append "foo\"bar"  <- escape the quotes
        ReplaceRules append 'foo"bar'   <- same thing
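
    Python's shlex module follows similar quoting rules and can be used to see how the three lines above split into tokens. It is only an analogy: shlex is not parse_line(), and it also honours backslashes outside quotes.

```python
import shlex

lines = [
    'ReplaceRules append "foo bar"',
    'ReplaceRules append "foo\\"bar"',   # the config line contains: "foo\"bar"
    "ReplaceRules append 'foo\"bar'",
]
parsed = [shlex.split(line) for line in lines]
```

    Each line yields three tokens, and the last two lines produce the same third token.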

  • Example user.config file removed.

    Previous versions of Swish-e included a configuration file called user.config which contained examples of all directives. This has been replaced by a series of example configuration files located in the conf directory. The configuration directives are now described in SWISH-CONFIG.

  • Ports to Win32 and VMS

    David Norris has included the files required to build Swish-e under Windows. See src/win32. A self-extracting Windows version is available from the Download page of the swish-e.org web site.

    Jean-François Piéronne has provided the files required to build Swish-e under OpenVMS. See src/vms for more information.

  • String properties are concatenated

    Multiple string properties of the same name in a document are now concatenated into one property. A space character is added between the strings if needed. A warning will be generated if multiple numeric or date properties are found in the same document, and the additional properties will be ignored.

    Previously, properties of the same name were added to the index, but could not be retrieved.

    To do: remove the next pointer, and allow user-defined character to place between properties.

  • regex type added to ReplaceRules

    A more general purpose pattern replacement syntax.

  • New Parsers

    Swish-e's XML parser was replaced with James Clark's expat XML parser library.

    Swish-e can now use Daniel Veillard's libxml2 library for parsing HTML and XML. This requires installation of the library before building Swish-e. See the INSTALL document for information. libxml2 is not required, but is strongly recommended for parsing HTML over Swish-e's internal HTML parser, and provides more features for both HTML and XML parsing.

  • Support for zlib

    Swish-e can be compiled with zlib. This is useful for compressing large properties. Building Swish-e with zlib is strongly recommended if you use its StoreDescription feature.

  • LST type of document no longer supported

    LST allowed indexing of files that contained multiple documents.

  • Temporary files

    To improve security Swish-e now uses the mkstemp(3) function to create temporary files. Temporary files are used while indexing only. This may result in some portability issues, but the security issues were overriding.

    (Currently this does not apply to the -S http indexing method.)

    mkstemp opens the temporary file with O_EXCL|O_CREAT flags. This prevents overwriting existing files. In addition, the name of the file created is a lot harder to guess by attackers. The temporary file is created with only owner permissions.

    Please report any portability issues on the Swish-e discussion list.
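
    Python's tempfile.mkstemp offers the same guarantees and can serve as an illustration. The "swtmp" prefix comes from the entry below on temporary file locations; the kind tag, the underscore separator, and the helper name are assumptions.

```python
import os
import tempfile


def make_temp(directory, kind):
    """Create a temporary file exclusively; return (file descriptor, path)."""
    return tempfile.mkstemp(prefix="swtmp" + kind + "_", dir=directory)


fd, path = make_temp(tempfile.gettempdir(), "loc")
os.close(fd)
os.remove(path)   # clean up; an aborted run would leave the file behind
```

    The file is created atomically with a random suffix, so an existing file is never reused or overwritten.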

  • Temporary file locations

    Swish-e now uses the environment variables TMPDIR, TMP, and TEMP (in that order) to decide where to write temporary files. The configuration setting of TmpDir will be used if none of the environment variables are set. Swish-e uses the current directory otherwise; there is no default temporary directory.

    Since the environment variables override the configuration settings, a warning will be issued if you set TmpDir in the configuration file and there's also an environment variable set.

    Temporary files begin with the letters "swtmp" (which can be changed in config.h), followed by two or more letters that indicate the type of temporary file, and some random characters to complete the file name. If indexing is aborted for some reason you may find these temporary files left behind.
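
    The lookup order described above can be written as a short function. This is a sketch of the documented precedence only; choose_tmpdir is a hypothetical name.

```python
def choose_tmpdir(environ, config_tmpdir=None):
    """Pick the temporary directory: TMPDIR, TMP, TEMP, then TmpDir, else '.'."""
    for name in ("TMPDIR", "TMP", "TEMP"):
        if environ.get(name):
            return environ[name]
    return config_tmpdir or "."


from_env = choose_tmpdir({"TMP": "/var/tmp", "TEMP": "/other"}, "/configured")
from_config = choose_tmpdir({}, "/configured")
fallback = choose_tmpdir({})
```

    Passing os.environ as the first argument applies the same rule to the real environment.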

  • New Fuzzy indexing method Double Metaphone

    Based on Lawrence Philips' Metaphone algorithm, this adds two new methods of creating a fuzzy index (in addition to Stemming and Soundex).

Changes to Configuration File Directives. Please see SWISH-CONFIG for more info.

  • New directives: IndexContents and DefaultContents

    The IndexContents directive assigns internal Swish-e document parsers to files based on their file type. The DefaultContents directive assigns a parser to be used on files that are not assigned a parser with IndexContents.

  • New directive: UndefinedMetaTags [error|ignore|index|auto]

    This describes what to do when a meta tag is found in a document that is not listed in the MetaNames directive.

  • New directive: IgnoreTags

    Will ignore text with the listed tags.

  • New directive: SwishProgParameters *list of words*

    Passes the listed words to the external program when running with the -S prog document source method.

  • New directive: ConvertHTMLEntities [yes|no]

    Controls parsing and conversion of HTML entities.

  • New directive: DontBumpPositionOnMetaTags

    The word position is now bumped when a new metatag is found -- this is to prevent phrases from matching across meta tags. This directive will disable this behavior for the listed tags.

    This directive works for HTML and XML documents.

  • Changed directive: IndexComments

    This has been changed such that comments are not indexed by default.

  • Changed directive: IgnoreWords

    The builtin list of stopwords has been removed. Use of the SwishDefault word will generate a warning, and no stop words will be used. You must now specify a list of stopwords, or specify a file of stopwords.

    A sample file stopwords.txt has been included in the conf/stopwords directory of the distribution, and can be used by the directive:

        IgnoreWords File: /path/to/stopwords.txt

  • Change of the default for IgnoreTotalWordCountWhenRanking

    The default is now "yes".

  • New directive: Buzzwords

    Buzzwords are words that should be indexed as-is, without checking for stopwords, word length, WordCharacters, or any other of the word limiting features. This allows indexing of things like C++ when "+" is not listed in WordCharacters.

    Currently, IgnoreFirstChar and IgnoreLastChar will be stripped before processing Buzzwords.

    In the future we may use separate IgnoreFirst/Last settings for buzzwords since, for example, you may wish to index all + within Swish-e words, but strip + from the start/end of Swish-e words, but not from the buzzword C++.

  • New directives: PropertyNamesNumeric PropertyNamesDate

    Before Swish-e 2.2 all user-defined document properties were stored in the index as strings. PropertyNamesNumeric and PropertyNamesDate tell it that a property should be stored in binary format. This allows for correct sorting of numeric properties.

    Currently, only integers can be stored, such as a unix timestamp. (Swish-e uses strtoul to convert the number to an unsigned long internally.)

    PropertyNamesDate only indicates to Swish-e that a number is a unix timestamp, and to display the property as a formatted time when printing results. Swish does not currently parse date strings; you must provide a unix timestamp.

  • New directive: MetaNameAlias

    You may now create alias names for MetaNames. This allows you to map or group multiple names to the same MetaName.

  • New directive: PropertyNameAlias

    Creates aliases for a PropertyName.

  • New directive: PropertyNamesMaxLength

    Sets the max length of a text property.

  • New directive: HTMLLinksMetaName

    Defines a metaname to use for indexing href links in HTML documents. Available only with libxml2 parser.

  • New directive: ImageLinksMetaName

    Defines a metaname to use for indexing src links in <img> tags. Allows you to search image pathnames within HTML pages. Available only with libxml2 parser.

  • New directive: IndexAltTagMetaName

    Allows indexing of image ALT tags. Only available when using the libxml2 parser.

  • New directive: AbsoluteLinks

    Attempts to convert relative links indexed with HTMLLinksMetaName and ImageLinksMetaName to absolute links. Available only with libxml2 parser.

  • New directive: ExtractPath

    Allows you to use a regular expression to extract out part of the path of each file and index it with a meta name. For example, this allows searches to be limited to parts of your file tree.

  • New directive: FileMatch

    FileMatch is similar to FileRules. Where FileRules is used to exclude files and directories, FileMatch is used to include files.

  • New directive: PreSortedIndex

    Controls which properties are pre-sorted while indexing. All properties are sorted by default.

  • New directive: ParserWarnLevel

    Sets the level of warning printed when using libxml2.

  • New directive: obeyRobotsNoIndex [yes|NO]

    When using libxml2 to parse HTML, Swish-e will skip files marked as NOINDEX.

        <meta name="robots" content="noindex">

    Also, comments may be used within HTML and XML source docs to block sections of content from indexing:

           <!-- SwishCommand noindex -->
           <!-- SwishCommand index -->

    These may also be used:

           <!-- noindex -->
           <!-- index -->

  • New directive: UndefinedXMLAttributes

    This describes how the content of XML attributes should be indexed, if at all. This is similar to UndefinedMetaTags, but is only for XML attributes and when parsed by libxml2. The default is to not index XML attributes.

  • New directive: XMLClassAttributes

    XMLClassAttributes can specify a list of attribute names whose content is combined with the element name to form metanames.

  • New directive: PropCompressionLevel [0-9]

    If compiled with zlib, Swish-e uses this setting to control the level of compression applied to properties. Properties must be long enough (defined in config.h) to be compressed. Useful for StoreDescription.

  • Experimental directive: IgnoreNumberChars

    Defines a set of characters. If a word is made up of *only* those characters the word will not be indexed.

  • New directive: FuzzyIndexingMode

    This configuration directive is used to define the type of "fuzzy" index to create. Currently the options are:

        None
        Stemming
        Soundex
        Metaphone
        DoubleMetaphone

Changes to command line arguments. See SWISH-RUN for documentation on these switches.

  • New command line argument -H

    Controls the level (verbosity) of header information printed with search results.

  • New command line argument -x

    Provides additional header output and allows for a format string to describe what data to print.

  • New command line argument -k

    Prints words stored in the Swish-e index.

  • New command line argument -N

    Provides a way to do incremental indexing by comparing last modification dates. You pass -N a path to a file and only files newer than the last modified date of that file will be indexed.

  • Removed command line argument -D

    -D no longer dumps the index file data. Use -T instead.

  • New command line argument -T

    -T is used for debugging indexing and searching.

  • Enhanced command line argument -d

    Now -d can accept some back-slashed characters to be used as output separators.

  • Enhanced command line argument -P

    Now -P sets the phrase delimiter character in searches.

  • New command line argument -L

    Swish-e 2.2 contains an experimental feature to limit results by a range of property values. The behavior of this feature may change in the future.

  • Modified command line argument -v

    Now the argument -v 0 results in *no* output unless there is an error. This is a bit more handy when indexing with cron.


SWISH-RUN - Running Swish-e and Command Line Switches

Swish-e version 2.4.7


OVERVIEW

The Swish-e program is controlled by command line arguments (called switches). Often, it is run manually from a shell (command prompt), or from a program such as a CGI script that passes the command line arguments to swish.

Note: A number of the command line switches may be specified in the Swish-e configuration file specified with the -c command line argument. Please see SWISH-CONFIG for a complete description of available configuration file directives.

There are two basic operating modes of Swish-e: indexing and searching. There are command line arguments that are unique to each mode, and others that apply to both (yet may have different meaning depending on the operating mode). These command line arguments are listed below, grouped by:

INDEXING -- describes the command line arguments used while indexing.

SEARCHING -- lists the command line arguments used while searching.

OTHER SWITCHES -- lists switches that don't apply to searching or indexing.

Beginning with Swish-e version 2.1, you may embed its search engine into your applications. Please see SWISH-LIBRARY.

INDEXING

Swish-e indexing is initiated by passing command line arguments to swish. The command line arguments used for searching are described in SEARCHING. Also, see SWISH-SEARCH for examples of searching with Swish-e.

Swish-e usage:

    swish-e [-i dir file ... ] [-c file] [-f file] [-l] \
            [-v (num)] [-S method(fs|http|prog)] [-N path]

The -h switch (help) will list the available Swish-e command line arguments:

    swish-e -h

Typically, most if not all indexing settings are placed in a configuration file (specified with the -c switch). Once the configuration file is set up, indexing is initiated as:

    swish-e -c /path/to/config/file

See SWISH-CONFIG for information on the configuration file.

Security Note: If the swish binary is named swish-search then swish will not allow any operation that would cause swish to write to the index file.

When indexing it may be advisable to index to a temporary file, and then after indexing has successfully completed rename the file to the final location. This is especially important when replacing an index that is currently in use.

    swish-e -c swish.config -f index.tmp
    [check return code from swish or look for err: output]
    mv index.tmp index.swish-e
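
The bracketed step above can be automated. The following is a minimal sketch, assuming swish exits with a non-zero status on failure and that error lines begin with "err:"; the function name reindex, the SWISH variable, the log file name and the handling of a companion ".prop" file are illustrative assumptions, not part of Swish-e.

```shell
# reindex CONFIG FINAL_INDEX
# Build the index under a temporary name and rename it into place only
# when the indexer exits successfully and prints no "err:" line.
# SWISH names the indexer binary (hypothetical knob, defaults to swish-e).
reindex() {
    config=$1
    final=$2
    tmp="$final.tmp"
    log="$final.log"

    if "${SWISH:-swish-e}" -c "$config" -f "$tmp" > "$log" 2>&1 &&
        ! grep -q '^err:' "$log"; then
        # Assumption: if a companion "<index>.prop" file was written
        # next to the index, it has to move along with it.
        if [ -f "$tmp.prop" ]; then
            mv "$tmp.prop" "$final.prop"
        fi
        mv "$tmp" "$final"
    else
        echo "indexing failed, see $log" >&2
        rm -f "$tmp" "$tmp.prop"
        return 1
    fi
}
```

A call such as "reindex swish.config index.swish-e" would leave the index that is currently in use untouched whenever indexing fails.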

Indexing Command Line Arguments

  • -i *directories and/or files* (input file)

    This specifies the directories and/or files to index. Directories will be indexed recursively. This is typically specified in the configuration file with the IndexDir directive instead of on the command line. Use of this switch overrides the configuration file settings.

  • -S [fs|http|prog] (document source/access mode)

    This specifies the method to use for accessing documents to index. Can be either fs for local indexing via the file system (the default), http for spidering, or prog for reading documents from an external program.

    Located in the conf directory are example configuration files that demonstrate indexing with the different document source methods.

    See the SWISH-FAQ for a discussion on the different indexing methods, and the difference between spidering with the http method vs. using the file system method.

    • fs - file system

      The fs method simply reads files from a local (or networked) drive. This is the default method if the -S switch is not specified. See SWISH-CONFIG for configuration directives specific to the fs method.

    • http - spider a web server

      The http method is used to spider web servers. It uses an included helper program called swishspider. See SWISH-CONFIG for configuration directives specific to the http method.

      Security Note: Under Windows swish passes the URLs fetched from remote documents through the shell (swish uses the system() command for running swishspider under Windows), and this may be considered an additional security risk.

      The http method is deprecated (or at least not very well appreciated). Consider using the prog method described below for spidering. There's a spider program available in the prog-bin directory for use with the prog method. Here are a number of limitations with this method that are solved with the prog method:

      • swishspider only spiders standard <a href="..."> links. Frames and other links are not followed.

      • By default, this method of spidering only indexes files that have a content type of "text/*" (e.g. text/plain, text/html, text/xml). You should use DefaultContents and IndexContents to map file extensions to parsers used by swish (e.g. IndexContents HTML* .html .htm), but this will fail where a document does not have a file extension.

      • Swish-e's FileFilter directive can be used with the http access method, although it requires a separate process (in addition to the swishspider process) for each document filtered.

      • The SWISH::Filter modules can be used with the swishspider program. SWISH::Filter provides a general purpose filtering system (see SWISH::Filter documentation). To use SWISH::Filter set PERL5LIB to point to the location of the SWISH module name space (typically /usr/local/lib/swish-e under Unix). For example:

           export PERL5LIB=/usr/local/lib/swish-e  # bash, bourne shells
           setenv PERL5LIB /usr/local/lib/swish-e  # csh, tcsh

        or under Windows

           set PERL5LIB=c:\program files\swish-e2.4\lib\swish-e

        SWISH::Filter is not enabled by default due to the overhead of loading the modules for every document fetched.

        The Swish-e distribution includes perl modules in the SWISH::Filters::* namespace that make it easy to convert non-text documents into a format that Swish-e can parse. As mentioned above, the helper script swishspider will use these modules if they can be found via PERL5LIB. These modules only provide an interface to programs that do the conversion. For example, you will need to download and install the "catdoc" program to convert MSWord documents into text for indexing. Please see filters/README to see how to use this filter system.

    • prog - general purpose access method

      The prog method is new to Swish-e version 2.2. It's designed as a general purpose method to feed documents to swish from an external program.

      For example, the external program can read a database (e.g. MySQL), spider a web server, or convert documents from one format to another (e.g. pdf to html). Or, you can simply use it to read the files of the file system (like -S fs), yet provide you with full control of what files are indexed.

      The external program name to run is passed to swish either by the IndexDir directive, or via the -i option.

      The program specified should be an absolute path as swish-e will attempt to stat() the program to make sure it exists. Swish does this to help in error reporting.

      If the program specified with -i or IndexDir is not an absolute path (i.e. does not include "/") then swish-e will prepend the "libexecdir" directory defined during configuration. Typically, libexecdir is set to "$prefix/lib/swish-e" (/usr/local/lib/swish-e), but is platform and installation dependent. Running swish-e -h will report the directory.

      For example, "spider.pl", a Perl helper program for use with -S prog, is installed in libexecdir. Specify:

          IndexDir spider.pl
          SwishProgParameters default http://localhost/index.html

      and swish-e will find spider.pl in libexecdir.

      Additional parameters may be passed to the external program via the SwishProgParameters directive. In the example above swish-e will pass two parameters to spider.pl, "default" and "http://localhost/index.html".
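
      The naming rule for -S prog programs can be sketched as follows. This only illustrates the rule as described here and is not how swish-e itself is implemented; resolve_prog and LIBEXECDIR are hypothetical names, and the default directory is merely the typical value mentioned above.

```shell
# resolve_prog NAME
# Print what would be run for a -S prog program name, following the
# rule described above. LIBEXECDIR stands in for the directory that
# "swish-e -h" reports; the default is only the typical location.
resolve_prog() {
    case $1 in
        stdin) printf '%s\n' stdin ;;   # special name: read standard input
        */*)   printf '%s\n' "$1" ;;    # contains "/": used as given
        *)     printf '%s/%s\n' "${LIBEXECDIR:-/usr/local/lib/swish-e}" "$1" ;;
    esac
}
```

      With the default directory, "resolve_prog spider.pl" prints /usr/local/lib/swish-e/spider.pl, while "resolve_prog ./example.pl" prints the name unchanged.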

      A special name "stdin" may be used with -i or IndexDir which tells swish to read from standard input instead of from an external program. See example below.

      The external program prints to standard output (which swish captures) a set of headers followed by the content of the file to index. The output looks similar to an email message or an HTTP document returned by a web server in that it includes name/value pairs of headers, a blank line, and the content.

      The content length is determined by a content-length header supplied to swish by the program; there is no "end of record" character or flag sent between documents. Therefore, it is critical that the content-length header is correct. This is a common source of errors.

      One advantage of this method (over using filters, for example) is that the external program is run only once for the entire indexing job, instead of once for every document. This avoids forking and creating a new process for every document, and makes a huge difference when your external program is something like perl that has a large startup cost.

      Here's a simple example written in Perl:

          #!/usr/local/bin/perl -w
          use strict;
      
          # Build a document
          my $doc = <<EOF;
          <html>
          <head>
              <title>Document Title</title>
          </head>
              <body>
                  This is the text.
              </body>
          </html>
          EOF
      
          # Prepare the headers for swish
          my $path = 'Example.file';
          my $size = length $doc;
          my $mtime = time;
      
          # Output the document (to swish)
          print <<EOF;
          Path-Name: $path
          Content-Length: $size
          Last-Mtime: $mtime
          Document-Type: HTML*
      
          EOF
      
          print $doc;

      The external program passes to swish a header. The header is separated from the body of the document with a blank line. The available headers are:

      • Path-Name:

        This is the name of the file you are indexing. This can be any string, so for example it could be an ID of a record in a database, a URL or a simple file name.

        This header is required.

      • Content-Length:

        This header specifies the length in bytes of the document that follows the header. This length must be exactly the length of the document -- do not make the mistake of adding an extra line feed at the end of the document.

        This header is required.

      • Last-Mtime:

        This header is the last modification time of the file, and must be a time stamp (seconds since the Epoch on your platform).

        This header is not required.

      • Document-Type:

        You may override swish's determination of document type (IndexContents) by using the Document-Type: header. The document type is used to select which parser Swish-e uses to parse the document's contents.

        A spider program, for example, might map the content-type returned from a web server to one of the types Swish-e understands:

            my $doc_type;
            $doc_type = 'HTML*' if $response->content_type =~ m!text/html!;

        This header is not required.

      • Update-Mode:

        When updating an incremental index this header can be used to select the mode for updating the index. There are three possible values:

            Update
            Remove
            Index

        "Update" will update the index with the given file if the date of the given file is newer than the date of the file already in the index. Setting to "Update" is the same as using -u on the command line.

        "Remove" mode will remove the file specified by the Path-Name header. Setting "Remove" is the same as using -r on the command line.

        "Index" will add the file to the index. NOTE: swish-e will not check to see if the file already exists.

        If this header is not specified, the default is the mode specified on the command line (-u, -r, or none).

        This option is still experimental and is subject to change in the future. Ask on the Swish-e list before using.

      The above example program only returns one document and exits, which is not very useful. Normally, your program would read data from some source, such as files or a database, format as XML, HTML, or text, and pass them to swish, one after another. The Content-Length: header tells swish where each document ends -- there is not any special "end of record" character or marker.
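
      A program that emits several documents in a row can be quite small. The following is a minimal sketch in POSIX shell that sends only the two required headers and takes the byte count from wc; the function name emit_doc is hypothetical, and the files to emit are taken from the command line.

```shell
# emit_doc FILE
# Print one document in the layout described above: headers, a blank
# line, then the content. Content-Length is the exact byte count of FILE.
emit_doc() {
    # The arithmetic expansion strips the padding some wc versions add.
    size=$(( $(wc -c < "$1") ))
    printf 'Path-Name: %s\nContent-Length: %s\n\n' "$1" "$size"
    cat "$1"
}

# Emit every file named on the command line, one after another.
for file in "$@"; do
    emit_doc "$file"
done
```

      Saved under a name such as emit.sh (hypothetical), its output could be piped into swish-e running with -S prog -i stdin. Nothing is printed after the content, so the byte count and the data always agree.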

      To index with the above example you need to make sure that the program is executable (and that the path to perl is correct), and then run swish in prog mode, naming the program to use for input.

          % chmod 755 example.pl
          % ./swish-e -S prog -i ./example.pl

      Programs can and should be tested prior to running swish. For example:

          % ./example.pl > test.out

      A few more useful example programs are provided in the swish-e distribution located in the prog-bin directory. Some include documentation:

          % cd prog-bin
          % perldoc spider.pl

      Others are small examples that include comments:

          % cd prog-bin
          % less DirTree.pl

      The spider.pl program can be used as a replacement for the -S http method. It is far more feature-rich and offers much more control over indexing.

      If you use the special program name "stdin" with -i or IndexDir then swish-e will read from standard input instead of from a program. For example:

          % ./example.pl --count=1000 /path/to/data | ./swish-e -S prog -i stdin

      This is basically the same as using a swish-e configuration file of:

          SwishProgParameters --count=1000 /path/to/data
          IndexDir ./example.pl

      in a config file and running

          % ./swish-e -S prog -c swish.conf

      This gives an easy way to run swish without a configuration file with a -S prog program that requires parameters. It also means you can capture data to a file and then index more than once with the same data:

          % ./example.pl /path/to/data --count=1000 > docs.txt
          % cat docs.txt | ./swish-e -S prog -i stdin -c normal_index
          % cat docs.txt | ./swish-e -S prog -i stdin -c fuzzy_index

      Using "stdin" might also be useful for programs that call swish (instead of swish calling the program).

      (The reason "stdin" is used instead of the more common "-" dash is due to the rotten way swish parses the command line. This should be fixed in the future.)

      The prog method bypasses some of the configuration parameters available to the file system method -- settings such as IndexOnly, FileRules, FileMatch and FollowSymLinks are ignored when using the prog method. It's expected that these operations are better accomplished in the external program before passing the document onto swish. In other words, when using the prog method, only send the documents to swish that you want indexed.

      You may use swish's filter feature with the prog method, but performance will be better if you run filtering programs from within your external program. See also filters/README for an example of how to easily add document conversion and filtering to your Perl-based programs.

      Notes when using -S prog on MS Windows

      Windows does not use the shebang (#!) line of a program to determine the program to run. So, when running, for example, a perl program you may need to specify the perl.exe binary as the program, and use the SwishProgParameters to name the file.

          IndexDir e:/perl/bin/perl.exe
          SwishProgParameters read_database.pl

      Swish will replace the forward slashes with backslashes before running the command specified with IndexDir. Swish uses the popen(3) command which passes the command through the shell.

  • -f *indexfile* (index file)

    If you are indexing, this specifies the file to save the generated index in, and you can only specify one file. See also IndexFile in the configuration file.

    If you are searching, this specifies the index files (one or more) to search from. The default index file is index.swish-e in the current directory.

  • -c *file ...* (configuration files)

    Specify the configuration file(s) to use for indexing. This file contains many directives that control how Swish-e proceeds. See SWISH-CONFIG for a complete listing of configuration file directives.

    Example:

        swish-e -c docs.conf

    If you specify a directory to index, an index file, or the verbose option on the command-line, these values will override any specified in the configuration file.

    You can specify multiple configuration files. For example, you may have one configuration file that has common site-wide settings, and another for a specific index.

    Examples:

        1) swish-e -c swish-e.conf
        2) swish-e -i /usr/local/www -f index.swish-e -v -c swish-e.conf
        3) swish-e -c swish-e.conf stopwords.conf

    1) The settings in the configuration file will be used to index a site.

    2) These command-line options will override anything in the configuration file.

    3) The variables in swish-e.conf will be read, then the variables in stopwords.conf will be read. Note that if the same variables occur in both files, the earlier values may be overwritten.

  • -e (economy mode)

    For large sites indexing may require more RAM than is available. The -e switch tells swish to use disk space to store data structures while indexing, saving memory. This option is recommended if swish uses so much RAM that the computer begins to swap excessively, and you cannot increase available memory. The trade-off is slightly longer indexing times, and a busy disk drive.

  • -l (symbolic links)

    Specifying this option tells swish to follow symbolic links when indexing. The configuration file value FollowSymLinks will override the command-line value.

    The default is not to follow symlinks. A small improvement in indexing time may result from enabling FollowSymLinks since swish does not need to stat every directory and file processed to determine if it is a symbolic link.

  • -N path (index only newer files)

    The -N option takes a path to a file, and only files newer than the specified file will be indexed. This is helpful for creating incremental indexes -- that is, indexes that contain just files added since the last full index was created of all files.
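
    To preview which files an incremental run would pick up, something like the following can help. It is a sketch that assumes find's -newer test (a comparison of modification times) matches what -N does, which this document does not confirm; the function name newer_than is hypothetical.

```shell
# newer_than STAMP DIR
# List regular files under DIR whose modification time is later than
# that of the file STAMP, i.e. the likely candidates for a -N run.
newer_than() {
    find "$2" -type f -newer "$1" -print
}
```

    For example, "newer_than indexing_time.file docs" lists the files under docs that were modified after indexing_time.file was last touched.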

    Example (bad example)

        swish-e -c config.file -N index.swish-e -f index.new

    This will index as normal, but only files with a modified date newer than index.swish-e will be indexed.

    This is a bad example because it uses index.swish-e which one might assume was the date of last indexing. The problem is that files might have been added between the time indexing read the directory and when the index.swish-e file was created -- which can be quite a bit of time for very large indexing jobs.

    The only solution is to prevent any new file additions while full indexing is running. If this is impossible then it will be slightly better to do this:

    Full indexing:

        touch indexing_time.file
        swish-e -c config.file -f index.tmp
        mv index.tmp index.full

    Incremental indexing:

        swish-e -c config.file -N indexing_time.file -f index.tmp
        mv index.tmp index.incremental

    Then search with

        swish-e -w foo -f index.full index.incremental

    or merge the indexes

        swish-e -M index.full index.incremental index.tmp
        mv index.tmp index.swish-e
        swish-e -w foo
  • -r

    **incremental index format only** The -r option puts swish-e into "removal" mode. Any input files (given with -i or the IndexDir parameter) are removed from an existing index.

    Example:

      swish-e -r -i file.html

    would remove file.html from the existing index.

  • -u

    **incremental index format only** The -u option puts swish-e into "update" mode. The timestamp of each input file is compared against the corresponding file in the existing index. If swish-e encounters an input file that either does not exist yet in the index or exists with a timestamp older than the input file, the input file is updated in the index. Any words in the input file that have been added or removed are reflected as such in the index.

    Example:

      swish-e -i file.html -u

    would update the index.swish-e index with the contents of file.html. If file.html was new, it would be added. If file.html already existed in the index, its contents would be updated in the index.

  • -v [0|1|2|3] (verbosity level)

    The -v option can take a numerical value from 0 to 3. Specify 0 for completely silent operation and 3 for detailed reports.

    If no value is given then 1 is assumed. See also IndexReport in the configuration file.

    Warnings and errors are reported regardless of the verbosity level. In addition, all errors and warnings are written to standard output. This is for historical reasons (many scripts exist that parse standard output for error messages).

  • -W (0|1|2|3) (parser warning level)

    If using the libxml2 parser, the default parser warning level is set at 2. Use the -W option to override that default. Most often, you might want to turn it off altogether:

      swish-e -W0 -i path/to/files

    would fail silently if the parser encountered any errors.

SEARCHING

The following command line arguments are available when searching with Swish-e. These switches are used to select the index to search, what fields to search, and how and what to print as results.

This section just lists the available command line arguments and their usage. Please see SWISH-SEARCH for detailed searching instructions.

Warning: If using Swish-e via a CGI interface, please see CGI Danger!

Security Note: If the swish binary is named swish-search then swish will not allow any operation that would cause swish to write to the index file.

Searching Command Line Arguments

  • -w *word1 word2 ...* (query words)

    This performs a case-insensitive search using a number of keywords. If no index file to search is specified (via the -f switch), swish-e will try to search a file called index.swish-e in the current directory.

        swish-e -w word

    Phrase searching is accomplished by placing the quote delimiter (a double-quote by default) around the search phrase.

        swish-e -w 'word or "this phrase"'

    Search words should be protected from the shell by quotes. Typically, this is single quotes when running under Unix.

    Under Windows command.com you may not need to use quotes, but you will need to backslash the quotes used to delimit phrases:

        swish-e -w \"a phrase\"

    The phrase delimiter can be set with the -P switch.

    The search may be limited to a MetaName. For example:

        swish-e -w 'meta1=(foo or baz)'

    will only search within the meta1 tag.

    Please see SWISH-SEARCH for a description of MetaNames.

  • -f *file1 file2 ...* (index files)

    Specifies the index file(s) used while searching. More than one file may be listed, and each file will be searched. If no -f switch is specified then the file index.swish-e in the current directory will be used as the index file.

  • -m *number* (max results)

    While searching, this specifies the maximum number of results to return. The default is to return all results.

    This switch is often used in conjunction with the -b switch to return results one page at a time (strongly recommended for large indexes).
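
    Since results are numbered from 1, the first result of page N is (N - 1) * page size + 1. A small sketch of that arithmetic follows; the function name page_args is hypothetical.

```shell
# page_args PAGE PAGE_SIZE
# Print the -b and -m switches for a 1-based page number.
page_args() {
    begin=$(( ($1 - 1) * $2 + 1 ))
    printf '%s\n' "-b $begin -m $2"
}
```

    For page 2 with 20 results per page this prints "-b 21 -m 20", which can be added to a search command line (unquoted, so that the shell splits it into separate arguments).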

  • -b *number* (beginning result)

    Sets the beginning search result to return (records are numbered from 1). This switch can be used with the -m switch to return results in groups or pages.

    Example:

        swish-e -w 'word' -b 1 -m 20    # first 'page'
        swish-e -w 'word' -b 21 -m 20   # second 'page'

  • -t HBthec (context searching)

    The -t option allows you to search for words that exist only in specific HTML tags. Each character in the string you specify in the argument to this option represents a different tag in which to search for the word. H means all HEAD tags, B stands for BODY tags, t is all TITLE tags, h is H1 to H6 (header) tags, e is emphasized tags (this may be B, I, EM, or STRONG), and c is HTML comment tags.

    To search only in header (<H*>) tags:

        swish-e -w word -t h

  • -d *string* (delimiter)

    Set the delimiter used when printing results. By default, Swish-e separates the output fields by a space, and places double-quotes around the document title. This output may be hard to parse, so it is recommended to use -d to specify a character or string used as a separator between fields.

    The string dq means "double-quotes".

        swish-e -w word -d ,    # single char
        swish-e -w word -d ::   # string
        swish-e -w word -d '"'  # double quotes under Unix
        swish-e -w word -d \"   # double quotes under Windows
        swish-e -w word -d dq   # double quotes

    The following control characters may also be specified: \t \r \n \f.

    Warning: This string is passed directly to sprintf() and therefore exposes a security hole. Do not allow user data to set -d format strings directly.
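
    A custom delimiter makes the output straightforward to split. The Python sketch below is a hedged illustration, not a documented interface: it assumes each result line carries rank, path, title and size in that order, that header lines start with "#", and that a line holding a single "." ends the results. The sample lines in the demo are invented.

```python
def parse_results(lines, delimiter="::"):
    """Parse swish-e result lines produced with -d <delimiter> (assumed layout)."""
    results = []
    for raw in lines:
        line = raw.rstrip("\n")
        if line == ".":
            break  # assumed end-of-results marker
        if not line or line.startswith("#") or line.startswith("err:"):
            continue  # skip blank lines, headers and error messages
        # Peel rank off the front and size off the back, so a delimiter
        # that happens to occur inside the title does not break parsing.
        rank, _, rest = line.partition(delimiter)
        rest, _, size = rest.rpartition(delimiter)
        path, _, title = rest.partition(delimiter)
        results.append(
            {"rank": int(rank), "path": path, "title": title, "size": int(size)}
        )
    return results


if __name__ == "__main__":
    sample = [
        "# Search words: word",
        "1000::/docs/a.html::First title::1234",
        "870::/docs/b.html::Second::title::99",
        ".",
    ]
    for row in parse_results(sample):
        print(row)
```

    Choosing a delimiter that cannot occur in paths or titles (for example a tab) keeps the parsing unambiguous.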

  • -P *character*

    Sets the delimiter used for phrase searches. The default is double quotes ".

    Some examples under bash (be careful with your shell's metacharacters):

        swish-e -P ^ -w 'title=^words in a phrase^'
        swish-e -P \' -w "title='words in a phrase'"

  • -p *property1 property2 ...* (display properties)

    This causes swish to print the listed property in the search results. The properties are returned in the order they are listed in the -p argument.

    Properties are defined by the PropertyNames directive in the configuration file (see SWISH-CONFIG) and properties must also be defined in MetaNames. Swish stores the text of the meta name as a property, and then will return this text while searching if this option is used.

    Properties are very useful for returning data included in a source document without having to re-read the source document while searching. For example, this could be used to return a short document description. See also Document Summaries and PropertyNames in SWISH-CONFIG.

    To return the subject and category properties while searching:

        swish-e -w word -p subject category

    Properties are returned in double quotes. If a property contains a double quote it is HTML escaped (&quot;). See the -x switch for a more advanced method of returning a list of properties.

    NOTE: it is necessary to have indexed with the proper PropertyNames directive in the user config file in order to use this option.
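
    Because properties come back double-quoted with embedded quotes written as &quot;, a client has to undo that escaping. The Python sketch below assumes the requested properties appear as the last quoted fields on each result line, in the order given to -p; that layout and the sample line are assumptions for illustration, and extract_properties is a hypothetical helper.

```python
import html
import re

_QUOTED = re.compile(r'"([^"]*)"')


def extract_properties(line, names):
    """Map property names to the trailing quoted fields of a result line."""
    if not names:
        return {}
    quoted = _QUOTED.findall(line)
    values = quoted[-len(names):]
    if len(values) != len(names):
        raise ValueError("fewer quoted fields than requested properties")
    # html.unescape turns &quot; back into a literal double quote.
    return {name: html.unescape(value) for name, value in zip(names, values)}


if __name__ == "__main__":
    sample = '1000 /docs/a.html "A Title" 1234 "Perl &quot;tips&quot;" "howto"'
    print(extract_properties(sample, ["subject", "category"]))
```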

  • -s *property [asc|desc] ...* (sort)

    Normally, search results are printed out in order of relevancy, with the most relevant listed first. The -s sort switch allows you to sort results in order of a specified property, where a property was defined using the MetaNames and PropertyNames directives during indexing (see SWISH-CONFIG).

    The string passed can include the strings asc and desc to specify the sort order, and more than one property may be specified to sort on more than one key.

    Examples:

    sort by title property ascending order

        -s title

    sort descending by title, ascending by name

        -s title desc name asc

    Note: Swish limits sort keys to 100 characters. This limit can be changed by changing MAX_SORT_STRING_LEN in src/config.h and rebuilding swish-e.

  • -L limit to a range of property values (Limit)

    This is an experimental feature!

    The -L switch can be used to limit search results to a range of property values.

    Example:

        swish-e -w foo -L swishtitle a m

    finds all documents that contain the word foo, and where the document's title is in the range of a to m, inclusive. By default, the case of the property is ignored, but this can be changed by using the PropertyNamesCompareCase configuration directive.

    Limiting may be done with user-defined properties, as well.

    For example, if you indexed documents that contain a created timestamp in a meta tag:

        <meta name="created_on" content="982648324">

    Then you tell Swish that you have a property called created_on, and that it's a timestamp.

        PropertyNamesDate created_on

    After indexing you will be able to limit documents to a range of timestamps:

        -w foo -L created_on  946684800 949363199

    will find documents containing the word foo and that have a created_on date from the start of Jan 1, 2000 to the end of Jan 31, 2000.

    Note: swish currently does not parse dates; Unix timestamps must be used.
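
    Since the timestamps must be computed by the caller, a small helper is handy. The Python sketch below reproduces the January 2000 range used above; it assumes the timestamps are meant as UTC (the document does not say which time zone applies), and utc_timestamp and month_range are hypothetical names.

```python
import calendar


def utc_timestamp(year, month, day, hour=0, minute=0, second=0):
    """Return the Unix timestamp for a UTC date and time."""
    return calendar.timegm((year, month, day, hour, minute, second))


def month_range(year, month):
    """Return (first second, last second) of a month as Unix timestamps."""
    start = utc_timestamp(year, month, 1)
    last_day = calendar.monthrange(year, month)[1]
    end = utc_timestamp(year, month, last_day, 23, 59, 59)
    return start, end


if __name__ == "__main__":
    low, high = month_range(2000, 1)
    # Arguments for: swish-e -w foo -L created_on <low> <high>
    print(["-w", "foo", "-L", "created_on", str(low), str(high)])
```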

    Two special formats can be used:

        -L swishtitle <= m
        -L swishtitle >= m

    These find titles less than or equal to, or greater than or equal to, the letter m.

    This feature will not work with swishrank or swishdbfile properties.

    This feature takes advantage of the pre-sorted tables built by swish during indexing, which makes limiting fast while searching. You should see in the indexing output a line such as:

       6 properties sorted.

    That indicates that six pre-sorted tables were built during indexing. By default, all properties are presorted while indexing. What properties are pre-sorted can be controlled by the configuration parameter PreSortedIndex.

    Using the -L switch on a property that was not pre-sorted will still work, but may be much slower during searching.

    Note that the PropertyNamesSortKeyLength setting is used for sorting properties. Using too small a PropertyNamesSortKeyLength could result in -L selecting the wrong properties due to incomplete sorting.

    This is an experimental feature, and its use and interface are subject to change.

  • -x formatstring (extended output format)

    The -x switch defines the output format string. The format string can contain plain text and property names (including swish-defined internal property names) and is used to generate the output for every result. In addition, the output format of the property name can be controlled with C-like printf format strings. This feature overrides the cmdline switches -d and -p, and a warning will be generated if -d or -p are used with -x.

    Warning: The format string (fmt) is passed directly to sprintf() and therefore exposes a security hole. Do not allow user data to set -x format strings directly.
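
    One way to honor this warning in a front end is to let users pick from a fixed set of named formats rather than supply a format string. The Python sketch below is one possible approach, not something Swish-e provides; ALLOWED_FORMATS and format_args are hypothetical names, and the format strings use only properties listed in this document.

```python
# Named formats chosen by the site operator; users only ever send the key.
ALLOWED_FORMATS = {
    "titles": "<swishtitle>\\n",
    "brief": "<swishrank>:<swishdocpath>:<swishtitle>\\n",
}


def format_args(user_choice):
    """Translate a user-selected format name into safe -x arguments."""
    try:
        fmt = ALLOWED_FORMATS[user_choice]
    except KeyError:
        raise ValueError("unknown output format: %r" % (user_choice,)) from None
    return ["-x", fmt]


if __name__ == "__main__":
    print(format_args("brief"))
```

    The "\\n" in the Python source is a literal backslash followed by n, which swish-e itself then interprets as a newline.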

    For example, to return just the title, one per line, in the search results:

        swish-e  -w ...   -x '<swishtitle>\n' ...

    Note: the \n may need to be protected from your shell.

    See also ResultExtFormatName for a way to define named format strings in the swish configuration file.

    Format of "formatstring":

        "text<propertyname>text<propertyname fmt=propfmtstr>text..."

    Where propertyname is:

    • the name of a user property as specified with the config file directive "PropertyNames"

    • the name of a swish Auto property (see below). These properties are defined automatically by swish -- you do not need to specify them with the PropertyNames directive. (This may change in the future.)

    propertynames must be placed within "<" and ">".

    User properties:

    Swish-e allows you to specify certain META tags within your documents that can be used as document properties. The contents of any META tag that has been identified as a document property can be returned as part of the search results. Document properties must be defined while indexing using the PropertyNames configuration directive (see SWISH-CONFIG).

    Examples of user-defined PropertyNames:

        <keywords>
        <author>
        <deliveredby>
        <reference>
        <id>

    Auto properties:

    Swish defines a number of "Auto" properties for each document indexed. These are available for output when using the -x format.

        Name               Type     Contents
        --------------     -------  ----------------------------------------------
        swishreccount      Integer  Result record counter
        swishtitle         String   Document title
        swishrank          Integer  Result rank for this hit
        swishdocpath       String   URL or filepath to document
        swishdocsize       Integer  Document size in bytes
        swishlastmodified  Date     Last modified date of document
        swishdescription   String   Description of document (see:StoreDescription)
        swishdbfile        String   Path of swish database indexfile

    The Auto properties can also be specified using shortcuts:

        Shortcut    Property Name        
        --------    --------------
          %c        swishreccount
          %d        swishdescription
          %D        swishlastmodified
          %I        swishdbfile
          %p        swishdocpath
          %r        swishrank
          %l        swishdocsize
          %t        swishtitle

    For example, these are equivalent:

       -x '<swishrank>:<swishdocpath>:<swishtitle>\n'
       -x '%r:%p:%t\n'

    Use a double percent sign "%%" to enter a literal percent sign in the output.
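
    The shortcut table can be expressed as a simple lookup. The Python sketch below expands shortcuts into their long property names, which is useful when a property format (fmt=...) has to be added afterwards; expand_shortcuts is a hypothetical helper and is only meant for strings that contain shortcuts, not for strings that already contain fmt attributes.

```python
import re

SHORTCUTS = {
    "c": "swishreccount",
    "d": "swishdescription",
    "D": "swishlastmodified",
    "I": "swishdbfile",
    "p": "swishdocpath",
    "r": "swishrank",
    "l": "swishdocsize",
    "t": "swishtitle",
}

_SHORTCUT = re.compile(r"%(.)")


def expand_shortcuts(fmt):
    """Rewrite %-shortcuts in an -x format string as <propertyname> tags."""

    def replace(match):
        letter = match.group(1)
        if letter in SHORTCUTS:
            return "<%s>" % SHORTCUTS[letter]
        return match.group(0)  # leave "%%" and unknown sequences untouched

    return _SHORTCUT.sub(replace, fmt)


if __name__ == "__main__":
    print(expand_shortcuts("%r:%p:%t\\n"))
```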

    Formatstrings of properties:

    Properties listed in an -x format string can include format control strings. These "propertyformats" are used to control how the contents of the associated property are printed. Property formats are used like C-language printf formats. The property format is specified by including the attribute "fmt" within the property tag.

    Format strings cannot be used with the "%" shortcuts described above.

    General syntax:

        -x '<propertyname fmt="propfmtstr">'

    where propfmtstr controls the output format of propertyname.

    Examples of property format strings:

            date type:    <swishlastmodified fmt="%d.%m.%Y">
            string type:  <swishtitle fmt="%-40.35s">
            integer type: <swishreccount fmt=/%8.8d/>

    Please see the manual pages for strftime(3) and sprintf(3) for an explanation of format strings. Note: some versions of strftime do not offer the %s format string (number of seconds since the Epoch), so swish provides a special format string "%ld" to display the number of seconds since the Epoch.

    The first character of a property format string defines the delimiter for the format string. For example,

        -x  "<author  fmt=[%20s]> ...\n"
        -x  "<author  fmt='%20s'> ...\n"
        -x  "<author  fmt=/%20s/> ...\n"

    Standard predefined formats:

    If you omit the property format, the following formats are used:

        String type:       "%s"  (like printf char *)
        Integer type:      "%d"  (like printf int)
        Float type:        "%f"  (like printf double) 
        Date type:         "%Y-%m-%d %H:%M:%S" (like strftime)
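
    To preview what a property format will produce, the same printf and strftime directives can be tried in Python, whose % operator and time.strftime follow the C conventions for the directives used here. This is only an approximation of what swish-e prints, and format_examples is a hypothetical helper.

```python
import time


def format_examples(epoch_seconds=946684800):
    """Return sample renderings of the property formats discussed above."""
    when = time.gmtime(epoch_seconds)  # UTC keeps the output reproducible
    long_title = "A document title that runs well past thirty-five characters"
    return {
        # At most 35 characters, left-justified in a 40-character column.
        "string": "%-40.35s" % long_title,
        # At least 8 digits, zero-padded, in an 8-character column.
        "integer": "%8.8d" % 42,
        "date": time.strftime("%d.%m.%Y", when),
        # The default date format listed above.
        "date_default": time.strftime("%Y-%m-%d %H:%M:%S", when),
    }


if __name__ == "__main__":
    for name, text in format_examples().items():
        print("%-12s [%s]" % (name, text))
```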

    Text in "formatstring" or "propfmtstr":

    Text will be output as-is in format strings (and property format strings). Special characters can be escaped with a backslash. To get a new line for each result hit, you have to include the newline character "\n" at the end of "formatstring".

        -x "<swishreccount>|<swishrank>|<swishdocpath>\n"
        -x "Count=<swishreccount>, Rank=<swishrank>\n"
        -x "Title=\<b\><swishtitle>\</b\>"
        -x 'Date: <swishlastmodified fmt="%m/%d/%Y">\n'
        -x 'Date in seconds: <swishlastmodified fmt=/%ld/>\n'

    Control/Escape characters:

    You can use C-like control escapes in the format string:

       known controls:      \a, \b, \f, \n, \r, \t, \v,
       digit escapes:       \xhexdigits   \0octaldigits
       character escapes:   \anychar  

    Example:

        swish-e -x "%c\t%r\t%p\t\"<swishtitle fmt=/%40s/>\"\n"

    Examples of -x format strings:

        -x "%c|%r|%p|%t|%D|%d\n"
        -x "%c|%r|%p|%t|<swishlastmodified fmt=/%A, %d. %B %Y/>|%d\n"
        -x "<swishrank>\t<swishdocpath>\t<swishtitle>\t<keywords>\n"
        -x "xml_out: \<title\><swishtitle>\</title\>\n"
        -x "xml_out: <swishtitle fmt='<title>%s</title>'>\n"

  • -H [0|1|2|3|<n>] (header output verbosity)

    The -H n switch generates extended header output. This is most useful when searching more than one index file at a time by specifying more than one index file with the -f switch. -H 2 will generate a set of headers specific to each index file. This gives access to the settings used to generate each index file.

    Even when searching a single index file, -H n will provide additional information about the index file, how it was indexed, and how swish is interpreting the query.

        -H 0 : print no header information, output only search result entries.
        -H 1 : print standard result header (default).
        -H 2 : print additional header information for each searched index file.
        -H 3 : enhanced header output (e.g. print stopwords).
        -H 9 : print diagnostic information in the header of the results (changed from: -v 4)
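
    A client that wants the header information can collect it before reading the results. The Python sketch below assumes header lines have the form "# Name: value"; that layout and the sample lines are assumptions made for illustration, and parse_headers is a hypothetical helper.

```python
def parse_headers(lines):
    """Collect '# Name: value' header lines into a dictionary (assumed layout)."""
    headers = {}
    for raw in lines:
        line = raw.strip()
        if not line.startswith("#"):
            continue  # result lines and the end marker are not headers
        name, sep, value = line[1:].partition(":")
        if sep:
            headers[name.strip()] = value.strip()
    return headers


if __name__ == "__main__":
    sample = [
        "# Search words: foo",
        "# Number of hits: 2",
        "1000 /docs/a.html \"A Title\" 1234",
        ".",
    ]
    print(parse_headers(sample))
```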

  • -R [0|1] (Ranking Scheme)

    This is an experimental feature!

    The default ranking scheme in SWISH-E evaluates each word in a query in terms of its frequency and position in each document. The default scheme is 0.

    New in version 2.4.3 you may optionally select an experimental ranking scheme that, in addition to document frequency and position, uses Inverse Document Frequency (IDF), or the relative frequency of each word across all the indexes being searched, and Relative Density, or the normalization of the frequency of a word in relationship to the number of words in the document.

    NOTE: IgnoreTotalWordCountWhenRanking must be set to no or 0 in your index(es) for -R 1 to work.

    Specify -R 1 to turn on IDF ranking. See the API documentation for how to set the ranking scheme in your Perl or C program.

OTHER SWITCHES

  • -V (version)

    Print the current version.

  • -k *letter* (print out keywords)

    The -k switch is used for testing and will cause swish to print out all keywords in the index beginning with that letter. You may enter -k '*' to generate a list of all words indexed by swish.

  • -D *index file* (debug index)

    The -D option is no longer supported in version 2.2.

  • -T *options* (trace/debug swish)

    The -T option is used to print out information that may be helpful when debugging swish-e's operation. This option replaced the -D option of previous versions.

    Running -T help will print out a list of available *options*.

Merging Index Files

In previous versions of Swish-e indexing would require a very large amount of memory and the indexing process could be very slow. Merging provided a way to index in chunks and then combine the indexes together into a single index.

Indexing is much faster now and uses much less memory, and with the -e switch very little memory is needed to index a large site.

Still, at times it can be useful to merge different index files into one file for searching. This could be because you want to keep separate site indexes and a common one for a global search, or you have separate collections of documents that you wish to search all at one time, but manage separately.

  • -M *index1 index2 ... indexN out_index*

    Merges the indexes specified on the command line -- the last file name entered is the output file. The output index must not exist (otherwise merge will not proceed).

    Only indexes that were indexed with common settings may be merged. (e.g. don't mix stemming and non-stemming indexes, or indexes with different WordCharacter settings, etc.).

    Use the -e switch while merging to reduce memory usage.

    Merge generates progress messages regardless of the setting of -v.

  • -c *configuration file*

    Specify a configuration file while indexing to add administrative information to the output index file.
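
    The merge rules above (last name is the output, output must not already exist, -e to reduce memory use) can be checked before swish-e is launched. The Python sketch below is illustrative only; merge_argv is a hypothetical helper, and placing -e ahead of -M is an assumption about switch order.

```python
import os


def merge_argv(inputs, output, low_memory=True):
    """Build a swish-e -M argument list after checking the preconditions."""
    if len(inputs) < 2:
        raise ValueError("need at least two input indexes to merge")
    if os.path.exists(output):
        # swish-e refuses to merge into an existing index, so fail early.
        raise FileExistsError(output)
    argv = ["swish-e"]
    if low_memory:
        argv.append("-e")  # assumed position; trades speed for memory
    argv += ["-M", *inputs, output]
    return argv


if __name__ == "__main__":
    print(merge_argv(["site1.index", "site2.index"], "merged-example.index"))
```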

Document Info

$Id: SWISH-RUN.pod 1741 2005-05-17 02:22:40Z karman $


swish-e-2.4.7/filters/
swish-e-2.4.7/filters/SWISH/
swish-e-2.4.7/filters/SWISH/Makefile.am

perlmoduledir = $(libexecdir)/perl/SWISH
perlmodule_SCRIPTS = Filter.pm
CLEANFILES = Filter.pm

# This is done here to stay in the GNU coding standards
# libexecdir can be modified at make time, so can't use
# variable substitution at configure time

Filter.pm: Filter.pm.in
	@rm -f Filter.pm
	@sed \
		-e 's,@@mylibexecdir@@,$(libexecdir),' \
		$(srcdir)/Filter.pm.in > Filter.pm

nobase_perlmodule_SCRIPTS = \
	Filters/Doc2txt.pm \
	Filters/Doc2html.pm \
	Filters/Pdf2HTML.pm \
	Filters/ID3toHTML.pm \
	Filters/XLtoHTML.pm \
	Filters/pp2html.pm

EXTRA_DIST = \
	Filter.pm.in \
	Filters/Doc2txt.pm \
	Filters/Doc2html.pm \
	Filters/Pdf2HTML.pm \
	Filters/ID3toHTML.pm \
	Filters/XLtoHTML.pm \
	Filters/pp2html.pm

swish-e-2.4.7/filters/SWISH/Filters/
swish-e-2.4.7/filters/SWISH/Filters/ID3toHTML.pm

package SWISH::Filters::ID3toHTML;
use strict;
#use MP3::Tag;
#use HTML::Entities;
use vars qw/ @ISA $VERSION /;
$VERSION = '0.03';
@ISA = ('SWISH::Filter');

# Convert known ID3v2 tags to metanames.
my %id3v2_tags = (
    TIT2 => 'song',       # 4.2.1 TIT2 Title/songname/content description
    TYER => 'year',       # 4.2.1 TYER Year
    TRCK => 'track',      # 4.2.1 TRCK Track number/Position in set
    TCOP => 'copyright',  # 4.2.1 TCOP Copyright message
                          # * WinAMP seems to prepend a (C) to this value.
TPE1 => 'artist', # 4.2.1 TPE1 Lead performer(s)/Soloist(s) TALB => 'album', # 4.2.1 TALB Album/Movie/Show title TENC => 'encoded', # 4.2.1 TENC Encoded by TOPE => 'artist_original', # 4.2.1 TOPE Original artist(s)/performer(s) TCOM => 'composer', # 4.2.1 TCOM Composer TCON => 'genre', # 4.2.1 TCON Content type # 4.3.2 WXXX User defined URL link frame WXXX_URL => 'url', # * URL => http://URL/HERE WXXX_Description => 'url_description', # * Description => WinAMP provides no description # 4.11 COMM Comments COMM_Text => 'comment', # * Text => COMMENT COMM_Language => 'comment_lang', # * Language => eng COMM_short => 'comment_short' # * short => WinAMP provides no short ); sub new { my ( $class ) = @_; my $self = bless { mimetypes => [ qr!audio/mpeg! ], }, $class; return $self->use_modules( qw/ MP3::Tag HTML::Entities / ); } sub filter { my ( $self, $doc ) = @_; # We need a file name to pass to the conversion function my $file = $doc->fetch_filename; my $content_ref = get_id3_content_ref( $file ); return unless $content_ref; # update the document's content type $doc->set_content_type( 'text/html' ); # If filtered must return either a reference to the doc or a pathname. return \$content_ref; } # ======================================================================= sub get_id3_content_ref { my $filename = shift; my $mp3 = MP3::Tag->new($filename); # return unless we have a file with tags return format_empty_doc( $filename ) unless ref $mp3 && $mp3->get_tags(); # Here we will store all of the tag info my %metadata; # Convert tags to metadata giving ID3v2 precedence get_id3v1_tags($mp3, \%metadata); get_id3v2_tags($mp3, \%metadata); # will replace any v1 tags that are the same # HTML or bust return %metadata ? 
format_as_html( \%metadata ) : format_empty_doc( $filename ); } sub get_id3v1_tags { my ($mp3, $metadata) = @_; return unless exists $mp3->{ID3v1}; # Read all ID3v1 tags into metadata hash my $id3v1 = $mp3->{ID3v1}; for ( qw/ artist album comment genre song track year / ) { $metadata->{$_} = $id3v1->$_ if $id3v1->$_; } } sub get_id3v2_tags { my ($mp3, $metadata) = @_; # Do we even have an ID3 v2 tag? return unless exists $mp3->{ID3v2}; # Get the tag and a hash of frame ids. my $id3v2 = $mp3->{ID3v2}; # keys are 4-character-codes and values are the long names my $frameIDs_hash = $id3v2->get_frame_ids; # Go through each frame and translate it to usable metadata foreach my $frame (keys %$frameIDs_hash) { my ($info, $name) = $id3v2->get_frame($frame); # We have a user defined frame if (ref $info) { # $$$ We really only want COMM and WXXX while(my ($key,$val)=each %$info) { next if $key =~ /^_/ || !$val; # leading underscore means binary data # Concatenate frame and key for our lookup hash my $code = ${frame} . "_" .${key}; # fails when frame is appended with digits (e.g. "COMM01"); my $metaname = $id3v2_tags{$code} || $code; # Assign value if not empty and has a key $metadata->{$metaname} = encode_entities($val) if $val; } } # We have a simple frame else { my $metaname = $id3v2_tags{$frame} || $frame || 'blank frame'; $metadata->{$metaname} = encode_entities($info) unless !$info; } } } sub format_as_html { my $metadata = shift; my $title = $metadata->{song} || $metadata->{album} || $metadata->{artist} || 'No Title'; my $metas = join "\n", map { qq[] } sort keys %$metadata; my $url = ''; if ( $metadata->{url} ) { my $desc = $metadata->{url_description} || $metadata->{url}; $url = qq[

$desc]; } my $comment = ''; if ( $metadata->{comment} ) { my $lang = get_iso_lang($metadata->{comment_lang} || 'en'); # wrong assuming "en"? $comment = qq[

$metadata->{comment}

]; } return < $title $metas $url $comment EOF } sub format_empty_doc { my $filename = shift; require File::Basename; my $base = File::Basename::basename( $filename, '.mp3' ); return format_as_html( { song => $base, notag => 1 } ); } sub get_iso_lang { my $lang = shift; # Do we need to translate undocumented ID3 Lang codes to ISO? # 4.11.Comments # Language $xx xx xx # * WinAMP may be mistaken for using "eng" instead of an ISO designator return $lang unless $lang == "eng"; return "en"; } 1; __END__ =head1 NAME SWISH::Filters::ID3toHTML - ID3 tag to HTML filter module =head1 DESCRIPTION SWISH::Filters::ID3toHTML translates ID3 tags into HTML metadata for use by the SWISH::Filter module and SWISH-E. Depends on two perl modules: MP3::Tag; HTML::Entities; =head1 SUPPORT Please contact the Swish-e discussion list. http://swish-e.org/ =cut swish-e-2.4.7/filters/SWISH/Filters/XLtoHTML.pm0000664000077100017500000000656211166010112015714 00000000000000package SWISH::Filters::XLtoHTML; use strict; require File::Spec; use vars qw/ $VERSION /; $VERSION = '0.02'; sub new { my ( $class ) = @_; my $self = bless { mimetypes => [ qr!application/vnd.ms-excel!, qr!application/excel!, ], }, $class; return $self->use_modules( qw/ Spreadsheet::ParseExcel HTML::Entities / ); } sub filter { my ( $self, $doc ) = @_; # We need a file name to pass to the conversion function my $file = $doc->fetch_filename; my $content_ref = get_xls_content_ref( $file ) || return; # update the document's content type $doc->set_content_type( 'text/html' ); # If filtered must return either a reference to the doc or a pathname. 
return \$content_ref; } sub get_xls_content_ref { my $file = shift; my $oExcel = Spreadsheet::ParseExcel->new; return unless $oExcel; my $oBook = $oExcel->Parse($file) || return; my($iR, $iC, $oWkS, $oWkC, $ExcelWorkBook); # Here we gather up all the workbook metadata my ($vol,$dirs,$filename) = File::Spec->splitpath( $oBook->{File} ); my $ExcelFilename = encode_entities( $filename ); my $ExcelSheetCount = encode_entities($oBook->{SheetCount}); my $ExcelAuthor = encode_entities($oBook->{Author}) if defined $oBook->{Author}; my $ExcelVersion = encode_entities($oBook->{Version}) if defined $oBook->{Version}; # Name of the first worksheet my $ExcelFirstWorksheetName = encode_entities($oBook->{Worksheet}[0]->{Name}); my $ReturnValue = < $ExcelFirstWorksheetName - $ExcelFilename v.$ExcelVersion EOF # Here we collect content from each worksheet for(my $iSheet=0; $iSheet < $oBook->{SheetCount} ; $iSheet++) { # For each Worksheet do the following $oWkS = $oBook->{Worksheet}[$iSheet]; # Name of the worksheet my $ExcelWorkSheet = "

" . encode_entities($oWkS->{Name}) . "

\n"; $ExcelWorkSheet .= "\n"; for(my $iR = $oWkS->{MinRow} ; defined $oWkS->{MaxRow} && $iR <= $oWkS->{MaxRow} ; $iR++) { # For each row do the following $ExcelWorkSheet .= "\n"; for(my $iC = $oWkS->{MinCol} ; defined $oWkS->{MaxCol} && $iC <= $oWkS->{MaxCol} ; $iC++) { # For each cell do the following $oWkC = $oWkS->{Cells}[$iR][$iC]; my $CellData = encode_entities($oWkC->Value) if($oWkC); $ExcelWorkSheet .= "\t\n" if $CellData; } $ExcelWorkSheet .= "\n"; # Our last duty $ExcelWorkBook .= $ExcelWorkSheet; $ExcelWorkSheet = ""; } $ExcelWorkBook .= "
" . $CellData . "
\n"; } $ReturnValue .= < $ExcelWorkBook EOF return $ReturnValue; } __END__ =head1 NAME SWISH::Filters::XLtoHTML - MS Excel to HTML filter module =head1 DESCRIPTION SWISH::Filters::XLtoHTML extracts data from MS Excel spreadsheets for indexing. Depends on two perl modules: Spreadsheet::ParseExcel HTML::Entities; =head1 SUPPORT Please contact the Swish-e discussion list. http://swish-e.org/ =cut swish-e-2.4.7/filters/SWISH/Filters/Doc2html.pm0000664000077100017500000000225711166010112016012 00000000000000package SWISH::Filters::Doc2html; use strict; use vars qw/ $VERSION /; $VERSION = '0.02'; sub new { my ( $class ) = @_; my $self = bless { mimetypes => [ qr!application/(x-)?msword! ], # list of types this filter handles }, $class; return $self->set_programs( 'wvWare' ); } sub filter { my ( $self, $doc ) = @_; # Grab output from running program my $content = $self->run_wvWare( "-1", $doc->fetch_filename ) || return; # update the document's content type $doc->set_content_type( 'text/html' ); # return the document return \$content; } 1; __END__ =head1 NAME SWISH::Filters::Doc2html - Perl extension for filtering MSWord documents with Swish-e =head1 DESCRIPTION This is a plug-in module that uses the "wvware" program to convert MS Word documents to HTML for indexing by Swish-e. "wvware" can be downloaded from: http://wvware.sourceforge.net/ The program "wvware" must be installed and in your PATH before running Swish-e. This has been tested only under Win32- binary package from http://gnuwin32.sourceforge.net/packages/wv.htm Tested with Debian Linux and wvWare wvWare 0.7.3. =head1 SEE ALSO L swish-e-2.4.7/filters/SWISH/Filters/Doc2txt.pm0000664000077100017500000000272111166010112015661 00000000000000package SWISH::Filters::Doc2txt; use strict; use vars qw/ $VERSION /; $VERSION = '0.02'; sub new { my ( $class ) = @_; my $self = bless { mimetypes => [ qr!application/(x-)?msword! 
], # list of types this filter handles priority => 50, # Make this a higher number (lower priority than the wvware filter }, $class; # check for helpers return $self->set_programs( 'catdoc' ); } sub filter { my ( $self, $doc ) = @_; my $content = $self->run_catdoc( $doc->fetch_filename ) || return; # update the document's content type $doc->set_content_type( 'text/plain' ); # return the document return \$content; } 1; __END__ =head1 NAME SWISH::Filters::Doc2txt - Perl extension for filtering MSWord documents with Swish-e =head1 DESCRIPTION This is a plug-in module that uses the "catdoc" program to convert MS Word documents to text for indexing by Swish-e. "catdoc" can be downloaded from: http://www.ice.ru/~vitus/catdoc/ver-0.9.html The program "catdoc" must be installed and your PATH before running Swish-e. =head1 BUGS This filter does not specify input or output character encodings. This will change in the future to all use of the user_data to set the encoding. A minor optimization during spidering (i.e. when docs are in memory instead of on disk) would be to use open2() call to let catdoc read from stdin instead of from a file. =head1 AUTHOR Bill Moseley =head1 SEE ALSO L swish-e-2.4.7/filters/SWISH/Filters/Pdf2HTML.pm0000664000077100017500000000615711166010112015621 00000000000000package SWISH::Filters::Pdf2HTML; use strict; use vars qw/ $VERSION /; $VERSION = '0.02'; sub new { my ( $class ) = @_; my $self = bless { mimetypes => [ qr!application/pdf! ], }, $class; return $self->set_programs( qw/ pdftotext pdfinfo / ); } sub filter { my ( $self, $doc ) = @_; my $user_data = $doc->user_data; my $title_tag = $user_data->{pdf}{title_tag} if ref $user_data eq 'HASH'; my $file = $doc->fetch_filename; my $metadata = $self->get_pdf_headers( $file ); my $headers = format_metadata( $metadata ); if ( $title_tag && exists $metadata->{ $title_tag } ) { my $title = escapeXML( $metadata->{ $title_tag } ); $headers = "$title\n" . 
$headers } # Check for encrypted content my $content_ref; # patch provided by Martial Chartoire if ( $metadata->{encrypted} && $metadata->{encrypted} =~ /yes\.*\scopy:no\s\.*/i ) { $content_ref = \''; } else { $content_ref = $self->get_pdf_content_ref( $file ); } # update the document's content type $doc->set_content_type( 'text/html' ); my $txt = < $headers
$$content_ref
EOF return \$txt; } sub get_pdf_headers { my ($self, $file ) = @_; # We need a file name to pass to the pdf conversion programs my %metadata; my $headers = $self->run_pdfinfo( $file ); return \%metadata unless $headers; for (split /\n/, $headers ) { if ( /^\s*([^:]+):\s+(.+)$/ ) { my ( $metaname, $value ) = ( lc( $1 ), $2 ); $metaname =~ tr/ /_/; $metadata{$metaname} = $value; } } return \%metadata; } sub format_metadata { my $metadata = shift; my $metas = join "\n", map { qq['; #' } sort keys %$metadata; return $metas; } sub get_pdf_content_ref { my ( $self, $file ) = @_; my $content = escapeXML( $self->run_pdftotext( $file, '-' ) ); return \$content; } # How are URLs printed with pdftotext? sub escapeXML { my $str = shift; return '' unless $str; for ( $str ) { s/&/&/go; s/"/"/go; s//>/go; } return $str; } 1; __END__ =head1 NAME SWISH::Filters::Pdf2HTML - Perl extension for filtering PDF documents with Swish-e =head1 DESCRIPTION This is a plug-in module that uses the xpdf package to convert PDF documents to html for indexing by Swish-e. Any info tags found in the PDF document are created as meta tags. This filter plug-in requires the xpdf package available at: http://www.foolabs.com/xpdf/ You may pass into SWISH::Filter's new method a tag to use as the html if found in the PDF info tags: my %user_data; $user_data{pdf}{title_tag} = 'title'; $was_filtered = $filter->filter( document => $filename, user_data => \%user_data, ); Then if a PDF info tag of "title" is found that will be used as the HTML <title>. 
=head1 AUTHOR Bill Moseley =head1 SEE ALSO L<SWISH::Filter> �����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������swish-e-2.4.7/filters/SWISH/Filters/pp2html.pm������������������������������������������������������0000664�0000771�0001750�00000002432�11166010112�015717� �����������������������������������������������������������������������������������������������������������������������������������������������������������������������������0000000�0000000������������������������������������������������������������������������������������������������������������������������������������������������������������������������package SWISH::Filters::pp2html; use strict; use vars qw/ $VERSION /; $VERSION = '0.01'; require File::Spec; sub new { my ( $class ) = @_; my $self = bless { mimetypes => [ qr!application/vnd.ms-powerpoint! ], }, $class; return $self->set_programs( 'ppthtml' ); } sub filter { my ( $self, $doc ) = @_; my $content = $self->run_ppthtml( $doc->fetch_filename ) || return; # use just the file name as title with no path my ($title) = ( $content =~ m!<title>(.*?)!io ); my ($volume,$directories,$file) = File::Spec->splitpath( $title ); $content =~ s,.*?,$file,i; # update the document's content type $doc->set_content_type( 'text/html' ); return \$content; } 1; __END__ =head1 NAME SWISH::Filters::pp2html - Perl extension for filtering MS PowerPoint documents with Swish-e =head1 DESCRIPTION This is a plug-in module that uses the xlhtml package to convert MS PowerPoint documents to html for indexing by Swish-e. 
This filter plug-in requires the xlhtml package which includes ppthtml available at: http://chicago.sourceforge.net/xlhtml Currently produces document titles like /tmp/foo1234. Need to alter to pass actual document title. =head1 AUTHOR Randy Thomas =head1 SEE ALSO L swish-e-2.4.7/filters/SWISH/Makefile.in0000664000077100017500000003120211166010112014425 00000000000000# Makefile.in generated by automake 1.9.6 from Makefile.am. # @configure_input@ # Copyright (C) 1994, 1995, 1996, 1997, 1998, 1999, 2000, 2001, 2002, # 2003, 2004, 2005 Free Software Foundation, Inc. # This Makefile.in is free software; the Free Software Foundation # gives unlimited permission to copy and/or distribute it, # with or without modifications, as long as this notice is preserved. # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY, to the extent permitted by law; without # even the implied warranty of MERCHANTABILITY or FITNESS FOR A # PARTICULAR PURPOSE. @SET_MAKE@ srcdir = @srcdir@ top_srcdir = @top_srcdir@ VPATH = @srcdir@ pkgdatadir = $(datadir)/@PACKAGE@ pkglibdir = $(libdir)/@PACKAGE@ pkgincludedir = $(includedir)/@PACKAGE@ top_builddir = ../.. 
am__cd = CDPATH="$${ZSH_VERSION+.}$(PATH_SEPARATOR)" && cd INSTALL = @INSTALL@ install_sh_DATA = $(install_sh) -c -m 644 install_sh_PROGRAM = $(install_sh) -c install_sh_SCRIPT = $(install_sh) -c INSTALL_HEADER = $(INSTALL_DATA) transform = $(program_transform_name) NORMAL_INSTALL = : PRE_INSTALL = : POST_INSTALL = : NORMAL_UNINSTALL = : PRE_UNINSTALL = : POST_UNINSTALL = : build_triplet = @build@ host_triplet = @host@ subdir = filters/SWISH DIST_COMMON = $(srcdir)/Makefile.am $(srcdir)/Makefile.in ACLOCAL_M4 = $(top_srcdir)/aclocal.m4 am__aclocal_m4_deps = $(top_srcdir)/config/acinclude.m4 \ $(top_srcdir)/configure.in am__configure_deps = $(am__aclocal_m4_deps) $(CONFIGURE_DEPENDENCIES) \ $(ACLOCAL_M4) mkinstalldirs = $(SHELL) $(top_srcdir)/config/mkinstalldirs CONFIG_HEADER = $(top_builddir)/src/acconfig.h CONFIG_CLEAN_FILES = am__vpath_adj_setup = srcdirstrip=`echo "$(srcdir)" | sed 's|.|.|g'`; am__vpath_adj = case $$p in \ $(srcdir)/*) f=`echo "$$p" | sed "s|^$$srcdirstrip/||"`;; \ *) f=$$p;; \ esac; am__strip_dir = `echo $$p | sed -e 's|^.*/||'`; am__installdirs = "$(DESTDIR)$(perlmoduledir)" \ "$(DESTDIR)$(perlmoduledir)" nobase_perlmoduleSCRIPT_INSTALL = $(install_sh_SCRIPT) perlmoduleSCRIPT_INSTALL = $(INSTALL_SCRIPT) SCRIPTS = $(nobase_perlmodule_SCRIPTS) $(perlmodule_SCRIPTS) SOURCES = DIST_SOURCES = DISTFILES = $(DIST_COMMON) $(DIST_SOURCES) $(TEXINFOS) $(EXTRA_DIST) ACLOCAL = @ACLOCAL@ ALLOCA = @ALLOCA@ AMDEP_FALSE = @AMDEP_FALSE@ AMDEP_TRUE = @AMDEP_TRUE@ AMTAR = @AMTAR@ AR = @AR@ AS = @AS@ AUTOCONF = @AUTOCONF@ AUTOHEADER = @AUTOHEADER@ AUTOMAKE = @AUTOMAKE@ AWK = @AWK@ BTREE_OBJS = @BTREE_OBJS@ BUILDDOCS_FALSE = @BUILDDOCS_FALSE@ BUILDDOCS_TRUE = @BUILDDOCS_TRUE@ CC = @CC@ CCDEPMODE = @CCDEPMODE@ CFLAGS = @CFLAGS@ CPP = @CPP@ CPPFLAGS = @CPPFLAGS@ CXX = @CXX@ CXXCPP = @CXXCPP@ CXXDEPMODE = @CXXDEPMODE@ CXXFLAGS = @CXXFLAGS@ CYGPATH_W = @CYGPATH_W@ DEFS = @DEFS@ DEPDIR = @DEPDIR@ DLLTOOL = @DLLTOOL@ ECHO = @ECHO@ ECHO_C = @ECHO_C@ ECHO_N = @ECHO_N@ 
ECHO_T = @ECHO_T@ EGREP = @EGREP@ EXEEXT = @EXEEXT@ F77 = @F77@ FFLAGS = @FFLAGS@ INSTALLDOCS_FALSE = @INSTALLDOCS_FALSE@ INSTALLDOCS_TRUE = @INSTALLDOCS_TRUE@ INSTALL_DATA = @INSTALL_DATA@ INSTALL_PROGRAM = @INSTALL_PROGRAM@ INSTALL_SCRIPT = @INSTALL_SCRIPT@ INSTALL_STRIP_PROGRAM = @INSTALL_STRIP_PROGRAM@ LARGEFILES_MACROS = @LARGEFILES_MACROS@ LDFLAGS = @LDFLAGS@ LIBOBJS = @LIBOBJS@ LIBS = @LIBS@ LIBTOOL = @LIBTOOL@ LIBXML2_CFLAGS = @LIBXML2_CFLAGS@ LIBXML2_LIB = @LIBXML2_LIB@ LIBXML2_OBJS = @LIBXML2_OBJS@ LIBXML_REQUIRED_VERSION = @LIBXML_REQUIRED_VERSION@ LN_S = @LN_S@ LTLIBOBJS = @LTLIBOBJS@ MAINT = @MAINT@ MAINTAINER_MODE_FALSE = @MAINTAINER_MODE_FALSE@ MAINTAINER_MODE_TRUE = @MAINTAINER_MODE_TRUE@ MAKEINFO = @MAKEINFO@ OBJDUMP = @OBJDUMP@ OBJEXT = @OBJEXT@ PACKAGE = @PACKAGE@ PACKAGE_BUGREPORT = @PACKAGE_BUGREPORT@ PACKAGE_NAME = @PACKAGE_NAME@ PACKAGE_STRING = @PACKAGE_STRING@ PACKAGE_TARNAME = @PACKAGE_TARNAME@ PACKAGE_VERSION = @PACKAGE_VERSION@ PATH_SEPARATOR = @PATH_SEPARATOR@ PCRE_CFLAGS = @PCRE_CFLAGS@ PCRE_CONFIG = @PCRE_CONFIG@ PCRE_LIBS = @PCRE_LIBS@ PCRE_REQUIRED_VERSION = @PCRE_REQUIRED_VERSION@ PERL = @PERL@ POD2MAN = @POD2MAN@ RANLIB = @RANLIB@ SET_MAKE = @SET_MAKE@ SHELL = @SHELL@ STRIP = @STRIP@ SWISH_WEB = @SWISH_WEB@ VERSION = @VERSION@ XML2_CONFIG = @XML2_CONFIG@ Z_CFLAGS = @Z_CFLAGS@ Z_LIBS = @Z_LIBS@ ac_ct_AR = @ac_ct_AR@ ac_ct_AS = @ac_ct_AS@ ac_ct_CC = @ac_ct_CC@ ac_ct_CXX = @ac_ct_CXX@ ac_ct_DLLTOOL = @ac_ct_DLLTOOL@ ac_ct_F77 = @ac_ct_F77@ ac_ct_OBJDUMP = @ac_ct_OBJDUMP@ ac_ct_RANLIB = @ac_ct_RANLIB@ ac_ct_STRIP = @ac_ct_STRIP@ am__fastdepCC_FALSE = @am__fastdepCC_FALSE@ am__fastdepCC_TRUE = @am__fastdepCC_TRUE@ am__fastdepCXX_FALSE = @am__fastdepCXX_FALSE@ am__fastdepCXX_TRUE = @am__fastdepCXX_TRUE@ am__include = @am__include@ am__leading_dot = @am__leading_dot@ am__quote = @am__quote@ am__tar = @am__tar@ am__untar = @am__untar@ bindir = @bindir@ build = @build@ build_alias = @build_alias@ build_cpu = @build_cpu@ build_os = 
@build_os@ build_vendor = @build_vendor@ datadir = @datadir@ exec_prefix = @exec_prefix@ host = @host@ host_alias = @host_alias@ host_cpu = @host_cpu@ host_os = @host_os@ host_vendor = @host_vendor@ includedir = @includedir@ infodir = @infodir@ install_sh = @install_sh@ libdir = @libdir@ libexecdir = @libexecdir@ localstatedir = @localstatedir@ mandir = @mandir@ mkdir_p = @mkdir_p@ oldincludedir = @oldincludedir@ prefix = @prefix@ program_transform_name = @program_transform_name@ sbindir = @sbindir@ sharedstatedir = @sharedstatedir@ sysconfdir = @sysconfdir@ target_alias = @target_alias@ perlmoduledir = $(libexecdir)/perl/SWISH perlmodule_SCRIPTS = Filter.pm CLEANFILES = Filter.pm nobase_perlmodule_SCRIPTS = \ Filters/Doc2txt.pm \ Filters/Doc2html.pm \ Filters/Pdf2HTML.pm \ Filters/ID3toHTML.pm \ Filters/XLtoHTML.pm \ Filters/pp2html.pm EXTRA_DIST = \ Filter.pm.in \ Filters/Doc2txt.pm \ Filters/Doc2html.pm \ Filters/Pdf2HTML.pm \ Filters/ID3toHTML.pm \ Filters/XLtoHTML.pm \ Filters/pp2html.pm all: all-am .SUFFIXES: $(srcdir)/Makefile.in: @MAINTAINER_MODE_TRUE@ $(srcdir)/Makefile.am $(am__configure_deps) @for dep in $?; do \ case '$(am__configure_deps)' in \ *$$dep*) \ cd $(top_builddir) && $(MAKE) $(AM_MAKEFLAGS) am--refresh \ && exit 0; \ exit 1;; \ esac; \ done; \ echo ' cd $(top_srcdir) && $(AUTOMAKE) --foreign filters/SWISH/Makefile'; \ cd $(top_srcdir) && \ $(AUTOMAKE) --foreign filters/SWISH/Makefile .PRECIOUS: Makefile Makefile: $(srcdir)/Makefile.in $(top_builddir)/config.status @case '$?' 
in \ *config.status*) \ cd $(top_builddir) && $(MAKE) $(AM_MAKEFLAGS) am--refresh;; \ *) \ echo ' cd $(top_builddir) && $(SHELL) ./config.status $(subdir)/$@ $(am__depfiles_maybe)'; \ cd $(top_builddir) && $(SHELL) ./config.status $(subdir)/$@ $(am__depfiles_maybe);; \ esac; $(top_builddir)/config.status: $(top_srcdir)/configure $(CONFIG_STATUS_DEPENDENCIES) cd $(top_builddir) && $(MAKE) $(AM_MAKEFLAGS) am--refresh $(top_srcdir)/configure: @MAINTAINER_MODE_TRUE@ $(am__configure_deps) cd $(top_builddir) && $(MAKE) $(AM_MAKEFLAGS) am--refresh $(ACLOCAL_M4): @MAINTAINER_MODE_TRUE@ $(am__aclocal_m4_deps) cd $(top_builddir) && $(MAKE) $(AM_MAKEFLAGS) am--refresh install-nobase_perlmoduleSCRIPTS: $(nobase_perlmodule_SCRIPTS) @$(NORMAL_INSTALL) test -z "$(perlmoduledir)" || $(mkdir_p) "$(DESTDIR)$(perlmoduledir)" @$(am__vpath_adj_setup) \ list='$(nobase_perlmodule_SCRIPTS)'; for p in $$list; do \ $(am__vpath_adj) p=$$f; \ if test -f "$$p"; then d=; else d="$(srcdir)/"; fi; \ if test -f $$d$$p; then \ f=`echo "$$p" | sed 's|^.*/||;$(transform)'`; \ f=`echo "$$p" | sed 's|[^/]*$$||'`"$$f"; \ echo " $(nobase_perlmoduleSCRIPT_INSTALL) '$$d$$p' '$(DESTDIR)$(perlmoduledir)/$$f'"; \ $(nobase_perlmoduleSCRIPT_INSTALL) "$$d$$p" "$(DESTDIR)$(perlmoduledir)/$$f"; \ else :; fi; \ done uninstall-nobase_perlmoduleSCRIPTS: @$(NORMAL_UNINSTALL) @$(am__vpath_adj_setup) \ list='$(nobase_perlmodule_SCRIPTS)'; for p in $$list; do \ $(am__vpath_adj) p=$$f; \ f=`echo "$$p" | sed 's|^.*/||;$(transform)'`; \ f=`echo "$$p" | sed 's|[^/]*$$||'`"$$f"; \ echo " rm -f '$(DESTDIR)$(perlmoduledir)/$$f'"; \ rm -f "$(DESTDIR)$(perlmoduledir)/$$f"; \ done install-perlmoduleSCRIPTS: $(perlmodule_SCRIPTS) @$(NORMAL_INSTALL) test -z "$(perlmoduledir)" || $(mkdir_p) "$(DESTDIR)$(perlmoduledir)" @list='$(perlmodule_SCRIPTS)'; for p in $$list; do \ if test -f "$$p"; then d=; else d="$(srcdir)/"; fi; \ if test -f $$d$$p; then \ f=`echo "$$p" | sed 's|^.*/||;$(transform)'`; \ echo " $(perlmoduleSCRIPT_INSTALL) 
'$$d$$p' '$(DESTDIR)$(perlmoduledir)/$$f'"; \ $(perlmoduleSCRIPT_INSTALL) "$$d$$p" "$(DESTDIR)$(perlmoduledir)/$$f"; \ else :; fi; \ done uninstall-perlmoduleSCRIPTS: @$(NORMAL_UNINSTALL) @list='$(perlmodule_SCRIPTS)'; for p in $$list; do \ f=`echo "$$p" | sed 's|^.*/||;$(transform)'`; \ echo " rm -f '$(DESTDIR)$(perlmoduledir)/$$f'"; \ rm -f "$(DESTDIR)$(perlmoduledir)/$$f"; \ done mostlyclean-libtool: -rm -f *.lo clean-libtool: -rm -rf .libs _libs distclean-libtool: -rm -f libtool uninstall-info-am: tags: TAGS TAGS: ctags: CTAGS CTAGS: distdir: $(DISTFILES) $(mkdir_p) $(distdir)/Filters @srcdirstrip=`echo "$(srcdir)" | sed 's|.|.|g'`; \ topsrcdirstrip=`echo "$(top_srcdir)" | sed 's|.|.|g'`; \ list='$(DISTFILES)'; for file in $$list; do \ case $$file in \ $(srcdir)/*) file=`echo "$$file" | sed "s|^$$srcdirstrip/||"`;; \ $(top_srcdir)/*) file=`echo "$$file" | sed "s|^$$topsrcdirstrip/|$(top_builddir)/|"`;; \ esac; \ if test -f $$file || test -d $$file; then d=.; else d=$(srcdir); fi; \ dir=`echo "$$file" | sed -e 's,/[^/]*$$,,'`; \ if test "$$dir" != "$$file" && test "$$dir" != "."; then \ dir="/$$dir"; \ $(mkdir_p) "$(distdir)$$dir"; \ else \ dir=''; \ fi; \ if test -d $$d/$$file; then \ if test -d $(srcdir)/$$file && test $$d != $(srcdir); then \ cp -pR $(srcdir)/$$file $(distdir)$$dir || exit 1; \ fi; \ cp -pR $$d/$$file $(distdir)$$dir || exit 1; \ else \ test -f $(distdir)/$$file \ || cp -p $$d/$$file $(distdir)/$$file \ || exit 1; \ fi; \ done check-am: all-am check: check-am all-am: Makefile $(SCRIPTS) installdirs: for dir in "$(DESTDIR)$(perlmoduledir)" "$(DESTDIR)$(perlmoduledir)"; do \ test -z "$$dir" || $(mkdir_p) "$$dir"; \ done install: install-am install-exec: install-exec-am install-data: install-data-am uninstall: uninstall-am install-am: all-am @$(MAKE) $(AM_MAKEFLAGS) install-exec-am install-data-am installcheck: installcheck-am install-strip: $(MAKE) $(AM_MAKEFLAGS) INSTALL_PROGRAM="$(INSTALL_STRIP_PROGRAM)" \ 
install_sh_PROGRAM="$(INSTALL_STRIP_PROGRAM)" INSTALL_STRIP_FLAG=-s \ `test -z '$(STRIP)' || \ echo "INSTALL_PROGRAM_ENV=STRIPPROG='$(STRIP)'"` install mostlyclean-generic: clean-generic: -test -z "$(CLEANFILES)" || rm -f $(CLEANFILES) distclean-generic: -test -z "$(CONFIG_CLEAN_FILES)" || rm -f $(CONFIG_CLEAN_FILES) maintainer-clean-generic: @echo "This command is intended for maintainers to use" @echo "it deletes files that may require special tools to rebuild." clean: clean-am clean-am: clean-generic clean-libtool mostlyclean-am distclean: distclean-am -rm -f Makefile distclean-am: clean-am distclean-generic distclean-libtool dvi: dvi-am dvi-am: html: html-am info: info-am info-am: install-data-am: install-nobase_perlmoduleSCRIPTS \ install-perlmoduleSCRIPTS install-exec-am: install-info: install-info-am install-man: installcheck-am: maintainer-clean: maintainer-clean-am -rm -f Makefile maintainer-clean-am: distclean-am maintainer-clean-generic mostlyclean: mostlyclean-am mostlyclean-am: mostlyclean-generic mostlyclean-libtool pdf: pdf-am pdf-am: ps: ps-am ps-am: uninstall-am: uninstall-info-am uninstall-nobase_perlmoduleSCRIPTS \ uninstall-perlmoduleSCRIPTS .PHONY: all all-am check check-am clean clean-generic clean-libtool \ distclean distclean-generic distclean-libtool distdir dvi \ dvi-am html html-am info info-am install install-am \ install-data install-data-am install-exec install-exec-am \ install-info install-info-am install-man \ install-nobase_perlmoduleSCRIPTS install-perlmoduleSCRIPTS \ install-strip installcheck installcheck-am installdirs \ maintainer-clean maintainer-clean-generic mostlyclean \ mostlyclean-generic mostlyclean-libtool pdf pdf-am ps ps-am \ uninstall uninstall-am uninstall-info-am \ uninstall-nobase_perlmoduleSCRIPTS uninstall-perlmoduleSCRIPTS # This is done here to stay in the GNU coding standards # libexecdir can be modified at make time, so can't use # variable substitution at configure time Filter.pm: Filter.pm.in @rm -f 
Filter.pm @sed \ -e 's,@@mylibexecdir@@,$(libexecdir),' \ $(srcdir)/Filter.pm.in > Filter.pm # Tell versions [3.59,3.63) of GNU make to not export all variables. # Otherwise a system limit (for SysV at least) may be exceeded. .NOEXPORT: swish-e-2.4.7/filters/SWISH/Filter.pm.in0000664000077100017500000011435211166010112014560 00000000000000package SWISH::Filter; use 5.005; use strict; use File::Basename; #use MIME::Types; # require below use Carp; use FindBin; # for locating libexecdir (mostly under windows) use vars qw/ $VERSION %extra_methods/; $VERSION = '0.02'; # Define the available parameters %extra_methods = map {$_ => 1} qw/name user_data /; # For testing only if ( $0 =~ 'Filter.pm' && @ARGV >= 2 && shift =~ /^test/i) { die "Please use the 'swish-filter-test' program.\n"; } =head1 NAME SWISH::Filter - Perl extension for filtering documents with Swish-e =head1 SYNOPSIS use SWISH::Filter; # load available filters into memory my $filter = SWISH::Filter->new; # convert a document my $doc = $filter->convert( document => \$scalar_ref, # path or ref to a doc content_type => $content_type, # content type if doc reference name => $real_path, # optional name for this file (useful for debugging) user_data => $whatever, # optional data to make available to filters ); return unless $doc; # empty doc, zero size, or no filters installed # Was the document converted by a filter? my $was_filtered = $doc->was_filtered; # Skip if the file is not text return if $doc->is_binary; # Print out the doc my $doc_ref = $doc->fetch_doc; print $$doc_ref; # Fetch the final content type of the document my $content_type = $doc->content_type; # Fetch Swish-e parser type (TXT*, XML*, HTML*, or undefined) my $doc_type = $doc->swish_parser_type; =head1 DESCRIPTION SWISH::Filter provides a unified way to convert documents into a type that Swish-e can index. Individual filters are installed as separate perl modules. 
For example, there might be a filter that converts from PDF format to HTML format.

Note that this is just a framework for filtering documents. Additional helper programs or Perl modules may need to be installed to use SWISH::Filter to filter documents. For example, to filter PDF documents you must install the Xpdf package.

The filters are automatically loaded when C<SWISH::Filter-E<gt>new()> is called. Filters define a type and priority that determine the processing order of the filter. Filters are processed in this sort order until a filter accepts the document for filtering. The filter uses the document's content type to determine if the filter should handle the current document. The content type is determined by the file's suffix if not supplied by the calling program.

The individual filters are not designed to be used as separate modules. All access to the filters is through this SWISH::Filter module.

Normally, once a document is filtered, processing stops. A filter can filter the document and then set a flag saying that filtering should continue (for example, a filter that uncompresses an MS Word document before passing it on to the filter that converts from MS Word to text). All this should be transparent to the end user. So, filters can be pipe-lined.

The idea of SWISH::Filter is that new filters can be created, and then downloaded and installed to provide new filtering capabilities. For example, if you needed to index MS Excel documents you might be able to download a filter from the Swish-e site, and the next time you ran indexing, MS Excel documents would be indexed.

The SWISH::Filter setup can be used with -S prog or -S http. It works best with the -S prog method because the filter modules only need to be loaded and compiled one time. The -S prog program F<spider.pl> will automatically use SWISH::Filter when spidering with default settings (using "default" as the first parameter to spider.pl).

The -S http indexing method uses a Perl helper script called F<swishspider>.
F<swishspider> has been updated to work with SWISH::Filter, but (unlike spider.pl) does not contain a "use lib" line to point to the location of SWISH::Filter. This means that by default F<swishspider> will B<not> use SWISH::Filter for filtering. The reason is that F<swishspider> runs for every URL fetched, and loading the filters for each document can be slow. The recommended way of spidering is using -S prog with spider.pl, but if -S http is desired, the way to enable SWISH::Filter is to set PERL5LIB before running swish so that F<swishspider> will be able to locate the SWISH::Filter module. Here's one way to set PERL5LIB with the bash shell:

    $ export PERL5LIB=`swish-filter-test -path`

=head1 METHODS

=over 4

=item $filter = SWISH::Filter-E<gt>new()

This creates a SWISH::Filter object. You may pass in options as a list or a hash reference.

=back

=head2 SWISH::Filter-E<gt>new Options

There is currently only one option that can be passed in to new():

=over 4

=item ignore_filters

Pass in a reference to a list of filter names to ignore. For example, if you have two filters installed "Pdf2HTML" and "Pdf2XML" and want to avoid using "Pdf2XML":

    my $filter = SWISH::Filter->new( ignore_filters => ['Pdf2XML'] );

=cut

sub new {
    my $class = shift;
    $class = ref( $class ) || $class;

    my %attr = ref $_[0] ?
%{$_[0]} : @_ if @_;

    my $self = bless {}, $class;

    $self->{skip_filters} = {};

    $self->ignore_filters( delete $attr{ignore_filters} )
        if $attr{ignore_filters};

    warn "Unknown SWISH::Filter->new() config setting '$_'\n"
        for keys %attr;

    $self->create_filter_list( %attr );

    eval { require MIME::Types };
    if ( $@ ) {
        $class->mywarn( "Failed to load MIME::Types\n$@\nInstall MIME::Types for more complete MIME support");

        # handle the lookup for a small number of types locally
        $self->{mimetypes} = $self;

    } else {
        $self->{mimetypes} = MIME::Types->new;
    }

    return $self;
}

# Here's some common mime types
my %mime_types = (
    doc  => 'application/msword',
    pdf  => 'application/pdf',
    html => 'text/html',
    htm  => 'text/html',
    txt  => 'text/plain',
    text => 'text/plain',
    xml  => 'text/xml',
    mp3  => 'audio/mpeg',
);

sub mimeTypeOf {
    my ( $self, $file ) = @_;
    $file =~ s/.*\.//;
    return $mime_types{$file} || undef;
}

sub ignore_filters {
    my ( $self, $filters ) = @_;

    unless ( $filters ) {
        return unless $self->{ignore_filter_list};
        return @{$self->{ignore_filter_list}};
    }

    @{$self->{ignore_filter_list}} = @$filters;

    # create lookup hash for filters to skip
    # (the braces matter: a bare map in scalar context would store only a count)
    $self->{skip_filters} = { map { $_, 1 } @$filters };
}

=item $doc_object = $filter-E<gt>convert();

This method filters a document. Returns an object of the class SWISH::Filter::document, or undefined if passed an empty document, a filename that cannot be read off disk, or if no filters have been loaded.

SWISH::Filter::document methods listed below can be called on the object to, for example, check if the document was filtered and to fetch the document content (filtered or not).

You must pass in a hash (or hash reference) of parameters to the convert() method. The possible parameters are:

=over 8

=item document

This can be either a path to a file, or a scalar reference to a document in memory. This is required.

=item content_type

The MIME type of the document. This is only required when passing in a scalar reference to a document.
The content type string is what the filters use to match a document type.

When passing in a file name and C<content_type> is not set, then the content type will be determined from the file's extension by using the MIME::Types Perl module (available on CPAN).

=item name

Optional name to pass in to filters that will be used in error and warning messages.

=item user_data

Optional data structure that all filters may access. This can be fetched in a filter by:

    my $user_data = $doc_object->user_data;

And used in the filter as:

    if ( ref $user_data && $user_data->{pdf2html}{title} ) {
        ...
    }

It's up to the filter author to use a unique first-level hash key for a given filter.

=back

Example of using the convert() method:

    $doc_object = $filter->convert(
        document     => $doc_ref,
        content_type => 'application/pdf',
    );

=cut

sub convert {
    my $self = shift;
    my %attr = ref $_[0] ? %{$_[0]} : @_ if @_;

    # Any filters?
    return unless $self->filter_list;

    my $doc = delete $attr{document}
        || die "Failed to supply document attribute 'document' when calling convert()\n";

    my $content_type = delete $attr{content_type};

    # Allow a reference to a file name (where is this used??)
if ( ref $content_type ) { my $type = $self->decode_content_type( $$content_type ); unless ( $type ) { warn "Failed to set content type for file reference '$$content_type'\n"; return; } $content_type = $type; } if ( ref $doc ) { die "Must supply a content type when passing in a reference to a document\n" unless $content_type; } else { $content_type ||= $self->decode_content_type( $doc ); unless ( $content_type ) { warn "Failed to set content type for document '$doc'\n"; return; } $attr{name} ||= $doc; # Set default name of document } $self->mywarn("\n>> Starting to process new document: $content_type"); ## Create a new document object my $doc_object = SWISH::Filter::document->new( $doc, $content_type ); return unless $doc_object; # fails on empty doc or doc not readable local $SIG{__DIE__}; local $SIG{__WARN__}; # Look for left over config settings that we do not know about for my $setting ( keys %extra_methods ) { next unless $attr{$setting}; my $method = "set_" . $setting; $doc_object->$method(delete $attr{$setting}); # if given a document name then use that in error messages if ( $setting eq 'name' ) { $SIG{__DIE__} = sub { die "$$ Error- ", $doc_object->name, ": ", @_ }; $SIG{__WARN__} = sub { warn "$$ Warning - ", $doc_object->name, ": ", @_ }; } } warn "Unknown filter config setting '$_'\n" for keys %attr; # Now run through the filters my $done; for my $filter ( $self->filter_list ) { $self->mywarn(" ++Checking filter [$filter] for $content_type" ); # can this filter handle this content type? next unless $filter->can_filter_mimetype( $doc_object->content_type ); my $start_content_type = $doc_object->content_type; my $filtered_doc; # run the filter eval { local $SIG{__DIE__}; $filtered_doc = $filter->filter($doc_object); }; if ( $@ ) { $self->mywarn("Problems with filter '$filter'. Filter disabled:\n -> $@"); $self->filter_list( [ grep { $_ != $filter } $self->filter_list ] ); next; } $self->mywarn(" ++ $content_type " . ($filtered_doc ? '*WAS*' : 'was not') . 
" filtered by $filter\n"); # save the working filters in this list if ( $filtered_doc ) { # either a file name or a reference to the doc # Track chain of filters push @{$doc_object->{filters_used}}, { name => $filter, start_content_type => $start_content_type, end_content_type => $doc_object->content_type, }; $doc_object->cur_doc($filtered_doc); # and save it (filename or reference) # All done? last unless $doc_object->continue( 0 ); } } $doc_object->dump_filters_used if $ENV{FILTER_DEBUG}; return $doc_object; } =item $filter-Emywarn() Internal function used for writing warning messages to STDERR if $ENV{FILTER_DEBUG} is set. Set the environment variable FILTER_DEBUG before running to see extra messages while processing. =cut sub mywarn { my $self = shift; print STDERR @_,"\n" if $ENV{FILTER_DEBUG}; } =item @filters = $filter-Efilter_list; Returns a list of filter objects installed. =cut sub filter_list { my ( $self, $filter_ref ) = @_; unless ( $filter_ref ) { return ref $self->{filters} ? @{ $self->{filters} } : (); } $self->{filters} = $filter_ref; } # Creates the list of filters sub create_filter_list { my ( $self, %attr ) = @_; my @filters; # Look for filters to load for my $inc_path ( @INC ) { my $cur_path = "$inc_path/SWISH/Filters"; next unless opendir( DIR, $cur_path ); while ( my $file = readdir( DIR ) ) { my $full_path = "$cur_path/$file"; next unless -f $full_path; my ($base,$path,$suffix) = fileparse( $full_path,"\.pm"); next unless $suffix eq '.pm'; # Should this filter be skipped? next if $self->{skip_filters}{$base}; $self->mywarn("\n>> Loading filter: [SWISH/Filters/${base}$suffix]"); eval { require "SWISH/Filters/${base}$suffix" }; if ( $@ ) { if ( $ENV{FILTER_DEBUG} ) { print STDERR "Failed to load 'SWISH/Filters/${base}$suffix'\n", '-+' x 40, "\n", $@, '-+' x 40, "\n"; } next; } my $package = "SWISH::Filters::" . 
$base;

            # Provide a base class for each filter
            {
                no strict 'refs';
                push @{"$package\::ISA"}, 'SWISH::Filters::_BASE';
            }

            my $filter = $package->new( %attr );

            $self->mywarn(":-( Filter [SWISH/Filters/${base}$suffix] not loaded\n")
                unless $filter;

            next unless $filter;  # may not get installed

            push @filters, $filter;  # save it in our list.
        }
    }

    unless ( @filters ) {
        warn "No SWISH filters found\n";
        return;
    }

    # Now sort the filters in order.
    $self->filter_list( [ sort { $a->type <=> $b->type || $a->priority <=> $b->priority } @filters ] );
}

=item @filter = $filter-E<gt>can_filter( $content_type );

This is useful for testing to see if a mimetype might be handled by SWISH::Filter without having to pass in a document. Helpful if doing HEAD requests.

Returns an array of filters that can handle this type of document.

=back

=cut

sub can_filter {
    my ( $self, $content_type ) = @_;

    my @filters;

    unless ( $content_type ) {
        warn "Failed to pass in a content type to can_filter() method";
        return;
    }

    for my $filter ( $self->filter_list ) {
        push @filters, $filter if $filter->can_filter_mimetype( $content_type );
    }

    return @filters;
}

#------------------------------------------------------
# converts a file name to a mimetype
sub decode_content_type {
    my ( $self, $file ) = @_;

    return unless $file;

    return ($self->{mimetypes})->mimeTypeOf($file);
}

=head1 WRITING FILTERS

Filters are standard perl modules that are installed into the SWISH::Filters name space. Filters are not complicated -- see the existing filters for examples.

Each filter defines the content-types (or mimetypes) that it can handle. These are specified as a list of regular expressions to match against the document's content-type. If one of the mimetypes of a filter matches the incoming document's content-type, the filter is called. The filter can then either filter the content or return undefined, indicating that it decided not to filter the document for some reason.
If the document is converted the filter returns either a reference to a scalar of the content or a file name where the content is stored. The filter also must change the content-type of the document to reflect the new document.

Filters typically use external programs or modules to do the actual work of converting a document from one type to another. For example, programs in the Xpdf package are used for converting PDF files. The filter can (and should) test for those programs in its new() method.

Filters also can define a type and priority. These attributes are used to set the order in which filters are tested for a content-type match. This allows you to have more than one filter that can work on the same content-type.

If a filter calls die() then the filter is removed from the chain and will not be called again I<during the same run>. Calling die when running with -S http or -S fs has no effect since the program is run once per document.

Once a filter returns something other than undef no more filters will be called. If the filter calls $doc-E<gt>set_continue then processing will continue as if the file was not filtered. For example, a filter can uncompress data and then call $doc-E<gt>set_continue and let other filters process the document.

This is the list of methods the filter should or may define (as specified):

=over 4

=item new() * required *

This method returns either an object which provides access to the filter, or undefined if the filter is not to be used.

The new() method is a good place to check for required modules or helper programs. Returning undefined prevents the filter from being included in the filter chain.

The new method must return a blessed hash reference. The only required attribute is B<mimetypes>. This attribute must contain a reference to an array of regular expressions used for matching the content-type of the document passed in.
Example: sub new { my ( $class ) = @_; # List of regular expressions my @mimetypes = ( qr[application/(x-)?msword], qr[application/worddoc], ); my %settings = ( mimetypes => \@mimetypes, # Optional settings priority => 20, type => 2, ); return bless \%settings, $class; } The attribute "mimetypes" returns an array reference to a list of regular expressions. Those patterns are matched against each document's content type. =item filter() * required * This is the function that does the work of converting a document from one content type to another. The function is passed the document object. See document object methods listed below for what methods may be called on a document. The function can return undefined (or any false value) to indicate that the filter did not want to process the document. Other filters will then be tested for a content type match. If the document is filtered then the filter must set the new document's content type (if it changed) and return either a file name where the document can be found or a reference to a scalar containing the document. =item type() Returns a number. Filters are sorted (for processing in a specific order) and this number is simply the primary key used in sorting. If not specified the filter's type used for sorting is 2. This is an optional method. You can also set the type in your new() constructor as shown above. =item priority() Returns a number. Filters are sorted (for processing in a specific order) and this number is simply the secondary key used in sorting. If not specified the filter's priority is 50. This is an optional method. You can also set the priority in your new() constructor as shown above. =back Again, the point of the type() and priority() methods is to allow setting the sort order of the filters. Useful if you have two filters for filtering the same content-type, but prefer to use one over the other. Neither are required. 
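
As a rough illustration of the sort order described above, the following standalone sketch sorts by type first and by priority within the same type, filling in the documented defaults of 2 and 50 when a value is missing. It is not part of SWISH::Filter: the filter names and numbers are made up, and plain hashes stand in for the blessed filter objects and their type() and priority() methods.

```perl
use strict;
use warnings;

# Hypothetical filter records; real filters are blessed objects with
# type() and priority() methods. Plain hashes keep this sketch standalone.
my @filters = (
    { name => 'Pdf2XML',  type => 2, priority => 60 },
    { name => 'Gunzip',   type => 1 },                 # priority defaults to 50
    { name => 'Pdf2HTML', priority => 40 },            # type defaults to 2
);

# Documented defaults: type 2, priority 50.
sub type_of     { defined $_[0]{type}     ? $_[0]{type}     : 2 }
sub priority_of { defined $_[0]{priority} ? $_[0]{priority} : 50 }

# Primary key: type; secondary key: priority (both ascending).
my @sorted = sort {
    type_of($a) <=> type_of($b)
        || priority_of($a) <=> priority_of($b)
} @filters;

my @order = map { $_->{name} } @sorted;
print join( ' ', @order ), "\n";    # prints "Gunzip Pdf2HTML Pdf2XML"
```

A lower type runs first regardless of priority, so a decompressing filter can be placed ahead of every format converter simply by giving it a smaller type.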
Here's a module to convert MS Word documents using the program "catdoc":

    package SWISH::Filters::Doc2txt;
    use vars qw/ $VERSION /;

    $VERSION = '0.02';

    sub new {
        my ( $class ) = @_;

        my $self = bless {
            mimetypes => [ qr!application/(x-)?msword! ],
            priority  => 50,
        }, $class;

        # check for helpers
        return $self->set_programs( 'catdoc' );
    }

    sub filter {
        my ( $self, $doc ) = @_;

        my $content = $self->run_catdoc( $doc->fetch_filename ) || return;

        # update the document's content type
        $doc->set_content_type( 'text/plain' );

        # return the document
        return \$content;
    }
    1;

The new() constructor creates a blessed hash which contains an array reference of mimetype patterns that this filter accepts. The priority sets this filter to run after any other filters that might handle the same type of content. The F<set_programs> function says that we need to call a program called "catdoc". The function returns either $self, or undefined if catdoc could not be found. The F<set_programs> function also creates a new method for running catdoc.

The filter function runs catdoc, passing in the name of the file (if the file is in memory a temporary file is created). That F<run_catdoc> function was created by the F<set_programs> call above.

=cut

#=========================================================================
package SWISH::Filter::document;
use strict;
use File::Temp;
use Symbol;

use vars '$AUTOLOAD';

=head1 SWISH::Filter::document Methods

These methods are available to Filter authors, and also provide access to the document after calling the convert() method to end-users of SWISH::Filter.

End users of SWISH::Filter will use a subset of these methods.
Mostly:

    $doc_object->fetch_doc          # an alias for fetch_doc_reference()
    $doc_object->was_filtered       # true if the document was filtered
    $doc_object->content_type       # document's current content type (mime type)
    $doc_object->swish_parser_type  # returns a parser type to use with Swish-e -S prog method
    $doc_object->is_binary          # returns $content_type !~ m[^text/];

These methods are also available to the individual filter modules. The filter's
"filter" function is also passed a SWISH::Filter::document object. Method calls
may be made on this object to check the document's current content type, or to
fetch the document as either a file name or a reference to a scalar containing
the document content.

=cut

# Returns a new SWISH::Filter::document object
# or null if just can't process the document

sub new {
    my ( $class, $doc, $content_type ) = @_;

    return unless $doc && $content_type;

    my $self = bless {}, $class;

    if ( ref $doc ) {
        unless ( length $$doc ) {
            warn "Empty document passed to filter\n";
            return;
        }

        die "Must supply a content type when passing in a reference to a document\n"
            unless $content_type;

    } else {

        unless ( -r $doc ) {
            warn "Filter unable to read doc '$doc': $!\n";
            return;
        }
    }

    $self->set_content_type( $content_type );

    $self->{cur_doc} = $doc;

    return $self;
}

# Clean up any temporary files

sub DESTROY {
    my $self = shift;
    $self->remove_temp_file;
}

sub cur_doc {
    my ( $self, $doc_ref ) = @_;
    $self->{cur_doc} = $doc_ref if $doc_ref;
    return $self->{cur_doc};
}

sub remove_temp_file {
    my $self = shift;

    unlink delete $self->{temp_file} if $self->{temp_file};
}

# Used for tracking what filter(s) were used in processing

sub filters_used {
    my $self = shift;
    return $self->{filters_used} || undef;
}

sub dump_filters_used {
    my $self = shift;
    my $used = $self->filters_used;

    local $SIG{__WARN__};
    warn "\nFinal Content type for ", $self->name, " is ", $self->content_type, "\n";

    unless ( $used ) {
        warn " *No filters were used\n";
        return;
    }

    warn " >Filter $_->{name} converted from [$_->{start_content_type}] to [$_->{end_content_type}]\n"
        for @$used;
}

=head2 Methods used by end-users and filter authors

=over 4

=item $doc_ref = $doc_object-E<gt>fetch_doc_reference;

Returns a scalar reference to the document. This can be used when the filter
can operate on the document in memory (or if an external program expects the
input to be from standard input).

If the file is currently on disk then it will be read into memory. If the file
was stored in a temporary file on disk the file will be deleted once read into
memory. The file will be read in binmode if $doc-E<gt>is_binary is true.

Note that $doc_object-E<gt>fetch_doc is an alias.

=cut

sub fetch_doc_reference {
    my ( $self ) = @_;

    return ref $self->{cur_doc}    # just $self->read_file should work
        ? $self->{cur_doc}
        : $self->read_file;
}

# here's an alias for fetching a document reference.

*fetch_doc = *fetch_doc_reference;

=item $was_filtered = $doc_object-E<gt>was_filtered

Returns true if some filter processed the document.

=cut

sub was_filtered {
    my $self = shift;

    return $self->filters_used ? 1 : 0;
}

=item $content_type = $doc_object-E<gt>content_type;

Fetches the current content type for the document.

Example:

    return unless $doc_object->content_type =~ m!application/pdf!;

=cut

sub content_type {
    return $_[0]->{content_type} || '';
}

=item $type = $doc_object-E<gt>swish_parser_type

Returns a parser type based on the content type.

=cut

# Map content types to swish-e parsers.

my %swish_parser_types = (
    'text/html'  => 'HTML*',
    'text/xml'   => 'XML*',
    'text/plain' => 'TXT*',
);

sub swish_parser_type {
    my $self = shift;

    my $content_type = $self->content_type || return;

    for ( keys %swish_parser_types ) {
        return $swish_parser_types{$_} if $content_type =~ /^\Q$_/;
    }

    return;
}

=item $doc_object-E<gt>is_binary

Returns true if the document's content-type does not match "text/".
=back

=cut

sub is_binary {
    my $self = shift;
    return $self->content_type !~ m[^text];
}

=head2 Methods used by filter authors

=over 4

=item $file_name = $doc_object-E<gt>fetch_filename;

Returns a path to the document as stored on disk. This name can be passed to
external programs (e.g. catdoc) that expect input as a file name.

If the document is currently in memory then a temporary file will be created.
Do not expect the file name passed to be the real path of the document.

The file will be written in binmode if $doc-E<gt>is_binary is true.

This method is not normally used by end-users of SWISH::Filter.

=cut

# This will create a temporary file if file is in memory

sub fetch_filename {
    my ( $self ) = @_;

    return ref $self->{cur_doc}
        ? $self->create_temp_file
        : $self->{cur_doc};
}

=item $doc_object-E<gt>set_continue;

Processing will continue to the next filter if this is set to a true value.
This should be set for filters that change encodings or uncompress documents.

=cut

sub set_continue {
    my ( $self ) = @_;
    return $self->continue(1);
}

sub continue {
    my ( $self, $continue ) = @_;
    my $old = $self->{continue} || 0;
    $self->{continue}++ if $continue;
    return $old;
}

=item $doc_object-E<gt>set_content_type( $type );

Sets the content type for a document.

=cut

sub set_content_type {
    my ( $self, $type ) = @_;
    die "Failed to pass in new content type\n" unless $type;
    $self->{content_type} = $type;
}

sub read_file {
    my $self = shift;
    my $doc = $self->{cur_doc};
    return $doc if ref $doc;

    my $sym = gensym();
    # three-argument open so that odd characters in the file name are not
    # interpreted as part of the open mode
    open( $sym, '<', $doc ) or die "Failed to open file '$doc': $!";
    binmode $sym if $self->is_binary;
    local $/ = undef;
    my $content = <$sym>;
    close $sym;
    $self->{cur_doc} = \$content;

    # Remove the temporary file, if one was created.
    $self->remove_temp_file;

    return $self->{cur_doc};
}

# write file out to a temporary file

sub create_temp_file {
    my $self = shift;
    my $doc = $self->{cur_doc};

    return $doc unless ref $doc;

    my ( $fh, $file_name ) = File::Temp::tempfile();

    # assume binmode if we need to filter...
    binmode $fh if $self->is_binary;

    print $fh $$doc or die "Failed to write to '$file_name': $!";
    close $fh or die "Failed to close '$file_name' $!";

    $self->{cur_doc} = $file_name;
    $self->{temp_file} = $file_name;

    return $file_name;
}

=item $doc_object-E<gt>name

Fetches the name of the current file. This is useful for printing out the name
of the file in an error message. This is the name passed in to the
SWISH::Filter-E<gt>convert method. It is optional and thus may not always be
set.

    my $name = $doc_object->name || 'Unknown name';
    warn "File '$name': failed to convert -- file may be corrupt\n";

=item $doc_object-E<gt>user_data

Fetches the user_data passed in to the filter. This can be any data or data
structure passed into SWISH::Filter-E<gt>new.

This is an easy way to pass special parameters into your filters.

Example:

    my $data = $doc_object->user_data;
    # see if a choice for the title field was passed in
    if ( ref $data eq 'HASH' && $data->{pdf2html}{title_field} ) {
        ...
        ...
    }

=back

=cut

sub AUTOLOAD {
    my ( $self, $newval ) = @_;
    no strict 'refs';

    if ($AUTOLOAD =~ /.*::set_(\w+)/ && $SWISH::Filter::extra_methods{$1}) {
        my $attr_name=$1;
        *{$AUTOLOAD} = sub { $_[0]->{$attr_name} = $_[1]; return };
        return $self->{$attr_name} = $newval;
    }
    elsif ($AUTOLOAD =~ /.*::(\w+)/ && $SWISH::Filter::extra_methods{$1}) {
        my $attr_name=$1;
        *{$AUTOLOAD} = sub { return $_[0]->{$attr_name} };
        return $self->{$attr_name};
    }

    die "No such method: $AUTOLOAD\n";
}

#======================================================================================
# Default methods for the filters

package SWISH::Filters::_BASE;
use strict;

=head1 SWISH::Filters::_BASE

Each filter is a subclass of SWISH::Filters::_BASE.
A number of methods are available by default (and some can be overridden). Others are useful when writing your new() constructor. =over 4 =item $self-E<gt>type This method fetches the type of the filter. The value returned sets the primary sort key for sorting the filters. You can override this in your filter, or just set it as an attribute in your object. The default is 2. The idea of the "type" is to create groups of filters, if needed. For example, you might have a set of filters that are used for uncompressing some documents before passing on to another group for filtering. =cut sub type { 2 }; =item $self-E<gt>priority This method fetches the priority of the filter. The value returned sets the secondary sort key for sorting the filters. You can override this in your filter, or just set it as an attribute in your object. The default method returns 50. The priority is useful if you have multiple filters for the same content type that use different methods for filtering (say one uses wvWare and another uses catdoc for filtering MS Word files). You might give the wvWare filter a lower priority number so it runs before the catdoc filter if both wvWare AND catdoc happen to be installed at the same time. =cut sub priority { 50 }; # default priority =item @types = $self-E<gt>mimetypes Returns the list of mimetypes (as regular expressions) set for the filter. =cut sub mimetypes { my $self = shift; die "Filter [$self] failed to set 'mimetypes' in new() constructor\n" if ! $self->{mimetypes}; die "Filter [$self] 'mimetypes' entry is not an array reference\n" unless ref $self->{mimetypes} eq 'ARRAY'; return @{ $self->{mimetypes} }; } =item $pattern = $self-E<gt>can_filter_mimetype( $content_type ) Returns true if passed in content type matches one of the filter's mimetypes Returns the pattern that matched. 
=cut

sub can_filter_mimetype {
    my ( $self, $content_type ) = @_;

    die "Must supply content_type to can_filter_mimetype()" unless $content_type;
    for my $pattern ( $self->mimetypes ) {
        return $pattern if $content_type =~ /$pattern/;
    }
    return;
}

=item mywarn( $message )

Method for printing out a message when debugging is enabled, that is, when the
FILTER_DEBUG environment variable is set to a true value.

=cut

sub mywarn {
    my $self = shift;

    print STDERR "Filter: $self: ", @_,"\n" if $ENV{FILTER_DEBUG};
}

=item $boolean = $self-E<gt>set_programs( @program_list );

Returns true if all the programs listed in @program_list are found and can be
executed as the current user.

Creates a method for each program with the "run_" prefix. Returns false if ANY
program cannot be found.

Actually, it returns $self, so you can make it the last statement in your
constructor.

So in your constructor you might do:

    return $self->set_programs( qw/ pdftotext pdfinfo / );

Then in your filter() method:

    my $content = $self->run_pdfinfo( $doc->fetch_filename, [options] );

=cut

sub set_programs {
    my ($self, @progs ) = @_;

    for my $prog ( @progs ) {
        my $path = $self->find_binary( $prog );
        unless ( $path ) {
            $self->mywarn("Cannot use Filter: failed to find $prog. Maybe need to install?");
            return;
        }

        no strict 'refs';
        *{"run_$prog"} = sub {
            my $self = shift;
            return $self->run_program( $path, @_ );  # closure
        };
    }

    return $self;
}

=item $path = $self-E<gt>find_binary( $prog );

Use in a filter's new() method to test for a necessary program located in
$PATH. Returns the path to the program or undefined if not found or does not
pass the -x file test.

=cut

use Config;
my @path_segments;

sub find_binary {
    my ( $self, $prog ) = @_;

    unless ( @path_segments ) {
        my $path_sep = $Config{path_sep} || ':';

        @path_segments = split /\Q$path_sep/, $ENV{PATH};

        if ( my $libexecdir = get_libexec() ) {
            push @path_segments, $libexecdir;
        }
    }

    $self->mywarn("Find path of [$prog] in " .
join ':', @path_segments); for ( @path_segments ) { my $path = "$_/$prog"; # For buggy Windows98 that accepts forward slashes if the filename isn't too long $path =~ s[/][\\]g if $^O =~ /Win32/; if ( -x $path ) { $self->mywarn(" * Found program at: [$path]\n"); return $path; } $self->mywarn(" Not found at path [$path]" ); # ok, try Windows extenstions if ( $^O =~ /Win32/ ) { for my $extension ( qw/ exe bat / ) { if ( -x "$path.$extension" ) { $self->mywarn(" * Found program at: [$path.$extension]\n"); return "$path.$extension"; } $self->mywarn(" Not found at path [$path.$extension]" ); } } } return; } # Try and return libexecdir in case programs are installed there (the case with Windows) # Assumes that we are running from libexecdir or bindir # The other option under Windows would be to fetch libexecdir from the Windows registry, # but that could break if a new (another) swish install was done since the registry # would then point to the new install location. sub get_libexec { return '@@mylibexecdir@@'; # the below isn't robust, so set at install time return unless $FindBin::Bin; # Look for something we expect in libexecdir. # Are we in $libexecdir already (like with swishspider or spider.pl) return $FindBin::Bin if -e "$FindBin::Bin/spider.pl"; # Are we in $prefix/bin? (swish-filter-test) return "$FindBin::Bin/../lib/swish-e" if -e "$FindBin::Bin/../lib/swish-e/spider.pl"; return; } =item $bool = $self-E<gt>use_modules( @module_list ); Attempts to load each of the module listed and calls its import() method. Use to test and load required modules within a filter without aborting. return unless $self->use_modules( qw/ Spreadsheet::ParseExcel HTML::Entities / ); A warning message is displayed if the FILTER_DEBUG environment variable is true. Returns $self if no error. 
=cut

sub use_modules {
    my ( $self, @modules ) = @_;

    for my $mod ( @modules ) {
        $self->mywarn("trying to load [$mod]");

        eval { eval "require $mod" or die "$!\n" };

        if ( $@ ) {
            my $caller = caller();
            $self->mywarn("Cannot use Filter $caller -- need to install $mod: $@");
            return;
        }

        $self->mywarn(" ** Loaded $mod **");

        # Export back to caller
        $mod->export_to_level( 1 ) if $mod->can('export_to_level');
    }
    return $self;
}

=item $output = $self-E<gt>run_program( $program, @args );

Runs $program with @args. Must pass in @args.

Under Windows calls IPC::Open2, which may pass data through the shell.
Double-quotes are escaped (backslashed) and each parameter is wrapped in
double-quotes.

On other platforms a fork and exec is used to avoid passing any data through
the shell. Returns a scalar containing the output from your program, or dies.

This method is intended to read output from a program that converts one format
into text. The output is read back in text mode -- on systems like Windows this
means \r\n (CRLF) will be converted to \n.

=cut

sub run_program {
    my $self = shift;

    die "No arguments passed to run_program()\n" unless @_;

    die "Must pass arguments to program '$_[0]'\n" unless @_ > 1;

    my $fh = $^O =~ /Win32/i || $^O =~ /VMS/i
        ? $self->windows_fork( @_ )
        : $self->real_fork( @_ );

    local $/ = undef;
    my $output = <$fh>;
    close $fh;

    # When using IPC::Open3 need to reap the processes.
    waitpid delete $self->{pid}, 0 if $self->{pid};

    return $output;
}

#==================================================================
# Run swish-e by forking
#

use Symbol;

sub real_fork {
    my ( $self, @args ) = @_;

    # Run swish
    my $fh = gensym;
    my $pid = open( $fh, '-|' );

    die "Failed to fork: $!\n" unless defined $pid;

    return $fh if $pid;

    delete $self->{temp_file};  # in child, so don't want to delete on destroy.
exec @args or exit; # die "Failed to exec '$args[0]': $!\n"; } #===================================================================================== # Need # sub windows_fork { my ( $self, @args ) = @_; require IPC::Open2; my ( $rdrfh, $wtrfh ); my @command = map { s/"/\\"/g; qq["$_"] } @args; my $pid = IPC::Open2::open2($rdrfh, $wtrfh, @command ); # IPC::Open3 uses binmode for some reason (5.6.1) # Assume that the output from the program will be in text # Maybe an invalid assumption if running through a binary filter binmode $rdrfh, ':crlf'; # perhpaps: unless delete $self->{binary_output}; $self->{pid} = $pid; return $rdrfh; } =back =cut 1; __END__ =head1 TESTING Filters can be tested with the F<swish-filter-test> program. Run: swish-filter-test -man for documentation. =head1 SUPPORT Please contact the Swish-e discussion list. http://swish-e.org =head1 Bugs, todo items, and other notes TBD =head1 AUTHOR Bill Moseley =head1 COPYRIGHT This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself. 
=cut

swish-e-2.4.7/filters/Makefile.am

SUBDIRS = SWISH

exampledir = $(datadir)/doc/$(PACKAGE)/examples/filters

bin_SCRIPTS = swish-filter-test
example_DATA = README

CLEANFILES = swish-filter-test

# This is done here to stay in the GNU coding standards
# libexecdir can be modified at make time, so can't use
# variable substitution at configure time

swish-filter-test: swish-filter-test.in
	@rm -f swish-filter-test
	@sed \
		-e 's,@@perlmoduledir@@,$(libexecdir)/perl,' \
		-e 's,@@swishbindir@@,$(bindir),' \
		-e 's,@@perlbinary@@,$(PERL),' \
		$(srcdir)/swish-filter-test.in > swish-filter-test

EXTRA_DIST = \
	README \
	swish-filter-test.in

swish-e-2.4.7/filters/README
Filtering documents with SWISH::Filter
--------------------------------------

Swish-e knows only how to parse HTML, XML, and text files. Other file
types may be indexed with the help of filters.

SWISH::Filter is a Perl module designed to make converting documents
from one type of content to another type of content easy. It uses a
plug-in type of system where new filters can be added with little effort.

SWISH::Filter (and associated plug-in filter modules) do not normally do
the actual filtering. This system provides only an interface to the
programs that do the filtering. For example, the Swish-e distribution
includes a filter plug-in called SWISH::Filters::Pdf2HTML. For this
filter to work you must install the xpdf package that includes the
pdftotext and pdfinfo programs. SWISH::Filters::Pdf2HTML only provides a
unified interface to these programs.

The included program spider.pl will use SWISH::Filter by default. This
means that installing the programs that do the filtering is all that is
needed to start filtering documents. For example, installing the xpdf
package will enable indexing of PDF files when spidering.

The filter modules are in the $libexecdir/perl directory. Running

    swish-e -h

will list the setting for $libexecdir, which is typically
/usr/local/lib/swish-e if swish-e was built from source, or
/usr/lib/swish-e if installed as a package. On Windows $libexecdir will
be set at installation time.

Note that $libexecdir/perl is not normally part of Perl's @INC array. So
to read documentation on a specific filter you will need to either
specify the full path to the filter or set PERL5LIB.
For example:

    export PERL5LIB=/usr/local/lib/swish-e/perl
    perldoc SWISH::Filter

Documentation for SWISH::Filter can also be found in the html directory
and at http://swish-e.org.

Swish-e has another filter system: the FileFilter directive, which can be
used to filter documents through an external program while indexing.
That system requires a separate filter setup for each type of document.
See the SWISH-CONFIG page for information on that type of filtering.


Testing SWISH::Filter
---------------------

The program swish-filter-test is installed by default (in the same
location as the swish-e binary). This program can be used to test
SWISH::Filter. For example, run the command:

    $ swish-filter-test foo.pdf foo.txt
    Document foo.pdf was filtered.
     Document: foo.pdf
     Content-Type: text/html (initial was application/pdf)
     Parser type: HTML*

    Document foo.txt was not filtered.
     Document: foo.txt
     Content-Type: text/plain (initial was text/plain)
     Parser type: TXT*

Run the command

    $ swish-filter-test -man

for documentation.


Current filters distributed with Swish-e:
-----------------------------------------

All of these filters require installation of helper programs and/or Perl
modules. See the individual module's documentation for dependencies.

    SWISH::Filters::Doc2txt   - converts MS Word documents to text
    SWISH::Filters::Pdf2HTML  - converts PDF files to HTML with info tags as metanames
    SWISH::Filters::ID3toHTML - extracts ID3 (v1 and v2) tags from MP3 files
    SWISH::Filters::XLtoHTML  - converts MS Excel to HTML

Filters that depend on Perl modules that are not installed will not
load. Setting the environment variable FILTER_DEBUG may report helpful
errors when using filters.

See perldoc SWISH::Filter for instructions on creating filters.
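
Because the filters only wrap external helper programs, it can help to
check which helpers are reachable through $PATH before indexing. The
following is a minimal sketch, not part of the Swish-e distribution: the
function name check_helpers is hypothetical, and the helper names
(catdoc, pdftotext, pdfinfo) are simply the ones mentioned in this README
and in the SWISH::Filter documentation. It does not check for Perl
modules or for programs installed under $libexecdir.

```shell
#!/bin/sh
# Hypothetical helper: report which of the named programs can be found
# through $PATH. Returns non-zero if at least one is missing.
check_helpers() {
    missing=0
    for prog in "$@"; do
        if command -v "$prog" >/dev/null 2>&1; then
            echo "found:   $prog"
        else
            echo "missing: $prog"
            missing=$((missing + 1))
        fi
    done
    [ "$missing" -eq 0 ]
}

# Helper programs named in this README (xpdf tools) and in the
# SWISH::Filters::Doc2txt example (catdoc).
check_helpers catdoc pdftotext pdfinfo ||
    echo "Install the missing helpers to enable the matching filters."
```

A filter whose helper is reported as missing will not be used; running
with FILTER_DEBUG set should then show the corresponding "failed to find"
message from SWISH::Filter itself.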

swish-e-2.4.7/filters/Makefile.in

# Makefile.in generated by automake 1.9.6 from Makefile.am.
# @configure_input@

# Copyright (C) 1994, 1995, 1996, 1997, 1998, 1999, 2000, 2001, 2002,
# 2003, 2004, 2005 Free Software Foundation, Inc.
# This Makefile.in is free software; the Free Software Foundation
# gives unlimited permission to copy and/or distribute it,
# with or without modifications, as long as this notice is preserved.

# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY, to the extent permitted by law; without
# even the implied warranty of MERCHANTABILITY or FITNESS FOR A
# PARTICULAR PURPOSE.

@SET_MAKE@
srcdir = @srcdir@
top_srcdir = @top_srcdir@
VPATH = @srcdir@
pkgdatadir = $(datadir)/@PACKAGE@
pkglibdir = $(libdir)/@PACKAGE@
pkgincludedir = $(includedir)/@PACKAGE@
top_builddir = ..
am__cd = CDPATH="$${ZSH_VERSION+.}$(PATH_SEPARATOR)" && cd INSTALL = @INSTALL@ install_sh_DATA = $(install_sh) -c -m 644 install_sh_PROGRAM = $(install_sh) -c install_sh_SCRIPT = $(install_sh) -c INSTALL_HEADER = $(INSTALL_DATA) transform = $(program_transform_name) NORMAL_INSTALL = : PRE_INSTALL = : POST_INSTALL = : NORMAL_UNINSTALL = : PRE_UNINSTALL = : POST_UNINSTALL = : build_triplet = @build@ host_triplet = @host@ subdir = filters DIST_COMMON = README $(srcdir)/Makefile.am $(srcdir)/Makefile.in ACLOCAL_M4 = $(top_srcdir)/aclocal.m4 am__aclocal_m4_deps = $(top_srcdir)/config/acinclude.m4 \ $(top_srcdir)/configure.in am__configure_deps = $(am__aclocal_m4_deps) $(CONFIGURE_DEPENDENCIES) \ $(ACLOCAL_M4) mkinstalldirs = $(SHELL) $(top_srcdir)/config/mkinstalldirs CONFIG_HEADER = $(top_builddir)/src/acconfig.h CONFIG_CLEAN_FILES = am__installdirs = "$(DESTDIR)$(bindir)" "$(DESTDIR)$(exampledir)" binSCRIPT_INSTALL = $(INSTALL_SCRIPT) SCRIPTS = $(bin_SCRIPTS) SOURCES = DIST_SOURCES = RECURSIVE_TARGETS = all-recursive check-recursive dvi-recursive \ html-recursive info-recursive install-data-recursive \ install-exec-recursive install-info-recursive \ install-recursive installcheck-recursive installdirs-recursive \ pdf-recursive ps-recursive uninstall-info-recursive \ uninstall-recursive am__vpath_adj_setup = srcdirstrip=`echo "$(srcdir)" | sed 's|.|.|g'`; am__vpath_adj = case $$p in \ $(srcdir)/*) f=`echo "$$p" | sed "s|^$$srcdirstrip/||"`;; \ *) f=$$p;; \ esac; am__strip_dir = `echo $$p | sed -e 's|^.*/||'`; exampleDATA_INSTALL = $(INSTALL_DATA) DATA = $(example_DATA) ETAGS = etags CTAGS = ctags DIST_SUBDIRS = $(SUBDIRS) DISTFILES = $(DIST_COMMON) $(DIST_SOURCES) $(TEXINFOS) $(EXTRA_DIST) ACLOCAL = @ACLOCAL@ ALLOCA = @ALLOCA@ AMDEP_FALSE = @AMDEP_FALSE@ AMDEP_TRUE = @AMDEP_TRUE@ AMTAR = @AMTAR@ AR = @AR@ AS = @AS@ AUTOCONF = @AUTOCONF@ AUTOHEADER = @AUTOHEADER@ AUTOMAKE = @AUTOMAKE@ AWK = @AWK@ BTREE_OBJS = @BTREE_OBJS@ BUILDDOCS_FALSE = @BUILDDOCS_FALSE@ 
BUILDDOCS_TRUE = @BUILDDOCS_TRUE@ CC = @CC@ CCDEPMODE = @CCDEPMODE@ CFLAGS = @CFLAGS@ CPP = @CPP@ CPPFLAGS = @CPPFLAGS@ CXX = @CXX@ CXXCPP = @CXXCPP@ CXXDEPMODE = @CXXDEPMODE@ CXXFLAGS = @CXXFLAGS@ CYGPATH_W = @CYGPATH_W@ DEFS = @DEFS@ DEPDIR = @DEPDIR@ DLLTOOL = @DLLTOOL@ ECHO = @ECHO@ ECHO_C = @ECHO_C@ ECHO_N = @ECHO_N@ ECHO_T = @ECHO_T@ EGREP = @EGREP@ EXEEXT = @EXEEXT@ F77 = @F77@ FFLAGS = @FFLAGS@ INSTALLDOCS_FALSE = @INSTALLDOCS_FALSE@ INSTALLDOCS_TRUE = @INSTALLDOCS_TRUE@ INSTALL_DATA = @INSTALL_DATA@ INSTALL_PROGRAM = @INSTALL_PROGRAM@ INSTALL_SCRIPT = @INSTALL_SCRIPT@ INSTALL_STRIP_PROGRAM = @INSTALL_STRIP_PROGRAM@ LARGEFILES_MACROS = @LARGEFILES_MACROS@ LDFLAGS = @LDFLAGS@ LIBOBJS = @LIBOBJS@ LIBS = @LIBS@ LIBTOOL = @LIBTOOL@ LIBXML2_CFLAGS = @LIBXML2_CFLAGS@ LIBXML2_LIB = @LIBXML2_LIB@ LIBXML2_OBJS = @LIBXML2_OBJS@ LIBXML_REQUIRED_VERSION = @LIBXML_REQUIRED_VERSION@ LN_S = @LN_S@ LTLIBOBJS = @LTLIBOBJS@ MAINT = @MAINT@ MAINTAINER_MODE_FALSE = @MAINTAINER_MODE_FALSE@ MAINTAINER_MODE_TRUE = @MAINTAINER_MODE_TRUE@ MAKEINFO = @MAKEINFO@ OBJDUMP = @OBJDUMP@ OBJEXT = @OBJEXT@ PACKAGE = @PACKAGE@ PACKAGE_BUGREPORT = @PACKAGE_BUGREPORT@ PACKAGE_NAME = @PACKAGE_NAME@ PACKAGE_STRING = @PACKAGE_STRING@ PACKAGE_TARNAME = @PACKAGE_TARNAME@ PACKAGE_VERSION = @PACKAGE_VERSION@ PATH_SEPARATOR = @PATH_SEPARATOR@ PCRE_CFLAGS = @PCRE_CFLAGS@ PCRE_CONFIG = @PCRE_CONFIG@ PCRE_LIBS = @PCRE_LIBS@ PCRE_REQUIRED_VERSION = @PCRE_REQUIRED_VERSION@ PERL = @PERL@ POD2MAN = @POD2MAN@ RANLIB = @RANLIB@ SET_MAKE = @SET_MAKE@ SHELL = @SHELL@ STRIP = @STRIP@ SWISH_WEB = @SWISH_WEB@ VERSION = @VERSION@ XML2_CONFIG = @XML2_CONFIG@ Z_CFLAGS = @Z_CFLAGS@ Z_LIBS = @Z_LIBS@ ac_ct_AR = @ac_ct_AR@ ac_ct_AS = @ac_ct_AS@ ac_ct_CC = @ac_ct_CC@ ac_ct_CXX = @ac_ct_CXX@ ac_ct_DLLTOOL = @ac_ct_DLLTOOL@ ac_ct_F77 = @ac_ct_F77@ ac_ct_OBJDUMP = @ac_ct_OBJDUMP@ ac_ct_RANLIB = @ac_ct_RANLIB@ ac_ct_STRIP = @ac_ct_STRIP@ am__fastdepCC_FALSE = @am__fastdepCC_FALSE@ am__fastdepCC_TRUE = @am__fastdepCC_TRUE@ 
am__fastdepCXX_FALSE = @am__fastdepCXX_FALSE@ am__fastdepCXX_TRUE = @am__fastdepCXX_TRUE@ am__include = @am__include@ am__leading_dot = @am__leading_dot@ am__quote = @am__quote@ am__tar = @am__tar@ am__untar = @am__untar@ bindir = @bindir@ build = @build@ build_alias = @build_alias@ build_cpu = @build_cpu@ build_os = @build_os@ build_vendor = @build_vendor@ datadir = @datadir@ exec_prefix = @exec_prefix@ host = @host@ host_alias = @host_alias@ host_cpu = @host_cpu@ host_os = @host_os@ host_vendor = @host_vendor@ includedir = @includedir@ infodir = @infodir@ install_sh = @install_sh@ libdir = @libdir@ libexecdir = @libexecdir@ localstatedir = @localstatedir@ mandir = @mandir@ mkdir_p = @mkdir_p@ oldincludedir = @oldincludedir@ prefix = @prefix@ program_transform_name = @program_transform_name@ sbindir = @sbindir@ sharedstatedir = @sharedstatedir@ sysconfdir = @sysconfdir@ target_alias = @target_alias@ SUBDIRS = SWISH exampledir = $(datadir)/doc/$(PACKAGE)/examples/filters bin_SCRIPTS = swish-filter-test example_DATA = README CLEANFILES = swish-filter-test EXTRA_DIST = \ README \ swish-filter-test.in all: all-recursive .SUFFIXES: $(srcdir)/Makefile.in: @MAINTAINER_MODE_TRUE@ $(srcdir)/Makefile.am $(am__configure_deps) @for dep in $?; do \ case '$(am__configure_deps)' in \ *$$dep*) \ cd $(top_builddir) && $(MAKE) $(AM_MAKEFLAGS) am--refresh \ && exit 0; \ exit 1;; \ esac; \ done; \ echo ' cd $(top_srcdir) && $(AUTOMAKE) --foreign filters/Makefile'; \ cd $(top_srcdir) && \ $(AUTOMAKE) --foreign filters/Makefile .PRECIOUS: Makefile Makefile: $(srcdir)/Makefile.in $(top_builddir)/config.status @case '$?' 
in \ *config.status*) \ cd $(top_builddir) && $(MAKE) $(AM_MAKEFLAGS) am--refresh;; \ *) \ echo ' cd $(top_builddir) && $(SHELL) ./config.status $(subdir)/$@ $(am__depfiles_maybe)'; \ cd $(top_builddir) && $(SHELL) ./config.status $(subdir)/$@ $(am__depfiles_maybe);; \ esac; $(top_builddir)/config.status: $(top_srcdir)/configure $(CONFIG_STATUS_DEPENDENCIES) cd $(top_builddir) && $(MAKE) $(AM_MAKEFLAGS) am--refresh $(top_srcdir)/configure: @MAINTAINER_MODE_TRUE@ $(am__configure_deps) cd $(top_builddir) && $(MAKE) $(AM_MAKEFLAGS) am--refresh $(ACLOCAL_M4): @MAINTAINER_MODE_TRUE@ $(am__aclocal_m4_deps) cd $(top_builddir) && $(MAKE) $(AM_MAKEFLAGS) am--refresh install-binSCRIPTS: $(bin_SCRIPTS) @$(NORMAL_INSTALL) test -z "$(bindir)" || $(mkdir_p) "$(DESTDIR)$(bindir)" @list='$(bin_SCRIPTS)'; for p in $$list; do \ if test -f "$$p"; then d=; else d="$(srcdir)/"; fi; \ if test -f $$d$$p; then \ f=`echo "$$p" | sed 's|^.*/||;$(transform)'`; \ echo " $(binSCRIPT_INSTALL) '$$d$$p' '$(DESTDIR)$(bindir)/$$f'"; \ $(binSCRIPT_INSTALL) "$$d$$p" "$(DESTDIR)$(bindir)/$$f"; \ else :; fi; \ done uninstall-binSCRIPTS: @$(NORMAL_UNINSTALL) @list='$(bin_SCRIPTS)'; for p in $$list; do \ f=`echo "$$p" | sed 's|^.*/||;$(transform)'`; \ echo " rm -f '$(DESTDIR)$(bindir)/$$f'"; \ rm -f "$(DESTDIR)$(bindir)/$$f"; \ done mostlyclean-libtool: -rm -f *.lo clean-libtool: -rm -rf .libs _libs distclean-libtool: -rm -f libtool uninstall-info-am: install-exampleDATA: $(example_DATA) @$(NORMAL_INSTALL) test -z "$(exampledir)" || $(mkdir_p) "$(DESTDIR)$(exampledir)" @list='$(example_DATA)'; for p in $$list; do \ if test -f "$$p"; then d=; else d="$(srcdir)/"; fi; \ f=$(am__strip_dir) \ echo " $(exampleDATA_INSTALL) '$$d$$p' '$(DESTDIR)$(exampledir)/$$f'"; \ $(exampleDATA_INSTALL) "$$d$$p" "$(DESTDIR)$(exampledir)/$$f"; \ done uninstall-exampleDATA: @$(NORMAL_UNINSTALL) @list='$(example_DATA)'; for p in $$list; do \ f=$(am__strip_dir) \ echo " rm -f '$(DESTDIR)$(exampledir)/$$f'"; \ rm -f 
"$(DESTDIR)$(exampledir)/$$f"; \ done # This directory's subdirectories are mostly independent; you can cd # into them and run `make' without going through this Makefile. # To change the values of `make' variables: instead of editing Makefiles, # (1) if the variable is set in `config.status', edit `config.status' # (which will cause the Makefiles to be regenerated when you run `make'); # (2) otherwise, pass the desired values on the `make' command line. $(RECURSIVE_TARGETS): @failcom='exit 1'; \ for f in x $$MAKEFLAGS; do \ case $$f in \ *=* | --[!k]*);; \ *k*) failcom='fail=yes';; \ esac; \ done; \ dot_seen=no; \ target=`echo $@ | sed s/-recursive//`; \ list='$(SUBDIRS)'; for subdir in $$list; do \ echo "Making $$target in $$subdir"; \ if test "$$subdir" = "."; then \ dot_seen=yes; \ local_target="$$target-am"; \ else \ local_target="$$target"; \ fi; \ (cd $$subdir && $(MAKE) $(AM_MAKEFLAGS) $$local_target) \ || eval $$failcom; \ done; \ if test "$$dot_seen" = "no"; then \ $(MAKE) $(AM_MAKEFLAGS) "$$target-am" || exit 1; \ fi; test -z "$$fail" mostlyclean-recursive clean-recursive distclean-recursive \ maintainer-clean-recursive: @failcom='exit 1'; \ for f in x $$MAKEFLAGS; do \ case $$f in \ *=* | --[!k]*);; \ *k*) failcom='fail=yes';; \ esac; \ done; \ dot_seen=no; \ case "$@" in \ distclean-* | maintainer-clean-*) list='$(DIST_SUBDIRS)' ;; \ *) list='$(SUBDIRS)' ;; \ esac; \ rev=''; for subdir in $$list; do \ if test "$$subdir" = "."; then :; else \ rev="$$subdir $$rev"; \ fi; \ done; \ rev="$$rev ."; \ target=`echo $@ | sed s/-recursive//`; \ for subdir in $$rev; do \ echo "Making $$target in $$subdir"; \ if test "$$subdir" = "."; then \ local_target="$$target-am"; \ else \ local_target="$$target"; \ fi; \ (cd $$subdir && $(MAKE) $(AM_MAKEFLAGS) $$local_target) \ || eval $$failcom; \ done && test -z "$$fail" tags-recursive: list='$(SUBDIRS)'; for subdir in $$list; do \ test "$$subdir" = . 
|| (cd $$subdir && $(MAKE) $(AM_MAKEFLAGS) tags); \ done ctags-recursive: list='$(SUBDIRS)'; for subdir in $$list; do \ test "$$subdir" = . || (cd $$subdir && $(MAKE) $(AM_MAKEFLAGS) ctags); \ done ID: $(HEADERS) $(SOURCES) $(LISP) $(TAGS_FILES) list='$(SOURCES) $(HEADERS) $(LISP) $(TAGS_FILES)'; \ unique=`for i in $$list; do \ if test -f "$$i"; then echo $$i; else echo $(srcdir)/$$i; fi; \ done | \ $(AWK) ' { files[$$0] = 1; } \ END { for (i in files) print i; }'`; \ mkid -fID $$unique tags: TAGS TAGS: tags-recursive $(HEADERS) $(SOURCES) $(TAGS_DEPENDENCIES) \ $(TAGS_FILES) $(LISP) tags=; \ here=`pwd`; \ if ($(ETAGS) --etags-include --version) >/dev/null 2>&1; then \ include_option=--etags-include; \ empty_fix=.; \ else \ include_option=--include; \ empty_fix=; \ fi; \ list='$(SUBDIRS)'; for subdir in $$list; do \ if test "$$subdir" = .; then :; else \ test ! -f $$subdir/TAGS || \ tags="$$tags $$include_option=$$here/$$subdir/TAGS"; \ fi; \ done; \ list='$(SOURCES) $(HEADERS) $(LISP) $(TAGS_FILES)'; \ unique=`for i in $$list; do \ if test -f "$$i"; then echo $$i; else echo $(srcdir)/$$i; fi; \ done | \ $(AWK) ' { files[$$0] = 1; } \ END { for (i in files) print i; }'`; \ if test -z "$(ETAGS_ARGS)$$tags$$unique"; then :; else \ test -n "$$unique" || unique=$$empty_fix; \ $(ETAGS) $(ETAGSFLAGS) $(AM_ETAGSFLAGS) $(ETAGS_ARGS) \ $$tags $$unique; \ fi ctags: CTAGS CTAGS: ctags-recursive $(HEADERS) $(SOURCES) $(TAGS_DEPENDENCIES) \ $(TAGS_FILES) $(LISP) tags=; \ here=`pwd`; \ list='$(SOURCES) $(HEADERS) $(LISP) $(TAGS_FILES)'; \ unique=`for i in $$list; do \ if test -f "$$i"; then echo $$i; else echo $(srcdir)/$$i; fi; \ done | \ $(AWK) ' { files[$$0] = 1; } \ END { for (i in files) print i; }'`; \ test -z "$(CTAGS_ARGS)$$tags$$unique" \ || $(CTAGS) $(CTAGSFLAGS) $(AM_CTAGSFLAGS) $(CTAGS_ARGS) \ $$tags $$unique GTAGS: here=`$(am__cd) $(top_builddir) && pwd` \ && cd $(top_srcdir) \ && gtags -i $(GTAGS_ARGS) $$here distclean-tags: -rm -f TAGS ID GTAGS GRTAGS GSYMS GPATH 
tags distdir: $(DISTFILES) @srcdirstrip=`echo "$(srcdir)" | sed 's|.|.|g'`; \ topsrcdirstrip=`echo "$(top_srcdir)" | sed 's|.|.|g'`; \ list='$(DISTFILES)'; for file in $$list; do \ case $$file in \ $(srcdir)/*) file=`echo "$$file" | sed "s|^$$srcdirstrip/||"`;; \ $(top_srcdir)/*) file=`echo "$$file" | sed "s|^$$topsrcdirstrip/|$(top_builddir)/|"`;; \ esac; \ if test -f $$file || test -d $$file; then d=.; else d=$(srcdir); fi; \ dir=`echo "$$file" | sed -e 's,/[^/]*$$,,'`; \ if test "$$dir" != "$$file" && test "$$dir" != "."; then \ dir="/$$dir"; \ $(mkdir_p) "$(distdir)$$dir"; \ else \ dir=''; \ fi; \ if test -d $$d/$$file; then \ if test -d $(srcdir)/$$file && test $$d != $(srcdir); then \ cp -pR $(srcdir)/$$file $(distdir)$$dir || exit 1; \ fi; \ cp -pR $$d/$$file $(distdir)$$dir || exit 1; \ else \ test -f $(distdir)/$$file \ || cp -p $$d/$$file $(distdir)/$$file \ || exit 1; \ fi; \ done list='$(DIST_SUBDIRS)'; for subdir in $$list; do \ if test "$$subdir" = .; then :; else \ test -d "$(distdir)/$$subdir" \ || $(mkdir_p) "$(distdir)/$$subdir" \ || exit 1; \ distdir=`$(am__cd) $(distdir) && pwd`; \ top_distdir=`$(am__cd) $(top_distdir) && pwd`; \ (cd $$subdir && \ $(MAKE) $(AM_MAKEFLAGS) \ top_distdir="$$top_distdir" \ distdir="$$distdir/$$subdir" \ distdir) \ || exit 1; \ fi; \ done check-am: all-am check: check-recursive all-am: Makefile $(SCRIPTS) $(DATA) installdirs: installdirs-recursive installdirs-am: for dir in "$(DESTDIR)$(bindir)" "$(DESTDIR)$(exampledir)"; do \ test -z "$$dir" || $(mkdir_p) "$$dir"; \ done install: install-recursive install-exec: install-exec-recursive install-data: install-data-recursive uninstall: uninstall-recursive install-am: all-am @$(MAKE) $(AM_MAKEFLAGS) install-exec-am install-data-am installcheck: installcheck-recursive install-strip: $(MAKE) $(AM_MAKEFLAGS) INSTALL_PROGRAM="$(INSTALL_STRIP_PROGRAM)" \ install_sh_PROGRAM="$(INSTALL_STRIP_PROGRAM)" INSTALL_STRIP_FLAG=-s \ `test -z '$(STRIP)' || \ echo 
"INSTALL_PROGRAM_ENV=STRIPPROG='$(STRIP)'"` install mostlyclean-generic: clean-generic: -test -z "$(CLEANFILES)" || rm -f $(CLEANFILES) distclean-generic: -test -z "$(CONFIG_CLEAN_FILES)" || rm -f $(CONFIG_CLEAN_FILES) maintainer-clean-generic: @echo "This command is intended for maintainers to use" @echo "it deletes files that may require special tools to rebuild." clean: clean-recursive clean-am: clean-generic clean-libtool mostlyclean-am distclean: distclean-recursive -rm -f Makefile distclean-am: clean-am distclean-generic distclean-libtool \ distclean-tags dvi: dvi-recursive dvi-am: html: html-recursive info: info-recursive info-am: install-data-am: install-exampleDATA install-exec-am: install-binSCRIPTS install-info: install-info-recursive install-man: installcheck-am: maintainer-clean: maintainer-clean-recursive -rm -f Makefile maintainer-clean-am: distclean-am maintainer-clean-generic mostlyclean: mostlyclean-recursive mostlyclean-am: mostlyclean-generic mostlyclean-libtool pdf: pdf-recursive pdf-am: ps: ps-recursive ps-am: uninstall-am: uninstall-binSCRIPTS uninstall-exampleDATA \ uninstall-info-am uninstall-info: uninstall-info-recursive .PHONY: $(RECURSIVE_TARGETS) CTAGS GTAGS all all-am check check-am \ clean clean-generic clean-libtool clean-recursive ctags \ ctags-recursive distclean distclean-generic distclean-libtool \ distclean-recursive distclean-tags distdir dvi dvi-am html \ html-am info info-am install install-am install-binSCRIPTS \ install-data install-data-am install-exampleDATA install-exec \ install-exec-am install-info install-info-am install-man \ install-strip installcheck installcheck-am installdirs \ installdirs-am maintainer-clean maintainer-clean-generic \ maintainer-clean-recursive mostlyclean mostlyclean-generic \ mostlyclean-libtool mostlyclean-recursive pdf pdf-am ps ps-am \ tags tags-recursive uninstall uninstall-am \ uninstall-binSCRIPTS uninstall-exampleDATA uninstall-info-am # This is done here to stay in the GNU coding 
standards
# libexecdir can be modified at make time, so can't use
# variable substitution at configure time

swish-filter-test: swish-filter-test.in
	@rm -f swish-filter-test
	@sed \
		-e 's,@@perlmoduledir@@,$(libexecdir)/perl,' \
		-e 's,@@swishbindir@@,$(bindir),' \
		-e 's,@@perlbinary@@,$(PERL),' \
		$(srcdir)/swish-filter-test.in > swish-filter-test

# Tell versions [3.59,3.63) of GNU make to not export all variables.
# Otherwise a system limit (for SysV at least) may be exceeded.
.NOEXPORT:

swish-e-2.4.7/filters/swish-filter-test.in

#!@@perlbinary@@ -w
use strict;

# This is set to where Swish-e's "make install" installed the helper modules.
use lib ( '@@perlmoduledir@@' );

###################################################################################
#
# Copyright (C) 2001 Bill Moseley swishscript@hank.org
# Program to test the SWISH::Filter module
#
# This program is free software; you can redistribute it and/or
# modify it under the terms of the GNU General Public License
# as published by the Free Software Foundation; either version
# 2 of the License, or (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# The above lines must remain at the top of this program
#
# $Id: swish-filter-test.in 1766 2005-06-09 18:58:24Z augur $
#
####################################################################################

use Getopt::Long;
use SWISH::Filter;
use Pod::Usage;
use URI;

# Message levels used by msg()
use constant ABORT => 0;
use constant DEBUG => 1;
use constant INFO => 2;

my ( $verbose, $show_content, @file, @url, $help, $man, $quiet, $headers, $path, $depreciated, $mimetypes);

my $skip_binary = 1;
my $lines = 10;
my $max_chars = 1000;

GetOptions(
    'verbose!' => \$verbose, # turn on INFO messages
    'content!' => \$show_content,
    'quiet' => \$quiet,
    'lines=i' => \$lines,
    'help|?' => \$help,
    'man' => \$man,
    'headers' => \$headers,
    'skip_binary!' => \$skip_binary,
    'path' => \$path,
    'depreciated'=> \$depreciated,
    'mimetypes' => \$mimetypes,
) || pod2usage(2);

pod2usage( -verbose => 1 ) if $help;
pod2usage( -verbose => 2 ) if $man;

if ( $path ) {
    print '@@perlmoduledir@@',"\n";
    exit;
}

# Pod::Usage spells this option -exitval; a misspelled key is silently ignored.
pod2usage(
    -verbose => 0,
    -message => "Must specify a file or URL",
    -exitval => 1,
) unless @ARGV or $mimetypes;

$ENV{FILTER_DEBUG} = 1 if $verbose; # used by SWISH::Filter

msg(INFO, "SWISH::Filter found at [%s]\n", $INC{'SWISH/Filter.pm'} );

my $filter = SWISH::Filter->new;

if ( $mimetypes ) {
    my @filters = $filter->filter_list;
    print "Mimetypes:\n\n";
    for my $filter ( @filters ) {
        print " $filter:\n";
        for my $pattern ( $filter->mimetypes ) {
            print " $pattern\n";
        }
        print "\n";
    }
}

my $return = 0;

for my $doc ( @ARGV ) {
    eval { $depreciated ?
process_doc_old( $doc ) : process_doc( $doc ) }; $return = 1 if $@; warn "** $0:\n $@\n" if $@; # always warn on die } exit $return; sub process_doc { my ($file) = @_; my $uri; eval { $uri = URI->new( $file ) }; my %config = !$@ && $uri->scheme ? fetch_url( $file ) : fetch_file( $file); my $doc = $filter->convert( %config, name => $file, ); die "Failed to process document [$file]\n" unless $doc; my $content_type = $doc->content_type || "unknown"; my $parser_type = $doc->swish_parser_type || ''; my $binary = $doc->is_binary; my $msg = $doc->was_filtered ? '' : 'not'; my $name = $doc->name; msg(DEBUG, <<EOF ); Document $file was $msg filtered. Document: $file ($name) Content-Type: $content_type Parser type: $parser_type EOF if ( my $filters_used = $doc->filters_used ) { for my $filter ( @$filters_used ) { msg( DEBUG, " >Filter used: $filter->{name} ( $filter->{start_content_type} -> $filter->{end_content_type} )" ); } } if ( !$binary ) { my @doc = split /\n/, substr( ${$doc->fetch_doc}, 0, $max_chars ); $lines = @doc-1 if $lines >= @doc; msg(INFO, join "\n", '-- Output Content Sample --', @doc[0..$lines],'','-- end --','' ); } die "Skipping binary [$file]\n" if $binary && $skip_binary; if ($headers ) { my $len = length ${$doc->fetch_doc}; print "Path-Name: $file\nContent-Length: $len\n"; print "Document-Type: $parser_type\n" if $parser_type; print "\n"; } print ${$doc->fetch_doc} if $show_content; } sub process_doc_old { my ($file) = @_; my $uri; eval { $uri = URI->new( $file ) }; my %config = !$@ && $uri->scheme ? fetch_url( $file ) : fetch_file( $file); my $was_filtered = $filter->filter( %config, name => $file, ); my $content_type = $filter->content_type || "unknown"; my $orig_content_type = $filter->original_content_type || "unknown"; my $parser_type = $filter->swish_parser_type || ''; my $binary = $content_type !~ m[^text/]; my $msg = $was_filtered ? '' : 'not'; msg(DEBUG, <<EOF ); Document $file was $msg filtered. 
Document: $file Content-Type: $content_type (initial was $orig_content_type) Parser type: $parser_type EOF if ( !$binary ) { my @doc = split /\n/, substr( ${$filter->fetch_doc}, 0, $max_chars ); $lines = @doc-1 if $lines >= @doc; msg(INFO, join "\n", '-- Output Content Sample --', @doc[0..$lines],'','-- end --','' ); } die "Skipping binary [$file]\n" if $binary && $skip_binary; if ($headers ) { my $len = length ${$filter->fetch_doc}; print "Path-Name: $file\nContent-Length: $len\n"; print "Document-Type: $parser_type\n" if $parser_type; print "\n"; } print ${$filter->fetch_doc} if $show_content; } sub fetch_file { my $file = shift; # just try to open for error reporting open FH, "<$file" or die "Failed to open '$file': $!\n"; close FH; die "File '$file' has zero size\n" if -z $file; return ( document => $file ); } sub fetch_url { my $url = shift; eval { require LWP::UserAgent }; die "LWP::UserAgent is required to fetch a URL\n$@\n" if $@; my $ua = LWP::UserAgent->new; my $request = HTTP::Request->new('GET', $url ); my $response = $ua->request( $request ); die sprintf "Error while getting '%s'. Server returned %s.", $response->request->uri,$response->status_line unless $response->is_success; my $content = $response->content; my $content_type = $response->content_type; return ( document => \$content, content_type => $content_type, ); } sub msg { my $msg_level = shift; return if $quiet; return if !$verbose && $msg_level > DEBUG; printf( STDERR @_), print STDERR "\n"; } __END__ =head1 NAME swish-filter-test - program to test the SWISH::Filter module. 

=head1 SYNOPSIS

    swish-filter-test [options] <file or url> <...>

    Options:
        -quiet              don't generate messages to stderr
        -content            output content to stdout
        -(no)skip_binary    skip output of binary files (default)
        -lines <num>        Number of lines of content to display to stderr if verbose
        -headers            output with headers for swish-e -S prog method
        -verbose            enable $ENV{FILTER_DEBUG} for verbose output
        -path               output @INC path to SWISH::Filter module
        -help               brief help message
        -man                full documentation

=head1 DESCRIPTION

swish-filter-test is a program to test the Perl module SWISH::Filter.
SWISH::Filter is a module that is included with the swish-e distribution.

Documents supplied to this script on the command line (as a URL or a plain
file) are passed to the SWISH::Filter module. This is useful for testing
filters.

The SWISH::Filter module works by checking a document's content-type and
looking for an available filter. Most filters require additional helper
programs (e.g. the filter to convert PDF to HTML requires the Xpdf package).
Using the -verbose option should indicate whether a required program is
missing.

Options to this script control how much output is generated. Options can also
be specified to generate output that can be piped directly to swish-e (see
C<-headers> below).

Messages are sent to stderr. When -content or -headers is specified, the
document content or headers are written to stdout.

=head1 OPTIONS

=over 8

=item B<-quiet>

Don't write info out to stderr. Normally not useful unless you just want to
filter a document and not really test the SWISH::Filter module. Fatal errors
are written to stderr regardless of the -quiet option.

=item B<-verbose>

Enables FILTER_DEBUG mode in the SWISH::Filter module, and writes extra info,
including a summary of the filtered document, to stderr. Enable this when
trying to find out why a filter does not work.

=item B<-lines>

Number of lines of summary content to show. Summary lines are only shown if
-verbose is selected.
Lines are sent to stderr, not stdout. Note that the summary is limited to
1000 characters regardless of this option.

=item B<-content>

Specifying -content will output full content to stdout. The default is to
only display a few lines.

=item B<-(no)skip_binary>

The default is to not output content from binary files. -noskip_binary will
disable this.

=item B<-headers>

Prints the headers used by swish-e when reading documents with -S prog. This
can be used to filter documents directly to swish-e:

    swish-filter-test -headers -content http://localhost/ test.pdf | swish-e -S prog -i stdin -v1

=item B<-path>

Prints the installed location of the SWISH::Filter parent directory for use
in PERL5LIB. This allows using SWISH::Filter in other programs, or with the
Swish-e -S http method with swishspider. For example:

    PERL5LIB=`swish-filter-test -path` swish-e -S http -i http://localhost

=item B<-help>

Prints a brief help message and exits.

=item B<-man>

Prints the manual page and exits.

=back

=cut

swish-e-2.4.7/src/

swish-e-2.4.7/src/dump.h

/*
** $Id: dump.h 1736 2005-05-12 15:41:22Z karman $

    This file is part of Swish-e.

    Swish-e is free software; you can redistribute it and/or modify it
    under the terms of the GNU General Public License as published by the
    Free Software Foundation; either version 2 of the License, or
    (at your option) any later version.

    Swish-e is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
    GNU General Public License for more details.

    You should have received a copy of the GNU General Public License
    along with Swish-e; if not, write to the Free Software Foundation, Inc.,
    51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA

    See the COPYING file that accompanies the Swish-e distribution for details
    of the GNU GPL and the special exception available for linking
    against the Swish-e library.
** Mon May 9 18:19:34 CDT 2005
** added GPL
**
**
**
** 2001-01 jose initial coding
**
*/

void DB_decompress(SWISH * sw, IndexFILE * indexf, int begin, int maxhits);
void dump_file_list( SWISH *sw, IndexFILE *indexf );
void dump_memory_file_list( SWISH *sw, IndexFILE *indexf );
void dump_metanames( SWISH *sw, IndexFILE *indexf, int check_presorted );
void dump_file_properties(IndexFILE * indexf, FileRec *fi );
void dump_single_property( propEntry *prop, struct metaEntry *meta_entry );
void dump_words_per_file(SWISH *sw, IndexFILE * indexf, FileRec *fi );
void dump_word_count( SWISH *sw, IndexFILE *indexf, int begin, int maxhits ) ;

swish-e-2.4.7/src/mem.c

/*
$Id: mem.c 2290 2009-03-31 01:48:53Z karpet $
**

    This file is part of Swish-e.

    Swish-e is free software; you can redistribute it and/or modify it
    under the terms of the GNU General Public License as published by the
    Free Software Foundation; either version 2 of the License, or
    (at your option) any later version.

    Swish-e is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
    GNU General Public License for more details.

    You should have received a copy of the GNU General Public License
    along with Swish-e; if not, write to the Free Software Foundation, Inc.,
    51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA

    See the COPYING file that accompanies the Swish-e distribution for details
    of the GNU GPL and the special exception available for linking
    against the Swish-e library.

** Mon May 9 15:08:15 CDT 2005
** added GPL
**
** Author: Bill Meier, June 2001
**
** To Do:
** Memory statistics doesn't really require use of Mem_ routines
*/

#include <stdio.h>
#include <stdlib.h>
#include <string.h>   /* memset */
#include <memory.h>
#include "swish.h"
#include "error.h"
#include "swstring.h"
#include "mem.h"

/* we can use the real ones here! */
#undef malloc
#undef realloc
#undef free

/* Alignment size in bytes */
/* $$$ This needs to be fixed -- need an autoconf macro that will detect the correct value. */
/* Check the Postgres configure script for a macro that may work */
#if defined(__sparc__) || defined(__mips64)
/* __sparc__ is not exactly correct because not all sparc machines require 8, but minor difference in memory usage */
/* __mips64 is to catch IRIX 6.5 */
#define PointerAlignmentSize 8
#else
#define PointerAlignmentSize sizeof(long)
#endif

/* typical machine has pagesize 4096 (not critical anyway, just can help malloc) */
#define pageSize (1<<12)

/* simple cases first ... */

/* ecalloc - allocate a zero-filled block of nelem * size bytes */
void *ecalloc(size_t nelem, size_t size)
{
    void *p;

    p = emalloc(nelem * size);
    /* clear the whole block, not just the first element */
    memset(p, 0, nelem * size);
    return p;
}

#if !
(MEM_DEBUG | MEM_TRACE | MEM_STATISTICS) /* emalloc - malloc a block of memory */ void *emalloc(size_t size) { void *p; if ((p = malloc(size)) == NULL) progerr("Ran out of memory (could not malloc %lu more bytes)!", size); return p; } /* erealloc - realloc a block of memory */ void *erealloc(void *ptr, size_t size) { void *p; if ((p = realloc(ptr, size)) == NULL) progerr("Ran out of memory (could not reallocate %lu more bytes)!", size); return p; } /* efree - free a block of memory */ void efree(void *ptr) { free(ptr); } /* Mem_Summary - print out a memory usage summary */ void Mem_Summary(char *title, int final) { return; } #else // (MEM_DEBUG | MEM_TRACE | MEM_STATISTICS) /* ** Now we get into the debugging/trace memory allocator. I apologize for the ** conditionals in this code, but it does work :-) */ /* parameters of typical allocators (just for statistics) */ #define MIN_CHUNK 16 #define CHUNK_ROUNDOFF (sizeof(long) - 1) /* Random value for a GUARD at beginning and end of allocated memory */ #define GUARD 0x14739182 #if MEM_TRACE size_t memory_trace_counter = 0; // mem break point: helps a little to track down unfreed memory // Say if you see // Unfreed: string.c line 958: Size: 11 Counter: 402 // in gdb: watch memory_trace_counter == 402 typedef struct { char *File; int Line; // void *Ptr; // only needed for extra special debugging size_t Size; size_t Count; } TraceBlock; #endif /* MemHeader is what is before the user allocated block - for our bookkeeping */ typedef struct { #if MEM_TRACE TraceBlock *Trace; #endif #if MEM_DEBUG unsigned long Guard1; #endif size_t Size; #if MEM_DEBUG void *Start; unsigned long Guard2; #endif } MemHeader; /* MemTail is what is after the user allocated block - for our bookkeeping */ #if MEM_DEBUG typedef struct { unsigned long Guard; } MemTail; /* Define the extra amount of memory we need for our bookkeeping */ #define MEM_OVERHEAD_SIZE (sizeof(MemHeader) + sizeof(MemTail)) #else #define MEM_OVERHEAD_SIZE (sizeof(MemHeader)) 
#endif #if MEM_TRACE #define MAX_TRACE 1000000 static TraceBlock TraceData[MAX_TRACE]; static TraceBlock *Free = NULL; static int last; #endif static unsigned int MAllocCalls = 0; static unsigned int MReallocCalls = 0; static unsigned int MFreeCalls = 0; static size_t MAllocCurrentOverhead = 0; static size_t MAllocMaximumOverhead = 0; static size_t MAllocCurrent = 0; static size_t MAllocMaximum = 0; static size_t MAllocCurrentEstimated = 0; static size_t MAllocMaximumEstimated = 0; #if MEM_DEBUG #define MEM_ERROR(str) \ { \ printf("\nMemory free error! At %s line %d\n", file, line); \ printf str; \ fflush(stdout); \ } #endif #if MEM_TRACE static TraceBlock *AllocTrace(char *file, int line, void *ptr, size_t size) { TraceBlock *Block; int i; if (Free) { Block = Free; Free = NULL; } else { for (i = last; i < MAX_TRACE; i++) /** $$$ What if we run off the end? **/ if (TraceData[i].File == NULL) break; Block = &TraceData[i]; last = i + 1; } Block->File = file; Block->Line = line; // Block->Ptr = ptr; // only needed for extra special debugging Block->Size = size; Block->Count = ++memory_trace_counter; return Block; } #endif static size_t Estimated(size_t size) { if (size <= MIN_CHUNK) return MIN_CHUNK; return (size + CHUNK_ROUNDOFF) & (~CHUNK_ROUNDOFF); } /* Mem_Alloc - Allocate a chunk of memory */ void * Mem_Alloc (size_t Size, char *file, int line) /* * FUNCTIONAL DESCRIPTION: * * This routine will allocate a chunk of memory of a specified size. * * FORMAL PARAMETERS: * * Size Size in bytes of the chunk to allocate * * ROUTINE VALUE: * * Address of allocated memory * * SIDE EFFECTS: * * Fatal error and exit if can't allocate memory */ { size_t MemSize; /* Actual size of memory to alloc */ unsigned char *MemPtr; /* Pointer to allocated memory */ MemHeader *Header; /* Header of memory */ /* * Adjust size to account for the memory header and tail information we * include. 
This is so we know how much we allocated to be able to free it, * and also in support of memory debugging routines. */ MemSize = Size + MEM_OVERHEAD_SIZE; /* * Get the memory; die if we can't... */ MemPtr = (unsigned char *)malloc(MemSize); if (MemPtr == NULL) { printf("At file %s line %d:\n", file, line); progerr("Ran out of memory (could not Mem_Alloc %lu more bytes)!", MemSize); } /* * Keep a running total of the memory allocated for statistical purposes. * Save the chunk size in the first long word of the chunk, and return the * address of the chunk (following the chunk size) to the caller. */ MAllocCalls++; MAllocCurrentOverhead += MEM_OVERHEAD_SIZE; if (MAllocCurrentOverhead > MAllocMaximumOverhead) MAllocMaximumOverhead = MAllocCurrentOverhead; MAllocCurrent += Size; MAllocCurrentEstimated += Estimated(Size); if (MAllocCurrent > MAllocMaximum) MAllocMaximum = MAllocCurrent; if (MAllocCurrentEstimated > MAllocMaximumEstimated) MAllocMaximumEstimated = MAllocCurrentEstimated; Header = (MemHeader *)MemPtr; Header->Size = Size; MemPtr = MemPtr + sizeof (MemHeader); /* * Add guards, and fill the memory with a pattern */ #if MEM_DEBUG { MemTail *Tail; Header->Start = MemPtr; Header->Guard1 = GUARD; Header->Guard2 = GUARD; memset(MemPtr, 0xAA, Size); Tail = (MemTail *)(MemPtr + Size); Tail->Guard = GUARD; } #endif #if MEM_TRACE Header->Trace = AllocTrace(file, line, MemPtr, Size); #endif // printf("Alloc: %s line %d: Addr: %08X Size: %u\n", file, line, MemPtr, Size); return (MemPtr); } /* Mem_Free - Free a chunk of memory */ void Mem_Free (void *Address, char *file, int line) /* * FUNCTIONAL DESCRIPTION: * * This routine will free a chunk of previously allocated memory. * This memory must have been allocated via Mem_Alloc. 
 *
 * FORMAL PARAMETERS:
 *
 * Address Address of the memory chunk to free
 *
 * ROUTINE VALUE:
 *
 * NONE
 *
 * SIDE EFFECTS:
 *
 * Severe error is signaled if can't free the memory
 */
{
    MemHeader *Header;
    size_t MemSize; /* Size of chunk to free */
    size_t UserSize;
    void *MemPtr;

    /* ANSI allows free of NULL */
    if (!Address)
        return;

    /*
     * Get the size of the chunk to free from the long word preceding the
     * address of the chunk.
     */
    Header = (MemHeader *)Address - 1;

#if MEM_DEBUG
    {
        MemTail *Tail;

        /* An aligned address has all of its low (alignment - 1) bits clear.
           The test is parenthesized because != binds tighter than &. */
        if ( ((long)Address & (PointerAlignmentSize-1)) != 0 )
            MEM_ERROR(("Address %08X not longword aligned\n", (unsigned int)Address));

        if (Address != Header->Start)
            MEM_ERROR(("Already free: %08X\n", (unsigned int)Address));
        // Err_Signal (PWRK$_BUGMEMFREE, 1, Address);

        if (Header->Guard1 != GUARD)
            MEM_ERROR(("Head Guard 1 overwritten: %08X\n", (unsigned int)&Header->Guard1));
        // Err_Signal (PWRK$_BUGMEMGUARD1, 4, Address,
        // Header->Guard1, 4, &Header->Guard1);

        if (Header->Guard2 != GUARD)
            MEM_ERROR(("Head Guard 2 overwritten: %08X\n", (unsigned int)&Header->Guard2));
        // Err_Signal (PWRK$_BUGMEMGUARD1, 4, Address,
        // Header->Guard2, 4, &Header->Guard2);

        Tail = (MemTail *)((unsigned char *)Address + Header->Size);

        if (Tail->Guard != GUARD)
            MEM_ERROR(("Tail Guard overwritten: %08X\n", (unsigned int)&Tail->Guard));
        // Err_Signal (PWRK$_BUGMEMGUARD2, 4, Address,
        // Tail->Guard, 4, &Tail->Guard);
    }
#endif

    MFreeCalls++;

    MemPtr = (unsigned char *)Header;
    UserSize = Header->Size;
    MemSize = UserSize + MEM_OVERHEAD_SIZE;

    // printf("Free: %s line %d: Addr: %08X Size: %u\n", file, line, Address, UserSize);

#if MEM_TRACE
    Free = Header->Trace;
    // printf(" Allocated at %s line %d:\n", Free->File, Free->Line);
    Free->File = NULL;
#endif

#if MEM_DEBUG
    memset (MemPtr, 0xDD, MemSize);
#endif

    /* Subtract chunk size from running total */
    MAllocCurrent -= UserSize;
    MAllocCurrentEstimated -= Estimated(UserSize);
    MAllocCurrentOverhead -= MEM_OVERHEAD_SIZE;

    /* Free the memory */
    free(MemPtr);
}

/* Mem_Realloc -
realloc a block of memory */ void *Mem_Realloc (void *Address, size_t Size, char *file, int line) { void *MemPtr; MemHeader *Header; size_t OldSize; MReallocCalls++; MAllocCalls--; MFreeCalls--; MemPtr = Mem_Alloc(Size, file, line); if (Address) { Header = (MemHeader *)Address - 1; OldSize = Header->Size; memcpy(MemPtr, Address, (OldSize < Size ? OldSize : Size)); Mem_Free(Address, file, line); } // printf("Realloc: %s line %d: Addr: %08X Size: %u to %u\n", file, line, Address, OldSize, Size); return MemPtr; } /* Mem_Summary - Give a memory usage summary */ void Mem_Summary(char *title, int final) { #if MEM_STATISTICS printf("\nMemory usage summary: %s\n\n", title); printf("Alloc calls: %u, Realloc calls: %u, Free calls: %u\n", MAllocCalls, MReallocCalls, MFreeCalls); printf("Requested: Maximum usage: %u, Current usage: %u\n", MAllocMaximum, MAllocCurrent); printf("Estimated: Maximum usage: %u, Current usage: %u\n", MAllocMaximumEstimated, MAllocCurrentEstimated); #endif #if MEM_TRACE if (final) { int i; unsigned ttl = 0; printf("\nUnfreed memory: %s\n\n", title); for (i = 0; i < MAX_TRACE; i++) if (TraceData[i].File) { printf("Unfreed: %s line %d: Size: %d Counter: %d\n", TraceData[i].File, TraceData[i].Line, TraceData[i].Size, TraceData[i].Count); ttl += TraceData[i].Size; } printf("Total Unfreed size: %d\n", ttl ); } #endif } #endif /************************************************************************* ** ** Mem Zone routines -- efficient memory allocation if you don't need ** realloc and free... 
** */ /* round up to a long word */ #define ROUND_LONG(n) (((n) + PointerAlignmentSize - 1) & (~(PointerAlignmentSize - 1))) /* round up to a page */ #define ROUND_PAGE(n) (((n) + pageSize - 1) & (~(pageSize - 1))) typedef struct _zone { struct _zone *next; /* link to next chunk */ size_t free; /* bytes free in this chunk */ unsigned char *ptr; /* start of free space in this chunk */ void *alloc; /* ptr to malloced memory (for free) */ size_t size; /* size of allocation (for statistics) */ } ZONE; /* allocate a chunk of memory from the OS */ static ZONE *allocChunk(size_t size) { ZONE *zone; zone = emalloc(sizeof(ZONE)); zone->alloc = emalloc(size); zone->size = size; zone->ptr = zone->alloc; zone->free = size; zone->next = NULL; return zone; } /* create a memory zone */ MEM_ZONE *Mem_ZoneCreate(char *name, size_t size, int attributes) { MEM_ZONE *head; head = emalloc(sizeof(MEM_ZONE)); head->name = estrdup(name); size = ROUND_PAGE(size); if (size == 0) size = pageSize*64; head->size = size; head->attributes = attributes; head->allocs = 0; head->next = NULL; return head; } /* allocate memory from a zone (can use like malloc if you aren't going to realloc) */ void *Mem_ZoneAlloc(MEM_ZONE *head, size_t size) { ZONE *zone; ZONE *newzone; unsigned char *ptr; /* statistics */ head->allocs++; size = ROUND_LONG(size); zone = head->next; /* If not enough free in this chunk, allocate a new one. Don't worry about the small amount of unused space at the end. If we are asking for a really big chunk allocate a new buffer just for that! */ if (!zone || (zone->free < size)) { newzone = allocChunk(size > head->size ? 
size : head->size); head->next = newzone; newzone->next = zone; zone = newzone; } /* decrement free, advance pointer, and return allocation to the user */ zone->free -= size; ptr = zone->ptr; zone->ptr += size; return ptr; } void Mem_ZoneFree(MEM_ZONE **head) { ZONE *next; ZONE *tmp; if (!*head) return; #if MEM_STATISTICS Mem_ZoneStatistics(*head); #endif next = (*head)->next; while (next) { efree(next->alloc); tmp = next->next; efree(next); next = tmp; } efree((*head)->name); efree(*head); *head = NULL; } #if MEM_STATISTICS void Mem_ZoneStatistics(MEM_ZONE *head) { int chunks = 0; size_t used = 0; size_t free = 0; size_t wasted = 0; ZONE *zone; for (zone = head->next; zone; zone = zone->next) { if (zone == head->next) free = zone->free; chunks++; used += zone->size - zone->free; wasted += zone->free; } wasted -= free; printf("Zone '%s':\n Chunks:%d, Allocs:%u, Used:%u, Free:%u, Wasted:%u\n", head->name, chunks, head->allocs, used, free, wasted); } #endif /* Frees all memory chunks but preserves head */ /* 2001-17 jmruiz modified to avoid the document peak problem (one document can use a lot of memory and, in the old way, this memory was never reused) */ void Mem_ZoneReset(MEM_ZONE *head) { ZONE *next, *tmp; if (!head) return; head->allocs = 0; next = head->next; while (next) { efree(next->alloc); tmp = next->next; efree(next); next = tmp; } head->next = NULL; } /* Routine that returns the amount of memory allocated by a zone */ int Mem_ZoneSize(MEM_ZONE *head) { ZONE *next; int size = 0; if (!head) return 0; next = head->next; while (next) { size += next->size; next = next->next; } return size; } 
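
The zone routines above trade per-block free() for cheap "bump" allocation: each request is rounded up to the alignment size and carved off the newest chunk, and everything is released in one pass. The following is a minimal standalone sketch of that idea, not the Swish-e implementation. The names zone, chunk, zone_init, zone_alloc, zone_size and zone_reset are hypothetical, plain malloc stands in for emalloc (so failures return NULL instead of aborting), and sizeof(long) is assumed as the alignment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical illustration types; these are not part of Swish-e. */
typedef struct chunk {
    struct chunk  *next;  /* older chunks, kept only so they can be freed */
    unsigned char *base;  /* start of the malloc'ed buffer */
    unsigned char *ptr;   /* next free byte in the buffer */
    size_t         free;  /* bytes still unused in the buffer */
    size_t         size;  /* total size of the buffer */
} chunk;

typedef struct {
    chunk *head;          /* newest chunk; all allocations come from it */
    size_t chunk_size;    /* default size for new chunks */
} zone;

/* Assumed alignment: sizeof(long), which is a power of two. */
#define ZONE_ALIGN sizeof(long)

/* Round n up to the next multiple of ZONE_ALIGN. */
static size_t round_up(size_t n)
{
    return (n + ZONE_ALIGN - 1) & ~(ZONE_ALIGN - 1);
}

static void zone_init(zone *z, size_t chunk_size)
{
    z->head = NULL;
    z->chunk_size = chunk_size ? chunk_size : 4096;
}

/* Carve size bytes from the newest chunk, adding a chunk when needed.
   Returns NULL if the underlying malloc fails. */
static void *zone_alloc(zone *z, size_t size)
{
    void *p;

    size = round_up(size);
    if (z->head == NULL || z->head->free < size) {
        /* Oversized requests get a chunk of their own. */
        size_t want = size > z->chunk_size ? size : z->chunk_size;
        chunk *c = malloc(sizeof *c);

        if (c == NULL)
            return NULL;
        c->base = malloc(want);
        if (c->base == NULL) {
            free(c);
            return NULL;
        }
        c->ptr = c->base;
        c->free = want;
        c->size = want;
        c->next = z->head;
        z->head = c;
    }
    p = z->head->ptr;
    z->head->ptr += size;
    z->head->free -= size;
    return p;
}

/* Total bytes currently held by the zone's chunks. */
static size_t zone_size(const zone *z)
{
    size_t total = 0;
    const chunk *c;

    for (c = z->head; c != NULL; c = c->next)
        total += c->size;
    return total;
}

/* Release every chunk at once; the zone itself stays usable. */
static void zone_reset(zone *z)
{
    chunk *c = z->head;

    while (c != NULL) {
        chunk *next = c->next;
        free(c->base);
        free(c);
        c = next;
    }
    z->head = NULL;
}
```

A caller would typically run zone_init once, call zone_alloc for every short-lived object created while handling one document, and then call zone_reset before the next document. The space left at the end of a chunk when a new one is started is simply abandoned, which is the same trade-off the comment in Mem_ZoneAlloc describes.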

swish-e-2.4.7/src/proplimit.h

/*
$Id: proplimit.h 1736 2005-05-12 15:41:22Z karman $

    This file is part of Swish-e.

    Swish-e is free software; you can redistribute it and/or modify
    it under the terms of the GNU General Public License as published by
    the Free Software Foundation; either version 2 of the License, or
    (at your option) any later version.

    Swish-e is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
    GNU General Public License for more details.

    You should have received a copy of the GNU General Public License
    along with Swish-e; if not, write to the Free Software
    Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA

    See the COPYING file that accompanies the Swish-e distribution for details
    of the GNU GPL and the special exception available for linking against
    the Swish-e library.

** Mon May 9 18:19:34 CDT 2005
** added GPL
*/

#ifndef __HasSeenModule_PropLimit
#define __HasSeenModule_PropLimit 1

void SwishResetSearchLimit( SEARCH_OBJECT *srch );
int SwishSetSearchLimit(SEARCH_OBJECT *srch, char *propertyname, char *low, char *hi);

/* internal use */
void ClearLimitParams( LIMIT_PARAMS *params );
LIMIT_PARAMS *setlimit_params( SWISH *sw, LIMIT_PARAMS *params, char *propertyname, char *low, char *hi );
int Prepare_PropLookup(SEARCH_OBJECT * srch );
int LimitByProperty( IndexFILE *indexf, PROP_LIMITS *prop_limits, int filenum );

#endif

swish-e-2.4.7/src/bash.c

/*
 * Copyright (C) 1987 - 1999 Free Software Foundation, Inc.
 *
 * This file is based on stuff from GNU Bash 1.14.7, the Bourne Again SHell.
 * Everything that was changed is marked with the word `CHANGED'.
 *
 * This program is free software; you can redistribute it and/or
 * modify it under the terms of the GNU General Public License
 * as published by the Free Software Foundation; either version 2
 * of the License, or (at your option) any later version.
 *
 * This program is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 * GNU General Public License for more details.
 *
 * You should have received a copy of the GNU General Public License
 * along with this program; if not, write to the Free Software
 * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
 */

/* Mon May 9 15:57:53 CDT 2005
   did not update license info here because
   (a) the original is still ok per GPL
   (b) the original is lifted from another package and it's better to preserve
   (c) this file is only used in swish-e binary, not libswish-e
*/

#include "sys.h"
#include "mem.h"  /* swish-e memory */
#include <sys/stat.h>
#include "bash.h"

/* bash uses GID_T, autoconf defines GETGROUPS_T */
#define GID_T GETGROUPS_T

/*
 * CHANGED:
 * Perhaps these need new configure.in entries.
 * The following macros are used in bash, and below:
 */
#undef SHELL
#undef AFS
#undef NOGROUP

/*
 * CHANGED:
 * - Added prototypes,
 * - used ANSI function arguments,
 * - made all functions static and
 * - changed all occurrences of 'char *' into 'const char *' where possible.
 * - exported functions needed in which.c
 */
static int group_member (GID_T gid);
static char *extract_colon_unit (const char *string, int *p_index);

/*===========================================================================
 *
 * Almost everything below is from execute_cmd.c from bash-1.14.7,
 * a few functions are from other files: test.c, general.c and variables.c.
 *
 */

#if defined (HAVE_GETGROUPS)
/* The number of groups that this user is a member of.
*/
static int ngroups = 0;
static GID_T *group_array = (GID_T *)NULL;
static int default_group_array_size = 0;
#endif /* HAVE_GETGROUPS */

#if !defined (NOGROUP)
#  define NOGROUP (GID_T) -1
#endif

/* Return non-zero if GID is one that we have in our groups list. */
static int group_member (GID_T gid)
{
  static GID_T pgid = (GID_T)NOGROUP;
  static GID_T egid = (GID_T)NOGROUP;

  if (pgid == (GID_T)NOGROUP)
#if defined (SHELL)
    pgid = (GID_T) current_user.gid;
#else /* !SHELL */
    pgid = (GID_T) getgid ();
#endif /* !SHELL */

  if (egid == (GID_T)NOGROUP)
#if defined (SHELL)
    egid = (GID_T) current_user.egid;
#else /* !SHELL */
    egid = (GID_T) getegid ();
#endif /* !SHELL */

  if (gid == pgid || gid == egid)
    return (1);

#if defined (HAVE_GETGROUPS)
  /* getgroups () returns the number of elements that it was able to
     place into the array.  We simply continue to call getgroups ()
     until the number of elements placed into the array is smaller than
     the physical size of the array. */

  while (ngroups == default_group_array_size)
    {
      default_group_array_size += 64;
      group_array = (GID_T *) xrealloc (group_array, default_group_array_size * sizeof (GID_T));
      ngroups = getgroups (default_group_array_size, group_array);
    }

  /* In case of error, the user loses. */
  if (ngroups < 0)
    return (0);

  /* Search through the list looking for GID. */
  {
    register int i;

    for (i = 0; i < ngroups; i++)
      if (gid == group_array[i])
        return (1);
  }
#endif /* HAVE_GETGROUPS */

  return (0);
}

#define u_mode_bits(x) (((x) & 0000700) >> 6)
#define g_mode_bits(x) (((x) & 0000070) >> 3)
#define o_mode_bits(x) (((x) & 0000007) >> 0)
#define X_BIT(x) ((x) & 1)

/* Return some flags based on information about this file.
   The EXISTS bit is non-zero if the file is found.
   The EXECABLE bit is non-zero if the file is executable.
   Zero is returned if the file is not found. */
int file_status (const char *name)
{
  struct stat finfo;
  static int user_id = -1;

  /* Determine whether this file exists or not.
*/ if (stat (name, &finfo) < 0) return (0); /* If the file is a directory, then it is not "executable" in the sense of the shell. */ if (S_ISDIR (finfo.st_mode)) return (FS_EXISTS); #if defined (AFS) /* We have to use access(2) to determine access because AFS does not support Unix file system semantics. This may produce wrong answers for non-AFS files when ruid != euid. I hate AFS. */ if (access (name, X_OK) == 0) return (FS_EXISTS | FS_EXECABLE); else return (FS_EXISTS); #else /* !AFS */ /* Find out if the file is actually executable. By definition, the only other criteria is that the file has an execute bit set that we can use. */ if (user_id == -1) user_id = geteuid (); /* CHANGED: bash uses: current_user.euid; */ /* Root only requires execute permission for any of owner, group or others to be able to exec a file. */ if (user_id == 0) { int bits; bits = (u_mode_bits (finfo.st_mode) | g_mode_bits (finfo.st_mode) | o_mode_bits (finfo.st_mode)); if (X_BIT (bits)) return (FS_EXISTS | FS_EXECABLE); } /* If we are the owner of the file, the owner execute bit applies. */ if (user_id == finfo.st_uid ) return X_BIT (u_mode_bits (finfo.st_mode)) ? (FS_EXISTS | FS_EXECABLE) : FS_EXISTS; /* If we are in the owning group, the group permissions apply. */ if (group_member (finfo.st_gid) ) return X_BIT (g_mode_bits (finfo.st_mode)) ? (FS_EXISTS | FS_EXECABLE) : FS_EXISTS; /* If `others' have execute permission to the file, then so do we, since we are also `others'. */ return X_BIT (o_mode_bits (finfo.st_mode)) ? (FS_EXISTS | FS_EXECABLE) : FS_EXISTS; #endif /* !AFS */ } /* Return 1 if STRING is an absolute program name; it is absolute if it contains any slashes. This is used to decide whether or not to look up through $PATH. */ int absolute_program (const char *string) { return ((char *)strchr (string, '/') != (char *)NULL); } /* Given a string containing units of information separated by colons, return the next one pointed to by (P_INDEX), or NULL if there are no more. 
Advance (P_INDEX) to the character after the colon. */ char * extract_colon_unit (const char *string, int *p_index) { int i, start; char path_separator; #if defined( PATH_SEPARATOR ) path_separator = PATH_SEPARATOR[0]; #else path_separator = ':'; #endif i = *p_index; if (!string || (i >= (int)strlen (string))) return ((char *)NULL); /* Each call to this routine leaves the index pointing at a colon if there is more to the path. If I is > 0, then increment past the `:'. If I is 0, then the path has a leading colon. Trailing colons are handled OK by the `else' part of the if statement; an empty string is returned in that case. */ if (i && string[i] == path_separator ) i++; start = i; while (string[i] && string[i] != path_separator ) i++; *p_index = i; if (i == start) { if (string[i]) (*p_index)++; /* Return "" in the case of a trailing `:'. */ return (savestring ("")); } else { char *value; value = xmalloc (1 + i - start); strncpy (value, string + start, i - start); value [i - start] = '\0'; return (value); } } /* Return the next element from PATH_LIST, a colon separated list of paths. PATH_INDEX_POINTER is the address of an index into PATH_LIST; the index is modified by this function. Return the next element of PATH_LIST or NULL if there are no more. */ char * get_next_path_element (const char *path_list, int *path_index_pointer) { char *path; path = extract_colon_unit (path_list, path_index_pointer); if (!path) return (path); if (!*path) { xfree (path); path = savestring ("."); } return (path); } /* Turn PATH, a directory, and NAME, a filename, into a full pathname. This allocates new memory and returns it. 
*/
char * make_full_pathname (const char *path, const char *name, int name_len)
{
  char *full_path;
  int path_len;

  path_len = strlen (path);
  full_path = (char *) xmalloc (2 + path_len + name_len);
  strcpy (full_path, path);
  full_path[path_len] = '/';
  strcpy (full_path + path_len + 1, name);
  return (full_path);
}

swish-e-2.4.7/src/date_time.c

/*
$Id: date_time.c 1736 2005-05-12 15:41:22Z karman $

    This file is part of Swish-e.

    Swish-e is free software; you can redistribute it and/or modify
    it under the terms of the GNU General Public License as published by
    the Free Software Foundation; either version 2 of the License, or
    (at your option) any later version.

    Swish-e is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
    GNU General Public License for more details.

    You should have received a copy of the GNU General Public License
    along with Swish-e; if not, write to the Free Software
    Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA

    See the COPYING file that accompanies the Swish-e distribution for details
    of the GNU GPL and the special exception available for linking against
    the Swish-e library.

** Mon May 9 15:51:39 CDT 2005
** added GPL
**
** 2001-03-20 rasc own module for this routine (from swish.c)
**
*/

#include <time.h>
#include "swish.h"
#include "mem.h"
#include "date_time.h"
#include "getruntime.c"

/*
  -- TimeHiRes returns a ClockTick value (double)
  -- in seconds.fractions
*/

#ifdef HAVE_BSDGETTIMEOFDAY
#define gettimeofday BSDgettimeofday
#endif

#ifdef NO_GETTOD

double TimeElapsed(void)
{
#ifdef HAVE_SYS_TIMEB_H
#include <sys/timeb.h>
    struct timeb ftimebuf;

    ftime(&ftimebuf);
    return (double)ftimebuf.time + (double)ftimebuf.millitm/1000.0;
#else
    return ((double) clock()) / CLOCKS_PER_SEC;
#endif
}

#else

#include <sys/time.h>

double TimeElapsed(void)
{
    struct timeval t;
    int i;

    i = gettimeofday( &t, NULL );
    if ( i )
        return 0;

    return (double)( t.tv_sec + t.tv_usec / 1000000.0 );
}
#endif

/* return CPU time used */
double TimeCPU(void)
{
    return (double) get_cpu_secs();
}

/*
  Returns the nicely formatted date.
  Returns ISO like char
*/
char *getTheDateISO()
{
    char *date;
    time_t now;

    date=emalloc(MAXSTRLEN);
    now = time(NULL);

    /* 2/22/00 - switched to 4-digit year (%Y vs.
%y) */
    strftime(date, MAXSTRLEN, "%Y-%m-%d %H:%M:%S %Z", (struct tm *) localtime(&now));

    return date;
}

swish-e-2.4.7/src/db_write.c

/*
$Id: db_write.c 1945 2007-10-22 14:54:07Z karpet $

    This file is part of Swish-e.

    Swish-e is free software; you can redistribute it and/or modify
    it under the terms of the GNU General Public License as published by
    the Free Software Foundation; either version 2 of the License, or
    (at your option) any later version.

    Swish-e is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
    GNU General Public License for more details.

    You should have received a copy of the GNU General Public License
    along with Swish-e; if not, write to the Free Software
    Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA

    See the COPYING file that accompanies the Swish-e distribution for details
    of the GNU GPL and the special exception available for linking against
    the Swish-e library.

** Mon May 9 15:51:39 CDT 2005
** added GPL
**
** 2001-05-07 jmruiz init coding
**
*/

#include "swish.h"
#include "mem.h"
#include "swstring.h"
#include "index.h"
#include "hash.h"
#include "date_time.h"
#include "compress.h"
#include "error.h"
#include "metanames.h"
#include "db.h"
#include "db_native.h"
// #include "db_berkeley_db.h"

/* Fully parenthesized so the macro is safe inside larger expressions */
#ifndef min
#define min(a, b) (((a) < (b)) ? (a) : (b))
#endif

/* General write DB routines - Common to all DB */

static int write_hash_words_to_header(SWISH *sw, int header_ID, struct swline **hash, void *DB);

/* Header routines */

#define write_header_int(sw,id,num,DB) {unsigned long itmp = (num); itmp = PACKLONG(itmp); DB_WriteHeaderData((sw),(id), (unsigned char *)&itmp, sizeof(long), (DB));}
#define write_header_int2(sw,id,num1,num2,DB)\
{ \
    unsigned long itmp[2]; \
    itmp[0] = (num1); \
    itmp[1] = (num2); \
    itmp[0] = PACKLONG(itmp[0]); \
    itmp[1] = PACKLONG(itmp[1]); \
    DB_WriteHeaderData((sw),(id), (unsigned char *)itmp, sizeof(long) * 2, (DB)); \
}
#define write_header_int4(sw,id,num1,num2,num3,num4,DB)\
{ \
    unsigned long itmp[4]; \
    itmp[0] = (num1); \
    itmp[1] = (num2); \
    itmp[2] = (num3); \
    itmp[3] = (num4); \
    itmp[0] = PACKLONG(itmp[0]); \
    itmp[1] = PACKLONG(itmp[1]); \
    itmp[2] = PACKLONG(itmp[2]); \
    itmp[3] = PACKLONG(itmp[3]); \
    DB_WriteHeaderData((sw),(id), (unsigned char *)itmp, sizeof(long) * 4, (DB)); \
}


void write_header(SWISH *sw, int merged_flag )
{
    IndexFILE *indexf = sw->indexlist; /* first element in the list */
    INDEXDATAHEADER *header = &indexf->header;
    char *filename = indexf->line;
    void *DB = indexf->DB;
    char *c;
    char *tmp;

    /* $$$ this isn't portable */
    c = (char *) strrchr(filename, '/');
    if (!c || (c && !*(c + 1)))
        c = filename;
    else
        c += 1;

    DB_InitWriteHeader(sw, DB);

    DB_WriteHeaderData(sw, INDEXHEADER_ID, (unsigned char *)INDEXHEADER, strlen(INDEXHEADER) +1, DB);
    DB_WriteHeaderData(sw, INDEXVERSION_ID, (unsigned char *)INDEXVERSION, strlen(INDEXVERSION) + 1, DB);
    write_header_int(sw, MERGED_ID, merged_flag, DB);
DB_WriteHeaderData(sw, NAMEHEADER_ID, (unsigned char *)header->indexn, strlen(header->indexn) + 1, DB); DB_WriteHeaderData(sw, SAVEDASHEADER_ID, (unsigned char *)c, strlen(c) + 1, DB); write_header_int4(sw, COUNTSHEADER_ID, header->totalwords, header->totalfiles, header->total_word_positions + indexf->total_word_positions_cur_run, /* Total this run's total words (not unique) with any previous */ header->removedfiles, DB); tmp = getTheDateISO(); DB_WriteHeaderData(sw, INDEXEDONHEADER_ID, (unsigned char *)tmp, strlen(tmp) + 1,DB); efree(tmp); DB_WriteHeaderData(sw, DESCRIPTIONHEADER_ID, (unsigned char *)header->indexd, strlen(header->indexd) + 1, DB); DB_WriteHeaderData(sw, POINTERHEADER_ID, (unsigned char *)header->indexp, strlen(header->indexp) + 1, DB); DB_WriteHeaderData(sw, MAINTAINEDBYHEADER_ID, (unsigned char *)header->indexa, strlen(header->indexa) + 1,DB); write_header_int(sw, DOCPROPENHEADER_ID, 1, DB); write_header_int(sw, FUZZYMODEHEADER_ID, (int)fuzzy_mode_value(header->fuzzy_data), DB); write_header_int(sw, IGNORETOTALWORDCOUNTWHENRANKING_ID, header->ignoreTotalWordCountWhenRanking, DB); DB_WriteHeaderData(sw, WORDCHARSHEADER_ID, (unsigned char *)header->wordchars, strlen(header->wordchars) + 1, DB); write_header_int(sw, MINWORDLIMHEADER_ID, header->minwordlimit, DB); write_header_int(sw, MAXWORDLIMHEADER_ID, header->maxwordlimit, DB); DB_WriteHeaderData(sw, BEGINCHARSHEADER_ID, (unsigned char *)header->beginchars, strlen(header->beginchars) + 1, DB); DB_WriteHeaderData(sw, ENDCHARSHEADER_ID, (unsigned char *)header->endchars, strlen(header->endchars) + 1, DB); DB_WriteHeaderData(sw, IGNOREFIRSTCHARHEADER_ID, (unsigned char *)header->ignorefirstchar, strlen(header->ignorefirstchar) + 1, DB); DB_WriteHeaderData(sw, IGNORELASTCHARHEADER_ID, (unsigned char *)header->ignorelastchar, strlen(header->ignorelastchar) + 1,DB); /* Removed - Patents write_header_int(FILEINFOCOMPRESSION_ID, header->applyFileInfoCompression, DB); */ /* Jose Ruiz 06/00 Added this 
line to delimite the header */ write_integer_table_to_header(sw, TRANSLATECHARTABLE_ID, header->translatecharslookuptable, sizeof(header->translatecharslookuptable) / sizeof(int), DB); /* Other header stuff */ /* StopWords */ write_hash_words_to_header(sw, STOPWORDS_ID, header->hashstoplist.hash_array, DB); /* Metanames */ write_MetaNames(sw, METANAMES_ID, header, DB); /* BuzzWords */ write_hash_words_to_header(sw, BUZZWORDS_ID, header->hashbuzzwordlist.hash_array, DB); #ifndef USE_BTREE /* Write the total words per file array, if used */ if ( !header->ignoreTotalWordCountWhenRanking ) write_integer_table_to_header(sw, TOTALWORDSPERFILE_ID, header->TotalWordsPerFile, header->totalfiles, DB); #endif write_header_int(sw, TOTALWORDS_REMOVED_ID, header->removed_word_positions, DB); DB_EndWriteHeader(sw, DB); } /* Jose Ruiz 11/00 ** Function to write a word to the index DB */ void write_word(SWISH * sw, ENTRY * ep, IndexFILE * indexf) { sw_off_t wordID; wordID = DB_GetWordID(sw, indexf->DB); DB_WriteWord(sw, ep->word,wordID,indexf->DB); /* Store word offset for futher hash computing */ ep->u1.wordID = wordID; } #ifdef USE_BTREE /* 04/2002 jmruiz ** Routine to update wordID */ void update_wordID(SWISH * sw, ENTRY * ep, IndexFILE * indexf) { sw_off_t wordID; wordID = DB_GetWordID(sw, indexf->DB); DB_UpdateWordID(sw, ep->word,wordID,indexf->DB); /* Store word offset for futher hash computing */ ep->u1.wordID = wordID; } void delete_worddata(SWISH * sw, sw_off_t wordID, IndexFILE * indexf) { DB_DeleteWordData(sw,wordID,indexf->DB); } #endif /* Jose Ruiz 11/00 ** Function to write all word's data to the index DB */ void build_worddata(SWISH * sw, ENTRY * ep) { int curmetaID, sz_worddata; unsigned long tmp, curmetanamepos; int metaID; int chunk_size; unsigned char *compressed_data, *p,*q; LOCATION *l, *next; curmetaID=0; curmetanamepos=0L; q=sw->Index->worddata_buffer; /* Write tfrequency */ q = compress3(ep->tfrequency,q); /* Write location list */ 
for(l=ep->allLocationList;l;) { compressed_data = (unsigned char *) l; /* Get next element */ next = *(LOCATION **)compressed_data; /* Jump pointer to next element */ p = compressed_data + sizeof(LOCATION *); metaID = uncompress2(&p); memcpy((char *)&chunk_size,(char *)p,sizeof(chunk_size)); p += sizeof(chunk_size); if(curmetaID!=metaID) { if(curmetaID) { /* Write in previous meta (curmetaID) ** file offset to next meta */ tmp=q - sw->Index->worddata_buffer; PACKLONG2(tmp,sw->Index->worddata_buffer+curmetanamepos); } /* Check for enough memory */ /* ** MAXINTCOMPSIZE is for the worst case metaID ** ** sizeof(long) is to leave four bytes to ** store the offset of the next metaname ** (it will be 0 if no more metanames). ** ** 1 is for the trailing '\0' */ tmp=q - sw->Index->worddata_buffer; if((long)(tmp + MAXINTCOMPSIZE + sizeof(long) + 1) >= (long)sw->Index->len_worddata_buffer) { sw->Index->len_worddata_buffer=sw->Index->len_worddata_buffer*2+MAXINTCOMPSIZE+sizeof(long)+1; sw->Index->worddata_buffer=(unsigned char *) erealloc(sw->Index->worddata_buffer,sw->Index->len_worddata_buffer); q=sw->Index->worddata_buffer+tmp; /* reasign pointer inside buffer */ } /* store metaID in buffer */ curmetaID=metaID; q = compress3(curmetaID,q); /* preserve position for offset to next ** metaname. 
We do not know its size ** so store it as a packed long */ curmetanamepos=q - sw->Index->worddata_buffer; /* Store 0 and increase pointer */ tmp=0L; PACKLONG2(tmp,q); q+=sizeof(unsigned long); } /* Store all data for this chunk */ /* First check for enough space ** ** 1 is for the trailing '\0' */ tmp=q - sw->Index->worddata_buffer; if((long)(tmp + chunk_size + 1) >= (long)sw->Index->len_worddata_buffer) { sw->Index->len_worddata_buffer=sw->Index->len_worddata_buffer*2+chunk_size+1; sw->Index->worddata_buffer=(unsigned char *) erealloc(sw->Index->worddata_buffer,sw->Index->len_worddata_buffer); q=sw->Index->worddata_buffer+tmp; /* reasign pointer inside buffer */ } /* Copy it and advance pointer */ memcpy(q,p,chunk_size); q += chunk_size; /* End of chunk mark -> Write trailing '\0' */ *q++ = '\0'; l = next; } /* Write in previous meta (curmetaID) ** file offset to end of metas */ tmp=q - sw->Index->worddata_buffer; PACKLONG2(tmp,sw->Index->worddata_buffer+curmetanamepos); sz_worddata = q - sw->Index->worddata_buffer; /* Adjust word positions. ** if ignorelimit was set and some new stopwords weee found, positions ** are recalculated ** Also call it even if we have not set IgnoreLimit to calesce word chunks ** and remove trailing 0 from chunks to save some bytes */ adjustWordPositions(sw->Index->worddata_buffer, &sz_worddata, sw->indexlist->header.totalfiles, sw->Index->IgnoreLimitPositionsArray); sw->Index->sz_worddata_buffer = sz_worddata; } /* 04/2002 jmruiz ** New simpler routine to write worddata ** ** 10/2002 jmruiz ** Add extra compression for worddata. 
Call to remove_worddata_longs */ void write_worddata(SWISH * sw, ENTRY * ep, IndexFILE * indexf ) { int zlib_size; /* Get some extra compression */ remove_worddata_longs(sw->Index->worddata_buffer,&sw->Index->sz_worddata_buffer); if(sw->compressPositions) zlib_size = compress_worddata(sw->Index->worddata_buffer, sw->Index->sz_worddata_buffer,sw->Index->swap_locdata); else zlib_size = sw->Index->sz_worddata_buffer; /* Write worddata */ DB_WriteWordData(sw, ep->u1.wordID,sw->Index->worddata_buffer,zlib_size, sw->Index->sz_worddata_buffer - zlib_size ,indexf->DB); } /* 04/2002 jmruiz ** Routine to merge two buffers of worddata */ void add_worddata(SWISH *sw, unsigned char *olddata, int sz_olddata) { int maxtotsize; unsigned char stack_buffer[32000]; /* Just to try malloc/free fragmentation */ unsigned char *newdata; int sz_newdata; int tfreq1, tfreq2; unsigned char *p1, *p2, *p; int curmetaID_1,curmetaID_2,metadata_length_1,num_metaids1; unsigned long nextposmetaname_1,nextposmetaname_2, curmetanamepos, curmetanamepos_1, curmetanamepos_2, tmp; int last_filenum, filenum, tmpval, frequency; unsigned int *posdata; #define POSDATA_STACK 2000 unsigned int stack_posdata[POSDATA_STACK]; /* Just to avoid the overhead of malloc/free */ unsigned char r_flag, *w_flag; unsigned char *q; /* First of all, ckeck for size in buffer */ /* Olddata is extra compressed. longs offsets where stored as compressed ** numbers to save space. 
So, we need to compute how many meta_ID ** are presents to calculate a safe size for olddata with packedlongs */ p1=olddata; num_metaids1=0; uncompress2(&p1); /* Jump tfreq */ do { num_metaids1++; uncompress2(&p1); /* Jump metaid */ metadata_length_1 = uncompress2(&p1); p1 += metadata_length_1; } while ((p1 - olddata) != sz_olddata); maxtotsize = sw->Index->sz_worddata_buffer + (sz_olddata + num_metaids1 * sizeof(long)); if(maxtotsize > sw->Index->len_worddata_buffer) { sw->Index->len_worddata_buffer = maxtotsize + 2000; sw->Index->worddata_buffer = (unsigned char *) erealloc(sw->Index->worddata_buffer,sw->Index->len_worddata_buffer); } /* Preserve new data in a local copy - sw->Index->worddata_buffer is the final destination ** of data */ if(sw->Index->sz_worddata_buffer > (int)sizeof(stack_buffer)) newdata = (unsigned char *) emalloc(sw->Index->sz_worddata_buffer); else newdata = stack_buffer; sz_newdata = sw->Index->sz_worddata_buffer; memcpy(newdata,sw->Index->worddata_buffer, sz_newdata); /* Set pointers to all buffers */ p1 = olddata; p2 = newdata; q = p = sw->Index->worddata_buffer; /* Now read tfrequency */ tfreq1 = uncompress2(&p1); /* tfrequency - number of files with this word */ tfreq2 = uncompress2(&p2); /* tfrequency - number of files with this word */ /* Write tfrequency */ p = compress3(tfreq1 + tfreq2, p); /* Now look for MetaIDs */ curmetaID_1 = uncompress2(&p1); curmetaID_2 = uncompress2(&p2); /* Old data is compressed in a different more optimized schema */ metadata_length_1 = uncompress2(&p1); nextposmetaname_1 = p1 - olddata + metadata_length_1; curmetanamepos_1 = p1 - olddata; nextposmetaname_2 = UNPACKLONG2(p2); p2 += sizeof(long); curmetanamepos_2 = p2 - newdata; while(curmetaID_1 && curmetaID_2) { p = compress3(min(curmetaID_1,curmetaID_2),p); curmetanamepos = p - sw->Index->worddata_buffer; /* Store 0 and increase pointer */ tmp=0L; PACKLONG2(tmp,p); p+=sizeof(unsigned long); if(curmetaID_1 == curmetaID_2) { /* Both buffers have the same 
metaID - In this case I have to know the number of the filenum of the last hit of the original buffer to adjust the filenum counter in the second buffer */ last_filenum = 0; do { /* Read on all items */ uncompress_location_values(&p1,&r_flag,&tmpval,&frequency); last_filenum += tmpval; if(frequency > POSDATA_STACK) posdata = (unsigned int *) emalloc(frequency * sizeof(int)); else posdata = stack_posdata; /* Read and discard positions just to advance pointer */ uncompress_location_positions(&p1,r_flag,frequency,posdata); if(posdata!=stack_posdata) efree(posdata); if ((p1 - olddata) == sz_olddata) { curmetaID_1 = 0; /* No more metaIDs for olddata */ break; /* End of olddata */ } if ((unsigned long)(p1 - olddata) == nextposmetaname_1) { break; } } while(1); memcpy(p,olddata + curmetanamepos_1, p1 - (olddata + curmetanamepos_1)); p += p1 - (olddata + curmetanamepos_1); /* Values for next metaID if exists */ if(curmetaID_1) { curmetaID_1 = uncompress2(&p1); /* Next metaID */ metadata_length_1 = uncompress2(&p1); nextposmetaname_1 = p1 - olddata + metadata_length_1; curmetanamepos_1 = p1 - olddata; } /* Now add the new values adjusting with last_filenum just the first ** filenum in olddata*/ /* Read first item */ uncompress_location_values(&p2,&r_flag,&tmpval,&frequency); filenum = tmpval; /* First filenum in chunk */ if(frequency > POSDATA_STACK) posdata = (unsigned int *) emalloc(frequency * sizeof(int)); else posdata = stack_posdata; /* Read positions */ uncompress_location_positions(&p2,r_flag,frequency,posdata); compress_location_values(&p,&w_flag,filenum - last_filenum,frequency,posdata); compress_location_positions(&p,w_flag,frequency,posdata); if(posdata!=stack_posdata) efree(posdata); /* Copy rest of data */ memcpy(p,p2,nextposmetaname_2 - (p2 - newdata)); p += nextposmetaname_2 - (p2 - newdata); p2 += nextposmetaname_2 - (p2 - newdata); if ((p2 - newdata) == sz_newdata) { curmetaID_2 = 0; /* No more metaIDs for newdata */ } /* Values for next metaID if exists 
*/ if(curmetaID_2) { curmetaID_2 = uncompress2(&p2); /* Next metaID */ nextposmetaname_2 = UNPACKLONG2(p2); p2 += sizeof(long); curmetanamepos_2 = p2 - newdata; } } else if (curmetaID_1 < curmetaID_2) { memcpy(p,p1,nextposmetaname_1 - (p1 - olddata)); p += nextposmetaname_1 - (p1 - olddata); p1 = olddata + nextposmetaname_1; if ((p1 - olddata) == sz_olddata) { curmetaID_1 = 0; /* No more metaIDs for newdata */ } else { curmetaID_1 = uncompress2(&p1); /* Next metaID */ metadata_length_1 = uncompress2(&p1); nextposmetaname_1 = p1 - olddata + metadata_length_1; curmetanamepos_1 = p1 - olddata; } } else /* curmetaID_1 > curmetaID_2 */ { memcpy(p,p2,nextposmetaname_2 - (p2 - newdata)); p += nextposmetaname_2 - (p2 - newdata); p2 = newdata + nextposmetaname_2; if ((p2 - newdata) == sz_newdata) { curmetaID_2 = 0; /* No more metaIDs for newdata */ } else { curmetaID_2 = uncompress2(&p2); /* Next metaID */ nextposmetaname_2 = UNPACKLONG2(p2); p2 += sizeof(long); curmetanamepos_2 = p2 - newdata; } } /* Put nextmetaname offset */ PACKLONG2(p - sw->Index->worddata_buffer, sw->Index->worddata_buffer + curmetanamepos); } /* while */ /* Add the rest of the data if exists */ while(curmetaID_1) { p = compress3(curmetaID_1,p); curmetanamepos = p - sw->Index->worddata_buffer; /* Store 0 and increase pointer */ tmp=0L; PACKLONG2(tmp,p); p += sizeof(unsigned long); memcpy(p,p1,nextposmetaname_1 - (p1 - olddata)); p += nextposmetaname_1 - (p1 - olddata); p1 = olddata + nextposmetaname_1; if ((p1 - olddata) == sz_olddata) { curmetaID_1 = 0; /* No more metaIDs for olddata */ } else { curmetaID_1 = uncompress2(&p1); /* Next metaID */ metadata_length_1 = uncompress2(&p1); nextposmetaname_1 = p1 - olddata + metadata_length_1; curmetanamepos_1 = p1 - olddata; } PACKLONG2(p - sw->Index->worddata_buffer, sw->Index->worddata_buffer + curmetanamepos); } while(curmetaID_2) { p = compress3(curmetaID_2,p); curmetanamepos = p - sw->Index->worddata_buffer; /* Store 0 and increase pointer */ tmp=0L; 
        PACKLONG2(tmp,p);
        p += sizeof(unsigned long);
        memcpy(p,p2,nextposmetaname_2 - (p2 - newdata));
        p += nextposmetaname_2 - (p2 - newdata);
        p2 = newdata + nextposmetaname_2;
        if ((p2 - newdata) == sz_newdata)
        {
            curmetaID_2 = 0;   /* No more metaIDs for newdata */
        }
        else
        {
            curmetaID_2 = uncompress2(&p2);  /* Next metaID */
            nextposmetaname_2 = UNPACKLONG2(p2);
            p2 += sizeof(long);
            curmetanamepos_2 = p2 - newdata;
        }
        PACKLONG2(p - sw->Index->worddata_buffer, sw->Index->worddata_buffer + curmetanamepos);
    }

    if(newdata != stack_buffer)
        efree(newdata);

    /* Save the new size */
    sw->Index->sz_worddata_buffer = p - sw->Index->worddata_buffer;
}

/* Writes the list of metaNames into the DB index
 *  (should maybe be in metanames.c)
 */
void write_MetaNames(SWISH *sw, int id, INDEXDATAHEADER * header, void *DB)
{
    struct metaEntry *entry = NULL;
    int     i, sz_buffer, len;
    unsigned char *buffer,*s;
    int     fields;

    /* Use new metaType schema - see metanames.h */
    /* Format of each metaname entry is
     *   <len><metaName><metaID><metaType><alias><sort_len><rank_bias>
     *   len, metaID, metaType, alias, sort_len, and rank_bias are compressed numbers
     *   metaName is the ascii name of the metaname
     *
     * The list of metanames is preceded by the number of metanames
     */

    /* One slot per compressed number written for each entry in the loop below */
    fields = 6;  /* len, metaID, metaType, alias, sort_len, rank_bias */

    /* Compute buffer size */
    for (sz_buffer = 0 , i = 0; i < header->metaCounter; i++)
    {
        entry = header->metaEntryArray[i];
        len = strlen(entry->metaName);
        sz_buffer += len + fields * MAXINTCOMPSIZE; /* compress can use MAXINTCOMPSIZE bytes in the worst case */
    }

    sz_buffer += MAXINTCOMPSIZE;  /* Add extra MAXINTCOMPSIZE for the number of metanames */

    s = buffer = (unsigned char *) emalloc(sz_buffer);

    s = compress3(header->metaCounter,s);   /* store the number of metanames */

    for (i = 0; i < header->metaCounter; i++)
    {
        entry = header->metaEntryArray[i];
        len = strlen(entry->metaName);
        s = compress3(len, s);
        memcpy(s,entry->metaName,len);
        s += len;
        s = compress3(entry->metaID, s);
        s = compress3(entry->metaType, s);
        s = compress3(entry->alias+1, s);  /* keep zeros away from
compress3, I believe */
        s = compress3(entry->sort_len, s);
        s = compress3(entry->rank_bias+RANK_BIAS_RANGE+1, s);
    }
    DB_WriteHeaderData(sw, id,buffer,s-buffer,DB);
    efree(buffer);
}

/* Write the hash list of words into the index header file
 * (used by stopwords and buzzwords)
 */
static int write_hash_words_to_header(SWISH *sw, int header_ID, struct swline **hash, void *DB)
{
    int     hashval, len, num_words, sz_buffer;
    char   *buffer, *s;
    struct swline *sp = NULL;

    /* Let's count the words */
    if ( !hash )
        return 0;

    for (sz_buffer = 0, num_words = 0 , hashval = 0; hashval < HASHSIZE; hashval++)
    {
        sp = hash[hashval];
        while (sp != NULL)
        {
            num_words++;
            sz_buffer += MAXINTCOMPSIZE + strlen(sp->line);
            sp = sp->next;
        }
    }

    if(num_words)
    {
        sz_buffer += MAXINTCOMPSIZE;  /* Add MAXINTCOMPSIZE for the number of words */
        s = buffer = (char *)emalloc(sz_buffer);
        s = (char *)compress3(num_words, (unsigned char *)s);
        for (hashval = 0; hashval < HASHSIZE; hashval++)
        {
            sp = hash[hashval];
            while (sp != NULL)
            {
                len = strlen(sp->line);
                s = (char *)compress3(len,(unsigned char *)s);
                memcpy(s,sp->line,len);
                s +=len;
                sp = sp->next;
            }
        }
        DB_WriteHeaderData(sw, header_ID, (unsigned char *)buffer, s - buffer, DB);
        efree(buffer);
    }
    return 0;
}

int write_integer_table_to_header(SWISH *sw, int id, int table[], int table_size, void *DB)
{
    int     i, tmp;
    char   *s;
    char   *buffer;

    s = buffer = (char *) emalloc((table_size + 1) * MAXINTCOMPSIZE);
    s = (char *)compress3(table_size,(unsigned char *)s);   /* Put the number of elements */
    for (i = 0; i < table_size; i++)
    {
        tmp = table[i] + 1;
        s = (char *)compress3(tmp, (unsigned char *)s); /* Put all the elements */
    }
    DB_WriteHeaderData(sw, id, (unsigned char *)buffer, s-buffer, DB);
    efree(buffer);
    return 0;
}

void setTotalWordsPerFile(IndexFILE *indexf, int idx,int wordcount)
{
#ifdef USE_BTREE
    DB_WriteTotalWordsPerFile(indexf->sw, idx, wordcount, indexf->DB);
#else
    INDEXDATAHEADER *header = &indexf->header;

    if ( !header->TotalWordsPerFile || idx >=
header->TotalWordsPerFileMax )
    {
        header->TotalWordsPerFileMax += 20000;  /* random guess -- could be a config setting */
        if(! header->TotalWordsPerFile)
            header->TotalWordsPerFile = emalloc( header->TotalWordsPerFileMax * sizeof(int) );
        else
            header->TotalWordsPerFile = erealloc( header->TotalWordsPerFile, header->TotalWordsPerFileMax * sizeof(int) );
    }

    header->TotalWordsPerFile[idx] = wordcount;
#endif
}

/*------------------------------------------------------*/
/*---------- General entry point of DB module ----------*/

void *DB_Create (SWISH *sw, char *dbname)
{
    return sw->Db->DB_Create(sw, dbname);
}

void DB_Remove(SWISH *sw, void *DB)
{
    sw->Db->DB_Remove(DB);
}

int DB_InitWriteHeader(SWISH *sw, void *DB)
{
    return sw->Db->DB_InitWriteHeader(DB);
}

int DB_WriteHeaderData(SWISH *sw, int id, unsigned char *s, int len, void *DB)
{
    return sw->Db->DB_WriteHeaderData(id, s,len,DB);
}

int DB_EndWriteHeader(SWISH *sw, void *DB)
{
    return sw->Db->DB_EndWriteHeader(DB);
}

int DB_InitWriteWords(SWISH *sw, void *DB)
{
    return sw->Db->DB_InitWriteWords(DB);
}

sw_off_t DB_GetWordID(SWISH *sw, void *DB)
{
    return sw->Db->DB_GetWordID(DB);
}

int DB_WriteWord(SWISH *sw, char *word, sw_off_t wordID, void *DB)
{
    return sw->Db->DB_WriteWord(word, wordID, DB);
}

#ifdef USE_BTREE
int DB_UpdateWordID(SWISH *sw, char *word, sw_off_t wordID, void *DB)
{
    return sw->Db->DB_UpdateWordID(word, wordID, DB);
}

int DB_DeleteWordData(SWISH *sw, sw_off_t wordID, void *DB)
{
    return sw->Db->DB_DeleteWordData(wordID, DB);
}
#endif

int DB_WriteWordHash(SWISH *sw, char *word, sw_off_t wordID, void *DB)
{
    return sw->Db->DB_WriteWordHash(word, wordID, DB);
}

long DB_WriteWordData(SWISH *sw, sw_off_t wordID, unsigned char *worddata, int data_size, int saved_bytes, void *DB)
{
    return sw->Db->DB_WriteWordData(wordID, worddata, data_size, saved_bytes, DB);
}

int DB_EndWriteWords(SWISH *sw, void *DB)
{
    return sw->Db->DB_EndWriteWords(DB);
}

int DB_InitWriteProperties(SWISH *sw, void *DB)
{
    return
sw->Db->DB_InitWriteProperties(DB);
}

int DB_WriteFileNum(SWISH *sw, int filenum, unsigned char *filedata,int sz_filedata, void *DB)
{
    return sw->Db->DB_WriteFileNum(filenum, filedata, sz_filedata, DB);
}

int DB_RemoveFileNum(SWISH *sw, int filenum, void *DB)
{
    return sw->Db->DB_RemoveFileNum(filenum, DB);
}

#ifdef USE_PRESORT_ARRAY
int DB_InitWriteSortedIndex(SWISH *sw, void *DB, int n_props)
{
    return sw->Db->DB_InitWriteSortedIndex(DB, n_props);
}

int DB_WriteSortedIndex(SWISH *sw, int propID, int *data, int sz_data,void *DB)
{
    return sw->Db->DB_WriteSortedIndex(propID, data, sz_data,DB);
}
#else
int DB_InitWriteSortedIndex(SWISH *sw, void *DB)
{
    return sw->Db->DB_InitWriteSortedIndex(DB);
}

int DB_WriteSortedIndex(SWISH *sw, int propID, unsigned char *data, int sz_data,void *DB)
{
    return sw->Db->DB_WriteSortedIndex(propID, data, sz_data,DB);
}
#endif

int DB_EndWriteSortedIndex(SWISH *sw, void *DB)
{
    return sw->Db->DB_EndWriteSortedIndex(DB);
}

void DB_WriteProperty( SWISH *sw, IndexFILE *indexf, FileRec *fi, int propID, char *buffer, int buf_len, int uncompressed_len, void *db)
{
    sw->Db->DB_WriteProperty( indexf, fi, propID, buffer, buf_len, uncompressed_len, db );
}

void DB_WritePropPositions(SWISH *sw, IndexFILE *indexf, FileRec *fi, void *db)
{
    sw->Db->DB_WritePropPositions( indexf, fi, db);
}

void DB_Reopen_PropertiesForRead(SWISH *sw, void *DB )
{
    sw->Db->DB_Reopen_PropertiesForRead(DB);
}

#ifdef USE_BTREE
int DB_WriteTotalWordsPerFile(SWISH *sw, int idx, int wordcount, void *DB)
{
    return sw->Db->DB_WriteTotalWordsPerFile(sw, idx, wordcount, DB);
}
#endif
swish-e-2.4.7/src/html.c0000664000077100017500000007052511166010110011723 00000000000000/*
$Id: html.c 1736 2005-05-12 15:41:22Z karman $
** Copyright (C) 1995, 1996, 1997, 1998 Hewlett-Packard Company
** Originally by Kevin Hughes, kev@kevcom.com, 3/11/94
**

    This file is part of Swish-e.

    Swish-e is free software; you can redistribute it and/or modify
    it under the terms of the GNU General Public License as published by
    the Free Software Foundation; either version 2 of the License, or
    (at your option) any later version.

    Swish-e is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
    GNU General Public License for more details.

    You should have received a copy of the GNU General Public License
    along with Swish-e; if not, write to the Free Software
    Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA

    See the COPYING file that accompanies the Swish-e distribution for details
    of the GNU GPL and the special exception available for linking against
    the Swish-e library.
** Mon May 9 15:51:39 CDT 2005
** added GPL

**---------------------------------------------------------
**
**
** PATCHED 5/13/96, CJC
** Added MatchAndChange for regex in replace rule G.Hill 2/10/98
**
** change sprintf to snprintf to avoid corruption
** added safestrcpy() macro to avoid corruption from strcpy overflow
** SRE 11/17/99
**
** fixed cast to int problems pointed out by "gcc -Wall"
** SRE 2/22/00
**
** 2001-03-17  rasc  save real_filename as title (instead full real_path)
**                   was: compatibility issue to v 1.x.x
**
** 2001-05-09  rasc  entities completely rewritten (new module)
**                   small fix in parseHTMLsummary
**
**
*/

#include "swish.h"
#include "mem.h"
#include "swstring.h"
#include "index.h"
#include "compress.h"
#include "merge.h"
#include "search.h"
#include "docprop.h"
#include "metanames.h"
#include "html.h"
#include "entities.h"
#include "fs.h"
#include "error.h"

/* #### */

static char *parsetag(SWISH *sw, char *parsetag, char *buffer, int max_lines, int case_sensitive);

static struct metaEntry *getHTMLMeta(IndexFILE * indexf, char *tag, SWISH *sw, char *name, char **parsed_tag, char *filename)
{
    char   *temp;
    int     lenword = 0;
    char   *word = NULL;
    char    buffer[MAXSTRLEN + 1];
    int     i;
    struct metaEntry *e = NULL;

    word = buffer;
    lenword = sizeof(buffer) - 1;

    if (!name)
    {
        if (!(temp = (char *) lstrstr((char *) tag, (char *) "NAME")))
            return NULL;
    }
    else
        temp = name;

    temp += 4;                  /* strlen("NAME") */

    /* Get to the '=' sign disregarding any other char */
    while (*temp)
    {
        if (*temp && (*temp != '='))    /* TAB */
            temp++;
        else
        {
            temp++;
            break;
        }
    }

    /* Get to the beginning of the word disregarding blanks and quotes */
    /* TAB */
    while (*temp)
    {
        if (*temp == ' ' || *temp == '"')
            temp++;
        else
            break;
    }

    /* Copy the word and convert to lowercase */
    /* TAB */
    /* while (temp !=NULL && strncmp(temp," ",1) */
    /*      && strncmp(temp,"\"",1) && i<= MAXWORDLEN ) { */

    /* and the above <= was wrong, should be < which caused the
       null insertion below to be off by two bytes */

    for (i = 0; temp
!= NULL && *temp && *temp != ' ' && *temp != '"';) { if (i == lenword) { lenword *= 2; if(word == buffer) { word = (char *) emalloc(lenword + 1); memcpy(word,buffer,sizeof(buffer)); } else word = (char *) erealloc(word, lenword + 1); } word[i] = *temp++; i++; } if (i == lenword) { lenword *= 2; word = (char *) erealloc(word, lenword + 1); } word[i] = '\0'; /* Use Rainer's routine */ strtolower(word); *parsed_tag = word; if ((e = getMetaNameByName(&indexf->header, word))) return e; if ( (sw->UndefinedMetaTags == UNDEF_META_AUTO) && word && *word) { if (sw->verbose) printf("Adding automatic MetaName '%s' found in file '%s'\n", word, filename); return addMetaEntry(&indexf->header, word, META_INDEX, 0); } /* If it is ok not to have the name listed, just index as no-name */ if (sw->UndefinedMetaTags == UNDEF_META_ERROR) progerr("UndefinedMetaNames=error. Found meta name '%s' in file '%s', not listed as a MetaNames in config", word, filename); if(word != buffer) efree(word); return NULL; } /* Parses the Meta tag */ static int parseMetaData(SWISH * sw, IndexFILE * indexf, char *tag, int filenum, int structure, char *name, char *content, FileRec *thisFileEntry, int *position, char *filename) { int metaName; struct metaEntry *metaNameEntry; char *temp, *start, *convtag; int wordcount = 0; /* Word count */ char *parsed_tag; /* Lookup (or add if "auto") meta name for tag */ metaNameEntry = getHTMLMeta(indexf, tag, sw, name, &parsed_tag, filename); metaName = metaNameEntry ? 
metaNameEntry->metaID : 1;

    temp = content + 7;         /* 7 is strlen("CONTENT") */

    /* Get to the " sign disregarding other characters */
    if ((temp = strchr(temp, '\"')))
    {
        structure |= IN_META;

        start = temp + 1;

        /* Jump escaped \" */
        temp = strchr(start, '\"');
        while (temp)
        {
            if (*(temp - 1) == '\\')
                temp = strchr(temp + 1, '\"');
            else
                break;
        }

        if (temp)
            *temp = '\0';       /* terminate CONTENT, temporarily */

        /* Convert entities, if requested, and remove newlines */
        convtag = (char *)sw_ConvHTMLEntities2ISO(sw, (unsigned char *)start);
        remove_newlines(convtag);  /** why isn't this just done for the entire doc? */

        /* Index only if a metaEntry was found, or if undefined meta tags are not being ignored */
        if ( sw->UndefinedMetaTags != UNDEF_META_IGNORE || metaNameEntry)
        {
            /* Meta tags get bumped */
            /* I'm not clear this works as well as I'd like because it always bumps on a new Meta tag,
             * but in order to disable this behavior the name MUST be a meta name.
             * Probably better to let getHTMLMeta() return the name as a string.
             */
            if (!metaNameEntry || !isDontBumpMetaName(sw->dontbumpstarttagslist, metaNameEntry->metaName))
                position[0]++;

            wordcount = indexstring(sw, convtag, filenum, structure, 1, &metaName, position);

            if (!metaNameEntry || !isDontBumpMetaName(sw->dontbumpendtagslist, metaNameEntry->metaName))
                position[0]++;
        }

        /* If it is a property store it */
        if ((metaNameEntry = getPropNameByName(&indexf->header, parsed_tag)))
            if (!addDocProperty(&thisFileEntry->docProperties, metaNameEntry, (unsigned char*)convtag, strlen(convtag), 0))
                progwarn("property '%s' not added for document '%s'\n", metaNameEntry->metaName, filename);

        if (temp)
            *temp = '\"';       /* restore string */
    }

    return wordcount;
}

/* Extracts anything in <title> tags from an HTML file and returns it.
** Otherwise, an empty string is returned.
*/
char *parseHTMLtitle(SWISH *sw, char *buffer)
{
    char   *title;
    char   *empty_title;

    empty_title = (char *)Mem_ZoneAlloc(sw->Index->perDocTmpZone,1);
    *empty_title = '\0';

    if (!buffer)
        return empty_title;

    if ((title = parsetag(sw, "title", buffer, TITLETOPLINES, CASE_SENSITIVE_OFF)))
        return title;

    return empty_title;
}

/* Check if a particular title (read: file!) should be ignored
** according to the settings in the configuration file.
*/
/* This is to check "title contains" option in config file */
int isoktitle(SWISH * sw, char *title)
{
    struct MOD_FS *fs = sw->FS;

    return !match_regex_list(title, fs->filerules.title, "FileRules title");
}

/* This returns the value corresponding to the HTML structures
** a word is in.
*/
static int getstructure(char *tag, int structure)
{
    /* int len; *//* not used - 2/22/00 */
    char    oldChar = 0;
    char   *endOfTag = NULL;
    char   *pos;

    pos = tag;
    while (*pos)
    {
        if (isspace((int) ((unsigned char) *pos)))
        {
            endOfTag = pos;     /* remember where we are... */
            oldChar = *pos;     /* ...and what we saw */
            *pos = '\0';        /* truncate string, for now */
        }
        else
            pos++;
    }

    /* Store Word Context
    ** Modified DLN 1999-10-24 - Comments and Cleaning
    **  TODO: Make sure that these allow for HTML attributes
    * */

    /* HEAD */
    if (strcasecmp(tag, "/head") == 0)
        structure &= ~IN_HEAD;      /* Out */
    else if (strcasecmp(tag, "head") == 0)
        structure |= IN_HEAD;       /* In */
    /* TITLE */
    else if (strcasecmp(tag, "/title") == 0)
        structure &= ~IN_TITLE;
    else if (strcasecmp(tag, "title") == 0)
        structure |= IN_TITLE;
    /* BODY */
    else if (strcasecmp(tag, "/body") == 0)
        structure &= ~IN_BODY;      /* Out */
    else if (strcasecmp(tag, "body") == 0)
        structure |= IN_BODY;       /* In */
    /* H1, H2, H3, H4, H5, H6 */
    else if (tag[0] == '/' && tolower((int)((unsigned char)tag[1])) == 'h' && isdigit((int)((unsigned char)tag[2]))) /* cast to int - 2/22/00 */
        structure &= ~IN_HEADER;    /* Out */
    else if (tolower((int)((unsigned char)tag[0])) == 'h' && isdigit((int)(unsigned char)tag[1])) /* cast to int - 2/22/00 */
        structure |= IN_HEADER;     /* In */
    /* EM, STRONG */
    else if ((strcasecmp(tag, "/em") == 0) || (strcasecmp(tag, "/strong") == 0))
        structure &= ~IN_EMPHASIZED;    /* Out */
    else if ((strcasecmp(tag, "em") == 0) || (strcasecmp(tag, "strong") == 0))
        structure |= IN_EMPHASIZED;     /* In */
    /* B, I are separate for semantics */
    else if ((strcasecmp(tag, "/b") == 0) || (strcasecmp(tag, "/i") == 0))
        structure &= ~IN_EMPHASIZED;    /* Out */
    else if ((strcasecmp(tag, "b") == 0) || (strcasecmp(tag, "i") == 0))
        structure |= IN_EMPHASIZED;     /* In */

    /* The End */
    if (endOfTag != NULL)
    {
        *endOfTag = oldChar;
    }

    return structure;
}

/* Get the MetaData index when the whole tag is passed */

/* Patch by Tom Brown */
/* TAB, this routine is/was somewhat pathetic... but it was pathetic in
 1.2.4 too ... someone needed a course in defensive programming... there are
 lots of tests below for temp != NULL, but what is desired is *temp != '\0'
 (e.g. simply *temp) ... I'm going to remove some strncmp(temp,constant,1)
 which are much faster as *temp != constant ...

 Anyhow, the test case I've got that's core dumping is:
 <META content=3D"MSHTML 5.00.2614.3401" name=3DGENERATOR>
 no trailing quote, no trailing space... and with the missing/broken check for
 end of string it scribbles over the stack...
*/

static char *parseHtmlSummary(char *buffer, char *field, int size, SWISH * sw)
{
    char   *p, *q, *tag, *endtag, c = '\0';
    char   *summary, *beginsum, *endsum, *tmp, *tmp2, *tmp3;
    int     found, lensummary;

    /* Get the summary if no metaname/field is given */
    if (!field && size)
    {
        /* Jump title if it exists */
        if ((p = lstrstr(buffer, "</title>")))
        {
            p += 8;             /* strlen("</title>") */
        }
        else
            p = buffer;

        /* Let us try to find <body> */
        if ((q = lstrstr(p, "<body")))
        {
            q = strchr(q, '>');
        }
        else
            q = p;

        summary = (char *) Mem_ZoneAlloc(sw->Index->perDocTmpZone,strlen(p)+1);
        strcpy(summary,p);
        remove_newlines(summary);

        /* $$$$ Todo: remove tag and content of scripts, css, java, embeddedobjects, comments, etc */
        remove_tags(summary);

        summary = (char *)sw_ConvHTMLEntities2ISO(sw, (unsigned char *)summary);

        /* use only the required memory -save those not used */
        /* 2001-03-13 rasc   copy only <size> bytes of string */
        if((int) strlen(summary) > size)
            summary[size]='\0';
        return summary;
    }

    for (p = buffer, summary = NULL, found = 0, beginsum = NULL, endsum = NULL; p && *p;)
    {
        if ((tag = strchr(p, '<')) && ((tag == p) || (*(tag - 1) != '\\')))
        {
            /* Look for non escaped '<' */
            tag++;
            for (endtag = tag;;)
                if ((endtag = strchr(endtag, '>')))
                {
                    if (*(endtag - 1) != '\\')
                        break;
                    else
                        endtag++;
                }
                else
                    break;

            if (endtag)
            {
                c = *endtag;
                *endtag++ = '\0';
                if ((tag[0] == '!') && lstrstr(tag, "META") && (lstrstr(tag, "START") || lstrstr(tag, "END")))
                {
                    /* Check for META TAG TYPE 1 */
                    if (lstrstr(tag, "START"))
                    {
                        if ((tmp = lstrstr(tag, "NAME")))
                        {
                            tmp += 4;
                            if (lstrstr(tmp, field))
                            {
                                beginsum = endtag;
                                found = 1;
                            }
                            p = endtag;
                        }
                        else
                            p = endtag;
                    }
                    else if (lstrstr(tag, "END"))
                    {
                        if (!found)
                        {
                            p = endtag;
                        }
                        else
                        {
                            endsum = tag - 1;
                            *(endtag - 1) = c;
                            break;
                        }
                    }
                }
                /* Check for META TAG TYPE 2 */
                else if ((tag[0] != '!') && lstrstr(tag, "META") && (tmp = lstrstr(tag, "NAME")) && (tmp2 = lstrstr(tag, "CONTENT")))
                {
                    tmp += 4;
                    tmp3 = lstrstr(tmp, field);
                    if (tmp3 && tmp3 < tmp2)
                    {
                        tmp2 += 7;
                        if ((tmp = strchr(tmp2, '=')))
                        {
                            for (++tmp; isspace((int) ((unsigned char) *tmp));
tmp++);
                            if (*tmp == '\"')
                            {
                                beginsum = tmp + 1;
                                for (tmp = endtag - 1; tmp > beginsum; tmp--)
                                    if (*tmp == '\"')
                                        break;
                                if (tmp == beginsum)
                                    endsum = endtag - 1;
                                else
                                    endsum = tmp;
                            }
                            else
                            {
                                beginsum = tmp;
                                endsum = endtag - 1;
                            }
                            found = 1;
                            *(endtag - 1) = c;
                            break;
                        }
                    }
                    p = endtag;
                }
                /* Default: Continue */
                else
                {
                    p = endtag;
                }
            }
            else
                p = NULL;       /* tag not closed ->END */

            if (endtag)
                *(endtag - 1) = c;
        }
        else
        {
            /* No more '<' */
            p = NULL;
        }
    }

    if (found && beginsum && endsum && endsum > beginsum)
    {
        lensummary = endsum - beginsum;
        summary = (char *)Mem_ZoneAlloc(sw->Index->perDocTmpZone, lensummary + 1);
        memcpy(summary, beginsum, lensummary);
        summary[lensummary] = '\0';
    }

    /* If field is set and no metaname is found, let us search */
    /* for something like <field>bla bla</field> */
    if (!summary && field)
    {
        summary = parsetag(sw, field, buffer, 0, CASE_SENSITIVE_OFF);
    }

    /* Finally check for something after title (if exists) and */
    /* after <body> (if exists) */
    if (!summary)
    {
        /* Jump title if it exists */
        if ((p = lstrstr(buffer, "</title>")))
        {
            p += 8;             /* strlen("</title>") */
        }
        else
            p = buffer;

        /* Let us try to find <body> */
        if ((q = lstrstr(p, "<body")))
        {
            q = strchr(q, '>');
        }
        else
            q = p;

        summary = (char *)Mem_ZoneAlloc(sw->Index->perDocTmpZone,strlen(q) + 1);
        strcpy(summary,q);
    }

    if (summary)
    {
        remove_newlines(summary);
        remove_tags(summary);
        summary = (char *)sw_ConvHTMLEntities2ISO(sw, (unsigned char *)summary);
    }

    if (summary && size && ((int) strlen(summary)) > size)
        summary[size] = '\0';

    return summary;
}

#define NO_TAG 0
#define TAG_CLOSE 1
#define TAG_FOUND 2

/* Gets the content between "<parsetag>" and "</parsetag>" from buffer
limiting the scan to the first max_lines lines (0 means all lines) */
static char *parsetag(SWISH *sw, char *parsetag, char *buffer, int max_lines, int case_sensitive)
{
    register int c, d;
    register char *p, *r;
    char   *tag;
    int     lencontent;
    char   *content;
    int     i, j, lines, status, tagbuflen, totaltaglen, curlencontent;
    char   *begintag;
    char   *endtag;
    char   *newbuf;
    char   *(*f_strstr) ();

    if (case_sensitive)
        f_strstr = strstr;
    else
        f_strstr = lstrstr;
    lencontent = strlen(parsetag);
    begintag = (char *)Mem_ZoneAlloc(sw->Index->perDocTmpZone, lencontent + 3);
    endtag = (char *)Mem_ZoneAlloc(sw->Index->perDocTmpZone, lencontent + 4);
    sprintf(begintag, "<%s>", parsetag);
    sprintf(endtag, "</%s>", parsetag);

    tag = (char *) Mem_ZoneAlloc(sw->Index->perDocTmpZone, 1);
    tag[0] = '\0';

    content = (char *) Mem_ZoneAlloc(sw->Index->perDocTmpZone, (lencontent = MAXSTRLEN) + 1);
    lines = 0;
    status = NO_TAG;
    p = content;
    *p = '\0';

    for (r = buffer;;)
    {
        c = *r++;
        if (c == '\n')
        {
            lines++;
            if (max_lines && lines == max_lines)
                break;
        }
        if (!c)
            return NULL;

        switch (c)
        {
        case '<':
            tag = (char *) Mem_ZoneAlloc(sw->Index->perDocTmpZone, (tagbuflen = MAXSTRLEN) + 1);
            totaltaglen = 0;
            tag[totaltaglen++] = '<';

            /* Collect until find '>' */
            while (1)
            {
                d = *r++;
                if (!d)
                    return NULL;
                if (totaltaglen == tagbuflen)
                {
                    newbuf = (char *) Mem_ZoneAlloc(sw->Index->perDocTmpZone, tagbuflen + 200 + 1);
                    memcpy(newbuf,tag,tagbuflen + 1);
                    tag = newbuf;
                    tagbuflen += 200;
                }
                tag[totaltaglen++] = d;
                if (d == '>')
                {
                    tag[totaltaglen] = '\0';
                    break;
                }
            }

            if (f_strstr(tag, endtag))
            {
                status = TAG_CLOSE;
                *p = '\0';

                /* newlines to spaces */
                for (i = 0; content[i]; i++)
                    if (content[i] == '\n')
                        content[i] = ' ';

                /* skip over initial spaces and quotes */
                for (i = 0; isspace((int) ((unsigned char) content[i])) || content[i] == '\"'; i++)
                    ;

                /* shift buffer to left */
                for (j = 0; content[i]; j++)
                    content[j] = content[i++];
                content[j] = '\0';

                /* remove trailing spaces, nulls, quotes */
                for (j = strlen(content) - 1; ( j >= 0 ) && ( isspace((int) ((unsigned char) content[j])) || content[j] == '\0' || content[j] == '\"'); j--)
                    content[j] = '\0';

                /* replace double quotes with single quotes -- why?
*/ for (j = 0; content[j]; j++) if (content[j] == '\"') content[j] = '\''; if (*content) return (content); else return NULL; } else if (f_strstr(tag, begintag)) { status = TAG_FOUND; } break; default: if (status == TAG_FOUND) { curlencontent = p - content; if (curlencontent == lencontent) { newbuf = Mem_ZoneAlloc(sw->Index->perDocTmpZone,(lencontent * 2) + 1); memcpy(newbuf,content,lencontent + 1); lencontent *= 2; content = newbuf; p = content + curlencontent; } *p = c; p++; } } } return NULL; } /* Parses the words in a comment. */ int parsecomment(SWISH * sw, char *tag, int filenum, int structure, int metaID, int *position) { structure |= IN_COMMENTS; return indexstring(sw, tag + 1, filenum, structure, 1, &metaID, position); } /* Indexes all the words in a html file and adds the appropriate information ** to the appropriate structures. */ /* Indexes all the words in a html file and adds the appropriate information ** to the appropriate structures. */ int countwords_HTML(SWISH *sw, FileProp *fprop, FileRec *fi, char *buffer) { int ftotalwords; int *metaID; int metaIDlen; int position; /* Position of word in file */ int currentmetanames; char *p, *newp, *tag, *endtag; int structure; FileRec *thisFileEntry = fi; struct metaEntry *metaNameEntry; IndexFILE *indexf = sw->indexlist; struct MOD_Index *idx = sw->Index; char *Content = NULL, *Name = NULL, *summary = NULL; char *title = (char *)sw_ConvHTMLEntities2ISO(sw, (unsigned char *)parseHTMLtitle(sw, buffer)); if (!isoktitle(sw, title)) return -2; if (fprop->stordesc) summary = parseHtmlSummary(buffer, fprop->stordesc->field, fprop->stordesc->size, sw); addCommonProperties( sw, fprop, fi, title, summary, 0 ); /* Init meta info */ metaID = (int *) Mem_ZoneAlloc(sw->Index->perDocTmpZone,(metaIDlen = 16) * sizeof(int)); currentmetanames = 0; ftotalwords = 0; structure = IN_FILE; metaID[0] = 1; position = 1; for (p = buffer; p && *p;) { /* Look for non escaped '<' */ if ((tag = strchr(p, '<')) && ((tag == p) || (*(tag - 
1) != '\\'))) { /* Index up to the tag */ *tag++ = '\0'; newp = (char *)sw_ConvHTMLEntities2ISO(sw, (unsigned char *)p); if ( ! currentmetanames ) currentmetanames++; ftotalwords += indexstring(sw, newp, idx->filenum, structure, currentmetanames, metaID, &position); /* Now let us look for a not escaped '>' */ for (endtag = tag;;) if ((endtag = strchr(endtag, '>'))) { if (*(endtag - 1) != '\\') break; else endtag++; } else break; if (endtag) { *endtag++ = '\0'; if ((tag[0] == '!') && lstrstr(tag, "META") && (lstrstr(tag, "START") || lstrstr(tag, "END"))) { /* Check for META TAG TYPE 1 */ structure |= IN_META; if (lstrstr(tag, "START")) { char *parsed_tag; if ( (metaNameEntry = getHTMLMeta(indexf, tag, sw, NULL, &parsed_tag, fprop->real_path))) { /* realloc memory if needed */ if (currentmetanames == metaIDlen) { int *newbuf = (int *)Mem_ZoneAlloc(sw->Index->perDocTmpZone, metaIDlen * 2 * sizeof(int)); memcpy((char *)newbuf,(char *)metaID,metaIDlen * sizeof(int)); metaID = newbuf; metaIDlen *= 2; } /* add metaname to array of current metanames */ metaID[currentmetanames] = metaNameEntry->metaID; /* Bump position for all metanames unless metaname in dontbumppositionOnmetatags */ if (!isDontBumpMetaName(sw->dontbumpstarttagslist, metaNameEntry->metaName)) position++; currentmetanames++; p = endtag; /* If it is also a property store it until a < is found */ if ((metaNameEntry = getPropNameByName(&indexf->header, parsed_tag))) { if ((endtag = strchr(p, '<'))) *endtag = '\0'; p = (char *)sw_ConvHTMLEntities2ISO(sw, (unsigned char *)p); remove_newlines(p); /** why isn't this just done for the entire doc? 
*/ if (!addDocProperty(&thisFileEntry->docProperties, metaNameEntry, (unsigned char *)p, strlen(p), 0)) progwarn("property '%s' not added for document '%s'\n", metaNameEntry->metaName, fprop->real_path); if (endtag) *endtag = '<'; continue; } } } else if (lstrstr(tag, "END")) { /* this will close the last metaname */ if (currentmetanames) { currentmetanames--; if (!currentmetanames) metaID[0] = 1; } } p = endtag; } /* Check for META TAG TYPE 2 */ else if ((tag[0] != '!') && lstrstr(tag, "META") && (Name = lstrstr(tag, "NAME")) && (Content = lstrstr(tag, "CONTENT"))) { ftotalwords += parseMetaData(sw, indexf, tag, idx->filenum, structure, Name, Content, thisFileEntry, &position, fprop->real_path); p = endtag; } /* Check for COMMENT */ else if ((tag[0] == '!') && sw->indexComments) { ftotalwords += parsecomment(sw, tag, idx->filenum, structure, 1, &position); p = endtag; } /* Default: Continue */ else { structure = getstructure(tag, structure); p = endtag; } } else p = tag; /* tag not closed: continue */ } else { /* No more '<' */ newp = (char *)sw_ConvHTMLEntities2ISO(sw, (unsigned char *)p); if ( ! currentmetanames ) currentmetanames++; ftotalwords += indexstring(sw, newp, idx->filenum, structure, currentmetanames, metaID, &position); p = NULL; } } return ftotalwords; } swish-e-2.4.7/src/vms/0000777000077100017500000000000011166013167011510 500000000000000swish-e-2.4.7/src/vms/regex.h0000664000077100017500000004441611166010104012706 00000000000000/* Definitions for data structures and routines for the regular expression library, version 0.12. Copyright (C) 1985, 1989, 1990, 1991, 1992, 1993 Free Software Foundation, Inc. This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2, or (at your option) any later version. 
   This program is distributed in the hope that it will be useful,
   but WITHOUT ANY WARRANTY; without even the implied warranty of
   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
   GNU General Public License for more details.

   You should have received a copy of the GNU General Public License
   along with this program; if not, write to the Free Software
   Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.  */

#ifndef __REGEXP_LIBRARY_H__
#define __REGEXP_LIBRARY_H__

/* POSIX says that <sys/types.h> must be included (by the caller) before
   <regex.h>.  */

#ifdef VMS
/* VMS doesn't have `size_t' in <sys/types.h>, even though POSIX says it
   should be there.  */
#include <stddef.h>
#endif

/* The following bits are used to determine the regexp syntax we
   recognize.  The set/not-set meanings are chosen so that Emacs syntax
   remains the value 0.  The bits are given in alphabetical order, and
   the definitions shifted by one from the previous bit; thus, when we
   add or remove a bit, only one other definition need change.  */
typedef unsigned reg_syntax_t;

/* If this bit is not set, then \ inside a bracket expression is literal.
   If set, then such a \ quotes the following character.  */
#define RE_BACKSLASH_ESCAPE_IN_LISTS (1)

/* If this bit is not set, then + and ? are operators, and \+ and \? are
     literals.
   If set, then \+ and \? are operators and + and ? are literals.  */
#define RE_BK_PLUS_QM (RE_BACKSLASH_ESCAPE_IN_LISTS << 1)

/* If this bit is set, then character classes are supported.  They are:
     [:alpha:], [:upper:], [:lower:],  [:digit:], [:alnum:], [:xdigit:],
     [:space:], [:print:], [:punct:], [:graph:], and [:cntrl:].
   If not set, then character classes are not supported.  */
#define RE_CHAR_CLASSES (RE_BK_PLUS_QM << 1)

/* If this bit is set, then ^ and $ are always anchors (outside bracket
     expressions, of course).
   If this bit is not set, then it depends:
        ^  is an anchor if it is at the beginning of a regular
           expression or after an open-group or an alternation operator;
        $  is an anchor if it is at the end of a regular expression, or
           before a close-group or an alternation operator.

   This bit could be (re)combined with RE_CONTEXT_INDEP_OPS, because
   POSIX draft 11.2 says that * etc. in leading positions is undefined.
   We already implemented a previous draft which made those constructs
   invalid, though, so we haven't changed the code back.  */
#define RE_CONTEXT_INDEP_ANCHORS (RE_CHAR_CLASSES << 1)

/* If this bit is set, then special characters are always special
     regardless of where they are in the pattern.
   If this bit is not set, then special characters are special only in
     some contexts; otherwise they are ordinary.  Specifically,
     * + ? and intervals are only special when not after the beginning,
     open-group, or alternation operator.  */
#define RE_CONTEXT_INDEP_OPS (RE_CONTEXT_INDEP_ANCHORS << 1)

/* If this bit is set, then *, +, ?, and { cannot be first in an re or
     immediately after an alternation or begin-group operator.  */
#define RE_CONTEXT_INVALID_OPS (RE_CONTEXT_INDEP_OPS << 1)

/* If this bit is set, then . matches newline.
   If not set, then it doesn't.  */
#define RE_DOT_NEWLINE (RE_CONTEXT_INVALID_OPS << 1)

/* If this bit is set, then . doesn't match NUL.
   If not set, then it does.  */
#define RE_DOT_NOT_NULL (RE_DOT_NEWLINE << 1)

/* If this bit is set, nonmatching lists [^...] do not match newline.
   If not set, they do.  */
#define RE_HAT_LISTS_NOT_NEWLINE (RE_DOT_NOT_NULL << 1)

/* If this bit is set, either \{...\} or {...} defines an
     interval, depending on RE_NO_BK_BRACES.
   If not set, \{, \}, {, and } are literals.  */
#define RE_INTERVALS (RE_HAT_LISTS_NOT_NEWLINE << 1)

/* If this bit is set, +, ? and | aren't recognized as operators.
   If not set, they are.  */
#define RE_LIMITED_OPS (RE_INTERVALS << 1)

/* If this bit is set, newline is an alternation operator.
If not set, newline is literal. */ #define RE_NEWLINE_ALT (RE_LIMITED_OPS << 1) /* If this bit is set, then `{...}' defines an interval, and \{ and \} are literals. If not set, then `\{...\}' defines an interval. */ #define RE_NO_BK_BRACES (RE_NEWLINE_ALT << 1) /* If this bit is set, (...) defines a group, and \( and \) are literals. If not set, \(...\) defines a group, and ( and ) are literals. */ #define RE_NO_BK_PARENS (RE_NO_BK_BRACES << 1) /* If this bit is set, then \<digit> matches <digit>. If not set, then \<digit> is a back-reference. */ #define RE_NO_BK_REFS (RE_NO_BK_PARENS << 1) /* If this bit is set, then | is an alternation operator, and \| is literal. If not set, then \| is an alternation operator, and | is literal. */ #define RE_NO_BK_VBAR (RE_NO_BK_REFS << 1) /* If this bit is set, then an ending range point collating higher than the starting range point, as in [z-a], is invalid. If not set, then when ending range point collates higher than the starting range point, the range is ignored. */ #define RE_NO_EMPTY_RANGES (RE_NO_BK_VBAR << 1) /* If this bit is set, then an unmatched ) is ordinary. If not set, then an unmatched ) is invalid. */ #define RE_UNMATCHED_RIGHT_PAREN_ORD (RE_NO_EMPTY_RANGES << 1) /* This global variable defines the particular regexp syntax to use (for some interfaces). When a regexp is compiled, the syntax used is stored in the pattern buffer, so changing this does not affect already-compiled regexps. */ extern reg_syntax_t re_syntax_options; /* Define combinations of the above bits for the standard possibilities. (The [[[ comments delimit what gets put into the Texinfo file, so don't delete them!)
*/ /* [[[begin syntaxes]]] */ #define RE_SYNTAX_EMACS 0 #define RE_SYNTAX_AWK \ (RE_BACKSLASH_ESCAPE_IN_LISTS | RE_DOT_NOT_NULL \ | RE_NO_BK_PARENS | RE_NO_BK_REFS \ | RE_NO_BK_VBAR | RE_NO_EMPTY_RANGES \ | RE_UNMATCHED_RIGHT_PAREN_ORD) #define RE_SYNTAX_POSIX_AWK \ (RE_SYNTAX_POSIX_EXTENDED | RE_BACKSLASH_ESCAPE_IN_LISTS) #define RE_SYNTAX_GREP \ (RE_BK_PLUS_QM | RE_CHAR_CLASSES \ | RE_HAT_LISTS_NOT_NEWLINE | RE_INTERVALS \ | RE_NEWLINE_ALT) #define RE_SYNTAX_EGREP \ (RE_CHAR_CLASSES | RE_CONTEXT_INDEP_ANCHORS \ | RE_CONTEXT_INDEP_OPS | RE_HAT_LISTS_NOT_NEWLINE \ | RE_NEWLINE_ALT | RE_NO_BK_PARENS \ | RE_NO_BK_VBAR) #define RE_SYNTAX_POSIX_EGREP \ (RE_SYNTAX_EGREP | RE_INTERVALS | RE_NO_BK_BRACES) /* P1003.2/D11.2, section 4.20.7.1, lines 5078ff. */ #define RE_SYNTAX_ED RE_SYNTAX_POSIX_BASIC #define RE_SYNTAX_SED RE_SYNTAX_POSIX_BASIC /* Syntax bits common to both basic and extended POSIX regex syntax. */ #define _RE_SYNTAX_POSIX_COMMON \ (RE_CHAR_CLASSES | RE_DOT_NEWLINE | RE_DOT_NOT_NULL \ | RE_INTERVALS | RE_NO_EMPTY_RANGES) #define RE_SYNTAX_POSIX_BASIC \ (_RE_SYNTAX_POSIX_COMMON | RE_BK_PLUS_QM) /* Differs from ..._POSIX_BASIC only in that RE_BK_PLUS_QM becomes RE_LIMITED_OPS, i.e., \? \+ \| are not recognized. Actually, this isn't minimal, since other operators, such as \`, aren't disabled. */ #define RE_SYNTAX_POSIX_MINIMAL_BASIC \ (_RE_SYNTAX_POSIX_COMMON | RE_LIMITED_OPS) #define RE_SYNTAX_POSIX_EXTENDED \ (_RE_SYNTAX_POSIX_COMMON | RE_CONTEXT_INDEP_ANCHORS \ | RE_CONTEXT_INDEP_OPS | RE_NO_BK_BRACES \ | RE_NO_BK_PARENS | RE_NO_BK_VBAR \ | RE_UNMATCHED_RIGHT_PAREN_ORD) /* Differs from ..._POSIX_EXTENDED in that RE_CONTEXT_INVALID_OPS replaces RE_CONTEXT_INDEP_OPS and RE_NO_BK_REFS is added. 
*/ #define RE_SYNTAX_POSIX_MINIMAL_EXTENDED \ (_RE_SYNTAX_POSIX_COMMON | RE_CONTEXT_INDEP_ANCHORS \ | RE_CONTEXT_INVALID_OPS | RE_NO_BK_BRACES \ | RE_NO_BK_PARENS | RE_NO_BK_REFS \ | RE_NO_BK_VBAR | RE_UNMATCHED_RIGHT_PAREN_ORD) /* [[[end syntaxes]]] */ /* Maximum number of duplicates an interval can allow. Some systems (erroneously) define this in other header files, but we want our value, so remove any previous define. */ #ifdef RE_DUP_MAX #undef RE_DUP_MAX #endif #define RE_DUP_MAX ((1 << 15) - 1) /* POSIX `cflags' bits (i.e., information for `regcomp'). */ /* If this bit is set, then use extended regular expression syntax. If not set, then use basic regular expression syntax. */ #define REG_EXTENDED 1 /* If this bit is set, then ignore case when matching. If not set, then case is significant. */ #define REG_ICASE (REG_EXTENDED << 1) /* If this bit is set, then anchors do not match at newline characters in the string. If not set, then anchors do match at newlines. */ #define REG_NEWLINE (REG_ICASE << 1) /* If this bit is set, then report only success or fail in regexec. If not set, then returns differ between not matching and errors. */ #define REG_NOSUB (REG_NEWLINE << 1) /* POSIX `eflags' bits (i.e., information for regexec). */ /* If this bit is set, then the beginning-of-line operator doesn't match the beginning of the string (presumably because it's not the beginning of a line). If not set, then the beginning-of-line operator does match the beginning of the string. */ #define REG_NOTBOL 1 /* Like REG_NOTBOL, except for the end-of-line. */ #define REG_NOTEOL (1 << 1) /* If any error codes are removed, changed, or added, update the `re_error_msg' table in regex.c. */ typedef enum { REG_NOERROR = 0, /* Success. */ REG_NOMATCH, /* Didn't find a match (for regexec). */ /* POSIX regcomp return error codes. (In the order listed in the standard.) */ REG_BADPAT, /* Invalid pattern. */ REG_ECOLLATE, /* Not implemented. */ REG_ECTYPE, /* Invalid character class name. 
*/ REG_EESCAPE, /* Trailing backslash. */ REG_ESUBREG, /* Invalid back reference. */ REG_EBRACK, /* Unmatched left bracket. */ REG_EPAREN, /* Parenthesis imbalance. */ REG_EBRACE, /* Unmatched \{. */ REG_BADBR, /* Invalid contents of \{\}. */ REG_ERANGE, /* Invalid range end. */ REG_ESPACE, /* Ran out of memory. */ REG_BADRPT, /* No preceding re for repetition op. */ /* Error codes we've added. */ REG_EEND, /* Premature end. */ REG_ESIZE, /* Compiled pattern bigger than 2^16 bytes. */ REG_ERPAREN /* Unmatched ) or \); not returned from regcomp. */ } reg_errcode_t; /* This data structure represents a compiled pattern. Before calling the pattern compiler, the fields `buffer', `allocated', `fastmap', `translate', and `no_sub' can be set. After the pattern has been compiled, the `re_nsub' field is available. All other fields are private to the regex routines. */ struct re_pattern_buffer { /* [[[begin pattern_buffer]]] */ /* Space that holds the compiled pattern. It is declared as `unsigned char *' because its elements are sometimes used as array indexes. */ unsigned char *buffer; /* Number of bytes to which `buffer' points. */ unsigned long allocated; /* Number of bytes actually used in `buffer'. */ unsigned long used; /* Syntax setting with which the pattern was compiled. */ reg_syntax_t syntax; /* Pointer to a fastmap, if any, otherwise zero. re_search uses the fastmap, if there is one, to skip over impossible starting points for matches. */ char *fastmap; /* Either a translate table to apply to all characters before comparing them, or zero for no translation. The translation is applied to a pattern when it is compiled and to a string when it is matched. */ char *translate; /* Number of subexpressions found by the compiler. */ size_t re_nsub; /* Zero if this pattern cannot match the empty string, one else. 
Well, in truth it's used only in `re_search_2', to see whether or not we should use the fastmap, so we don't set this absolutely perfectly; see `re_compile_fastmap' (the `duplicate' case). */ unsigned can_be_null : 1; /* If REGS_UNALLOCATED, allocate space in the `regs' structure for `max (RE_NREGS, re_nsub + 1)' groups. If REGS_REALLOCATE, reallocate space if necessary. If REGS_FIXED, use what's there. */ #define REGS_UNALLOCATED 0 #define REGS_REALLOCATE 1 #define REGS_FIXED 2 unsigned regs_allocated : 2; /* Set to zero when `regex_compile' compiles a pattern; set to one by `re_compile_fastmap' if it updates the fastmap. */ unsigned fastmap_accurate : 1; /* If set, `re_match_2' does not return information about subexpressions. */ unsigned no_sub : 1; /* If set, a beginning-of-line anchor doesn't match at the beginning of the string. */ unsigned not_bol : 1; /* Similarly for an end-of-line anchor. */ unsigned not_eol : 1; /* If true, an anchor at a newline matches. */ unsigned newline_anchor : 1; /* [[[end pattern_buffer]]] */ }; typedef struct re_pattern_buffer regex_t; /* search.c (search_buffer) in Emacs needs this one opcode value. It is defined both in `regex.c' and here. */ #define RE_EXACTN_VALUE 1 /* Type for byte offsets within the string. POSIX mandates this. */ typedef int regoff_t; /* This is the structure we store register match data in. See regex.texinfo for a full description of what registers match. */ struct re_registers { unsigned num_regs; regoff_t *start; regoff_t *end; }; /* If `regs_allocated' is REGS_UNALLOCATED in the pattern buffer, `re_match_2' returns information about at least this many registers the first time a `regs' structure is passed. */ #ifndef RE_NREGS #define RE_NREGS 30 #endif /* POSIX specification for registers. Aside from the different names than `re_registers', POSIX uses an array of structures, instead of a structure of arrays. */ typedef struct { regoff_t rm_so; /* Byte offset from string's start to substring's start. 
*/ regoff_t rm_eo; /* Byte offset from string's start to substring's end. */ } regmatch_t; /* Declarations for routines. */ /* To avoid duplicating every routine declaration -- once with a prototype (if we are ANSI), and once without (if we aren't) -- we use the following macro to declare argument types. This unfortunately clutters up the declarations a bit, but I think it's worth it. */ #if __STDC__ #define _RE_ARGS(args) args #else /* not __STDC__ */ #define _RE_ARGS(args) () #endif /* not __STDC__ */ /* Set the current default syntax to SYNTAX, and return the old syntax. You can also simply assign to the `re_syntax_options' variable. */ extern reg_syntax_t re_set_syntax _RE_ARGS ((reg_syntax_t syntax)); /* Compile the regular expression PATTERN, with length LENGTH and syntax given by the global `re_syntax_options', into the buffer BUFFER. Return NULL if successful, and an error string if not. */ extern const char *re_compile_pattern _RE_ARGS ((const char *pattern, int length, struct re_pattern_buffer *buffer)); /* Compile a fastmap for the compiled pattern in BUFFER; used to accelerate searches. Return 0 if successful and -2 if there was an internal error. */ extern int re_compile_fastmap _RE_ARGS ((struct re_pattern_buffer *buffer)); /* Search in the string STRING (with length LENGTH) for the pattern compiled into BUFFER. Start searching at position START, for RANGE characters. Return the starting position of the match, -1 for no match, or -2 for an internal error. Also return register information in REGS (if REGS and BUFFER->no_sub are nonzero). */ extern int re_search _RE_ARGS ((struct re_pattern_buffer *buffer, const char *string, int length, int start, int range, struct re_registers *regs)); /* Like `re_search', but search in the concatenation of STRING1 and STRING2. Also, stop searching at index START + STOP.
*/ extern int re_search_2 _RE_ARGS ((struct re_pattern_buffer *buffer, const char *string1, int length1, const char *string2, int length2, int start, int range, struct re_registers *regs, int stop)); /* Like `re_search', but return how many characters in STRING the regexp in BUFFER matched, starting at position START. */ extern int re_match _RE_ARGS ((struct re_pattern_buffer *buffer, const char *string, int length, int start, struct re_registers *regs)); /* Relates to `re_match' as `re_search_2' relates to `re_search'. */ extern int re_match_2 _RE_ARGS ((struct re_pattern_buffer *buffer, const char *string1, int length1, const char *string2, int length2, int start, struct re_registers *regs, int stop)); /* Set REGS to hold NUM_REGS registers, storing them in STARTS and ENDS. Subsequent matches using BUFFER and REGS will use this memory for recording register information. STARTS and ENDS must be allocated with malloc, and must each be at least `NUM_REGS * sizeof (regoff_t)' bytes long. If NUM_REGS == 0, then subsequent matches should allocate their own register data. Unless this function is called, the first search or match using PATTERN_BUFFER will allocate its own register data, without freeing the old data. */ extern void re_set_registers _RE_ARGS ((struct re_pattern_buffer *buffer, struct re_registers *regs, unsigned num_regs, regoff_t *starts, regoff_t *ends)); /* 4.2 bsd compatibility. */ extern char *re_comp _RE_ARGS ((const char *)); extern int re_exec _RE_ARGS ((const char *)); /* POSIX compatibility. 
*/ extern int regcomp _RE_ARGS ((regex_t *preg, const char *pattern, int cflags)); extern int regexec _RE_ARGS ((const regex_t *preg, const char *string, size_t nmatch, regmatch_t pmatch[], int eflags)); extern size_t regerror _RE_ARGS ((int errcode, const regex_t *preg, char *errbuf, size_t errbuf_size)); extern void regfree _RE_ARGS ((regex_t *preg)); #endif /* not __REGEXP_LIBRARY_H__ */ /* Local variables: make-backup-files: t version-control: t trim-versions-without-asking: nil End: */ swish-e-2.4.7/src/vms/build_swish-e.com0000664000077100017500000000064011166010104014650 00000000000000$ arch_name = f$edit(f$getsyi("arch_name"),"upcase") $ if arch_name .eqs. "ALPHA" then arch_name = "axp" $ proc = f$environment("procedure") $ dev = f$parse(proc,,,"DEVICE") $ dir = f$parse(proc,,,"DIRECTORY") $ set default 'dev''dir' $ set default [-] $ if p1 .eqs. "LIBXML2" $ then $! Don't work on VAX $ mms/descr=[.vms]descrip_libxml2.mms 'p2' $ else $ mms/descr=[.vms]descrip_'arch_name'.mms 'p2' $ endif swish-e-2.4.7/src/vms/descrip_axp.mms0000775000077100017500000001513211166010104014436 00000000000000# # Makefile derived from the Makefile coming with swish-e 1.3.2 # (original Makefile for SWISH Kevin Hughes, 3/12/95) # # The code has been tested to compile on OpenVMS 7.3 # JF. 
Piéronne jf.pieronne@laposte.net 29-Mar-2003 # # autoconf configuration by Bas Meijer, 1 June 2000 # Cross Platform Compilation on Solaris, HP-UX, IRIX and Linux # Several ideas from a Makefile by Christian Lindig # NAME = swish-e.exe # C compiler CC = CC SHELL = /bin/sh prefix = @prefix@ bindir = $(prefix)/bin mandir = $(prefix)/man man1dir = $(mandir)/man1 # Flags for C compiler #CWARN= CDEF = /def=(VMS,HAVE_CONFIG_H,STDC_HEADERS) CINCL= /include=([.expat.xmlparse],[.expat.xmltok],libz:) CWARN= #CDEBUG= /debug/noopt CDEBUG= CFLAGS = /prefix=all$(CINCL)$(CDEF)$(CWARN)$(CDEBUG)/name=short #LINKFLAGS = /debug LINKFLAGS = LIBS= # # The objects for the different methods and # some common aliases # FILESYSTEM_OBJS=fs.obj HTTP_OBJS=http.obj httpserver.obj FS_OBJS=$(FILESYSTEM_OBJS) WEB_OBJS=$(HTTP_OBJS) VMS_OBJS = regex.obj VSNPRINTF_OBJ = vsnprintf.obj OBJS= check.obj file.obj index.obj search.obj error.obj methods.obj\ hash.obj list.obj mem.obj merge.obj swish2.obj stemmer.obj \ soundex.obj docprop.obj compress.obj xml.obj txt.obj \ metanames.obj result_sort.obj html.obj \ filter.obj parse_conffile.obj result_output.obj date_time.obj \ keychar_out.obj extprog.obj bash.obj db_native.obj dump.obj \ entities.obj swish_words.obj \ proplimit.obj swish_qsort.obj ramdisk.obj rank.obj \ xmlparse.obj xmltok.obj xmlrole.obj swregex.obj vsnprintf.obj \ double_metaphone.obj db_read.obj db_write.obj swstring.obj \ pre_sort.obj headers.obj docprop_write.obj stemmer.obj\ $(FILESYSTEM_OBJS) $(HTTP_OBJS) $(VMS_OBJS) \ api.obj stem_de.obj stem_dk.obj stem_en1.obj stem_en2.obj stem_es.obj\ stem_fi.obj stem_fr.obj stem_it.obj stem_nl.obj stem_no.obj \ stem_pt.obj stem_ru.obj stem_se.obj utilities.obj all : acconfig.h $(NAME) swish-search.exe libtest.exe ! 
xmlparse.obj : [.expat.xmlparse]xmlparse.c xmltok.obj : [.expat.xmltok]xmltok.c xmlrole.obj : [.expat.xmltok]xmlrole.c $(NAME) : $(OBJS) libswish-e.olb swish.obj link/exe=$(MMS$TARGET) $(LINKFLAGS) - swish.obj, libswish-e.olb/lib, [.vms]swish.opt/opt libtest.exe : libtest.obj libswish-e.olb swish.obj link/exe=$(MMS$TARGET) $(LINKFLAGS) libtest.obj, [.vms]libtest.opt/opt libswish-e.olb : $(OBJS) library/create $(MMS$TARGET) $(MMS$SOURCE_LIST) swish-search.exe : $(NAME) copy $(NAME) swish-search.exe regex.obj : [.vms]regex.c [.vms]descrip_axp.mms acconfig.h : [.vms]acconfig.h_vms copy $(MMS$SOURCE) $(MMS$TARGET) clean : delete [...]*.obj;*, [...]*.olb;*, index.swish;*, [-.tests]*.index;* realclean : pur [-...] delete [...]*.exe;*, [...]*.obj;*, [...]*.olb;*, index.swish;*, acconfig.h;*, [-.tests]*.index;* test : $(NAME) set def [-.tests] mc [-.src]swish-e -c test.config write sys$output "test 1 (Normal search) ..." mc [-.src]swish-e -f test.index -w test write sys$output "test 1 (MetaTag search 1) ..." mc [-.src]swish-e -f test.index -w meta1=metatest1 write sys$output "test 1 (MetaTag search 2) ..." mc [-.src]swish-e -f test.index -w meta2=metatest2 write sys$output "test 1 (XML search) ..." mc [-.src]swish-e -f ./test.index -w meta3=metatest3 write sys$output "test 1 (Phrase search) ..." mc [-.src]swish-e -f test.index -w """three little pigs""" $(OBJS) : [.vms]descrip_axp.mms config.h swish.h acconfig.h swish.obj : [.vms]descrip_axp.mms config.h swish.h acconfig.h install : ! man : ! 
# # dependencies # check.obj : check.c swish.h config.h check.h hash.h compress.obj : compress.c swish.h config.h error.h mem.h docprop.h index.h search.h merge.h compress.h deflate.obj : deflate.c swish.h config.h error.h mem.h docprop.h index.h search.h merge.h deflate.h docprop.obj : docprop.c swish.h config.h file.h hash.h mem.h merge.h \ error.h search.h docprop.h compress.h error.obj : error.c swish.h config.h error.h file.obj : file.c swish.h config.h file.h mem.h error.h list.h \ hash.h index.h fs.obj : fs.c swish.h config.h index.h hash.h mem.h file.h \ list.h hash.obj : hash.c swish.h config.h hash.h mem.h http.obj : http.c swish.h config.h index.h hash.h mem.h file.h \ http.h httpserver.h httpserver.obj : httpserver.c swish.h config.h mem.h http.h \ httpserver.h index.obj : index.c swish.h config.h index.h hash.h mem.h \ check.h search.h docprop.h stemmer.h compress.h list.obj : list.c swish.h config.h list.h mem.h mem.obj : mem.c swish.h config.h mem.h error.h merge.obj : merge.c swish.h config.h merge.h error.h search.h index.h \ hash.h mem.h docprop.h compress.h methods.obj : methods.c swish.h config.h search.obj : search.c swish.h config.h search.h file.h list.h \ merge.h hash.h mem.h docprop.h stemmer.h compress.h stemmer.obj : stemmer.c swish.h config.h stemmer.h soundex.obj : soundex.c swish.h config.h stemmer.h swish2.obj : swish2.c swish.h config.h error.h list.h search.h index.h \ file.h merge.h docprop.h swish.obj : swish.c swish.h config.h error.h list.h search.h index.h \ file.h merge.h docprop.h libtest.obj : libtest.c swish.h config.h error.h list.h search.h index.h \ file.h merge.h docprop.h txt.obj : txt.c txt.h swish.h mem.h index.h xml.obj : xml.c txt.h swish.h mem.h index.h proplimi.obj : swish.h mem.h merge.h docprop.h index.h metanames.h \ compress.h error.h db.h result_sort.h swish_qsort.h proplimit.h metanames.obj : metanames.c result_sort.obj : result_sort.c html.obj : html.c filter.obj : filter.c parse_conffile.obj : 
parse_conffile.c result_output.obj : result_output.c date_time.obj : date_time.c keychar_out.obj : keychar_out.c extprog.obj : extprog.c bash.obj : bash.c db_native.obj : db_native.c dump.obj : dump.c entities.obj : entities.c swish_words.obj : swish_words.c proplimit.obj : proplimit.c swish_qsort.obj : swish_qsort.c ramdisk.obj : ramdisk.c rank.obj : rank.c swregex.obj : swregex.c double_metaphone.obj : double_metaphone.c vsnprintf.obj : [.replace]vsnprintf.c db_read.obj : db_read.c db_write.obj : db_write.c swstring.obj : swstring.c pre_sort.obj : pre_sort.c headers.obj : headers.c docprop_write.obj : docprop_write.c api.obj : [.snowball]api.c stem_de.obj : [.snowball]stem_de.c stem_dk.obj : [.snowball]stem_dk.c stem_en1.obj : [.snowball]stem_en1.c stem_en2.obj : [.snowball]stem_en2.c stem_es.obj : [.snowball]stem_es.c stem_fi.obj : [.snowball]stem_fi.c stem_fr.obj : [.snowball]stem_fr.c stem_it.obj : [.snowball]stem_it.c stem_nl.obj : [.snowball]stem_nl.c stem_no.obj : [.snowball]stem_no.c stem_pt.obj : [.snowball]stem_pt.c stem_ru.obj : [.snowball]stem_ru.c stem_se.obj : [.snowball]stem_se.c utilities.obj : [.snowball]utilities.c swish-e-2.4.7/src/vms/regexpr.h0000664000077100017500000001273211166010104013244 00000000000000/* * -*- mode: c-mode; c-file-style: python -*- */ #ifndef Py_REGEXPR_H #define Py_REGEXPR_H #ifdef __cplusplus extern "C" { #endif /* * regexpr.h * * Author: Tatu Ylonen * * Copyright (c) 1991 Tatu Ylonen, Espoo, Finland * * Permission to use, copy, modify, distribute, and sell this software * and its documentation for any purpose is hereby granted without fee, * provided that the above copyright notice appear in all copies. This * software is provided "as is" without express or implied warranty.
* * Created: Thu Sep 26 17:15:36 1991 ylo * Last modified: Mon Nov 4 15:49:46 1991 ylo */ /* $Id: regexpr.h 1244 2003-05-28 05:38:41Z whmoseley $ */ #ifndef REGEXPR_H #define REGEXPR_H #define RE_NREGS 100 /* number of registers available */ typedef struct re_pattern_buffer { unsigned char *buffer; /* compiled pattern */ int allocated; /* allocated size of compiled pattern */ int used; /* actual length of compiled pattern */ unsigned char *fastmap; /* fastmap[ch] is true if ch can start pattern */ unsigned char *translate; /* translation to apply during compilation/matching */ unsigned char fastmap_accurate; /* true if fastmap is valid */ unsigned char can_be_null; /* true if can match empty string */ unsigned char uses_registers; /* registers are used and need to be initialized */ int num_registers; /* number of registers used */ unsigned char anchor; /* anchor: 0=none 1=begline 2=begbuf */ } *regexp_t; typedef struct re_registers { int start[RE_NREGS]; /* start offset of region */ int end[RE_NREGS]; /* end offset of region */ } *regexp_registers_t; /* bit definitions for syntax */ #define RE_NO_BK_PARENS 1 /* no quoting for parentheses */ #define RE_NO_BK_VBAR 2 /* no quoting for vertical bar */ #define RE_BK_PLUS_QM 4 /* quoting needed for + and ? 
*/ #define RE_TIGHT_VBAR 8 /* | binds tighter than ^ and $ */ #define RE_NEWLINE_OR 16 /* treat newline as or */ #define RE_CONTEXT_INDEP_OPS 32 /* ^$?*+ are special in all contexts */ #define RE_ANSI_HEX 64 /* ansi sequences (\n etc) and \xhh */ #define RE_NO_GNU_EXTENSIONS 128 /* no gnu extensions */ /* definitions for some common regexp styles */ #define RE_SYNTAX_AWK (RE_NO_BK_PARENS|RE_NO_BK_VBAR|RE_CONTEXT_INDEP_OPS) #define RE_SYNTAX_EGREP (RE_SYNTAX_AWK|RE_NEWLINE_OR) #define RE_SYNTAX_GREP (RE_BK_PLUS_QM|RE_NEWLINE_OR) #define RE_SYNTAX_EMACS 0 #define Sword 1 #define Swhitespace 2 #define Sdigit 4 #define Soctaldigit 8 #define Shexdigit 16 /* Rename all exported symbols to avoid conflicts with similarly named symbols in some systems' standard C libraries... */ #define re_syntax _Py_re_syntax #define re_syntax_table _Py_re_syntax_table #define re_compile_initialize _Py_re_compile_initialize #define re_set_syntax _Py_re_set_syntax #define re_compile_pattern _Py_re_compile_pattern #define re_match _Py_re_match #define re_search _Py_re_search #define re_compile_fastmap _Py_re_compile_fastmap #define re_comp _Py_re_comp #define re_exec _Py_re_exec #ifdef HAVE_PROTOTYPES extern int re_syntax; /* This is the actual syntax mask. It was added so that Python could do * syntax-dependent munging of patterns before compilation. */ extern unsigned char re_syntax_table[256]; void re_compile_initialize(void); int re_set_syntax(int syntax); /* This sets the syntax to use and returns the previous syntax. The * syntax is specified by a bit mask of the above defined bits. */ char *re_compile_pattern(unsigned char *regex, int regex_size, regexp_t compiled); /* This compiles the regexp (given in regex and length in regex_size). * This returns NULL if the regexp compiled successfully, and an error * message if an error was encountered. 
The buffer field must be * initialized to a memory area allocated by malloc (or to NULL) before * use, and the allocated field must be set to its length (or 0 if * buffer is NULL). Also, the translate field must be set to point to a * valid translation table, or NULL if it is not used. */ int re_match(regexp_t compiled, unsigned char *string, int size, int pos, regexp_registers_t old_regs); /* This tries to match the regexp against the string. This returns the * length of the matched portion, or -1 if the pattern could not be * matched and -2 if an error (such as failure stack overflow) is * encountered. */ int re_search(regexp_t compiled, unsigned char *string, int size, int startpos, int range, regexp_registers_t regs); /* This searches for a substring matching the regexp. This returns the * first index at which a match is found. range specifies at how many * positions to try matching; positive values indicate searching * forwards, and negative values indicate searching backwards. mstop * specifies the offset beyond which a match must not go. This returns * -1 if no match is found, and -2 if an error (such as failure stack * overflow) is encountered. */ void re_compile_fastmap(regexp_t compiled); /* This computes the fastmap for the regexp. For this to have any effect, * the calling program must have initialized the fastmap field to point * to an array of 256 characters. */ #else /* HAVE_PROTOTYPES */ extern int re_syntax; extern unsigned char re_syntax_table[256]; void re_compile_initialize(); int re_set_syntax(); char *re_compile_pattern(); int re_match(); int re_search(); void re_compile_fastmap(); #endif /* HAVE_PROTOTYPES */ #endif /* REGEXPR_H */ #ifdef __cplusplus } #endif #endif /* !Py_REGEXPR_H */ swish-e-2.4.7/src/vms/libtest.opt0000664000077100017500000000004411166010104013602 00000000000000libswish-e.olb/lib LIBZ_SHR32/share swish-e-2.4.7/src/vms/acconfig.h_vms0000775000077100017500000001505311166010104014230 00000000000000/* src/acconfig.h.in. 
Generated from configure.in by autoheader. */ /* Define to one of `_getb67', `GETB67', `getb67' for Cray-2 and Cray-YMP systems. This function is required for `alloca.c' support on those systems. */ /* #undef CRAY_STACKSEG_END */ /* Define to 1 if using `alloca.c'. */ /* #undef C_ALLOCA */ /* Check for groups with AC_TYPE_GETGROUPS */ /* #undef GETGROUPS_T */ /* Define to 1 if you have the `access' function. */ #define HAVE_ACCESS 1 /* Define to 1 if you have `alloca', as a function or macro. */ /* #undef HAVE_ALLOCA */ /* Define to 1 if you have <alloca.h> and it should be used (not on Ultrix). */ /* #undef HAVE_ALLOCA_H */ /* Get time of day */ /* #undef HAVE_BSDGETTIMEOFDAY */ /* Define to 1 if you have the `clock' function. */ #define HAVE_CLOCK 1 /* Define to 1 if you have the <dirent.h> header file, and it defines `DIR'. */ #define HAVE_DIRENT_H 1 /* Define to 1 if you have the <dlfcn.h> header file. */ #define HAVE_DLFCN_H 1 /* Define to 1 if you don't have `vprintf' but do have `_doprnt.' */ /* #undef HAVE_DOPRNT */ /* Define to 1 if you have the `fork' function. */ /* #undef HAVE_FORK */ /* Define to 1 if your system has a working `getgroups' function. */ /* #undef HAVE_GETGROUPS */ /* Define to 1 if you have the `getrusage' function. */ /* #undef HAVE_GETRUSAGE */ /* Define to 1 if you have the <inttypes.h> header file. */ #define HAVE_INTTYPES_H 1 /* Define to 1 if you have the `m' library (-lm). */ /* #undef HAVE_LIBM */ /* Define to 1 if you have the `snprintf' library (-lsnprintf). */ /* #undef HAVE_LIBSNPRINTF */ /* Libxml2 support included */ /* #undef HAVE_LIBXML2 */ /* Define to 1 if you have the `lstat' function. */ #define HAVE_LSTAT 1 /* Define to 1 if you have the `memcpy' function. */ #define HAVE_MEMCPY 1 /* Define to 1 if you have the <memory.h> header file. */ #define HAVE_MEMORY_H 1 /* Define to 1 if you have the `mkstemp' function. */ #define HAVE_MKSTEMP 1 /* Define to 1 if you have the <ndir.h> header file, and it defines `DIR'.
*/ /* #undef HAVE_NDIR_H */ /* Perl REGEX library */ /* #undef HAVE_PCRE */ /* Define to 1 if you have the `regcomp' function. */ /* #undef HAVE_REGCOMP */ /* Define to 1 if you have the `re_comp' function. */ /* #undef HAVE_RE_COMP */ /* Define to 1 if you have the `setenv' function. */ #define HAVE_SETENV 1 /* Define to 1 if you have the <stdint.h> header file. */ /* #undef HAVE_STDINT_H */ /* Define to 1 if you have the <stdlib.h> header file. */ #define HAVE_STDLIB_H 1 /* Define to 1 if you have the `strchr' function. */ #define HAVE_STRCHR 1 /* Define to 1 if you have the `strdup' function. */ #define HAVE_STRDUP 1 /* Define to 1 if you have the `strftime' function. */ #define HAVE_STRFTIME 1 /* Define to 1 if you have the <strings.h> header file. */ #define HAVE_STRINGS_H 1 /* Define to 1 if you have the <string.h> header file. */ #define HAVE_STRING_H 1 /* Define to 1 if you have the `strstr' function. */ #define HAVE_STRSTR 1 /* Define to 1 if you have the <sys/dir.h> header file, and it defines `DIR'. */ /* #undef HAVE_SYS_DIR_H */ /* Define to 1 if you have the <sys/ndir.h> header file, and it defines `DIR'. */ /* #undef HAVE_SYS_NDIR_H */ /* Define to 1 if you have the <sys/param.h> header file. */ /* #undef HAVE_SYS_PARAM_H */ /* Define to 1 if you have the <sys/resource.h> header file. */ /* #undef HAVE_SYS_RESOURCE_H */ /* Define to 1 if you have the <sys/stat.h> header file. */ /* #undef HAVE_SYS_STAT_H */ /* Define to 1 if you have the <sys/timeb.h> header file. */ /* #undef HAVE_SYS_TIMEB_H */ /* Define to 1 if you have the <sys/types.h> header file. */ /* #undef HAVE_SYS_TYPES_H */ /* Define to 1 if you have <sys/wait.h> that is POSIX.1 compatible. */ /* #undef HAVE_SYS_WAIT_H */ /* Define to 1 if you have the `times' function. */ #define HAVE_TIMES 1 /* Define to 1 if you have the <unistd.h> header file. */ #define HAVE_UNISTD_H 1 /* Define to 1 if you have the `vfork' function. */ #define HAVE_VFORK 1 /* Define to 1 if you have the <vfork.h> header file. */ /* #undef HAVE_VFORK_H */ /* Define to 1 if you have the `vprintf' function. */ #define HAVE_VPRINTF 1 /* Define to 1 if you have the `vsnprintf' function.
*/ #define HAVE_VSNPRINTF 1 /* Define to 1 if you have the <windows.h> header file. */ /* #undef HAVE_WINDOWS_H */ /* Define to 1 if `fork' works. */ /* #undef HAVE_WORKING_FORK */ /* Define to 1 if `vfork' works. */ #define HAVE_WORKING_VFORK 1 /* Do we have zlib */ #define HAVE_ZLIB 1 /* Define to 1 if you have the <zlib.h> header file. */ #define HAVE_ZLIB_H 1 /* (developers only) checks for memory consistency on alloc/free using guards */ /* #undef MEM_DEBUG */ /* (developers only) gives memory statistics (bytes allocated, calls, etc) */ /* #undef MEM_STATISTICS */ /* (developers only) checks for unfreed memory, and where it is allocated */ /* #undef MEM_TRACE */ /* Get time of day */ /* #undef NO_GETTOD */ /* Name of package */ #define PACKAGE "swish-e" /* Define to the address where bug reports for this package should be sent. */ #define PACKAGE_BUGREPORT "" /* Define to the full name of this package. */ #define PACKAGE_NAME "" /* Define to the full name and version of this package. */ #define PACKAGE_STRING "" /* Define to the one symbol short name of this package. */ #define PACKAGE_TARNAME "" /* Define to the version of this package. */ #define PACKAGE_VERSION "" /* If using the C implementation of alloca, define if you know the direction of stack growth for your system; otherwise it will be automatically deduced at run-time. STACK_DIRECTION > 0 => grows toward higher addresses STACK_DIRECTION < 0 => grows toward lower addresses STACK_DIRECTION = 0 => direction of growth unknown */ /* #undef STACK_DIRECTION */ /* Define to 1 if the `S_IS*' macros in <sys/stat.h> do not work properly. */ /* #undef STAT_MACROS_BROKEN */ /* Define to 1 if you have the ANSI C header files. */ #define STDC_HEADERS 1 /* Define to 1 if your <sys/time.h> declares `struct tm'. */ /* #undef TM_IN_SYS_TIME */ /* Experimental BTREE support */ /* #undef USE_BTREE */ /* Version number of package */ #define VERSION "2.4.0" /* Define to empty if `const' does not conform to ANSI C.
*/ /* #undef const */ /* Define to `int' if <sys/types.h> doesn't define. */ /* #undef gid_t */ /* Define to `int' if <sys/types.h> does not define. */ /* #undef pid_t */ /* Define to `unsigned' if <sys/types.h> does not define. */ /* #undef size_t */ /* Define to `int' if <sys/types.h> doesn't define. */ /* #undef uid_t */ /* Define as `fork' if `vfork' does not work. */ /* #undef vfork */ swish-e-2.4.7/src/vms/config.h0000664000077100017500000000000011166010104013016 00000000000000swish-e-2.4.7/src/vms/descrip_libxml2.mms0000664000077100017500000001545011166010105015220 00000000000000# # Makefile derived from the Makefile coming with swish-e 1.3.2 # (original Makefile for SWISH Kevin Hughes, 3/12/95) # # The code has been tested to compile on OpenVMS 7.3 # JF. Piéronne jf.pieronne@laposte.net 29-Mar-2003 # # autoconf configuration by Bas Meijer, 1 June 2000 # Cross Platform Compilation on Solaris, HP-UX, IRIX and Linux # Several ideas from a Makefile by Christian Lindig # NAME = swish-e.exe # C compiler CC = CC SHELL = /bin/sh prefix = @prefix@ bindir = $(prefix)/bin mandir = $(prefix)/man man1dir = $(mandir)/man1 # Flags for C compiler #CWARN= CDEF = /def=(HAVE_LIBXML2,VMS,HAVE_CONFIG_H,STDC_HEADERS) CINCL= /include=([.expat.xmlparse],[.expat.xmltok],libz:) CWARN= #CDEBUG= /debug/noopt CDEBUG= CFLAGS =/prefix=all$(CINCL)$(CDEF)$(CWARN)$(CDEBUG)/name=(as_is,short)/float=ieee #LINKFLAGS = /debug LINKFLAGS = LIBS= # # The objects for the different methods and # some common aliases # FILESYSTEM_OBJS=fs.obj HTTP_OBJS=http.obj httpserver.obj FS_OBJS=$(FILESYSTEM_OBJS) WEB_OBJS=$(HTTP_OBJS) VMS_OBJS = regex.obj XML_PARSE = xmlparse.obj xmltok.obj xmlrole.obj LIBXML2_LIB = libxml:libxml.olb/lib LIBXML2_OBJS = parser.obj VSNPRINTF_OBJ = vsnprintf.obj OBJS= check.obj file.obj index.obj search.obj error.obj methods.obj\ hash.obj list.obj mem.obj merge.obj swish2.obj stemmer.obj \ soundex.obj docprop.obj compress.obj xml.obj txt.obj \ metanames.obj result_sort.obj html.obj \ filter.obj parse_conffile.obj result_output.obj 
date_time.obj \ keychar_out.obj extprog.obj bash.obj db_native.obj dump.obj \ entities.obj swish_words.obj \ proplimit.obj swish_qsort.obj ramdisk.obj rank.obj swregex.obj \ double_metaphone.obj db_read.obj db_write.obj swstring.obj \ pre_sort.obj parser.obj headers.obj docprop_write.obj stemmer.obj\ $(XML_PARSE)\ $(LIBXML2_OBJS)\ $(FILESYSTEM_OBJS) $(HTTP_OBJS) $(VMS_OBJS) $(VSNPRINTF_OBJ) \ api.obj stem_de.obj stem_dk.obj stem_en1.obj stem_en2.obj stem_es.obj\ stem_fi.obj stem_fr.obj stem_it.obj stem_nl.obj stem_no.obj \ stem_pt.obj stem_ru.obj stem_se.obj utilities.obj all : acconfig.h $(NAME) swish-search.exe libtest.exe ! xmlparse.obj : [.expat.xmlparse]xmlparse.c xmltok.obj : [.expat.xmltok]xmltok.c xmlrole.obj : [.expat.xmltok]xmlrole.c $(NAME) : $(OBJS) libswish-e.olb swish.obj link/exe=$(MMS$TARGET) $(LINKFLAGS) - swish.obj,libswish-e.olb/lib,[.vms]swish.opt/opt,$(LIBXML2_LIB) libtest.exe : libtest.obj libswish-e.olb swish.obj link/exe=$(MMS$TARGET) $(LINKFLAGS) libtest.obj, [.vms]libtest.opt/opt libswish-e.olb : $(OBJS) library/create $(MMS$TARGET) $(MMS$SOURCE_LIST) swish-search.exe : $(NAME) copy $(NAME) swish-search.exe regex.obj : [.vms]regex.c [.vms]descrip_libxml2.mms acconfig.h : [.vms]acconfig.h_vms copy $(MMS$SOURCE) $(MMS$TARGET) clean : delete [...]*.obj;*, [...]*.olb;*, index.swish;*, [-.tests]*.index;* realclean : pur [-...] delete [...]*.exe;*, [...]*.obj;*, [...]*.olb;*, index.swish;*, acconfig.h;*, [-.tests]*.index;* test : $(NAME) set def [-.tests] mc [-.src]swish-e -c test.config write sys$output "test 1 (Normal search) ..." mc [-.src]swish-e -f test.index -w test write sys$output "test 1 (MetaTag search 1) ..." mc [-.src]swish-e -f test.index -w meta1=metatest1 write sys$output "test 1 (MetaTag search 2) ..." mc [-.src]swish-e -f test.index -w meta2=metatest2 write sys$output "test 1 (XML search) ..." mc [-.src]swish-e -f ./test.index -w meta3=metatest3 write sys$output "test 1 (Phrase search) ..." 
mc [-.src]swish-e -f test.index -w """three little pigs""" $(OBJS) : [.vms]descrip_libxml2.mms config.h swish.h acconfig.h swish.obj : [.vms]descrip_libxml2.mms config.h swish.h acconfig.h install : ! man : ! # # dependencies # check.obj : check.c swish.h config.h check.h hash.h compress.obj : compress.c swish.h config.h error.h mem.h docprop.h index.h search.h merge.h compress.h deflate.obj : deflate.c swish.h config.h error.h mem.h docprop.h index.h search.h merge.h deflate.h docprop.obj : docprop.c swish.h config.h file.h hash.h mem.h merge.h \ error.h search.h docprop.h compress.h error.obj : error.c swish.h config.h error.h file.obj : file.c swish.h config.h file.h mem.h error.h list.h \ hash.h index.h fs.obj : fs.c swish.h config.h index.h hash.h mem.h file.h \ list.h hash.obj : hash.c swish.h config.h hash.h mem.h http.obj : http.c swish.h config.h index.h hash.h mem.h file.h \ http.h httpserver.h httpserver.obj : httpserver.c swish.h config.h mem.h http.h \ httpserver.h index.obj : index.c swish.h config.h index.h hash.h mem.h \ check.h search.h docprop.h stemmer.h compress.h list.obj : list.c swish.h config.h list.h mem.h mem.obj : mem.c swish.h config.h mem.h error.h merge.obj : merge.c swish.h config.h merge.h error.h search.h index.h \ hash.h mem.h docprop.h compress.h methods.obj : methods.c swish.h config.h search.obj : search.c swish.h config.h search.h file.h list.h \ merge.h hash.h mem.h docprop.h stemmer.h compress.h stemmer.obj : stemmer.c swish.h config.h stemmer.h soundex.obj : soundex.c swish.h config.h stemmer.h swish2.obj : swish2.c swish.h config.h error.h list.h search.h index.h \ file.h merge.h docprop.h swish.obj : swish.c swish.h config.h error.h list.h search.h index.h \ file.h merge.h docprop.h libtest.obj : libtest.c swish.h config.h error.h list.h search.h index.h \ file.h merge.h docprop.h txt.obj : txt.c txt.h swish.h mem.h index.h xml.obj : xml.c txt.h swish.h mem.h index.h proplimi.obj : swish.h mem.h merge.h docprop.h index.h 
metanames.h \
        compress.h error.h db.h result_sort.h swish_qsort.h proplimit.h
metanames.obj : metanames.c
result_sort.obj : result_sort.c
html.obj : html.c
filter.obj : filter.c
parse_conffile.obj : parse_conffile.c
result_output.obj : result_output.c
date_time.obj : date_time.c
keychar_out.obj : keychar_out.c
extprog.obj : extprog.c
bash.obj : bash.c
db_native.obj : db_native.c
dump.obj : dump.c
entities.obj : entities.c
swish_words.obj : swish_words.c
proplimit.obj : proplimit.c
swish_qsort.obj : swish_qsort.c
ramdisk.obj : ramdisk.c
rank.obj : rank.c
swregex.obj : swregex.c
double_metaphone.obj : double_metaphone.c
parser.obj : parser.c
vsnprintf.obj : [.replace]vsnprintf.c
db_read.obj : db_read.c
db_write.obj : db_write.c
swstring.obj : swstring.c
pre_sort.obj : pre_sort.c
headers.obj : headers.c
docprop_write.obj : docprop_write.c
api.obj : [.snowball]api.c
stem_de.obj : [.snowball]stem_de.c
stem_dk.obj : [.snowball]stem_dk.c
stem_en1.obj : [.snowball]stem_en1.c
stem_en2.obj : [.snowball]stem_en2.c
stem_es.obj : [.snowball]stem_es.c
stem_fi.obj : [.snowball]stem_fi.c
stem_fr.obj : [.snowball]stem_fr.c
stem_it.obj : [.snowball]stem_it.c
stem_nl.obj : [.snowball]stem_nl.c
stem_no.obj : [.snowball]stem_no.c
stem_pt.obj : [.snowball]stem_pt.c
stem_ru.obj : [.snowball]stem_ru.c
stem_se.obj : [.snowball]stem_se.c
utilities.obj : [.snowball]utilities.c
swish-e-2.4.7/src/vms/swish.opt0000664000077100017500000000002111166010105013265 00000000000000LIBZ_SHR32/share
swish-e-2.4.7/src/vms/readme_vms.txt0000664000077100017500000000065011166010104014276 00000000000000I have tested this version of SWISH-E on OpenVMS 7.3 AXP.

Building procedure:

$ @BUILD_SWISH-E
or
$ @BUILD_SWISH-E LIBXML2
if you have LIBXML2 installed.

If you use LIBXML2, SWISH-E is compiled using /float=ieee.
On AXP, SWISH-E is compiled using /name=(short,as_is).

The build generates:
SWISH-E.EXE
SWISH-SEARCH.EXE
LIBTEST.EXE

Testing:
$ @BUILD_SWISH-E test

Port made by Piéronne Jean-François (jf.pieronne@laposte.net)
swish-e-2.4.7/src/vms/regex.c0000664000077100017500000047404011166010104012701 00000000000000/* Extended regular expression matching and search library,
   version 0.12.
   (Implements POSIX draft P10003.2/D11.2, except for
   internationalization features.)

   Copyright (C) 1993 Free Software Foundation, Inc.

   This program is free software; you can redistribute it and/or modify
   it under the terms of the GNU General Public License as published by
   the Free Software Foundation; either version 2, or (at your option)
   any later version.

   This program is distributed in the hope that it will be useful,
   but WITHOUT ANY WARRANTY; without even the implied warranty of
   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
   GNU General Public License for more details.

   You should have received a copy of the GNU General Public License
   along with this program; if not, write to the Free Software
   Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.  */

/* AIX requires this to be the first thing in the file. */
#if defined (_AIX) && !defined (REGEX_MALLOC)
  #pragma alloca
#endif

#define _GNU_SOURCE

#ifdef _WIN32
#define HAVE_STRING_H 1   /* Win32 */
#define REGEX_MALLOC 1    /* Win32 */
#endif

/* We need this for `regex.h', and perhaps for the Emacs include files.  */
#include <sys/types.h>

#ifdef HAVE_CONFIG_H
#include "config.h"
#endif

/* The `emacs' switch turns on certain matching commands
   that make sense only in Emacs. */
#ifdef emacs

#include "lisp.h"
#include "buffer.h"
#include "syntax.h"

/* Emacs uses `NULL' as a predicate.
*/
#undef NULL

#else  /* not emacs */

/* We used to test for `BSTRING' here, but only GCC and Emacs define
   `BSTRING', as far as I know, and neither of them use this code.  */
#if HAVE_STRING_H || STDC_HEADERS
#include <string.h>
#ifndef bcmp
#define bcmp(s1, s2, n) memcmp ((s1), (s2), (n))
#endif
#ifndef bcopy
#define bcopy(s, d, n) memcpy ((d), (s), (n))
#endif
#ifndef bzero
#define bzero(s, n) memset ((s), 0, (n))
#endif
#else
#include <strings.h>
#endif

#ifdef STDC_HEADERS
#include <stdlib.h>
#else
char *malloc ();
char *realloc ();
#endif


/* Define the syntax stuff for \<, \>, etc.  */

/* This must be nonzero for the wordchar and notwordchar pattern
   commands in re_match_2.  */
#ifndef Sword
#define Sword 1
#endif

#ifdef SYNTAX_TABLE

extern char *re_syntax_table;

#else /* not SYNTAX_TABLE */

/* How many characters in the character set.  */
#define CHAR_SET_SIZE 256

static char re_syntax_table[CHAR_SET_SIZE];

static void
init_syntax_once ()
{
   register int c;
   static int done = 0;

   if (done)
     return;

   bzero (re_syntax_table, sizeof re_syntax_table);

   for (c = 'a'; c <= 'z'; c++)
     re_syntax_table[c] = Sword;

   for (c = 'A'; c <= 'Z'; c++)
     re_syntax_table[c] = Sword;

   for (c = '0'; c <= '9'; c++)
     re_syntax_table[c] = Sword;

   re_syntax_table['_'] = Sword;

   done = 1;
}

#endif /* not SYNTAX_TABLE */

#define SYNTAX(c) re_syntax_table[c]

#endif /* not emacs */

/* Get the interface, including the syntax bits.  */
#include "regex.h"

/* isalpha etc. are used for the character classes.
*/
#include <ctype.h>

#ifndef isascii
#define isascii(c) 1
#endif

#ifdef isblank
#define ISBLANK(c) (isascii (c) && isblank (c))
#else
#define ISBLANK(c) ((c) == ' ' || (c) == '\t')
#endif
#ifdef isgraph
#define ISGRAPH(c) (isascii (c) && isgraph (c))
#else
#define ISGRAPH(c) (isascii (c) && isprint (c) && !isspace (c))
#endif

#define ISPRINT(c) (isascii (c) && isprint (c))
#define ISDIGIT(c) (isascii (c) && isdigit (c))
#define ISALNUM(c) (isascii (c) && isalnum (c))
#define ISALPHA(c) (isascii (c) && isalpha (c))
#define ISCNTRL(c) (isascii (c) && iscntrl (c))
#define ISLOWER(c) (isascii (c) && islower (c))
#define ISPUNCT(c) (isascii (c) && ispunct (c))
#define ISSPACE(c) (isascii (c) && isspace (c))
#define ISUPPER(c) (isascii (c) && isupper (c))
#define ISXDIGIT(c) (isascii (c) && isxdigit (c))

#ifndef NULL
#define NULL 0
#endif

/* We remove any previous definition of `SIGN_EXTEND_CHAR',
   since ours (we hope) works properly with all combinations of
   machines, compilers, `char' and `unsigned char' argument types.
   (Per Bothner suggested the basic approach.)  */
#undef SIGN_EXTEND_CHAR
#if __STDC__
#define SIGN_EXTEND_CHAR(c) ((signed char) (c))
#else  /* not __STDC__ */
/* As in Harbison and Steele.  */
#define SIGN_EXTEND_CHAR(c) ((((unsigned char) (c)) ^ 128) - 128)
#endif

/* Should we use malloc or alloca?  If REGEX_MALLOC is not defined, we
   use `alloca' instead of `malloc'.  This is because using malloc in
   re_search* or re_match* could cause memory leaks when C-g is used in
   Emacs; also, malloc is slower and causes storage fragmentation.  On
   the other hand, malloc is more portable, and easier to debug.

   Because we sometimes use alloca, some routines have to be macros,
   not functions -- `alloca'-allocated space disappears at the end of the
   function it is called in.  */

#ifdef REGEX_MALLOC

#define REGEX_ALLOCATE malloc
#define REGEX_REALLOCATE(source, osize, nsize) realloc (source, nsize)

#else /* not REGEX_MALLOC */

/* Emacs already defines alloca, sometimes.
*/
#ifndef alloca

/* Make alloca work the best possible way.  */
#ifdef __GNUC__
#define alloca __builtin_alloca
#else /* not __GNUC__ */
#if HAVE_ALLOCA_H
#include <alloca.h>
#else /* not __GNUC__ or HAVE_ALLOCA_H */
#ifndef _AIX /* Already did AIX, up at the top.  */
char *alloca ();
#endif /* not _AIX */
#endif /* not HAVE_ALLOCA_H */
#endif /* not __GNUC__ */

#endif /* not alloca */

#define REGEX_ALLOCATE alloca

/* Assumes a `char *destination' variable.  */
#define REGEX_REALLOCATE(source, osize, nsize)                          \
  (destination = (char *) alloca (nsize),                               \
   bcopy (source, destination, osize),                                  \
   destination)

#endif /* not REGEX_MALLOC */


/* True if `size1' is non-NULL and PTR is pointing anywhere inside
   `string1' or just past its end.  This works if PTR is NULL, which is
   a good thing.  */
#define FIRST_STRING_P(ptr)                                             \
  (size1 && string1 <= (ptr) && (ptr) <= string1 + size1)

/* (Re)Allocate N items of type T using malloc, or fail.  */
#define TALLOC(n, t) ((t *) malloc ((n) * sizeof (t)))
#define RETALLOC(addr, n, t) ((addr) = (t *) realloc (addr, (n) * sizeof (t)))
#define REGEX_TALLOC(n, t) ((t *) REGEX_ALLOCATE ((n) * sizeof (t)))

#define BYTEWIDTH 8 /* In bits.  */

#define STREQ(s1, s2) ((strcmp (s1, s2) == 0))

#define MAX(a, b) ((a) > (b) ? (a) : (b))
#define MIN(a, b) ((a) < (b) ? (a) : (b))

typedef char boolean;
#define false 0
#define true 1

/* These are the command codes that appear in compiled regular
   expressions.  Some opcodes are followed by argument bytes.  A
   command code can specify any interpretation whatsoever for its
   arguments.  Zero bytes may appear in the compiled regular expression.

   The value of `exactn' is needed in search.c (search_buffer) in Emacs.
   So regex.h defines a symbol `RE_EXACTN_VALUE' to be 1; the value of
   `exactn' we use here must also be 1.  */

typedef enum
{
  no_op = 0,

        /* Followed by one byte giving n, then by n literal bytes.  */
  exactn = 1,

        /* Matches any (more or less) character.  */
  anychar,

        /* Matches any one char belonging to specified set.
First following byte is number of bitmap bytes. Then come bytes for a bitmap saying which chars are in. Bits in each byte are ordered low-bit-first. A character is in the set if its bit is 1. A character too large to have a bit in the map is automatically not in the set. */ charset, /* Same parameters as charset, but match any character that is not one of those specified. */ charset_not, /* Start remembering the text that is matched, for storing in a register. Followed by one byte with the register number, in the range 0 to one less than the pattern buffer's re_nsub field. Then followed by one byte with the number of groups inner to this one. (This last has to be part of the start_memory only because we need it in the on_failure_jump of re_match_2.) */ start_memory, /* Stop remembering the text that is matched and store it in a memory register. Followed by one byte with the register number, in the range 0 to one less than `re_nsub' in the pattern buffer, and one byte with the number of inner groups, just like `start_memory'. (We need the number of inner groups here because we don't have any easy way of finding the corresponding start_memory when we're at a stop_memory.) */ stop_memory, /* Match a duplicate of something remembered. Followed by one byte containing the register number. */ duplicate, /* Fail unless at beginning of line. */ begline, /* Fail unless at end of line. */ endline, /* Succeeds if at beginning of buffer (if emacs) or at beginning of string to be matched (if not). */ begbuf, /* Analogously, for end of buffer/string. */ endbuf, /* Followed by two byte relative address to which to jump. */ jump, /* Same as jump, but marks the end of an alternative. */ jump_past_alt, /* Followed by two-byte relative address of place to resume at in case of failure. */ on_failure_jump, /* Like on_failure_jump, but pushes a placeholder instead of the current string position when executed. 
*/ on_failure_keep_string_jump, /* Throw away latest failure point and then jump to following two-byte relative address. */ pop_failure_jump, /* Change to pop_failure_jump if know won't have to backtrack to match; otherwise change to jump. This is used to jump back to the beginning of a repeat. If what follows this jump clearly won't match what the repeat does, such that we can be sure that there is no use backtracking out of repetitions already matched, then we change it to a pop_failure_jump. Followed by two-byte address. */ maybe_pop_jump, /* Jump to following two-byte address, and push a dummy failure point. This failure point will be thrown away if an attempt is made to use it for a failure. A `+' construct makes this before the first repeat. Also used as an intermediary kind of jump when compiling an alternative. */ dummy_failure_jump, /* Push a dummy failure point and continue. Used at the end of alternatives. */ push_dummy_failure, /* Followed by two-byte relative address and two-byte number n. After matching N times, jump to the address upon failure. */ succeed_n, /* Followed by two-byte relative address, and two-byte number n. Jump to the address N times, then fail. */ jump_n, /* Set the following two-byte relative address to the subsequent two-byte number. The address *includes* the two bytes of number. */ set_number_at, wordchar, /* Matches any word-constituent character. */ notwordchar, /* Matches any char that is not a word-constituent. */ wordbeg, /* Succeeds if at word beginning. */ wordend, /* Succeeds if at word end. */ wordbound, /* Succeeds if at a word boundary. */ notwordbound /* Succeeds if not at a word boundary. */ #ifdef emacs ,before_dot, /* Succeeds if before point. */ at_dot, /* Succeeds if at point. */ after_dot, /* Succeeds if after point. */ /* Matches any character whose syntax is specified. Followed by a byte which contains a syntax code, e.g., Sword. */ syntaxspec, /* Matches any character whose syntax is not that specified. 
*/ notsyntaxspec #endif /* emacs */ } re_opcode_t; /* Common operations on the compiled pattern. */ /* Store NUMBER in two contiguous bytes starting at DESTINATION. */ #define STORE_NUMBER(destination, number) \ do { \ (destination)[0] = (number) & 0377; \ (destination)[1] = (number) >> 8; \ } while (0) /* Same as STORE_NUMBER, except increment DESTINATION to the byte after where the number is stored. Therefore, DESTINATION must be an lvalue. */ #define STORE_NUMBER_AND_INCR(destination, number) \ do { \ STORE_NUMBER (destination, number); \ (destination) += 2; \ } while (0) /* Put into DESTINATION a number stored in two contiguous bytes starting at SOURCE. */ #define EXTRACT_NUMBER(destination, source) \ do { \ (destination) = *(source) & 0377; \ (destination) += SIGN_EXTEND_CHAR (*((source) + 1)) << 8; \ } while (0) #ifdef DEBUG static void extract_number (dest, source) int *dest; unsigned char *source; { int temp = SIGN_EXTEND_CHAR (*(source + 1)); *dest = *source & 0377; *dest += temp << 8; } #ifndef EXTRACT_MACROS /* To debug the macros. */ #undef EXTRACT_NUMBER #define EXTRACT_NUMBER(dest, src) extract_number (&dest, src) #endif /* not EXTRACT_MACROS */ #endif /* DEBUG */ /* Same as EXTRACT_NUMBER, except increment SOURCE to after the number. SOURCE must be an lvalue. */ #define EXTRACT_NUMBER_AND_INCR(destination, source) \ do { \ EXTRACT_NUMBER (destination, source); \ (source) += 2; \ } while (0) #ifdef DEBUG static void extract_number_and_incr (destination, source) int *destination; unsigned char **source; { extract_number (destination, *source); *source += 2; } #ifndef EXTRACT_MACROS #undef EXTRACT_NUMBER_AND_INCR #define EXTRACT_NUMBER_AND_INCR(dest, src) \ extract_number_and_incr (&dest, &src) #endif /* not EXTRACT_MACROS */ #endif /* DEBUG */ /* If DEBUG is defined, Regex prints many voluminous messages about what it is doing (if the variable `debug' is nonzero). 
   If linked with the main program in `iregex.c', you can enter patterns
   and strings interactively.  And if linked with the main program in
   `main.c' and the other test files, you can run the already-written
   tests.  */

#ifdef DEBUG

/* We use standard I/O for debugging.  */
#include <stdio.h>

/* It is useful to test things that ``must'' be true when debugging.  */
#include <assert.h>

static int debug = 0;

#define DEBUG_STATEMENT(e) e
#define DEBUG_PRINT1(x) if (debug) printf (x)
#define DEBUG_PRINT2(x1, x2) if (debug) printf (x1, x2)
#define DEBUG_PRINT3(x1, x2, x3) if (debug) printf (x1, x2, x3)
#define DEBUG_PRINT4(x1, x2, x3, x4) if (debug) printf (x1, x2, x3, x4)
#define DEBUG_PRINT_COMPILED_PATTERN(p, s, e)                           \
  if (debug) print_partial_compiled_pattern (s, e)
#define DEBUG_PRINT_DOUBLE_STRING(w, s1, sz1, s2, sz2)                  \
  if (debug) print_double_string (w, s1, sz1, s2, sz2)


extern void printchar ();

/* Print the fastmap in human-readable form.  */

void
print_fastmap (fastmap)
    char *fastmap;
{
  unsigned was_a_range = 0;
  unsigned i = 0;

  while (i < (1 << BYTEWIDTH))
    {
      if (fastmap[i++])
        {
          was_a_range = 0;
          printchar (i - 1);
          while (i < (1 << BYTEWIDTH) && fastmap[i])
            {
              was_a_range = 1;
              i++;
            }
          if (was_a_range)
            {
              printf ("-");
              printchar (i - 1);
            }
        }
    }
  putchar ('\n');
}


/* Print a compiled pattern string in human-readable form, starting at
   the START pointer into it and ending just before the pointer END.  */

void
print_partial_compiled_pattern (start, end)
    unsigned char *start;
    unsigned char *end;
{
  int mcnt, mcnt2;
  unsigned char *p = start;
  unsigned char *pend = end;

  if (start == NULL)
    {
      printf ("(null)\n");
      return;
    }

  /* Loop over pattern commands.
*/ while (p < pend) { switch ((re_opcode_t) *p++) { case no_op: printf ("/no_op"); break; case exactn: mcnt = *p++; printf ("/exactn/%d", mcnt); do { putchar ('/'); printchar (*p++); } while (--mcnt); break; case start_memory: mcnt = *p++; printf ("/start_memory/%d/%d", mcnt, *p++); break; case stop_memory: mcnt = *p++; printf ("/stop_memory/%d/%d", mcnt, *p++); break; case duplicate: printf ("/duplicate/%d", *p++); break; case anychar: printf ("/anychar"); break; case charset: case charset_not: { register int c; printf ("/charset%s", (re_opcode_t) *(p - 1) == charset_not ? "_not" : ""); assert (p + *p < pend); for (c = 0; c < *p; c++) { unsigned bit; unsigned char map_byte = p[1 + c]; putchar ('/'); for (bit = 0; bit < BYTEWIDTH; bit++) if (map_byte & (1 << bit)) printchar (c * BYTEWIDTH + bit); } p += 1 + *p; break; } case begline: printf ("/begline"); break; case endline: printf ("/endline"); break; case on_failure_jump: extract_number_and_incr (&mcnt, &p); printf ("/on_failure_jump/0/%d", mcnt); break; case on_failure_keep_string_jump: extract_number_and_incr (&mcnt, &p); printf ("/on_failure_keep_string_jump/0/%d", mcnt); break; case dummy_failure_jump: extract_number_and_incr (&mcnt, &p); printf ("/dummy_failure_jump/0/%d", mcnt); break; case push_dummy_failure: printf ("/push_dummy_failure"); break; case maybe_pop_jump: extract_number_and_incr (&mcnt, &p); printf ("/maybe_pop_jump/0/%d", mcnt); break; case pop_failure_jump: extract_number_and_incr (&mcnt, &p); printf ("/pop_failure_jump/0/%d", mcnt); break; case jump_past_alt: extract_number_and_incr (&mcnt, &p); printf ("/jump_past_alt/0/%d", mcnt); break; case jump: extract_number_and_incr (&mcnt, &p); printf ("/jump/0/%d", mcnt); break; case succeed_n: extract_number_and_incr (&mcnt, &p); extract_number_and_incr (&mcnt2, &p); printf ("/succeed_n/0/%d/0/%d", mcnt, mcnt2); break; case jump_n: extract_number_and_incr (&mcnt, &p); extract_number_and_incr (&mcnt2, &p); printf ("/jump_n/0/%d/0/%d", mcnt, 
                  mcnt2);
          break;

        case set_number_at:
          extract_number_and_incr (&mcnt, &p);
          extract_number_and_incr (&mcnt2, &p);
          printf ("/set_number_at/0/%d/0/%d", mcnt, mcnt2);
          break;

        case wordbound:
          printf ("/wordbound");
          break;

        case notwordbound:
          printf ("/notwordbound");
          break;

        case wordbeg:
          printf ("/wordbeg");
          break;

        case wordend:
          printf ("/wordend");
          break;

#ifdef emacs
        case before_dot:
          printf ("/before_dot");
          break;

        case at_dot:
          printf ("/at_dot");
          break;

        case after_dot:
          printf ("/after_dot");
          break;

        case syntaxspec:
          printf ("/syntaxspec");
          mcnt = *p++;
          printf ("/%d", mcnt);
          break;

        case notsyntaxspec:
          printf ("/notsyntaxspec");
          mcnt = *p++;
          printf ("/%d", mcnt);
          break;
#endif /* emacs */

        case wordchar:
          printf ("/wordchar");
          break;

        case notwordchar:
          printf ("/notwordchar");
          break;

        case begbuf:
          printf ("/begbuf");
          break;

        case endbuf:
          printf ("/endbuf");
          break;

        default:
          printf ("?%d", *(p-1));
        }
    }
  printf ("/\n");
}


void
print_compiled_pattern (bufp)
    struct re_pattern_buffer *bufp;
{
  unsigned char *buffer = bufp->buffer;

  print_partial_compiled_pattern (buffer, buffer + bufp->used);
  printf ("%d bytes used/%d bytes allocated.\n", bufp->used, bufp->allocated);

  if (bufp->fastmap_accurate && bufp->fastmap)
    {
      printf ("fastmap: ");
      print_fastmap (bufp->fastmap);
    }

  printf ("re_nsub: %d\t", bufp->re_nsub);
  printf ("regs_alloc: %d\t", bufp->regs_allocated);
  printf ("can_be_null: %d\t", bufp->can_be_null);
  printf ("newline_anchor: %d\n", bufp->newline_anchor);
  printf ("no_sub: %d\t", bufp->no_sub);
  printf ("not_bol: %d\t", bufp->not_bol);
  printf ("not_eol: %d\t", bufp->not_eol);
  printf ("syntax: %d\n", bufp->syntax);
  /* Perhaps we should print the translate table?
*/ } void print_double_string (where, string1, size1, string2, size2) const char *where; const char *string1; const char *string2; int size1; int size2; { unsigned this_char; if (where == NULL) printf ("(null)"); else { if (FIRST_STRING_P (where)) { for (this_char = where - string1; this_char < size1; this_char++) printchar (string1[this_char]); where = string2; } for (this_char = where - string2; this_char < size2; this_char++) printchar (string2[this_char]); } } #else /* not DEBUG */ #undef assert #define assert(e) #define DEBUG_STATEMENT(e) #define DEBUG_PRINT1(x) #define DEBUG_PRINT2(x1, x2) #define DEBUG_PRINT3(x1, x2, x3) #define DEBUG_PRINT4(x1, x2, x3, x4) #define DEBUG_PRINT_COMPILED_PATTERN(p, s, e) #define DEBUG_PRINT_DOUBLE_STRING(w, s1, sz1, s2, sz2) #endif /* not DEBUG */ /* Set by `re_set_syntax' to the current regexp syntax to recognize. Can also be assigned to arbitrarily: each pattern buffer stores its own syntax, so it can be changed between regex compilations. */ reg_syntax_t re_syntax_options = RE_SYNTAX_EMACS; /* Specify the precise syntax of regexps for compilation. This provides for compatibility for various utilities which historically have different, incompatible syntaxes. The argument SYNTAX is a bit mask comprised of the various bits defined in regex.h. We return the old syntax. */ reg_syntax_t re_set_syntax (syntax) reg_syntax_t syntax; { reg_syntax_t ret = re_syntax_options; re_syntax_options = syntax; return ret; } /* This table gives an error message for each of the error codes listed in regex.h. Obviously the order here has to be same as there. 
*/ static const char *re_error_msg[] = { NULL, /* REG_NOERROR */ "No match", /* REG_NOMATCH */ "Invalid regular expression", /* REG_BADPAT */ "Invalid collation character", /* REG_ECOLLATE */ "Invalid character class name", /* REG_ECTYPE */ "Trailing backslash", /* REG_EESCAPE */ "Invalid back reference", /* REG_ESUBREG */ "Unmatched [ or [^", /* REG_EBRACK */ "Unmatched ( or \\(", /* REG_EPAREN */ "Unmatched \\{", /* REG_EBRACE */ "Invalid content of \\{\\}", /* REG_BADBR */ "Invalid range end", /* REG_ERANGE */ "Memory exhausted", /* REG_ESPACE */ "Invalid preceding regular expression", /* REG_BADRPT */ "Premature end of regular expression", /* REG_EEND */ "Regular expression too big", /* REG_ESIZE */ "Unmatched ) or \\)", /* REG_ERPAREN */ }; /* Subroutine declarations and macros for regex_compile. */ static void store_op1 (), store_op2 (); static void insert_op1 (), insert_op2 (); static boolean at_begline_loc_p (), at_endline_loc_p (); static boolean group_in_compile_stack (); static reg_errcode_t compile_range (); /* Fetch the next character in the uncompiled pattern---translating it if necessary. Also cast from a signed character in the constant string passed to us by the user to an unsigned char that we can use as an array index (in, e.g., `translate'). */ #define PATFETCH(c) \ do {if (p == pend) return REG_EEND; \ c = (unsigned char) *p++; \ if (translate) c = translate[c]; \ } while (0) /* Fetch the next character in the uncompiled pattern, with no translation. */ #define PATFETCH_RAW(c) \ do {if (p == pend) return REG_EEND; \ c = (unsigned char) *p++; \ } while (0) /* Go backwards one character in the pattern. */ #define PATUNFETCH p-- /* If `translate' is non-null, return translate[D], else just D. We cast the subscript to translate because some data is declared as `char *', to avoid warnings when a string constant is passed. But when we use a character as a subscript we must make it unsigned. */ #define TRANSLATE(d) (translate ? 
translate[(unsigned char) (d)] : (d)) /* Macros for outputting the compiled pattern into `buffer'. */ /* If the buffer isn't allocated when it comes in, use this. */ #define INIT_BUF_SIZE 32 /* Make sure we have at least N more bytes of space in buffer. */ #define GET_BUFFER_SPACE(n) \ while (b - bufp->buffer + (n) > bufp->allocated) \ EXTEND_BUFFER () /* Make sure we have one more byte of buffer space and then add C to it. */ #define BUF_PUSH(c) \ do { \ GET_BUFFER_SPACE (1); \ *b++ = (unsigned char) (c); \ } while (0) /* Ensure we have two more bytes of buffer space and then append C1 and C2. */ #define BUF_PUSH_2(c1, c2) \ do { \ GET_BUFFER_SPACE (2); \ *b++ = (unsigned char) (c1); \ *b++ = (unsigned char) (c2); \ } while (0) /* As with BUF_PUSH_2, except for three bytes. */ #define BUF_PUSH_3(c1, c2, c3) \ do { \ GET_BUFFER_SPACE (3); \ *b++ = (unsigned char) (c1); \ *b++ = (unsigned char) (c2); \ *b++ = (unsigned char) (c3); \ } while (0) /* Store a jump with opcode OP at LOC to location TO. We store a relative address offset by the three bytes the jump itself occupies. */ #define STORE_JUMP(op, loc, to) \ store_op1 (op, loc, (to) - (loc) - 3) /* Likewise, for a two-argument jump. */ #define STORE_JUMP2(op, loc, to, arg) \ store_op2 (op, loc, (to) - (loc) - 3, arg) /* Like `STORE_JUMP', but for inserting. Assume `b' is the buffer end. */ #define INSERT_JUMP(op, loc, to) \ insert_op1 (op, loc, (to) - (loc) - 3, b) /* Like `STORE_JUMP2', but for inserting. Assume `b' is the buffer end. */ #define INSERT_JUMP2(op, loc, to, arg) \ insert_op2 (op, loc, (to) - (loc) - 3, arg, b) /* This is not an arbitrary limit: the arguments which represent offsets into the pattern are two bytes long. So if 2^16 bytes turns out to be too small, many things would have to change. */ #define MAX_BUF_SIZE (1L << 16) /* Extend the buffer by twice its current size via realloc and reset the pointers that pointed into the old block to point to the correct places in the new one. 
If extending the buffer results in it being larger than MAX_BUF_SIZE, then flag memory exhausted. */ #define EXTEND_BUFFER() \ do { \ unsigned char *old_buffer = bufp->buffer; \ if (bufp->allocated == MAX_BUF_SIZE) \ return REG_ESIZE; \ bufp->allocated <<= 1; \ if (bufp->allocated > MAX_BUF_SIZE) \ bufp->allocated = MAX_BUF_SIZE; \ bufp->buffer = (unsigned char *) realloc (bufp->buffer, bufp->allocated);\ if (bufp->buffer == NULL) \ return REG_ESPACE; \ /* If the buffer moved, move all the pointers into it. */ \ if (old_buffer != bufp->buffer) \ { \ b = (b - old_buffer) + bufp->buffer; \ begalt = (begalt - old_buffer) + bufp->buffer; \ if (fixup_alt_jump) \ fixup_alt_jump = (fixup_alt_jump - old_buffer) + bufp->buffer;\ if (laststart) \ laststart = (laststart - old_buffer) + bufp->buffer; \ if (pending_exact) \ pending_exact = (pending_exact - old_buffer) + bufp->buffer; \ } \ } while (0) /* Since we have one byte reserved for the register number argument to {start,stop}_memory, the maximum number of groups we can report things about is what fits in that byte. */ #define MAX_REGNUM 255 /* But patterns can have more than `MAX_REGNUM' registers. We just ignore the excess. */ typedef unsigned regnum_t; /* Macros for the compile stack. */ /* Since offsets can go either forwards or backwards, this type needs to be able to hold values from -(MAX_BUF_SIZE - 1) to MAX_BUF_SIZE - 1. */ typedef int pattern_offset_t; typedef struct { pattern_offset_t begalt_offset; pattern_offset_t fixup_alt_jump; pattern_offset_t inner_group_offset; pattern_offset_t laststart_offset; regnum_t regnum; } compile_stack_elt_t; typedef struct { compile_stack_elt_t *stack; unsigned size; unsigned avail; /* Offset of next open position. */ } compile_stack_type; #define INIT_COMPILE_STACK_SIZE 32 #define COMPILE_STACK_EMPTY (compile_stack.avail == 0) #define COMPILE_STACK_FULL (compile_stack.avail == compile_stack.size) /* The next available element. 
*/
#define COMPILE_STACK_TOP (compile_stack.stack[compile_stack.avail])


/* Set the bit for character C in a list.  */
#define SET_LIST_BIT(c)                               \
  (b[((unsigned char) (c)) / BYTEWIDTH]               \
   |= 1 << (((unsigned char) (c)) % BYTEWIDTH))


/* Get the next unsigned number in the uncompiled pattern.  */
#define GET_UNSIGNED_NUMBER(num)                                        \
  { if (p != pend)                                                      \
     {                                                                  \
       PATFETCH (c);                                                    \
       while (ISDIGIT (c))                                              \
         {                                                              \
           if (num < 0)                                                 \
              num = 0;                                                  \
           num = num * 10 + c - '0';                                    \
           if (p == pend)                                               \
              break;                                                    \
           PATFETCH (c);                                                \
         }                                                              \
       }                                                                \
    }

#define CHAR_CLASS_MAX_LENGTH  6 /* Namely, `xdigit'.  */

#define IS_CHAR_CLASS(string)                                           \
   (STREQ (string, "alpha") || STREQ (string, "upper")                  \
    || STREQ (string, "lower") || STREQ (string, "digit")               \
    || STREQ (string, "alnum") || STREQ (string, "xdigit")              \
    || STREQ (string, "space") || STREQ (string, "print")               \
    || STREQ (string, "punct") || STREQ (string, "graph")               \
    || STREQ (string, "cntrl") || STREQ (string, "blank"))

/* `regex_compile' compiles PATTERN (of length SIZE) according to SYNTAX.
   Returns one of the error codes defined in `regex.h', or zero for success.

   Assumes the `allocated' (and perhaps `buffer') and `translate'
   fields are set in BUFP on entry.

   If it succeeds, results are put in BUFP (if it returns an error, the
   contents of BUFP are undefined):
     `buffer' is the compiled pattern;
     `syntax' is set to SYNTAX;
     `used' is set to the length of the compiled pattern;
     `fastmap_accurate' is zero;
     `re_nsub' is the number of subexpressions in PATTERN;
     `not_bol' and `not_eol' are zero;

   The `fastmap' and `newline_anchor' fields are neither
   examined nor set.  */

static reg_errcode_t
regex_compile (pattern, size, syntax, bufp)
     const char *pattern;
     int size;
     reg_syntax_t syntax;
     struct re_pattern_buffer *bufp;
{
  /* We fetch characters from PATTERN here.  Even though PATTERN is
     `char *' (i.e., signed), we declare these variables as unsigned, so
     they can be reliably used as array indices.  */
  register unsigned char c, c1;

  /* A random temporary spot in PATTERN.
*/ const char *p1; /* Points to the end of the buffer, where we should append. */ register unsigned char *b; /* Keeps track of unclosed groups. */ compile_stack_type compile_stack; /* Points to the current (ending) position in the pattern. */ const char *p = pattern; const char *pend = pattern + size; /* How to translate the characters in the pattern. */ char *translate = bufp->translate; /* Address of the count-byte of the most recently inserted `exactn' command. This makes it possible to tell if a new exact-match character can be added to that command or if the character requires a new `exactn' command. */ unsigned char *pending_exact = 0; /* Address of start of the most recently finished expression. This tells, e.g., postfix * where to find the start of its operand. Reset at the beginning of groups and alternatives. */ unsigned char *laststart = 0; /* Address of beginning of regexp, or inside of last group. */ unsigned char *begalt; /* Place in the uncompiled pattern (i.e., the {) to which to go back if the interval is invalid. */ const char *beg_interval; /* Address of the place where a forward jump should go to the end of the containing expression. Each alternative of an `or' -- except the last -- ends with a forward jump of this sort. */ unsigned char *fixup_alt_jump = 0; /* Counts open-groups as they are encountered. Remembered for the matching close-group on the compile stack, so the same register number is put in the stop_memory as the start_memory. */ regnum_t regnum = 0; #ifdef DEBUG DEBUG_PRINT1 ("\nCompiling pattern: "); if (debug) { unsigned debug_count; for (debug_count = 0; debug_count < size; debug_count++) printchar (pattern[debug_count]); putchar ('\n'); } #endif /* DEBUG */ /* Initialize the compile stack. */ compile_stack.stack = TALLOC (INIT_COMPILE_STACK_SIZE, compile_stack_elt_t); if (compile_stack.stack == NULL) return REG_ESPACE; compile_stack.size = INIT_COMPILE_STACK_SIZE; compile_stack.avail = 0; /* Initialize the pattern buffer. 
*/ bufp->syntax = syntax; bufp->fastmap_accurate = 0; bufp->not_bol = bufp->not_eol = 0; /* Set `used' to zero, so that if we return an error, the pattern printer (for debugging) will think there's no pattern. We reset it at the end. */ bufp->used = 0; /* Always count groups, whether or not bufp->no_sub is set. */ bufp->re_nsub = 0; #if !defined (emacs) && !defined (SYNTAX_TABLE) /* Initialize the syntax table. */ init_syntax_once (); #endif if (bufp->allocated == 0) { if (bufp->buffer) { /* If zero allocated, but buffer is non-null, try to realloc enough space. This loses if buffer's address is bogus, but that is the user's responsibility. */ RETALLOC (bufp->buffer, INIT_BUF_SIZE, unsigned char); } else { /* Caller did not allocate a buffer. Do it for them. */ bufp->buffer = TALLOC (INIT_BUF_SIZE, unsigned char); } if (!bufp->buffer) return REG_ESPACE; bufp->allocated = INIT_BUF_SIZE; } begalt = b = bufp->buffer; /* Loop through the uncompiled pattern until we're at the end. */ while (p != pend) { PATFETCH (c); switch (c) { case '^': { if ( /* If at start of pattern, it's an operator. */ p == pattern + 1 /* If context independent, it's an operator. */ || syntax & RE_CONTEXT_INDEP_ANCHORS /* Otherwise, depends on what's come before. */ || at_begline_loc_p (pattern, p, syntax)) BUF_PUSH (begline); else goto normal_char; } break; case '$': { if ( /* If at end of pattern, it's an operator. */ p == pend /* If context independent, it's an operator. */ || syntax & RE_CONTEXT_INDEP_ANCHORS /* Otherwise, depends on what's next. */ || at_endline_loc_p (p, pend, syntax)) BUF_PUSH (endline); else goto normal_char; } break; case '+': case '?': if ((syntax & RE_BK_PLUS_QM) || (syntax & RE_LIMITED_OPS)) goto normal_char; handle_plus: case '*': /* If there is no previous pattern... */ if (!laststart) { if (syntax & RE_CONTEXT_INVALID_OPS) return REG_BADRPT; else if (!(syntax & RE_CONTEXT_INDEP_OPS)) goto normal_char; } { /* Are we optimizing this jump? 
*/ boolean keep_string_p = false; /* 1 means zero (many) matches is allowed. */ char zero_times_ok = 0, many_times_ok = 0; /* If there is a sequence of repetition chars, collapse it down to just one (the right one). We can't combine interval operators with these because of, e.g., `a{2}*', which should only match an even number of `a's. */ for (;;) { zero_times_ok |= c != '+'; many_times_ok |= c != '?'; if (p == pend) break; PATFETCH (c); if (c == '*' || (!(syntax & RE_BK_PLUS_QM) && (c == '+' || c == '?'))) ; else if (syntax & RE_BK_PLUS_QM && c == '\\') { if (p == pend) return REG_EESCAPE; PATFETCH (c1); if (!(c1 == '+' || c1 == '?')) { PATUNFETCH; PATUNFETCH; break; } c = c1; } else { PATUNFETCH; break; } /* If we get here, we found another repeat character. */ } /* Star, etc. applied to an empty pattern is equivalent to an empty pattern. */ if (!laststart) break; /* Now we know whether or not zero matches is allowed and also whether or not two or more matches is allowed. */ if (many_times_ok) { /* More than one repetition is allowed, so put in at the end a backward relative jump from `b' to before the next jump we're going to put in below (which jumps from laststart to after this jump). But if we are at the `*' in the exact sequence `.*\n', insert an unconditional jump backwards to the ., instead of the beginning of the loop. This way we only push a failure point once, instead of every time through the loop. */ assert (p - 1 > pattern); /* Allocate the space for the jump. */ GET_BUFFER_SPACE (3); /* We know we are not at the first character of the pattern, because laststart was nonzero. And we've already incremented `p', by the way, to be the character after the `*'. Do we have to do something analogous here for null bytes, because of RE_DOT_NOT_NULL? */ if (TRANSLATE (*(p - 2)) == TRANSLATE ('.') && zero_times_ok && p < pend && TRANSLATE (*p) == TRANSLATE ('\n') && !(syntax & RE_DOT_NEWLINE)) { /* We have .*\n. 
*/ STORE_JUMP (jump, b, laststart); keep_string_p = true; } else /* Anything else. */ STORE_JUMP (maybe_pop_jump, b, laststart - 3); /* We've added more stuff to the buffer. */ b += 3; } /* On failure, jump from laststart to b + 3, which will be the end of the buffer after this jump is inserted. */ GET_BUFFER_SPACE (3); INSERT_JUMP (keep_string_p ? on_failure_keep_string_jump : on_failure_jump, laststart, b + 3); pending_exact = 0; b += 3; if (!zero_times_ok) { /* At least one repetition is required, so insert a `dummy_failure_jump' before the initial `on_failure_jump' instruction of the loop. This effects a skip over that instruction the first time we hit that loop. */ GET_BUFFER_SPACE (3); INSERT_JUMP (dummy_failure_jump, laststart, laststart + 6); b += 3; } } break; case '.': laststart = b; BUF_PUSH (anychar); break; case '[': { boolean had_char_class = false; if (p == pend) return REG_EBRACK; /* Ensure that we have enough space to push a charset: the opcode, the length count, and the bitset; 34 bytes in all. */ GET_BUFFER_SPACE (34); laststart = b; /* We test `*p == '^' twice, instead of using an if statement, so we only need one BUF_PUSH. */ BUF_PUSH (*p == '^' ? charset_not : charset); if (*p == '^') p++; /* Remember the first position in the bracket expression. */ p1 = p; /* Push the number of bytes in the bitmap. */ BUF_PUSH ((1 << BYTEWIDTH) / BYTEWIDTH); /* Clear the whole map. */ bzero (b, (1 << BYTEWIDTH) / BYTEWIDTH); /* charset_not matches newline according to a syntax bit. */ if ((re_opcode_t) b[-2] == charset_not && (syntax & RE_HAT_LISTS_NOT_NEWLINE)) SET_LIST_BIT ('\n'); /* Read in characters and ranges, setting map bits. */ for (;;) { if (p == pend) return REG_EBRACK; PATFETCH (c); /* \ might escape characters inside [...] and [^...]. */ if ((syntax & RE_BACKSLASH_ESCAPE_IN_LISTS) && c == '\\') { if (p == pend) return REG_EESCAPE; PATFETCH (c1); SET_LIST_BIT (c1); continue; } /* Could be the end of the bracket expression. 
If it's not (i.e., when the bracket expression is `[]' so far),
               the ']' character bit gets set way below. */
            if (c == ']' && p != p1 + 1)
              break;

            /* Look ahead to see if it's a range when the last thing
               was a character class. */
            if (had_char_class && c == '-' && *p != ']')
              return REG_ERANGE;

            /* Look ahead to see if it's a range when the last thing
               was a character: if this is a hyphen not at the
               beginning or the end of a list, then it's the range
               operator. */
            if (c == '-'
                && !(p - 2 >= pattern && p[-2] == '[')
                && !(p - 3 >= pattern && p[-3] == '[' && p[-2] == '^')
                && *p != ']')
              {
                reg_errcode_t ret
                  = compile_range (&p, pend, translate, syntax, b);
                if (ret != REG_NOERROR) return ret;
              }

            else if (p[0] == '-' && p[1] != ']')
              { /* This handles ranges made up of characters only. */
                reg_errcode_t ret;

                /* Move past the `-'. */
                PATFETCH (c1);

                ret = compile_range (&p, pend, translate, syntax, b);
                if (ret != REG_NOERROR) return ret;
              }

            /* See if we're at the beginning of a possible character
               class. */
            else if (syntax & RE_CHAR_CLASSES && c == '[' && *p == ':')
              { /* Leave room for the null. */
                char str[CHAR_CLASS_MAX_LENGTH + 1];

                PATFETCH (c);
                c1 = 0;

                /* If pattern is `[[:'. */
                if (p == pend) return REG_EBRACK;

                for (;;)
                  {
                    PATFETCH (c);
                    if (c == ':' || c == ']' || p == pend
                        || c1 == CHAR_CLASS_MAX_LENGTH)
                      break;
                    str[c1++] = c;
                  }
                str[c1] = '\0';

                /* If this isn't a word bracketed by `[:' and `:]':
                   undo the ending character, the letters, and leave
                   the leading `:' and `[' (but set bits for them). 
*/ if (c == ':' && *p == ']') { int ch; boolean is_alnum = STREQ (str, "alnum"); boolean is_alpha = STREQ (str, "alpha"); boolean is_blank = STREQ (str, "blank"); boolean is_cntrl = STREQ (str, "cntrl"); boolean is_digit = STREQ (str, "digit"); boolean is_graph = STREQ (str, "graph"); boolean is_lower = STREQ (str, "lower"); boolean is_print = STREQ (str, "print"); boolean is_punct = STREQ (str, "punct"); boolean is_space = STREQ (str, "space"); boolean is_upper = STREQ (str, "upper"); boolean is_xdigit = STREQ (str, "xdigit"); if (!IS_CHAR_CLASS (str)) return REG_ECTYPE; /* Throw away the ] at the end of the character class. */ PATFETCH (c); if (p == pend) return REG_EBRACK; for (ch = 0; ch < 1 << BYTEWIDTH; ch++) { if ( (is_alnum && ISALNUM (ch)) || (is_alpha && ISALPHA (ch)) || (is_blank && ISBLANK (ch)) || (is_cntrl && ISCNTRL (ch)) || (is_digit && ISDIGIT (ch)) || (is_graph && ISGRAPH (ch)) || (is_lower && ISLOWER (ch)) || (is_print && ISPRINT (ch)) || (is_punct && ISPUNCT (ch)) || (is_space && ISSPACE (ch)) || (is_upper && ISUPPER (ch)) || (is_xdigit && ISXDIGIT (ch))) SET_LIST_BIT (ch); } had_char_class = true; } else { c1++; while (c1--) PATUNFETCH; SET_LIST_BIT ('['); SET_LIST_BIT (':'); had_char_class = false; } } else { had_char_class = false; SET_LIST_BIT (c); } } /* Discard any (non)matching list bytes that are all 0 at the end of the map. Decrease the map-length byte too. 
*/ while ((int) b[-1] > 0 && b[b[-1] - 1] == 0) b[-1]--; b += b[-1]; } break; case '(': if (syntax & RE_NO_BK_PARENS) goto handle_open; else goto normal_char; case ')': if (syntax & RE_NO_BK_PARENS) goto handle_close; else goto normal_char; case '\n': if (syntax & RE_NEWLINE_ALT) goto handle_alt; else goto normal_char; case '|': if (syntax & RE_NO_BK_VBAR) goto handle_alt; else goto normal_char; case '{': if (syntax & RE_INTERVALS && syntax & RE_NO_BK_BRACES) goto handle_interval; else goto normal_char; case '\\': if (p == pend) return REG_EESCAPE; /* Do not translate the character after the \, so that we can distinguish, e.g., \B from \b, even if we normally would translate, e.g., B to b. */ PATFETCH_RAW (c); switch (c) { case '(': if (syntax & RE_NO_BK_PARENS) goto normal_backslash; handle_open: bufp->re_nsub++; regnum++; if (COMPILE_STACK_FULL) { RETALLOC (compile_stack.stack, compile_stack.size << 1, compile_stack_elt_t); if (compile_stack.stack == NULL) return REG_ESPACE; compile_stack.size <<= 1; } /* These are the values to restore when we hit end of this group. They are all relative offsets, so that if the whole pattern moves because of realloc, they will still be valid. */ COMPILE_STACK_TOP.begalt_offset = begalt - bufp->buffer; COMPILE_STACK_TOP.fixup_alt_jump = fixup_alt_jump ? fixup_alt_jump - bufp->buffer + 1 : 0; COMPILE_STACK_TOP.laststart_offset = b - bufp->buffer; COMPILE_STACK_TOP.regnum = regnum; /* We will eventually replace the 0 with the number of groups inner to this one. But do not push a start_memory for groups beyond the last one we can represent in the compiled pattern. */ if (regnum <= MAX_REGNUM) { COMPILE_STACK_TOP.inner_group_offset = b - bufp->buffer + 2; BUF_PUSH_3 (start_memory, regnum, 0); } compile_stack.avail++; fixup_alt_jump = 0; laststart = 0; begalt = b; /* If we've reached MAX_REGNUM groups, then this open won't actually generate any code, so we'll have to clear pending_exact explicitly. 
*/
          pending_exact = 0;
          break;


        case ')':
          if (syntax & RE_NO_BK_PARENS) goto normal_backslash;

          if (COMPILE_STACK_EMPTY)
            {
              if (syntax & RE_UNMATCHED_RIGHT_PAREN_ORD)
                goto normal_backslash;
              else
                return REG_ERPAREN;
            }

        handle_close:
          if (fixup_alt_jump)
            { /* Push a dummy failure point at the end of the
                 alternative for a possible future
                 `pop_failure_jump' to pop. See comments at
                 `push_dummy_failure' in `re_match_2'. */
              BUF_PUSH (push_dummy_failure);

              /* We allocated space for this jump when we assigned
                 to `fixup_alt_jump', in the `handle_alt' case below. */
              STORE_JUMP (jump_past_alt, fixup_alt_jump, b - 1);
            }

          /* See similar code for backslashed left paren above. */
          if (COMPILE_STACK_EMPTY)
            {
              if (syntax & RE_UNMATCHED_RIGHT_PAREN_ORD)
                goto normal_char;
              else
                return REG_ERPAREN;
            }

          /* Since we just checked for an empty stack above, this
             ``can't happen''. */
          assert (compile_stack.avail != 0);
          {
            /* We don't just want to restore into `regnum', because
               later groups should continue to be numbered higher,
               as in `(ab)c(de)' -- the second group is #2. */
            regnum_t this_group_regnum;

            compile_stack.avail--;
            begalt = bufp->buffer + COMPILE_STACK_TOP.begalt_offset;
            fixup_alt_jump
              = COMPILE_STACK_TOP.fixup_alt_jump
                ? bufp->buffer + COMPILE_STACK_TOP.fixup_alt_jump - 1
                : 0;
            laststart = bufp->buffer + COMPILE_STACK_TOP.laststart_offset;
            this_group_regnum = COMPILE_STACK_TOP.regnum;
            /* If we've reached MAX_REGNUM groups, then this close
               won't actually generate any code, so we'll have to
               clear pending_exact explicitly. */
            pending_exact = 0;

            /* We're at the end of the group, so now we know how many
               groups were inside this one. */
            if (this_group_regnum <= MAX_REGNUM)
              {
                unsigned char *inner_group_loc
                  = bufp->buffer + COMPILE_STACK_TOP.inner_group_offset;

                *inner_group_loc = regnum - this_group_regnum;
                BUF_PUSH_3 (stop_memory, this_group_regnum,
                            regnum - this_group_regnum);
              }
          }
          break;


        case '|': /* `\|'. 
*/
          if (syntax & RE_LIMITED_OPS || syntax & RE_NO_BK_VBAR)
            goto normal_backslash;
        handle_alt:
          if (syntax & RE_LIMITED_OPS)
            goto normal_char;

          /* Insert before the previous alternative a jump which
             jumps to this alternative if the former fails. */
          GET_BUFFER_SPACE (3);
          INSERT_JUMP (on_failure_jump, begalt, b + 6);
          pending_exact = 0;
          b += 3;

          /* The alternative before this one has a jump after it
             which gets executed if it gets matched. Adjust that
             jump so it will jump to this alternative's analogous
             jump (put in below, which in turn will jump to the next
             (if any) alternative's such jump, etc.). The last such
             jump jumps to the correct final destination. A picture:
                      _____ _____
                      |   | |   |
                      |   v |   v
                     a | b   | c

             If we are at `b', then fixup_alt_jump right now points to a
             three-byte space after `a'. We'll put in the jump, set
             fixup_alt_jump to right after `b', and leave behind three
             bytes which we'll fill in when we get to after `c'. */

          if (fixup_alt_jump)
            STORE_JUMP (jump_past_alt, fixup_alt_jump, b);

          /* Mark and leave space for a jump after this alternative,
             to be filled in later either by the next alternative or
             when we know we're at the end of a series of alternatives. */
          fixup_alt_jump = b;
          GET_BUFFER_SPACE (3);
          b += 3;

          laststart = 0;
          begalt = b;
          break;


        case '{':
          /* If \{ is a literal. */
          if (!(syntax & RE_INTERVALS)
              /* If we're at `\{' and it's not the open-interval
                 operator. */
              || ((syntax & RE_INTERVALS) && (syntax & RE_NO_BK_BRACES))
              || (p - 2 == pattern && p == pend))
            goto normal_backslash;

        handle_interval:
          {
            /* If got here, then the syntax allows intervals. */

            /* At least (most) this many matches must be made. */
            int lower_bound = -1, upper_bound = -1;

            beg_interval = p - 1;

            if (p == pend)
              {
                if (syntax & RE_NO_BK_BRACES)
                  goto unfetch_interval;
                else
                  return REG_EBRACE;
              }

            GET_UNSIGNED_NUMBER (lower_bound);

            if (c == ',')
              {
                GET_UNSIGNED_NUMBER (upper_bound);
                if (upper_bound < 0) upper_bound = RE_DUP_MAX;
              }
            else
              /* Interval such as `{1}' => match exactly once. 
*/
              upper_bound = lower_bound;

            if (lower_bound < 0 || upper_bound > RE_DUP_MAX
                || lower_bound > upper_bound)
              {
                if (syntax & RE_NO_BK_BRACES)
                  goto unfetch_interval;
                else
                  return REG_BADBR;
              }

            if (!(syntax & RE_NO_BK_BRACES))
              {
                if (c != '\\') return REG_EBRACE;

                PATFETCH (c);
              }

            if (c != '}')
              {
                if (syntax & RE_NO_BK_BRACES)
                  goto unfetch_interval;
                else
                  return REG_BADBR;
              }

            /* We just parsed a valid interval. */

            /* If it's invalid to have no preceding re. */
            if (!laststart)
              {
                if (syntax & RE_CONTEXT_INVALID_OPS)
                  return REG_BADRPT;
                else if (syntax & RE_CONTEXT_INDEP_OPS)
                  laststart = b;
                else
                  goto unfetch_interval;
              }

            /* If the upper bound is zero, don't want to succeed at
               all; jump from `laststart' to `b + 3', which will be
               the end of the buffer after we insert the jump. */
            if (upper_bound == 0)
              {
                GET_BUFFER_SPACE (3);
                INSERT_JUMP (jump, laststart, b + 3);
                b += 3;
              }

            /* Otherwise, we have a nontrivial interval. When
               we're all done, the pattern will look like:
                 set_number_at <jump count> <upper bound>
                 set_number_at <succeed_n count> <lower bound>
                 succeed_n <after jump addr> <succeed_n count>
                 <body of loop>
                 jump_n <succeed_n addr> <jump count>
               (The upper bound and `jump_n' are omitted if
               `upper_bound' is 1, though.) */
            else
              { /* If the upper bound is > 1, we need to insert
                   more at the end of the loop. */
                unsigned nbytes = 10 + (upper_bound > 1) * 10;

                GET_BUFFER_SPACE (nbytes);

                /* Initialize lower bound of the `succeed_n', even
                   though it will be set during matching by its
                   attendant `set_number_at' (inserted next),
                   because `re_compile_fastmap' needs to know.
                   Jump to the `jump_n' we might insert below. */
                INSERT_JUMP2 (succeed_n, laststart,
                              b + 5 + (upper_bound > 1) * 5,
                              lower_bound);
                b += 5;

                /* Code to initialize the lower bound. Insert
                   before the `succeed_n'. The `5' is the last two
                   bytes of this `set_number_at', plus 3 bytes of
                   the following `succeed_n'. */
                insert_op2 (set_number_at, laststart, 5, lower_bound, b);
                b += 5;

                if (upper_bound > 1)
                  { /* More than one repetition is allowed, so
                       append a backward jump to the `succeed_n'
                       that starts this interval. 
When we've reached this during matching, we'll have matched the interval once, so jump back only `upper_bound - 1' times. */ STORE_JUMP2 (jump_n, b, laststart + 5, upper_bound - 1); b += 5; /* The location we want to set is the second parameter of the `jump_n'; that is `b-2' as an absolute address. `laststart' will be the `set_number_at' we're about to insert; `laststart+3' the number to set, the source for the relative address. But we are inserting into the middle of the pattern -- so everything is getting moved up by 5. Conclusion: (b - 2) - (laststart + 3) + 5, i.e., b - laststart. We insert this at the beginning of the loop so that if we fail during matching, we'll reinitialize the bounds. */ insert_op2 (set_number_at, laststart, b - laststart, upper_bound - 1, b); b += 5; } } pending_exact = 0; beg_interval = NULL; } break; unfetch_interval: /* If an invalid interval, match the characters as literals. */ assert (beg_interval); p = beg_interval; beg_interval = NULL; /* normal_char and normal_backslash need `c'. */ PATFETCH (c); if (!(syntax & RE_NO_BK_BRACES)) { if (p > pattern && p[-1] == '\\') goto normal_backslash; } goto normal_char; #ifdef emacs /* There is no way to specify the before_dot and after_dot operators. rms says this is ok. 
--karl */ case '=': BUF_PUSH (at_dot); break; case 's': laststart = b; PATFETCH (c); BUF_PUSH_2 (syntaxspec, syntax_spec_code[c]); break; case 'S': laststart = b; PATFETCH (c); BUF_PUSH_2 (notsyntaxspec, syntax_spec_code[c]); break; #endif /* emacs */ case 'w': laststart = b; BUF_PUSH (wordchar); break; case 'W': laststart = b; BUF_PUSH (notwordchar); break; case '<': BUF_PUSH (wordbeg); break; case '>': BUF_PUSH (wordend); break; case 'b': BUF_PUSH (wordbound); break; case 'B': BUF_PUSH (notwordbound); break; case '`': BUF_PUSH (begbuf); break; case '\'': BUF_PUSH (endbuf); break; case '1': case '2': case '3': case '4': case '5': case '6': case '7': case '8': case '9': if (syntax & RE_NO_BK_REFS) goto normal_char; c1 = c - '0'; if (c1 > regnum) return REG_ESUBREG; /* Can't back reference to a subexpression if inside of it. */ if (group_in_compile_stack (compile_stack, c1)) goto normal_char; laststart = b; BUF_PUSH_2 (duplicate, c1); break; case '+': case '?': if (syntax & RE_BK_PLUS_QM) goto handle_plus; else goto normal_backslash; default: normal_backslash: /* You might think it would be useful for \ to mean not to translate; but if we don't translate it it will never match anything. */ c = TRANSLATE (c); goto normal_char; } break; default: /* Expects the character in `c'. */ normal_char: /* If no exactn currently being built. */ if (!pending_exact /* If last exactn not at current position. */ || pending_exact + *pending_exact + 1 != b /* We have only one byte following the exactn for the count. */ || *pending_exact == (1 << BYTEWIDTH) - 1 /* If followed by a repetition operator. */ || *p == '*' || *p == '^' || ((syntax & RE_BK_PLUS_QM) ? *p == '\\' && (p[1] == '+' || p[1] == '?') : (*p == '+' || *p == '?')) || ((syntax & RE_INTERVALS) && ((syntax & RE_NO_BK_BRACES) ? *p == '{' : (p[0] == '\\' && p[1] == '{')))) { /* Start building a new exactn. 
*/ laststart = b; BUF_PUSH_2 (exactn, 0); pending_exact = b - 1; } BUF_PUSH (c); (*pending_exact)++; break; } /* switch (c) */ } /* while p != pend */ /* Through the pattern now. */ if (fixup_alt_jump) STORE_JUMP (jump_past_alt, fixup_alt_jump, b); if (!COMPILE_STACK_EMPTY) return REG_EPAREN; free (compile_stack.stack); /* We have succeeded; set the length of the buffer. */ bufp->used = b - bufp->buffer; #ifdef DEBUG if (debug) { DEBUG_PRINT1 ("\nCompiled pattern: "); print_compiled_pattern (bufp); } #endif /* DEBUG */ return REG_NOERROR; } /* regex_compile */ /* Subroutines for `regex_compile'. */ /* Store OP at LOC followed by two-byte integer parameter ARG. */ static void store_op1 (op, loc, arg) re_opcode_t op; unsigned char *loc; int arg; { *loc = (unsigned char) op; STORE_NUMBER (loc + 1, arg); } /* Like `store_op1', but for two two-byte parameters ARG1 and ARG2. */ static void store_op2 (op, loc, arg1, arg2) re_opcode_t op; unsigned char *loc; int arg1, arg2; { *loc = (unsigned char) op; STORE_NUMBER (loc + 1, arg1); STORE_NUMBER (loc + 3, arg2); } /* Copy the bytes from LOC to END to open up three bytes of space at LOC for OP followed by two-byte integer parameter ARG. */ static void insert_op1 (op, loc, arg, end) re_opcode_t op; unsigned char *loc; int arg; unsigned char *end; { register unsigned char *pfrom = end; register unsigned char *pto = end + 3; while (pfrom != loc) *--pto = *--pfrom; store_op1 (op, loc, arg); } /* Like `insert_op1', but for two two-byte parameters ARG1 and ARG2. */ static void insert_op2 (op, loc, arg1, arg2, end) re_opcode_t op; unsigned char *loc; int arg1, arg2; unsigned char *end; { register unsigned char *pfrom = end; register unsigned char *pto = end + 5; while (pfrom != loc) *--pto = *--pfrom; store_op2 (op, loc, arg1, arg2); } /* P points to just after a ^ in PATTERN. Return true if that ^ comes after an alternative or a begin-subexpression. We assume there is at least one character before the ^. 
*/ static boolean at_begline_loc_p (pattern, p, syntax) const char *pattern, *p; reg_syntax_t syntax; { const char *prev = p - 2; boolean prev_prev_backslash = prev > pattern && prev[-1] == '\\'; return /* After a subexpression? */ (*prev == '(' && (syntax & RE_NO_BK_PARENS || prev_prev_backslash)) /* After an alternative? */ || (*prev == '|' && (syntax & RE_NO_BK_VBAR || prev_prev_backslash)); } /* The dual of at_begline_loc_p. This one is for $. We assume there is at least one character after the $, i.e., `P < PEND'. */ static boolean at_endline_loc_p (p, pend, syntax) const char *p, *pend; int syntax; { const char *next = p; boolean next_backslash = *next == '\\'; const char *next_next = p + 1 < pend ? p + 1 : NULL; return /* Before a subexpression? */ (syntax & RE_NO_BK_PARENS ? *next == ')' : next_backslash && next_next && *next_next == ')') /* Before an alternative? */ || (syntax & RE_NO_BK_VBAR ? *next == '|' : next_backslash && next_next && *next_next == '|'); } /* Returns true if REGNUM is in one of COMPILE_STACK's elements and false if it's not. */ static boolean group_in_compile_stack (compile_stack, regnum) compile_stack_type compile_stack; regnum_t regnum; { int this_element; for (this_element = compile_stack.avail - 1; this_element >= 0; this_element--) if (compile_stack.stack[this_element].regnum == regnum) return true; return false; } /* Read the ending character of a range (in a bracket expression) from the uncompiled pattern *P_PTR (which ends at PEND). We assume the starting character is in `P[-2]'. (`P[-1]' is the character `-'.) Then we set the translation of all bits between the starting and ending characters (inclusive) in the compiled pattern B. Return an error code. We use these short variable names so we can use the same macros as `regex_compile' itself. 
*/ static reg_errcode_t compile_range (p_ptr, pend, translate, syntax, b) const char **p_ptr, *pend; char *translate; reg_syntax_t syntax; unsigned char *b; { unsigned this_char; const char *p = *p_ptr; int range_start, range_end; if (p == pend) return REG_ERANGE; /* Even though the pattern is a signed `char *', we need to fetch with unsigned char *'s; if the high bit of the pattern character is set, the range endpoints will be negative if we fetch using a signed char *. We also want to fetch the endpoints without translating them; the appropriate translation is done in the bit-setting loop below. */ range_start = ((unsigned char *) p)[-2]; range_end = ((unsigned char *) p)[0]; /* Have to increment the pointer into the pattern string, so the caller isn't still at the ending character. */ (*p_ptr)++; /* If the start is after the end, the range is empty. */ if (range_start > range_end) return syntax & RE_NO_EMPTY_RANGES ? REG_ERANGE : REG_NOERROR; /* Here we see why `this_char' has to be larger than an `unsigned char' -- the range is inclusive, so if `range_end' == 0xff (assuming 8-bit characters), we would otherwise go into an infinite loop, since all characters <= 0xff. */ for (this_char = range_start; this_char <= range_end; this_char++) { SET_LIST_BIT (TRANSLATE (this_char)); } return REG_NOERROR; } /* Failure stack declarations and macros; both re_compile_fastmap and re_match_2 use a failure stack. These have to be macros because of REGEX_ALLOCATE. */ /* Number of failure points for which to initially allocate space when matching. If this number is exceeded, we allocate more space, so it is not a hard limit. */ #ifndef INIT_FAILURE_ALLOC #define INIT_FAILURE_ALLOC 5 #endif /* Roughly the maximum number of failure points on the stack. Would be exactly that if always used MAX_FAILURE_SPACE each time we failed. This is a variable only so users of regex can assign to it; we never change it ourselves. 
*/ int re_max_failures = 2000; typedef const unsigned char *fail_stack_elt_t; typedef struct { fail_stack_elt_t *stack; unsigned size; unsigned avail; /* Offset of next open position. */ } fail_stack_type; #define FAIL_STACK_EMPTY() (fail_stack.avail == 0) #define FAIL_STACK_PTR_EMPTY() (fail_stack_ptr->avail == 0) #define FAIL_STACK_FULL() (fail_stack.avail == fail_stack.size) #define FAIL_STACK_TOP() (fail_stack.stack[fail_stack.avail]) /* Initialize `fail_stack'. Do `return -2' if the alloc fails. */ #define INIT_FAIL_STACK() \ do { \ fail_stack.stack = (fail_stack_elt_t *) \ REGEX_ALLOCATE (INIT_FAILURE_ALLOC * sizeof (fail_stack_elt_t)); \ \ if (fail_stack.stack == NULL) \ return -2; \ \ fail_stack.size = INIT_FAILURE_ALLOC; \ fail_stack.avail = 0; \ } while (0) /* Double the size of FAIL_STACK, up to approximately `re_max_failures' items. Return 1 if succeeds, and 0 if either ran out of memory allocating space for it or it was already too large. REGEX_REALLOCATE requires `destination' be declared. */ #define DOUBLE_FAIL_STACK(fail_stack) \ ((fail_stack).size > re_max_failures * MAX_FAILURE_ITEMS \ ? 0 \ : ((fail_stack).stack = (fail_stack_elt_t *) \ REGEX_REALLOCATE ((fail_stack).stack, \ (fail_stack).size * sizeof (fail_stack_elt_t), \ ((fail_stack).size << 1) * sizeof (fail_stack_elt_t)), \ \ (fail_stack).stack == NULL \ ? 0 \ : ((fail_stack).size <<= 1, \ 1))) /* Push PATTERN_OP on FAIL_STACK. Return 1 if was able to do so and 0 if ran out of memory allocating space to do so. */ #define PUSH_PATTERN_OP(pattern_op, fail_stack) \ ((FAIL_STACK_FULL () \ && !DOUBLE_FAIL_STACK (fail_stack)) \ ? 0 \ : ((fail_stack).stack[(fail_stack).avail++] = pattern_op, \ 1)) /* This pushes an item onto the failure stack. Must be a four-byte value. Assumes the variable `fail_stack'. Probably should only be called from within `PUSH_FAILURE_POINT'. */ #define PUSH_FAILURE_ITEM(item) \ fail_stack.stack[fail_stack.avail++] = (fail_stack_elt_t) item /* The complement operation. 
Assumes `fail_stack' is nonempty. */ #define POP_FAILURE_ITEM() fail_stack.stack[--fail_stack.avail] /* Used to omit pushing failure point id's when we're not debugging. */ #ifdef DEBUG #define DEBUG_PUSH PUSH_FAILURE_ITEM #define DEBUG_POP(item_addr) *(item_addr) = POP_FAILURE_ITEM () #else #define DEBUG_PUSH(item) #define DEBUG_POP(item_addr) #endif /* Push the information about the state we will need if we ever fail back to it. Requires variables fail_stack, regstart, regend, reg_info, and num_regs be declared. DOUBLE_FAIL_STACK requires `destination' be declared. Does `return FAILURE_CODE' if runs out of memory. */ #define PUSH_FAILURE_POINT(pattern_place, string_place, failure_code) \ do { \ char *destination; \ /* Must be int, so when we don't save any registers, the arithmetic \ of 0 + -1 isn't done as unsigned. */ \ int this_reg; \ \ DEBUG_STATEMENT (failure_id++); \ DEBUG_STATEMENT (nfailure_points_pushed++); \ DEBUG_PRINT2 ("\nPUSH_FAILURE_POINT #%u:\n", failure_id); \ DEBUG_PRINT2 (" Before push, next avail: %d\n", (fail_stack).avail);\ DEBUG_PRINT2 (" size: %d\n", (fail_stack).size);\ \ DEBUG_PRINT2 (" slots needed: %d\n", NUM_FAILURE_ITEMS); \ DEBUG_PRINT2 (" available: %d\n", REMAINING_AVAIL_SLOTS); \ \ /* Ensure we have enough space allocated for what we will push. */ \ while (REMAINING_AVAIL_SLOTS < NUM_FAILURE_ITEMS) \ { \ if (!DOUBLE_FAIL_STACK (fail_stack)) \ return failure_code; \ \ DEBUG_PRINT2 ("\n Doubled stack; size now: %d\n", \ (fail_stack).size); \ DEBUG_PRINT2 (" slots available: %d\n", REMAINING_AVAIL_SLOTS);\ } \ \ /* Push the info, starting with the registers. 
*/ \ DEBUG_PRINT1 ("\n"); \ \ for (this_reg = lowest_active_reg; this_reg <= highest_active_reg; \ this_reg++) \ { \ DEBUG_PRINT2 (" Pushing reg: %d\n", this_reg); \ DEBUG_STATEMENT (num_regs_pushed++); \ \ DEBUG_PRINT2 (" start: 0x%x\n", regstart[this_reg]); \ PUSH_FAILURE_ITEM (regstart[this_reg]); \ \ DEBUG_PRINT2 (" end: 0x%x\n", regend[this_reg]); \ PUSH_FAILURE_ITEM (regend[this_reg]); \ \ DEBUG_PRINT2 (" info: 0x%x\n ", reg_info[this_reg]); \ DEBUG_PRINT2 (" match_null=%d", \ REG_MATCH_NULL_STRING_P (reg_info[this_reg])); \ DEBUG_PRINT2 (" active=%d", IS_ACTIVE (reg_info[this_reg])); \ DEBUG_PRINT2 (" matched_something=%d", \ MATCHED_SOMETHING (reg_info[this_reg])); \ DEBUG_PRINT2 (" ever_matched=%d", \ EVER_MATCHED_SOMETHING (reg_info[this_reg])); \ DEBUG_PRINT1 ("\n"); \ PUSH_FAILURE_ITEM (reg_info[this_reg].word); \ } \ \ DEBUG_PRINT2 (" Pushing low active reg: %d\n", lowest_active_reg);\ PUSH_FAILURE_ITEM (lowest_active_reg); \ \ DEBUG_PRINT2 (" Pushing high active reg: %d\n", highest_active_reg);\ PUSH_FAILURE_ITEM (highest_active_reg); \ \ DEBUG_PRINT2 (" Pushing pattern 0x%x: ", pattern_place); \ DEBUG_PRINT_COMPILED_PATTERN (bufp, pattern_place, pend); \ PUSH_FAILURE_ITEM (pattern_place); \ \ DEBUG_PRINT2 (" Pushing string 0x%x: `", string_place); \ DEBUG_PRINT_DOUBLE_STRING (string_place, string1, size1, string2, \ size2); \ DEBUG_PRINT1 ("'\n"); \ PUSH_FAILURE_ITEM (string_place); \ \ DEBUG_PRINT2 (" Pushing failure id: %u\n", failure_id); \ DEBUG_PUSH (failure_id); \ } while (0) /* This is the number of items that are pushed and popped on the stack for each register. */ #define NUM_REG_ITEMS 3 /* Individual items aside from the registers. */ #ifdef DEBUG #define NUM_NONREG_ITEMS 5 /* Includes failure point id. */ #else #define NUM_NONREG_ITEMS 4 #endif /* We push at most this many items on the stack. */ #define MAX_FAILURE_ITEMS ((num_regs - 1) * NUM_REG_ITEMS + NUM_NONREG_ITEMS) /* We actually push this many items. 
*/
#define NUM_FAILURE_ITEMS \
  ((highest_active_reg - lowest_active_reg + 1) * NUM_REG_ITEMS \
    + NUM_NONREG_ITEMS)

/* How many items can still be added to the stack without overflowing it. */
#define REMAINING_AVAIL_SLOTS ((fail_stack).size - (fail_stack).avail)


/* Pops what PUSH_FAILURE_POINT pushes.

   We restore into the parameters, all of which should be lvalues:
     STR -- the saved data position.
     PAT -- the saved pattern position.
     LOW_REG, HIGH_REG -- the lowest and highest active registers.
     REGSTART, REGEND -- arrays of string positions.
     REG_INFO -- array of information about each subexpression.

   Also assumes the variables `fail_stack' and (if debugging), `bufp',
   `pend', `string1', `size1', `string2', and `size2'. */

#define POP_FAILURE_POINT(str, pat, low_reg, high_reg, regstart, regend, reg_info)\
{ \
  DEBUG_STATEMENT (fail_stack_elt_t failure_id;) \
  int this_reg; \
  const unsigned char *string_temp; \
 \
  assert (!FAIL_STACK_EMPTY ()); \
 \
  /* Remove failure points and point to how many regs pushed. */ \
  DEBUG_PRINT1 ("POP_FAILURE_POINT:\n"); \
  DEBUG_PRINT2 (" Before pop, next avail: %d\n", fail_stack.avail); \
  DEBUG_PRINT2 (" size: %d\n", fail_stack.size); \
 \
  assert (fail_stack.avail >= NUM_NONREG_ITEMS); \
 \
  DEBUG_POP (&failure_id); \
  DEBUG_PRINT2 (" Popping failure id: %u\n", failure_id); \
 \
  /* If the saved string location is NULL, it came from an \
     on_failure_keep_string_jump opcode, and we want to throw away the \
     saved NULL, thus retaining our current position in the string. */ \
  string_temp = POP_FAILURE_ITEM (); \
  if (string_temp != NULL) \
    str = (const char *) string_temp; \
 \
  DEBUG_PRINT2 (" Popping string 0x%x: `", str); \
  DEBUG_PRINT_DOUBLE_STRING (str, string1, size1, string2, size2); \
  DEBUG_PRINT1 ("'\n"); \
 \
  pat = (unsigned char *) POP_FAILURE_ITEM (); \
  DEBUG_PRINT2 (" Popping pattern 0x%x: ", pat); \
  DEBUG_PRINT_COMPILED_PATTERN (bufp, pat, pend); \
 \
  /* Restore register info.
*/ \ high_reg = (unsigned) POP_FAILURE_ITEM (); \ DEBUG_PRINT2 (" Popping high active reg: %d\n", high_reg); \ \ low_reg = (unsigned) POP_FAILURE_ITEM (); \ DEBUG_PRINT2 (" Popping low active reg: %d\n", low_reg); \ \ for (this_reg = high_reg; this_reg >= low_reg; this_reg--) \ { \ DEBUG_PRINT2 (" Popping reg: %d\n", this_reg); \ \ reg_info[this_reg].word = POP_FAILURE_ITEM (); \ DEBUG_PRINT2 (" info: 0x%x\n", reg_info[this_reg]); \ \ regend[this_reg] = (const char *) POP_FAILURE_ITEM (); \ DEBUG_PRINT2 (" end: 0x%x\n", regend[this_reg]); \ \ regstart[this_reg] = (const char *) POP_FAILURE_ITEM (); \ DEBUG_PRINT2 (" start: 0x%x\n", regstart[this_reg]); \ } \ \ DEBUG_STATEMENT (nfailure_points_popped++); \ } /* POP_FAILURE_POINT */ /* re_compile_fastmap computes a ``fastmap'' for the compiled pattern in BUFP. A fastmap records which of the (1 << BYTEWIDTH) possible characters can start a string that matches the pattern. This fastmap is used by re_search to skip quickly over impossible starting points. The caller must supply the address of a (1 << BYTEWIDTH)-byte data area as BUFP->fastmap. We set the `fastmap', `fastmap_accurate', and `can_be_null' fields in the pattern buffer. Returns 0 if we succeed, -2 if an internal error. */ int re_compile_fastmap (bufp) struct re_pattern_buffer *bufp; { int j, k; fail_stack_type fail_stack; #ifndef REGEX_MALLOC char *destination; #endif /* We don't push any register information onto the failure stack. */ unsigned num_regs = 0; register char *fastmap = bufp->fastmap; unsigned char *pattern = bufp->buffer; unsigned long size = bufp->used; const unsigned char *p = pattern; register unsigned char *pend = pattern + size; /* Assume that each path through the pattern can be null until proven otherwise. We set this false at the bottom of switch statement, to which we get only if a particular path doesn't match the empty string. */ boolean path_can_be_null = true; /* We aren't doing a `succeed_n' to begin with. 
*/ boolean succeed_n_p = false; assert (fastmap != NULL && p != NULL); INIT_FAIL_STACK (); bzero (fastmap, 1 << BYTEWIDTH); /* Assume nothing's valid. */ bufp->fastmap_accurate = 1; /* It will be when we're done. */ bufp->can_be_null = 0; while (p != pend || !FAIL_STACK_EMPTY ()) { if (p == pend) { bufp->can_be_null |= path_can_be_null; /* Reset for next path. */ path_can_be_null = true; p = fail_stack.stack[--fail_stack.avail]; } /* We should never be about to go beyond the end of the pattern. */ assert (p < pend); #ifdef SWITCH_ENUM_BUG switch ((int) ((re_opcode_t) *p++)) #else switch ((re_opcode_t) *p++) #endif { /* I guess the idea here is to simply not bother with a fastmap if a backreference is used, since it's too hard to figure out the fastmap for the corresponding group. Setting `can_be_null' stops `re_search_2' from using the fastmap, so that is all we do. */ case duplicate: bufp->can_be_null = 1; return 0; /* Following are the cases which match a character. These end with `break'. */ case exactn: fastmap[p[1]] = 1; break; case charset: for (j = *p++ * BYTEWIDTH - 1; j >= 0; j--) if (p[j / BYTEWIDTH] & (1 << (j % BYTEWIDTH))) fastmap[j] = 1; break; case charset_not: /* Chars beyond end of map must be allowed. */ for (j = *p * BYTEWIDTH; j < (1 << BYTEWIDTH); j++) fastmap[j] = 1; for (j = *p++ * BYTEWIDTH - 1; j >= 0; j--) if (!(p[j / BYTEWIDTH] & (1 << (j % BYTEWIDTH)))) fastmap[j] = 1; break; case wordchar: for (j = 0; j < (1 << BYTEWIDTH); j++) if (SYNTAX (j) == Sword) fastmap[j] = 1; break; case notwordchar: for (j = 0; j < (1 << BYTEWIDTH); j++) if (SYNTAX (j) != Sword) fastmap[j] = 1; break; case anychar: /* `.' matches anything ... */ for (j = 0; j < (1 << BYTEWIDTH); j++) fastmap[j] = 1; /* ... except perhaps newline. */ if (!(bufp->syntax & RE_DOT_NEWLINE)) fastmap['\n'] = 0; /* Return if we have already set `can_be_null'; if we have, then the fastmap is irrelevant. Something's wrong here. 
*/
          else if (bufp->can_be_null)
            return 0;

          /* Otherwise, have to check alternative paths. */
          break;


#ifdef emacs
        case syntaxspec:
          k = *p++;
          for (j = 0; j < (1 << BYTEWIDTH); j++)
            if (SYNTAX (j) == (enum syntaxcode) k)
              fastmap[j] = 1;
          break;


        case notsyntaxspec:
          k = *p++;
          for (j = 0; j < (1 << BYTEWIDTH); j++)
            if (SYNTAX (j) != (enum syntaxcode) k)
              fastmap[j] = 1;
          break;


        /* All cases after this match the empty string. These end with
           `continue'. */


        case before_dot:
        case at_dot:
        case after_dot:
          continue;
#endif /* emacs */


        case no_op:
        case begline:
        case endline:
        case begbuf:
        case endbuf:
        case wordbound:
        case notwordbound:
        case wordbeg:
        case wordend:
        case push_dummy_failure:
          continue;


        case jump_n:
        case pop_failure_jump:
        case maybe_pop_jump:
        case jump:
        case jump_past_alt:
        case dummy_failure_jump:
          EXTRACT_NUMBER_AND_INCR (j, p);
          p += j;
          if (j > 0)
            continue;

          /* Jump backward implies we just went through the body of a
             loop and matched nothing. Opcode jumped to should be
             `on_failure_jump' or `succeed_n'. Just treat it like an
             ordinary jump. For a * loop, it has pushed its failure
             point already; if so, discard that as redundant. */
          if ((re_opcode_t) *p != on_failure_jump
              && (re_opcode_t) *p != succeed_n)
            continue;

          p++;
          EXTRACT_NUMBER_AND_INCR (j, p);
          p += j;

          /* If what's on the stack is where we are now, pop it. */
          if (!FAIL_STACK_EMPTY ()
              && fail_stack.stack[fail_stack.avail - 1] == p)
            fail_stack.avail--;

          continue;


        case on_failure_jump:
        case on_failure_keep_string_jump:
        handle_on_failure_jump:
          EXTRACT_NUMBER_AND_INCR (j, p);

          /* For some patterns, e.g., `(a?)?', `p+j' here points to the
             end of the pattern. We don't want to push such a point,
             since when we restore it above, entering the switch will
             increment `p' past the end of the pattern. We don't need
             to push such a point since we obviously won't find any more
             fastmap entries beyond `pend'. Such a pattern can match
             the null string, though.
*/ if (p + j < pend) { if (!PUSH_PATTERN_OP (p + j, fail_stack)) return -2; } else bufp->can_be_null = 1; if (succeed_n_p) { EXTRACT_NUMBER_AND_INCR (k, p); /* Skip the n. */ succeed_n_p = false; } continue; case succeed_n: /* Get to the number of times to succeed. */ p += 2; /* Increment p past the n for when k != 0. */ EXTRACT_NUMBER_AND_INCR (k, p); if (k == 0) { p -= 4; succeed_n_p = true; /* Spaghetti code alert. */ goto handle_on_failure_jump; } continue; case set_number_at: p += 4; continue; case start_memory: case stop_memory: p += 2; continue; default: abort (); /* We have listed all the cases. */ } /* switch *p++ */ /* Getting here means we have found the possible starting characters for one path of the pattern -- and that the empty string does not match. We need not follow this path further. Instead, look at the next alternative (remembered on the stack), or quit if no more. The test at the top of the loop does these things. */ path_can_be_null = false; p = pend; } /* while p */ /* Set `can_be_null' for the last path (also the first path, if the pattern is empty). */ bufp->can_be_null |= path_can_be_null; return 0; } /* re_compile_fastmap */ /* Set REGS to hold NUM_REGS registers, storing them in STARTS and ENDS. Subsequent matches using PATTERN_BUFFER and REGS will use this memory for recording register information. STARTS and ENDS must be allocated using the malloc library routine, and must each be at least NUM_REGS * sizeof (regoff_t) bytes long. If NUM_REGS == 0, then subsequent matches should allocate their own register data. Unless this function is called, the first search or match using PATTERN_BUFFER will allocate its own register data, without freeing the old data. 
*/ void re_set_registers (bufp, regs, num_regs, starts, ends) struct re_pattern_buffer *bufp; struct re_registers *regs; unsigned num_regs; regoff_t *starts, *ends; { if (num_regs) { bufp->regs_allocated = REGS_REALLOCATE; regs->num_regs = num_regs; regs->start = starts; regs->end = ends; } else { bufp->regs_allocated = REGS_UNALLOCATED; regs->num_regs = 0; regs->start = regs->end = (regoff_t) 0; } } /* Searching routines. */ /* Like re_search_2, below, but only one string is specified, and doesn't let you say where to stop matching. */ int re_search (bufp, string, size, startpos, range, regs) struct re_pattern_buffer *bufp; const char *string; int size, startpos, range; struct re_registers *regs; { return re_search_2 (bufp, NULL, 0, string, size, startpos, range, regs, size); } /* Using the compiled pattern in BUFP->buffer, first tries to match the virtual concatenation of STRING1 and STRING2, starting first at index STARTPOS, then at STARTPOS + 1, and so on. STRING1 and STRING2 have length SIZE1 and SIZE2, respectively. RANGE is how far to scan while trying to match. RANGE = 0 means try only at STARTPOS; in general, the last start tried is STARTPOS + RANGE. In REGS, return the indices of the virtual concatenation of STRING1 and STRING2 that matched the entire BUFP->buffer and its contained subexpressions. Do not consider matching one past the index STOP in the virtual concatenation of STRING1 and STRING2. We return either the position in the strings at which the match was found, -1 if no match, or -2 if error (such as failure stack overflow). */ int re_search_2 (bufp, string1, size1, string2, size2, startpos, range, regs, stop) struct re_pattern_buffer *bufp; const char *string1, *string2; int size1, size2; int startpos; int range; struct re_registers *regs; int stop; { int val; register char *fastmap = bufp->fastmap; register char *translate = bufp->translate; int total_size = size1 + size2; int endpos = startpos + range; /* Check for out-of-range STARTPOS. 
*/ if (startpos < 0 || startpos > total_size) return -1; /* Fix up RANGE if it might eventually take us outside the virtual concatenation of STRING1 and STRING2. */ if (endpos < -1) range = -1 - startpos; else if (endpos > total_size) range = total_size - startpos; /* If the search isn't to be a backwards one, don't waste time in a search for a pattern that must be anchored. */ if (bufp->used > 0 && (re_opcode_t) bufp->buffer[0] == begbuf && range > 0) { if (startpos > 0) return -1; else range = 1; } /* Update the fastmap now if not correct already. */ if (fastmap && !bufp->fastmap_accurate) if (re_compile_fastmap (bufp) == -2) return -2; /* Loop through the string, looking for a place to start matching. */ for (;;) { /* If a fastmap is supplied, skip quickly over characters that cannot be the start of a match. If the pattern can match the null string, however, we don't need to skip characters; we want the first null string. */ if (fastmap && startpos < total_size && !bufp->can_be_null) { if (range > 0) /* Searching forwards. */ { register const char *d; register int lim = 0; int irange = range; if (startpos < size1 && startpos + range >= size1) lim = range - (size1 - startpos); d = (startpos >= size1 ? string2 - size1 : string1) + startpos; /* Written out as an if-else to avoid testing `translate' inside the loop. */ if (translate) while (range > lim && !fastmap[(unsigned char) translate[(unsigned char) *d++]]) range--; else while (range > lim && !fastmap[(unsigned char) *d++]) range--; startpos += irange - range; } else /* Searching backwards. */ { register char c = (size1 == 0 || startpos >= size1 ? string2[startpos - size1] : string1[startpos]); if (!fastmap[(unsigned char) TRANSLATE (c)]) goto advance; } } /* If can't match the null string, and that's all we have left, fail. 
*/ if (range >= 0 && startpos == total_size && fastmap && !bufp->can_be_null) return -1; val = re_match_2 (bufp, string1, size1, string2, size2, startpos, regs, stop); if (val >= 0) return startpos; if (val == -2) return -2; advance: if (!range) break; else if (range > 0) { range--; startpos++; } else { range++; startpos--; } } return -1; } /* re_search_2 */ /* Declarations and macros for re_match_2. */ static int bcmp_translate (); static boolean alt_match_null_string_p (), common_op_match_null_string_p (), group_match_null_string_p (); /* Structure for per-register (a.k.a. per-group) information. This must not be longer than one word, because we push this value onto the failure stack. Other register information, such as the starting and ending positions (which are addresses), and the list of inner groups (which is a bits list) are maintained in separate variables. We are making a (strictly speaking) nonportable assumption here: that the compiler will pack our bit fields into something that fits into the type of `word', i.e., is something that fits into one item on the failure stack. */ typedef union { fail_stack_elt_t word; struct { /* This field is one if this group can match the empty string, zero if not. If not yet determined, `MATCH_NULL_UNSET_VALUE'. */ #define MATCH_NULL_UNSET_VALUE 3 unsigned match_null_string_p : 2; unsigned is_active : 1; unsigned matched_something : 1; unsigned ever_matched_something : 1; } bits; } register_info_type; #define REG_MATCH_NULL_STRING_P(R) ((R).bits.match_null_string_p) #define IS_ACTIVE(R) ((R).bits.is_active) #define MATCHED_SOMETHING(R) ((R).bits.matched_something) #define EVER_MATCHED_SOMETHING(R) ((R).bits.ever_matched_something) /* Call this when have matched a real character; it sets `matched' flags for the subexpressions which we are currently inside. Also records that those subexprs have matched. 
*/ #define SET_REGS_MATCHED() \ do \ { \ unsigned r; \ for (r = lowest_active_reg; r <= highest_active_reg; r++) \ { \ MATCHED_SOMETHING (reg_info[r]) \ = EVER_MATCHED_SOMETHING (reg_info[r]) \ = 1; \ } \ } \ while (0) /* This converts PTR, a pointer into one of the search strings `string1' and `string2' into an offset from the beginning of that string. */ #define POINTER_TO_OFFSET(ptr) \ (FIRST_STRING_P (ptr) ? (ptr) - string1 : (ptr) - string2 + size1) /* Registers are set to a sentinel when they haven't yet matched. */ #define REG_UNSET_VALUE ((char *) -1) #define REG_UNSET(e) ((e) == REG_UNSET_VALUE) /* Macros for dealing with the split strings in re_match_2. */ #define MATCHING_IN_FIRST_STRING (dend == end_match_1) /* Call before fetching a character with *d. This switches over to string2 if necessary. */ #define PREFETCH() \ while (d == dend) \ { \ /* End of string2 => fail. */ \ if (dend == end_match_2) \ goto fail; \ /* End of string1 => advance to string2. */ \ d = string2; \ dend = end_match_2; \ } /* Test if at very beginning or at very end of the virtual concatenation of `string1' and `string2'. If only one string, it's `string2'. */ #define AT_STRINGS_BEG(d) ((d) == (size1 ? string1 : string2) || !size2) #define AT_STRINGS_END(d) ((d) == end2) /* Test if D points to a character which is word-constituent. We have two special cases to check for: if past the end of string1, look at the first character in string2; and if before the beginning of string2, look at the last character in string1. */ #define WORDCHAR_P(d) \ (SYNTAX ((d) == end1 ? *string2 \ : (d) == string2 - 1 ? *(end1 - 1) : *(d)) \ == Sword) /* Test if the character before D and the one at D differ with respect to being word-constituent. */ #define AT_WORD_BOUNDARY(d) \ (AT_STRINGS_BEG (d) || AT_STRINGS_END (d) \ || WORDCHAR_P (d - 1) != WORDCHAR_P (d)) /* Free everything we malloc. 
*/
#ifdef REGEX_MALLOC
#define FREE_VAR(var) if (var) free (var); var = NULL
#define FREE_VARIABLES() \
  do { \
    FREE_VAR (fail_stack.stack); \
    FREE_VAR (regstart); \
    FREE_VAR (regend); \
    FREE_VAR (old_regstart); \
    FREE_VAR (old_regend); \
    FREE_VAR (best_regstart); \
    FREE_VAR (best_regend); \
    FREE_VAR (reg_info); \
    FREE_VAR (reg_dummy); \
    FREE_VAR (reg_info_dummy); \
  } while (0)
#else /* not REGEX_MALLOC */
/* Some MIPS systems (at least) want this to free alloca'd storage. */
#define FREE_VARIABLES() alloca (0)
#endif /* not REGEX_MALLOC */


/* These values must meet several constraints. They must not be valid
   register values; since we have a limit of 255 registers (because
   we use only one byte in the pattern for the register number), we can
   use numbers larger than 255. They must differ by 1, because of
   NUM_FAILURE_ITEMS above. And the value for the lowest register must
   be larger than the value for the highest register, so we do not try
   to actually save any registers when none are active. */
#define NO_HIGHEST_ACTIVE_REG (1 << BYTEWIDTH)
#define NO_LOWEST_ACTIVE_REG (NO_HIGHEST_ACTIVE_REG + 1)

/* Matching routines. */

#ifndef emacs   /* Emacs never uses this. */
/* re_match is like re_match_2 except it takes only a single string. */

int
re_match (bufp, string, size, pos, regs)
     struct re_pattern_buffer *bufp;
     const char *string;
     int size, pos;
     struct re_registers *regs;
{
  return re_match_2 (bufp, NULL, 0, string, size, pos, regs, size);
}
#endif /* not emacs */


/* re_match_2 matches the compiled pattern in BUFP against the
   (virtual) concatenation of STRING1 and STRING2 (of length SIZE1
   and SIZE2, respectively). We start matching at POS, and stop
   matching at STOP.

   If REGS is non-null and the `no_sub' field of BUFP is zero, we
   store offsets for the substring each group matched in REGS. See the
   documentation for exactly how many groups we fill.

   We return -1 if no match, -2 if an internal error (such as the
   failure stack overflowing).
   Otherwise, we return the length of the matched substring. */

int
re_match_2 (bufp, string1, size1, string2, size2, pos, regs, stop)
     struct re_pattern_buffer *bufp;
     const char *string1, *string2;
     int size1, size2;
     int pos;
     struct re_registers *regs;
     int stop;
{
  /* General temporaries. */
  int mcnt;
  unsigned char *p1;

  /* Just past the end of the corresponding string. */
  const char *end1, *end2;

  /* Pointers into string1 and string2, just past the last characters in
     each to consider matching. */
  const char *end_match_1, *end_match_2;

  /* Where we are in the data, and the end of the current string. */
  const char *d, *dend;

  /* Where we are in the pattern, and the end of the pattern. */
  unsigned char *p = bufp->buffer;
  register unsigned char *pend = p + bufp->used;

  /* We use this to map every character in the string. */
  char *translate = bufp->translate;

  /* Failure point stack. Each place that can handle a failure further
     down the line pushes a failure point on this stack. It consists of
     regstart, regend, and reg_info for all registers corresponding to
     the subexpressions we're currently inside, plus the number of such
     registers, and, finally, two char *'s. The first char * is where
     to resume scanning the pattern; the second one is where to resume
     scanning the strings. If the latter is zero, the failure point is
     a ``dummy''; if a failure happens and the failure point is a dummy,
     it gets discarded and the next one is tried. */
  fail_stack_type fail_stack;
#ifdef DEBUG
  static unsigned failure_id = 0;
  unsigned nfailure_points_pushed = 0, nfailure_points_popped = 0;
#endif

  /* We fill all the registers internally, independent of what we
     return, for use in backreferences. The number here includes
     an element for register zero. */
  unsigned num_regs = bufp->re_nsub + 1;

  /* The currently active registers. */
  unsigned lowest_active_reg = NO_LOWEST_ACTIVE_REG;
  unsigned highest_active_reg = NO_HIGHEST_ACTIVE_REG;

  /* Information on the contents of registers.
These are pointers into the input strings; they record just what was matched (on this attempt) by a subexpression part of the pattern, that is, the regnum-th regstart pointer points to where in the pattern we began matching and the regnum-th regend points to right after where we stopped matching the regnum-th subexpression. (The zeroth register keeps track of what the whole pattern matches.) */ const char **regstart, **regend; /* If a group that's operated upon by a repetition operator fails to match anything, then the register for its start will need to be restored because it will have been set to wherever in the string we are when we last see its open-group operator. Similarly for a register's end. */ const char **old_regstart, **old_regend; /* The is_active field of reg_info helps us keep track of which (possibly nested) subexpressions we are currently in. The matched_something field of reg_info[reg_num] helps us tell whether or not we have matched any of the pattern so far this time through the reg_num-th subexpression. These two fields get reset each time through any loop their register is in. */ register_info_type *reg_info; /* The following record the register info as found in the above variables when we find a match better than any we've seen before. This happens as we backtrack through the failure points, which in turn happens only if we have not yet matched the entire string. */ unsigned best_regs_set = false; const char **best_regstart, **best_regend; /* Logically, this is `best_regend[0]'. But we don't want to have to allocate space for that if we're not allocating space for anything else (see below). Also, we never need info about register 0 for any of the other register vectors, and it seems rather a kludge to treat `best_regend' differently than the rest. So we keep track of the end of the best match so far in a separate variable. We initialize this to NULL so that when we backtrack the first time and need to test it, it's not garbage. 
*/ const char *match_end = NULL; /* Used when we pop values we don't care about. */ const char **reg_dummy; register_info_type *reg_info_dummy; #ifdef DEBUG /* Counts the total number of registers pushed. */ unsigned num_regs_pushed = 0; #endif DEBUG_PRINT1 ("\n\nEntering re_match_2.\n"); INIT_FAIL_STACK (); /* Do not bother to initialize all the register variables if there are no groups in the pattern, as it takes a fair amount of time. If there are groups, we include space for register 0 (the whole pattern), even though we never use it, since it simplifies the array indexing. We should fix this. */ if (bufp->re_nsub) { regstart = REGEX_TALLOC (num_regs, const char *); regend = REGEX_TALLOC (num_regs, const char *); old_regstart = REGEX_TALLOC (num_regs, const char *); old_regend = REGEX_TALLOC (num_regs, const char *); best_regstart = REGEX_TALLOC (num_regs, const char *); best_regend = REGEX_TALLOC (num_regs, const char *); reg_info = REGEX_TALLOC (num_regs, register_info_type); reg_dummy = REGEX_TALLOC (num_regs, const char *); reg_info_dummy = REGEX_TALLOC (num_regs, register_info_type); if (!(regstart && regend && old_regstart && old_regend && reg_info && best_regstart && best_regend && reg_dummy && reg_info_dummy)) { FREE_VARIABLES (); return -2; } } #ifdef REGEX_MALLOC else { /* We must initialize all our variables to NULL, so that `FREE_VARIABLES' doesn't try to free them. */ regstart = regend = old_regstart = old_regend = best_regstart = best_regend = reg_dummy = NULL; reg_info = reg_info_dummy = (register_info_type *) NULL; } #endif /* REGEX_MALLOC */ /* The starting position is bogus. */ if (pos < 0 || pos > size1 + size2) { FREE_VARIABLES (); return -1; } /* Initialize subexpression text positions to -1 to mark ones that no start_memory/stop_memory has been seen for. Also initialize the register information struct. 
*/ for (mcnt = 1; mcnt < num_regs; mcnt++) { regstart[mcnt] = regend[mcnt] = old_regstart[mcnt] = old_regend[mcnt] = REG_UNSET_VALUE; REG_MATCH_NULL_STRING_P (reg_info[mcnt]) = MATCH_NULL_UNSET_VALUE; IS_ACTIVE (reg_info[mcnt]) = 0; MATCHED_SOMETHING (reg_info[mcnt]) = 0; EVER_MATCHED_SOMETHING (reg_info[mcnt]) = 0; } /* We move `string1' into `string2' if the latter's empty -- but not if `string1' is null. */ if (size2 == 0 && string1 != NULL) { string2 = string1; size2 = size1; string1 = 0; size1 = 0; } end1 = string1 + size1; end2 = string2 + size2; /* Compute where to stop matching, within the two strings. */ if (stop <= size1) { end_match_1 = string1 + stop; end_match_2 = string2; } else { end_match_1 = end1; end_match_2 = string2 + stop - size1; } /* `p' scans through the pattern as `d' scans through the data. `dend' is the end of the input string that `d' points within. `d' is advanced into the following input string whenever necessary, but this happens before fetching; therefore, at the beginning of the loop, `d' can be pointing at the end of a string, but it cannot equal `string2'. */ if (size1 > 0 && pos <= size1) { d = string1 + pos; dend = end_match_1; } else { d = string2 + pos - size1; dend = end_match_2; } DEBUG_PRINT1 ("The compiled pattern is: "); DEBUG_PRINT_COMPILED_PATTERN (bufp, p, pend); DEBUG_PRINT1 ("The string to match is: `"); DEBUG_PRINT_DOUBLE_STRING (d, string1, size1, string2, size2); DEBUG_PRINT1 ("'\n"); /* This loops over pattern commands. It exits by returning from the function if the match is complete, or it drops through if the match fails at this starting point in the input data. */ for (;;) { DEBUG_PRINT2 ("\n0x%x: ", p); if (p == pend) { /* End of pattern means we might have succeeded. */ DEBUG_PRINT1 ("end of pattern ... "); /* If we haven't matched the entire string, and we want the longest match, try backtracking. 
*/ if (d != end_match_2) { DEBUG_PRINT1 ("backtracking.\n"); if (!FAIL_STACK_EMPTY ()) { /* More failure points to try. */ boolean same_str_p = (FIRST_STRING_P (match_end) == MATCHING_IN_FIRST_STRING); /* If exceeds best match so far, save it. */ if (!best_regs_set || (same_str_p && d > match_end) || (!same_str_p && !MATCHING_IN_FIRST_STRING)) { best_regs_set = true; match_end = d; DEBUG_PRINT1 ("\nSAVING match as best so far.\n"); for (mcnt = 1; mcnt < num_regs; mcnt++) { best_regstart[mcnt] = regstart[mcnt]; best_regend[mcnt] = regend[mcnt]; } } goto fail; } /* If no failure points, don't restore garbage. */ else if (best_regs_set) { restore_best_regs: /* Restore best match. It may happen that `dend == end_match_1' while the restored d is in string2. For example, the pattern `x.*y.*z' against the strings `x-' and `y-z-', if the two strings are not consecutive in memory. */ DEBUG_PRINT1 ("Restoring best registers.\n"); d = match_end; dend = ((d >= string1 && d <= end1) ? end_match_1 : end_match_2); for (mcnt = 1; mcnt < num_regs; mcnt++) { regstart[mcnt] = best_regstart[mcnt]; regend[mcnt] = best_regend[mcnt]; } } } /* d != end_match_2 */ DEBUG_PRINT1 ("Accepting match.\n"); /* If caller wants register contents data back, do it. */ if (regs && !bufp->no_sub) { /* Have the register data arrays been allocated? */ if (bufp->regs_allocated == REGS_UNALLOCATED) { /* No. So allocate them with malloc. We need one extra element beyond `num_regs' for the `-1' marker GNU code uses. */ regs->num_regs = MAX (RE_NREGS, num_regs + 1); regs->start = TALLOC (regs->num_regs, regoff_t); regs->end = TALLOC (regs->num_regs, regoff_t); if (regs->start == NULL || regs->end == NULL) return -2; bufp->regs_allocated = REGS_REALLOCATE; } else if (bufp->regs_allocated == REGS_REALLOCATE) { /* Yes. If we need more elements than were already allocated, reallocate them. If we need fewer, just leave it alone. 
*/ if (regs->num_regs < num_regs + 1) { regs->num_regs = num_regs + 1; RETALLOC (regs->start, regs->num_regs, regoff_t); RETALLOC (regs->end, regs->num_regs, regoff_t); if (regs->start == NULL || regs->end == NULL) return -2; } } else assert (bufp->regs_allocated == REGS_FIXED); /* Convert the pointer data in `regstart' and `regend' to indices. Register zero has to be set differently, since we haven't kept track of any info for it. */ if (regs->num_regs > 0) { regs->start[0] = pos; regs->end[0] = (MATCHING_IN_FIRST_STRING ? d - string1 : d - string2 + size1); } /* Go through the first `min (num_regs, regs->num_regs)' registers, since that is all we initialized. */ for (mcnt = 1; mcnt < MIN (num_regs, regs->num_regs); mcnt++) { if (REG_UNSET (regstart[mcnt]) || REG_UNSET (regend[mcnt])) regs->start[mcnt] = regs->end[mcnt] = -1; else { regs->start[mcnt] = POINTER_TO_OFFSET (regstart[mcnt]); regs->end[mcnt] = POINTER_TO_OFFSET (regend[mcnt]); } } /* If the regs structure we return has more elements than were in the pattern, set the extra elements to -1. If we (re)allocated the registers, this is the case, because we always allocate enough to have at least one -1 at the end. */ for (mcnt = num_regs; mcnt < regs->num_regs; mcnt++) regs->start[mcnt] = regs->end[mcnt] = -1; } /* regs && !bufp->no_sub */ FREE_VARIABLES (); DEBUG_PRINT4 ("%u failure points pushed, %u popped (%u remain).\n", nfailure_points_pushed, nfailure_points_popped, nfailure_points_pushed - nfailure_points_popped); DEBUG_PRINT2 ("%u registers pushed.\n", num_regs_pushed); mcnt = d - pos - (MATCHING_IN_FIRST_STRING ? string1 : string2 - size1); DEBUG_PRINT2 ("Returning %d from re_match_2.\n", mcnt); return mcnt; } /* Otherwise match next pattern command. */ #ifdef SWITCH_ENUM_BUG switch ((int) ((re_opcode_t) *p++)) #else switch ((re_opcode_t) *p++) #endif { /* Ignore these. Used to ignore the n of succeed_n's which currently have n == 0. 
*/ case no_op: DEBUG_PRINT1 ("EXECUTING no_op.\n"); break; /* Match the next n pattern characters exactly. The following byte in the pattern defines n, and the n bytes after that are the characters to match. */ case exactn: mcnt = *p++; DEBUG_PRINT2 ("EXECUTING exactn %d.\n", mcnt); /* This is written out as an if-else so we don't waste time testing `translate' inside the loop. */ if (translate) { do { PREFETCH (); if (translate[(unsigned char) *d++] != (char) *p++) goto fail; } while (--mcnt); } else { do { PREFETCH (); if (*d++ != (char) *p++) goto fail; } while (--mcnt); } SET_REGS_MATCHED (); break; /* Match any character except possibly a newline or a null. */ case anychar: DEBUG_PRINT1 ("EXECUTING anychar.\n"); PREFETCH (); if ((!(bufp->syntax & RE_DOT_NEWLINE) && TRANSLATE (*d) == '\n') || (bufp->syntax & RE_DOT_NOT_NULL && TRANSLATE (*d) == '\000')) goto fail; SET_REGS_MATCHED (); DEBUG_PRINT2 (" Matched `%d'.\n", *d); d++; break; case charset: case charset_not: { register unsigned char c; boolean not = (re_opcode_t) *(p - 1) == charset_not; DEBUG_PRINT2 ("EXECUTING charset%s.\n", not ? "_not" : ""); PREFETCH (); c = TRANSLATE (*d); /* The character to match. */ /* Cast to `unsigned' instead of `unsigned char' in case the bit list is a full 32 bytes long. */ if (c < (unsigned) (*p * BYTEWIDTH) && p[1 + c / BYTEWIDTH] & (1 << (c % BYTEWIDTH))) not = !not; p += 1 + *p; if (!not) goto fail; SET_REGS_MATCHED (); d++; break; } /* The beginning of a group is represented by start_memory. The arguments are the register number in the next byte, and the number of groups inner to this one in the next. The text matched within the group is recorded (in the internal registers data structure) under the register number. */ case start_memory: DEBUG_PRINT3 ("EXECUTING start_memory %d (%d):\n", *p, p[1]); /* Find out if this group can match the empty string. */ p1 = p; /* To send to group_match_null_string_p. 
*/ if (REG_MATCH_NULL_STRING_P (reg_info[*p]) == MATCH_NULL_UNSET_VALUE) REG_MATCH_NULL_STRING_P (reg_info[*p]) = group_match_null_string_p (&p1, pend, reg_info); /* Save the position in the string where we were the last time we were at this open-group operator in case the group is operated upon by a repetition operator, e.g., with `(a*)*b' against `ab'; then we want to ignore where we are now in the string in case this attempt to match fails. */ old_regstart[*p] = REG_MATCH_NULL_STRING_P (reg_info[*p]) ? REG_UNSET (regstart[*p]) ? d : regstart[*p] : regstart[*p]; DEBUG_PRINT2 (" old_regstart: %d\n", POINTER_TO_OFFSET (old_regstart[*p])); regstart[*p] = d; DEBUG_PRINT2 (" regstart: %d\n", POINTER_TO_OFFSET (regstart[*p])); IS_ACTIVE (reg_info[*p]) = 1; MATCHED_SOMETHING (reg_info[*p]) = 0; /* This is the new highest active register. */ highest_active_reg = *p; /* If nothing was active before, this is the new lowest active register. */ if (lowest_active_reg == NO_LOWEST_ACTIVE_REG) lowest_active_reg = *p; /* Move past the register number and inner group count. */ p += 2; break; /* The stop_memory opcode represents the end of a group. Its arguments are the same as start_memory's: the register number, and the number of inner groups. */ case stop_memory: DEBUG_PRINT3 ("EXECUTING stop_memory %d (%d):\n", *p, p[1]); /* We need to save the string position the last time we were at this close-group operator in case the group is operated upon by a repetition operator, e.g., with `((a*)*(b*)*)*' against `aba'; then we want to ignore where we are now in the string in case this attempt to match fails. */ old_regend[*p] = REG_MATCH_NULL_STRING_P (reg_info[*p]) ? REG_UNSET (regend[*p]) ? d : regend[*p] : regend[*p]; DEBUG_PRINT2 (" old_regend: %d\n", POINTER_TO_OFFSET (old_regend[*p])); regend[*p] = d; DEBUG_PRINT2 (" regend: %d\n", POINTER_TO_OFFSET (regend[*p])); /* This register isn't active anymore. 
*/ IS_ACTIVE (reg_info[*p]) = 0; /* If this was the only register active, nothing is active anymore. */ if (lowest_active_reg == highest_active_reg) { lowest_active_reg = NO_LOWEST_ACTIVE_REG; highest_active_reg = NO_HIGHEST_ACTIVE_REG; } else { /* We must scan for the new highest active register, since it isn't necessarily one less than now: consider (a(b)c(d(e)f)g). When group 3 ends, after the f), the new highest active register is 1. */ unsigned char r = *p - 1; while (r > 0 && !IS_ACTIVE (reg_info[r])) r--; /* If we end up at register zero, that means that we saved the registers as the result of an `on_failure_jump', not a `start_memory', and we jumped to past the innermost `stop_memory'. For example, in ((.)*) we save registers 1 and 2 as a result of the *, but when we pop back to the second ), we are at the stop_memory 1. Thus, nothing is active. */ if (r == 0) { lowest_active_reg = NO_LOWEST_ACTIVE_REG; highest_active_reg = NO_HIGHEST_ACTIVE_REG; } else highest_active_reg = r; } /* If just failed to match something this time around with a group that's operated on by a repetition operator, try to force exit from the ``loop'', and restore the register information for this group that we had before trying this last match. */ if ((!MATCHED_SOMETHING (reg_info[*p]) || (re_opcode_t) p[-3] == start_memory) && (p + 2) < pend) { boolean is_a_jump_n = false; p1 = p + 2; mcnt = 0; switch ((re_opcode_t) *p1++) { case jump_n: is_a_jump_n = true; case pop_failure_jump: case maybe_pop_jump: case jump: case dummy_failure_jump: EXTRACT_NUMBER_AND_INCR (mcnt, p1); if (is_a_jump_n) p1 += 2; break; default: /* do nothing */ ; } p1 += mcnt; /* If the next operation is a jump backwards in the pattern to an on_failure_jump right before the start_memory corresponding to this stop_memory, exit from the loop by forcing a failure after pushing on the stack the on_failure_jump's jump in the pattern, and d. 
*/ if (mcnt < 0 && (re_opcode_t) *p1 == on_failure_jump && (re_opcode_t) p1[3] == start_memory && p1[4] == *p) { /* If this group ever matched anything, then restore what its registers were before trying this last failed match, e.g., with `(a*)*b' against `ab' for regstart[1], and, e.g., with `((a*)*(b*)*)*' against `aba' for regend[3]. Also restore the registers for inner groups for, e.g., `((a*)(b*))*' against `aba' (register 3 would otherwise get trashed). */ if (EVER_MATCHED_SOMETHING (reg_info[*p])) { unsigned r; EVER_MATCHED_SOMETHING (reg_info[*p]) = 0; /* Restore this and inner groups' (if any) registers. */ for (r = *p; r < *p + *(p + 1); r++) { regstart[r] = old_regstart[r]; /* xx why this test? */ if ((int) old_regend[r] >= (int) regstart[r]) regend[r] = old_regend[r]; } } p1++; EXTRACT_NUMBER_AND_INCR (mcnt, p1); PUSH_FAILURE_POINT (p1 + mcnt, d, -2); goto fail; } } /* Move past the register number and the inner group count. */ p += 2; break; /* \<digit> has been turned into a `duplicate' command which is followed by the numeric value of <digit> as the register number. */ case duplicate: { register const char *d2, *dend2; int regno = *p++; /* Get which register to match against. */ DEBUG_PRINT2 ("EXECUTING duplicate %d.\n", regno); /* Can't back reference a group which we've never matched. */ if (REG_UNSET (regstart[regno]) || REG_UNSET (regend[regno])) goto fail; /* Where in input to try to start matching. */ d2 = regstart[regno]; /* Where to stop matching; if both the place to start and the place to stop matching are in the same string, then set to the place to stop, otherwise, for now have to use the end of the first string. */ dend2 = ((FIRST_STRING_P (regstart[regno]) == FIRST_STRING_P (regend[regno])) ? regend[regno] : end_match_1); for (;;) { /* If necessary, advance to next segment in register contents. */ while (d2 == dend2) { if (dend2 == end_match_2) break; if (dend2 == regend[regno]) break; /* End of string1 => advance to string2.
*/ d2 = string2; dend2 = regend[regno]; } /* At end of register contents => success */ if (d2 == dend2) break; /* If necessary, advance to next segment in data. */ PREFETCH (); /* How many characters left in this segment to match. */ mcnt = dend - d; /* Want how many consecutive characters we can match in one shot, so, if necessary, adjust the count. */ if (mcnt > dend2 - d2) mcnt = dend2 - d2; /* Compare that many; failure if mismatch, else move past them. */ if (translate ? bcmp_translate (d, d2, mcnt, translate) : bcmp (d, d2, mcnt)) goto fail; d += mcnt, d2 += mcnt; } } break; /* begline matches the empty string at the beginning of the string (unless `not_bol' is set in `bufp'), and, if `newline_anchor' is set, after newlines. */ case begline: DEBUG_PRINT1 ("EXECUTING begline.\n"); if (AT_STRINGS_BEG (d)) { if (!bufp->not_bol) break; } else if (d[-1] == '\n' && bufp->newline_anchor) { break; } /* In all other cases, we fail. */ goto fail; /* endline is the dual of begline. */ case endline: DEBUG_PRINT1 ("EXECUTING endline.\n"); if (AT_STRINGS_END (d)) { if (!bufp->not_eol) break; } /* We have to ``prefetch'' the next character. */ else if ((d == end1 ? *string2 : *d) == '\n' && bufp->newline_anchor) { break; } goto fail; /* Match at the very beginning of the data. */ case begbuf: DEBUG_PRINT1 ("EXECUTING begbuf.\n"); if (AT_STRINGS_BEG (d)) break; goto fail; /* Match at the very end of the data. */ case endbuf: DEBUG_PRINT1 ("EXECUTING endbuf.\n"); if (AT_STRINGS_END (d)) break; goto fail; /* on_failure_keep_string_jump is used to optimize `.*\n'. It pushes NULL as the value for the string on the stack. Then `pop_failure_point' will keep the current value for the string, instead of restoring it. To see why, consider matching `foo\nbar' against `.*\n'. The .* matches the foo; then the . fails against the \n. But the next thing we want to do is match the \n against the \n; if we restored the string value, we would be back at the foo. 
Because this is used only in specific cases, we don't need to check all the things that `on_failure_jump' does, to make sure the right things get saved on the stack. Hence we don't share its code. The only reason to push anything on the stack at all is that otherwise we would have to change `anychar's code to do something besides goto fail in this case; that seems worse than this. */ case on_failure_keep_string_jump: DEBUG_PRINT1 ("EXECUTING on_failure_keep_string_jump"); EXTRACT_NUMBER_AND_INCR (mcnt, p); DEBUG_PRINT3 (" %d (to 0x%x):\n", mcnt, p + mcnt); PUSH_FAILURE_POINT (p + mcnt, NULL, -2); break; /* Uses of on_failure_jump: Each alternative starts with an on_failure_jump that points to the beginning of the next alternative. Each alternative except the last ends with a jump that in effect jumps past the rest of the alternatives. (They really jump to the ending jump of the following alternative, because tensioning these jumps is a hassle.) Repeats start with an on_failure_jump that points past both the repetition text and either the following jump or pop_failure_jump back to this on_failure_jump. */ case on_failure_jump: on_failure: DEBUG_PRINT1 ("EXECUTING on_failure_jump"); EXTRACT_NUMBER_AND_INCR (mcnt, p); DEBUG_PRINT3 (" %d (to 0x%x)", mcnt, p + mcnt); /* If this on_failure_jump comes right before a group (i.e., the original * applied to a group), save the information for that group and all inner ones, so that if we fail back to this point, the group's information will be correct. For example, in \(a*\)*\1, we need the preceding group, and in \(\(a*\)b*\)\2, we need the inner group. */ /* We can't use `p' to check ahead because we push a failure point to `p + mcnt' after we do this. */ p1 = p; /* We need to skip no_op's before we look for the start_memory in case this on_failure_jump is happening as the result of a completed succeed_n, as in \(a\)\{1,3\}b\1 against aba. 
*/ while (p1 < pend && (re_opcode_t) *p1 == no_op) p1++; if (p1 < pend && (re_opcode_t) *p1 == start_memory) { /* We have a new highest active register now. This will get reset at the start_memory we are about to get to, but we will have saved all the registers relevant to this repetition op, as described above. */ highest_active_reg = *(p1 + 1) + *(p1 + 2); if (lowest_active_reg == NO_LOWEST_ACTIVE_REG) lowest_active_reg = *(p1 + 1); } DEBUG_PRINT1 (":\n"); PUSH_FAILURE_POINT (p + mcnt, d, -2); break; /* A smart repeat ends with `maybe_pop_jump'. We change it to either `pop_failure_jump' or `jump'. */ case maybe_pop_jump: EXTRACT_NUMBER_AND_INCR (mcnt, p); DEBUG_PRINT2 ("EXECUTING maybe_pop_jump %d.\n", mcnt); { register unsigned char *p2 = p; /* Compare the beginning of the repeat with what in the pattern follows its end. If we can establish that there is nothing that they would both match, i.e., that we would have to backtrack because of (as in, e.g., `a*a') then we can change to pop_failure_jump, because we'll never have to backtrack. This is not true in the case of alternatives: in `(a|ab)*' we do need to backtrack to the `ab' alternative (e.g., if the string was `ab'). But instead of trying to detect that here, the alternative has put on a dummy failure point which is what we will end up popping. */ /* Skip over open/close-group commands. */ while (p2 + 2 < pend && ((re_opcode_t) *p2 == stop_memory || (re_opcode_t) *p2 == start_memory)) p2 += 3; /* Skip over args, too. */ /* If we're at the end of the pattern, we can change. */ if (p2 == pend) { /* Consider what happens when matching ":\(.*\)" against ":/". I don't really understand this code yet. */ p[-3] = (unsigned char) pop_failure_jump; DEBUG_PRINT1 (" End of pattern: change to `pop_failure_jump'.\n"); } else if ((re_opcode_t) *p2 == exactn || (bufp->newline_anchor && (re_opcode_t) *p2 == endline)) { register unsigned char c = *p2 == (unsigned char) endline ? '\n' : p2[2]; p1 = p + mcnt; /* p1[0] ... 
p1[2] are the `on_failure_jump' corresponding to the `maybe_finalize_jump' of this case. Examine what follows. */ if ((re_opcode_t) p1[3] == exactn && p1[5] != c) { p[-3] = (unsigned char) pop_failure_jump; DEBUG_PRINT3 (" %c != %c => pop_failure_jump.\n", c, p1[5]); } else if ((re_opcode_t) p1[3] == charset || (re_opcode_t) p1[3] == charset_not) { int not = (re_opcode_t) p1[3] == charset_not; if (c < (unsigned char) (p1[4] * BYTEWIDTH) && p1[5 + c / BYTEWIDTH] & (1 << (c % BYTEWIDTH))) not = !not; /* `not' is equal to 1 if c would match, which means that we can't change to pop_failure_jump. */ if (!not) { p[-3] = (unsigned char) pop_failure_jump; DEBUG_PRINT1 (" No match => pop_failure_jump.\n"); } } } } p -= 2; /* Point at relative address again. */ if ((re_opcode_t) p[-1] != pop_failure_jump) { p[-1] = (unsigned char) jump; DEBUG_PRINT1 (" Match => jump.\n"); goto unconditional_jump; } /* Note fall through. */ /* The end of a simple repeat has a pop_failure_jump back to its matching on_failure_jump, where the latter will push a failure point. The pop_failure_jump takes off failure points put on by this pop_failure_jump's matching on_failure_jump; we got through the pattern to here from the matching on_failure_jump, so didn't fail. */ case pop_failure_jump: { /* We need to pass separate storage for the lowest and highest registers, even though we don't care about the actual values. Otherwise, we will restore only one register from the stack, since lowest will == highest in `pop_failure_point'. */ unsigned dummy_low_reg, dummy_high_reg; unsigned char *pdummy; const char *sdummy; DEBUG_PRINT1 ("EXECUTING pop_failure_jump.\n"); POP_FAILURE_POINT (sdummy, pdummy, dummy_low_reg, dummy_high_reg, reg_dummy, reg_dummy, reg_info_dummy); } /* Note fall through. */ /* Unconditionally jump (without popping any failure points). */ case jump: unconditional_jump: EXTRACT_NUMBER_AND_INCR (mcnt, p); /* Get the amount to jump. 
*/ DEBUG_PRINT2 ("EXECUTING jump %d ", mcnt); p += mcnt; /* Do the jump. */ DEBUG_PRINT2 ("(to 0x%x).\n", p); break; /* We need this opcode so we can detect where alternatives end in `group_match_null_string_p' et al. */ case jump_past_alt: DEBUG_PRINT1 ("EXECUTING jump_past_alt.\n"); goto unconditional_jump; /* Normally, the on_failure_jump pushes a failure point, which then gets popped at pop_failure_jump. We will end up at pop_failure_jump, also, and with a pattern of, say, `a+', we are skipping over the on_failure_jump, so we have to push something meaningless for pop_failure_jump to pop. */ case dummy_failure_jump: DEBUG_PRINT1 ("EXECUTING dummy_failure_jump.\n"); /* It doesn't matter what we push for the string here. What the code at `fail' tests is the value for the pattern. */ PUSH_FAILURE_POINT (0, 0, -2); goto unconditional_jump; /* At the end of an alternative, we need to push a dummy failure point in case we are followed by a `pop_failure_jump', because we don't want the failure point for the alternative to be popped. For example, matching `(a|ab)*' against `aab' requires that we match the `ab' alternative. */ case push_dummy_failure: DEBUG_PRINT1 ("EXECUTING push_dummy_failure.\n"); /* See comments just above at `dummy_failure_jump' about the two zeroes. */ PUSH_FAILURE_POINT (0, 0, -2); break; /* Have to succeed matching what follows at least n times. After that, handle like `on_failure_jump'. */ case succeed_n: EXTRACT_NUMBER (mcnt, p + 2); DEBUG_PRINT2 ("EXECUTING succeed_n %d.\n", mcnt); assert (mcnt >= 0); /* Originally, this is how many times we HAVE to succeed. 
*/ if (mcnt > 0) { mcnt--; p += 2; STORE_NUMBER_AND_INCR (p, mcnt); DEBUG_PRINT3 (" Setting 0x%x to %d.\n", p, mcnt); } else if (mcnt == 0) { DEBUG_PRINT2 (" Setting two bytes from 0x%x to no_op.\n", p+2); p[2] = (unsigned char) no_op; p[3] = (unsigned char) no_op; goto on_failure; } break; case jump_n: EXTRACT_NUMBER (mcnt, p + 2); DEBUG_PRINT2 ("EXECUTING jump_n %d.\n", mcnt); /* Originally, this is how many times we CAN jump. */ if (mcnt) { mcnt--; STORE_NUMBER (p + 2, mcnt); goto unconditional_jump; } /* If don't have to jump any more, skip over the rest of command. */ else p += 4; break; case set_number_at: { DEBUG_PRINT1 ("EXECUTING set_number_at.\n"); EXTRACT_NUMBER_AND_INCR (mcnt, p); p1 = p + mcnt; EXTRACT_NUMBER_AND_INCR (mcnt, p); DEBUG_PRINT3 (" Setting 0x%x to %d.\n", p1, mcnt); STORE_NUMBER (p1, mcnt); break; } case wordbound: DEBUG_PRINT1 ("EXECUTING wordbound.\n"); if (AT_WORD_BOUNDARY (d)) break; goto fail; case notwordbound: DEBUG_PRINT1 ("EXECUTING notwordbound.\n"); if (AT_WORD_BOUNDARY (d)) goto fail; break; case wordbeg: DEBUG_PRINT1 ("EXECUTING wordbeg.\n"); if (WORDCHAR_P (d) && (AT_STRINGS_BEG (d) || !WORDCHAR_P (d - 1))) break; goto fail; case wordend: DEBUG_PRINT1 ("EXECUTING wordend.\n"); if (!AT_STRINGS_BEG (d) && WORDCHAR_P (d - 1) && (!WORDCHAR_P (d) || AT_STRINGS_END (d))) break; goto fail; #ifdef emacs #ifdef emacs19 case before_dot: DEBUG_PRINT1 ("EXECUTING before_dot.\n"); if (PTR_CHAR_POS ((unsigned char *) d) >= point) goto fail; break; case at_dot: DEBUG_PRINT1 ("EXECUTING at_dot.\n"); if (PTR_CHAR_POS ((unsigned char *) d) != point) goto fail; break; case after_dot: DEBUG_PRINT1 ("EXECUTING after_dot.\n"); if (PTR_CHAR_POS ((unsigned char *) d) <= point) goto fail; break; #else /* not emacs19 */ case at_dot: DEBUG_PRINT1 ("EXECUTING at_dot.\n"); if (PTR_CHAR_POS ((unsigned char *) d) + 1 != point) goto fail; break; #endif /* not emacs19 */ case syntaxspec: DEBUG_PRINT2 ("EXECUTING syntaxspec %d.\n", mcnt); mcnt = *p++; goto 
matchsyntax; case wordchar: DEBUG_PRINT1 ("EXECUTING Emacs wordchar.\n"); mcnt = (int) Sword; matchsyntax: PREFETCH (); if (SYNTAX (*d++) != (enum syntaxcode) mcnt) goto fail; SET_REGS_MATCHED (); break; case notsyntaxspec: DEBUG_PRINT2 ("EXECUTING notsyntaxspec %d.\n", mcnt); mcnt = *p++; goto matchnotsyntax; case notwordchar: DEBUG_PRINT1 ("EXECUTING Emacs notwordchar.\n"); mcnt = (int) Sword; matchnotsyntax: PREFETCH (); if (SYNTAX (*d++) == (enum syntaxcode) mcnt) goto fail; SET_REGS_MATCHED (); break; #else /* not emacs */ case wordchar: DEBUG_PRINT1 ("EXECUTING non-Emacs wordchar.\n"); PREFETCH (); if (!WORDCHAR_P (d)) goto fail; SET_REGS_MATCHED (); d++; break; case notwordchar: DEBUG_PRINT1 ("EXECUTING non-Emacs notwordchar.\n"); PREFETCH (); if (WORDCHAR_P (d)) goto fail; SET_REGS_MATCHED (); d++; break; #endif /* not emacs */ default: abort (); } continue; /* Successfully executed one pattern command; keep going. */ /* We goto here if a matching operation fails. */ fail: if (!FAIL_STACK_EMPTY ()) { /* A restart point is known. Restore to that state. */ DEBUG_PRINT1 ("\nFAIL:\n"); POP_FAILURE_POINT (d, p, lowest_active_reg, highest_active_reg, regstart, regend, reg_info); /* If this failure point is a dummy, try the next one. */ if (!p) goto fail; /* If we failed to the end of the pattern, don't examine *p. */ assert (p <= pend); if (p < pend) { boolean is_a_jump_n = false; /* If failed to a backwards jump that's part of a repetition loop, need to pop this failure point and use the next one. */ switch ((re_opcode_t) *p) { case jump_n: is_a_jump_n = true; case maybe_pop_jump: case pop_failure_jump: case jump: p1 = p + 1; EXTRACT_NUMBER_AND_INCR (mcnt, p1); p1 += mcnt; if ((is_a_jump_n && (re_opcode_t) *p1 == succeed_n) || (!is_a_jump_n && (re_opcode_t) *p1 == on_failure_jump)) goto fail; break; default: /* do nothing */ ; } } if (d >= string1 && d <= end1) dend = end_match_1; } else break; /* Matching at this starting point really fails. 
*/ } /* for (;;) */ if (best_regs_set) goto restore_best_regs; FREE_VARIABLES (); return -1; /* Failure to match. */ } /* re_match_2 */ /* Subroutine definitions for re_match_2. */ /* We are passed P pointing to a register number after a start_memory. Return true if the pattern up to the corresponding stop_memory can match the empty string, and false otherwise. If we find the matching stop_memory, sets P to point to one past its number. Otherwise, sets P to an undefined byte less than or equal to END. We don't handle duplicates properly (yet). */ static boolean group_match_null_string_p (p, end, reg_info) unsigned char **p, *end; register_info_type *reg_info; { int mcnt; /* Point to after the args to the start_memory. */ unsigned char *p1 = *p + 2; while (p1 < end) { /* Skip over opcodes that can match nothing, and return true or false, as appropriate, when we get to one that can't, or to the matching stop_memory. */ switch ((re_opcode_t) *p1) { /* Could be either a loop or a series of alternatives. */ case on_failure_jump: p1++; EXTRACT_NUMBER_AND_INCR (mcnt, p1); /* If the next operation is not a jump backwards in the pattern. */ if (mcnt >= 0) { /* Go through the on_failure_jumps of the alternatives, seeing if any of the alternatives cannot match nothing. The last alternative starts with only a jump, whereas the rest start with on_failure_jump and end with a jump, e.g., here is the pattern for `a|b|c': /on_failure_jump/0/6/exactn/1/a/jump_past_alt/0/6 /on_failure_jump/0/6/exactn/1/b/jump_past_alt/0/3 /exactn/1/c So, we have to first go through the first (n-1) alternatives and then deal with the last one separately. */ /* Deal with the first (n-1) alternatives, which start with an on_failure_jump (see above) that jumps to right past a jump_past_alt. */ while ((re_opcode_t) p1[mcnt-3] == jump_past_alt) { /* `mcnt' holds how many bytes long the alternative is, including the ending `jump_past_alt' and its number. 
*/ if (!alt_match_null_string_p (p1, p1 + mcnt - 3, reg_info)) return false; /* Move to right after this alternative, including the jump_past_alt. */ p1 += mcnt; /* Break if it's the beginning of an n-th alternative that doesn't begin with an on_failure_jump. */ if ((re_opcode_t) *p1 != on_failure_jump) break; /* Still have to check that it's not an n-th alternative that starts with an on_failure_jump. */ p1++; EXTRACT_NUMBER_AND_INCR (mcnt, p1); if ((re_opcode_t) p1[mcnt-3] != jump_past_alt) { /* Get to the beginning of the n-th alternative. */ p1 -= 3; break; } } /* Deal with the last alternative: go back and get number of the `jump_past_alt' just before it. `mcnt' contains the length of the alternative. */ EXTRACT_NUMBER (mcnt, p1 - 2); if (!alt_match_null_string_p (p1, p1 + mcnt, reg_info)) return false; p1 += mcnt; /* Get past the n-th alternative. */ } /* if mcnt > 0 */ break; case stop_memory: assert (p1[1] == **p); *p = p1 + 2; return true; default: if (!common_op_match_null_string_p (&p1, end, reg_info)) return false; } } /* while p1 < end */ return false; } /* group_match_null_string_p */ /* Similar to group_match_null_string_p, but doesn't deal with alternatives: It expects P to be the first byte of a single alternative and END one byte past the last. The alternative can contain groups. */ static boolean alt_match_null_string_p (p, end, reg_info) unsigned char *p, *end; register_info_type *reg_info; { int mcnt; unsigned char *p1 = p; while (p1 < end) { /* Skip over opcodes that can match nothing, and break when we get to one that can't. */ switch ((re_opcode_t) *p1) { /* It's a loop. */ case on_failure_jump: p1++; EXTRACT_NUMBER_AND_INCR (mcnt, p1); p1 += mcnt; break; default: if (!common_op_match_null_string_p (&p1, end, reg_info)) return false; } } /* while p1 < end */ return true; } /* alt_match_null_string_p */ /* Deals with the ops common to group_match_null_string_p and alt_match_null_string_p. Sets P to one after the op and its arguments, if any. 
*/ static boolean common_op_match_null_string_p (p, end, reg_info) unsigned char **p, *end; register_info_type *reg_info; { int mcnt; boolean ret; int reg_no; unsigned char *p1 = *p; switch ((re_opcode_t) *p1++) { case no_op: case begline: case endline: case begbuf: case endbuf: case wordbeg: case wordend: case wordbound: case notwordbound: #ifdef emacs case before_dot: case at_dot: case after_dot: #endif break; case start_memory: reg_no = *p1; assert (reg_no > 0 && reg_no <= MAX_REGNUM); ret = group_match_null_string_p (&p1, end, reg_info); /* Have to set this here in case we're checking a group which contains a group and a back reference to it. */ if (REG_MATCH_NULL_STRING_P (reg_info[reg_no]) == MATCH_NULL_UNSET_VALUE) REG_MATCH_NULL_STRING_P (reg_info[reg_no]) = ret; if (!ret) return false; break; /* If this is an optimized succeed_n for zero times, make the jump. */ case jump: EXTRACT_NUMBER_AND_INCR (mcnt, p1); if (mcnt >= 0) p1 += mcnt; else return false; break; case succeed_n: /* Get to the number of times to succeed. */ p1 += 2; EXTRACT_NUMBER_AND_INCR (mcnt, p1); if (mcnt == 0) { p1 -= 4; EXTRACT_NUMBER_AND_INCR (mcnt, p1); p1 += mcnt; } else return false; break; case duplicate: if (!REG_MATCH_NULL_STRING_P (reg_info[*p1])) return false; break; case set_number_at: p1 += 4; default: /* All other opcodes mean we cannot match the empty string. */ return false; } *p = p1; return true; } /* common_op_match_null_string_p */ /* Return zero if TRANSLATE[S1] and TRANSLATE[S2] are identical for LEN bytes; nonzero otherwise. */ static int bcmp_translate (s1, s2, len, translate) unsigned char *s1, *s2; register int len; char *translate; { register unsigned char *p1 = s1, *p2 = s2; while (len) { if (translate[*p1++] != translate[*p2++]) return 1; len--; } return 0; } /* Entry points for GNU code. */ /* re_compile_pattern is the GNU regular expression compiler: it compiles PATTERN (of length SIZE) and puts the result in BUFP. 
Returns 0 if the pattern was valid, otherwise an error string. Assumes the `allocated' (and perhaps `buffer') and `translate' fields are set in BUFP on entry. We call regex_compile to do the actual compilation. */ const char * re_compile_pattern (pattern, length, bufp) const char *pattern; int length; struct re_pattern_buffer *bufp; { reg_errcode_t ret; /* GNU code is written to assume at least RE_NREGS registers will be set (and at least one extra will be -1). */ bufp->regs_allocated = REGS_UNALLOCATED; /* And GNU code determines whether or not to get register information by passing null for the REGS argument to re_match, etc., not by setting no_sub. */ bufp->no_sub = 0; /* Match anchors at newline. */ bufp->newline_anchor = 1; ret = regex_compile (pattern, length, re_syntax_options, bufp); return re_error_msg[(int) ret]; } /* Entry points compatible with 4.2 BSD regex library. We don't define them if this is an Emacs or POSIX compilation. */ #if !defined (emacs) && !defined (_POSIX_SOURCE) /* BSD has one and only one pattern buffer. */ static struct re_pattern_buffer re_comp_buf; char * re_comp (s) const char *s; { reg_errcode_t ret; if (!s) { if (!re_comp_buf.buffer) return "No previous regular expression"; return 0; } if (!re_comp_buf.buffer) { re_comp_buf.buffer = (unsigned char *) malloc (200); if (re_comp_buf.buffer == NULL) return "Memory exhausted"; re_comp_buf.allocated = 200; re_comp_buf.fastmap = (char *) malloc (1 << BYTEWIDTH); if (re_comp_buf.fastmap == NULL) return "Memory exhausted"; } /* Since `re_exec' always passes NULL for the `regs' argument, we don't need to initialize the pattern buffer fields which affect it. */ /* Match anchors at newlines. */ re_comp_buf.newline_anchor = 1; ret = regex_compile (s, strlen (s), re_syntax_options, &re_comp_buf); /* Yes, we're discarding `const' here. 
*/ return (char *) re_error_msg[(int) ret]; } int re_exec (s) const char *s; { const int len = strlen (s); return 0 <= re_search (&re_comp_buf, s, len, 0, len, (struct re_registers *) 0); } #endif /* not emacs and not _POSIX_SOURCE */ /* POSIX.2 functions. Don't define these for Emacs. */ #ifndef emacs /* regcomp takes a regular expression as a string and compiles it. PREG is a regex_t *. We do not expect any fields to be initialized, since POSIX says we shouldn't. Thus, we set `buffer' to the compiled pattern; `used' to the length of the compiled pattern; `syntax' to RE_SYNTAX_POSIX_EXTENDED if the REG_EXTENDED bit in CFLAGS is set; otherwise, to RE_SYNTAX_POSIX_BASIC; `newline_anchor' to REG_NEWLINE being set in CFLAGS; `fastmap' and `fastmap_accurate' to zero; `re_nsub' to the number of subexpressions in PATTERN. PATTERN is the address of the pattern string. CFLAGS is a series of bits which affect compilation. If REG_EXTENDED is set, we use POSIX extended syntax; otherwise, we use POSIX basic syntax. If REG_NEWLINE is set, then . and [^...] don't match newline. Also, regexec will try a match beginning after every newline. If REG_ICASE is set, then we consider upper- and lowercase versions of letters to be equivalent when matching. If REG_NOSUB is set, then when PREG is passed to regexec, that routine will report only success or failure, and nothing about the registers. It returns 0 if it succeeds, nonzero if it doesn't. (See regex.h for the return codes and their meanings.) */ int regcomp (preg, pattern, cflags) regex_t *preg; const char *pattern; int cflags; { reg_errcode_t ret; unsigned syntax = (cflags & REG_EXTENDED) ? RE_SYNTAX_POSIX_EXTENDED : RE_SYNTAX_POSIX_BASIC; /* regex_compile will allocate the space for the compiled pattern. */ preg->buffer = 0; preg->allocated = 0; /* Don't bother to use a fastmap when searching. This simplifies the REG_NEWLINE case: if we used a fastmap, we'd have to put all the characters after newlines into the fastmap.
This way, we just try every character. */ preg->fastmap = 0; if (cflags & REG_ICASE) { unsigned i; preg->translate = (char *) malloc (CHAR_SET_SIZE); if (preg->translate == NULL) return (int) REG_ESPACE; /* Map uppercase characters to corresponding lowercase ones. */ for (i = 0; i < CHAR_SET_SIZE; i++) preg->translate[i] = ISUPPER (i) ? tolower (i) : i; } else preg->translate = NULL; /* If REG_NEWLINE is set, newlines are treated differently. */ if (cflags & REG_NEWLINE) { /* REG_NEWLINE implies neither . nor [^...] match newline. */ syntax &= ~RE_DOT_NEWLINE; syntax |= RE_HAT_LISTS_NOT_NEWLINE; /* It also changes the matching behavior. */ preg->newline_anchor = 1; } else preg->newline_anchor = 0; preg->no_sub = !!(cflags & REG_NOSUB); /* POSIX says a null character in the pattern terminates it, so we can use strlen here in compiling the pattern. */ ret = regex_compile (pattern, strlen (pattern), syntax, preg); /* POSIX doesn't distinguish between an unmatched open-group and an unmatched close-group: both are REG_EPAREN. */ if (ret == REG_ERPAREN) ret = REG_EPAREN; return (int) ret; } /* regexec searches for a given pattern, specified by PREG, in the string STRING. If NMATCH is zero or REG_NOSUB was set in the cflags argument to `regcomp', we ignore PMATCH. Otherwise, we assume PMATCH has at least NMATCH elements, and we set them to the offsets of the corresponding matched substrings. EFLAGS specifies `execution flags' which affect matching: if REG_NOTBOL is set, then ^ does not match at the beginning of the string; if REG_NOTEOL is set, then $ does not match at the end. We return 0 if we find a match and REG_NOMATCH if not. 
*/ int regexec (preg, string, nmatch, pmatch, eflags) const regex_t *preg; const char *string; size_t nmatch; regmatch_t pmatch[]; int eflags; { int ret; struct re_registers regs; regex_t private_preg; int len = strlen (string); boolean want_reg_info = !preg->no_sub && nmatch > 0; private_preg = *preg; private_preg.not_bol = !!(eflags & REG_NOTBOL); private_preg.not_eol = !!(eflags & REG_NOTEOL); /* The user has told us exactly how many registers to return information about, via `nmatch'. We have to pass that on to the matching routines. */ private_preg.regs_allocated = REGS_FIXED; if (want_reg_info) { regs.num_regs = nmatch; regs.start = TALLOC (nmatch, regoff_t); regs.end = TALLOC (nmatch, regoff_t); if (regs.start == NULL || regs.end == NULL) return (int) REG_NOMATCH; } /* Perform the searching operation. */ ret = re_search (&private_preg, string, len, /* start: */ 0, /* range: */ len, want_reg_info ? &regs : (struct re_registers *) 0); /* Copy the register information to the POSIX structure. */ if (want_reg_info) { if (ret >= 0) { unsigned r; for (r = 0; r < nmatch; r++) { pmatch[r].rm_so = regs.start[r]; pmatch[r].rm_eo = regs.end[r]; } } /* If we needed the temporary register info, free the space now. */ free (regs.start); free (regs.end); } /* We want zero return to mean success, unlike `re_search'. */ return ret >= 0 ? (int) REG_NOERROR : (int) REG_NOMATCH; } /* Returns a message corresponding to an error code, ERRCODE, returned from either regcomp or regexec. We don't use PREG here. */ size_t regerror (errcode, preg, errbuf, errbuf_size) int errcode; const regex_t *preg; char *errbuf; size_t errbuf_size; { const char *msg; size_t msg_size; if (errcode < 0 || errcode >= (sizeof (re_error_msg) / sizeof (re_error_msg[0]))) /* Only error codes returned by the rest of the code should be passed to this routine. If we are given anything else, or if other regex code generates an invalid error code, then the program has a bug. Dump core so we can fix it.
*/ abort (); msg = re_error_msg[errcode]; /* POSIX doesn't require that we do anything in this case, but why not be nice. */ if (! msg) msg = "Success"; msg_size = strlen (msg) + 1; /* Includes the null. */ if (errbuf_size != 0) { if (msg_size > errbuf_size) { strncpy (errbuf, msg, errbuf_size - 1); errbuf[errbuf_size - 1] = 0; } else strcpy (errbuf, msg); } return msg_size; } /* Free dynamically allocated space used by PREG. */ void regfree (preg) regex_t *preg; { if (preg->buffer != NULL) free (preg->buffer); preg->buffer = NULL; preg->allocated = 0; preg->used = 0; if (preg->fastmap != NULL) free (preg->fastmap); preg->fastmap = NULL; preg->fastmap_accurate = 0; if (preg->translate != NULL) free (preg->translate); preg->translate = NULL; } #endif /* not emacs */ /* Local variables: make-backup-files: t version-control: t trim-versions-without-asking: nil End: */ swish-e-2.4.7/src/vms/descrip_vax.mms0000775000077100017500000001512611166010104014447 00000000000000# # Makefile derived from the Makefile coming with swish-e 1.3.2 # (original Makefile for SWISH Kevin Hughes, 3/12/95) # # The code has been tested to compile on OpenVMS 7.3 # JF. 
Piéronne jf.pieronne@laposte.net 29-Mar-2003 # # autoconf configuration by Bas Meijer, 1 June 2000 # Cross Platform Compilation on Solaris, HP-UX, IRIX and Linux # Several ideas from a Makefile by Christian Lindig # NAME = swish-e.exe # C compiler CC = CC SHELL = /bin/sh prefix = @prefix@ bindir = $(prefix)/bin mandir = $(prefix)/man man1dir = $(mandir)/man1 # Flags for C compiler #CWARN= CDEF = /def=(VMS,HAVE_CONFIG_H,STDC_HEADERS, REGEX_MALLOC) CINCL= /include=([.expat.xmlparse],[.expat.xmltok],libz:) CWARN= #CDEBUG= /debug/noopt CDEBUG= CFLAGS = /prefix=all$(CINCL)$(CDEF)$(CWARN)$(CDEBUG)/name=short #LINKFLAGS = /debug LINKFLAGS = LIBS= # # The objects for the different methods and # some common aliases # FILESYSTEM_OBJS=fs.obj HTTP_OBJS=http.obj httpserver.obj FS_OBJS=$(FILESYSTEM_OBJS) WEB_OBJS=$(HTTP_OBJS) VMS_OBJS = regex.obj VSNPRINTF_OBJ = vsnprintf.obj OBJS= check.obj file.obj index.obj search.obj error.obj methods.obj\ hash.obj list.obj mem.obj merge.obj swish2.obj stemmer.obj \ soundex.obj docprop.obj compress.obj xml.obj txt.obj \ metanames.obj result_sort.obj html.obj \ filter.obj parse_conffile.obj result_output.obj date_time.obj \ keychar_out.obj extprog.obj bash.obj db_native.obj dump.obj \ entities.obj swish_words.obj \ proplimit.obj swish_qsort.obj ramdisk.obj rank.obj \ xmlparse.obj xmltok.obj xmlrole.obj swregex.obj vsnprintf.obj \ double_metaphone.obj db_read.obj db_write.obj swstring.obj \ pre_sort.obj headers.obj docprop_write.obj stemmer.obj\ $(FILESYSTEM_OBJS) $(HTTP_OBJS) $(VMS_OBJS)\ api.obj stem_de.obj stem_dk.obj stem_en1.obj stem_en2.obj stem_es.obj\ stem_fi.obj stem_fr.obj stem_it.obj stem_nl.obj stem_no.obj \ stem_pt.obj stem_ru.obj stem_se.obj utilities.obj all : acconfig.h $(NAME) swish-search.exe ! libtest.exe ! 
xmlparse.obj : [.expat.xmlparse]xmlparse.c xmltok.obj : [.expat.xmltok]xmltok.c xmlrole.obj : [.expat.xmltok]xmlrole.c $(NAME) : $(OBJS) libswish-e.olb swish.obj link/exe=$(MMS$TARGET) $(LINKFLAGS) - swish.obj, libswish-e.olb/lib, libz:libz.olb/lib libtest.exe : libtest.obj libswish-e.olb swish.obj link/exe=$(MMS$TARGET) $(LINKFLAGS) libtest.obj, libswish-e.olb/lib libswish-e.olb : $(OBJS) library/create $(MMS$TARGET) $(MMS$SOURCE_LIST) swish-search.exe : $(NAME) copy $(NAME) swish-search.exe regex.obj : [.vms]regex.c [.vms]descrip_vax.mms acconfig.h : [.vms]acconfig.h_vms copy $(MMS$SOURCE) $(MMS$TARGET) clean : delete [...]*.obj;*, [...]*.olb;*, index.swish;*, [-.tests]*.index;* realclean : pur [-...] delete [...]*.exe;*, [...]*.obj;*, [...]*.olb;*, index.swish;*, acconfig.h;*, [-.tests]*.index;* test : $(NAME) set def [-.tests] mc [-.src]swish-e -c test.config write sys$output "test 1 (Normal search) ..." mc [-.src]swish-e -f test.index -w test write sys$output "test 1 (MetaTag search 1) ..." mc [-.src]swish-e -f test.index -w meta1=metatest1 write sys$output "test 1 (MetaTag search 2) ..." mc [-.src]swish-e -f test.index -w meta2=metatest2 write sys$output "test 1 (XML search) ..." mc [-.src]swish-e -f ./test.index -w meta3=metatest3 write sys$output "test 1 (Phrase search) ..." mc [-.src]swish-e -f test.index -w """three little pigs""" $(OBJS) : [.vms]descrip_vax.mms config.h swish.h acconfig.h swish.obj : [.vms]descrip_vax.mms config.h swish.h acconfig.h install : ! man : ! 
# # dependencies # check.obj : check.c swish.h config.h check.h hash.h compress.obj : compress.c swish.h config.h error.h mem.h docprop.h index.h search.h merge.h compress.h deflate.obj : deflate.c swish.h config.h error.h mem.h docprop.h index.h search.h merge.h deflate.h docprop.obj : docprop.c swish.h config.h file.h hash.h mem.h merge.h \ error.h search.h docprop.h compress.h error.obj : error.c swish.h config.h error.h file.obj : file.c swish.h config.h file.h mem.h error.h list.h \ hash.h index.h fs.obj : fs.c swish.h config.h index.h hash.h mem.h file.h \ list.h hash.obj : hash.c swish.h config.h hash.h mem.h http.obj : http.c swish.h config.h index.h hash.h mem.h file.h \ http.h httpserver.h httpserver.obj : httpserver.c swish.h config.h mem.h http.h \ httpserver.h index.obj : index.c swish.h config.h index.h hash.h mem.h \ check.h search.h docprop.h stemmer.h compress.h list.obj : list.c swish.h config.h list.h mem.h mem.obj : mem.c swish.h config.h mem.h error.h merge.obj : merge.c swish.h config.h merge.h error.h search.h index.h \ hash.h mem.h docprop.h compress.h methods.obj : methods.c swish.h config.h search.obj : search.c swish.h config.h search.h file.h list.h \ merge.h hash.h mem.h docprop.h stemmer.h compress.h stemmer.obj : stemmer.c swish.h config.h stemmer.h soundex.obj : soundex.c swish.h config.h stemmer.h swish2.obj : swish2.c swish.h config.h error.h list.h search.h index.h \ file.h merge.h docprop.h swish.obj : swish.c swish.h config.h error.h list.h search.h index.h \ file.h merge.h docprop.h libtest.obj : libtest.c swish.h config.h error.h list.h search.h index.h \ file.h merge.h docprop.h txt.obj : txt.c txt.h swish.h mem.h index.h xml.obj : xml.c txt.h swish.h mem.h index.h proplimi.obj : swish.h mem.h merge.h docprop.h index.h metanames.h \ compress.h error.h db.h result_sort.h swish_qsort.h proplimit.h metanames.obj : metanames.c result_sort.obj : result_sort.c html.obj : html.c filter.obj : filter.c parse_conffile.obj : 
parse_conffile.c result_output.obj : result_output.c date_time.obj : date_time.c keychar_out.obj : keychar_out.c extprog.obj : extprog.c bash.obj : bash.c db_native.obj : db_native.c dump.obj : dump.c entities.obj : entities.c swish_words.obj : swish_words.c proplimit.obj : proplimit.c swish_qsort.obj : swish_qsort.c ramdisk.obj : ramdisk.c rank.obj : rank.c swregex.obj : swregex.c double_metaphone.obj : double_metaphone.c vsnprintf.obj : [.replace]vsnprintf.c db_read.obj : db_read.c db_write.obj : db_write.c swstring.obj : swstring.c pre_sort.obj : pre_sort.c headers.obj : headers.c docprop_write.obj : docprop_write.c api.obj : [.snowball]api.c stem_de.obj : [.snowball]stem_de.c stem_dk.obj : [.snowball]stem_dk.c stem_en1.obj : [.snowball]stem_en1.c stem_en2.obj : [.snowball]stem_en2.c stem_es.obj : [.snowball]stem_es.c stem_fi.obj : [.snowball]stem_fi.c stem_fr.obj : [.snowball]stem_fr.c stem_it.obj : [.snowball]stem_it.c stem_nl.obj : [.snowball]stem_nl.c stem_no.obj : [.snowball]stem_no.c stem_pt.obj : [.snowball]stem_pt.c stem_ru.obj : [.snowball]stem_ru.c stem_se.obj : [.snowball]stem_se.c utilities.obj : [.snowball]utilities.c swish-e-2.4.7/src/expat/0000777000077100017500000000000011166013172012020 500000000000000swish-e-2.4.7/src/expat/Makefile.am0000664000077100017500000000112711166010107013766 00000000000000AM_CPPFLAGS = -I"$(srcdir)/xmlparse" -I"$(srcdir)/xmltok" noinst_LTLIBRARIES = libswexpat.la libswexpat_la_SOURCES = \ xmltok.c \ xmlrole.c \ xmlparse.c EXTRA_DIST = \ COPYING \ expat.dsw \ xmlparse/xmlparse.c \ xmlparse/xmlparse.dsp \ xmlparse/xmlparse.h \ xmltok/ascii.h \ xmltok/asciitab.h \ xmltok/dllmain.c \ xmltok/iasciitab.h \ xmltok/latin1tab.h \ xmltok/nametab.h \ xmltok/utf8tab.h \ xmltok/xmldef.h \ xmltok/xmlrole.c \ xmltok/xmlrole.h \ xmltok/xmltok.c \ xmltok/xmltok.dsp \ xmltok/xmltok.h \ xmltok/xmltok_impl.c \ xmltok/xmltok_impl.h \ xmltok/xmltok_ns.c swish-e-2.4.7/src/expat/expat.dsw0000775000077100017500000000245411166010107013601
00000000000000Microsoft Developer Studio Workspace File, Format Version 6.00 # WARNING: DO NOT EDIT OR DELETE THIS WORKSPACE FILE! ############################################################################### Project: "gennmtab"=.\gennmtab\gennmtab.dsp - Package Owner=<4> Package=<5> {{{ }}} Package=<4> {{{ }}} ############################################################################### Project: "xmlparse"=.\xmlparse\xmlparse.dsp - Package Owner=<4> Package=<5> {{{ }}} Package=<4> {{{ Begin Project Dependency Project_Dep_Name xmltok End Project Dependency }}} ############################################################################### Project: "xmltok"=.\xmltok\xmltok.dsp - Package Owner=<4> Package=<5> {{{ }}} Package=<4> {{{ Begin Project Dependency Project_Dep_Name gennmtab End Project Dependency }}} ############################################################################### Project: "xmlwf"=.\xmlwf\xmlwf.dsp - Package Owner=<4> Package=<5> {{{ }}} Package=<4> {{{ Begin Project Dependency Project_Dep_Name xmlparse End Project Dependency }}} ############################################################################### Global: Package=<5> {{{ }}} Package=<3> {{{ }}} ############################################################################### swish-e-2.4.7/src/expat/xmlparse.c0000664000077100017500000032221711166010107013737 00000000000000/* Copyright (c) 1998, 1999, 2000 Thai Open Source Software Center Ltd See the file copying.txt for copying permission. 
*/ #include "xmldef.h" #include "xmlparse.h" #include #ifdef XML_UNICODE #define XML_ENCODE_MAX XML_UTF16_ENCODE_MAX #define XmlConvert XmlUtf16Convert #define XmlGetInternalEncoding XmlGetUtf16InternalEncoding #define XmlGetInternalEncodingNS XmlGetUtf16InternalEncodingNS #define XmlEncode XmlUtf16Encode #define MUST_CONVERT(enc, s) (!(enc)->isUtf16 || (((unsigned long)s) & 1)) typedef unsigned short ICHAR; #else #define XML_ENCODE_MAX XML_UTF8_ENCODE_MAX #define XmlConvert XmlUtf8Convert #define XmlGetInternalEncoding XmlGetUtf8InternalEncoding #define XmlGetInternalEncodingNS XmlGetUtf8InternalEncodingNS #define XmlEncode XmlUtf8Encode #define MUST_CONVERT(enc, s) (!(enc)->isUtf8) typedef char ICHAR; #endif #ifndef XML_NS #define XmlInitEncodingNS XmlInitEncoding #define XmlInitUnknownEncodingNS XmlInitUnknownEncoding #undef XmlGetInternalEncodingNS #define XmlGetInternalEncodingNS XmlGetInternalEncoding #define XmlParseXmlDeclNS XmlParseXmlDecl #endif #ifdef XML_UNICODE_WCHAR_T #define XML_T(x) L ## x #else #define XML_T(x) x #endif /* Round up n to be a multiple of sz, where sz is a power of 2. 
*/ #define ROUND_UP(n, sz) (((n) + ((sz) - 1)) & ~((sz) - 1)) #include "xmltok.h" #include "xmlrole.h" typedef const XML_Char *KEY; typedef struct { KEY name; } NAMED; typedef struct { NAMED **v; size_t size; size_t used; size_t usedLim; } HASH_TABLE; typedef struct { NAMED **p; NAMED **end; } HASH_TABLE_ITER; #define INIT_TAG_BUF_SIZE 32 /* must be a multiple of sizeof(XML_Char) */ #define INIT_DATA_BUF_SIZE 1024 #define INIT_ATTS_SIZE 16 #define INIT_BLOCK_SIZE 1024 #define INIT_BUFFER_SIZE 1024 #define EXPAND_SPARE 24 typedef struct binding { struct prefix *prefix; struct binding *nextTagBinding; struct binding *prevPrefixBinding; const struct attribute_id *attId; XML_Char *uri; int uriLen; int uriAlloc; } BINDING; typedef struct prefix { const XML_Char *name; BINDING *binding; } PREFIX; typedef struct { const XML_Char *str; const XML_Char *localPart; int uriLen; } TAG_NAME; typedef struct tag { struct tag *parent; const char *rawName; int rawNameLength; TAG_NAME name; char *buf; char *bufEnd; BINDING *bindings; } TAG; typedef struct { const XML_Char *name; const XML_Char *textPtr; int textLen; const XML_Char *systemId; const XML_Char *base; const XML_Char *publicId; const XML_Char *notation; char open; } ENTITY; typedef struct block { struct block *next; int size; XML_Char s[1]; } BLOCK; typedef struct { BLOCK *blocks; BLOCK *freeBlocks; const XML_Char *end; XML_Char *ptr; XML_Char *start; } STRING_POOL; /* The XML_Char before the name is used to determine whether an attribute has been specified. 
*/ typedef struct attribute_id { XML_Char *name; PREFIX *prefix; char maybeTokenized; char xmlns; } ATTRIBUTE_ID; typedef struct { const ATTRIBUTE_ID *id; char isCdata; const XML_Char *value; } DEFAULT_ATTRIBUTE; typedef struct { const XML_Char *name; PREFIX *prefix; const ATTRIBUTE_ID *idAtt; int nDefaultAtts; int allocDefaultAtts; DEFAULT_ATTRIBUTE *defaultAtts; } ELEMENT_TYPE; typedef struct { HASH_TABLE generalEntities; HASH_TABLE elementTypes; HASH_TABLE attributeIds; HASH_TABLE prefixes; STRING_POOL pool; int complete; int standalone; #ifdef XML_DTD HASH_TABLE paramEntities; #endif /* XML_DTD */ PREFIX defaultPrefix; } DTD; typedef struct open_internal_entity { const char *internalEventPtr; const char *internalEventEndPtr; struct open_internal_entity *next; ENTITY *entity; } OPEN_INTERNAL_ENTITY; typedef enum XML_Error Processor(XML_Parser parser, const char *start, const char *end, const char **endPtr); static Processor prologProcessor; static Processor prologInitProcessor; static Processor contentProcessor; static Processor cdataSectionProcessor; #ifdef XML_DTD static Processor ignoreSectionProcessor; #endif /* XML_DTD */ static Processor epilogProcessor; static Processor errorProcessor; static Processor externalEntityInitProcessor; static Processor externalEntityInitProcessor2; static Processor externalEntityInitProcessor3; static Processor externalEntityContentProcessor; static enum XML_Error handleUnknownEncoding(XML_Parser parser, const XML_Char *encodingName); static enum XML_Error processXmlDecl(XML_Parser parser, int isGeneralTextEntity, const char *, const char *); static enum XML_Error initializeEncoding(XML_Parser parser); static enum XML_Error doProlog(XML_Parser parser, const ENCODING *enc, const char *s, const char *end, int tok, const char *next, const char **nextPtr); #ifdef XML_DTD static enum XML_Error processInternalParamEntity(XML_Parser parser, ENTITY *entity); #endif static enum XML_Error doContent(XML_Parser parser, int startTagLevel, 
const ENCODING *enc, const char *start, const char *end, const char **endPtr); static enum XML_Error doCdataSection(XML_Parser parser, const ENCODING *, const char **startPtr, const char *end, const char **nextPtr); #ifdef XML_DTD static enum XML_Error doIgnoreSection(XML_Parser parser, const ENCODING *, const char **startPtr, const char *end, const char **nextPtr); #endif /* XML_DTD */ static enum XML_Error storeAtts(XML_Parser parser, const ENCODING *, const char *s, TAG_NAME *tagNamePtr, BINDING **bindingsPtr); static int addBinding(XML_Parser parser, PREFIX *prefix, const ATTRIBUTE_ID *attId, const XML_Char *uri, BINDING **bindingsPtr); static int defineAttribute(ELEMENT_TYPE *type, ATTRIBUTE_ID *, int isCdata, int isId, const XML_Char *dfltValue); static enum XML_Error storeAttributeValue(XML_Parser parser, const ENCODING *, int isCdata, const char *, const char *, STRING_POOL *); static enum XML_Error appendAttributeValue(XML_Parser parser, const ENCODING *, int isCdata, const char *, const char *, STRING_POOL *); static ATTRIBUTE_ID * getAttributeId(XML_Parser parser, const ENCODING *enc, const char *start, const char *end); static int setElementTypePrefix(XML_Parser parser, ELEMENT_TYPE *); static enum XML_Error storeEntityValue(XML_Parser parser, const ENCODING *enc, const char *start, const char *end); static int reportProcessingInstruction(XML_Parser parser, const ENCODING *enc, const char *start, const char *end); static int reportComment(XML_Parser parser, const ENCODING *enc, const char *start, const char *end); static void reportDefault(XML_Parser parser, const ENCODING *enc, const char *start, const char *end); static const XML_Char *getContext(XML_Parser parser); static int setContext(XML_Parser parser, const XML_Char *context); static void normalizePublicId(XML_Char *s); static int dtdInit(DTD *); static void dtdDestroy(DTD *); static int dtdCopy(DTD *newDtd, const DTD *oldDtd); static int copyEntityTable(HASH_TABLE *, STRING_POOL *, const 
HASH_TABLE *); #ifdef XML_DTD static void dtdSwap(DTD *, DTD *); #endif /* XML_DTD */ static NAMED *lookup(HASH_TABLE *table, KEY name, size_t createSize); static void hashTableInit(HASH_TABLE *); static void hashTableDestroy(HASH_TABLE *); static void hashTableIterInit(HASH_TABLE_ITER *, const HASH_TABLE *); static NAMED *hashTableIterNext(HASH_TABLE_ITER *); static void poolInit(STRING_POOL *); static void poolClear(STRING_POOL *); static void poolDestroy(STRING_POOL *); static XML_Char *poolAppend(STRING_POOL *pool, const ENCODING *enc, const char *ptr, const char *end); static XML_Char *poolStoreString(STRING_POOL *pool, const ENCODING *enc, const char *ptr, const char *end); static int poolGrow(STRING_POOL *pool); static const XML_Char *poolCopyString(STRING_POOL *pool, const XML_Char *s); static const XML_Char *poolCopyStringN(STRING_POOL *pool, const XML_Char *s, int n); #define poolStart(pool) ((pool)->start) #define poolEnd(pool) ((pool)->ptr) #define poolLength(pool) ((pool)->ptr - (pool)->start) #define poolChop(pool) ((void)--(pool->ptr)) #define poolLastChar(pool) (((pool)->ptr)[-1]) #define poolDiscard(pool) ((pool)->ptr = (pool)->start) #define poolFinish(pool) ((pool)->start = (pool)->ptr) #define poolAppendChar(pool, c) \ (((pool)->ptr == (pool)->end && !poolGrow(pool)) \ ? 0 \ : ((*((pool)->ptr)++ = c), 1)) typedef struct { /* The first member must be userData so that the XML_GetUserData macro works. 
*/ void *m_userData; void *m_handlerArg; char *m_buffer; /* first character to be parsed */ const char *m_bufferPtr; /* past last character to be parsed */ char *m_bufferEnd; /* allocated end of buffer */ const char *m_bufferLim; long m_parseEndByteIndex; const char *m_parseEndPtr; XML_Char *m_dataBuf; XML_Char *m_dataBufEnd; XML_StartElementHandler m_startElementHandler; XML_EndElementHandler m_endElementHandler; XML_CharacterDataHandler m_characterDataHandler; XML_ProcessingInstructionHandler m_processingInstructionHandler; XML_CommentHandler m_commentHandler; XML_StartCdataSectionHandler m_startCdataSectionHandler; XML_EndCdataSectionHandler m_endCdataSectionHandler; XML_DefaultHandler m_defaultHandler; XML_StartDoctypeDeclHandler m_startDoctypeDeclHandler; XML_EndDoctypeDeclHandler m_endDoctypeDeclHandler; XML_UnparsedEntityDeclHandler m_unparsedEntityDeclHandler; XML_NotationDeclHandler m_notationDeclHandler; XML_ExternalParsedEntityDeclHandler m_externalParsedEntityDeclHandler; XML_InternalParsedEntityDeclHandler m_internalParsedEntityDeclHandler; XML_StartNamespaceDeclHandler m_startNamespaceDeclHandler; XML_EndNamespaceDeclHandler m_endNamespaceDeclHandler; XML_NotStandaloneHandler m_notStandaloneHandler; XML_ExternalEntityRefHandler m_externalEntityRefHandler; void *m_externalEntityRefHandlerArg; XML_UnknownEncodingHandler m_unknownEncodingHandler; const ENCODING *m_encoding; INIT_ENCODING m_initEncoding; const ENCODING *m_internalEncoding; const XML_Char *m_protocolEncodingName; int m_ns; void *m_unknownEncodingMem; void *m_unknownEncodingData; void *m_unknownEncodingHandlerData; void (*m_unknownEncodingRelease)(void *); PROLOG_STATE m_prologState; Processor *m_processor; enum XML_Error m_errorCode; const char *m_eventPtr; const char *m_eventEndPtr; const char *m_positionPtr; OPEN_INTERNAL_ENTITY *m_openInternalEntities; int m_defaultExpandInternalEntities; int m_tagLevel; ENTITY *m_declEntity; const XML_Char *m_declNotationName; const XML_Char 
*m_declNotationPublicId; ELEMENT_TYPE *m_declElementType; ATTRIBUTE_ID *m_declAttributeId; char m_declAttributeIsCdata; char m_declAttributeIsId; DTD m_dtd; const XML_Char *m_curBase; TAG *m_tagStack; TAG *m_freeTagList; BINDING *m_inheritedBindings; BINDING *m_freeBindingList; int m_attsSize; int m_nSpecifiedAtts; int m_idAttIndex; ATTRIBUTE *m_atts; POSITION m_position; STRING_POOL m_tempPool; STRING_POOL m_temp2Pool; char *m_groupConnector; unsigned m_groupSize; int m_hadExternalDoctype; XML_Char m_namespaceSeparator; #ifdef XML_DTD enum XML_ParamEntityParsing m_paramEntityParsing; XML_Parser m_parentParser; #endif } Parser; #define userData (((Parser *)parser)->m_userData) #define handlerArg (((Parser *)parser)->m_handlerArg) #define startElementHandler (((Parser *)parser)->m_startElementHandler) #define endElementHandler (((Parser *)parser)->m_endElementHandler) #define characterDataHandler (((Parser *)parser)->m_characterDataHandler) #define processingInstructionHandler (((Parser *)parser)->m_processingInstructionHandler) #define commentHandler (((Parser *)parser)->m_commentHandler) #define startCdataSectionHandler (((Parser *)parser)->m_startCdataSectionHandler) #define endCdataSectionHandler (((Parser *)parser)->m_endCdataSectionHandler) #define defaultHandler (((Parser *)parser)->m_defaultHandler) #define startDoctypeDeclHandler (((Parser *)parser)->m_startDoctypeDeclHandler) #define endDoctypeDeclHandler (((Parser *)parser)->m_endDoctypeDeclHandler) #define unparsedEntityDeclHandler (((Parser *)parser)->m_unparsedEntityDeclHandler) #define notationDeclHandler (((Parser *)parser)->m_notationDeclHandler) #define externalParsedEntityDeclHandler (((Parser *)parser)->m_externalParsedEntityDeclHandler) #define internalParsedEntityDeclHandler (((Parser *)parser)->m_internalParsedEntityDeclHandler) #define startNamespaceDeclHandler (((Parser *)parser)->m_startNamespaceDeclHandler) #define endNamespaceDeclHandler (((Parser *)parser)->m_endNamespaceDeclHandler) 
#define notStandaloneHandler (((Parser *)parser)->m_notStandaloneHandler) #define externalEntityRefHandler (((Parser *)parser)->m_externalEntityRefHandler) #define externalEntityRefHandlerArg (((Parser *)parser)->m_externalEntityRefHandlerArg) #define unknownEncodingHandler (((Parser *)parser)->m_unknownEncodingHandler) #define encoding (((Parser *)parser)->m_encoding) #define initEncoding (((Parser *)parser)->m_initEncoding) #define internalEncoding (((Parser *)parser)->m_internalEncoding) #define unknownEncodingMem (((Parser *)parser)->m_unknownEncodingMem) #define unknownEncodingData (((Parser *)parser)->m_unknownEncodingData) #define unknownEncodingHandlerData \ (((Parser *)parser)->m_unknownEncodingHandlerData) #define unknownEncodingRelease (((Parser *)parser)->m_unknownEncodingRelease) #define protocolEncodingName (((Parser *)parser)->m_protocolEncodingName) #define ns (((Parser *)parser)->m_ns) #define prologState (((Parser *)parser)->m_prologState) #define processor (((Parser *)parser)->m_processor) #define errorCode (((Parser *)parser)->m_errorCode) #define eventPtr (((Parser *)parser)->m_eventPtr) #define eventEndPtr (((Parser *)parser)->m_eventEndPtr) #define positionPtr (((Parser *)parser)->m_positionPtr) #define position (((Parser *)parser)->m_position) #define openInternalEntities (((Parser *)parser)->m_openInternalEntities) #define defaultExpandInternalEntities (((Parser *)parser)->m_defaultExpandInternalEntities) #define tagLevel (((Parser *)parser)->m_tagLevel) #define buffer (((Parser *)parser)->m_buffer) #define bufferPtr (((Parser *)parser)->m_bufferPtr) #define bufferEnd (((Parser *)parser)->m_bufferEnd) #define parseEndByteIndex (((Parser *)parser)->m_parseEndByteIndex) #define parseEndPtr (((Parser *)parser)->m_parseEndPtr) #define bufferLim (((Parser *)parser)->m_bufferLim) #define dataBuf (((Parser *)parser)->m_dataBuf) #define dataBufEnd (((Parser *)parser)->m_dataBufEnd) #define dtd (((Parser *)parser)->m_dtd) #define curBase (((Parser 
*)parser)->m_curBase) #define declEntity (((Parser *)parser)->m_declEntity) #define declNotationName (((Parser *)parser)->m_declNotationName) #define declNotationPublicId (((Parser *)parser)->m_declNotationPublicId) #define declElementType (((Parser *)parser)->m_declElementType) #define declAttributeId (((Parser *)parser)->m_declAttributeId) #define declAttributeIsCdata (((Parser *)parser)->m_declAttributeIsCdata) #define declAttributeIsId (((Parser *)parser)->m_declAttributeIsId) #define freeTagList (((Parser *)parser)->m_freeTagList) #define freeBindingList (((Parser *)parser)->m_freeBindingList) #define inheritedBindings (((Parser *)parser)->m_inheritedBindings) #define tagStack (((Parser *)parser)->m_tagStack) #define atts (((Parser *)parser)->m_atts) #define attsSize (((Parser *)parser)->m_attsSize) #define nSpecifiedAtts (((Parser *)parser)->m_nSpecifiedAtts) #define idAttIndex (((Parser *)parser)->m_idAttIndex) #define tempPool (((Parser *)parser)->m_tempPool) #define temp2Pool (((Parser *)parser)->m_temp2Pool) #define groupConnector (((Parser *)parser)->m_groupConnector) #define groupSize (((Parser *)parser)->m_groupSize) #define hadExternalDoctype (((Parser *)parser)->m_hadExternalDoctype) #define namespaceSeparator (((Parser *)parser)->m_namespaceSeparator) #ifdef XML_DTD #define parentParser (((Parser *)parser)->m_parentParser) #define paramEntityParsing (((Parser *)parser)->m_paramEntityParsing) #endif /* XML_DTD */ #ifdef _MSC_VER #ifdef _DEBUG Parser *asParser(XML_Parser parser) { return parser; } #endif #endif XML_Parser XML_ParserCreate(const XML_Char *encodingName) { XML_Parser parser = malloc(sizeof(Parser)); if (!parser) return parser; processor = prologInitProcessor; XmlPrologStateInit(&prologState); userData = 0; handlerArg = 0; startElementHandler = 0; endElementHandler = 0; characterDataHandler = 0; processingInstructionHandler = 0; commentHandler = 0; startCdataSectionHandler = 0; endCdataSectionHandler = 0; defaultHandler = 0; 
startDoctypeDeclHandler = 0; endDoctypeDeclHandler = 0; unparsedEntityDeclHandler = 0; notationDeclHandler = 0; externalParsedEntityDeclHandler = 0; internalParsedEntityDeclHandler = 0; startNamespaceDeclHandler = 0; endNamespaceDeclHandler = 0; notStandaloneHandler = 0; externalEntityRefHandler = 0; externalEntityRefHandlerArg = parser; unknownEncodingHandler = 0; buffer = 0; bufferPtr = 0; bufferEnd = 0; parseEndByteIndex = 0; parseEndPtr = 0; bufferLim = 0; declElementType = 0; declAttributeId = 0; declEntity = 0; declNotationName = 0; declNotationPublicId = 0; memset(&position, 0, sizeof(POSITION)); errorCode = XML_ERROR_NONE; eventPtr = 0; eventEndPtr = 0; positionPtr = 0; openInternalEntities = 0; tagLevel = 0; tagStack = 0; freeTagList = 0; freeBindingList = 0; inheritedBindings = 0; attsSize = INIT_ATTS_SIZE; atts = malloc(attsSize * sizeof(ATTRIBUTE)); nSpecifiedAtts = 0; dataBuf = malloc(INIT_DATA_BUF_SIZE * sizeof(XML_Char)); groupSize = 0; groupConnector = 0; hadExternalDoctype = 0; unknownEncodingMem = 0; unknownEncodingRelease = 0; unknownEncodingData = 0; unknownEncodingHandlerData = 0; namespaceSeparator = '!'; #ifdef XML_DTD parentParser = 0; paramEntityParsing = XML_PARAM_ENTITY_PARSING_NEVER; #endif ns = 0; poolInit(&tempPool); poolInit(&temp2Pool); protocolEncodingName = encodingName ? 
poolCopyString(&tempPool, encodingName) : 0; curBase = 0; if (!dtdInit(&dtd) || !atts || !dataBuf || (encodingName && !protocolEncodingName)) { XML_ParserFree(parser); return 0; } dataBufEnd = dataBuf + INIT_DATA_BUF_SIZE; XmlInitEncoding(&initEncoding, &encoding, 0); internalEncoding = XmlGetInternalEncoding(); return parser; } XML_Parser XML_ParserCreateNS(const XML_Char *encodingName, XML_Char nsSep) { static const XML_Char implicitContext[] = { XML_T('x'), XML_T('m'), XML_T('l'), XML_T('='), XML_T('h'), XML_T('t'), XML_T('t'), XML_T('p'), XML_T(':'), XML_T('/'), XML_T('/'), XML_T('w'), XML_T('w'), XML_T('w'), XML_T('.'), XML_T('w'), XML_T('3'), XML_T('.'), XML_T('o'), XML_T('r'), XML_T('g'), XML_T('/'), XML_T('X'), XML_T('M'), XML_T('L'), XML_T('/'), XML_T('1'), XML_T('9'), XML_T('9'), XML_T('8'), XML_T('/'), XML_T('n'), XML_T('a'), XML_T('m'), XML_T('e'), XML_T('s'), XML_T('p'), XML_T('a'), XML_T('c'), XML_T('e'), XML_T('\0') }; XML_Parser parser = XML_ParserCreate(encodingName); /* Bail out if allocation failed: setContext and XML_ParserFree both dereference parser. */ if (!parser) return 0; XmlInitEncodingNS(&initEncoding, &encoding, 0); ns = 1; internalEncoding = XmlGetInternalEncodingNS(); namespaceSeparator = nsSep; if (!setContext(parser, implicitContext)) { XML_ParserFree(parser); return 0; } return parser; } int XML_SetEncoding(XML_Parser parser, const XML_Char *encodingName) { if (!encodingName) protocolEncodingName = 0; else { protocolEncodingName = poolCopyString(&tempPool, encodingName); if (!protocolEncodingName) return 0; } return 1; } XML_Parser XML_ExternalEntityParserCreate(XML_Parser oldParser, const XML_Char *context, const XML_Char *encodingName) { XML_Parser parser = oldParser; DTD *oldDtd = &dtd; XML_StartElementHandler oldStartElementHandler = startElementHandler; XML_EndElementHandler oldEndElementHandler = endElementHandler; XML_CharacterDataHandler oldCharacterDataHandler = characterDataHandler; XML_ProcessingInstructionHandler oldProcessingInstructionHandler = processingInstructionHandler; XML_CommentHandler oldCommentHandler =
commentHandler; XML_StartCdataSectionHandler oldStartCdataSectionHandler = startCdataSectionHandler; XML_EndCdataSectionHandler oldEndCdataSectionHandler = endCdataSectionHandler; XML_DefaultHandler oldDefaultHandler = defaultHandler; XML_UnparsedEntityDeclHandler oldUnparsedEntityDeclHandler = unparsedEntityDeclHandler; XML_NotationDeclHandler oldNotationDeclHandler = notationDeclHandler; XML_ExternalParsedEntityDeclHandler oldExternalParsedEntityDeclHandler = externalParsedEntityDeclHandler; XML_InternalParsedEntityDeclHandler oldInternalParsedEntityDeclHandler = internalParsedEntityDeclHandler; XML_StartNamespaceDeclHandler oldStartNamespaceDeclHandler = startNamespaceDeclHandler; XML_EndNamespaceDeclHandler oldEndNamespaceDeclHandler = endNamespaceDeclHandler; XML_NotStandaloneHandler oldNotStandaloneHandler = notStandaloneHandler; XML_ExternalEntityRefHandler oldExternalEntityRefHandler = externalEntityRefHandler; XML_UnknownEncodingHandler oldUnknownEncodingHandler = unknownEncodingHandler; void *oldUserData = userData; void *oldHandlerArg = handlerArg; int oldDefaultExpandInternalEntities = defaultExpandInternalEntities; void *oldExternalEntityRefHandlerArg = externalEntityRefHandlerArg; #ifdef XML_DTD int oldParamEntityParsing = paramEntityParsing; #endif parser = (ns ? 
XML_ParserCreateNS(encodingName, namespaceSeparator) : XML_ParserCreate(encodingName)); if (!parser) return 0; startElementHandler = oldStartElementHandler; endElementHandler = oldEndElementHandler; characterDataHandler = oldCharacterDataHandler; processingInstructionHandler = oldProcessingInstructionHandler; commentHandler = oldCommentHandler; startCdataSectionHandler = oldStartCdataSectionHandler; endCdataSectionHandler = oldEndCdataSectionHandler; defaultHandler = oldDefaultHandler; unparsedEntityDeclHandler = oldUnparsedEntityDeclHandler; notationDeclHandler = oldNotationDeclHandler; externalParsedEntityDeclHandler = oldExternalParsedEntityDeclHandler; internalParsedEntityDeclHandler = oldInternalParsedEntityDeclHandler; startNamespaceDeclHandler = oldStartNamespaceDeclHandler; endNamespaceDeclHandler = oldEndNamespaceDeclHandler; notStandaloneHandler = oldNotStandaloneHandler; externalEntityRefHandler = oldExternalEntityRefHandler; unknownEncodingHandler = oldUnknownEncodingHandler; userData = oldUserData; if (oldUserData == oldHandlerArg) handlerArg = userData; else handlerArg = parser; if (oldExternalEntityRefHandlerArg != oldParser) externalEntityRefHandlerArg = oldExternalEntityRefHandlerArg; defaultExpandInternalEntities = oldDefaultExpandInternalEntities; #ifdef XML_DTD paramEntityParsing = oldParamEntityParsing; if (context) { #endif /* XML_DTD */ if (!dtdCopy(&dtd, oldDtd) || !setContext(parser, context)) { XML_ParserFree(parser); return 0; } processor = externalEntityInitProcessor; #ifdef XML_DTD } else { dtdSwap(&dtd, oldDtd); parentParser = oldParser; XmlPrologStateInitExternalEntity(&prologState); dtd.complete = 1; hadExternalDoctype = 1; } #endif /* XML_DTD */ return parser; } static void destroyBindings(BINDING *bindings) { for (;;) { BINDING *b = bindings; if (!b) break; bindings = b->nextTagBinding; free(b->uri); free(b); } } void XML_ParserFree(XML_Parser parser) { for (;;) { TAG *p; if (tagStack == 0) { if (freeTagList == 0) break; tagStack = 
freeTagList; freeTagList = 0; } p = tagStack; tagStack = tagStack->parent; free(p->buf); destroyBindings(p->bindings); free(p); } destroyBindings(freeBindingList); destroyBindings(inheritedBindings); poolDestroy(&tempPool); poolDestroy(&temp2Pool); #ifdef XML_DTD if (parentParser) { if (hadExternalDoctype) dtd.complete = 0; dtdSwap(&dtd, &((Parser *)parentParser)->m_dtd); } #endif /* XML_DTD */ dtdDestroy(&dtd); free((void *)atts); free(groupConnector); free(buffer); free(dataBuf); free(unknownEncodingMem); if (unknownEncodingRelease) unknownEncodingRelease(unknownEncodingData); free(parser); } void XML_UseParserAsHandlerArg(XML_Parser parser) { handlerArg = parser; } void XML_SetUserData(XML_Parser parser, void *p) { if (handlerArg == userData) handlerArg = userData = p; else userData = p; } int XML_SetBase(XML_Parser parser, const XML_Char *p) { if (p) { p = poolCopyString(&dtd.pool, p); if (!p) return 0; curBase = p; } else curBase = 0; return 1; } const XML_Char *XML_GetBase(XML_Parser parser) { return curBase; } int XML_GetSpecifiedAttributeCount(XML_Parser parser) { return nSpecifiedAtts; } int XML_GetIdAttributeIndex(XML_Parser parser) { return idAttIndex; } void XML_SetElementHandler(XML_Parser parser, XML_StartElementHandler start, XML_EndElementHandler end) { startElementHandler = start; endElementHandler = end; } void XML_SetCharacterDataHandler(XML_Parser parser, XML_CharacterDataHandler handler) { characterDataHandler = handler; } void XML_SetProcessingInstructionHandler(XML_Parser parser, XML_ProcessingInstructionHandler handler) { processingInstructionHandler = handler; } void XML_SetCommentHandler(XML_Parser parser, XML_CommentHandler handler) { commentHandler = handler; } void XML_SetCdataSectionHandler(XML_Parser parser, XML_StartCdataSectionHandler start, XML_EndCdataSectionHandler end) { startCdataSectionHandler = start; endCdataSectionHandler = end; } void XML_SetDefaultHandler(XML_Parser parser, XML_DefaultHandler handler) { defaultHandler = 
handler; defaultExpandInternalEntities = 0; } void XML_SetDefaultHandlerExpand(XML_Parser parser, XML_DefaultHandler handler) { defaultHandler = handler; defaultExpandInternalEntities = 1; } void XML_SetDoctypeDeclHandler(XML_Parser parser, XML_StartDoctypeDeclHandler start, XML_EndDoctypeDeclHandler end) { startDoctypeDeclHandler = start; endDoctypeDeclHandler = end; } void XML_SetUnparsedEntityDeclHandler(XML_Parser parser, XML_UnparsedEntityDeclHandler handler) { unparsedEntityDeclHandler = handler; } void XML_SetExternalParsedEntityDeclHandler(XML_Parser parser, XML_ExternalParsedEntityDeclHandler handler) { externalParsedEntityDeclHandler = handler; } void XML_SetInternalParsedEntityDeclHandler(XML_Parser parser, XML_InternalParsedEntityDeclHandler handler) { internalParsedEntityDeclHandler = handler; } void XML_SetNotationDeclHandler(XML_Parser parser, XML_NotationDeclHandler handler) { notationDeclHandler = handler; } void XML_SetNamespaceDeclHandler(XML_Parser parser, XML_StartNamespaceDeclHandler start, XML_EndNamespaceDeclHandler end) { startNamespaceDeclHandler = start; endNamespaceDeclHandler = end; } void XML_SetNotStandaloneHandler(XML_Parser parser, XML_NotStandaloneHandler handler) { notStandaloneHandler = handler; } void XML_SetExternalEntityRefHandler(XML_Parser parser, XML_ExternalEntityRefHandler handler) { externalEntityRefHandler = handler; } void XML_SetExternalEntityRefHandlerArg(XML_Parser parser, void *arg) { if (arg) externalEntityRefHandlerArg = arg; else externalEntityRefHandlerArg = parser; } void XML_SetUnknownEncodingHandler(XML_Parser parser, XML_UnknownEncodingHandler handler, void *data) { unknownEncodingHandler = handler; unknownEncodingHandlerData = data; } int XML_SetParamEntityParsing(XML_Parser parser, enum XML_ParamEntityParsing parsing) { #ifdef XML_DTD paramEntityParsing = parsing; return 1; #else return parsing == XML_PARAM_ENTITY_PARSING_NEVER; #endif } int XML_Parse(XML_Parser parser, const char *s, int len, int 
isFinal) { if (len == 0) { if (!isFinal) return 1; positionPtr = bufferPtr; errorCode = processor(parser, bufferPtr, parseEndPtr = bufferEnd, 0); if (errorCode == XML_ERROR_NONE) return 1; eventEndPtr = eventPtr; processor = errorProcessor; return 0; } else if (bufferPtr == bufferEnd) { const char *end; int nLeftOver; parseEndByteIndex += len; positionPtr = s; if (isFinal) { errorCode = processor(parser, s, parseEndPtr = s + len, 0); if (errorCode == XML_ERROR_NONE) return 1; eventEndPtr = eventPtr; processor = errorProcessor; return 0; } errorCode = processor(parser, s, parseEndPtr = s + len, &end); if (errorCode != XML_ERROR_NONE) { eventEndPtr = eventPtr; processor = errorProcessor; return 0; } XmlUpdatePosition(encoding, positionPtr, end, &position); nLeftOver = s + len - end; if (nLeftOver) { if (buffer == 0 || nLeftOver > bufferLim - buffer) { /* FIXME avoid integer overflow */ buffer = buffer == 0 ? malloc(len * 2) : realloc(buffer, len * 2); /* FIXME storage leak if realloc fails */ if (!buffer) { errorCode = XML_ERROR_NO_MEMORY; eventPtr = eventEndPtr = 0; processor = errorProcessor; return 0; } bufferLim = buffer + len * 2; } memcpy(buffer, end, nLeftOver); bufferPtr = buffer; bufferEnd = buffer + nLeftOver; } return 1; } else { memcpy(XML_GetBuffer(parser, len), s, len); return XML_ParseBuffer(parser, len, isFinal); } } int XML_ParseBuffer(XML_Parser parser, int len, int isFinal) { const char *start = bufferPtr; positionPtr = start; bufferEnd += len; parseEndByteIndex += len; errorCode = processor(parser, start, parseEndPtr = bufferEnd, isFinal ? 
(const char **)0 : &bufferPtr); if (errorCode == XML_ERROR_NONE) { if (!isFinal) XmlUpdatePosition(encoding, positionPtr, bufferPtr, &position); return 1; } else { eventEndPtr = eventPtr; processor = errorProcessor; return 0; } } void *XML_GetBuffer(XML_Parser parser, int len) { if (len > bufferLim - bufferEnd) { /* FIXME avoid integer overflow */ int neededSize = len + (bufferEnd - bufferPtr); if (neededSize <= bufferLim - buffer) { memmove(buffer, bufferPtr, bufferEnd - bufferPtr); bufferEnd = buffer + (bufferEnd - bufferPtr); bufferPtr = buffer; } else { char *newBuf; int bufferSize = bufferLim - bufferPtr; if (bufferSize == 0) bufferSize = INIT_BUFFER_SIZE; do { bufferSize *= 2; } while (bufferSize < neededSize); newBuf = malloc(bufferSize); if (newBuf == 0) { errorCode = XML_ERROR_NO_MEMORY; return 0; } bufferLim = newBuf + bufferSize; if (bufferPtr) { memcpy(newBuf, bufferPtr, bufferEnd - bufferPtr); free(buffer); } bufferEnd = newBuf + (bufferEnd - bufferPtr); bufferPtr = buffer = newBuf; } } return bufferEnd; } enum XML_Error XML_GetErrorCode(XML_Parser parser) { return errorCode; } long XML_GetCurrentByteIndex(XML_Parser parser) { if (eventPtr) return parseEndByteIndex - (parseEndPtr - eventPtr); return -1; } int XML_GetCurrentByteCount(XML_Parser parser) { if (eventEndPtr && eventPtr) return eventEndPtr - eventPtr; return 0; } int XML_GetCurrentLineNumber(XML_Parser parser) { if (eventPtr) { XmlUpdatePosition(encoding, positionPtr, eventPtr, &position); positionPtr = eventPtr; } return position.lineNumber + 1; } int XML_GetCurrentColumnNumber(XML_Parser parser) { if (eventPtr) { XmlUpdatePosition(encoding, positionPtr, eventPtr, &position); positionPtr = eventPtr; } return position.columnNumber; } void XML_DefaultCurrent(XML_Parser parser) { if (defaultHandler) { if (openInternalEntities) reportDefault(parser, internalEncoding, openInternalEntities->internalEventPtr, openInternalEntities->internalEventEndPtr); else reportDefault(parser, encoding, 
eventPtr, eventEndPtr); } } const XML_LChar *XML_ErrorString(int code) { static const XML_LChar *message[] = { 0, XML_T("out of memory"), XML_T("syntax error"), XML_T("no element found"), XML_T("not well-formed"), XML_T("unclosed token"), XML_T("unclosed token"), XML_T("mismatched tag"), XML_T("duplicate attribute"), XML_T("junk after document element"), XML_T("illegal parameter entity reference"), XML_T("undefined entity"), XML_T("recursive entity reference"), XML_T("asynchronous entity"), XML_T("reference to invalid character number"), XML_T("reference to binary entity"), XML_T("reference to external entity in attribute"), XML_T("xml processing instruction not at start of external entity"), XML_T("unknown encoding"), XML_T("encoding specified in XML declaration is incorrect"), XML_T("unclosed CDATA section"), XML_T("error in processing external entity reference"), XML_T("document is not standalone") }; if (code > 0 && code < sizeof(message)/sizeof(message[0])) return message[code]; return 0; } static enum XML_Error contentProcessor(XML_Parser parser, const char *start, const char *end, const char **endPtr) { return doContent(parser, 0, encoding, start, end, endPtr); } static enum XML_Error externalEntityInitProcessor(XML_Parser parser, const char *start, const char *end, const char **endPtr) { enum XML_Error result = initializeEncoding(parser); if (result != XML_ERROR_NONE) return result; processor = externalEntityInitProcessor2; return externalEntityInitProcessor2(parser, start, end, endPtr); } static enum XML_Error externalEntityInitProcessor2(XML_Parser parser, const char *start, const char *end, const char **endPtr) { const char *next; int tok = XmlContentTok(encoding, start, end, &next); switch (tok) { case XML_TOK_BOM: start = next; break; case XML_TOK_PARTIAL: if (endPtr) { *endPtr = start; return XML_ERROR_NONE; } eventPtr = start; return XML_ERROR_UNCLOSED_TOKEN; case XML_TOK_PARTIAL_CHAR: if (endPtr) { *endPtr = start; return XML_ERROR_NONE; } eventPtr 
= start; return XML_ERROR_PARTIAL_CHAR; } processor = externalEntityInitProcessor3; return externalEntityInitProcessor3(parser, start, end, endPtr); } static enum XML_Error externalEntityInitProcessor3(XML_Parser parser, const char *start, const char *end, const char **endPtr) { const char *next; int tok = XmlContentTok(encoding, start, end, &next); switch (tok) { case XML_TOK_XML_DECL: { enum XML_Error result = processXmlDecl(parser, 1, start, next); if (result != XML_ERROR_NONE) return result; start = next; } break; case XML_TOK_PARTIAL: if (endPtr) { *endPtr = start; return XML_ERROR_NONE; } eventPtr = start; return XML_ERROR_UNCLOSED_TOKEN; case XML_TOK_PARTIAL_CHAR: if (endPtr) { *endPtr = start; return XML_ERROR_NONE; } eventPtr = start; return XML_ERROR_PARTIAL_CHAR; } processor = externalEntityContentProcessor; tagLevel = 1; return doContent(parser, 1, encoding, start, end, endPtr); } static enum XML_Error externalEntityContentProcessor(XML_Parser parser, const char *start, const char *end, const char **endPtr) { return doContent(parser, 1, encoding, start, end, endPtr); } static enum XML_Error doContent(XML_Parser parser, int startTagLevel, const ENCODING *enc, const char *s, const char *end, const char **nextPtr) { const char **eventPP; const char **eventEndPP; if (enc == encoding) { eventPP = &eventPtr; eventEndPP = &eventEndPtr; } else { eventPP = &(openInternalEntities->internalEventPtr); eventEndPP = &(openInternalEntities->internalEventEndPtr); } *eventPP = s; for (;;) { const char *next = s; /* XmlContentTok doesn't always set the last arg */ int tok = XmlContentTok(enc, s, end, &next); *eventEndPP = next; switch (tok) { case XML_TOK_TRAILING_CR: if (nextPtr) { *nextPtr = s; return XML_ERROR_NONE; } *eventEndPP = end; if (characterDataHandler) { XML_Char c = 0xA; characterDataHandler(handlerArg, &c, 1); } else if (defaultHandler) reportDefault(parser, enc, s, end); if (startTagLevel == 0) return XML_ERROR_NO_ELEMENTS; if (tagLevel != startTagLevel) 
return XML_ERROR_ASYNC_ENTITY; return XML_ERROR_NONE; case XML_TOK_NONE: if (nextPtr) { *nextPtr = s; return XML_ERROR_NONE; } if (startTagLevel > 0) { if (tagLevel != startTagLevel) return XML_ERROR_ASYNC_ENTITY; return XML_ERROR_NONE; } return XML_ERROR_NO_ELEMENTS; case XML_TOK_INVALID: *eventPP = next; return XML_ERROR_INVALID_TOKEN; case XML_TOK_PARTIAL: if (nextPtr) { *nextPtr = s; return XML_ERROR_NONE; } return XML_ERROR_UNCLOSED_TOKEN; case XML_TOK_PARTIAL_CHAR: if (nextPtr) { *nextPtr = s; return XML_ERROR_NONE; } return XML_ERROR_PARTIAL_CHAR; case XML_TOK_ENTITY_REF: { const XML_Char *name; ENTITY *entity; XML_Char ch = XmlPredefinedEntityName(enc, s + enc->minBytesPerChar, next - enc->minBytesPerChar); if (ch) { if (characterDataHandler) characterDataHandler(handlerArg, &ch, 1); else if (defaultHandler) reportDefault(parser, enc, s, next); break; } name = poolStoreString(&dtd.pool, enc, s + enc->minBytesPerChar, next - enc->minBytesPerChar); if (!name) return XML_ERROR_NO_MEMORY; entity = (ENTITY *)lookup(&dtd.generalEntities, name, 0); poolDiscard(&dtd.pool); if (!entity) { if (dtd.complete || dtd.standalone) return XML_ERROR_UNDEFINED_ENTITY; if (defaultHandler) reportDefault(parser, enc, s, next); break; } if (entity->open) return XML_ERROR_RECURSIVE_ENTITY_REF; if (entity->notation) return XML_ERROR_BINARY_ENTITY_REF; if (entity) { if (entity->textPtr) { enum XML_Error result; OPEN_INTERNAL_ENTITY openEntity; if (defaultHandler && !defaultExpandInternalEntities) { reportDefault(parser, enc, s, next); break; } entity->open = 1; openEntity.next = openInternalEntities; openInternalEntities = &openEntity; openEntity.entity = entity; openEntity.internalEventPtr = 0; openEntity.internalEventEndPtr = 0; result = doContent(parser, tagLevel, internalEncoding, (char *)entity->textPtr, (char *)(entity->textPtr + entity->textLen), 0); entity->open = 0; openInternalEntities = openEntity.next; if (result) return result; } else if (externalEntityRefHandler) { 
const XML_Char *context; entity->open = 1; context = getContext(parser); entity->open = 0; if (!context) return XML_ERROR_NO_MEMORY; if (!externalEntityRefHandler(externalEntityRefHandlerArg, context, entity->base, entity->systemId, entity->publicId)) return XML_ERROR_EXTERNAL_ENTITY_HANDLING; poolDiscard(&tempPool); } else if (defaultHandler) reportDefault(parser, enc, s, next); } break; } case XML_TOK_START_TAG_WITH_ATTS: if (!startElementHandler) { enum XML_Error result = storeAtts(parser, enc, s, 0, 0); if (result) return result; } /* fall through */ case XML_TOK_START_TAG_NO_ATTS: { TAG *tag; if (freeTagList) { tag = freeTagList; freeTagList = freeTagList->parent; } else { tag = malloc(sizeof(TAG)); if (!tag) return XML_ERROR_NO_MEMORY; tag->buf = malloc(INIT_TAG_BUF_SIZE); if (!tag->buf) { /* do not leak the TAG when its buffer cannot be allocated */ free(tag); return XML_ERROR_NO_MEMORY; } tag->bufEnd = tag->buf + INIT_TAG_BUF_SIZE; } tag->bindings = 0; tag->parent = tagStack; tagStack = tag; tag->name.localPart = 0; tag->rawName = s + enc->minBytesPerChar; tag->rawNameLength = XmlNameLength(enc, tag->rawName); if (nextPtr) { /* Need to guarantee that: tag->buf + ROUND_UP(tag->rawNameLength, sizeof(XML_Char)) <= tag->bufEnd - sizeof(XML_Char) */ if (tag->rawNameLength + (int)(sizeof(XML_Char) - 1) + (int)sizeof(XML_Char) > tag->bufEnd - tag->buf) { int bufSize = tag->rawNameLength * 4; bufSize = ROUND_UP(bufSize, sizeof(XML_Char)); tag->buf = realloc(tag->buf, bufSize); if (!tag->buf) return XML_ERROR_NO_MEMORY; tag->bufEnd = tag->buf + bufSize; } memcpy(tag->buf, tag->rawName, tag->rawNameLength); tag->rawName = tag->buf; } ++tagLevel; if (startElementHandler) { enum XML_Error result; XML_Char *toPtr; for (;;) { const char *rawNameEnd = tag->rawName + tag->rawNameLength; const char *fromPtr = tag->rawName; int bufSize; if (nextPtr) toPtr = (XML_Char *)(tag->buf + ROUND_UP(tag->rawNameLength, sizeof(XML_Char))); else toPtr = (XML_Char *)tag->buf; tag->name.str = toPtr; XmlConvert(enc, &fromPtr, rawNameEnd, (ICHAR **)&toPtr, (ICHAR
*)tag->bufEnd - 1); if (fromPtr == rawNameEnd) break; bufSize = (tag->bufEnd - tag->buf) << 1; tag->buf = realloc(tag->buf, bufSize); if (!tag->buf) return XML_ERROR_NO_MEMORY; tag->bufEnd = tag->buf + bufSize; if (nextPtr) tag->rawName = tag->buf; } *toPtr = XML_T('\0'); result = storeAtts(parser, enc, s, &(tag->name), &(tag->bindings)); if (result) return result; startElementHandler(handlerArg, tag->name.str, (const XML_Char **)atts); poolClear(&tempPool); } else { tag->name.str = 0; if (defaultHandler) reportDefault(parser, enc, s, next); } break; } case XML_TOK_EMPTY_ELEMENT_WITH_ATTS: if (!startElementHandler) { enum XML_Error result = storeAtts(parser, enc, s, 0, 0); if (result) return result; } /* fall through */ case XML_TOK_EMPTY_ELEMENT_NO_ATTS: if (startElementHandler || endElementHandler) { const char *rawName = s + enc->minBytesPerChar; enum XML_Error result; BINDING *bindings = 0; TAG_NAME name; name.str = poolStoreString(&tempPool, enc, rawName, rawName + XmlNameLength(enc, rawName)); if (!name.str) return XML_ERROR_NO_MEMORY; poolFinish(&tempPool); result = storeAtts(parser, enc, s, &name, &bindings); if (result) return result; poolFinish(&tempPool); if (startElementHandler) startElementHandler(handlerArg, name.str, (const XML_Char **)atts); if (endElementHandler) { if (startElementHandler) *eventPP = *eventEndPP; endElementHandler(handlerArg, name.str); } poolClear(&tempPool); while (bindings) { BINDING *b = bindings; if (endNamespaceDeclHandler) endNamespaceDeclHandler(handlerArg, b->prefix->name); bindings = bindings->nextTagBinding; b->nextTagBinding = freeBindingList; freeBindingList = b; b->prefix->binding = b->prevPrefixBinding; } } else if (defaultHandler) reportDefault(parser, enc, s, next); if (tagLevel == 0) return epilogProcessor(parser, next, end, nextPtr); break; case XML_TOK_END_TAG: if (tagLevel == startTagLevel) return XML_ERROR_ASYNC_ENTITY; else { int len; const char *rawName; TAG *tag = tagStack; tagStack = tag->parent; 
tag->parent = freeTagList; freeTagList = tag; rawName = s + enc->minBytesPerChar*2; len = XmlNameLength(enc, rawName); if (len != tag->rawNameLength || memcmp(tag->rawName, rawName, len) != 0) { *eventPP = rawName; return XML_ERROR_TAG_MISMATCH; } --tagLevel; if (endElementHandler && tag->name.str) { if (tag->name.localPart) { XML_Char *to = (XML_Char *)tag->name.str + tag->name.uriLen; const XML_Char *from = tag->name.localPart; while ((*to++ = *from++) != 0) ; } endElementHandler(handlerArg, tag->name.str); } else if (defaultHandler) reportDefault(parser, enc, s, next); while (tag->bindings) { BINDING *b = tag->bindings; if (endNamespaceDeclHandler) endNamespaceDeclHandler(handlerArg, b->prefix->name); tag->bindings = tag->bindings->nextTagBinding; b->nextTagBinding = freeBindingList; freeBindingList = b; b->prefix->binding = b->prevPrefixBinding; } if (tagLevel == 0) return epilogProcessor(parser, next, end, nextPtr); } break; case XML_TOK_CHAR_REF: { int n = XmlCharRefNumber(enc, s); if (n < 0) return XML_ERROR_BAD_CHAR_REF; if (characterDataHandler) { XML_Char buf[XML_ENCODE_MAX]; characterDataHandler(handlerArg, buf, XmlEncode(n, (ICHAR *)buf)); } else if (defaultHandler) reportDefault(parser, enc, s, next); } break; case XML_TOK_XML_DECL: return XML_ERROR_MISPLACED_XML_PI; case XML_TOK_DATA_NEWLINE: if (characterDataHandler) { XML_Char c = 0xA; characterDataHandler(handlerArg, &c, 1); } else if (defaultHandler) reportDefault(parser, enc, s, next); break; case XML_TOK_CDATA_SECT_OPEN: { enum XML_Error result; if (startCdataSectionHandler) startCdataSectionHandler(handlerArg); #if 0 /* Suppose you are doing a transformation on a document that involves changing only the character data. You set up a defaultHandler and a characterDataHandler. The defaultHandler simply copies characters through. The characterDataHandler does the transformation and writes the characters out, escaping them as necessary.
This case will fail to work if we leave out the following two lines (because & and < inside CDATA sections will be incorrectly escaped). However, now we have a start/endCdataSectionHandler, so it seems easier to let the user deal with this. */ else if (characterDataHandler) characterDataHandler(handlerArg, dataBuf, 0); #endif else if (defaultHandler) reportDefault(parser, enc, s, next); result = doCdataSection(parser, enc, &next, end, nextPtr); if (!next) { processor = cdataSectionProcessor; return result; } } break; case XML_TOK_TRAILING_RSQB: if (nextPtr) { *nextPtr = s; return XML_ERROR_NONE; } if (characterDataHandler) { if (MUST_CONVERT(enc, s)) { ICHAR *dataPtr = (ICHAR *)dataBuf; XmlConvert(enc, &s, end, &dataPtr, (ICHAR *)dataBufEnd); characterDataHandler(handlerArg, dataBuf, dataPtr - (ICHAR *)dataBuf); } else characterDataHandler(handlerArg, (XML_Char *)s, (XML_Char *)end - (XML_Char *)s); } else if (defaultHandler) reportDefault(parser, enc, s, end); if (startTagLevel == 0) { *eventPP = end; return XML_ERROR_NO_ELEMENTS; } if (tagLevel != startTagLevel) { *eventPP = end; return XML_ERROR_ASYNC_ENTITY; } return XML_ERROR_NONE; case XML_TOK_DATA_CHARS: if (characterDataHandler) { if (MUST_CONVERT(enc, s)) { for (;;) { ICHAR *dataPtr = (ICHAR *)dataBuf; XmlConvert(enc, &s, next, &dataPtr, (ICHAR *)dataBufEnd); *eventEndPP = s; characterDataHandler(handlerArg, dataBuf, dataPtr - (ICHAR *)dataBuf); if (s == next) break; *eventPP = s; } } else characterDataHandler(handlerArg, (XML_Char *)s, (XML_Char *)next - (XML_Char *)s); } else if (defaultHandler) reportDefault(parser, enc, s, next); break; case XML_TOK_PI: if (!reportProcessingInstruction(parser, enc, s, next)) return XML_ERROR_NO_MEMORY; break; case XML_TOK_COMMENT: if (!reportComment(parser, enc, s, next)) return XML_ERROR_NO_MEMORY; break; default: if (defaultHandler) reportDefault(parser, enc, s, next); break; } *eventPP = s = next; } /* not reached */ } /* If tagNamePtr is non-null, build a real list 
of attributes, otherwise just check the attributes for well-formedness. */ static enum XML_Error storeAtts(XML_Parser parser, const ENCODING *enc, const char *attStr, TAG_NAME *tagNamePtr, BINDING **bindingsPtr) { ELEMENT_TYPE *elementType = 0; int nDefaultAtts = 0; const XML_Char **appAtts; /* the attribute list to pass to the application */ int attIndex = 0; int i; int n; int nPrefixes = 0; BINDING *binding; const XML_Char *localPart; /* lookup the element type name */ if (tagNamePtr) { elementType = (ELEMENT_TYPE *)lookup(&dtd.elementTypes, tagNamePtr->str, 0); if (!elementType) { tagNamePtr->str = poolCopyString(&dtd.pool, tagNamePtr->str); if (!tagNamePtr->str) return XML_ERROR_NO_MEMORY; elementType = (ELEMENT_TYPE *)lookup(&dtd.elementTypes, tagNamePtr->str, sizeof(ELEMENT_TYPE)); if (!elementType) return XML_ERROR_NO_MEMORY; if (ns && !setElementTypePrefix(parser, elementType)) return XML_ERROR_NO_MEMORY; } nDefaultAtts = elementType->nDefaultAtts; } /* get the attributes from the tokenizer */ n = XmlGetAttributes(enc, attStr, attsSize, atts); if (n + nDefaultAtts > attsSize) { int oldAttsSize = attsSize; attsSize = n + nDefaultAtts + INIT_ATTS_SIZE; atts = realloc((void *)atts, attsSize * sizeof(ATTRIBUTE)); if (!atts) return XML_ERROR_NO_MEMORY; if (n > oldAttsSize) XmlGetAttributes(enc, attStr, n, atts); } appAtts = (const XML_Char **)atts; for (i = 0; i < n; i++) { /* add the name and value to the attribute list */ ATTRIBUTE_ID *attId = getAttributeId(parser, enc, atts[i].name, atts[i].name + XmlNameLength(enc, atts[i].name)); if (!attId) return XML_ERROR_NO_MEMORY; /* detect duplicate attributes */ if ((attId->name)[-1]) { if (enc == encoding) eventPtr = atts[i].name; return XML_ERROR_DUPLICATE_ATTRIBUTE; } (attId->name)[-1] = 1; appAtts[attIndex++] = attId->name; if (!atts[i].normalized) { enum XML_Error result; int isCdata = 1; /* figure out whether declared as other than CDATA */ if (attId->maybeTokenized) { int j; for (j = 0; j < nDefaultAtts; j++) 
{ if (attId == elementType->defaultAtts[j].id) { isCdata = elementType->defaultAtts[j].isCdata; break; } } } /* normalize the attribute value */ result = storeAttributeValue(parser, enc, isCdata, atts[i].valuePtr, atts[i].valueEnd, &tempPool); if (result) return result; if (tagNamePtr) { appAtts[attIndex] = poolStart(&tempPool); poolFinish(&tempPool); } else poolDiscard(&tempPool); } else if (tagNamePtr) { /* the value did not need normalizing */ appAtts[attIndex] = poolStoreString(&tempPool, enc, atts[i].valuePtr, atts[i].valueEnd); if (appAtts[attIndex] == 0) return XML_ERROR_NO_MEMORY; poolFinish(&tempPool); } /* handle prefixed attribute names */ if (attId->prefix && tagNamePtr) { if (attId->xmlns) { /* deal with namespace declarations here */ if (!addBinding(parser, attId->prefix, attId, appAtts[attIndex], bindingsPtr)) return XML_ERROR_NO_MEMORY; --attIndex; } else { /* deal with other prefixed names later */ attIndex++; nPrefixes++; (attId->name)[-1] = 2; } } else attIndex++; } if (tagNamePtr) { int j; nSpecifiedAtts = attIndex; if (elementType->idAtt && (elementType->idAtt->name)[-1]) { for (i = 0; i < attIndex; i += 2) if (appAtts[i] == elementType->idAtt->name) { idAttIndex = i; break; } } else idAttIndex = -1; /* do attribute defaulting */ for (j = 0; j < nDefaultAtts; j++) { const DEFAULT_ATTRIBUTE *da = elementType->defaultAtts + j; if (!(da->id->name)[-1] && da->value) { if (da->id->prefix) { if (da->id->xmlns) { if (!addBinding(parser, da->id->prefix, da->id, da->value, bindingsPtr)) return XML_ERROR_NO_MEMORY; } else { (da->id->name)[-1] = 2; nPrefixes++; appAtts[attIndex++] = da->id->name; appAtts[attIndex++] = da->value; } } else { (da->id->name)[-1] = 1; appAtts[attIndex++] = da->id->name; appAtts[attIndex++] = da->value; } } } appAtts[attIndex] = 0; } i = 0; if (nPrefixes) { /* expand prefixed attribute names */ for (; i < attIndex; i += 2) { if (appAtts[i][-1] == 2) { ATTRIBUTE_ID *id; ((XML_Char *)(appAtts[i]))[-1] = 0; id = (ATTRIBUTE_ID 
*)lookup(&dtd.attributeIds, appAtts[i], 0); if (id->prefix->binding) { int j; const BINDING *b = id->prefix->binding; const XML_Char *s = appAtts[i]; for (j = 0; j < b->uriLen; j++) { if (!poolAppendChar(&tempPool, b->uri[j])) return XML_ERROR_NO_MEMORY; } while (*s++ != ':') ; do { if (!poolAppendChar(&tempPool, *s)) return XML_ERROR_NO_MEMORY; } while (*s++); appAtts[i] = poolStart(&tempPool); poolFinish(&tempPool); } if (!--nPrefixes) break; } else ((XML_Char *)(appAtts[i]))[-1] = 0; } } /* clear the flags that say whether attributes were specified */ for (; i < attIndex; i += 2) ((XML_Char *)(appAtts[i]))[-1] = 0; if (!tagNamePtr) return XML_ERROR_NONE; for (binding = *bindingsPtr; binding; binding = binding->nextTagBinding) binding->attId->name[-1] = 0; /* expand the element type name */ if (elementType->prefix) { binding = elementType->prefix->binding; if (!binding) return XML_ERROR_NONE; localPart = tagNamePtr->str; while (*localPart++ != XML_T(':')) ; } else if (dtd.defaultPrefix.binding) { binding = dtd.defaultPrefix.binding; localPart = tagNamePtr->str; } else return XML_ERROR_NONE; tagNamePtr->localPart = localPart; tagNamePtr->uriLen = binding->uriLen; for (i = 0; localPart[i++];) ; n = i + binding->uriLen; if (n > binding->uriAlloc) { TAG *p; XML_Char *uri = malloc((n + EXPAND_SPARE) * sizeof(XML_Char)); if (!uri) return XML_ERROR_NO_MEMORY; binding->uriAlloc = n + EXPAND_SPARE; memcpy(uri, binding->uri, binding->uriLen * sizeof(XML_Char)); for (p = tagStack; p; p = p->parent) if (p->name.str == binding->uri) p->name.str = uri; free(binding->uri); binding->uri = uri; } memcpy(binding->uri + binding->uriLen, localPart, i * sizeof(XML_Char)); tagNamePtr->str = binding->uri; return XML_ERROR_NONE; } static int addBinding(XML_Parser parser, PREFIX *prefix, const ATTRIBUTE_ID *attId, const XML_Char *uri, BINDING **bindingsPtr) { BINDING *b; int len; for (len = 0; uri[len]; len++) ; if (namespaceSeparator) len++; if (freeBindingList) { b = freeBindingList; 
if (len > b->uriAlloc) { b->uri = realloc(b->uri, sizeof(XML_Char) * (len + EXPAND_SPARE)); if (!b->uri) return 0; b->uriAlloc = len + EXPAND_SPARE; } freeBindingList = b->nextTagBinding; } else { b = malloc(sizeof(BINDING)); if (!b) return 0; b->uri = malloc(sizeof(XML_Char) * (len + EXPAND_SPARE)); if (!b->uri) { free(b); return 0; } b->uriAlloc = len + EXPAND_SPARE; } b->uriLen = len; memcpy(b->uri, uri, len * sizeof(XML_Char)); if (namespaceSeparator) b->uri[len - 1] = namespaceSeparator; b->prefix = prefix; b->attId = attId; b->prevPrefixBinding = prefix->binding; if (*uri == XML_T('\0') && prefix == &dtd.defaultPrefix) prefix->binding = 0; else prefix->binding = b; b->nextTagBinding = *bindingsPtr; *bindingsPtr = b; if (startNamespaceDeclHandler) startNamespaceDeclHandler(handlerArg, prefix->name, prefix->binding ? uri : 0); return 1; } /* The idea here is to avoid using stack for each CDATA section when the whole file is parsed with one call. */ static enum XML_Error cdataSectionProcessor(XML_Parser parser, const char *start, const char *end, const char **endPtr) { enum XML_Error result = doCdataSection(parser, encoding, &start, end, endPtr); if (start) { processor = contentProcessor; return contentProcessor(parser, start, end, endPtr); } return result; } /* startPtr gets set to non-null if the section is closed, and to null if the section is not yet closed.
*/ static enum XML_Error doCdataSection(XML_Parser parser, const ENCODING *enc, const char **startPtr, const char *end, const char **nextPtr) { const char *s = *startPtr; const char **eventPP; const char **eventEndPP; if (enc == encoding) { eventPP = &eventPtr; *eventPP = s; eventEndPP = &eventEndPtr; } else { eventPP = &(openInternalEntities->internalEventPtr); eventEndPP = &(openInternalEntities->internalEventEndPtr); } *eventPP = s; *startPtr = 0; for (;;) { const char *next; int tok = XmlCdataSectionTok(enc, s, end, &next); *eventEndPP = next; switch (tok) { case XML_TOK_CDATA_SECT_CLOSE: if (endCdataSectionHandler) endCdataSectionHandler(handlerArg); #if 0 /* see comment under XML_TOK_CDATA_SECT_OPEN */ else if (characterDataHandler) characterDataHandler(handlerArg, dataBuf, 0); #endif else if (defaultHandler) reportDefault(parser, enc, s, next); *startPtr = next; return XML_ERROR_NONE; case XML_TOK_DATA_NEWLINE: if (characterDataHandler) { XML_Char c = 0xA; characterDataHandler(handlerArg, &c, 1); } else if (defaultHandler) reportDefault(parser, enc, s, next); break; case XML_TOK_DATA_CHARS: if (characterDataHandler) { if (MUST_CONVERT(enc, s)) { for (;;) { ICHAR *dataPtr = (ICHAR *)dataBuf; XmlConvert(enc, &s, next, &dataPtr, (ICHAR *)dataBufEnd); *eventEndPP = next; characterDataHandler(handlerArg, dataBuf, dataPtr - (ICHAR *)dataBuf); if (s == next) break; *eventPP = s; } } else characterDataHandler(handlerArg, (XML_Char *)s, (XML_Char *)next - (XML_Char *)s); } else if (defaultHandler) reportDefault(parser, enc, s, next); break; case XML_TOK_INVALID: *eventPP = next; return XML_ERROR_INVALID_TOKEN; case XML_TOK_PARTIAL_CHAR: if (nextPtr) { *nextPtr = s; return XML_ERROR_NONE; } return XML_ERROR_PARTIAL_CHAR; case XML_TOK_PARTIAL: case XML_TOK_NONE: if (nextPtr) { *nextPtr = s; return XML_ERROR_NONE; } return XML_ERROR_UNCLOSED_CDATA_SECTION; default: abort(); } *eventPP = s = next; } /* not reached */ } #ifdef XML_DTD /* The idea here is to avoid using 
stack for each IGNORE section when the whole file is parsed with one call. */ static enum XML_Error ignoreSectionProcessor(XML_Parser parser, const char *start, const char *end, const char **endPtr) { enum XML_Error result = doIgnoreSection(parser, encoding, &start, end, endPtr); if (start) { processor = prologProcessor; return prologProcessor(parser, start, end, endPtr); } return result; } /* startPtr gets set to non-null if the section is closed, and to null if the section is not yet closed. */ static enum XML_Error doIgnoreSection(XML_Parser parser, const ENCODING *enc, const char **startPtr, const char *end, const char **nextPtr) { const char *next; int tok; const char *s = *startPtr; const char **eventPP; const char **eventEndPP; if (enc == encoding) { eventPP = &eventPtr; *eventPP = s; eventEndPP = &eventEndPtr; } else { eventPP = &(openInternalEntities->internalEventPtr); eventEndPP = &(openInternalEntities->internalEventEndPtr); } *eventPP = s; *startPtr = 0; tok = XmlIgnoreSectionTok(enc, s, end, &next); *eventEndPP = next; switch (tok) { case XML_TOK_IGNORE_SECT: if (defaultHandler) reportDefault(parser, enc, s, next); *startPtr = next; return XML_ERROR_NONE; case XML_TOK_INVALID: *eventPP = next; return XML_ERROR_INVALID_TOKEN; case XML_TOK_PARTIAL_CHAR: if (nextPtr) { *nextPtr = s; return XML_ERROR_NONE; } return XML_ERROR_PARTIAL_CHAR; case XML_TOK_PARTIAL: case XML_TOK_NONE: if (nextPtr) { *nextPtr = s; return XML_ERROR_NONE; } return XML_ERROR_SYNTAX; /* XML_ERROR_UNCLOSED_IGNORE_SECTION */ default: abort(); } /* not reached */ } #endif /* XML_DTD */ static enum XML_Error initializeEncoding(XML_Parser parser) { const char *s; #ifdef XML_UNICODE char encodingBuf[128]; if (!protocolEncodingName) s = 0; else { int i; for (i = 0; protocolEncodingName[i]; i++) { if (i == sizeof(encodingBuf) - 1 || (protocolEncodingName[i] & ~0x7f) != 0) { encodingBuf[0] = '\0'; break; } encodingBuf[i] = (char)protocolEncodingName[i]; } encodingBuf[i] = '\0'; s =
encodingBuf; } #else s = protocolEncodingName; #endif if ((ns ? XmlInitEncodingNS : XmlInitEncoding)(&initEncoding, &encoding, s)) return XML_ERROR_NONE; return handleUnknownEncoding(parser, protocolEncodingName); } static enum XML_Error processXmlDecl(XML_Parser parser, int isGeneralTextEntity, const char *s, const char *next) { const char *encodingName = 0; const ENCODING *newEncoding = 0; const char *version; int standalone = -1; if (!(ns ? XmlParseXmlDeclNS : XmlParseXmlDecl)(isGeneralTextEntity, encoding, s, next, &eventPtr, &version, &encodingName, &newEncoding, &standalone)) return XML_ERROR_SYNTAX; if (!isGeneralTextEntity && standalone == 1) { dtd.standalone = 1; #ifdef XML_DTD if (paramEntityParsing == XML_PARAM_ENTITY_PARSING_UNLESS_STANDALONE) paramEntityParsing = XML_PARAM_ENTITY_PARSING_NEVER; #endif /* XML_DTD */ } if (defaultHandler) reportDefault(parser, encoding, s, next); if (!protocolEncodingName) { if (newEncoding) { if (newEncoding->minBytesPerChar != encoding->minBytesPerChar) { eventPtr = encodingName; return XML_ERROR_INCORRECT_ENCODING; } encoding = newEncoding; } else if (encodingName) { enum XML_Error result; const XML_Char *s = poolStoreString(&tempPool, encoding, encodingName, encodingName + XmlNameLength(encoding, encodingName)); if (!s) return XML_ERROR_NO_MEMORY; result = handleUnknownEncoding(parser, s); poolDiscard(&tempPool); if (result == XML_ERROR_UNKNOWN_ENCODING) eventPtr = encodingName; return result; } } return XML_ERROR_NONE; } static enum XML_Error handleUnknownEncoding(XML_Parser parser, const XML_Char *encodingName) { if (unknownEncodingHandler) { XML_Encoding info; int i; for (i = 0; i < 256; i++) info.map[i] = -1; info.convert = 0; info.data = 0; info.release = 0; if (unknownEncodingHandler(unknownEncodingHandlerData, encodingName, &info)) { ENCODING *enc; unknownEncodingMem = malloc(XmlSizeOfUnknownEncoding()); if (!unknownEncodingMem) { if (info.release) info.release(info.data); return XML_ERROR_NO_MEMORY; } enc = 
(ns ? XmlInitUnknownEncodingNS : XmlInitUnknownEncoding)(unknownEncodingMem, info.map, info.convert, info.data); if (enc) { unknownEncodingData = info.data; unknownEncodingRelease = info.release; encoding = enc; return XML_ERROR_NONE; } } if (info.release) info.release(info.data); } return XML_ERROR_UNKNOWN_ENCODING; } static enum XML_Error prologInitProcessor(XML_Parser parser, const char *s, const char *end, const char **nextPtr) { enum XML_Error result = initializeEncoding(parser); if (result != XML_ERROR_NONE) return result; processor = prologProcessor; return prologProcessor(parser, s, end, nextPtr); } static enum XML_Error prologProcessor(XML_Parser parser, const char *s, const char *end, const char **nextPtr) { const char *next; int tok = XmlPrologTok(encoding, s, end, &next); return doProlog(parser, encoding, s, end, tok, next, nextPtr); } static enum XML_Error doProlog(XML_Parser parser, const ENCODING *enc, const char *s, const char *end, int tok, const char *next, const char **nextPtr) { #ifdef XML_DTD static const XML_Char externalSubsetName[] = { '#' , '\0' }; #endif /* XML_DTD */ const char **eventPP; const char **eventEndPP; if (enc == encoding) { eventPP = &eventPtr; eventEndPP = &eventEndPtr; } else { eventPP = &(openInternalEntities->internalEventPtr); eventEndPP = &(openInternalEntities->internalEventEndPtr); } for (;;) { int role; *eventPP = s; *eventEndPP = next; if (tok <= 0) { if (nextPtr != 0 && tok != XML_TOK_INVALID) { *nextPtr = s; return XML_ERROR_NONE; } switch (tok) { case XML_TOK_INVALID: *eventPP = next; return XML_ERROR_INVALID_TOKEN; case XML_TOK_PARTIAL: return XML_ERROR_UNCLOSED_TOKEN; case XML_TOK_PARTIAL_CHAR: return XML_ERROR_PARTIAL_CHAR; case XML_TOK_NONE: #ifdef XML_DTD if (enc != encoding) return XML_ERROR_NONE; if (parentParser) { if (XmlTokenRole(&prologState, XML_TOK_NONE, end, end, enc) == XML_ROLE_ERROR) return XML_ERROR_SYNTAX; hadExternalDoctype = 0; return XML_ERROR_NONE; } #endif /* XML_DTD */ return 
XML_ERROR_NO_ELEMENTS; default: tok = -tok; next = end; break; } } role = XmlTokenRole(&prologState, tok, s, next, enc); switch (role) { case XML_ROLE_XML_DECL: { enum XML_Error result = processXmlDecl(parser, 0, s, next); if (result != XML_ERROR_NONE) return result; enc = encoding; } break; case XML_ROLE_DOCTYPE_NAME: if (startDoctypeDeclHandler) { const XML_Char *name = poolStoreString(&tempPool, enc, s, next); if (!name) return XML_ERROR_NO_MEMORY; startDoctypeDeclHandler(handlerArg, name); poolClear(&tempPool); } break; #ifdef XML_DTD case XML_ROLE_TEXT_DECL: { enum XML_Error result = processXmlDecl(parser, 1, s, next); if (result != XML_ERROR_NONE) return result; enc = encoding; } break; #endif /* XML_DTD */ case XML_ROLE_DOCTYPE_PUBLIC_ID: #ifdef XML_DTD declEntity = (ENTITY *)lookup(&dtd.paramEntities, externalSubsetName, sizeof(ENTITY)); if (!declEntity) return XML_ERROR_NO_MEMORY; #endif /* XML_DTD */ /* fall through */ case XML_ROLE_ENTITY_PUBLIC_ID: if (!XmlIsPublicId(enc, s, next, eventPP)) return XML_ERROR_SYNTAX; if (declEntity) { XML_Char *tem = poolStoreString(&dtd.pool, enc, s + enc->minBytesPerChar, next - enc->minBytesPerChar); if (!tem) return XML_ERROR_NO_MEMORY; normalizePublicId(tem); declEntity->publicId = tem; poolFinish(&dtd.pool); } break; case XML_ROLE_DOCTYPE_CLOSE: if (dtd.complete && hadExternalDoctype) { dtd.complete = 0; #ifdef XML_DTD if (paramEntityParsing && externalEntityRefHandler) { ENTITY *entity = (ENTITY *)lookup(&dtd.paramEntities, externalSubsetName, 0); if (!externalEntityRefHandler(externalEntityRefHandlerArg, 0, entity->base, entity->systemId, entity->publicId)) return XML_ERROR_EXTERNAL_ENTITY_HANDLING; } #endif /* XML_DTD */ if (!dtd.complete && !dtd.standalone && notStandaloneHandler && !notStandaloneHandler(handlerArg)) return XML_ERROR_NOT_STANDALONE; } if (endDoctypeDeclHandler) endDoctypeDeclHandler(handlerArg); break; case XML_ROLE_INSTANCE_START: processor = contentProcessor; return contentProcessor(parser, s, 
                              end, nextPtr);
    case XML_ROLE_ATTLIST_ELEMENT_NAME:
      {
        const XML_Char *name = poolStoreString(&dtd.pool, enc, s, next);
        if (!name)
          return XML_ERROR_NO_MEMORY;
        declElementType = (ELEMENT_TYPE *)lookup(&dtd.elementTypes, name, sizeof(ELEMENT_TYPE));
        if (!declElementType)
          return XML_ERROR_NO_MEMORY;
        if (declElementType->name != name)
          poolDiscard(&dtd.pool);
        else {
          poolFinish(&dtd.pool);
          if (!setElementTypePrefix(parser, declElementType))
            return XML_ERROR_NO_MEMORY;
        }
        break;
      }
    case XML_ROLE_ATTRIBUTE_NAME:
      declAttributeId = getAttributeId(parser, enc, s, next);
      if (!declAttributeId)
        return XML_ERROR_NO_MEMORY;
      declAttributeIsCdata = 0;
      declAttributeIsId = 0;
      break;
    case XML_ROLE_ATTRIBUTE_TYPE_CDATA:
      declAttributeIsCdata = 1;
      break;
    case XML_ROLE_ATTRIBUTE_TYPE_ID:
      declAttributeIsId = 1;
      break;
    case XML_ROLE_IMPLIED_ATTRIBUTE_VALUE:
    case XML_ROLE_REQUIRED_ATTRIBUTE_VALUE:
      if (dtd.complete
          && !defineAttribute(declElementType, declAttributeId,
                              declAttributeIsCdata, declAttributeIsId, 0))
        return XML_ERROR_NO_MEMORY;
      break;
    case XML_ROLE_DEFAULT_ATTRIBUTE_VALUE:
    case XML_ROLE_FIXED_ATTRIBUTE_VALUE:
      {
        const XML_Char *attVal;
        enum XML_Error result
          = storeAttributeValue(parser, enc, declAttributeIsCdata,
                                s + enc->minBytesPerChar,
                                next - enc->minBytesPerChar,
                                &dtd.pool);
        if (result)
          return result;
        attVal = poolStart(&dtd.pool);
        poolFinish(&dtd.pool);
        if (dtd.complete
            /* ID attributes aren't allowed to have a default */
            && !defineAttribute(declElementType, declAttributeId,
                                declAttributeIsCdata, 0, attVal))
          return XML_ERROR_NO_MEMORY;
        break;
      }
    case XML_ROLE_ENTITY_VALUE:
      {
        enum XML_Error result = storeEntityValue(parser, enc,
                                                 s + enc->minBytesPerChar,
                                                 next - enc->minBytesPerChar);
        if (declEntity) {
          declEntity->textPtr = poolStart(&dtd.pool);
          declEntity->textLen = poolLength(&dtd.pool);
          poolFinish(&dtd.pool);
          if (internalParsedEntityDeclHandler
              /* Check it's not a parameter entity */
              && ((ENTITY *)lookup(&dtd.generalEntities, declEntity->name, 0)
                  == declEntity)) {
            *eventEndPP = s;
internalParsedEntityDeclHandler(handlerArg, declEntity->name, declEntity->textPtr, declEntity->textLen); } } else poolDiscard(&dtd.pool); if (result != XML_ERROR_NONE) return result; } break; case XML_ROLE_DOCTYPE_SYSTEM_ID: if (!dtd.standalone #ifdef XML_DTD && !paramEntityParsing #endif /* XML_DTD */ && notStandaloneHandler && !notStandaloneHandler(handlerArg)) return XML_ERROR_NOT_STANDALONE; hadExternalDoctype = 1; #ifndef XML_DTD break; #else /* XML_DTD */ if (!declEntity) { declEntity = (ENTITY *)lookup(&dtd.paramEntities, externalSubsetName, sizeof(ENTITY)); if (!declEntity) return XML_ERROR_NO_MEMORY; } /* fall through */ #endif /* XML_DTD */ case XML_ROLE_ENTITY_SYSTEM_ID: if (declEntity) { declEntity->systemId = poolStoreString(&dtd.pool, enc, s + enc->minBytesPerChar, next - enc->minBytesPerChar); if (!declEntity->systemId) return XML_ERROR_NO_MEMORY; declEntity->base = curBase; poolFinish(&dtd.pool); } break; case XML_ROLE_ENTITY_NOTATION_NAME: if (declEntity) { declEntity->notation = poolStoreString(&dtd.pool, enc, s, next); if (!declEntity->notation) return XML_ERROR_NO_MEMORY; poolFinish(&dtd.pool); if (unparsedEntityDeclHandler) { *eventEndPP = s; unparsedEntityDeclHandler(handlerArg, declEntity->name, declEntity->base, declEntity->systemId, declEntity->publicId, declEntity->notation); } } break; case XML_ROLE_EXTERNAL_GENERAL_ENTITY_NO_NOTATION: if (declEntity && externalParsedEntityDeclHandler) { *eventEndPP = s; externalParsedEntityDeclHandler(handlerArg, declEntity->name, declEntity->base, declEntity->systemId, declEntity->publicId); } break; case XML_ROLE_GENERAL_ENTITY_NAME: { const XML_Char *name; if (XmlPredefinedEntityName(enc, s, next)) { declEntity = 0; break; } name = poolStoreString(&dtd.pool, enc, s, next); if (!name) return XML_ERROR_NO_MEMORY; if (dtd.complete) { declEntity = (ENTITY *)lookup(&dtd.generalEntities, name, sizeof(ENTITY)); if (!declEntity) return XML_ERROR_NO_MEMORY; if (declEntity->name != name) { 
poolDiscard(&dtd.pool); declEntity = 0; } else poolFinish(&dtd.pool); } else { poolDiscard(&dtd.pool); declEntity = 0; } } break; case XML_ROLE_PARAM_ENTITY_NAME: #ifdef XML_DTD if (dtd.complete) { const XML_Char *name = poolStoreString(&dtd.pool, enc, s, next); if (!name) return XML_ERROR_NO_MEMORY; declEntity = (ENTITY *)lookup(&dtd.paramEntities, name, sizeof(ENTITY)); if (!declEntity) return XML_ERROR_NO_MEMORY; if (declEntity->name != name) { poolDiscard(&dtd.pool); declEntity = 0; } else poolFinish(&dtd.pool); } #else /* not XML_DTD */ declEntity = 0; #endif /* not XML_DTD */ break; case XML_ROLE_NOTATION_NAME: declNotationPublicId = 0; declNotationName = 0; if (notationDeclHandler) { declNotationName = poolStoreString(&tempPool, enc, s, next); if (!declNotationName) return XML_ERROR_NO_MEMORY; poolFinish(&tempPool); } break; case XML_ROLE_NOTATION_PUBLIC_ID: if (!XmlIsPublicId(enc, s, next, eventPP)) return XML_ERROR_SYNTAX; if (declNotationName) { XML_Char *tem = poolStoreString(&tempPool, enc, s + enc->minBytesPerChar, next - enc->minBytesPerChar); if (!tem) return XML_ERROR_NO_MEMORY; normalizePublicId(tem); declNotationPublicId = tem; poolFinish(&tempPool); } break; case XML_ROLE_NOTATION_SYSTEM_ID: if (declNotationName && notationDeclHandler) { const XML_Char *systemId = poolStoreString(&tempPool, enc, s + enc->minBytesPerChar, next - enc->minBytesPerChar); if (!systemId) return XML_ERROR_NO_MEMORY; *eventEndPP = s; notationDeclHandler(handlerArg, declNotationName, curBase, systemId, declNotationPublicId); } poolClear(&tempPool); break; case XML_ROLE_NOTATION_NO_SYSTEM_ID: if (declNotationPublicId && notationDeclHandler) { *eventEndPP = s; notationDeclHandler(handlerArg, declNotationName, curBase, 0, declNotationPublicId); } poolClear(&tempPool); break; case XML_ROLE_ERROR: switch (tok) { case XML_TOK_PARAM_ENTITY_REF: return XML_ERROR_PARAM_ENTITY_REF; case XML_TOK_XML_DECL: return XML_ERROR_MISPLACED_XML_PI; default: return XML_ERROR_SYNTAX; } #ifdef 
XML_DTD case XML_ROLE_IGNORE_SECT: { enum XML_Error result; if (defaultHandler) reportDefault(parser, enc, s, next); result = doIgnoreSection(parser, enc, &next, end, nextPtr); if (!next) { processor = ignoreSectionProcessor; return result; } } break; #endif /* XML_DTD */ case XML_ROLE_GROUP_OPEN: if (prologState.level >= groupSize) { if (groupSize) groupConnector = realloc(groupConnector, groupSize *= 2); else groupConnector = malloc(groupSize = 32); if (!groupConnector) return XML_ERROR_NO_MEMORY; } groupConnector[prologState.level] = 0; break; case XML_ROLE_GROUP_SEQUENCE: if (groupConnector[prologState.level] == '|') return XML_ERROR_SYNTAX; groupConnector[prologState.level] = ','; break; case XML_ROLE_GROUP_CHOICE: if (groupConnector[prologState.level] == ',') return XML_ERROR_SYNTAX; groupConnector[prologState.level] = '|'; break; case XML_ROLE_PARAM_ENTITY_REF: #ifdef XML_DTD case XML_ROLE_INNER_PARAM_ENTITY_REF: if (paramEntityParsing && (dtd.complete || role == XML_ROLE_INNER_PARAM_ENTITY_REF)) { const XML_Char *name; ENTITY *entity; name = poolStoreString(&dtd.pool, enc, s + enc->minBytesPerChar, next - enc->minBytesPerChar); if (!name) return XML_ERROR_NO_MEMORY; entity = (ENTITY *)lookup(&dtd.paramEntities, name, 0); poolDiscard(&dtd.pool); if (!entity) { /* FIXME what to do if !dtd.complete? 
*/ return XML_ERROR_UNDEFINED_ENTITY; } if (entity->open) return XML_ERROR_RECURSIVE_ENTITY_REF; if (entity->textPtr) { enum XML_Error result; result = processInternalParamEntity(parser, entity); if (result != XML_ERROR_NONE) return result; break; } if (role == XML_ROLE_INNER_PARAM_ENTITY_REF) return XML_ERROR_PARAM_ENTITY_REF; if (externalEntityRefHandler) { dtd.complete = 0; entity->open = 1; if (!externalEntityRefHandler(externalEntityRefHandlerArg, 0, entity->base, entity->systemId, entity->publicId)) { entity->open = 0; return XML_ERROR_EXTERNAL_ENTITY_HANDLING; } entity->open = 0; if (dtd.complete) break; } } #endif /* XML_DTD */ if (!dtd.standalone && notStandaloneHandler && !notStandaloneHandler(handlerArg)) return XML_ERROR_NOT_STANDALONE; dtd.complete = 0; if (defaultHandler) reportDefault(parser, enc, s, next); break; case XML_ROLE_NONE: switch (tok) { case XML_TOK_PI: if (!reportProcessingInstruction(parser, enc, s, next)) return XML_ERROR_NO_MEMORY; break; case XML_TOK_COMMENT: if (!reportComment(parser, enc, s, next)) return XML_ERROR_NO_MEMORY; break; } break; } if (defaultHandler) { switch (tok) { case XML_TOK_PI: case XML_TOK_COMMENT: case XML_TOK_BOM: case XML_TOK_XML_DECL: #ifdef XML_DTD case XML_TOK_IGNORE_SECT: #endif /* XML_DTD */ case XML_TOK_PARAM_ENTITY_REF: break; default: #ifdef XML_DTD if (role != XML_ROLE_IGNORE_SECT) #endif /* XML_DTD */ reportDefault(parser, enc, s, next); } } s = next; tok = XmlPrologTok(enc, s, end, &next); } /* not reached */ } static enum XML_Error epilogProcessor(XML_Parser parser, const char *s, const char *end, const char **nextPtr) { processor = epilogProcessor; eventPtr = s; for (;;) { const char *next; int tok = XmlPrologTok(encoding, s, end, &next); eventEndPtr = next; switch (tok) { case -XML_TOK_PROLOG_S: if (defaultHandler) { eventEndPtr = end; reportDefault(parser, encoding, s, end); } /* fall through */ case XML_TOK_NONE: if (nextPtr) *nextPtr = end; return XML_ERROR_NONE; case XML_TOK_PROLOG_S: if 
(defaultHandler) reportDefault(parser, encoding, s, next); break; case XML_TOK_PI: if (!reportProcessingInstruction(parser, encoding, s, next)) return XML_ERROR_NO_MEMORY; break; case XML_TOK_COMMENT: if (!reportComment(parser, encoding, s, next)) return XML_ERROR_NO_MEMORY; break; case XML_TOK_INVALID: eventPtr = next; return XML_ERROR_INVALID_TOKEN; case XML_TOK_PARTIAL: if (nextPtr) { *nextPtr = s; return XML_ERROR_NONE; } return XML_ERROR_UNCLOSED_TOKEN; case XML_TOK_PARTIAL_CHAR: if (nextPtr) { *nextPtr = s; return XML_ERROR_NONE; } return XML_ERROR_PARTIAL_CHAR; default: return XML_ERROR_JUNK_AFTER_DOC_ELEMENT; } eventPtr = s = next; } } #ifdef XML_DTD static enum XML_Error processInternalParamEntity(XML_Parser parser, ENTITY *entity) { const char *s, *end, *next; int tok; enum XML_Error result; OPEN_INTERNAL_ENTITY openEntity; entity->open = 1; openEntity.next = openInternalEntities; openInternalEntities = &openEntity; openEntity.entity = entity; openEntity.internalEventPtr = 0; openEntity.internalEventEndPtr = 0; s = (char *)entity->textPtr; end = (char *)(entity->textPtr + entity->textLen); tok = XmlPrologTok(internalEncoding, s, end, &next); result = doProlog(parser, internalEncoding, s, end, tok, next, 0); entity->open = 0; openInternalEntities = openEntity.next; return result; } #endif /* XML_DTD */ static enum XML_Error errorProcessor(XML_Parser parser, const char *s, const char *end, const char **nextPtr) { return errorCode; } static enum XML_Error storeAttributeValue(XML_Parser parser, const ENCODING *enc, int isCdata, const char *ptr, const char *end, STRING_POOL *pool) { enum XML_Error result = appendAttributeValue(parser, enc, isCdata, ptr, end, pool); if (result) return result; if (!isCdata && poolLength(pool) && poolLastChar(pool) == 0x20) poolChop(pool); if (!poolAppendChar(pool, XML_T('\0'))) return XML_ERROR_NO_MEMORY; return XML_ERROR_NONE; } static enum XML_Error appendAttributeValue(XML_Parser parser, const ENCODING *enc, int isCdata, 
                     const char *ptr, const char *end,
                     STRING_POOL *pool)
{
  for (;;) {
    const char *next;
    int tok = XmlAttributeValueTok(enc, ptr, end, &next);
    switch (tok) {
    case XML_TOK_NONE:
      return XML_ERROR_NONE;
    case XML_TOK_INVALID:
      if (enc == encoding)
        eventPtr = next;
      return XML_ERROR_INVALID_TOKEN;
    case XML_TOK_PARTIAL:
      if (enc == encoding)
        eventPtr = ptr;
      return XML_ERROR_INVALID_TOKEN;
    case XML_TOK_CHAR_REF:
      {
        XML_Char buf[XML_ENCODE_MAX];
        int i;
        int n = XmlCharRefNumber(enc, ptr);
        if (n < 0) {
          if (enc == encoding)
            eventPtr = ptr;
          return XML_ERROR_BAD_CHAR_REF;
        }
        if (!isCdata
            && n == 0x20 /* space */
            && (poolLength(pool) == 0 || poolLastChar(pool) == 0x20))
          break;
        n = XmlEncode(n, (ICHAR *)buf);
        if (!n) {
          if (enc == encoding)
            eventPtr = ptr;
          return XML_ERROR_BAD_CHAR_REF;
        }
        for (i = 0; i < n; i++) {
          if (!poolAppendChar(pool, buf[i]))
            return XML_ERROR_NO_MEMORY;
        }
      }
      break;
    case XML_TOK_DATA_CHARS:
      if (!poolAppend(pool, enc, ptr, next))
        return XML_ERROR_NO_MEMORY;
      break;
    case XML_TOK_TRAILING_CR:
      next = ptr + enc->minBytesPerChar;
      /* fall through */
    case XML_TOK_ATTRIBUTE_VALUE_S:
    case XML_TOK_DATA_NEWLINE:
      if (!isCdata && (poolLength(pool) == 0 || poolLastChar(pool) == 0x20))
        break;
      if (!poolAppendChar(pool, 0x20))
        return XML_ERROR_NO_MEMORY;
      break;
    case XML_TOK_ENTITY_REF:
      {
        const XML_Char *name;
        ENTITY *entity;
        XML_Char ch = XmlPredefinedEntityName(enc,
                                              ptr + enc->minBytesPerChar,
                                              next - enc->minBytesPerChar);
        if (ch) {
          if (!poolAppendChar(pool, ch))
            return XML_ERROR_NO_MEMORY;
          break;
        }
        name = poolStoreString(&temp2Pool, enc,
                               ptr + enc->minBytesPerChar,
                               next - enc->minBytesPerChar);
        if (!name)
          return XML_ERROR_NO_MEMORY;
        entity = (ENTITY *)lookup(&dtd.generalEntities, name, 0);
        poolDiscard(&temp2Pool);
        if (!entity) {
          if (dtd.complete) {
            if (enc == encoding)
              eventPtr = ptr;
            return XML_ERROR_UNDEFINED_ENTITY;
          }
        }
        else if (entity->open) {
          if (enc == encoding)
            eventPtr = ptr;
          return XML_ERROR_RECURSIVE_ENTITY_REF;
        }
        else if (entity->notation) {
          if (enc == encoding)
eventPtr = ptr; return XML_ERROR_BINARY_ENTITY_REF; } else if (!entity->textPtr) { if (enc == encoding) eventPtr = ptr; return XML_ERROR_ATTRIBUTE_EXTERNAL_ENTITY_REF; } else { enum XML_Error result; const XML_Char *textEnd = entity->textPtr + entity->textLen; entity->open = 1; result = appendAttributeValue(parser, internalEncoding, isCdata, (char *)entity->textPtr, (char *)textEnd, pool); entity->open = 0; if (result) return result; } } break; default: abort(); } ptr = next; } /* not reached */ } static enum XML_Error storeEntityValue(XML_Parser parser, const ENCODING *enc, const char *entityTextPtr, const char *entityTextEnd) { STRING_POOL *pool = &(dtd.pool); for (;;) { const char *next; int tok = XmlEntityValueTok(enc, entityTextPtr, entityTextEnd, &next); switch (tok) { case XML_TOK_PARAM_ENTITY_REF: #ifdef XML_DTD if (parentParser || enc != encoding) { enum XML_Error result; const XML_Char *name; ENTITY *entity; name = poolStoreString(&tempPool, enc, entityTextPtr + enc->minBytesPerChar, next - enc->minBytesPerChar); if (!name) return XML_ERROR_NO_MEMORY; entity = (ENTITY *)lookup(&dtd.paramEntities, name, 0); poolDiscard(&tempPool); if (!entity) { if (enc == encoding) eventPtr = entityTextPtr; return XML_ERROR_UNDEFINED_ENTITY; } if (entity->open) { if (enc == encoding) eventPtr = entityTextPtr; return XML_ERROR_RECURSIVE_ENTITY_REF; } if (entity->systemId) { if (enc == encoding) eventPtr = entityTextPtr; return XML_ERROR_PARAM_ENTITY_REF; } entity->open = 1; result = storeEntityValue(parser, internalEncoding, (char *)entity->textPtr, (char *)(entity->textPtr + entity->textLen)); entity->open = 0; if (result) return result; break; } #endif /* XML_DTD */ eventPtr = entityTextPtr; return XML_ERROR_SYNTAX; case XML_TOK_NONE: return XML_ERROR_NONE; case XML_TOK_ENTITY_REF: case XML_TOK_DATA_CHARS: if (!poolAppend(pool, enc, entityTextPtr, next)) return XML_ERROR_NO_MEMORY; break; case XML_TOK_TRAILING_CR: next = entityTextPtr + enc->minBytesPerChar; /* fall 
through */ case XML_TOK_DATA_NEWLINE: if (pool->end == pool->ptr && !poolGrow(pool)) return XML_ERROR_NO_MEMORY; *(pool->ptr)++ = 0xA; break; case XML_TOK_CHAR_REF: { XML_Char buf[XML_ENCODE_MAX]; int i; int n = XmlCharRefNumber(enc, entityTextPtr); if (n < 0) { if (enc == encoding) eventPtr = entityTextPtr; return XML_ERROR_BAD_CHAR_REF; } n = XmlEncode(n, (ICHAR *)buf); if (!n) { if (enc == encoding) eventPtr = entityTextPtr; return XML_ERROR_BAD_CHAR_REF; } for (i = 0; i < n; i++) { if (pool->end == pool->ptr && !poolGrow(pool)) return XML_ERROR_NO_MEMORY; *(pool->ptr)++ = buf[i]; } } break; case XML_TOK_PARTIAL: if (enc == encoding) eventPtr = entityTextPtr; return XML_ERROR_INVALID_TOKEN; case XML_TOK_INVALID: if (enc == encoding) eventPtr = next; return XML_ERROR_INVALID_TOKEN; default: abort(); } entityTextPtr = next; } /* not reached */ } static void normalizeLines(XML_Char *s) { XML_Char *p; for (;; s++) { if (*s == XML_T('\0')) return; if (*s == 0xD) break; } p = s; do { if (*s == 0xD) { *p++ = 0xA; if (*++s == 0xA) s++; } else *p++ = *s++; } while (*s); *p = XML_T('\0'); } static int reportProcessingInstruction(XML_Parser parser, const ENCODING *enc, const char *start, const char *end) { const XML_Char *target; XML_Char *data; const char *tem; if (!processingInstructionHandler) { if (defaultHandler) reportDefault(parser, enc, start, end); return 1; } start += enc->minBytesPerChar * 2; tem = start + XmlNameLength(enc, start); target = poolStoreString(&tempPool, enc, start, tem); if (!target) return 0; poolFinish(&tempPool); data = poolStoreString(&tempPool, enc, XmlSkipS(enc, tem), end - enc->minBytesPerChar*2); if (!data) return 0; normalizeLines(data); processingInstructionHandler(handlerArg, target, data); poolClear(&tempPool); return 1; } static int reportComment(XML_Parser parser, const ENCODING *enc, const char *start, const char *end) { XML_Char *data; if (!commentHandler) { if (defaultHandler) reportDefault(parser, enc, start, end); return 1; } 
data = poolStoreString(&tempPool, enc, start + enc->minBytesPerChar * 4, end - enc->minBytesPerChar * 3); if (!data) return 0; normalizeLines(data); commentHandler(handlerArg, data); poolClear(&tempPool); return 1; } static void reportDefault(XML_Parser parser, const ENCODING *enc, const char *s, const char *end) { if (MUST_CONVERT(enc, s)) { const char **eventPP; const char **eventEndPP; if (enc == encoding) { eventPP = &eventPtr; eventEndPP = &eventEndPtr; } else { eventPP = &(openInternalEntities->internalEventPtr); eventEndPP = &(openInternalEntities->internalEventEndPtr); } do { ICHAR *dataPtr = (ICHAR *)dataBuf; XmlConvert(enc, &s, end, &dataPtr, (ICHAR *)dataBufEnd); *eventEndPP = s; defaultHandler(handlerArg, dataBuf, dataPtr - (ICHAR *)dataBuf); *eventPP = s; } while (s != end); } else defaultHandler(handlerArg, (XML_Char *)s, (XML_Char *)end - (XML_Char *)s); } static int defineAttribute(ELEMENT_TYPE *type, ATTRIBUTE_ID *attId, int isCdata, int isId, const XML_Char *value) { DEFAULT_ATTRIBUTE *att; if (value || isId) { /* The handling of default attributes gets messed up if we have a default which duplicates a non-default. 
*/ int i; for (i = 0; i < type->nDefaultAtts; i++) if (attId == type->defaultAtts[i].id) return 1; if (isId && !type->idAtt && !attId->xmlns) type->idAtt = attId; } if (type->nDefaultAtts == type->allocDefaultAtts) { if (type->allocDefaultAtts == 0) { type->allocDefaultAtts = 8; type->defaultAtts = malloc(type->allocDefaultAtts*sizeof(DEFAULT_ATTRIBUTE)); } else { type->allocDefaultAtts *= 2; type->defaultAtts = realloc(type->defaultAtts, type->allocDefaultAtts*sizeof(DEFAULT_ATTRIBUTE)); } if (!type->defaultAtts) return 0; } att = type->defaultAtts + type->nDefaultAtts; att->id = attId; att->value = value; att->isCdata = isCdata; if (!isCdata) attId->maybeTokenized = 1; type->nDefaultAtts += 1; return 1; } static int setElementTypePrefix(XML_Parser parser, ELEMENT_TYPE *elementType) { const XML_Char *name; for (name = elementType->name; *name; name++) { if (*name == XML_T(':')) { PREFIX *prefix; const XML_Char *s; for (s = elementType->name; s != name; s++) { if (!poolAppendChar(&dtd.pool, *s)) return 0; } if (!poolAppendChar(&dtd.pool, XML_T('\0'))) return 0; prefix = (PREFIX *)lookup(&dtd.prefixes, poolStart(&dtd.pool), sizeof(PREFIX)); if (!prefix) return 0; if (prefix->name == poolStart(&dtd.pool)) poolFinish(&dtd.pool); else poolDiscard(&dtd.pool); elementType->prefix = prefix; } } return 1; } static ATTRIBUTE_ID * getAttributeId(XML_Parser parser, const ENCODING *enc, const char *start, const char *end) { ATTRIBUTE_ID *id; const XML_Char *name; if (!poolAppendChar(&dtd.pool, XML_T('\0'))) return 0; name = poolStoreString(&dtd.pool, enc, start, end); if (!name) return 0; ++name; id = (ATTRIBUTE_ID *)lookup(&dtd.attributeIds, name, sizeof(ATTRIBUTE_ID)); if (!id) return 0; if (id->name != name) poolDiscard(&dtd.pool); else { poolFinish(&dtd.pool); if (!ns) ; else if (name[0] == 'x' && name[1] == 'm' && name[2] == 'l' && name[3] == 'n' && name[4] == 's' && (name[5] == XML_T('\0') || name[5] == XML_T(':'))) { if (name[5] == '\0') id->prefix = &dtd.defaultPrefix; 
      else
        id->prefix = (PREFIX *)lookup(&dtd.prefixes, name + 6, sizeof(PREFIX));
      id->xmlns = 1;
    }
    else {
      int i;
      for (i = 0; name[i]; i++) {
        if (name[i] == XML_T(':')) {
          int j;
          for (j = 0; j < i; j++) {
            if (!poolAppendChar(&dtd.pool, name[j]))
              return 0;
          }
          if (!poolAppendChar(&dtd.pool, XML_T('\0')))
            return 0;
          id->prefix = (PREFIX *)lookup(&dtd.prefixes, poolStart(&dtd.pool), sizeof(PREFIX));
          /* lookup() returns 0 when it cannot allocate the new entry */
          if (!id->prefix)
            return 0;
          if (id->prefix->name == poolStart(&dtd.pool))
            poolFinish(&dtd.pool);
          else
            poolDiscard(&dtd.pool);
          break;
        }
      }
    }
  }
  return id;
}

#define CONTEXT_SEP XML_T('\f')

static
const XML_Char *getContext(XML_Parser parser)
{
  HASH_TABLE_ITER iter;
  int needSep = 0;

  if (dtd.defaultPrefix.binding) {
    int i;
    int len;
    if (!poolAppendChar(&tempPool, XML_T('=')))
      return 0;
    len = dtd.defaultPrefix.binding->uriLen;
    if (namespaceSeparator != XML_T('\0'))
      len--;
    for (i = 0; i < len; i++)
      if (!poolAppendChar(&tempPool, dtd.defaultPrefix.binding->uri[i]))
        return 0;
    needSep = 1;
  }

  hashTableIterInit(&iter, &(dtd.prefixes));
  for (;;) {
    int i;
    int len;
    const XML_Char *s;
    PREFIX *prefix = (PREFIX *)hashTableIterNext(&iter);
    if (!prefix)
      break;
    if (!prefix->binding)
      continue;
    if (needSep && !poolAppendChar(&tempPool, CONTEXT_SEP))
      return 0;
    for (s = prefix->name; *s; s++)
      if (!poolAppendChar(&tempPool, *s))
        return 0;
    if (!poolAppendChar(&tempPool, XML_T('=')))
      return 0;
    len = prefix->binding->uriLen;
    if (namespaceSeparator != XML_T('\0'))
      len--;
    for (i = 0; i < len; i++)
      if (!poolAppendChar(&tempPool, prefix->binding->uri[i]))
        return 0;
    needSep = 1;
  }

  hashTableIterInit(&iter, &(dtd.generalEntities));
  for (;;) {
    const XML_Char *s;
    ENTITY *e = (ENTITY *)hashTableIterNext(&iter);
    if (!e)
      break;
    if (!e->open)
      continue;
    if (needSep && !poolAppendChar(&tempPool, CONTEXT_SEP))
      return 0;
    for (s = e->name; *s; s++)
      if (!poolAppendChar(&tempPool, *s))
        return 0;
    needSep = 1;
  }

  if (!poolAppendChar(&tempPool, XML_T('\0')))
    return 0;
  return tempPool.start;
}

static
int setContext(XML_Parser parser, const XML_Char *context)
{
const XML_Char *s = context; while (*context != XML_T('\0')) { if (*s == CONTEXT_SEP || *s == XML_T('\0')) { ENTITY *e; if (!poolAppendChar(&tempPool, XML_T('\0'))) return 0; e = (ENTITY *)lookup(&dtd.generalEntities, poolStart(&tempPool), 0); if (e) e->open = 1; if (*s != XML_T('\0')) s++; context = s; poolDiscard(&tempPool); } else if (*s == '=') { PREFIX *prefix; if (poolLength(&tempPool) == 0) prefix = &dtd.defaultPrefix; else { if (!poolAppendChar(&tempPool, XML_T('\0'))) return 0; prefix = (PREFIX *)lookup(&dtd.prefixes, poolStart(&tempPool), sizeof(PREFIX)); if (!prefix) return 0; if (prefix->name == poolStart(&tempPool)) { prefix->name = poolCopyString(&dtd.pool, prefix->name); if (!prefix->name) return 0; } poolDiscard(&tempPool); } for (context = s + 1; *context != CONTEXT_SEP && *context != XML_T('\0'); context++) if (!poolAppendChar(&tempPool, *context)) return 0; if (!poolAppendChar(&tempPool, XML_T('\0'))) return 0; if (!addBinding(parser, prefix, 0, poolStart(&tempPool), &inheritedBindings)) return 0; poolDiscard(&tempPool); if (*context != XML_T('\0')) ++context; s = context; } else { if (!poolAppendChar(&tempPool, *s)) return 0; s++; } } return 1; } static void normalizePublicId(XML_Char *publicId) { XML_Char *p = publicId; XML_Char *s; for (s = publicId; *s; s++) { switch (*s) { case 0x20: case 0xD: case 0xA: if (p != publicId && p[-1] != 0x20) *p++ = 0x20; break; default: *p++ = *s; } } if (p != publicId && p[-1] == 0x20) --p; *p = XML_T('\0'); } static int dtdInit(DTD *p) { poolInit(&(p->pool)); hashTableInit(&(p->generalEntities)); hashTableInit(&(p->elementTypes)); hashTableInit(&(p->attributeIds)); hashTableInit(&(p->prefixes)); p->complete = 1; p->standalone = 0; #ifdef XML_DTD hashTableInit(&(p->paramEntities)); #endif /* XML_DTD */ p->defaultPrefix.name = 0; p->defaultPrefix.binding = 0; return 1; } #ifdef XML_DTD static void dtdSwap(DTD *p1, DTD *p2) { DTD tem; memcpy(&tem, p1, sizeof(DTD)); memcpy(p1, p2, sizeof(DTD)); memcpy(p2, &tem, 
sizeof(DTD)); } #endif /* XML_DTD */ static void dtdDestroy(DTD *p) { HASH_TABLE_ITER iter; hashTableIterInit(&iter, &(p->elementTypes)); for (;;) { ELEMENT_TYPE *e = (ELEMENT_TYPE *)hashTableIterNext(&iter); if (!e) break; if (e->allocDefaultAtts != 0) free(e->defaultAtts); } hashTableDestroy(&(p->generalEntities)); #ifdef XML_DTD hashTableDestroy(&(p->paramEntities)); #endif /* XML_DTD */ hashTableDestroy(&(p->elementTypes)); hashTableDestroy(&(p->attributeIds)); hashTableDestroy(&(p->prefixes)); poolDestroy(&(p->pool)); } /* Do a deep copy of the DTD. Return 0 for out of memory; non-zero otherwise. The new DTD has already been initialized. */ static int dtdCopy(DTD *newDtd, const DTD *oldDtd) { HASH_TABLE_ITER iter; /* Copy the prefix table. */ hashTableIterInit(&iter, &(oldDtd->prefixes)); for (;;) { const XML_Char *name; const PREFIX *oldP = (PREFIX *)hashTableIterNext(&iter); if (!oldP) break; name = poolCopyString(&(newDtd->pool), oldP->name); if (!name) return 0; if (!lookup(&(newDtd->prefixes), name, sizeof(PREFIX))) return 0; } hashTableIterInit(&iter, &(oldDtd->attributeIds)); /* Copy the attribute id table. */ for (;;) { ATTRIBUTE_ID *newA; const XML_Char *name; const ATTRIBUTE_ID *oldA = (ATTRIBUTE_ID *)hashTableIterNext(&iter); if (!oldA) break; /* Remember to allocate the scratch byte before the name. */ if (!poolAppendChar(&(newDtd->pool), XML_T('\0'))) return 0; name = poolCopyString(&(newDtd->pool), oldA->name); if (!name) return 0; ++name; newA = (ATTRIBUTE_ID *)lookup(&(newDtd->attributeIds), name, sizeof(ATTRIBUTE_ID)); if (!newA) return 0; newA->maybeTokenized = oldA->maybeTokenized; if (oldA->prefix) { newA->xmlns = oldA->xmlns; if (oldA->prefix == &oldDtd->defaultPrefix) newA->prefix = &newDtd->defaultPrefix; else newA->prefix = (PREFIX *)lookup(&(newDtd->prefixes), oldA->prefix->name, 0); } } /* Copy the element type table. 
*/ hashTableIterInit(&iter, &(oldDtd->elementTypes)); for (;;) { int i; ELEMENT_TYPE *newE; const XML_Char *name; const ELEMENT_TYPE *oldE = (ELEMENT_TYPE *)hashTableIterNext(&iter); if (!oldE) break; name = poolCopyString(&(newDtd->pool), oldE->name); if (!name) return 0; newE = (ELEMENT_TYPE *)lookup(&(newDtd->elementTypes), name, sizeof(ELEMENT_TYPE)); if (!newE) return 0; if (oldE->nDefaultAtts) { newE->defaultAtts = (DEFAULT_ATTRIBUTE *)malloc(oldE->nDefaultAtts * sizeof(DEFAULT_ATTRIBUTE)); if (!newE->defaultAtts) return 0; } if (oldE->idAtt) newE->idAtt = (ATTRIBUTE_ID *)lookup(&(newDtd->attributeIds), oldE->idAtt->name, 0); newE->allocDefaultAtts = newE->nDefaultAtts = oldE->nDefaultAtts; if (oldE->prefix) newE->prefix = (PREFIX *)lookup(&(newDtd->prefixes), oldE->prefix->name, 0); for (i = 0; i < newE->nDefaultAtts; i++) { newE->defaultAtts[i].id = (ATTRIBUTE_ID *)lookup(&(newDtd->attributeIds), oldE->defaultAtts[i].id->name, 0); newE->defaultAtts[i].isCdata = oldE->defaultAtts[i].isCdata; if (oldE->defaultAtts[i].value) { newE->defaultAtts[i].value = poolCopyString(&(newDtd->pool), oldE->defaultAtts[i].value); if (!newE->defaultAtts[i].value) return 0; } else newE->defaultAtts[i].value = 0; } } /* Copy the entity tables. 
*/ if (!copyEntityTable(&(newDtd->generalEntities), &(newDtd->pool), &(oldDtd->generalEntities))) return 0; #ifdef XML_DTD if (!copyEntityTable(&(newDtd->paramEntities), &(newDtd->pool), &(oldDtd->paramEntities))) return 0; #endif /* XML_DTD */ newDtd->complete = oldDtd->complete; newDtd->standalone = oldDtd->standalone; return 1; } static int copyEntityTable(HASH_TABLE *newTable, STRING_POOL *newPool, const HASH_TABLE *oldTable) { HASH_TABLE_ITER iter; const XML_Char *cachedOldBase = 0; const XML_Char *cachedNewBase = 0; hashTableIterInit(&iter, oldTable); for (;;) { ENTITY *newE; const XML_Char *name; const ENTITY *oldE = (ENTITY *)hashTableIterNext(&iter); if (!oldE) break; name = poolCopyString(newPool, oldE->name); if (!name) return 0; newE = (ENTITY *)lookup(newTable, name, sizeof(ENTITY)); if (!newE) return 0; if (oldE->systemId) { const XML_Char *tem = poolCopyString(newPool, oldE->systemId); if (!tem) return 0; newE->systemId = tem; if (oldE->base) { if (oldE->base == cachedOldBase) newE->base = cachedNewBase; else { cachedOldBase = oldE->base; tem = poolCopyString(newPool, cachedOldBase); if (!tem) return 0; cachedNewBase = newE->base = tem; } } } else { const XML_Char *tem = poolCopyStringN(newPool, oldE->textPtr, oldE->textLen); if (!tem) return 0; newE->textPtr = tem; newE->textLen = oldE->textLen; } if (oldE->notation) { const XML_Char *tem = poolCopyString(newPool, oldE->notation); if (!tem) return 0; newE->notation = tem; } } return 1; } #define INIT_SIZE 64 static int keyeq(KEY s1, KEY s2) { for (; *s1 == *s2; s1++, s2++) if (*s1 == 0) return 1; return 0; } static unsigned long hash(KEY s) { unsigned long h = 0; while (*s) h = (h << 5) + h + (unsigned char)*s++; return h; } static NAMED *lookup(HASH_TABLE *table, KEY name, size_t createSize) { size_t i; if (table->size == 0) { if (!createSize) return 0; table->v = calloc(INIT_SIZE, sizeof(NAMED *)); if (!table->v) return 0; table->size = INIT_SIZE; table->usedLim = INIT_SIZE / 2; i = hash(name) & 
(table->size - 1); } else { unsigned long h = hash(name); for (i = h & (table->size - 1); table->v[i]; i == 0 ? i = table->size - 1 : --i) { if (keyeq(name, table->v[i]->name)) return table->v[i]; } if (!createSize) return 0; if (table->used == table->usedLim) { /* check for overflow */ size_t newSize = table->size * 2; NAMED **newV = calloc(newSize, sizeof(NAMED *)); if (!newV) return 0; for (i = 0; i < table->size; i++) if (table->v[i]) { size_t j; for (j = hash(table->v[i]->name) & (newSize - 1); newV[j]; j == 0 ? j = newSize - 1 : --j) ; newV[j] = table->v[i]; } free(table->v); table->v = newV; table->size = newSize; table->usedLim = newSize/2; for (i = h & (table->size - 1); table->v[i]; i == 0 ? i = table->size - 1 : --i) ; } } table->v[i] = calloc(1, createSize); if (!table->v[i]) return 0; table->v[i]->name = name; (table->used)++; return table->v[i]; } static void hashTableDestroy(HASH_TABLE *table) { size_t i; for (i = 0; i < table->size; i++) { NAMED *p = table->v[i]; if (p) free(p); } if (table->v) free(table->v); } static void hashTableInit(HASH_TABLE *p) { p->size = 0; p->usedLim = 0; p->used = 0; p->v = 0; } static void hashTableIterInit(HASH_TABLE_ITER *iter, const HASH_TABLE *table) { iter->p = table->v; iter->end = iter->p + table->size; } static NAMED *hashTableIterNext(HASH_TABLE_ITER *iter) { while (iter->p != iter->end) { NAMED *tem = *(iter->p)++; if (tem) return tem; } return 0; } static void poolInit(STRING_POOL *pool) { pool->blocks = 0; pool->freeBlocks = 0; pool->start = 0; pool->ptr = 0; pool->end = 0; } static void poolClear(STRING_POOL *pool) { if (!pool->freeBlocks) pool->freeBlocks = pool->blocks; else { BLOCK *p = pool->blocks; while (p) { BLOCK *tem = p->next; p->next = pool->freeBlocks; pool->freeBlocks = p; p = tem; } } pool->blocks = 0; pool->start = 0; pool->ptr = 0; pool->end = 0; } static void poolDestroy(STRING_POOL *pool) { BLOCK *p = pool->blocks; while (p) { BLOCK *tem = p->next; free(p); p = tem; } pool->blocks = 0; p = 
pool->freeBlocks; while (p) { BLOCK *tem = p->next; free(p); p = tem; } pool->freeBlocks = 0; pool->ptr = 0; pool->start = 0; pool->end = 0; } static XML_Char *poolAppend(STRING_POOL *pool, const ENCODING *enc, const char *ptr, const char *end) { if (!pool->ptr && !poolGrow(pool)) return 0; for (;;) { XmlConvert(enc, &ptr, end, (ICHAR **)&(pool->ptr), (ICHAR *)pool->end); if (ptr == end) break; if (!poolGrow(pool)) return 0; } return pool->start; } static const XML_Char *poolCopyString(STRING_POOL *pool, const XML_Char *s) { do { if (!poolAppendChar(pool, *s)) return 0; } while (*s++); s = pool->start; poolFinish(pool); return s; } static const XML_Char *poolCopyStringN(STRING_POOL *pool, const XML_Char *s, int n) { if (!pool->ptr && !poolGrow(pool)) return 0; for (; n > 0; --n, s++) { if (!poolAppendChar(pool, *s)) return 0; } s = pool->start; poolFinish(pool); return s; } static XML_Char *poolStoreString(STRING_POOL *pool, const ENCODING *enc, const char *ptr, const char *end) { if (!poolAppend(pool, enc, ptr, end)) return 0; if (pool->ptr == pool->end && !poolGrow(pool)) return 0; *(pool->ptr)++ = 0; return pool->start; } static int poolGrow(STRING_POOL *pool) { if (pool->freeBlocks) { if (pool->start == 0) { pool->blocks = pool->freeBlocks; pool->freeBlocks = pool->freeBlocks->next; pool->blocks->next = 0; pool->start = pool->blocks->s; pool->end = pool->start + pool->blocks->size; pool->ptr = pool->start; return 1; } if (pool->end - pool->start < pool->freeBlocks->size) { BLOCK *tem = pool->freeBlocks->next; pool->freeBlocks->next = pool->blocks; pool->blocks = pool->freeBlocks; pool->freeBlocks = tem; memcpy(pool->blocks->s, pool->start, (pool->end - pool->start) * sizeof(XML_Char)); pool->ptr = pool->blocks->s + (pool->ptr - pool->start); pool->start = pool->blocks->s; pool->end = pool->start + pool->blocks->size; return 1; } } if (pool->blocks && pool->start == pool->blocks->s) { int blockSize = (pool->end - pool->start)*2; 
/* Use a temporary so the old block stays reachable (and freeable) if realloc fails. */ BLOCK *tem = realloc(pool->blocks, offsetof(BLOCK, s) + blockSize * sizeof(XML_Char)); if (!tem) return 0; pool->blocks = tem; pool->blocks->size = blockSize; pool->ptr = pool->blocks->s + (pool->ptr - pool->start); pool->start = pool->blocks->s; pool->end = pool->start + blockSize; } else { BLOCK *tem; int blockSize = pool->end - pool->start; if (blockSize < INIT_BLOCK_SIZE) blockSize = INIT_BLOCK_SIZE; else blockSize *= 2; tem = malloc(offsetof(BLOCK, s) + blockSize * sizeof(XML_Char)); if (!tem) return 0; tem->size = blockSize; tem->next = pool->blocks; pool->blocks = tem; if (pool->ptr != pool->start) memcpy(tem->s, pool->start, (pool->ptr - pool->start) * sizeof(XML_Char)); pool->ptr = tem->s + (pool->ptr - pool->start); pool->start = tem->s; pool->end = tem->s + blockSize; } return 1; } swish-e-2.4.7/src/expat/COPYING0000664000077100017500000000253611166010107012772 00000000000000This is James Clark's expat XML parser library. It is covered under the MIT License. http://www.jclark.com/xml/expat.html http://expat.sourceforge.net/ The MIT License Copyright (c) Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. swish-e-2.4.7/src/expat/Makefile.in0000664000077100017500000003454011166010107014004 00000000000000# Makefile.in generated by automake 1.9.6 from Makefile.am. # @configure_input@ # Copyright (C) 1994, 1995, 1996, 1997, 1998, 1999, 2000, 2001, 2002, # 2003, 2004, 2005 Free Software Foundation, Inc. # This Makefile.in is free software; the Free Software Foundation # gives unlimited permission to copy and/or distribute it, # with or without modifications, as long as this notice is preserved. # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY, to the extent permitted by law; without # even the implied warranty of MERCHANTABILITY or FITNESS FOR A # PARTICULAR PURPOSE. @SET_MAKE@ srcdir = @srcdir@ top_srcdir = @top_srcdir@ VPATH = @srcdir@ pkgdatadir = $(datadir)/@PACKAGE@ pkglibdir = $(libdir)/@PACKAGE@ pkgincludedir = $(includedir)/@PACKAGE@ top_builddir = ../.. 
am__cd = CDPATH="$${ZSH_VERSION+.}$(PATH_SEPARATOR)" && cd INSTALL = @INSTALL@ install_sh_DATA = $(install_sh) -c -m 644 install_sh_PROGRAM = $(install_sh) -c install_sh_SCRIPT = $(install_sh) -c INSTALL_HEADER = $(INSTALL_DATA) transform = $(program_transform_name) NORMAL_INSTALL = : PRE_INSTALL = : POST_INSTALL = : NORMAL_UNINSTALL = : PRE_UNINSTALL = : POST_UNINSTALL = : build_triplet = @build@ host_triplet = @host@ subdir = src/expat DIST_COMMON = $(srcdir)/Makefile.am $(srcdir)/Makefile.in COPYING ACLOCAL_M4 = $(top_srcdir)/aclocal.m4 am__aclocal_m4_deps = $(top_srcdir)/config/acinclude.m4 \ $(top_srcdir)/configure.in am__configure_deps = $(am__aclocal_m4_deps) $(CONFIGURE_DEPENDENCIES) \ $(ACLOCAL_M4) mkinstalldirs = $(SHELL) $(top_srcdir)/config/mkinstalldirs CONFIG_HEADER = $(top_builddir)/src/acconfig.h CONFIG_CLEAN_FILES = LTLIBRARIES = $(noinst_LTLIBRARIES) libswexpat_la_LIBADD = am_libswexpat_la_OBJECTS = xmltok.lo xmlrole.lo xmlparse.lo libswexpat_la_OBJECTS = $(am_libswexpat_la_OBJECTS) DEFAULT_INCLUDES = -I. 
-I$(srcdir) -I$(top_builddir)/src depcomp = $(SHELL) $(top_srcdir)/config/depcomp am__depfiles_maybe = depfiles COMPILE = $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) \ $(CPPFLAGS) $(AM_CFLAGS) $(CFLAGS) LTCOMPILE = $(LIBTOOL) --tag=CC --mode=compile $(CC) $(DEFS) \ $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) \ $(AM_CFLAGS) $(CFLAGS) CCLD = $(CC) LINK = $(LIBTOOL) --tag=CC --mode=link $(CCLD) $(AM_CFLAGS) $(CFLAGS) \ $(AM_LDFLAGS) $(LDFLAGS) -o $@ SOURCES = $(libswexpat_la_SOURCES) DIST_SOURCES = $(libswexpat_la_SOURCES) ETAGS = etags CTAGS = ctags DISTFILES = $(DIST_COMMON) $(DIST_SOURCES) $(TEXINFOS) $(EXTRA_DIST) ACLOCAL = @ACLOCAL@ ALLOCA = @ALLOCA@ AMDEP_FALSE = @AMDEP_FALSE@ AMDEP_TRUE = @AMDEP_TRUE@ AMTAR = @AMTAR@ AR = @AR@ AS = @AS@ AUTOCONF = @AUTOCONF@ AUTOHEADER = @AUTOHEADER@ AUTOMAKE = @AUTOMAKE@ AWK = @AWK@ BTREE_OBJS = @BTREE_OBJS@ BUILDDOCS_FALSE = @BUILDDOCS_FALSE@ BUILDDOCS_TRUE = @BUILDDOCS_TRUE@ CC = @CC@ CCDEPMODE = @CCDEPMODE@ CFLAGS = @CFLAGS@ CPP = @CPP@ CPPFLAGS = @CPPFLAGS@ CXX = @CXX@ CXXCPP = @CXXCPP@ CXXDEPMODE = @CXXDEPMODE@ CXXFLAGS = @CXXFLAGS@ CYGPATH_W = @CYGPATH_W@ DEFS = @DEFS@ DEPDIR = @DEPDIR@ DLLTOOL = @DLLTOOL@ ECHO = @ECHO@ ECHO_C = @ECHO_C@ ECHO_N = @ECHO_N@ ECHO_T = @ECHO_T@ EGREP = @EGREP@ EXEEXT = @EXEEXT@ F77 = @F77@ FFLAGS = @FFLAGS@ INSTALLDOCS_FALSE = @INSTALLDOCS_FALSE@ INSTALLDOCS_TRUE = @INSTALLDOCS_TRUE@ INSTALL_DATA = @INSTALL_DATA@ INSTALL_PROGRAM = @INSTALL_PROGRAM@ INSTALL_SCRIPT = @INSTALL_SCRIPT@ INSTALL_STRIP_PROGRAM = @INSTALL_STRIP_PROGRAM@ LARGEFILES_MACROS = @LARGEFILES_MACROS@ LDFLAGS = @LDFLAGS@ LIBOBJS = @LIBOBJS@ LIBS = @LIBS@ LIBTOOL = @LIBTOOL@ LIBXML2_CFLAGS = @LIBXML2_CFLAGS@ LIBXML2_LIB = @LIBXML2_LIB@ LIBXML2_OBJS = @LIBXML2_OBJS@ LIBXML_REQUIRED_VERSION = @LIBXML_REQUIRED_VERSION@ LN_S = @LN_S@ LTLIBOBJS = @LTLIBOBJS@ MAINT = @MAINT@ MAINTAINER_MODE_FALSE = @MAINTAINER_MODE_FALSE@ MAINTAINER_MODE_TRUE = @MAINTAINER_MODE_TRUE@ MAKEINFO = @MAKEINFO@ OBJDUMP = 
@OBJDUMP@ OBJEXT = @OBJEXT@ PACKAGE = @PACKAGE@ PACKAGE_BUGREPORT = @PACKAGE_BUGREPORT@ PACKAGE_NAME = @PACKAGE_NAME@ PACKAGE_STRING = @PACKAGE_STRING@ PACKAGE_TARNAME = @PACKAGE_TARNAME@ PACKAGE_VERSION = @PACKAGE_VERSION@ PATH_SEPARATOR = @PATH_SEPARATOR@ PCRE_CFLAGS = @PCRE_CFLAGS@ PCRE_CONFIG = @PCRE_CONFIG@ PCRE_LIBS = @PCRE_LIBS@ PCRE_REQUIRED_VERSION = @PCRE_REQUIRED_VERSION@ PERL = @PERL@ POD2MAN = @POD2MAN@ RANLIB = @RANLIB@ SET_MAKE = @SET_MAKE@ SHELL = @SHELL@ STRIP = @STRIP@ SWISH_WEB = @SWISH_WEB@ VERSION = @VERSION@ XML2_CONFIG = @XML2_CONFIG@ Z_CFLAGS = @Z_CFLAGS@ Z_LIBS = @Z_LIBS@ ac_ct_AR = @ac_ct_AR@ ac_ct_AS = @ac_ct_AS@ ac_ct_CC = @ac_ct_CC@ ac_ct_CXX = @ac_ct_CXX@ ac_ct_DLLTOOL = @ac_ct_DLLTOOL@ ac_ct_F77 = @ac_ct_F77@ ac_ct_OBJDUMP = @ac_ct_OBJDUMP@ ac_ct_RANLIB = @ac_ct_RANLIB@ ac_ct_STRIP = @ac_ct_STRIP@ am__fastdepCC_FALSE = @am__fastdepCC_FALSE@ am__fastdepCC_TRUE = @am__fastdepCC_TRUE@ am__fastdepCXX_FALSE = @am__fastdepCXX_FALSE@ am__fastdepCXX_TRUE = @am__fastdepCXX_TRUE@ am__include = @am__include@ am__leading_dot = @am__leading_dot@ am__quote = @am__quote@ am__tar = @am__tar@ am__untar = @am__untar@ bindir = @bindir@ build = @build@ build_alias = @build_alias@ build_cpu = @build_cpu@ build_os = @build_os@ build_vendor = @build_vendor@ datadir = @datadir@ exec_prefix = @exec_prefix@ host = @host@ host_alias = @host_alias@ host_cpu = @host_cpu@ host_os = @host_os@ host_vendor = @host_vendor@ includedir = @includedir@ infodir = @infodir@ install_sh = @install_sh@ libdir = @libdir@ libexecdir = @libexecdir@ localstatedir = @localstatedir@ mandir = @mandir@ mkdir_p = @mkdir_p@ oldincludedir = @oldincludedir@ prefix = @prefix@ program_transform_name = @program_transform_name@ sbindir = @sbindir@ sharedstatedir = @sharedstatedir@ sysconfdir = @sysconfdir@ target_alias = @target_alias@ AM_CPPFLAGS = -I"$(srcdir)/xmlparse" -I"$(srcdir)/xmltok" noinst_LTLIBRARIES = libswexpat.la libswexpat_la_SOURCES = \ xmltok.c \ xmlrole.c \ xmlparse.c 
EXTRA_DIST = \ COPYING \ expat.dsw \ xmlparse/xmlparse.c \ xmlparse/xmlparse.dsp \ xmlparse/xmlparse.h \ xmltok/ascii.h \ xmltok/asciitab.h \ xmltok/dllmain.c \ xmltok/iasciitab.h \ xmltok/latin1tab.h \ xmltok/nametab.h \ xmltok/utf8tab.h \ xmltok/xmldef.h \ xmltok/xmlrole.c \ xmltok/xmlrole.h \ xmltok/xmltok.c \ xmltok/xmltok.dsp \ xmltok/xmltok.h \ xmltok/xmltok_impl.c \ xmltok/xmltok_impl.h \ xmltok/xmltok_ns.c all: all-am .SUFFIXES: .SUFFIXES: .c .lo .o .obj $(srcdir)/Makefile.in: @MAINTAINER_MODE_TRUE@ $(srcdir)/Makefile.am $(am__configure_deps) @for dep in $?; do \ case '$(am__configure_deps)' in \ *$$dep*) \ cd $(top_builddir) && $(MAKE) $(AM_MAKEFLAGS) am--refresh \ && exit 0; \ exit 1;; \ esac; \ done; \ echo ' cd $(top_srcdir) && $(AUTOMAKE) --foreign src/expat/Makefile'; \ cd $(top_srcdir) && \ $(AUTOMAKE) --foreign src/expat/Makefile .PRECIOUS: Makefile Makefile: $(srcdir)/Makefile.in $(top_builddir)/config.status @case '$?' in \ *config.status*) \ cd $(top_builddir) && $(MAKE) $(AM_MAKEFLAGS) am--refresh;; \ *) \ echo ' cd $(top_builddir) && $(SHELL) ./config.status $(subdir)/$@ $(am__depfiles_maybe)'; \ cd $(top_builddir) && $(SHELL) ./config.status $(subdir)/$@ $(am__depfiles_maybe);; \ esac; $(top_builddir)/config.status: $(top_srcdir)/configure $(CONFIG_STATUS_DEPENDENCIES) cd $(top_builddir) && $(MAKE) $(AM_MAKEFLAGS) am--refresh $(top_srcdir)/configure: @MAINTAINER_MODE_TRUE@ $(am__configure_deps) cd $(top_builddir) && $(MAKE) $(AM_MAKEFLAGS) am--refresh $(ACLOCAL_M4): @MAINTAINER_MODE_TRUE@ $(am__aclocal_m4_deps) cd $(top_builddir) && $(MAKE) $(AM_MAKEFLAGS) am--refresh clean-noinstLTLIBRARIES: -test -z "$(noinst_LTLIBRARIES)" || rm -f $(noinst_LTLIBRARIES) @list='$(noinst_LTLIBRARIES)'; for p in $$list; do \ dir="`echo $$p | sed -e 's|/[^/]*$$||'`"; \ test "$$dir" != "$$p" || dir=.; \ echo "rm -f \"$${dir}/so_locations\""; \ rm -f "$${dir}/so_locations"; \ done libswexpat.la: $(libswexpat_la_OBJECTS) $(libswexpat_la_DEPENDENCIES) $(LINK) 
$(libswexpat_la_LDFLAGS) $(libswexpat_la_OBJECTS) $(libswexpat_la_LIBADD) $(LIBS) mostlyclean-compile: -rm -f *.$(OBJEXT) distclean-compile: -rm -f *.tab.c @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/xmlparse.Plo@am__quote@ @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/xmlrole.Plo@am__quote@ @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/xmltok.Plo@am__quote@ .c.o: @am__fastdepCC_TRUE@ if $(COMPILE) -MT $@ -MD -MP -MF "$(DEPDIR)/$*.Tpo" -c -o $@ $<; \ @am__fastdepCC_TRUE@ then mv -f "$(DEPDIR)/$*.Tpo" "$(DEPDIR)/$*.Po"; else rm -f "$(DEPDIR)/$*.Tpo"; exit 1; fi @AMDEP_TRUE@@am__fastdepCC_FALSE@ source='$<' object='$@' libtool=no @AMDEPBACKSLASH@ @AMDEP_TRUE@@am__fastdepCC_FALSE@ DEPDIR=$(DEPDIR) $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@ @am__fastdepCC_FALSE@ $(COMPILE) -c $< .c.obj: @am__fastdepCC_TRUE@ if $(COMPILE) -MT $@ -MD -MP -MF "$(DEPDIR)/$*.Tpo" -c -o $@ `$(CYGPATH_W) '$<'`; \ @am__fastdepCC_TRUE@ then mv -f "$(DEPDIR)/$*.Tpo" "$(DEPDIR)/$*.Po"; else rm -f "$(DEPDIR)/$*.Tpo"; exit 1; fi @AMDEP_TRUE@@am__fastdepCC_FALSE@ source='$<' object='$@' libtool=no @AMDEPBACKSLASH@ @AMDEP_TRUE@@am__fastdepCC_FALSE@ DEPDIR=$(DEPDIR) $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@ @am__fastdepCC_FALSE@ $(COMPILE) -c `$(CYGPATH_W) '$<'` .c.lo: @am__fastdepCC_TRUE@ if $(LTCOMPILE) -MT $@ -MD -MP -MF "$(DEPDIR)/$*.Tpo" -c -o $@ $<; \ @am__fastdepCC_TRUE@ then mv -f "$(DEPDIR)/$*.Tpo" "$(DEPDIR)/$*.Plo"; else rm -f "$(DEPDIR)/$*.Tpo"; exit 1; fi @AMDEP_TRUE@@am__fastdepCC_FALSE@ source='$<' object='$@' libtool=yes @AMDEPBACKSLASH@ @AMDEP_TRUE@@am__fastdepCC_FALSE@ DEPDIR=$(DEPDIR) $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@ @am__fastdepCC_FALSE@ $(LTCOMPILE) -c -o $@ $< mostlyclean-libtool: -rm -f *.lo clean-libtool: -rm -rf .libs _libs distclean-libtool: -rm -f libtool uninstall-info-am: ID: $(HEADERS) $(SOURCES) $(LISP) $(TAGS_FILES) list='$(SOURCES) $(HEADERS) $(LISP) $(TAGS_FILES)'; \ unique=`for i in $$list; do \ if test -f "$$i"; then echo $$i; else echo 
$(srcdir)/$$i; fi; \ done | \ $(AWK) ' { files[$$0] = 1; } \ END { for (i in files) print i; }'`; \ mkid -fID $$unique tags: TAGS TAGS: $(HEADERS) $(SOURCES) $(TAGS_DEPENDENCIES) \ $(TAGS_FILES) $(LISP) tags=; \ here=`pwd`; \ list='$(SOURCES) $(HEADERS) $(LISP) $(TAGS_FILES)'; \ unique=`for i in $$list; do \ if test -f "$$i"; then echo $$i; else echo $(srcdir)/$$i; fi; \ done | \ $(AWK) ' { files[$$0] = 1; } \ END { for (i in files) print i; }'`; \ if test -z "$(ETAGS_ARGS)$$tags$$unique"; then :; else \ test -n "$$unique" || unique=$$empty_fix; \ $(ETAGS) $(ETAGSFLAGS) $(AM_ETAGSFLAGS) $(ETAGS_ARGS) \ $$tags $$unique; \ fi ctags: CTAGS CTAGS: $(HEADERS) $(SOURCES) $(TAGS_DEPENDENCIES) \ $(TAGS_FILES) $(LISP) tags=; \ here=`pwd`; \ list='$(SOURCES) $(HEADERS) $(LISP) $(TAGS_FILES)'; \ unique=`for i in $$list; do \ if test -f "$$i"; then echo $$i; else echo $(srcdir)/$$i; fi; \ done | \ $(AWK) ' { files[$$0] = 1; } \ END { for (i in files) print i; }'`; \ test -z "$(CTAGS_ARGS)$$tags$$unique" \ || $(CTAGS) $(CTAGSFLAGS) $(AM_CTAGSFLAGS) $(CTAGS_ARGS) \ $$tags $$unique GTAGS: here=`$(am__cd) $(top_builddir) && pwd` \ && cd $(top_srcdir) \ && gtags -i $(GTAGS_ARGS) $$here distclean-tags: -rm -f TAGS ID GTAGS GRTAGS GSYMS GPATH tags distdir: $(DISTFILES) $(mkdir_p) $(distdir)/xmlparse $(distdir)/xmltok @srcdirstrip=`echo "$(srcdir)" | sed 's|.|.|g'`; \ topsrcdirstrip=`echo "$(top_srcdir)" | sed 's|.|.|g'`; \ list='$(DISTFILES)'; for file in $$list; do \ case $$file in \ $(srcdir)/*) file=`echo "$$file" | sed "s|^$$srcdirstrip/||"`;; \ $(top_srcdir)/*) file=`echo "$$file" | sed "s|^$$topsrcdirstrip/|$(top_builddir)/|"`;; \ esac; \ if test -f $$file || test -d $$file; then d=.; else d=$(srcdir); fi; \ dir=`echo "$$file" | sed -e 's,/[^/]*$$,,'`; \ if test "$$dir" != "$$file" && test "$$dir" != "."; then \ dir="/$$dir"; \ $(mkdir_p) "$(distdir)$$dir"; \ else \ dir=''; \ fi; \ if test -d $$d/$$file; then \ if test -d $(srcdir)/$$file && test $$d != $(srcdir); then \ cp -pR 
$(srcdir)/$$file $(distdir)$$dir || exit 1; \ fi; \ cp -pR $$d/$$file $(distdir)$$dir || exit 1; \ else \ test -f $(distdir)/$$file \ || cp -p $$d/$$file $(distdir)/$$file \ || exit 1; \ fi; \ done check-am: all-am check: check-am all-am: Makefile $(LTLIBRARIES) installdirs: install: install-am install-exec: install-exec-am install-data: install-data-am uninstall: uninstall-am install-am: all-am @$(MAKE) $(AM_MAKEFLAGS) install-exec-am install-data-am installcheck: installcheck-am install-strip: $(MAKE) $(AM_MAKEFLAGS) INSTALL_PROGRAM="$(INSTALL_STRIP_PROGRAM)" \ install_sh_PROGRAM="$(INSTALL_STRIP_PROGRAM)" INSTALL_STRIP_FLAG=-s \ `test -z '$(STRIP)' || \ echo "INSTALL_PROGRAM_ENV=STRIPPROG='$(STRIP)'"` install mostlyclean-generic: clean-generic: distclean-generic: -test -z "$(CONFIG_CLEAN_FILES)" || rm -f $(CONFIG_CLEAN_FILES) maintainer-clean-generic: @echo "This command is intended for maintainers to use" @echo "it deletes files that may require special tools to rebuild." clean: clean-am clean-am: clean-generic clean-libtool clean-noinstLTLIBRARIES \ mostlyclean-am distclean: distclean-am -rm -rf ./$(DEPDIR) -rm -f Makefile distclean-am: clean-am distclean-compile distclean-generic \ distclean-libtool distclean-tags dvi: dvi-am dvi-am: html: html-am info: info-am info-am: install-data-am: install-exec-am: install-info: install-info-am install-man: installcheck-am: maintainer-clean: maintainer-clean-am -rm -rf ./$(DEPDIR) -rm -f Makefile maintainer-clean-am: distclean-am maintainer-clean-generic mostlyclean: mostlyclean-am mostlyclean-am: mostlyclean-compile mostlyclean-generic \ mostlyclean-libtool pdf: pdf-am pdf-am: ps: ps-am ps-am: uninstall-am: uninstall-info-am .PHONY: CTAGS GTAGS all all-am check check-am clean clean-generic \ clean-libtool clean-noinstLTLIBRARIES ctags distclean \ distclean-compile distclean-generic distclean-libtool \ distclean-tags distdir dvi dvi-am html html-am info info-am \ install install-am install-data install-data-am 
install-exec \ install-exec-am install-info install-info-am install-man \ install-strip installcheck installcheck-am installdirs \ maintainer-clean maintainer-clean-generic mostlyclean \ mostlyclean-compile mostlyclean-generic mostlyclean-libtool \ pdf pdf-am ps ps-am tags uninstall uninstall-am \ uninstall-info-am # Tell versions [3.59,3.63) of GNU make to not export all variables. # Otherwise a system limit (for SysV at least) may be exceeded. .NOEXPORT: swish-e-2.4.7/src/expat/xmltok.c0000664000077100017500000011162011166010107013414 00000000000000/* Copyright (c) 1998, 1999 Thai Open Source Software Center Ltd See the file copying.txt for copying permission. */ #include "xmldef.h" #include "xmltok.h" #include "nametab.h" #ifdef XML_DTD #define IGNORE_SECTION_TOK_VTABLE , PREFIX(ignoreSectionTok) #else #define IGNORE_SECTION_TOK_VTABLE /* as nothing */ #endif #define VTABLE1 \ { PREFIX(prologTok), PREFIX(contentTok), \ PREFIX(cdataSectionTok) IGNORE_SECTION_TOK_VTABLE }, \ { PREFIX(attributeValueTok), PREFIX(entityValueTok) }, \ PREFIX(sameName), \ PREFIX(nameMatchesAscii), \ PREFIX(nameLength), \ PREFIX(skipS), \ PREFIX(getAtts), \ PREFIX(charRefNumber), \ PREFIX(predefinedEntityName), \ PREFIX(updatePosition), \ PREFIX(isPublicId) #define VTABLE VTABLE1, PREFIX(toUtf8), PREFIX(toUtf16) #define UCS2_GET_NAMING(pages, hi, lo) \ (namingBitmap[(pages[hi] << 3) + ((lo) >> 5)] & (1 << ((lo) & 0x1F))) /* A 2 byte UTF-8 representation splits the characters 11 bits between the bottom 5 and 6 bits of the bytes. We need 8 bits to index into pages, 3 bits to add to that index and 5 bits to generate the mask. */ #define UTF8_GET_NAMING2(pages, byte) \ (namingBitmap[((pages)[(((byte)[0]) >> 2) & 7] << 3) \ + ((((byte)[0]) & 3) << 1) \ + ((((byte)[1]) >> 5) & 1)] \ & (1 << (((byte)[1]) & 0x1F))) /* A 3 byte UTF-8 representation splits the characters 16 bits between the bottom 4, 6 and 6 bits of the bytes. 
We need 8 bits to index into pages, 3 bits to add to that index and 5 bits to generate the mask. */ #define UTF8_GET_NAMING3(pages, byte) \ (namingBitmap[((pages)[((((byte)[0]) & 0xF) << 4) \ + ((((byte)[1]) >> 2) & 0xF)] \ << 3) \ + ((((byte)[1]) & 3) << 1) \ + ((((byte)[2]) >> 5) & 1)] \ & (1 << (((byte)[2]) & 0x1F))) #define UTF8_GET_NAMING(pages, p, n) \ ((n) == 2 \ ? UTF8_GET_NAMING2(pages, (const unsigned char *)(p)) \ : ((n) == 3 \ ? UTF8_GET_NAMING3(pages, (const unsigned char *)(p)) \ : 0)) #define UTF8_INVALID3(p) \ ((*p) == 0xED \ ? (((p)[1] & 0x20) != 0) \ : ((*p) == 0xEF \ ? ((p)[1] == 0xBF && ((p)[2] == 0xBF || (p)[2] == 0xBE)) \ : 0)) #define UTF8_INVALID4(p) ((*p) == 0xF4 && ((p)[1] & 0x30) != 0) static int isNever(const ENCODING *enc, const char *p) { return 0; } static int utf8_isName2(const ENCODING *enc, const char *p) { return UTF8_GET_NAMING2(namePages, (const unsigned char *)p); } static int utf8_isName3(const ENCODING *enc, const char *p) { return UTF8_GET_NAMING3(namePages, (const unsigned char *)p); } #define utf8_isName4 isNever static int utf8_isNmstrt2(const ENCODING *enc, const char *p) { return UTF8_GET_NAMING2(nmstrtPages, (const unsigned char *)p); } static int utf8_isNmstrt3(const ENCODING *enc, const char *p) { return UTF8_GET_NAMING3(nmstrtPages, (const unsigned char *)p); } #define utf8_isNmstrt4 isNever #define utf8_isInvalid2 isNever static int utf8_isInvalid3(const ENCODING *enc, const char *p) { return UTF8_INVALID3((const unsigned char *)p); } static int utf8_isInvalid4(const ENCODING *enc, const char *p) { return UTF8_INVALID4((const unsigned char *)p); } struct normal_encoding { ENCODING enc; unsigned char type[256]; #ifdef XML_MIN_SIZE int (*byteType)(const ENCODING *, const char *); int (*isNameMin)(const ENCODING *, const char *); int (*isNmstrtMin)(const ENCODING *, const char *); int (*byteToAscii)(const ENCODING *, const char *); int (*charMatches)(const ENCODING *, const char *, int); #endif /* XML_MIN_SIZE */ int 
(*isName2)(const ENCODING *, const char *); int (*isName3)(const ENCODING *, const char *); int (*isName4)(const ENCODING *, const char *); int (*isNmstrt2)(const ENCODING *, const char *); int (*isNmstrt3)(const ENCODING *, const char *); int (*isNmstrt4)(const ENCODING *, const char *); int (*isInvalid2)(const ENCODING *, const char *); int (*isInvalid3)(const ENCODING *, const char *); int (*isInvalid4)(const ENCODING *, const char *); }; #ifdef XML_MIN_SIZE #define STANDARD_VTABLE(E) \ E ## byteType, \ E ## isNameMin, \ E ## isNmstrtMin, \ E ## byteToAscii, \ E ## charMatches, #else #define STANDARD_VTABLE(E) /* as nothing */ #endif #define NORMAL_VTABLE(E) \ E ## isName2, \ E ## isName3, \ E ## isName4, \ E ## isNmstrt2, \ E ## isNmstrt3, \ E ## isNmstrt4, \ E ## isInvalid2, \ E ## isInvalid3, \ E ## isInvalid4 static int checkCharRefNumber(int); #include "xmltok_impl.h" #include "ascii.h" #ifdef XML_MIN_SIZE #define sb_isNameMin isNever #define sb_isNmstrtMin isNever #endif #ifdef XML_MIN_SIZE #define MINBPC(enc) ((enc)->minBytesPerChar) #else /* minimum bytes per character */ #define MINBPC(enc) 1 #endif #define SB_BYTE_TYPE(enc, p) \ (((struct normal_encoding *)(enc))->type[(unsigned char)*(p)]) #ifdef XML_MIN_SIZE static int sb_byteType(const ENCODING *enc, const char *p) { return SB_BYTE_TYPE(enc, p); } #define BYTE_TYPE(enc, p) \ (((const struct normal_encoding *)(enc))->byteType(enc, p)) #else #define BYTE_TYPE(enc, p) SB_BYTE_TYPE(enc, p) #endif #ifdef XML_MIN_SIZE #define BYTE_TO_ASCII(enc, p) \ (((const struct normal_encoding *)(enc))->byteToAscii(enc, p)) static int sb_byteToAscii(const ENCODING *enc, const char *p) { return *p; } #else #define BYTE_TO_ASCII(enc, p) (*(p)) #endif #define IS_NAME_CHAR(enc, p, n) \ (((const struct normal_encoding *)(enc))->isName ## n(enc, p)) #define IS_NMSTRT_CHAR(enc, p, n) \ (((const struct normal_encoding *)(enc))->isNmstrt ## n(enc, p)) #define IS_INVALID_CHAR(enc, p, n) \ (((const struct normal_encoding 
*)(enc))->isInvalid ## n(enc, p)) #ifdef XML_MIN_SIZE #define IS_NAME_CHAR_MINBPC(enc, p) \ (((const struct normal_encoding *)(enc))->isNameMin(enc, p)) #define IS_NMSTRT_CHAR_MINBPC(enc, p) \ (((const struct normal_encoding *)(enc))->isNmstrtMin(enc, p)) #else #define IS_NAME_CHAR_MINBPC(enc, p) (0) #define IS_NMSTRT_CHAR_MINBPC(enc, p) (0) #endif #ifdef XML_MIN_SIZE #define CHAR_MATCHES(enc, p, c) \ (((const struct normal_encoding *)(enc))->charMatches(enc, p, c)) static int sb_charMatches(const ENCODING *enc, const char *p, int c) { return *p == c; } #else /* c is an ASCII character */ #define CHAR_MATCHES(enc, p, c) (*(p) == c) #endif #define PREFIX(ident) normal_ ## ident #include "xmltok_impl.c" #undef MINBPC #undef BYTE_TYPE #undef BYTE_TO_ASCII #undef CHAR_MATCHES #undef IS_NAME_CHAR #undef IS_NAME_CHAR_MINBPC #undef IS_NMSTRT_CHAR #undef IS_NMSTRT_CHAR_MINBPC #undef IS_INVALID_CHAR enum { /* UTF8_cvalN is value of masked first byte of N byte sequence */ UTF8_cval1 = 0x00, UTF8_cval2 = 0xc0, UTF8_cval3 = 0xe0, UTF8_cval4 = 0xf0 }; static void utf8_toUtf8(const ENCODING *enc, const char **fromP, const char *fromLim, char **toP, const char *toLim) { char *to; const char *from; if (fromLim - *fromP > toLim - *toP) { /* Avoid copying partial characters. 
*/ for (fromLim = *fromP + (toLim - *toP); fromLim > *fromP; fromLim--) if (((unsigned char)fromLim[-1] & 0xc0) != 0x80) break; } for (to = *toP, from = *fromP; from != fromLim; from++, to++) *to = *from; *fromP = from; *toP = to; } static void utf8_toUtf16(const ENCODING *enc, const char **fromP, const char *fromLim, unsigned short **toP, const unsigned short *toLim) { unsigned short *to = *toP; const char *from = *fromP; while (from != fromLim && to != toLim) { switch (((struct normal_encoding *)enc)->type[(unsigned char)*from]) { case BT_LEAD2: *to++ = ((from[0] & 0x1f) << 6) | (from[1] & 0x3f); from += 2; break; case BT_LEAD3: *to++ = ((from[0] & 0xf) << 12) | ((from[1] & 0x3f) << 6) | (from[2] & 0x3f); from += 3; break; case BT_LEAD4: { unsigned long n; if (to + 1 == toLim) break; n = ((from[0] & 0x7) << 18) | ((from[1] & 0x3f) << 12) | ((from[2] & 0x3f) << 6) | (from[3] & 0x3f); n -= 0x10000; to[0] = (unsigned short)((n >> 10) | 0xD800); to[1] = (unsigned short)((n & 0x3FF) | 0xDC00); to += 2; from += 4; } break; default: *to++ = *from++; break; } } *fromP = from; *toP = to; } #ifdef XML_NS static const struct normal_encoding utf8_encoding_ns = { { VTABLE1, utf8_toUtf8, utf8_toUtf16, 1, 1, 0 }, { #include "asciitab.h" #include "utf8tab.h" }, STANDARD_VTABLE(sb_) NORMAL_VTABLE(utf8_) }; #endif static const struct normal_encoding utf8_encoding = { { VTABLE1, utf8_toUtf8, utf8_toUtf16, 1, 1, 0 }, { #define BT_COLON BT_NMSTRT #include "asciitab.h" #undef BT_COLON #include "utf8tab.h" }, STANDARD_VTABLE(sb_) NORMAL_VTABLE(utf8_) }; #ifdef XML_NS static const struct normal_encoding internal_utf8_encoding_ns = { { VTABLE1, utf8_toUtf8, utf8_toUtf16, 1, 1, 0 }, { #include "iasciitab.h" #include "utf8tab.h" }, STANDARD_VTABLE(sb_) NORMAL_VTABLE(utf8_) }; #endif static const struct normal_encoding internal_utf8_encoding = { { VTABLE1, utf8_toUtf8, utf8_toUtf16, 1, 1, 0 }, { #define BT_COLON BT_NMSTRT #include "iasciitab.h" #undef BT_COLON #include "utf8tab.h" }, 
STANDARD_VTABLE(sb_) NORMAL_VTABLE(utf8_) }; static void latin1_toUtf8(const ENCODING *enc, const char **fromP, const char *fromLim, char **toP, const char *toLim) { for (;;) { unsigned char c; if (*fromP == fromLim) break; c = (unsigned char)**fromP; if (c & 0x80) { if (toLim - *toP < 2) break; *(*toP)++ = ((c >> 6) | UTF8_cval2); *(*toP)++ = ((c & 0x3f) | 0x80); (*fromP)++; } else { if (*toP == toLim) break; *(*toP)++ = *(*fromP)++; } } } static void latin1_toUtf16(const ENCODING *enc, const char **fromP, const char *fromLim, unsigned short **toP, const unsigned short *toLim) { while (*fromP != fromLim && *toP != toLim) *(*toP)++ = (unsigned char)*(*fromP)++; } #ifdef XML_NS static const struct normal_encoding latin1_encoding_ns = { { VTABLE1, latin1_toUtf8, latin1_toUtf16, 1, 0, 0 }, { #include "asciitab.h" #include "latin1tab.h" }, STANDARD_VTABLE(sb_) }; #endif static const struct normal_encoding latin1_encoding = { { VTABLE1, latin1_toUtf8, latin1_toUtf16, 1, 0, 0 }, { #define BT_COLON BT_NMSTRT #include "asciitab.h" #undef BT_COLON #include "latin1tab.h" }, STANDARD_VTABLE(sb_) }; static void ascii_toUtf8(const ENCODING *enc, const char **fromP, const char *fromLim, char **toP, const char *toLim) { while (*fromP != fromLim && *toP != toLim) *(*toP)++ = *(*fromP)++; } #ifdef XML_NS static const struct normal_encoding ascii_encoding_ns = { { VTABLE1, ascii_toUtf8, latin1_toUtf16, 1, 1, 0 }, { #include "asciitab.h" /* BT_NONXML == 0 */ }, STANDARD_VTABLE(sb_) }; #endif static const struct normal_encoding ascii_encoding = { { VTABLE1, ascii_toUtf8, latin1_toUtf16, 1, 1, 0 }, { #define BT_COLON BT_NMSTRT #include "asciitab.h" #undef BT_COLON /* BT_NONXML == 0 */ }, STANDARD_VTABLE(sb_) }; static int unicode_byte_type(char hi, char lo) { switch ((unsigned char)hi) { case 0xD8: case 0xD9: case 0xDA: case 0xDB: return BT_LEAD4; case 0xDC: case 0xDD: case 0xDE: case 0xDF: return BT_TRAIL; case 0xFF: switch ((unsigned char)lo) { case 0xFF: case 0xFE: return BT_NONXML; 
} break; } return BT_NONASCII; } #define DEFINE_UTF16_TO_UTF8(E) \ static \ void E ## toUtf8(const ENCODING *enc, \ const char **fromP, const char *fromLim, \ char **toP, const char *toLim) \ { \ const char *from; \ for (from = *fromP; from != fromLim; from += 2) { \ int plane; \ unsigned char lo2; \ unsigned char lo = GET_LO(from); \ unsigned char hi = GET_HI(from); \ switch (hi) { \ case 0: \ if (lo < 0x80) { \ if (*toP == toLim) { \ *fromP = from; \ return; \ } \ *(*toP)++ = lo; \ break; \ } \ /* fall through */ \ case 0x1: case 0x2: case 0x3: \ case 0x4: case 0x5: case 0x6: case 0x7: \ if (toLim - *toP < 2) { \ *fromP = from; \ return; \ } \ *(*toP)++ = ((lo >> 6) | (hi << 2) | UTF8_cval2); \ *(*toP)++ = ((lo & 0x3f) | 0x80); \ break; \ default: \ if (toLim - *toP < 3) { \ *fromP = from; \ return; \ } \ /* 16 bits divided 4, 6, 6 amongst 3 bytes */ \ *(*toP)++ = ((hi >> 4) | UTF8_cval3); \ *(*toP)++ = (((hi & 0xf) << 2) | (lo >> 6) | 0x80); \ *(*toP)++ = ((lo & 0x3f) | 0x80); \ break; \ case 0xD8: case 0xD9: case 0xDA: case 0xDB: \ if (toLim - *toP < 4) { \ *fromP = from; \ return; \ } \ plane = (((hi & 0x3) << 2) | ((lo >> 6) & 0x3)) + 1; \ *(*toP)++ = ((plane >> 2) | UTF8_cval4); \ *(*toP)++ = (((lo >> 2) & 0xF) | ((plane & 0x3) << 4) | 0x80); \ from += 2; \ lo2 = GET_LO(from); \ *(*toP)++ = (((lo & 0x3) << 4) \ | ((GET_HI(from) & 0x3) << 2) \ | (lo2 >> 6) \ | 0x80); \ *(*toP)++ = ((lo2 & 0x3f) | 0x80); \ break; \ } \ } \ *fromP = from; \ } #define DEFINE_UTF16_TO_UTF16(E) \ static \ void E ## toUtf16(const ENCODING *enc, \ const char **fromP, const char *fromLim, \ unsigned short **toP, const unsigned short *toLim) \ { \ /* Avoid copying first half only of surrogate */ \ if (fromLim - *fromP > ((toLim - *toP) << 1) \ && (GET_HI(fromLim - 2) & 0xF8) == 0xD8) \ fromLim -= 2; \ for (; *fromP != fromLim && *toP != toLim; *fromP += 2) \ *(*toP)++ = (GET_HI(*fromP) << 8) | GET_LO(*fromP); \ } #define SET2(ptr, ch) \ (((ptr)[0] = ((ch) & 0xff)), ((ptr)[1] = ((ch) 
>> 8))) #define GET_LO(ptr) ((unsigned char)(ptr)[0]) #define GET_HI(ptr) ((unsigned char)(ptr)[1]) DEFINE_UTF16_TO_UTF8(little2_) DEFINE_UTF16_TO_UTF16(little2_) #undef SET2 #undef GET_LO #undef GET_HI #define SET2(ptr, ch) \ (((ptr)[0] = ((ch) >> 8)), ((ptr)[1] = ((ch) & 0xFF))) #define GET_LO(ptr) ((unsigned char)(ptr)[1]) #define GET_HI(ptr) ((unsigned char)(ptr)[0]) DEFINE_UTF16_TO_UTF8(big2_) DEFINE_UTF16_TO_UTF16(big2_) #undef SET2 #undef GET_LO #undef GET_HI #define LITTLE2_BYTE_TYPE(enc, p) \ ((p)[1] == 0 \ ? ((struct normal_encoding *)(enc))->type[(unsigned char)*(p)] \ : unicode_byte_type((p)[1], (p)[0])) #define LITTLE2_BYTE_TO_ASCII(enc, p) ((p)[1] == 0 ? (p)[0] : -1) #define LITTLE2_CHAR_MATCHES(enc, p, c) ((p)[1] == 0 && (p)[0] == c) #define LITTLE2_IS_NAME_CHAR_MINBPC(enc, p) \ UCS2_GET_NAMING(namePages, (unsigned char)p[1], (unsigned char)p[0]) #define LITTLE2_IS_NMSTRT_CHAR_MINBPC(enc, p) \ UCS2_GET_NAMING(nmstrtPages, (unsigned char)p[1], (unsigned char)p[0]) #ifdef XML_MIN_SIZE static int little2_byteType(const ENCODING *enc, const char *p) { return LITTLE2_BYTE_TYPE(enc, p); } static int little2_byteToAscii(const ENCODING *enc, const char *p) { return LITTLE2_BYTE_TO_ASCII(enc, p); } static int little2_charMatches(const ENCODING *enc, const char *p, int c) { return LITTLE2_CHAR_MATCHES(enc, p, c); } static int little2_isNameMin(const ENCODING *enc, const char *p) { return LITTLE2_IS_NAME_CHAR_MINBPC(enc, p); } static int little2_isNmstrtMin(const ENCODING *enc, const char *p) { return LITTLE2_IS_NMSTRT_CHAR_MINBPC(enc, p); } #undef VTABLE #define VTABLE VTABLE1, little2_toUtf8, little2_toUtf16 #else /* not XML_MIN_SIZE */ #undef PREFIX #define PREFIX(ident) little2_ ## ident #define MINBPC(enc) 2 /* CHAR_MATCHES is guaranteed to have MINBPC bytes available. 
*/ #define BYTE_TYPE(enc, p) LITTLE2_BYTE_TYPE(enc, p) #define BYTE_TO_ASCII(enc, p) LITTLE2_BYTE_TO_ASCII(enc, p) #define CHAR_MATCHES(enc, p, c) LITTLE2_CHAR_MATCHES(enc, p, c) #define IS_NAME_CHAR(enc, p, n) 0 #define IS_NAME_CHAR_MINBPC(enc, p) LITTLE2_IS_NAME_CHAR_MINBPC(enc, p) #define IS_NMSTRT_CHAR(enc, p, n) (0) #define IS_NMSTRT_CHAR_MINBPC(enc, p) LITTLE2_IS_NMSTRT_CHAR_MINBPC(enc, p) #include "xmltok_impl.c" #undef MINBPC #undef BYTE_TYPE #undef BYTE_TO_ASCII #undef CHAR_MATCHES #undef IS_NAME_CHAR #undef IS_NAME_CHAR_MINBPC #undef IS_NMSTRT_CHAR #undef IS_NMSTRT_CHAR_MINBPC #undef IS_INVALID_CHAR #endif /* not XML_MIN_SIZE */ #ifdef XML_NS static const struct normal_encoding little2_encoding_ns = { { VTABLE, 2, 0, #if XML_BYTE_ORDER == 12 1 #else 0 #endif }, { #include "asciitab.h" #include "latin1tab.h" }, STANDARD_VTABLE(little2_) }; #endif static const struct normal_encoding little2_encoding = { { VTABLE, 2, 0, #if XML_BYTE_ORDER == 12 1 #else 0 #endif }, { #define BT_COLON BT_NMSTRT #include "asciitab.h" #undef BT_COLON #include "latin1tab.h" }, STANDARD_VTABLE(little2_) }; #if XML_BYTE_ORDER != 21 #ifdef XML_NS static const struct normal_encoding internal_little2_encoding_ns = { { VTABLE, 2, 0, 1 }, { #include "iasciitab.h" #include "latin1tab.h" }, STANDARD_VTABLE(little2_) }; #endif static const struct normal_encoding internal_little2_encoding = { { VTABLE, 2, 0, 1 }, { #define BT_COLON BT_NMSTRT #include "iasciitab.h" #undef BT_COLON #include "latin1tab.h" }, STANDARD_VTABLE(little2_) }; #endif #define BIG2_BYTE_TYPE(enc, p) \ ((p)[0] == 0 \ ? ((struct normal_encoding *)(enc))->type[(unsigned char)(p)[1]] \ : unicode_byte_type((p)[0], (p)[1])) #define BIG2_BYTE_TO_ASCII(enc, p) ((p)[0] == 0 ? 
(p)[1] : -1) #define BIG2_CHAR_MATCHES(enc, p, c) ((p)[0] == 0 && (p)[1] == c) #define BIG2_IS_NAME_CHAR_MINBPC(enc, p) \ UCS2_GET_NAMING(namePages, (unsigned char)p[0], (unsigned char)p[1]) #define BIG2_IS_NMSTRT_CHAR_MINBPC(enc, p) \ UCS2_GET_NAMING(nmstrtPages, (unsigned char)p[0], (unsigned char)p[1]) #ifdef XML_MIN_SIZE static int big2_byteType(const ENCODING *enc, const char *p) { return BIG2_BYTE_TYPE(enc, p); } static int big2_byteToAscii(const ENCODING *enc, const char *p) { return BIG2_BYTE_TO_ASCII(enc, p); } static int big2_charMatches(const ENCODING *enc, const char *p, int c) { return BIG2_CHAR_MATCHES(enc, p, c); } static int big2_isNameMin(const ENCODING *enc, const char *p) { return BIG2_IS_NAME_CHAR_MINBPC(enc, p); } static int big2_isNmstrtMin(const ENCODING *enc, const char *p) { return BIG2_IS_NMSTRT_CHAR_MINBPC(enc, p); } #undef VTABLE #define VTABLE VTABLE1, big2_toUtf8, big2_toUtf16 #else /* not XML_MIN_SIZE */ #undef PREFIX #define PREFIX(ident) big2_ ## ident #define MINBPC(enc) 2 /* CHAR_MATCHES is guaranteed to have MINBPC bytes available. 
*/ #define BYTE_TYPE(enc, p) BIG2_BYTE_TYPE(enc, p) #define BYTE_TO_ASCII(enc, p) BIG2_BYTE_TO_ASCII(enc, p) #define CHAR_MATCHES(enc, p, c) BIG2_CHAR_MATCHES(enc, p, c) #define IS_NAME_CHAR(enc, p, n) 0 #define IS_NAME_CHAR_MINBPC(enc, p) BIG2_IS_NAME_CHAR_MINBPC(enc, p) #define IS_NMSTRT_CHAR(enc, p, n) (0) #define IS_NMSTRT_CHAR_MINBPC(enc, p) BIG2_IS_NMSTRT_CHAR_MINBPC(enc, p) #include "xmltok_impl.c" #undef MINBPC #undef BYTE_TYPE #undef BYTE_TO_ASCII #undef CHAR_MATCHES #undef IS_NAME_CHAR #undef IS_NAME_CHAR_MINBPC #undef IS_NMSTRT_CHAR #undef IS_NMSTRT_CHAR_MINBPC #undef IS_INVALID_CHAR #endif /* not XML_MIN_SIZE */ #ifdef XML_NS static const struct normal_encoding big2_encoding_ns = { { VTABLE, 2, 0, #if XML_BYTE_ORDER == 21 1 #else 0 #endif }, { #include "asciitab.h" #include "latin1tab.h" }, STANDARD_VTABLE(big2_) }; #endif static const struct normal_encoding big2_encoding = { { VTABLE, 2, 0, #if XML_BYTE_ORDER == 21 1 #else 0 #endif }, { #define BT_COLON BT_NMSTRT #include "asciitab.h" #undef BT_COLON #include "latin1tab.h" }, STANDARD_VTABLE(big2_) }; #if XML_BYTE_ORDER != 12 #ifdef XML_NS static const struct normal_encoding internal_big2_encoding_ns = { { VTABLE, 2, 0, 1 }, { #include "iasciitab.h" #include "latin1tab.h" }, STANDARD_VTABLE(big2_) }; #endif static const struct normal_encoding internal_big2_encoding = { { VTABLE, 2, 0, 1 }, { #define BT_COLON BT_NMSTRT #include "iasciitab.h" #undef BT_COLON #include "latin1tab.h" }, STANDARD_VTABLE(big2_) }; #endif #undef PREFIX static int streqci(const char *s1, const char *s2) { for (;;) { char c1 = *s1++; char c2 = *s2++; if (ASCII_a <= c1 && c1 <= ASCII_z) c1 += ASCII_A - ASCII_a; if (ASCII_a <= c2 && c2 <= ASCII_z) c2 += ASCII_A - ASCII_a; if (c1 != c2) return 0; if (!c1) break; } return 1; } static void initUpdatePosition(const ENCODING *enc, const char *ptr, const char *end, POSITION *pos) { normal_updatePosition(&utf8_encoding.enc, ptr, end, pos); } static int toAscii(const ENCODING *enc, const 
char *ptr, const char *end) { char buf[1]; char *p = buf; XmlUtf8Convert(enc, &ptr, end, &p, p + 1); if (p == buf) return -1; else return buf[0]; } static int isSpace(int c) { switch (c) { case 0x20: case 0xD: case 0xA: case 0x9: return 1; } return 0; } /* Return 1 if there's just optional white space or there's an S followed by name=val. */ static int parsePseudoAttribute(const ENCODING *enc, const char *ptr, const char *end, const char **namePtr, const char **nameEndPtr, const char **valPtr, const char **nextTokPtr) { int c; char open; if (ptr == end) { *namePtr = 0; return 1; } if (!isSpace(toAscii(enc, ptr, end))) { *nextTokPtr = ptr; return 0; } do { ptr += enc->minBytesPerChar; } while (isSpace(toAscii(enc, ptr, end))); if (ptr == end) { *namePtr = 0; return 1; } *namePtr = ptr; for (;;) { c = toAscii(enc, ptr, end); if (c == -1) { *nextTokPtr = ptr; return 0; } if (c == ASCII_EQUALS) { *nameEndPtr = ptr; break; } if (isSpace(c)) { *nameEndPtr = ptr; do { ptr += enc->minBytesPerChar; } while (isSpace(c = toAscii(enc, ptr, end))); if (c != ASCII_EQUALS) { *nextTokPtr = ptr; return 0; } break; } ptr += enc->minBytesPerChar; } if (ptr == *namePtr) { *nextTokPtr = ptr; return 0; } ptr += enc->minBytesPerChar; c = toAscii(enc, ptr, end); while (isSpace(c)) { ptr += enc->minBytesPerChar; c = toAscii(enc, ptr, end); } if (c != ASCII_QUOT && c != ASCII_APOS) { *nextTokPtr = ptr; return 0; } open = c; ptr += enc->minBytesPerChar; *valPtr = ptr; for (;; ptr += enc->minBytesPerChar) { c = toAscii(enc, ptr, end); if (c == open) break; if (!(ASCII_a <= c && c <= ASCII_z) && !(ASCII_A <= c && c <= ASCII_Z) && !(ASCII_0 <= c && c <= ASCII_9) && c != ASCII_PERIOD && c != ASCII_MINUS && c != ASCII_UNDERSCORE) { *nextTokPtr = ptr; return 0; } } *nextTokPtr = ptr + enc->minBytesPerChar; return 1; } static const char KW_version[] = { ASCII_v, ASCII_e, ASCII_r, ASCII_s, ASCII_i, ASCII_o, ASCII_n, '\0' }; static const char KW_encoding[] = { ASCII_e, ASCII_n, ASCII_c, ASCII_o, 
ASCII_d, ASCII_i, ASCII_n, ASCII_g, '\0' }; static const char KW_standalone[] = { ASCII_s, ASCII_t, ASCII_a, ASCII_n, ASCII_d, ASCII_a, ASCII_l, ASCII_o, ASCII_n, ASCII_e, '\0' }; static const char KW_yes[] = { ASCII_y, ASCII_e, ASCII_s, '\0' }; static const char KW_no[] = { ASCII_n, ASCII_o, '\0' }; static int doParseXmlDecl(const ENCODING *(*encodingFinder)(const ENCODING *, const char *, const char *), int isGeneralTextEntity, const ENCODING *enc, const char *ptr, const char *end, const char **badPtr, const char **versionPtr, const char **encodingName, const ENCODING **encoding, int *standalone) { const char *val = 0; const char *name = 0; const char *nameEnd = 0; ptr += 5 * enc->minBytesPerChar; end -= 2 * enc->minBytesPerChar; if (!parsePseudoAttribute(enc, ptr, end, &name, &nameEnd, &val, &ptr) || !name) { *badPtr = ptr; return 0; } if (!XmlNameMatchesAscii(enc, name, nameEnd, KW_version)) { if (!isGeneralTextEntity) { *badPtr = name; return 0; } } else { if (versionPtr) *versionPtr = val; if (!parsePseudoAttribute(enc, ptr, end, &name, &nameEnd, &val, &ptr)) { *badPtr = ptr; return 0; } if (!name) { if (isGeneralTextEntity) { /* a TextDecl must have an EncodingDecl */ *badPtr = ptr; return 0; } return 1; } } if (XmlNameMatchesAscii(enc, name, nameEnd, KW_encoding)) { int c = toAscii(enc, val, end); if (!(ASCII_a <= c && c <= ASCII_z) && !(ASCII_A <= c && c <= ASCII_Z)) { *badPtr = val; return 0; } if (encodingName) *encodingName = val; if (encoding) *encoding = encodingFinder(enc, val, ptr - enc->minBytesPerChar); if (!parsePseudoAttribute(enc, ptr, end, &name, &nameEnd, &val, &ptr)) { *badPtr = ptr; return 0; } if (!name) return 1; } if (!XmlNameMatchesAscii(enc, name, nameEnd, KW_standalone) || isGeneralTextEntity) { *badPtr = name; return 0; } if (XmlNameMatchesAscii(enc, val, ptr - enc->minBytesPerChar, KW_yes)) { if (standalone) *standalone = 1; } else if (XmlNameMatchesAscii(enc, val, ptr - enc->minBytesPerChar, KW_no)) { if (standalone) *standalone = 
0; } else { *badPtr = val; return 0; } while (isSpace(toAscii(enc, ptr, end))) ptr += enc->minBytesPerChar; if (ptr != end) { *badPtr = ptr; return 0; } return 1; } static int checkCharRefNumber(int result) { switch (result >> 8) { case 0xD8: case 0xD9: case 0xDA: case 0xDB: case 0xDC: case 0xDD: case 0xDE: case 0xDF: return -1; case 0: if (latin1_encoding.type[result] == BT_NONXML) return -1; break; case 0xFF: if (result == 0xFFFE || result == 0xFFFF) return -1; break; } return result; } int XmlUtf8Encode(int c, char *buf) { enum { /* minN is minimum legal resulting value for N byte sequence */ min2 = 0x80, min3 = 0x800, min4 = 0x10000 }; if (c < 0) return 0; if (c < min2) { buf[0] = (c | UTF8_cval1); return 1; } if (c < min3) { buf[0] = ((c >> 6) | UTF8_cval2); buf[1] = ((c & 0x3f) | 0x80); return 2; } if (c < min4) { buf[0] = ((c >> 12) | UTF8_cval3); buf[1] = (((c >> 6) & 0x3f) | 0x80); buf[2] = ((c & 0x3f) | 0x80); return 3; } if (c < 0x110000) { buf[0] = ((c >> 18) | UTF8_cval4); buf[1] = (((c >> 12) & 0x3f) | 0x80); buf[2] = (((c >> 6) & 0x3f) | 0x80); buf[3] = ((c & 0x3f) | 0x80); return 4; } return 0; } int XmlUtf16Encode(int charNum, unsigned short *buf) { if (charNum < 0) return 0; if (charNum < 0x10000) { buf[0] = charNum; return 1; } if (charNum < 0x110000) { charNum -= 0x10000; buf[0] = (charNum >> 10) + 0xD800; buf[1] = (charNum & 0x3FF) + 0xDC00; return 2; } return 0; } struct unknown_encoding { struct normal_encoding normal; int (*convert)(void *userData, const char *p); void *userData; unsigned short utf16[256]; char utf8[256][4]; }; int XmlSizeOfUnknownEncoding(void) { return sizeof(struct unknown_encoding); } static int unknown_isName(const ENCODING *enc, const char *p) { int c = ((const struct unknown_encoding *)enc) ->convert(((const struct unknown_encoding *)enc)->userData, p); if (c & ~0xFFFF) return 0; return UCS2_GET_NAMING(namePages, c >> 8, c & 0xFF); } static int unknown_isNmstrt(const ENCODING *enc, const char *p) { int c = ((const 
struct unknown_encoding *)enc) ->convert(((const struct unknown_encoding *)enc)->userData, p); if (c & ~0xFFFF) return 0; return UCS2_GET_NAMING(nmstrtPages, c >> 8, c & 0xFF); } static int unknown_isInvalid(const ENCODING *enc, const char *p) { int c = ((const struct unknown_encoding *)enc) ->convert(((const struct unknown_encoding *)enc)->userData, p); return (c & ~0xFFFF) || checkCharRefNumber(c) < 0; } static void unknown_toUtf8(const ENCODING *enc, const char **fromP, const char *fromLim, char **toP, const char *toLim) { char buf[XML_UTF8_ENCODE_MAX]; for (;;) { const char *utf8; int n; if (*fromP == fromLim) break; utf8 = ((const struct unknown_encoding *)enc)->utf8[(unsigned char)**fromP]; n = *utf8++; if (n == 0) { int c = ((const struct unknown_encoding *)enc) ->convert(((const struct unknown_encoding *)enc)->userData, *fromP); n = XmlUtf8Encode(c, buf); if (n > toLim - *toP) break; utf8 = buf; *fromP += ((const struct normal_encoding *)enc)->type[(unsigned char)**fromP] - (BT_LEAD2 - 2); } else { if (n > toLim - *toP) break; (*fromP)++; } do { *(*toP)++ = *utf8++; } while (--n != 0); } } static void unknown_toUtf16(const ENCODING *enc, const char **fromP, const char *fromLim, unsigned short **toP, const unsigned short *toLim) { while (*fromP != fromLim && *toP != toLim) { unsigned short c = ((const struct unknown_encoding *)enc)->utf16[(unsigned char)**fromP]; if (c == 0) { c = (unsigned short)((const struct unknown_encoding *)enc) ->convert(((const struct unknown_encoding *)enc)->userData, *fromP); *fromP += ((const struct normal_encoding *)enc)->type[(unsigned char)**fromP] - (BT_LEAD2 - 2); } else (*fromP)++; *(*toP)++ = c; } } ENCODING * XmlInitUnknownEncoding(void *mem, int *table, int (*convert)(void *userData, const char *p), void *userData) { int i; struct unknown_encoding *e = mem; for (i = 0; i < (int)sizeof(struct normal_encoding); i++) ((char *)mem)[i] = ((char *)&latin1_encoding)[i]; for (i = 0; i < 128; i++) if (latin1_encoding.type[i] != 
BT_OTHER && latin1_encoding.type[i] != BT_NONXML && table[i] != i) return 0; for (i = 0; i < 256; i++) { int c = table[i]; if (c == -1) { e->normal.type[i] = BT_MALFORM; /* This shouldn't really get used. */ e->utf16[i] = 0xFFFF; e->utf8[i][0] = 1; e->utf8[i][1] = 0; } else if (c < 0) { if (c < -4) return 0; e->normal.type[i] = BT_LEAD2 - (c + 2); e->utf8[i][0] = 0; e->utf16[i] = 0; } else if (c < 0x80) { if (latin1_encoding.type[c] != BT_OTHER && latin1_encoding.type[c] != BT_NONXML && c != i) return 0; e->normal.type[i] = latin1_encoding.type[c]; e->utf8[i][0] = 1; e->utf8[i][1] = (char)c; e->utf16[i] = c == 0 ? 0xFFFF : c; } else if (checkCharRefNumber(c) < 0) { e->normal.type[i] = BT_NONXML; /* This shouldn't really get used. */ e->utf16[i] = 0xFFFF; e->utf8[i][0] = 1; e->utf8[i][1] = 0; } else { if (c > 0xFFFF) return 0; if (UCS2_GET_NAMING(nmstrtPages, c >> 8, c & 0xff)) e->normal.type[i] = BT_NMSTRT; else if (UCS2_GET_NAMING(namePages, c >> 8, c & 0xff)) e->normal.type[i] = BT_NAME; else e->normal.type[i] = BT_OTHER; e->utf8[i][0] = (char)XmlUtf8Encode(c, e->utf8[i] + 1); e->utf16[i] = c; } } e->userData = userData; e->convert = convert; if (convert) { e->normal.isName2 = unknown_isName; e->normal.isName3 = unknown_isName; e->normal.isName4 = unknown_isName; e->normal.isNmstrt2 = unknown_isNmstrt; e->normal.isNmstrt3 = unknown_isNmstrt; e->normal.isNmstrt4 = unknown_isNmstrt; e->normal.isInvalid2 = unknown_isInvalid; e->normal.isInvalid3 = unknown_isInvalid; e->normal.isInvalid4 = unknown_isInvalid; } e->normal.enc.utf8Convert = unknown_toUtf8; e->normal.enc.utf16Convert = unknown_toUtf16; return &(e->normal.enc); } /* If this enumeration is changed, getEncodingIndex and encodings must also be changed. 
*/ enum { UNKNOWN_ENC = -1, ISO_8859_1_ENC = 0, US_ASCII_ENC, UTF_8_ENC, UTF_16_ENC, UTF_16BE_ENC, UTF_16LE_ENC, /* must match encodingNames up to here */ NO_ENC }; static const char KW_ISO_8859_1[] = { ASCII_I, ASCII_S, ASCII_O, ASCII_MINUS, ASCII_8, ASCII_8, ASCII_5, ASCII_9, ASCII_MINUS, ASCII_1, '\0' }; static const char KW_US_ASCII[] = { ASCII_U, ASCII_S, ASCII_MINUS, ASCII_A, ASCII_S, ASCII_C, ASCII_I, ASCII_I, '\0' }; static const char KW_UTF_8[] = { ASCII_U, ASCII_T, ASCII_F, ASCII_MINUS, ASCII_8, '\0' }; static const char KW_UTF_16[] = { ASCII_U, ASCII_T, ASCII_F, ASCII_MINUS, ASCII_1, ASCII_6, '\0' }; static const char KW_UTF_16BE[] = { ASCII_U, ASCII_T, ASCII_F, ASCII_MINUS, ASCII_1, ASCII_6, ASCII_B, ASCII_E, '\0' }; static const char KW_UTF_16LE[] = { ASCII_U, ASCII_T, ASCII_F, ASCII_MINUS, ASCII_1, ASCII_6, ASCII_L, ASCII_E, '\0' }; static int getEncodingIndex(const char *name) { static const char *encodingNames[] = { KW_ISO_8859_1, KW_US_ASCII, KW_UTF_8, KW_UTF_16, KW_UTF_16BE, KW_UTF_16LE, }; int i; if (name == 0) return NO_ENC; for (i = 0; i < (int)(sizeof(encodingNames)/sizeof(encodingNames[0])); i++) if (streqci(name, encodingNames[i])) return i; return UNKNOWN_ENC; } /* For binary compatibility, we store the index of the encoding specified at initialization in the isUtf16 member. */ #define INIT_ENC_INDEX(enc) ((int)(enc)->initEnc.isUtf16) #define SET_INIT_ENC_INDEX(enc, i) ((enc)->initEnc.isUtf16 = (char)i) /* This is what detects the encoding. encodingTable maps from encoding indices to encodings; INIT_ENC_INDEX(enc) is the index of the external (protocol) specified encoding; state is XML_CONTENT_STATE if we're parsing an external text entity, and XML_PROLOG_STATE otherwise. 
*/ static int initScan(const ENCODING **encodingTable, const INIT_ENCODING *enc, int state, const char *ptr, const char *end, const char **nextTokPtr) { const ENCODING **encPtr; if (ptr == end) return XML_TOK_NONE; encPtr = enc->encPtr; if (ptr + 1 == end) { /* only a single byte available for auto-detection */ #ifndef XML_DTD /* FIXME */ /* a well-formed document entity must have more than one byte */ if (state != XML_CONTENT_STATE) return XML_TOK_PARTIAL; #endif /* so we're parsing an external text entity... */ /* if UTF-16 was externally specified, then we need at least 2 bytes */ switch (INIT_ENC_INDEX(enc)) { case UTF_16_ENC: case UTF_16LE_ENC: case UTF_16BE_ENC: return XML_TOK_PARTIAL; } switch ((unsigned char)*ptr) { case 0xFE: case 0xFF: case 0xEF: /* possibly first byte of UTF-8 BOM */ if (INIT_ENC_INDEX(enc) == ISO_8859_1_ENC && state == XML_CONTENT_STATE) break; /* fall through */ case 0x00: case 0x3C: return XML_TOK_PARTIAL; } } else { switch (((unsigned char)ptr[0] << 8) | (unsigned char)ptr[1]) { case 0xFEFF: if (INIT_ENC_INDEX(enc) == ISO_8859_1_ENC && state == XML_CONTENT_STATE) break; *nextTokPtr = ptr + 2; *encPtr = encodingTable[UTF_16BE_ENC]; return XML_TOK_BOM; /* 00 3C is handled in the default case */ case 0x3C00: if ((INIT_ENC_INDEX(enc) == UTF_16BE_ENC || INIT_ENC_INDEX(enc) == UTF_16_ENC) && state == XML_CONTENT_STATE) break; *encPtr = encodingTable[UTF_16LE_ENC]; return XmlTok(*encPtr, state, ptr, end, nextTokPtr); case 0xFFFE: if (INIT_ENC_INDEX(enc) == ISO_8859_1_ENC && state == XML_CONTENT_STATE) break; *nextTokPtr = ptr + 2; *encPtr = encodingTable[UTF_16LE_ENC]; return XML_TOK_BOM; case 0xEFBB: /* Maybe a UTF-8 BOM (EF BB BF) */ /* If there's an explicitly specified (external) encoding of ISO-8859-1 or some flavour of UTF-16 and this is an external text entity, don't look for the BOM, because it might be legal data.
*/ if (state == XML_CONTENT_STATE) { int e = INIT_ENC_INDEX(enc); if (e == ISO_8859_1_ENC || e == UTF_16BE_ENC || e == UTF_16LE_ENC || e == UTF_16_ENC) break; } if (ptr + 2 == end) return XML_TOK_PARTIAL; if ((unsigned char)ptr[2] == 0xBF) { /* skip the 3-byte BOM, as the UTF-16 BOM cases skip 2 bytes */ *nextTokPtr = ptr + 3; *encPtr = encodingTable[UTF_8_ENC]; return XML_TOK_BOM; } break; default: if (ptr[0] == '\0') { /* 0 isn't a legal data character. Furthermore a document entity can only start with ASCII characters. So the only way this can fail to be big-endian UTF-16 is if it's an external parsed general entity that's labelled as UTF-16LE. */ if (state == XML_CONTENT_STATE && INIT_ENC_INDEX(enc) == UTF_16LE_ENC) break; *encPtr = encodingTable[UTF_16BE_ENC]; return XmlTok(*encPtr, state, ptr, end, nextTokPtr); } else if (ptr[1] == '\0') { /* We could recover here in the case: - parsing an external entity - second byte is 0 - no externally specified encoding - no encoding declaration by assuming UTF-16LE. But we don't, because this would mean when presented just with a single byte, we couldn't reliably determine whether we needed further bytes.
*/ if (state == XML_CONTENT_STATE) break; *encPtr = encodingTable[UTF_16LE_ENC]; return XmlTok(*encPtr, state, ptr, end, nextTokPtr); } break; } } *encPtr = encodingTable[INIT_ENC_INDEX(enc)]; return XmlTok(*encPtr, state, ptr, end, nextTokPtr); } #define NS(x) x #define ns(x) x #include "xmltok_ns.c" #undef NS #undef ns #ifdef XML_NS #define NS(x) x ## NS #define ns(x) x ## _ns #include "xmltok_ns.c" #undef NS #undef ns ENCODING * XmlInitUnknownEncodingNS(void *mem, int *table, int (*convert)(void *userData, const char *p), void *userData) { ENCODING *enc = XmlInitUnknownEncoding(mem, table, convert, userData); if (enc) ((struct normal_encoding *)enc)->type[ASCII_COLON] = BT_COLON; return enc; } #endif /* XML_NS */ swish-e-2.4.7/src/expat/xmlparse/0000777000077100017500000000000011166013172013653 500000000000000swish-e-2.4.7/src/expat/xmlparse/xmlparse.dsp0000775000077100017500000001531611166010106016137 00000000000000# Microsoft Developer Studio Project File - Name="xmlparse" - Package Owner=<4> # Microsoft Developer Studio Generated Build File, Format Version 6.00 # ** DO NOT EDIT ** # TARGTYPE "Win32 (x86) Dynamic-Link Library" 0x0102 CFG=xmlparse - Win32 Release !MESSAGE This is not a valid makefile. To build this project using NMAKE, !MESSAGE use the Export Makefile command and run !MESSAGE !MESSAGE NMAKE /f "xmlparse.mak". !MESSAGE !MESSAGE You can specify a configuration when running NMAKE !MESSAGE by defining the macro CFG on the command line. 
For example: !MESSAGE !MESSAGE NMAKE /f "xmlparse.mak" CFG="xmlparse - Win32 Release" !MESSAGE !MESSAGE Possible choices for configuration are: !MESSAGE !MESSAGE "xmlparse - Win32 Release" (based on "Win32 (x86) Dynamic-Link Library") !MESSAGE "xmlparse - Win32 Debug" (based on "Win32 (x86) Dynamic-Link Library") !MESSAGE "xmlparse - Win32 MinSize" (based on "Win32 (x86) Dynamic-Link Library") !MESSAGE # Begin Project # PROP AllowPerConfigDependencies 0 # PROP Scc_ProjName "" # PROP Scc_LocalPath "" CPP=cl.exe MTL=midl.exe RSC=rc.exe !IF "$(CFG)" == "xmlparse - Win32 Release" # PROP BASE Use_MFC 0 # PROP BASE Use_Debug_Libraries 0 # PROP BASE Output_Dir ".\Release" # PROP BASE Intermediate_Dir ".\Release" # PROP BASE Target_Dir "." # PROP Use_MFC 0 # PROP Use_Debug_Libraries 0 # PROP Output_Dir ".\Release" # PROP Intermediate_Dir ".\Release" # PROP Ignore_Export_Lib 0 # PROP Target_Dir "." # ADD BASE CPP /nologo /MT /W3 /GX /O2 /D "WIN32" /D "NDEBUG" /D "_WINDOWS" /YX /c # ADD CPP /nologo /W3 /GX /O2 /I "..\xmltok" /I "..\xmlwf" /D XMLTOKAPI=__declspec(dllimport) /D XMLPARSEAPI=__declspec(dllexport) /D "NDEBUG" /D "XML_NS" /D "WIN32" /D "_WINDOWS" /D "XML_DTD" /YX /FD /c # ADD BASE MTL /nologo /D "NDEBUG" /win32 # ADD MTL /nologo /D "NDEBUG" /mktyplib203 /win32 # ADD BASE RSC /l 0x809 /d "NDEBUG" # ADD RSC /l 0x809 /d "NDEBUG" BSC32=bscmake.exe # ADD BASE BSC32 /nologo # ADD BSC32 /nologo LINK32=link.exe # ADD BASE LINK32 kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib uuid.lib odbc32.lib odbccp32.lib /nologo /subsystem:windows /dll /machine:I386 # ADD LINK32 kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib uuid.lib odbc32.lib odbccp32.lib /nologo /base:"0x20000000" /subsystem:windows /dll /machine:I386 /out:"..\bin\xmlparse.dll" /link50compat # SUBTRACT LINK32 /pdb:none !ELSEIF "$(CFG)" == "xmlparse - Win32 Debug" # PROP BASE Use_MFC 0 # PROP BASE 
Use_Debug_Libraries 1 # PROP BASE Output_Dir ".\Debug" # PROP BASE Intermediate_Dir ".\Debug" # PROP BASE Target_Dir "." # PROP Use_MFC 0 # PROP Use_Debug_Libraries 1 # PROP Output_Dir ".\Debug" # PROP Intermediate_Dir ".\Debug" # PROP Ignore_Export_Lib 0 # PROP Target_Dir "." # ADD BASE CPP /nologo /MTd /W3 /Gm /GX /Zi /Od /D "WIN32" /D "_DEBUG" /D "_WINDOWS" /YX /c # ADD CPP /nologo /MDd /W3 /Gm /GX /ZI /Od /I "..\xmltok" /I "..\xmlwf" /D "_DEBUG" /D XMLTOKAPI=__declspec(dllimport) /D XMLPARSEAPI=__declspec(dllexport) /D "WIN32" /D "_WINDOWS" /D "XML_DTD" /YX /FD /c # ADD BASE MTL /nologo /D "_DEBUG" /win32 # ADD MTL /nologo /D "_DEBUG" /mktyplib203 /win32 # ADD BASE RSC /l 0x809 /d "_DEBUG" # ADD RSC /l 0x809 /d "_DEBUG" BSC32=bscmake.exe # ADD BASE BSC32 /nologo # ADD BSC32 /nologo LINK32=link.exe # ADD BASE LINK32 kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib uuid.lib odbc32.lib odbccp32.lib /nologo /subsystem:windows /dll /debug /machine:I386 # ADD LINK32 kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib uuid.lib odbc32.lib odbccp32.lib /nologo /base:"0x20000000" /subsystem:windows /dll /debug /machine:I386 /out:"..\dbgbin\xmlparse.dll" !ELSEIF "$(CFG)" == "xmlparse - Win32 MinSize" # PROP BASE Use_MFC 0 # PROP BASE Use_Debug_Libraries 0 # PROP BASE Output_Dir "MinSize" # PROP BASE Intermediate_Dir "MinSize" # PROP BASE Ignore_Export_Lib 0 # PROP BASE Target_Dir "" # PROP Use_MFC 0 # PROP Use_Debug_Libraries 0 # PROP Output_Dir "MinSize" # PROP Intermediate_Dir "MinSize" # PROP Ignore_Export_Lib 0 # PROP Target_Dir "" # ADD BASE CPP /nologo /MD /W3 /GX /O2 /I "..\xmltok" /I "..\xmlwf" /D XMLTOKAPI=__declspec(dllimport) /D XMLPARSEAPI=__declspec(dllexport) /D "NDEBUG" /D "WIN32" /D "_WINDOWS" /D "XML_NS" /YX /FD /c # ADD CPP /nologo /W3 /GX /O1 /I "..\xmltok" /I "..\xmlwf" /D "XML_MIN_SIZE" /D "XML_WINLIB" /D XMLPARSEAPI=__declspec(dllexport) 
/D "NDEBUG" /D "WIN32" /D "_WINDOWS" /YX /FD /c # ADD BASE MTL /nologo /D "NDEBUG" /mktyplib203 /win32 # ADD MTL /nologo /D "NDEBUG" /mktyplib203 /win32 # ADD BASE RSC /l 0x809 /d "NDEBUG" # ADD RSC /l 0x809 /d "NDEBUG" BSC32=bscmake.exe # ADD BASE BSC32 /nologo # ADD BSC32 /nologo LINK32=link.exe # ADD BASE LINK32 kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib uuid.lib odbc32.lib odbccp32.lib /nologo /base:"0x20000000" /subsystem:windows /dll /machine:I386 /out:"..\bin\xmlparse.dll" # SUBTRACT BASE LINK32 /profile # ADD LINK32 kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib uuid.lib odbc32.lib odbccp32.lib /nologo /base:"0x20000000" /entry:"DllMain" /subsystem:windows /dll /machine:I386 /out:"..\bin\xmlparse.dll" # SUBTRACT LINK32 /profile /nodefaultlib !ENDIF # Begin Target # Name "xmlparse - Win32 Release" # Name "xmlparse - Win32 Debug" # Name "xmlparse - Win32 MinSize" # Begin Group "Source Files" # PROP Default_Filter "cpp;c;cxx;rc;def;r;odl;idl;hpj;bat;for;f90" # Begin Source File SOURCE=..\xmltok\dllmain.c !IF "$(CFG)" == "xmlparse - Win32 Release" # PROP Exclude_From_Build 1 !ELSEIF "$(CFG)" == "xmlparse - Win32 Debug" # PROP Exclude_From_Build 1 !ELSEIF "$(CFG)" == "xmlparse - Win32 MinSize" !ENDIF # End Source File # Begin Source File SOURCE=.\xmlparse.c # End Source File # Begin Source File SOURCE=..\xmltok\xmlrole.c !IF "$(CFG)" == "xmlparse - Win32 Release" # PROP Exclude_From_Build 1 !ELSEIF "$(CFG)" == "xmlparse - Win32 Debug" # PROP Exclude_From_Build 1 !ELSEIF "$(CFG)" == "xmlparse - Win32 MinSize" !ENDIF # End Source File # Begin Source File SOURCE=..\xmltok\xmltok.c !IF "$(CFG)" == "xmlparse - Win32 Release" # PROP Exclude_From_Build 1 !ELSEIF "$(CFG)" == "xmlparse - Win32 Debug" # PROP Exclude_From_Build 1 !ELSEIF "$(CFG)" == "xmlparse - Win32 MinSize" !ENDIF # End Source File # End Group # Begin Group "Header Files" # PROP 
Default_Filter "h;hpp;hxx;hm;inl;fi;fd" # Begin Source File SOURCE=.\xmlparse.h # End Source File # End Group # Begin Group "Resource Files" # PROP Default_Filter "ico;cur;bmp;dlg;rc2;rct;bin;cnt;rtf;gif;jpg;jpeg;jpe" # End Group # End Target # End Project swish-e-2.4.7/src/expat/xmlparse/xmlparse.c0000775000077100017500000032221711166010106015574 00000000000000/* Copyright (c) 1998, 1999, 2000 Thai Open Source Software Center Ltd See the file copying.txt for copying permission. */ #include "xmldef.h" #include "xmlparse.h" #include #ifdef XML_UNICODE #define XML_ENCODE_MAX XML_UTF16_ENCODE_MAX #define XmlConvert XmlUtf16Convert #define XmlGetInternalEncoding XmlGetUtf16InternalEncoding #define XmlGetInternalEncodingNS XmlGetUtf16InternalEncodingNS #define XmlEncode XmlUtf16Encode #define MUST_CONVERT(enc, s) (!(enc)->isUtf16 || (((unsigned long)s) & 1)) typedef unsigned short ICHAR; #else #define XML_ENCODE_MAX XML_UTF8_ENCODE_MAX #define XmlConvert XmlUtf8Convert #define XmlGetInternalEncoding XmlGetUtf8InternalEncoding #define XmlGetInternalEncodingNS XmlGetUtf8InternalEncodingNS #define XmlEncode XmlUtf8Encode #define MUST_CONVERT(enc, s) (!(enc)->isUtf8) typedef char ICHAR; #endif #ifndef XML_NS #define XmlInitEncodingNS XmlInitEncoding #define XmlInitUnknownEncodingNS XmlInitUnknownEncoding #undef XmlGetInternalEncodingNS #define XmlGetInternalEncodingNS XmlGetInternalEncoding #define XmlParseXmlDeclNS XmlParseXmlDecl #endif #ifdef XML_UNICODE_WCHAR_T #define XML_T(x) L ## x #else #define XML_T(x) x #endif /* Round up n to be a multiple of sz, where sz is a power of 2. 
*/ #define ROUND_UP(n, sz) (((n) + ((sz) - 1)) & ~((sz) - 1)) #include "xmltok.h" #include "xmlrole.h" typedef const XML_Char *KEY; typedef struct { KEY name; } NAMED; typedef struct { NAMED **v; size_t size; size_t used; size_t usedLim; } HASH_TABLE; typedef struct { NAMED **p; NAMED **end; } HASH_TABLE_ITER; #define INIT_TAG_BUF_SIZE 32 /* must be a multiple of sizeof(XML_Char) */ #define INIT_DATA_BUF_SIZE 1024 #define INIT_ATTS_SIZE 16 #define INIT_BLOCK_SIZE 1024 #define INIT_BUFFER_SIZE 1024 #define EXPAND_SPARE 24 typedef struct binding { struct prefix *prefix; struct binding *nextTagBinding; struct binding *prevPrefixBinding; const struct attribute_id *attId; XML_Char *uri; int uriLen; int uriAlloc; } BINDING; typedef struct prefix { const XML_Char *name; BINDING *binding; } PREFIX; typedef struct { const XML_Char *str; const XML_Char *localPart; int uriLen; } TAG_NAME; typedef struct tag { struct tag *parent; const char *rawName; int rawNameLength; TAG_NAME name; char *buf; char *bufEnd; BINDING *bindings; } TAG; typedef struct { const XML_Char *name; const XML_Char *textPtr; int textLen; const XML_Char *systemId; const XML_Char *base; const XML_Char *publicId; const XML_Char *notation; char open; } ENTITY; typedef struct block { struct block *next; int size; XML_Char s[1]; } BLOCK; typedef struct { BLOCK *blocks; BLOCK *freeBlocks; const XML_Char *end; XML_Char *ptr; XML_Char *start; } STRING_POOL; /* The XML_Char before the name is used to determine whether an attribute has been specified. 
*/ typedef struct attribute_id { XML_Char *name; PREFIX *prefix; char maybeTokenized; char xmlns; } ATTRIBUTE_ID; typedef struct { const ATTRIBUTE_ID *id; char isCdata; const XML_Char *value; } DEFAULT_ATTRIBUTE; typedef struct { const XML_Char *name; PREFIX *prefix; const ATTRIBUTE_ID *idAtt; int nDefaultAtts; int allocDefaultAtts; DEFAULT_ATTRIBUTE *defaultAtts; } ELEMENT_TYPE; typedef struct { HASH_TABLE generalEntities; HASH_TABLE elementTypes; HASH_TABLE attributeIds; HASH_TABLE prefixes; STRING_POOL pool; int complete; int standalone; #ifdef XML_DTD HASH_TABLE paramEntities; #endif /* XML_DTD */ PREFIX defaultPrefix; } DTD; typedef struct open_internal_entity { const char *internalEventPtr; const char *internalEventEndPtr; struct open_internal_entity *next; ENTITY *entity; } OPEN_INTERNAL_ENTITY; typedef enum XML_Error Processor(XML_Parser parser, const char *start, const char *end, const char **endPtr); static Processor prologProcessor; static Processor prologInitProcessor; static Processor contentProcessor; static Processor cdataSectionProcessor; #ifdef XML_DTD static Processor ignoreSectionProcessor; #endif /* XML_DTD */ static Processor epilogProcessor; static Processor errorProcessor; static Processor externalEntityInitProcessor; static Processor externalEntityInitProcessor2; static Processor externalEntityInitProcessor3; static Processor externalEntityContentProcessor; static enum XML_Error handleUnknownEncoding(XML_Parser parser, const XML_Char *encodingName); static enum XML_Error processXmlDecl(XML_Parser parser, int isGeneralTextEntity, const char *, const char *); static enum XML_Error initializeEncoding(XML_Parser parser); static enum XML_Error doProlog(XML_Parser parser, const ENCODING *enc, const char *s, const char *end, int tok, const char *next, const char **nextPtr); #ifdef XML_DTD static enum XML_Error processInternalParamEntity(XML_Parser parser, ENTITY *entity); #endif static enum XML_Error doContent(XML_Parser parser, int startTagLevel, 
const ENCODING *enc, const char *start, const char *end, const char **endPtr); static enum XML_Error doCdataSection(XML_Parser parser, const ENCODING *, const char **startPtr, const char *end, const char **nextPtr); #ifdef XML_DTD static enum XML_Error doIgnoreSection(XML_Parser parser, const ENCODING *, const char **startPtr, const char *end, const char **nextPtr); #endif /* XML_DTD */ static enum XML_Error storeAtts(XML_Parser parser, const ENCODING *, const char *s, TAG_NAME *tagNamePtr, BINDING **bindingsPtr); static int addBinding(XML_Parser parser, PREFIX *prefix, const ATTRIBUTE_ID *attId, const XML_Char *uri, BINDING **bindingsPtr); static int defineAttribute(ELEMENT_TYPE *type, ATTRIBUTE_ID *, int isCdata, int isId, const XML_Char *dfltValue); static enum XML_Error storeAttributeValue(XML_Parser parser, const ENCODING *, int isCdata, const char *, const char *, STRING_POOL *); static enum XML_Error appendAttributeValue(XML_Parser parser, const ENCODING *, int isCdata, const char *, const char *, STRING_POOL *); static ATTRIBUTE_ID * getAttributeId(XML_Parser parser, const ENCODING *enc, const char *start, const char *end); static int setElementTypePrefix(XML_Parser parser, ELEMENT_TYPE *); static enum XML_Error storeEntityValue(XML_Parser parser, const ENCODING *enc, const char *start, const char *end); static int reportProcessingInstruction(XML_Parser parser, const ENCODING *enc, const char *start, const char *end); static int reportComment(XML_Parser parser, const ENCODING *enc, const char *start, const char *end); static void reportDefault(XML_Parser parser, const ENCODING *enc, const char *start, const char *end); static const XML_Char *getContext(XML_Parser parser); static int setContext(XML_Parser parser, const XML_Char *context); static void normalizePublicId(XML_Char *s); static int dtdInit(DTD *); static void dtdDestroy(DTD *); static int dtdCopy(DTD *newDtd, const DTD *oldDtd); static int copyEntityTable(HASH_TABLE *, STRING_POOL *, const 
HASH_TABLE *); #ifdef XML_DTD static void dtdSwap(DTD *, DTD *); #endif /* XML_DTD */ static NAMED *lookup(HASH_TABLE *table, KEY name, size_t createSize); static void hashTableInit(HASH_TABLE *); static void hashTableDestroy(HASH_TABLE *); static void hashTableIterInit(HASH_TABLE_ITER *, const HASH_TABLE *); static NAMED *hashTableIterNext(HASH_TABLE_ITER *); static void poolInit(STRING_POOL *); static void poolClear(STRING_POOL *); static void poolDestroy(STRING_POOL *); static XML_Char *poolAppend(STRING_POOL *pool, const ENCODING *enc, const char *ptr, const char *end); static XML_Char *poolStoreString(STRING_POOL *pool, const ENCODING *enc, const char *ptr, const char *end); static int poolGrow(STRING_POOL *pool); static const XML_Char *poolCopyString(STRING_POOL *pool, const XML_Char *s); static const XML_Char *poolCopyStringN(STRING_POOL *pool, const XML_Char *s, int n); #define poolStart(pool) ((pool)->start) #define poolEnd(pool) ((pool)->ptr) #define poolLength(pool) ((pool)->ptr - (pool)->start) #define poolChop(pool) ((void)--(pool->ptr)) #define poolLastChar(pool) (((pool)->ptr)[-1]) #define poolDiscard(pool) ((pool)->ptr = (pool)->start) #define poolFinish(pool) ((pool)->start = (pool)->ptr) #define poolAppendChar(pool, c) \ (((pool)->ptr == (pool)->end && !poolGrow(pool)) \ ? 0 \ : ((*((pool)->ptr)++ = c), 1)) typedef struct { /* The first member must be userData so that the XML_GetUserData macro works. 
*/ void *m_userData; void *m_handlerArg; char *m_buffer; /* first character to be parsed */ const char *m_bufferPtr; /* past last character to be parsed */ char *m_bufferEnd; /* allocated end of buffer */ const char *m_bufferLim; long m_parseEndByteIndex; const char *m_parseEndPtr; XML_Char *m_dataBuf; XML_Char *m_dataBufEnd; XML_StartElementHandler m_startElementHandler; XML_EndElementHandler m_endElementHandler; XML_CharacterDataHandler m_characterDataHandler; XML_ProcessingInstructionHandler m_processingInstructionHandler; XML_CommentHandler m_commentHandler; XML_StartCdataSectionHandler m_startCdataSectionHandler; XML_EndCdataSectionHandler m_endCdataSectionHandler; XML_DefaultHandler m_defaultHandler; XML_StartDoctypeDeclHandler m_startDoctypeDeclHandler; XML_EndDoctypeDeclHandler m_endDoctypeDeclHandler; XML_UnparsedEntityDeclHandler m_unparsedEntityDeclHandler; XML_NotationDeclHandler m_notationDeclHandler; XML_ExternalParsedEntityDeclHandler m_externalParsedEntityDeclHandler; XML_InternalParsedEntityDeclHandler m_internalParsedEntityDeclHandler; XML_StartNamespaceDeclHandler m_startNamespaceDeclHandler; XML_EndNamespaceDeclHandler m_endNamespaceDeclHandler; XML_NotStandaloneHandler m_notStandaloneHandler; XML_ExternalEntityRefHandler m_externalEntityRefHandler; void *m_externalEntityRefHandlerArg; XML_UnknownEncodingHandler m_unknownEncodingHandler; const ENCODING *m_encoding; INIT_ENCODING m_initEncoding; const ENCODING *m_internalEncoding; const XML_Char *m_protocolEncodingName; int m_ns; void *m_unknownEncodingMem; void *m_unknownEncodingData; void *m_unknownEncodingHandlerData; void (*m_unknownEncodingRelease)(void *); PROLOG_STATE m_prologState; Processor *m_processor; enum XML_Error m_errorCode; const char *m_eventPtr; const char *m_eventEndPtr; const char *m_positionPtr; OPEN_INTERNAL_ENTITY *m_openInternalEntities; int m_defaultExpandInternalEntities; int m_tagLevel; ENTITY *m_declEntity; const XML_Char *m_declNotationName; const XML_Char 
*m_declNotationPublicId; ELEMENT_TYPE *m_declElementType; ATTRIBUTE_ID *m_declAttributeId; char m_declAttributeIsCdata; char m_declAttributeIsId; DTD m_dtd; const XML_Char *m_curBase; TAG *m_tagStack; TAG *m_freeTagList; BINDING *m_inheritedBindings; BINDING *m_freeBindingList; int m_attsSize; int m_nSpecifiedAtts; int m_idAttIndex; ATTRIBUTE *m_atts; POSITION m_position; STRING_POOL m_tempPool; STRING_POOL m_temp2Pool; char *m_groupConnector; unsigned m_groupSize; int m_hadExternalDoctype; XML_Char m_namespaceSeparator; #ifdef XML_DTD enum XML_ParamEntityParsing m_paramEntityParsing; XML_Parser m_parentParser; #endif } Parser; #define userData (((Parser *)parser)->m_userData) #define handlerArg (((Parser *)parser)->m_handlerArg) #define startElementHandler (((Parser *)parser)->m_startElementHandler) #define endElementHandler (((Parser *)parser)->m_endElementHandler) #define characterDataHandler (((Parser *)parser)->m_characterDataHandler) #define processingInstructionHandler (((Parser *)parser)->m_processingInstructionHandler) #define commentHandler (((Parser *)parser)->m_commentHandler) #define startCdataSectionHandler (((Parser *)parser)->m_startCdataSectionHandler) #define endCdataSectionHandler (((Parser *)parser)->m_endCdataSectionHandler) #define defaultHandler (((Parser *)parser)->m_defaultHandler) #define startDoctypeDeclHandler (((Parser *)parser)->m_startDoctypeDeclHandler) #define endDoctypeDeclHandler (((Parser *)parser)->m_endDoctypeDeclHandler) #define unparsedEntityDeclHandler (((Parser *)parser)->m_unparsedEntityDeclHandler) #define notationDeclHandler (((Parser *)parser)->m_notationDeclHandler) #define externalParsedEntityDeclHandler (((Parser *)parser)->m_externalParsedEntityDeclHandler) #define internalParsedEntityDeclHandler (((Parser *)parser)->m_internalParsedEntityDeclHandler) #define startNamespaceDeclHandler (((Parser *)parser)->m_startNamespaceDeclHandler) #define endNamespaceDeclHandler (((Parser *)parser)->m_endNamespaceDeclHandler) 
#define notStandaloneHandler (((Parser *)parser)->m_notStandaloneHandler) #define externalEntityRefHandler (((Parser *)parser)->m_externalEntityRefHandler) #define externalEntityRefHandlerArg (((Parser *)parser)->m_externalEntityRefHandlerArg) #define unknownEncodingHandler (((Parser *)parser)->m_unknownEncodingHandler) #define encoding (((Parser *)parser)->m_encoding) #define initEncoding (((Parser *)parser)->m_initEncoding) #define internalEncoding (((Parser *)parser)->m_internalEncoding) #define unknownEncodingMem (((Parser *)parser)->m_unknownEncodingMem) #define unknownEncodingData (((Parser *)parser)->m_unknownEncodingData) #define unknownEncodingHandlerData \ (((Parser *)parser)->m_unknownEncodingHandlerData) #define unknownEncodingRelease (((Parser *)parser)->m_unknownEncodingRelease) #define protocolEncodingName (((Parser *)parser)->m_protocolEncodingName) #define ns (((Parser *)parser)->m_ns) #define prologState (((Parser *)parser)->m_prologState) #define processor (((Parser *)parser)->m_processor) #define errorCode (((Parser *)parser)->m_errorCode) #define eventPtr (((Parser *)parser)->m_eventPtr) #define eventEndPtr (((Parser *)parser)->m_eventEndPtr) #define positionPtr (((Parser *)parser)->m_positionPtr) #define position (((Parser *)parser)->m_position) #define openInternalEntities (((Parser *)parser)->m_openInternalEntities) #define defaultExpandInternalEntities (((Parser *)parser)->m_defaultExpandInternalEntities) #define tagLevel (((Parser *)parser)->m_tagLevel) #define buffer (((Parser *)parser)->m_buffer) #define bufferPtr (((Parser *)parser)->m_bufferPtr) #define bufferEnd (((Parser *)parser)->m_bufferEnd) #define parseEndByteIndex (((Parser *)parser)->m_parseEndByteIndex) #define parseEndPtr (((Parser *)parser)->m_parseEndPtr) #define bufferLim (((Parser *)parser)->m_bufferLim) #define dataBuf (((Parser *)parser)->m_dataBuf) #define dataBufEnd (((Parser *)parser)->m_dataBufEnd) #define dtd (((Parser *)parser)->m_dtd) #define curBase (((Parser 
*)parser)->m_curBase) #define declEntity (((Parser *)parser)->m_declEntity) #define declNotationName (((Parser *)parser)->m_declNotationName) #define declNotationPublicId (((Parser *)parser)->m_declNotationPublicId) #define declElementType (((Parser *)parser)->m_declElementType) #define declAttributeId (((Parser *)parser)->m_declAttributeId) #define declAttributeIsCdata (((Parser *)parser)->m_declAttributeIsCdata) #define declAttributeIsId (((Parser *)parser)->m_declAttributeIsId) #define freeTagList (((Parser *)parser)->m_freeTagList) #define freeBindingList (((Parser *)parser)->m_freeBindingList) #define inheritedBindings (((Parser *)parser)->m_inheritedBindings) #define tagStack (((Parser *)parser)->m_tagStack) #define atts (((Parser *)parser)->m_atts) #define attsSize (((Parser *)parser)->m_attsSize) #define nSpecifiedAtts (((Parser *)parser)->m_nSpecifiedAtts) #define idAttIndex (((Parser *)parser)->m_idAttIndex) #define tempPool (((Parser *)parser)->m_tempPool) #define temp2Pool (((Parser *)parser)->m_temp2Pool) #define groupConnector (((Parser *)parser)->m_groupConnector) #define groupSize (((Parser *)parser)->m_groupSize) #define hadExternalDoctype (((Parser *)parser)->m_hadExternalDoctype) #define namespaceSeparator (((Parser *)parser)->m_namespaceSeparator) #ifdef XML_DTD #define parentParser (((Parser *)parser)->m_parentParser) #define paramEntityParsing (((Parser *)parser)->m_paramEntityParsing) #endif /* XML_DTD */ #ifdef _MSC_VER #ifdef _DEBUG Parser *asParser(XML_Parser parser) { return parser; } #endif #endif XML_Parser XML_ParserCreate(const XML_Char *encodingName) { XML_Parser parser = malloc(sizeof(Parser)); if (!parser) return parser; processor = prologInitProcessor; XmlPrologStateInit(&prologState); userData = 0; handlerArg = 0; startElementHandler = 0; endElementHandler = 0; characterDataHandler = 0; processingInstructionHandler = 0; commentHandler = 0; startCdataSectionHandler = 0; endCdataSectionHandler = 0; defaultHandler = 0; 
startDoctypeDeclHandler = 0; endDoctypeDeclHandler = 0; unparsedEntityDeclHandler = 0; notationDeclHandler = 0; externalParsedEntityDeclHandler = 0; internalParsedEntityDeclHandler = 0; startNamespaceDeclHandler = 0; endNamespaceDeclHandler = 0; notStandaloneHandler = 0; externalEntityRefHandler = 0; externalEntityRefHandlerArg = parser; unknownEncodingHandler = 0; buffer = 0; bufferPtr = 0; bufferEnd = 0; parseEndByteIndex = 0; parseEndPtr = 0; bufferLim = 0; declElementType = 0; declAttributeId = 0; declEntity = 0; declNotationName = 0; declNotationPublicId = 0; memset(&position, 0, sizeof(POSITION)); errorCode = XML_ERROR_NONE; eventPtr = 0; eventEndPtr = 0; positionPtr = 0; openInternalEntities = 0; tagLevel = 0; tagStack = 0; freeTagList = 0; freeBindingList = 0; inheritedBindings = 0; attsSize = INIT_ATTS_SIZE; atts = malloc(attsSize * sizeof(ATTRIBUTE)); nSpecifiedAtts = 0; dataBuf = malloc(INIT_DATA_BUF_SIZE * sizeof(XML_Char)); groupSize = 0; groupConnector = 0; hadExternalDoctype = 0; unknownEncodingMem = 0; unknownEncodingRelease = 0; unknownEncodingData = 0; unknownEncodingHandlerData = 0; namespaceSeparator = '!'; #ifdef XML_DTD parentParser = 0; paramEntityParsing = XML_PARAM_ENTITY_PARSING_NEVER; #endif ns = 0; poolInit(&tempPool); poolInit(&temp2Pool); protocolEncodingName = encodingName ? 
poolCopyString(&tempPool, encodingName) : 0; curBase = 0; if (!dtdInit(&dtd) || !atts || !dataBuf || (encodingName && !protocolEncodingName)) { XML_ParserFree(parser); return 0; } dataBufEnd = dataBuf + INIT_DATA_BUF_SIZE; XmlInitEncoding(&initEncoding, &encoding, 0); internalEncoding = XmlGetInternalEncoding(); return parser; } XML_Parser XML_ParserCreateNS(const XML_Char *encodingName, XML_Char nsSep) { static const XML_Char implicitContext[] = { XML_T('x'), XML_T('m'), XML_T('l'), XML_T('='), XML_T('h'), XML_T('t'), XML_T('t'), XML_T('p'), XML_T(':'), XML_T('/'), XML_T('/'), XML_T('w'), XML_T('w'), XML_T('w'), XML_T('.'), XML_T('w'), XML_T('3'), XML_T('.'), XML_T('o'), XML_T('r'), XML_T('g'), XML_T('/'), XML_T('X'), XML_T('M'), XML_T('L'), XML_T('/'), XML_T('1'), XML_T('9'), XML_T('9'), XML_T('8'), XML_T('/'), XML_T('n'), XML_T('a'), XML_T('m'), XML_T('e'), XML_T('s'), XML_T('p'), XML_T('a'), XML_T('c'), XML_T('e'), XML_T('\0') }; XML_Parser parser = XML_ParserCreate(encodingName); /* setContext and XML_ParserFree dereference parser, so stop here if creation failed */ if (!parser) return 0; XmlInitEncodingNS(&initEncoding, &encoding, 0); ns = 1; internalEncoding = XmlGetInternalEncodingNS(); namespaceSeparator = nsSep; if (!setContext(parser, implicitContext)) { XML_ParserFree(parser); return 0; } return parser; } int XML_SetEncoding(XML_Parser parser, const XML_Char *encodingName) { if (!encodingName) protocolEncodingName = 0; else { protocolEncodingName = poolCopyString(&tempPool, encodingName); if (!protocolEncodingName) return 0; } return 1; } XML_Parser XML_ExternalEntityParserCreate(XML_Parser oldParser, const XML_Char *context, const XML_Char *encodingName) { XML_Parser parser = oldParser; DTD *oldDtd = &dtd; XML_StartElementHandler oldStartElementHandler = startElementHandler; XML_EndElementHandler oldEndElementHandler = endElementHandler; XML_CharacterDataHandler oldCharacterDataHandler = characterDataHandler; XML_ProcessingInstructionHandler oldProcessingInstructionHandler = processingInstructionHandler; XML_CommentHandler oldCommentHandler = 
commentHandler; XML_StartCdataSectionHandler oldStartCdataSectionHandler = startCdataSectionHandler; XML_EndCdataSectionHandler oldEndCdataSectionHandler = endCdataSectionHandler; XML_DefaultHandler oldDefaultHandler = defaultHandler; XML_UnparsedEntityDeclHandler oldUnparsedEntityDeclHandler = unparsedEntityDeclHandler; XML_NotationDeclHandler oldNotationDeclHandler = notationDeclHandler; XML_ExternalParsedEntityDeclHandler oldExternalParsedEntityDeclHandler = externalParsedEntityDeclHandler; XML_InternalParsedEntityDeclHandler oldInternalParsedEntityDeclHandler = internalParsedEntityDeclHandler; XML_StartNamespaceDeclHandler oldStartNamespaceDeclHandler = startNamespaceDeclHandler; XML_EndNamespaceDeclHandler oldEndNamespaceDeclHandler = endNamespaceDeclHandler; XML_NotStandaloneHandler oldNotStandaloneHandler = notStandaloneHandler; XML_ExternalEntityRefHandler oldExternalEntityRefHandler = externalEntityRefHandler; XML_UnknownEncodingHandler oldUnknownEncodingHandler = unknownEncodingHandler; void *oldUserData = userData; void *oldHandlerArg = handlerArg; int oldDefaultExpandInternalEntities = defaultExpandInternalEntities; void *oldExternalEntityRefHandlerArg = externalEntityRefHandlerArg; #ifdef XML_DTD int oldParamEntityParsing = paramEntityParsing; #endif parser = (ns ? 
XML_ParserCreateNS(encodingName, namespaceSeparator) : XML_ParserCreate(encodingName)); if (!parser) return 0; startElementHandler = oldStartElementHandler; endElementHandler = oldEndElementHandler; characterDataHandler = oldCharacterDataHandler; processingInstructionHandler = oldProcessingInstructionHandler; commentHandler = oldCommentHandler; startCdataSectionHandler = oldStartCdataSectionHandler; endCdataSectionHandler = oldEndCdataSectionHandler; defaultHandler = oldDefaultHandler; unparsedEntityDeclHandler = oldUnparsedEntityDeclHandler; notationDeclHandler = oldNotationDeclHandler; externalParsedEntityDeclHandler = oldExternalParsedEntityDeclHandler; internalParsedEntityDeclHandler = oldInternalParsedEntityDeclHandler; startNamespaceDeclHandler = oldStartNamespaceDeclHandler; endNamespaceDeclHandler = oldEndNamespaceDeclHandler; notStandaloneHandler = oldNotStandaloneHandler; externalEntityRefHandler = oldExternalEntityRefHandler; unknownEncodingHandler = oldUnknownEncodingHandler; userData = oldUserData; if (oldUserData == oldHandlerArg) handlerArg = userData; else handlerArg = parser; if (oldExternalEntityRefHandlerArg != oldParser) externalEntityRefHandlerArg = oldExternalEntityRefHandlerArg; defaultExpandInternalEntities = oldDefaultExpandInternalEntities; #ifdef XML_DTD paramEntityParsing = oldParamEntityParsing; if (context) { #endif /* XML_DTD */ if (!dtdCopy(&dtd, oldDtd) || !setContext(parser, context)) { XML_ParserFree(parser); return 0; } processor = externalEntityInitProcessor; #ifdef XML_DTD } else { dtdSwap(&dtd, oldDtd); parentParser = oldParser; XmlPrologStateInitExternalEntity(&prologState); dtd.complete = 1; hadExternalDoctype = 1; } #endif /* XML_DTD */ return parser; } static void destroyBindings(BINDING *bindings) { for (;;) { BINDING *b = bindings; if (!b) break; bindings = b->nextTagBinding; free(b->uri); free(b); } } void XML_ParserFree(XML_Parser parser) { for (;;) { TAG *p; if (tagStack == 0) { if (freeTagList == 0) break; tagStack = 
freeTagList; freeTagList = 0; } p = tagStack; tagStack = tagStack->parent; free(p->buf); destroyBindings(p->bindings); free(p); } destroyBindings(freeBindingList); destroyBindings(inheritedBindings); poolDestroy(&tempPool); poolDestroy(&temp2Pool); #ifdef XML_DTD if (parentParser) { if (hadExternalDoctype) dtd.complete = 0; dtdSwap(&dtd, &((Parser *)parentParser)->m_dtd); } #endif /* XML_DTD */ dtdDestroy(&dtd); free((void *)atts); free(groupConnector); free(buffer); free(dataBuf); free(unknownEncodingMem); if (unknownEncodingRelease) unknownEncodingRelease(unknownEncodingData); free(parser); } void XML_UseParserAsHandlerArg(XML_Parser parser) { handlerArg = parser; } void XML_SetUserData(XML_Parser parser, void *p) { if (handlerArg == userData) handlerArg = userData = p; else userData = p; } int XML_SetBase(XML_Parser parser, const XML_Char *p) { if (p) { p = poolCopyString(&dtd.pool, p); if (!p) return 0; curBase = p; } else curBase = 0; return 1; } const XML_Char *XML_GetBase(XML_Parser parser) { return curBase; } int XML_GetSpecifiedAttributeCount(XML_Parser parser) { return nSpecifiedAtts; } int XML_GetIdAttributeIndex(XML_Parser parser) { return idAttIndex; } void XML_SetElementHandler(XML_Parser parser, XML_StartElementHandler start, XML_EndElementHandler end) { startElementHandler = start; endElementHandler = end; } void XML_SetCharacterDataHandler(XML_Parser parser, XML_CharacterDataHandler handler) { characterDataHandler = handler; } void XML_SetProcessingInstructionHandler(XML_Parser parser, XML_ProcessingInstructionHandler handler) { processingInstructionHandler = handler; } void XML_SetCommentHandler(XML_Parser parser, XML_CommentHandler handler) { commentHandler = handler; } void XML_SetCdataSectionHandler(XML_Parser parser, XML_StartCdataSectionHandler start, XML_EndCdataSectionHandler end) { startCdataSectionHandler = start; endCdataSectionHandler = end; } void XML_SetDefaultHandler(XML_Parser parser, XML_DefaultHandler handler) { defaultHandler = 
handler; defaultExpandInternalEntities = 0; } void XML_SetDefaultHandlerExpand(XML_Parser parser, XML_DefaultHandler handler) { defaultHandler = handler; defaultExpandInternalEntities = 1; } void XML_SetDoctypeDeclHandler(XML_Parser parser, XML_StartDoctypeDeclHandler start, XML_EndDoctypeDeclHandler end) { startDoctypeDeclHandler = start; endDoctypeDeclHandler = end; } void XML_SetUnparsedEntityDeclHandler(XML_Parser parser, XML_UnparsedEntityDeclHandler handler) { unparsedEntityDeclHandler = handler; } void XML_SetExternalParsedEntityDeclHandler(XML_Parser parser, XML_ExternalParsedEntityDeclHandler handler) { externalParsedEntityDeclHandler = handler; } void XML_SetInternalParsedEntityDeclHandler(XML_Parser parser, XML_InternalParsedEntityDeclHandler handler) { internalParsedEntityDeclHandler = handler; } void XML_SetNotationDeclHandler(XML_Parser parser, XML_NotationDeclHandler handler) { notationDeclHandler = handler; } void XML_SetNamespaceDeclHandler(XML_Parser parser, XML_StartNamespaceDeclHandler start, XML_EndNamespaceDeclHandler end) { startNamespaceDeclHandler = start; endNamespaceDeclHandler = end; } void XML_SetNotStandaloneHandler(XML_Parser parser, XML_NotStandaloneHandler handler) { notStandaloneHandler = handler; } void XML_SetExternalEntityRefHandler(XML_Parser parser, XML_ExternalEntityRefHandler handler) { externalEntityRefHandler = handler; } void XML_SetExternalEntityRefHandlerArg(XML_Parser parser, void *arg) { if (arg) externalEntityRefHandlerArg = arg; else externalEntityRefHandlerArg = parser; } void XML_SetUnknownEncodingHandler(XML_Parser parser, XML_UnknownEncodingHandler handler, void *data) { unknownEncodingHandler = handler; unknownEncodingHandlerData = data; } int XML_SetParamEntityParsing(XML_Parser parser, enum XML_ParamEntityParsing parsing) { #ifdef XML_DTD paramEntityParsing = parsing; return 1; #else return parsing == XML_PARAM_ENTITY_PARSING_NEVER; #endif } int XML_Parse(XML_Parser parser, const char *s, int len, int 
isFinal) { if (len == 0) { if (!isFinal) return 1; positionPtr = bufferPtr; errorCode = processor(parser, bufferPtr, parseEndPtr = bufferEnd, 0); if (errorCode == XML_ERROR_NONE) return 1; eventEndPtr = eventPtr; processor = errorProcessor; return 0; } else if (bufferPtr == bufferEnd) { const char *end; int nLeftOver; parseEndByteIndex += len; positionPtr = s; if (isFinal) { errorCode = processor(parser, s, parseEndPtr = s + len, 0); if (errorCode == XML_ERROR_NONE) return 1; eventEndPtr = eventPtr; processor = errorProcessor; return 0; } errorCode = processor(parser, s, parseEndPtr = s + len, &end); if (errorCode != XML_ERROR_NONE) { eventEndPtr = eventPtr; processor = errorProcessor; return 0; } XmlUpdatePosition(encoding, positionPtr, end, &position); nLeftOver = s + len - end; if (nLeftOver) { if (buffer == 0 || nLeftOver > bufferLim - buffer) { /* FIXME avoid integer overflow */ /* Allocate through a temporary so the old buffer is not leaked if realloc fails; it stays owned by the parser and is released by XML_ParserFree. */ char *newBuf = buffer == 0 ? malloc(len * 2) : realloc(buffer, len * 2); if (!newBuf) { errorCode = XML_ERROR_NO_MEMORY; eventPtr = eventEndPtr = 0; processor = errorProcessor; return 0; } buffer = newBuf; bufferLim = buffer + len * 2; } memcpy(buffer, end, nLeftOver); bufferPtr = buffer; bufferEnd = buffer + nLeftOver; } return 1; } else { /* XML_GetBuffer returns 0 and sets errorCode when allocation fails */ void *buff = XML_GetBuffer(parser, len); if (!buff) return 0; memcpy(buff, s, len); return XML_ParseBuffer(parser, len, isFinal); } } int XML_ParseBuffer(XML_Parser parser, int len, int isFinal) { const char *start = bufferPtr; positionPtr = start; bufferEnd += len; parseEndByteIndex += len; errorCode = processor(parser, start, parseEndPtr = bufferEnd, isFinal ? 
(const char **)0 : &bufferPtr); if (errorCode == XML_ERROR_NONE) { if (!isFinal) XmlUpdatePosition(encoding, positionPtr, bufferPtr, &position); return 1; } else { eventEndPtr = eventPtr; processor = errorProcessor; return 0; } } void *XML_GetBuffer(XML_Parser parser, int len) { if (len > bufferLim - bufferEnd) { /* FIXME avoid integer overflow */ int neededSize = len + (bufferEnd - bufferPtr); if (neededSize <= bufferLim - buffer) { memmove(buffer, bufferPtr, bufferEnd - bufferPtr); bufferEnd = buffer + (bufferEnd - bufferPtr); bufferPtr = buffer; } else { char *newBuf; int bufferSize = bufferLim - bufferPtr; if (bufferSize == 0) bufferSize = INIT_BUFFER_SIZE; do { bufferSize *= 2; } while (bufferSize < neededSize); newBuf = malloc(bufferSize); if (newBuf == 0) { errorCode = XML_ERROR_NO_MEMORY; return 0; } bufferLim = newBuf + bufferSize; if (bufferPtr) { memcpy(newBuf, bufferPtr, bufferEnd - bufferPtr); free(buffer); } bufferEnd = newBuf + (bufferEnd - bufferPtr); bufferPtr = buffer = newBuf; } } return bufferEnd; } enum XML_Error XML_GetErrorCode(XML_Parser parser) { return errorCode; } long XML_GetCurrentByteIndex(XML_Parser parser) { if (eventPtr) return parseEndByteIndex - (parseEndPtr - eventPtr); return -1; } int XML_GetCurrentByteCount(XML_Parser parser) { if (eventEndPtr && eventPtr) return eventEndPtr - eventPtr; return 0; } int XML_GetCurrentLineNumber(XML_Parser parser) { if (eventPtr) { XmlUpdatePosition(encoding, positionPtr, eventPtr, &position); positionPtr = eventPtr; } return position.lineNumber + 1; } int XML_GetCurrentColumnNumber(XML_Parser parser) { if (eventPtr) { XmlUpdatePosition(encoding, positionPtr, eventPtr, &position); positionPtr = eventPtr; } return position.columnNumber; } void XML_DefaultCurrent(XML_Parser parser) { if (defaultHandler) { if (openInternalEntities) reportDefault(parser, internalEncoding, openInternalEntities->internalEventPtr, openInternalEntities->internalEventEndPtr); else reportDefault(parser, encoding, 
eventPtr, eventEndPtr); } } const XML_LChar *XML_ErrorString(int code) { static const XML_LChar *message[] = { 0, XML_T("out of memory"), XML_T("syntax error"), XML_T("no element found"), XML_T("not well-formed"), XML_T("unclosed token"), XML_T("unclosed token"), XML_T("mismatched tag"), XML_T("duplicate attribute"), XML_T("junk after document element"), XML_T("illegal parameter entity reference"), XML_T("undefined entity"), XML_T("recursive entity reference"), XML_T("asynchronous entity"), XML_T("reference to invalid character number"), XML_T("reference to binary entity"), XML_T("reference to external entity in attribute"), XML_T("xml processing instruction not at start of external entity"), XML_T("unknown encoding"), XML_T("encoding specified in XML declaration is incorrect"), XML_T("unclosed CDATA section"), XML_T("error in processing external entity reference"), XML_T("document is not standalone") }; if (code > 0 && code < sizeof(message)/sizeof(message[0])) return message[code]; return 0; } static enum XML_Error contentProcessor(XML_Parser parser, const char *start, const char *end, const char **endPtr) { return doContent(parser, 0, encoding, start, end, endPtr); } static enum XML_Error externalEntityInitProcessor(XML_Parser parser, const char *start, const char *end, const char **endPtr) { enum XML_Error result = initializeEncoding(parser); if (result != XML_ERROR_NONE) return result; processor = externalEntityInitProcessor2; return externalEntityInitProcessor2(parser, start, end, endPtr); } static enum XML_Error externalEntityInitProcessor2(XML_Parser parser, const char *start, const char *end, const char **endPtr) { const char *next; int tok = XmlContentTok(encoding, start, end, &next); switch (tok) { case XML_TOK_BOM: start = next; break; case XML_TOK_PARTIAL: if (endPtr) { *endPtr = start; return XML_ERROR_NONE; } eventPtr = start; return XML_ERROR_UNCLOSED_TOKEN; case XML_TOK_PARTIAL_CHAR: if (endPtr) { *endPtr = start; return XML_ERROR_NONE; } eventPtr 
= start; return XML_ERROR_PARTIAL_CHAR; } processor = externalEntityInitProcessor3; return externalEntityInitProcessor3(parser, start, end, endPtr); } static enum XML_Error externalEntityInitProcessor3(XML_Parser parser, const char *start, const char *end, const char **endPtr) { const char *next; int tok = XmlContentTok(encoding, start, end, &next); switch (tok) { case XML_TOK_XML_DECL: { enum XML_Error result = processXmlDecl(parser, 1, start, next); if (result != XML_ERROR_NONE) return result; start = next; } break; case XML_TOK_PARTIAL: if (endPtr) { *endPtr = start; return XML_ERROR_NONE; } eventPtr = start; return XML_ERROR_UNCLOSED_TOKEN; case XML_TOK_PARTIAL_CHAR: if (endPtr) { *endPtr = start; return XML_ERROR_NONE; } eventPtr = start; return XML_ERROR_PARTIAL_CHAR; } processor = externalEntityContentProcessor; tagLevel = 1; return doContent(parser, 1, encoding, start, end, endPtr); } static enum XML_Error externalEntityContentProcessor(XML_Parser parser, const char *start, const char *end, const char **endPtr) { return doContent(parser, 1, encoding, start, end, endPtr); } static enum XML_Error doContent(XML_Parser parser, int startTagLevel, const ENCODING *enc, const char *s, const char *end, const char **nextPtr) { const char **eventPP; const char **eventEndPP; if (enc == encoding) { eventPP = &eventPtr; eventEndPP = &eventEndPtr; } else { eventPP = &(openInternalEntities->internalEventPtr); eventEndPP = &(openInternalEntities->internalEventEndPtr); } *eventPP = s; for (;;) { const char *next = s; /* XmlContentTok doesn't always set the last arg */ int tok = XmlContentTok(enc, s, end, &next); *eventEndPP = next; switch (tok) { case XML_TOK_TRAILING_CR: if (nextPtr) { *nextPtr = s; return XML_ERROR_NONE; } *eventEndPP = end; if (characterDataHandler) { XML_Char c = 0xA; characterDataHandler(handlerArg, &c, 1); } else if (defaultHandler) reportDefault(parser, enc, s, end); if (startTagLevel == 0) return XML_ERROR_NO_ELEMENTS; if (tagLevel != startTagLevel) 
return XML_ERROR_ASYNC_ENTITY; return XML_ERROR_NONE; case XML_TOK_NONE: if (nextPtr) { *nextPtr = s; return XML_ERROR_NONE; } if (startTagLevel > 0) { if (tagLevel != startTagLevel) return XML_ERROR_ASYNC_ENTITY; return XML_ERROR_NONE; } return XML_ERROR_NO_ELEMENTS; case XML_TOK_INVALID: *eventPP = next; return XML_ERROR_INVALID_TOKEN; case XML_TOK_PARTIAL: if (nextPtr) { *nextPtr = s; return XML_ERROR_NONE; } return XML_ERROR_UNCLOSED_TOKEN; case XML_TOK_PARTIAL_CHAR: if (nextPtr) { *nextPtr = s; return XML_ERROR_NONE; } return XML_ERROR_PARTIAL_CHAR; case XML_TOK_ENTITY_REF: { const XML_Char *name; ENTITY *entity; XML_Char ch = XmlPredefinedEntityName(enc, s + enc->minBytesPerChar, next - enc->minBytesPerChar); if (ch) { if (characterDataHandler) characterDataHandler(handlerArg, &ch, 1); else if (defaultHandler) reportDefault(parser, enc, s, next); break; } name = poolStoreString(&dtd.pool, enc, s + enc->minBytesPerChar, next - enc->minBytesPerChar); if (!name) return XML_ERROR_NO_MEMORY; entity = (ENTITY *)lookup(&dtd.generalEntities, name, 0); poolDiscard(&dtd.pool); if (!entity) { if (dtd.complete || dtd.standalone) return XML_ERROR_UNDEFINED_ENTITY; if (defaultHandler) reportDefault(parser, enc, s, next); break; } if (entity->open) return XML_ERROR_RECURSIVE_ENTITY_REF; if (entity->notation) return XML_ERROR_BINARY_ENTITY_REF; if (entity) { if (entity->textPtr) { enum XML_Error result; OPEN_INTERNAL_ENTITY openEntity; if (defaultHandler && !defaultExpandInternalEntities) { reportDefault(parser, enc, s, next); break; } entity->open = 1; openEntity.next = openInternalEntities; openInternalEntities = &openEntity; openEntity.entity = entity; openEntity.internalEventPtr = 0; openEntity.internalEventEndPtr = 0; result = doContent(parser, tagLevel, internalEncoding, (char *)entity->textPtr, (char *)(entity->textPtr + entity->textLen), 0); entity->open = 0; openInternalEntities = openEntity.next; if (result) return result; } else if (externalEntityRefHandler) { 
const XML_Char *context; entity->open = 1; context = getContext(parser); entity->open = 0; if (!context) return XML_ERROR_NO_MEMORY; if (!externalEntityRefHandler(externalEntityRefHandlerArg, context, entity->base, entity->systemId, entity->publicId)) return XML_ERROR_EXTERNAL_ENTITY_HANDLING; poolDiscard(&tempPool); } else if (defaultHandler) reportDefault(parser, enc, s, next); } break; } case XML_TOK_START_TAG_WITH_ATTS: if (!startElementHandler) { enum XML_Error result = storeAtts(parser, enc, s, 0, 0); if (result) return result; } /* fall through */ case XML_TOK_START_TAG_NO_ATTS: { TAG *tag; if (freeTagList) { tag = freeTagList; freeTagList = freeTagList->parent; } else { tag = malloc(sizeof(TAG)); if (!tag) return XML_ERROR_NO_MEMORY; tag->buf = malloc(INIT_TAG_BUF_SIZE); if (!tag->buf) return XML_ERROR_NO_MEMORY; tag->bufEnd = tag->buf + INIT_TAG_BUF_SIZE; } tag->bindings = 0; tag->parent = tagStack; tagStack = tag; tag->name.localPart = 0; tag->rawName = s + enc->minBytesPerChar; tag->rawNameLength = XmlNameLength(enc, tag->rawName); if (nextPtr) { /* Need to guarantee that: tag->buf + ROUND_UP(tag->rawNameLength, sizeof(XML_Char)) <= tag->bufEnd - sizeof(XML_Char) */ if (tag->rawNameLength + (int)(sizeof(XML_Char) - 1) + (int)sizeof(XML_Char) > tag->bufEnd - tag->buf) { int bufSize = tag->rawNameLength * 4; bufSize = ROUND_UP(bufSize, sizeof(XML_Char)); tag->buf = realloc(tag->buf, bufSize); if (!tag->buf) return XML_ERROR_NO_MEMORY; tag->bufEnd = tag->buf + bufSize; } memcpy(tag->buf, tag->rawName, tag->rawNameLength); tag->rawName = tag->buf; } ++tagLevel; if (startElementHandler) { enum XML_Error result; XML_Char *toPtr; for (;;) { const char *rawNameEnd = tag->rawName + tag->rawNameLength; const char *fromPtr = tag->rawName; int bufSize; if (nextPtr) toPtr = (XML_Char *)(tag->buf + ROUND_UP(tag->rawNameLength, sizeof(XML_Char))); else toPtr = (XML_Char *)tag->buf; tag->name.str = toPtr; XmlConvert(enc, &fromPtr, rawNameEnd, (ICHAR **)&toPtr, (ICHAR 
*)tag->bufEnd - 1); if (fromPtr == rawNameEnd) break; bufSize = (tag->bufEnd - tag->buf) << 1; tag->buf = realloc(tag->buf, bufSize); if (!tag->buf) return XML_ERROR_NO_MEMORY; tag->bufEnd = tag->buf + bufSize; if (nextPtr) tag->rawName = tag->buf; } *toPtr = XML_T('\0'); result = storeAtts(parser, enc, s, &(tag->name), &(tag->bindings)); if (result) return result; startElementHandler(handlerArg, tag->name.str, (const XML_Char **)atts); poolClear(&tempPool); } else { tag->name.str = 0; if (defaultHandler) reportDefault(parser, enc, s, next); } break; } case XML_TOK_EMPTY_ELEMENT_WITH_ATTS: if (!startElementHandler) { enum XML_Error result = storeAtts(parser, enc, s, 0, 0); if (result) return result; } /* fall through */ case XML_TOK_EMPTY_ELEMENT_NO_ATTS: if (startElementHandler || endElementHandler) { const char *rawName = s + enc->minBytesPerChar; enum XML_Error result; BINDING *bindings = 0; TAG_NAME name; name.str = poolStoreString(&tempPool, enc, rawName, rawName + XmlNameLength(enc, rawName)); if (!name.str) return XML_ERROR_NO_MEMORY; poolFinish(&tempPool); result = storeAtts(parser, enc, s, &name, &bindings); if (result) return result; poolFinish(&tempPool); if (startElementHandler) startElementHandler(handlerArg, name.str, (const XML_Char **)atts); if (endElementHandler) { if (startElementHandler) *eventPP = *eventEndPP; endElementHandler(handlerArg, name.str); } poolClear(&tempPool); while (bindings) { BINDING *b = bindings; if (endNamespaceDeclHandler) endNamespaceDeclHandler(handlerArg, b->prefix->name); bindings = bindings->nextTagBinding; b->nextTagBinding = freeBindingList; freeBindingList = b; b->prefix->binding = b->prevPrefixBinding; } } else if (defaultHandler) reportDefault(parser, enc, s, next); if (tagLevel == 0) return epilogProcessor(parser, next, end, nextPtr); break; case XML_TOK_END_TAG: if (tagLevel == startTagLevel) return XML_ERROR_ASYNC_ENTITY; else { int len; const char *rawName; TAG *tag = tagStack; tagStack = tag->parent; 
        tag->parent = freeTagList;
        freeTagList = tag;
        rawName = s + enc->minBytesPerChar*2;
        len = XmlNameLength(enc, rawName);
        if (len != tag->rawNameLength
            || memcmp(tag->rawName, rawName, len) != 0) {
          *eventPP = rawName;
          return XML_ERROR_TAG_MISMATCH;
        }
        --tagLevel;
        if (endElementHandler && tag->name.str) {
          if (tag->name.localPart) {
            XML_Char *to = (XML_Char *)tag->name.str + tag->name.uriLen;
            const XML_Char *from = tag->name.localPart;
            while ((*to++ = *from++) != 0)
              ;
          }
          endElementHandler(handlerArg, tag->name.str);
        }
        else if (defaultHandler)
          reportDefault(parser, enc, s, next);
        while (tag->bindings) {
          BINDING *b = tag->bindings;
          if (endNamespaceDeclHandler)
            endNamespaceDeclHandler(handlerArg, b->prefix->name);
          tag->bindings = tag->bindings->nextTagBinding;
          b->nextTagBinding = freeBindingList;
          freeBindingList = b;
          b->prefix->binding = b->prevPrefixBinding;
        }
        if (tagLevel == 0)
          return epilogProcessor(parser, next, end, nextPtr);
      }
      break;
    case XML_TOK_CHAR_REF:
      {
        int n = XmlCharRefNumber(enc, s);
        if (n < 0)
          return XML_ERROR_BAD_CHAR_REF;
        if (characterDataHandler) {
          XML_Char buf[XML_ENCODE_MAX];
          characterDataHandler(handlerArg, buf, XmlEncode(n, (ICHAR *)buf));
        }
        else if (defaultHandler)
          reportDefault(parser, enc, s, next);
      }
      break;
    case XML_TOK_XML_DECL:
      return XML_ERROR_MISPLACED_XML_PI;
    case XML_TOK_DATA_NEWLINE:
      if (characterDataHandler) {
        XML_Char c = 0xA;
        characterDataHandler(handlerArg, &c, 1);
      }
      else if (defaultHandler)
        reportDefault(parser, enc, s, next);
      break;
    case XML_TOK_CDATA_SECT_OPEN:
      {
        enum XML_Error result;
        if (startCdataSectionHandler)
          startCdataSectionHandler(handlerArg);
#if 0
        /* Suppose you are doing a transformation on a document that involves
           changing only the character data.  You set up a defaultHandler
           and a characterDataHandler.  The defaultHandler simply copies
           characters through.  The characterDataHandler does the
           transformation and writes the characters out escaping them as
           necessary.  This case will fail to work if we leave out the
           following two lines (because & and < inside CDATA sections will
           be incorrectly escaped).

           However, now we have a start/endCdataSectionHandler, so it seems
           easier to let the user deal with this. */
        else if (characterDataHandler)
          characterDataHandler(handlerArg, dataBuf, 0);
#endif
        else if (defaultHandler)
          reportDefault(parser, enc, s, next);
        result = doCdataSection(parser, enc, &next, end, nextPtr);
        if (!next) {
          processor = cdataSectionProcessor;
          return result;
        }
      }
      break;
    case XML_TOK_TRAILING_RSQB:
      if (nextPtr) {
        *nextPtr = s;
        return XML_ERROR_NONE;
      }
      if (characterDataHandler) {
        if (MUST_CONVERT(enc, s)) {
          ICHAR *dataPtr = (ICHAR *)dataBuf;
          XmlConvert(enc, &s, end, &dataPtr, (ICHAR *)dataBufEnd);
          characterDataHandler(handlerArg, dataBuf, dataPtr - (ICHAR *)dataBuf);
        }
        else
          characterDataHandler(handlerArg,
                               (XML_Char *)s,
                               (XML_Char *)end - (XML_Char *)s);
      }
      else if (defaultHandler)
        reportDefault(parser, enc, s, end);
      if (startTagLevel == 0) {
        *eventPP = end;
        return XML_ERROR_NO_ELEMENTS;
      }
      if (tagLevel != startTagLevel) {
        *eventPP = end;
        return XML_ERROR_ASYNC_ENTITY;
      }
      return XML_ERROR_NONE;
    case XML_TOK_DATA_CHARS:
      if (characterDataHandler) {
        if (MUST_CONVERT(enc, s)) {
          for (;;) {
            ICHAR *dataPtr = (ICHAR *)dataBuf;
            XmlConvert(enc, &s, next, &dataPtr, (ICHAR *)dataBufEnd);
            *eventEndPP = s;
            characterDataHandler(handlerArg, dataBuf, dataPtr - (ICHAR *)dataBuf);
            if (s == next)
              break;
            *eventPP = s;
          }
        }
        else
          characterDataHandler(handlerArg,
                               (XML_Char *)s,
                               (XML_Char *)next - (XML_Char *)s);
      }
      else if (defaultHandler)
        reportDefault(parser, enc, s, next);
      break;
    case XML_TOK_PI:
      if (!reportProcessingInstruction(parser, enc, s, next))
        return XML_ERROR_NO_MEMORY;
      break;
    case XML_TOK_COMMENT:
      if (!reportComment(parser, enc, s, next))
        return XML_ERROR_NO_MEMORY;
      break;
    default:
      if (defaultHandler)
        reportDefault(parser, enc, s, next);
      break;
    }
    *eventPP = s = next;
  }
  /* not reached */
}

/* If tagNamePtr is non-null, build a real list
of attributes, otherwise just check the attributes for well-formedness. */ static enum XML_Error storeAtts(XML_Parser parser, const ENCODING *enc, const char *attStr, TAG_NAME *tagNamePtr, BINDING **bindingsPtr) { ELEMENT_TYPE *elementType = 0; int nDefaultAtts = 0; const XML_Char **appAtts; /* the attribute list to pass to the application */ int attIndex = 0; int i; int n; int nPrefixes = 0; BINDING *binding; const XML_Char *localPart; /* lookup the element type name */ if (tagNamePtr) { elementType = (ELEMENT_TYPE *)lookup(&dtd.elementTypes, tagNamePtr->str, 0); if (!elementType) { tagNamePtr->str = poolCopyString(&dtd.pool, tagNamePtr->str); if (!tagNamePtr->str) return XML_ERROR_NO_MEMORY; elementType = (ELEMENT_TYPE *)lookup(&dtd.elementTypes, tagNamePtr->str, sizeof(ELEMENT_TYPE)); if (!elementType) return XML_ERROR_NO_MEMORY; if (ns && !setElementTypePrefix(parser, elementType)) return XML_ERROR_NO_MEMORY; } nDefaultAtts = elementType->nDefaultAtts; } /* get the attributes from the tokenizer */ n = XmlGetAttributes(enc, attStr, attsSize, atts); if (n + nDefaultAtts > attsSize) { int oldAttsSize = attsSize; attsSize = n + nDefaultAtts + INIT_ATTS_SIZE; atts = realloc((void *)atts, attsSize * sizeof(ATTRIBUTE)); if (!atts) return XML_ERROR_NO_MEMORY; if (n > oldAttsSize) XmlGetAttributes(enc, attStr, n, atts); } appAtts = (const XML_Char **)atts; for (i = 0; i < n; i++) { /* add the name and value to the attribute list */ ATTRIBUTE_ID *attId = getAttributeId(parser, enc, atts[i].name, atts[i].name + XmlNameLength(enc, atts[i].name)); if (!attId) return XML_ERROR_NO_MEMORY; /* detect duplicate attributes */ if ((attId->name)[-1]) { if (enc == encoding) eventPtr = atts[i].name; return XML_ERROR_DUPLICATE_ATTRIBUTE; } (attId->name)[-1] = 1; appAtts[attIndex++] = attId->name; if (!atts[i].normalized) { enum XML_Error result; int isCdata = 1; /* figure out whether declared as other than CDATA */ if (attId->maybeTokenized) { int j; for (j = 0; j < nDefaultAtts; j++) 
{ if (attId == elementType->defaultAtts[j].id) { isCdata = elementType->defaultAtts[j].isCdata; break; } } } /* normalize the attribute value */ result = storeAttributeValue(parser, enc, isCdata, atts[i].valuePtr, atts[i].valueEnd, &tempPool); if (result) return result; if (tagNamePtr) { appAtts[attIndex] = poolStart(&tempPool); poolFinish(&tempPool); } else poolDiscard(&tempPool); } else if (tagNamePtr) { /* the value did not need normalizing */ appAtts[attIndex] = poolStoreString(&tempPool, enc, atts[i].valuePtr, atts[i].valueEnd); if (appAtts[attIndex] == 0) return XML_ERROR_NO_MEMORY; poolFinish(&tempPool); } /* handle prefixed attribute names */ if (attId->prefix && tagNamePtr) { if (attId->xmlns) { /* deal with namespace declarations here */ if (!addBinding(parser, attId->prefix, attId, appAtts[attIndex], bindingsPtr)) return XML_ERROR_NO_MEMORY; --attIndex; } else { /* deal with other prefixed names later */ attIndex++; nPrefixes++; (attId->name)[-1] = 2; } } else attIndex++; } if (tagNamePtr) { int j; nSpecifiedAtts = attIndex; if (elementType->idAtt && (elementType->idAtt->name)[-1]) { for (i = 0; i < attIndex; i += 2) if (appAtts[i] == elementType->idAtt->name) { idAttIndex = i; break; } } else idAttIndex = -1; /* do attribute defaulting */ for (j = 0; j < nDefaultAtts; j++) { const DEFAULT_ATTRIBUTE *da = elementType->defaultAtts + j; if (!(da->id->name)[-1] && da->value) { if (da->id->prefix) { if (da->id->xmlns) { if (!addBinding(parser, da->id->prefix, da->id, da->value, bindingsPtr)) return XML_ERROR_NO_MEMORY; } else { (da->id->name)[-1] = 2; nPrefixes++; appAtts[attIndex++] = da->id->name; appAtts[attIndex++] = da->value; } } else { (da->id->name)[-1] = 1; appAtts[attIndex++] = da->id->name; appAtts[attIndex++] = da->value; } } } appAtts[attIndex] = 0; } i = 0; if (nPrefixes) { /* expand prefixed attribute names */ for (; i < attIndex; i += 2) { if (appAtts[i][-1] == 2) { ATTRIBUTE_ID *id; ((XML_Char *)(appAtts[i]))[-1] = 0; id = (ATTRIBUTE_ID 
*)lookup(&dtd.attributeIds, appAtts[i], 0); if (id->prefix->binding) { int j; const BINDING *b = id->prefix->binding; const XML_Char *s = appAtts[i]; for (j = 0; j < b->uriLen; j++) { if (!poolAppendChar(&tempPool, b->uri[j])) return XML_ERROR_NO_MEMORY; } while (*s++ != ':') ; do { if (!poolAppendChar(&tempPool, *s)) return XML_ERROR_NO_MEMORY; } while (*s++); appAtts[i] = poolStart(&tempPool); poolFinish(&tempPool); } if (!--nPrefixes) break; } else ((XML_Char *)(appAtts[i]))[-1] = 0; } } /* clear the flags that say whether attributes were specified */ for (; i < attIndex; i += 2) ((XML_Char *)(appAtts[i]))[-1] = 0; if (!tagNamePtr) return XML_ERROR_NONE; for (binding = *bindingsPtr; binding; binding = binding->nextTagBinding) binding->attId->name[-1] = 0; /* expand the element type name */ if (elementType->prefix) { binding = elementType->prefix->binding; if (!binding) return XML_ERROR_NONE; localPart = tagNamePtr->str; while (*localPart++ != XML_T(':')) ; } else if (dtd.defaultPrefix.binding) { binding = dtd.defaultPrefix.binding; localPart = tagNamePtr->str; } else return XML_ERROR_NONE; tagNamePtr->localPart = localPart; tagNamePtr->uriLen = binding->uriLen; for (i = 0; localPart[i++];) ; n = i + binding->uriLen; if (n > binding->uriAlloc) { TAG *p; XML_Char *uri = malloc((n + EXPAND_SPARE) * sizeof(XML_Char)); if (!uri) return XML_ERROR_NO_MEMORY; binding->uriAlloc = n + EXPAND_SPARE; memcpy(uri, binding->uri, binding->uriLen * sizeof(XML_Char)); for (p = tagStack; p; p = p->parent) if (p->name.str == binding->uri) p->name.str = uri; free(binding->uri); binding->uri = uri; } memcpy(binding->uri + binding->uriLen, localPart, i * sizeof(XML_Char)); tagNamePtr->str = binding->uri; return XML_ERROR_NONE; } static int addBinding(XML_Parser parser, PREFIX *prefix, const ATTRIBUTE_ID *attId, const XML_Char *uri, BINDING **bindingsPtr) { BINDING *b; int len; for (len = 0; uri[len]; len++) ; if (namespaceSeparator) len++; if (freeBindingList) { b = freeBindingList; 
    if (len > b->uriAlloc) {
      /* Keep b->uri intact if realloc fails: b is still on freeBindingList
         and would otherwise be reused later with a null uri. */
      XML_Char *temp = realloc(b->uri, sizeof(XML_Char) * (len + EXPAND_SPARE));
      if (!temp)
        return 0;
      b->uri = temp;
      b->uriAlloc = len + EXPAND_SPARE;
    }
    freeBindingList = b->nextTagBinding;
  }
  else {
    b = malloc(sizeof(BINDING));
    if (!b)
      return 0;
    b->uri = malloc(sizeof(XML_Char) * (len + EXPAND_SPARE));
    if (!b->uri) {
      free(b);
      return 0;
    }
    b->uriAlloc = len + EXPAND_SPARE;
  }
  b->uriLen = len;
  memcpy(b->uri, uri, len * sizeof(XML_Char));
  if (namespaceSeparator)
    b->uri[len - 1] = namespaceSeparator;
  b->prefix = prefix;
  b->attId = attId;
  b->prevPrefixBinding = prefix->binding;
  if (*uri == XML_T('\0') && prefix == &dtd.defaultPrefix)
    prefix->binding = 0;
  else
    prefix->binding = b;
  b->nextTagBinding = *bindingsPtr;
  *bindingsPtr = b;
  if (startNamespaceDeclHandler)
    startNamespaceDeclHandler(handlerArg, prefix->name,
                              prefix->binding ? uri : 0);
  return 1;
}

/* The idea here is to avoid using stack for each CDATA section when
the whole file is parsed with one call. */

static
enum XML_Error cdataSectionProcessor(XML_Parser parser,
                                     const char *start,
                                     const char *end,
                                     const char **endPtr)
{
  enum XML_Error result = doCdataSection(parser, encoding, &start, end, endPtr);
  if (start) {
    processor = contentProcessor;
    return contentProcessor(parser, start, end, endPtr);
  }
  return result;
}

/* startPtr gets set to non-null if the section is closed, and to null if
the section is not yet closed. */

static
enum XML_Error doCdataSection(XML_Parser parser,
                              const ENCODING *enc,
                              const char **startPtr,
                              const char *end,
                              const char **nextPtr)
{
  const char *s = *startPtr;
  const char **eventPP;
  const char **eventEndPP;
  if (enc == encoding) {
    eventPP = &eventPtr;
    *eventPP = s;
    eventEndPP = &eventEndPtr;
  }
  else {
    eventPP = &(openInternalEntities->internalEventPtr);
    eventEndPP = &(openInternalEntities->internalEventEndPtr);
  }
  *eventPP = s;
  *startPtr = 0;
  for (;;) {
    const char *next;
    int tok = XmlCdataSectionTok(enc, s, end, &next);
    *eventEndPP = next;
    switch (tok) {
    case XML_TOK_CDATA_SECT_CLOSE:
      if (endCdataSectionHandler)
        endCdataSectionHandler(handlerArg);
#if 0
      /* see comment under XML_TOK_CDATA_SECT_OPEN */
      else if (characterDataHandler)
        characterDataHandler(handlerArg, dataBuf, 0);
#endif
      else if (defaultHandler)
        reportDefault(parser, enc, s, next);
      *startPtr = next;
      return XML_ERROR_NONE;
    case XML_TOK_DATA_NEWLINE:
      if (characterDataHandler) {
        XML_Char c = 0xA;
        characterDataHandler(handlerArg, &c, 1);
      }
      else if (defaultHandler)
        reportDefault(parser, enc, s, next);
      break;
    case XML_TOK_DATA_CHARS:
      if (characterDataHandler) {
        if (MUST_CONVERT(enc, s)) {
          for (;;) {
            ICHAR *dataPtr = (ICHAR *)dataBuf;
            XmlConvert(enc, &s, next, &dataPtr, (ICHAR *)dataBufEnd);
            *eventEndPP = next;
            characterDataHandler(handlerArg, dataBuf, dataPtr - (ICHAR *)dataBuf);
            if (s == next)
              break;
            *eventPP = s;
          }
        }
        else
          characterDataHandler(handlerArg,
                               (XML_Char *)s,
                               (XML_Char *)next - (XML_Char *)s);
      }
      else if (defaultHandler)
        reportDefault(parser, enc, s, next);
      break;
    case XML_TOK_INVALID:
      *eventPP = next;
      return XML_ERROR_INVALID_TOKEN;
    case XML_TOK_PARTIAL_CHAR:
      if (nextPtr) {
        *nextPtr = s;
        return XML_ERROR_NONE;
      }
      return XML_ERROR_PARTIAL_CHAR;
    case XML_TOK_PARTIAL:
    case XML_TOK_NONE:
      if (nextPtr) {
        *nextPtr = s;
        return XML_ERROR_NONE;
      }
      return XML_ERROR_UNCLOSED_CDATA_SECTION;
    default:
      abort();
    }
    *eventPP = s = next;
  }
  /* not reached */
}

#ifdef XML_DTD

/* The idea here is to avoid using stack for each IGNORE section when
the whole file is parsed with one call. */

static
enum XML_Error ignoreSectionProcessor(XML_Parser parser,
                                      const char *start,
                                      const char *end,
                                      const char **endPtr)
{
  enum XML_Error result = doIgnoreSection(parser, encoding, &start, end, endPtr);
  if (start) {
    processor = prologProcessor;
    return prologProcessor(parser, start, end, endPtr);
  }
  return result;
}

/* startPtr gets set to non-null if the section is closed, and to null if
the section is not yet closed. */

static
enum XML_Error doIgnoreSection(XML_Parser parser,
                               const ENCODING *enc,
                               const char **startPtr,
                               const char *end,
                               const char **nextPtr)
{
  const char *next;
  int tok;
  const char *s = *startPtr;
  const char **eventPP;
  const char **eventEndPP;
  if (enc == encoding) {
    eventPP = &eventPtr;
    *eventPP = s;
    eventEndPP = &eventEndPtr;
  }
  else {
    eventPP = &(openInternalEntities->internalEventPtr);
    eventEndPP = &(openInternalEntities->internalEventEndPtr);
  }
  *eventPP = s;
  *startPtr = 0;
  tok = XmlIgnoreSectionTok(enc, s, end, &next);
  *eventEndPP = next;
  switch (tok) {
  case XML_TOK_IGNORE_SECT:
    if (defaultHandler)
      reportDefault(parser, enc, s, next);
    *startPtr = next;
    return XML_ERROR_NONE;
  case XML_TOK_INVALID:
    *eventPP = next;
    return XML_ERROR_INVALID_TOKEN;
  case XML_TOK_PARTIAL_CHAR:
    if (nextPtr) {
      *nextPtr = s;
      return XML_ERROR_NONE;
    }
    return XML_ERROR_PARTIAL_CHAR;
  case XML_TOK_PARTIAL:
  case XML_TOK_NONE:
    if (nextPtr) {
      *nextPtr = s;
      return XML_ERROR_NONE;
    }
    return XML_ERROR_SYNTAX; /* XML_ERROR_UNCLOSED_IGNORE_SECTION */
  default:
    abort();
  }
  /* not reached */
}

#endif /* XML_DTD */

static enum XML_Error
initializeEncoding(XML_Parser parser)
{
  const char *s;
#ifdef XML_UNICODE
  char encodingBuf[128];
  if (!protocolEncodingName)
    s = 0;
  else {
    int i;
    for (i = 0; protocolEncodingName[i]; i++) {
      if (i == sizeof(encodingBuf) - 1
          || (protocolEncodingName[i] & ~0x7f) != 0) {
        encodingBuf[0] = '\0';
        break;
      }
      encodingBuf[i] = (char)protocolEncodingName[i];
    }
    encodingBuf[i] = '\0';
    s = encodingBuf;
  }
#else
  s = protocolEncodingName;
#endif
  if ((ns ? XmlInitEncodingNS : XmlInitEncoding)(&initEncoding, &encoding, s))
    return XML_ERROR_NONE;
  return handleUnknownEncoding(parser, protocolEncodingName);
}

static enum XML_Error
processXmlDecl(XML_Parser parser, int isGeneralTextEntity,
               const char *s, const char *next)
{
  const char *encodingName = 0;
  const ENCODING *newEncoding = 0;
  const char *version;
  int standalone = -1;
  if (!(ns
        ? XmlParseXmlDeclNS
        : XmlParseXmlDecl)(isGeneralTextEntity,
                           encoding,
                           s,
                           next,
                           &eventPtr,
                           &version,
                           &encodingName,
                           &newEncoding,
                           &standalone))
    return XML_ERROR_SYNTAX;
  if (!isGeneralTextEntity && standalone == 1) {
    dtd.standalone = 1;
#ifdef XML_DTD
    if (paramEntityParsing == XML_PARAM_ENTITY_PARSING_UNLESS_STANDALONE)
      paramEntityParsing = XML_PARAM_ENTITY_PARSING_NEVER;
#endif /* XML_DTD */
  }
  if (defaultHandler)
    reportDefault(parser, encoding, s, next);
  if (!protocolEncodingName) {
    if (newEncoding) {
      if (newEncoding->minBytesPerChar != encoding->minBytesPerChar) {
        eventPtr = encodingName;
        return XML_ERROR_INCORRECT_ENCODING;
      }
      encoding = newEncoding;
    }
    else if (encodingName) {
      enum XML_Error result;
      const XML_Char *s = poolStoreString(&tempPool,
                                          encoding,
                                          encodingName,
                                          encodingName
                                          + XmlNameLength(encoding, encodingName));
      if (!s)
        return XML_ERROR_NO_MEMORY;
      result = handleUnknownEncoding(parser, s);
      poolDiscard(&tempPool);
      if (result == XML_ERROR_UNKNOWN_ENCODING)
        eventPtr = encodingName;
      return result;
    }
  }
  return XML_ERROR_NONE;
}

static enum XML_Error
handleUnknownEncoding(XML_Parser parser, const XML_Char *encodingName)
{
  if (unknownEncodingHandler) {
    XML_Encoding info;
    int i;
    for (i = 0; i < 256; i++)
      info.map[i] = -1;
    info.convert = 0;
    info.data = 0;
    info.release = 0;
    if (unknownEncodingHandler(unknownEncodingHandlerData, encodingName, &info)) {
      ENCODING *enc;
      unknownEncodingMem = malloc(XmlSizeOfUnknownEncoding());
      if (!unknownEncodingMem) {
        if (info.release)
          info.release(info.data);
        return XML_ERROR_NO_MEMORY;
      }
      enc =
(ns ? XmlInitUnknownEncodingNS : XmlInitUnknownEncoding)(unknownEncodingMem, info.map, info.convert, info.data); if (enc) { unknownEncodingData = info.data; unknownEncodingRelease = info.release; encoding = enc; return XML_ERROR_NONE; } } if (info.release) info.release(info.data); } return XML_ERROR_UNKNOWN_ENCODING; } static enum XML_Error prologInitProcessor(XML_Parser parser, const char *s, const char *end, const char **nextPtr) { enum XML_Error result = initializeEncoding(parser); if (result != XML_ERROR_NONE) return result; processor = prologProcessor; return prologProcessor(parser, s, end, nextPtr); } static enum XML_Error prologProcessor(XML_Parser parser, const char *s, const char *end, const char **nextPtr) { const char *next; int tok = XmlPrologTok(encoding, s, end, &next); return doProlog(parser, encoding, s, end, tok, next, nextPtr); } static enum XML_Error doProlog(XML_Parser parser, const ENCODING *enc, const char *s, const char *end, int tok, const char *next, const char **nextPtr) { #ifdef XML_DTD static const XML_Char externalSubsetName[] = { '#' , '\0' }; #endif /* XML_DTD */ const char **eventPP; const char **eventEndPP; if (enc == encoding) { eventPP = &eventPtr; eventEndPP = &eventEndPtr; } else { eventPP = &(openInternalEntities->internalEventPtr); eventEndPP = &(openInternalEntities->internalEventEndPtr); } for (;;) { int role; *eventPP = s; *eventEndPP = next; if (tok <= 0) { if (nextPtr != 0 && tok != XML_TOK_INVALID) { *nextPtr = s; return XML_ERROR_NONE; } switch (tok) { case XML_TOK_INVALID: *eventPP = next; return XML_ERROR_INVALID_TOKEN; case XML_TOK_PARTIAL: return XML_ERROR_UNCLOSED_TOKEN; case XML_TOK_PARTIAL_CHAR: return XML_ERROR_PARTIAL_CHAR; case XML_TOK_NONE: #ifdef XML_DTD if (enc != encoding) return XML_ERROR_NONE; if (parentParser) { if (XmlTokenRole(&prologState, XML_TOK_NONE, end, end, enc) == XML_ROLE_ERROR) return XML_ERROR_SYNTAX; hadExternalDoctype = 0; return XML_ERROR_NONE; } #endif /* XML_DTD */ return 
XML_ERROR_NO_ELEMENTS; default: tok = -tok; next = end; break; } } role = XmlTokenRole(&prologState, tok, s, next, enc); switch (role) { case XML_ROLE_XML_DECL: { enum XML_Error result = processXmlDecl(parser, 0, s, next); if (result != XML_ERROR_NONE) return result; enc = encoding; } break; case XML_ROLE_DOCTYPE_NAME: if (startDoctypeDeclHandler) { const XML_Char *name = poolStoreString(&tempPool, enc, s, next); if (!name) return XML_ERROR_NO_MEMORY; startDoctypeDeclHandler(handlerArg, name); poolClear(&tempPool); } break; #ifdef XML_DTD case XML_ROLE_TEXT_DECL: { enum XML_Error result = processXmlDecl(parser, 1, s, next); if (result != XML_ERROR_NONE) return result; enc = encoding; } break; #endif /* XML_DTD */ case XML_ROLE_DOCTYPE_PUBLIC_ID: #ifdef XML_DTD declEntity = (ENTITY *)lookup(&dtd.paramEntities, externalSubsetName, sizeof(ENTITY)); if (!declEntity) return XML_ERROR_NO_MEMORY; #endif /* XML_DTD */ /* fall through */ case XML_ROLE_ENTITY_PUBLIC_ID: if (!XmlIsPublicId(enc, s, next, eventPP)) return XML_ERROR_SYNTAX; if (declEntity) { XML_Char *tem = poolStoreString(&dtd.pool, enc, s + enc->minBytesPerChar, next - enc->minBytesPerChar); if (!tem) return XML_ERROR_NO_MEMORY; normalizePublicId(tem); declEntity->publicId = tem; poolFinish(&dtd.pool); } break; case XML_ROLE_DOCTYPE_CLOSE: if (dtd.complete && hadExternalDoctype) { dtd.complete = 0; #ifdef XML_DTD if (paramEntityParsing && externalEntityRefHandler) { ENTITY *entity = (ENTITY *)lookup(&dtd.paramEntities, externalSubsetName, 0); if (!externalEntityRefHandler(externalEntityRefHandlerArg, 0, entity->base, entity->systemId, entity->publicId)) return XML_ERROR_EXTERNAL_ENTITY_HANDLING; } #endif /* XML_DTD */ if (!dtd.complete && !dtd.standalone && notStandaloneHandler && !notStandaloneHandler(handlerArg)) return XML_ERROR_NOT_STANDALONE; } if (endDoctypeDeclHandler) endDoctypeDeclHandler(handlerArg); break; case XML_ROLE_INSTANCE_START: processor = contentProcessor; return contentProcessor(parser, s, 
                              end, nextPtr);
    case XML_ROLE_ATTLIST_ELEMENT_NAME:
      {
        const XML_Char *name = poolStoreString(&dtd.pool, enc, s, next);
        if (!name)
          return XML_ERROR_NO_MEMORY;
        declElementType = (ELEMENT_TYPE *)lookup(&dtd.elementTypes, name, sizeof(ELEMENT_TYPE));
        if (!declElementType)
          return XML_ERROR_NO_MEMORY;
        if (declElementType->name != name)
          poolDiscard(&dtd.pool);
        else {
          poolFinish(&dtd.pool);
          if (!setElementTypePrefix(parser, declElementType))
            return XML_ERROR_NO_MEMORY;
        }
        break;
      }
    case XML_ROLE_ATTRIBUTE_NAME:
      declAttributeId = getAttributeId(parser, enc, s, next);
      if (!declAttributeId)
        return XML_ERROR_NO_MEMORY;
      declAttributeIsCdata = 0;
      declAttributeIsId = 0;
      break;
    case XML_ROLE_ATTRIBUTE_TYPE_CDATA:
      declAttributeIsCdata = 1;
      break;
    case XML_ROLE_ATTRIBUTE_TYPE_ID:
      declAttributeIsId = 1;
      break;
    case XML_ROLE_IMPLIED_ATTRIBUTE_VALUE:
    case XML_ROLE_REQUIRED_ATTRIBUTE_VALUE:
      if (dtd.complete
          && !defineAttribute(declElementType, declAttributeId,
                              declAttributeIsCdata,
                              declAttributeIsId, 0))
        return XML_ERROR_NO_MEMORY;
      break;
    case XML_ROLE_DEFAULT_ATTRIBUTE_VALUE:
    case XML_ROLE_FIXED_ATTRIBUTE_VALUE:
      {
        const XML_Char *attVal;
        enum XML_Error result
          = storeAttributeValue(parser, enc, declAttributeIsCdata,
                                s + enc->minBytesPerChar,
                                next - enc->minBytesPerChar,
                                &dtd.pool);
        if (result)
          return result;
        attVal = poolStart(&dtd.pool);
        poolFinish(&dtd.pool);
        if (dtd.complete
            /* ID attributes aren't allowed to have a default */
            && !defineAttribute(declElementType, declAttributeId, declAttributeIsCdata, 0, attVal))
          return XML_ERROR_NO_MEMORY;
        break;
      }
    case XML_ROLE_ENTITY_VALUE:
      {
        enum XML_Error result = storeEntityValue(parser, enc,
                                                 s + enc->minBytesPerChar,
                                                 next - enc->minBytesPerChar);
        if (declEntity) {
          declEntity->textPtr = poolStart(&dtd.pool);
          declEntity->textLen = poolLength(&dtd.pool);
          poolFinish(&dtd.pool);
          if (internalParsedEntityDeclHandler
              /* Check it's not a parameter entity */
              && ((ENTITY *)lookup(&dtd.generalEntities, declEntity->name, 0)
                  == declEntity)) {
            *eventEndPP = s;
internalParsedEntityDeclHandler(handlerArg, declEntity->name, declEntity->textPtr, declEntity->textLen); } } else poolDiscard(&dtd.pool); if (result != XML_ERROR_NONE) return result; } break; case XML_ROLE_DOCTYPE_SYSTEM_ID: if (!dtd.standalone #ifdef XML_DTD && !paramEntityParsing #endif /* XML_DTD */ && notStandaloneHandler && !notStandaloneHandler(handlerArg)) return XML_ERROR_NOT_STANDALONE; hadExternalDoctype = 1; #ifndef XML_DTD break; #else /* XML_DTD */ if (!declEntity) { declEntity = (ENTITY *)lookup(&dtd.paramEntities, externalSubsetName, sizeof(ENTITY)); if (!declEntity) return XML_ERROR_NO_MEMORY; } /* fall through */ #endif /* XML_DTD */ case XML_ROLE_ENTITY_SYSTEM_ID: if (declEntity) { declEntity->systemId = poolStoreString(&dtd.pool, enc, s + enc->minBytesPerChar, next - enc->minBytesPerChar); if (!declEntity->systemId) return XML_ERROR_NO_MEMORY; declEntity->base = curBase; poolFinish(&dtd.pool); } break; case XML_ROLE_ENTITY_NOTATION_NAME: if (declEntity) { declEntity->notation = poolStoreString(&dtd.pool, enc, s, next); if (!declEntity->notation) return XML_ERROR_NO_MEMORY; poolFinish(&dtd.pool); if (unparsedEntityDeclHandler) { *eventEndPP = s; unparsedEntityDeclHandler(handlerArg, declEntity->name, declEntity->base, declEntity->systemId, declEntity->publicId, declEntity->notation); } } break; case XML_ROLE_EXTERNAL_GENERAL_ENTITY_NO_NOTATION: if (declEntity && externalParsedEntityDeclHandler) { *eventEndPP = s; externalParsedEntityDeclHandler(handlerArg, declEntity->name, declEntity->base, declEntity->systemId, declEntity->publicId); } break; case XML_ROLE_GENERAL_ENTITY_NAME: { const XML_Char *name; if (XmlPredefinedEntityName(enc, s, next)) { declEntity = 0; break; } name = poolStoreString(&dtd.pool, enc, s, next); if (!name) return XML_ERROR_NO_MEMORY; if (dtd.complete) { declEntity = (ENTITY *)lookup(&dtd.generalEntities, name, sizeof(ENTITY)); if (!declEntity) return XML_ERROR_NO_MEMORY; if (declEntity->name != name) { 
poolDiscard(&dtd.pool); declEntity = 0; } else poolFinish(&dtd.pool); } else { poolDiscard(&dtd.pool); declEntity = 0; } } break; case XML_ROLE_PARAM_ENTITY_NAME: #ifdef XML_DTD if (dtd.complete) { const XML_Char *name = poolStoreString(&dtd.pool, enc, s, next); if (!name) return XML_ERROR_NO_MEMORY; declEntity = (ENTITY *)lookup(&dtd.paramEntities, name, sizeof(ENTITY)); if (!declEntity) return XML_ERROR_NO_MEMORY; if (declEntity->name != name) { poolDiscard(&dtd.pool); declEntity = 0; } else poolFinish(&dtd.pool); } #else /* not XML_DTD */ declEntity = 0; #endif /* not XML_DTD */ break; case XML_ROLE_NOTATION_NAME: declNotationPublicId = 0; declNotationName = 0; if (notationDeclHandler) { declNotationName = poolStoreString(&tempPool, enc, s, next); if (!declNotationName) return XML_ERROR_NO_MEMORY; poolFinish(&tempPool); } break; case XML_ROLE_NOTATION_PUBLIC_ID: if (!XmlIsPublicId(enc, s, next, eventPP)) return XML_ERROR_SYNTAX; if (declNotationName) { XML_Char *tem = poolStoreString(&tempPool, enc, s + enc->minBytesPerChar, next - enc->minBytesPerChar); if (!tem) return XML_ERROR_NO_MEMORY; normalizePublicId(tem); declNotationPublicId = tem; poolFinish(&tempPool); } break; case XML_ROLE_NOTATION_SYSTEM_ID: if (declNotationName && notationDeclHandler) { const XML_Char *systemId = poolStoreString(&tempPool, enc, s + enc->minBytesPerChar, next - enc->minBytesPerChar); if (!systemId) return XML_ERROR_NO_MEMORY; *eventEndPP = s; notationDeclHandler(handlerArg, declNotationName, curBase, systemId, declNotationPublicId); } poolClear(&tempPool); break; case XML_ROLE_NOTATION_NO_SYSTEM_ID: if (declNotationPublicId && notationDeclHandler) { *eventEndPP = s; notationDeclHandler(handlerArg, declNotationName, curBase, 0, declNotationPublicId); } poolClear(&tempPool); break; case XML_ROLE_ERROR: switch (tok) { case XML_TOK_PARAM_ENTITY_REF: return XML_ERROR_PARAM_ENTITY_REF; case XML_TOK_XML_DECL: return XML_ERROR_MISPLACED_XML_PI; default: return XML_ERROR_SYNTAX; } #ifdef 
XML_DTD case XML_ROLE_IGNORE_SECT: { enum XML_Error result; if (defaultHandler) reportDefault(parser, enc, s, next); result = doIgnoreSection(parser, enc, &next, end, nextPtr); if (!next) { processor = ignoreSectionProcessor; return result; } } break; #endif /* XML_DTD */ case XML_ROLE_GROUP_OPEN: if (prologState.level >= groupSize) { if (groupSize) { /* Keep the old buffer reachable if realloc fails. */ char *temp = realloc(groupConnector, groupSize *= 2); if (!temp) return XML_ERROR_NO_MEMORY; groupConnector = temp; } else { groupConnector = malloc(groupSize = 32); if (!groupConnector) return XML_ERROR_NO_MEMORY; } } groupConnector[prologState.level] = 0; break; case XML_ROLE_GROUP_SEQUENCE: if (groupConnector[prologState.level] == '|') return XML_ERROR_SYNTAX; groupConnector[prologState.level] = ','; break; case XML_ROLE_GROUP_CHOICE: if (groupConnector[prologState.level] == ',') return XML_ERROR_SYNTAX; groupConnector[prologState.level] = '|'; break; case XML_ROLE_PARAM_ENTITY_REF: #ifdef XML_DTD case XML_ROLE_INNER_PARAM_ENTITY_REF: if (paramEntityParsing && (dtd.complete || role == XML_ROLE_INNER_PARAM_ENTITY_REF)) { const XML_Char *name; ENTITY *entity; name = poolStoreString(&dtd.pool, enc, s + enc->minBytesPerChar, next - enc->minBytesPerChar); if (!name) return XML_ERROR_NO_MEMORY; entity = (ENTITY *)lookup(&dtd.paramEntities, name, 0); poolDiscard(&dtd.pool); if (!entity) { /* FIXME what to do if !dtd.complete? 
*/ return XML_ERROR_UNDEFINED_ENTITY; } if (entity->open) return XML_ERROR_RECURSIVE_ENTITY_REF; if (entity->textPtr) { enum XML_Error result; result = processInternalParamEntity(parser, entity); if (result != XML_ERROR_NONE) return result; break; } if (role == XML_ROLE_INNER_PARAM_ENTITY_REF) return XML_ERROR_PARAM_ENTITY_REF; if (externalEntityRefHandler) { dtd.complete = 0; entity->open = 1; if (!externalEntityRefHandler(externalEntityRefHandlerArg, 0, entity->base, entity->systemId, entity->publicId)) { entity->open = 0; return XML_ERROR_EXTERNAL_ENTITY_HANDLING; } entity->open = 0; if (dtd.complete) break; } } #endif /* XML_DTD */ if (!dtd.standalone && notStandaloneHandler && !notStandaloneHandler(handlerArg)) return XML_ERROR_NOT_STANDALONE; dtd.complete = 0; if (defaultHandler) reportDefault(parser, enc, s, next); break; case XML_ROLE_NONE: switch (tok) { case XML_TOK_PI: if (!reportProcessingInstruction(parser, enc, s, next)) return XML_ERROR_NO_MEMORY; break; case XML_TOK_COMMENT: if (!reportComment(parser, enc, s, next)) return XML_ERROR_NO_MEMORY; break; } break; } if (defaultHandler) { switch (tok) { case XML_TOK_PI: case XML_TOK_COMMENT: case XML_TOK_BOM: case XML_TOK_XML_DECL: #ifdef XML_DTD case XML_TOK_IGNORE_SECT: #endif /* XML_DTD */ case XML_TOK_PARAM_ENTITY_REF: break; default: #ifdef XML_DTD if (role != XML_ROLE_IGNORE_SECT) #endif /* XML_DTD */ reportDefault(parser, enc, s, next); } } s = next; tok = XmlPrologTok(enc, s, end, &next); } /* not reached */ } static enum XML_Error epilogProcessor(XML_Parser parser, const char *s, const char *end, const char **nextPtr) { processor = epilogProcessor; eventPtr = s; for (;;) { const char *next; int tok = XmlPrologTok(encoding, s, end, &next); eventEndPtr = next; switch (tok) { case -XML_TOK_PROLOG_S: if (defaultHandler) { eventEndPtr = end; reportDefault(parser, encoding, s, end); } /* fall through */ case XML_TOK_NONE: if (nextPtr) *nextPtr = end; return XML_ERROR_NONE; case XML_TOK_PROLOG_S: if 
(defaultHandler) reportDefault(parser, encoding, s, next); break; case XML_TOK_PI: if (!reportProcessingInstruction(parser, encoding, s, next)) return XML_ERROR_NO_MEMORY; break; case XML_TOK_COMMENT: if (!reportComment(parser, encoding, s, next)) return XML_ERROR_NO_MEMORY; break; case XML_TOK_INVALID: eventPtr = next; return XML_ERROR_INVALID_TOKEN; case XML_TOK_PARTIAL: if (nextPtr) { *nextPtr = s; return XML_ERROR_NONE; } return XML_ERROR_UNCLOSED_TOKEN; case XML_TOK_PARTIAL_CHAR: if (nextPtr) { *nextPtr = s; return XML_ERROR_NONE; } return XML_ERROR_PARTIAL_CHAR; default: return XML_ERROR_JUNK_AFTER_DOC_ELEMENT; } eventPtr = s = next; } } #ifdef XML_DTD static enum XML_Error processInternalParamEntity(XML_Parser parser, ENTITY *entity) { const char *s, *end, *next; int tok; enum XML_Error result; OPEN_INTERNAL_ENTITY openEntity; entity->open = 1; openEntity.next = openInternalEntities; openInternalEntities = &openEntity; openEntity.entity = entity; openEntity.internalEventPtr = 0; openEntity.internalEventEndPtr = 0; s = (char *)entity->textPtr; end = (char *)(entity->textPtr + entity->textLen); tok = XmlPrologTok(internalEncoding, s, end, &next); result = doProlog(parser, internalEncoding, s, end, tok, next, 0); entity->open = 0; openInternalEntities = openEntity.next; return result; } #endif /* XML_DTD */ static enum XML_Error errorProcessor(XML_Parser parser, const char *s, const char *end, const char **nextPtr) { return errorCode; } static enum XML_Error storeAttributeValue(XML_Parser parser, const ENCODING *enc, int isCdata, const char *ptr, const char *end, STRING_POOL *pool) { enum XML_Error result = appendAttributeValue(parser, enc, isCdata, ptr, end, pool); if (result) return result; if (!isCdata && poolLength(pool) && poolLastChar(pool) == 0x20) poolChop(pool); if (!poolAppendChar(pool, XML_T('\0'))) return XML_ERROR_NO_MEMORY; return XML_ERROR_NONE; } static enum XML_Error appendAttributeValue(XML_Parser parser, const ENCODING *enc, int isCdata, 
const char *ptr, const char *end, STRING_POOL *pool) { for (;;) { const char *next; int tok = XmlAttributeValueTok(enc, ptr, end, &next); switch (tok) { case XML_TOK_NONE: return XML_ERROR_NONE; case XML_TOK_INVALID: if (enc == encoding) eventPtr = next; return XML_ERROR_INVALID_TOKEN; case XML_TOK_PARTIAL: if (enc == encoding) eventPtr = ptr; return XML_ERROR_INVALID_TOKEN; case XML_TOK_CHAR_REF: { XML_Char buf[XML_ENCODE_MAX]; int i; int n = XmlCharRefNumber(enc, ptr); if (n < 0) { if (enc == encoding) eventPtr = ptr; return XML_ERROR_BAD_CHAR_REF; } if (!isCdata && n == 0x20 /* space */ && (poolLength(pool) == 0 || poolLastChar(pool) == 0x20)) break; n = XmlEncode(n, (ICHAR *)buf); if (!n) { if (enc == encoding) eventPtr = ptr; return XML_ERROR_BAD_CHAR_REF; } for (i = 0; i < n; i++) { if (!poolAppendChar(pool, buf[i])) return XML_ERROR_NO_MEMORY; } } break; case XML_TOK_DATA_CHARS: if (!poolAppend(pool, enc, ptr, next)) return XML_ERROR_NO_MEMORY; break; case XML_TOK_TRAILING_CR: next = ptr + enc->minBytesPerChar; /* fall through */ case XML_TOK_ATTRIBUTE_VALUE_S: case XML_TOK_DATA_NEWLINE: if (!isCdata && (poolLength(pool) == 0 || poolLastChar(pool) == 0x20)) break; if (!poolAppendChar(pool, 0x20)) return XML_ERROR_NO_MEMORY; break; case XML_TOK_ENTITY_REF: { const XML_Char *name; ENTITY *entity; XML_Char ch = XmlPredefinedEntityName(enc, ptr + enc->minBytesPerChar, next - enc->minBytesPerChar); if (ch) { if (!poolAppendChar(pool, ch)) return XML_ERROR_NO_MEMORY; break; } name = poolStoreString(&temp2Pool, enc, ptr + enc->minBytesPerChar, next - enc->minBytesPerChar); if (!name) return XML_ERROR_NO_MEMORY; entity = (ENTITY *)lookup(&dtd.generalEntities, name, 0); poolDiscard(&temp2Pool); if (!entity) { if (dtd.complete) { if (enc == encoding) eventPtr = ptr; return XML_ERROR_UNDEFINED_ENTITY; } } else if (entity->open) { if (enc == encoding) eventPtr = ptr; return XML_ERROR_RECURSIVE_ENTITY_REF; } else if (entity->notation) { if (enc == encoding) 
eventPtr = ptr; return XML_ERROR_BINARY_ENTITY_REF; } else if (!entity->textPtr) { if (enc == encoding) eventPtr = ptr; return XML_ERROR_ATTRIBUTE_EXTERNAL_ENTITY_REF; } else { enum XML_Error result; const XML_Char *textEnd = entity->textPtr + entity->textLen; entity->open = 1; result = appendAttributeValue(parser, internalEncoding, isCdata, (char *)entity->textPtr, (char *)textEnd, pool); entity->open = 0; if (result) return result; } } break; default: abort(); } ptr = next; } /* not reached */ } static enum XML_Error storeEntityValue(XML_Parser parser, const ENCODING *enc, const char *entityTextPtr, const char *entityTextEnd) { STRING_POOL *pool = &(dtd.pool); for (;;) { const char *next; int tok = XmlEntityValueTok(enc, entityTextPtr, entityTextEnd, &next); switch (tok) { case XML_TOK_PARAM_ENTITY_REF: #ifdef XML_DTD if (parentParser || enc != encoding) { enum XML_Error result; const XML_Char *name; ENTITY *entity; name = poolStoreString(&tempPool, enc, entityTextPtr + enc->minBytesPerChar, next - enc->minBytesPerChar); if (!name) return XML_ERROR_NO_MEMORY; entity = (ENTITY *)lookup(&dtd.paramEntities, name, 0); poolDiscard(&tempPool); if (!entity) { if (enc == encoding) eventPtr = entityTextPtr; return XML_ERROR_UNDEFINED_ENTITY; } if (entity->open) { if (enc == encoding) eventPtr = entityTextPtr; return XML_ERROR_RECURSIVE_ENTITY_REF; } if (entity->systemId) { if (enc == encoding) eventPtr = entityTextPtr; return XML_ERROR_PARAM_ENTITY_REF; } entity->open = 1; result = storeEntityValue(parser, internalEncoding, (char *)entity->textPtr, (char *)(entity->textPtr + entity->textLen)); entity->open = 0; if (result) return result; break; } #endif /* XML_DTD */ eventPtr = entityTextPtr; return XML_ERROR_SYNTAX; case XML_TOK_NONE: return XML_ERROR_NONE; case XML_TOK_ENTITY_REF: case XML_TOK_DATA_CHARS: if (!poolAppend(pool, enc, entityTextPtr, next)) return XML_ERROR_NO_MEMORY; break; case XML_TOK_TRAILING_CR: next = entityTextPtr + enc->minBytesPerChar; /* fall 
through */ case XML_TOK_DATA_NEWLINE: if (pool->end == pool->ptr && !poolGrow(pool)) return XML_ERROR_NO_MEMORY; *(pool->ptr)++ = 0xA; break; case XML_TOK_CHAR_REF: { XML_Char buf[XML_ENCODE_MAX]; int i; int n = XmlCharRefNumber(enc, entityTextPtr); if (n < 0) { if (enc == encoding) eventPtr = entityTextPtr; return XML_ERROR_BAD_CHAR_REF; } n = XmlEncode(n, (ICHAR *)buf); if (!n) { if (enc == encoding) eventPtr = entityTextPtr; return XML_ERROR_BAD_CHAR_REF; } for (i = 0; i < n; i++) { if (pool->end == pool->ptr && !poolGrow(pool)) return XML_ERROR_NO_MEMORY; *(pool->ptr)++ = buf[i]; } } break; case XML_TOK_PARTIAL: if (enc == encoding) eventPtr = entityTextPtr; return XML_ERROR_INVALID_TOKEN; case XML_TOK_INVALID: if (enc == encoding) eventPtr = next; return XML_ERROR_INVALID_TOKEN; default: abort(); } entityTextPtr = next; } /* not reached */ } static void normalizeLines(XML_Char *s) { XML_Char *p; for (;; s++) { if (*s == XML_T('\0')) return; if (*s == 0xD) break; } p = s; do { if (*s == 0xD) { *p++ = 0xA; if (*++s == 0xA) s++; } else *p++ = *s++; } while (*s); *p = XML_T('\0'); } static int reportProcessingInstruction(XML_Parser parser, const ENCODING *enc, const char *start, const char *end) { const XML_Char *target; XML_Char *data; const char *tem; if (!processingInstructionHandler) { if (defaultHandler) reportDefault(parser, enc, start, end); return 1; } start += enc->minBytesPerChar * 2; tem = start + XmlNameLength(enc, start); target = poolStoreString(&tempPool, enc, start, tem); if (!target) return 0; poolFinish(&tempPool); data = poolStoreString(&tempPool, enc, XmlSkipS(enc, tem), end - enc->minBytesPerChar*2); if (!data) return 0; normalizeLines(data); processingInstructionHandler(handlerArg, target, data); poolClear(&tempPool); return 1; } static int reportComment(XML_Parser parser, const ENCODING *enc, const char *start, const char *end) { XML_Char *data; if (!commentHandler) { if (defaultHandler) reportDefault(parser, enc, start, end); return 1; } 
data = poolStoreString(&tempPool, enc, start + enc->minBytesPerChar * 4, end - enc->minBytesPerChar * 3); if (!data) return 0; normalizeLines(data); commentHandler(handlerArg, data); poolClear(&tempPool); return 1; } static void reportDefault(XML_Parser parser, const ENCODING *enc, const char *s, const char *end) { if (MUST_CONVERT(enc, s)) { const char **eventPP; const char **eventEndPP; if (enc == encoding) { eventPP = &eventPtr; eventEndPP = &eventEndPtr; } else { eventPP = &(openInternalEntities->internalEventPtr); eventEndPP = &(openInternalEntities->internalEventEndPtr); } do { ICHAR *dataPtr = (ICHAR *)dataBuf; XmlConvert(enc, &s, end, &dataPtr, (ICHAR *)dataBufEnd); *eventEndPP = s; defaultHandler(handlerArg, dataBuf, dataPtr - (ICHAR *)dataBuf); *eventPP = s; } while (s != end); } else defaultHandler(handlerArg, (XML_Char *)s, (XML_Char *)end - (XML_Char *)s); } static int defineAttribute(ELEMENT_TYPE *type, ATTRIBUTE_ID *attId, int isCdata, int isId, const XML_Char *value) { DEFAULT_ATTRIBUTE *att; if (value || isId) { /* The handling of default attributes gets messed up if we have a default which duplicates a non-default. 
*/ int i; for (i = 0; i < type->nDefaultAtts; i++) if (attId == type->defaultAtts[i].id) return 1; if (isId && !type->idAtt && !attId->xmlns) type->idAtt = attId; } if (type->nDefaultAtts == type->allocDefaultAtts) { if (type->allocDefaultAtts == 0) { type->allocDefaultAtts = 8; type->defaultAtts = malloc(type->allocDefaultAtts*sizeof(DEFAULT_ATTRIBUTE)); } else { type->allocDefaultAtts *= 2; type->defaultAtts = realloc(type->defaultAtts, type->allocDefaultAtts*sizeof(DEFAULT_ATTRIBUTE)); } if (!type->defaultAtts) return 0; } att = type->defaultAtts + type->nDefaultAtts; att->id = attId; att->value = value; att->isCdata = isCdata; if (!isCdata) attId->maybeTokenized = 1; type->nDefaultAtts += 1; return 1; } static int setElementTypePrefix(XML_Parser parser, ELEMENT_TYPE *elementType) { const XML_Char *name; for (name = elementType->name; *name; name++) { if (*name == XML_T(':')) { PREFIX *prefix; const XML_Char *s; for (s = elementType->name; s != name; s++) { if (!poolAppendChar(&dtd.pool, *s)) return 0; } if (!poolAppendChar(&dtd.pool, XML_T('\0'))) return 0; prefix = (PREFIX *)lookup(&dtd.prefixes, poolStart(&dtd.pool), sizeof(PREFIX)); if (!prefix) return 0; if (prefix->name == poolStart(&dtd.pool)) poolFinish(&dtd.pool); else poolDiscard(&dtd.pool); elementType->prefix = prefix; } } return 1; } static ATTRIBUTE_ID * getAttributeId(XML_Parser parser, const ENCODING *enc, const char *start, const char *end) { ATTRIBUTE_ID *id; const XML_Char *name; if (!poolAppendChar(&dtd.pool, XML_T('\0'))) return 0; name = poolStoreString(&dtd.pool, enc, start, end); if (!name) return 0; ++name; id = (ATTRIBUTE_ID *)lookup(&dtd.attributeIds, name, sizeof(ATTRIBUTE_ID)); if (!id) return 0; if (id->name != name) poolDiscard(&dtd.pool); else { poolFinish(&dtd.pool); if (!ns) ; else if (name[0] == 'x' && name[1] == 'm' && name[2] == 'l' && name[3] == 'n' && name[4] == 's' && (name[5] == XML_T('\0') || name[5] == XML_T(':'))) { if (name[5] == '\0') id->prefix = &dtd.defaultPrefix; 
else id->prefix = (PREFIX *)lookup(&dtd.prefixes, name + 6, sizeof(PREFIX)); id->xmlns = 1; } else { int i; for (i = 0; name[i]; i++) { if (name[i] == XML_T(':')) { int j; for (j = 0; j < i; j++) { if (!poolAppendChar(&dtd.pool, name[j])) return 0; } if (!poolAppendChar(&dtd.pool, XML_T('\0'))) return 0; id->prefix = (PREFIX *)lookup(&dtd.prefixes, poolStart(&dtd.pool), sizeof(PREFIX)); /* lookup returns 0 when it runs out of memory */ if (!id->prefix) return 0; if (id->prefix->name == poolStart(&dtd.pool)) poolFinish(&dtd.pool); else poolDiscard(&dtd.pool); break; } } } } return id; } #define CONTEXT_SEP XML_T('\f') static const XML_Char *getContext(XML_Parser parser) { HASH_TABLE_ITER iter; int needSep = 0; if (dtd.defaultPrefix.binding) { int i; int len; if (!poolAppendChar(&tempPool, XML_T('='))) return 0; len = dtd.defaultPrefix.binding->uriLen; if (namespaceSeparator != XML_T('\0')) len--; for (i = 0; i < len; i++) if (!poolAppendChar(&tempPool, dtd.defaultPrefix.binding->uri[i])) return 0; needSep = 1; } hashTableIterInit(&iter, &(dtd.prefixes)); for (;;) { int i; int len; const XML_Char *s; PREFIX *prefix = (PREFIX *)hashTableIterNext(&iter); if (!prefix) break; if (!prefix->binding) continue; if (needSep && !poolAppendChar(&tempPool, CONTEXT_SEP)) return 0; for (s = prefix->name; *s; s++) if (!poolAppendChar(&tempPool, *s)) return 0; if (!poolAppendChar(&tempPool, XML_T('='))) return 0; len = prefix->binding->uriLen; if (namespaceSeparator != XML_T('\0')) len--; for (i = 0; i < len; i++) if (!poolAppendChar(&tempPool, prefix->binding->uri[i])) return 0; needSep = 1; } hashTableIterInit(&iter, &(dtd.generalEntities)); for (;;) { const XML_Char *s; ENTITY *e = (ENTITY *)hashTableIterNext(&iter); if (!e) break; if (!e->open) continue; if (needSep && !poolAppendChar(&tempPool, CONTEXT_SEP)) return 0; for (s = e->name; *s; s++) if (!poolAppendChar(&tempPool, *s)) return 0; needSep = 1; } if (!poolAppendChar(&tempPool, XML_T('\0'))) return 0; return tempPool.start; } static int setContext(XML_Parser parser, const XML_Char *context) { 
const XML_Char *s = context; while (*context != XML_T('\0')) { if (*s == CONTEXT_SEP || *s == XML_T('\0')) { ENTITY *e; if (!poolAppendChar(&tempPool, XML_T('\0'))) return 0; e = (ENTITY *)lookup(&dtd.generalEntities, poolStart(&tempPool), 0); if (e) e->open = 1; if (*s != XML_T('\0')) s++; context = s; poolDiscard(&tempPool); } else if (*s == '=') { PREFIX *prefix; if (poolLength(&tempPool) == 0) prefix = &dtd.defaultPrefix; else { if (!poolAppendChar(&tempPool, XML_T('\0'))) return 0; prefix = (PREFIX *)lookup(&dtd.prefixes, poolStart(&tempPool), sizeof(PREFIX)); if (!prefix) return 0; if (prefix->name == poolStart(&tempPool)) { prefix->name = poolCopyString(&dtd.pool, prefix->name); if (!prefix->name) return 0; } poolDiscard(&tempPool); } for (context = s + 1; *context != CONTEXT_SEP && *context != XML_T('\0'); context++) if (!poolAppendChar(&tempPool, *context)) return 0; if (!poolAppendChar(&tempPool, XML_T('\0'))) return 0; if (!addBinding(parser, prefix, 0, poolStart(&tempPool), &inheritedBindings)) return 0; poolDiscard(&tempPool); if (*context != XML_T('\0')) ++context; s = context; } else { if (!poolAppendChar(&tempPool, *s)) return 0; s++; } } return 1; } static void normalizePublicId(XML_Char *publicId) { XML_Char *p = publicId; XML_Char *s; for (s = publicId; *s; s++) { switch (*s) { case 0x20: case 0xD: case 0xA: if (p != publicId && p[-1] != 0x20) *p++ = 0x20; break; default: *p++ = *s; } } if (p != publicId && p[-1] == 0x20) --p; *p = XML_T('\0'); } static int dtdInit(DTD *p) { poolInit(&(p->pool)); hashTableInit(&(p->generalEntities)); hashTableInit(&(p->elementTypes)); hashTableInit(&(p->attributeIds)); hashTableInit(&(p->prefixes)); p->complete = 1; p->standalone = 0; #ifdef XML_DTD hashTableInit(&(p->paramEntities)); #endif /* XML_DTD */ p->defaultPrefix.name = 0; p->defaultPrefix.binding = 0; return 1; } #ifdef XML_DTD static void dtdSwap(DTD *p1, DTD *p2) { DTD tem; memcpy(&tem, p1, sizeof(DTD)); memcpy(p1, p2, sizeof(DTD)); memcpy(p2, &tem, 
sizeof(DTD)); } #endif /* XML_DTD */ static void dtdDestroy(DTD *p) { HASH_TABLE_ITER iter; hashTableIterInit(&iter, &(p->elementTypes)); for (;;) { ELEMENT_TYPE *e = (ELEMENT_TYPE *)hashTableIterNext(&iter); if (!e) break; if (e->allocDefaultAtts != 0) free(e->defaultAtts); } hashTableDestroy(&(p->generalEntities)); #ifdef XML_DTD hashTableDestroy(&(p->paramEntities)); #endif /* XML_DTD */ hashTableDestroy(&(p->elementTypes)); hashTableDestroy(&(p->attributeIds)); hashTableDestroy(&(p->prefixes)); poolDestroy(&(p->pool)); } /* Do a deep copy of the DTD. Return 0 for out of memory; non-zero otherwise. The new DTD has already been initialized. */ static int dtdCopy(DTD *newDtd, const DTD *oldDtd) { HASH_TABLE_ITER iter; /* Copy the prefix table. */ hashTableIterInit(&iter, &(oldDtd->prefixes)); for (;;) { const XML_Char *name; const PREFIX *oldP = (PREFIX *)hashTableIterNext(&iter); if (!oldP) break; name = poolCopyString(&(newDtd->pool), oldP->name); if (!name) return 0; if (!lookup(&(newDtd->prefixes), name, sizeof(PREFIX))) return 0; } hashTableIterInit(&iter, &(oldDtd->attributeIds)); /* Copy the attribute id table. */ for (;;) { ATTRIBUTE_ID *newA; const XML_Char *name; const ATTRIBUTE_ID *oldA = (ATTRIBUTE_ID *)hashTableIterNext(&iter); if (!oldA) break; /* Remember to allocate the scratch byte before the name. */ if (!poolAppendChar(&(newDtd->pool), XML_T('\0'))) return 0; name = poolCopyString(&(newDtd->pool), oldA->name); if (!name) return 0; ++name; newA = (ATTRIBUTE_ID *)lookup(&(newDtd->attributeIds), name, sizeof(ATTRIBUTE_ID)); if (!newA) return 0; newA->maybeTokenized = oldA->maybeTokenized; if (oldA->prefix) { newA->xmlns = oldA->xmlns; if (oldA->prefix == &oldDtd->defaultPrefix) newA->prefix = &newDtd->defaultPrefix; else newA->prefix = (PREFIX *)lookup(&(newDtd->prefixes), oldA->prefix->name, 0); } } /* Copy the element type table. 
*/ hashTableIterInit(&iter, &(oldDtd->elementTypes)); for (;;) { int i; ELEMENT_TYPE *newE; const XML_Char *name; const ELEMENT_TYPE *oldE = (ELEMENT_TYPE *)hashTableIterNext(&iter); if (!oldE) break; name = poolCopyString(&(newDtd->pool), oldE->name); if (!name) return 0; newE = (ELEMENT_TYPE *)lookup(&(newDtd->elementTypes), name, sizeof(ELEMENT_TYPE)); if (!newE) return 0; if (oldE->nDefaultAtts) { newE->defaultAtts = (DEFAULT_ATTRIBUTE *)malloc(oldE->nDefaultAtts * sizeof(DEFAULT_ATTRIBUTE)); if (!newE->defaultAtts) return 0; } if (oldE->idAtt) newE->idAtt = (ATTRIBUTE_ID *)lookup(&(newDtd->attributeIds), oldE->idAtt->name, 0); newE->allocDefaultAtts = newE->nDefaultAtts = oldE->nDefaultAtts; if (oldE->prefix) newE->prefix = (PREFIX *)lookup(&(newDtd->prefixes), oldE->prefix->name, 0); for (i = 0; i < newE->nDefaultAtts; i++) { newE->defaultAtts[i].id = (ATTRIBUTE_ID *)lookup(&(newDtd->attributeIds), oldE->defaultAtts[i].id->name, 0); newE->defaultAtts[i].isCdata = oldE->defaultAtts[i].isCdata; if (oldE->defaultAtts[i].value) { newE->defaultAtts[i].value = poolCopyString(&(newDtd->pool), oldE->defaultAtts[i].value); if (!newE->defaultAtts[i].value) return 0; } else newE->defaultAtts[i].value = 0; } } /* Copy the entity tables. 
*/ if (!copyEntityTable(&(newDtd->generalEntities), &(newDtd->pool), &(oldDtd->generalEntities))) return 0; #ifdef XML_DTD if (!copyEntityTable(&(newDtd->paramEntities), &(newDtd->pool), &(oldDtd->paramEntities))) return 0; #endif /* XML_DTD */ newDtd->complete = oldDtd->complete; newDtd->standalone = oldDtd->standalone; return 1; } static int copyEntityTable(HASH_TABLE *newTable, STRING_POOL *newPool, const HASH_TABLE *oldTable) { HASH_TABLE_ITER iter; const XML_Char *cachedOldBase = 0; const XML_Char *cachedNewBase = 0; hashTableIterInit(&iter, oldTable); for (;;) { ENTITY *newE; const XML_Char *name; const ENTITY *oldE = (ENTITY *)hashTableIterNext(&iter); if (!oldE) break; name = poolCopyString(newPool, oldE->name); if (!name) return 0; newE = (ENTITY *)lookup(newTable, name, sizeof(ENTITY)); if (!newE) return 0; if (oldE->systemId) { const XML_Char *tem = poolCopyString(newPool, oldE->systemId); if (!tem) return 0; newE->systemId = tem; if (oldE->base) { if (oldE->base == cachedOldBase) newE->base = cachedNewBase; else { cachedOldBase = oldE->base; tem = poolCopyString(newPool, cachedOldBase); if (!tem) return 0; cachedNewBase = newE->base = tem; } } } else { const XML_Char *tem = poolCopyStringN(newPool, oldE->textPtr, oldE->textLen); if (!tem) return 0; newE->textPtr = tem; newE->textLen = oldE->textLen; } if (oldE->notation) { const XML_Char *tem = poolCopyString(newPool, oldE->notation); if (!tem) return 0; newE->notation = tem; } } return 1; } #define INIT_SIZE 64 static int keyeq(KEY s1, KEY s2) { for (; *s1 == *s2; s1++, s2++) if (*s1 == 0) return 1; return 0; } static unsigned long hash(KEY s) { unsigned long h = 0; while (*s) h = (h << 5) + h + (unsigned char)*s++; return h; } static NAMED *lookup(HASH_TABLE *table, KEY name, size_t createSize) { size_t i; if (table->size == 0) { if (!createSize) return 0; table->v = calloc(INIT_SIZE, sizeof(NAMED *)); if (!table->v) return 0; table->size = INIT_SIZE; table->usedLim = INIT_SIZE / 2; i = hash(name) & 
(table->size - 1); } else { unsigned long h = hash(name); for (i = h & (table->size - 1); table->v[i]; i == 0 ? i = table->size - 1 : --i) { if (keyeq(name, table->v[i]->name)) return table->v[i]; } if (!createSize) return 0; if (table->used == table->usedLim) { /* check for overflow */ size_t newSize = table->size * 2; NAMED **newV = calloc(newSize, sizeof(NAMED *)); if (!newV) return 0; for (i = 0; i < table->size; i++) if (table->v[i]) { size_t j; for (j = hash(table->v[i]->name) & (newSize - 1); newV[j]; j == 0 ? j = newSize - 1 : --j) ; newV[j] = table->v[i]; } free(table->v); table->v = newV; table->size = newSize; table->usedLim = newSize/2; for (i = h & (table->size - 1); table->v[i]; i == 0 ? i = table->size - 1 : --i) ; } } table->v[i] = calloc(1, createSize); if (!table->v[i]) return 0; table->v[i]->name = name; (table->used)++; return table->v[i]; } static void hashTableDestroy(HASH_TABLE *table) { size_t i; for (i = 0; i < table->size; i++) { NAMED *p = table->v[i]; if (p) free(p); } if (table->v) free(table->v); } static void hashTableInit(HASH_TABLE *p) { p->size = 0; p->usedLim = 0; p->used = 0; p->v = 0; } static void hashTableIterInit(HASH_TABLE_ITER *iter, const HASH_TABLE *table) { iter->p = table->v; iter->end = iter->p + table->size; } static NAMED *hashTableIterNext(HASH_TABLE_ITER *iter) { while (iter->p != iter->end) { NAMED *tem = *(iter->p)++; if (tem) return tem; } return 0; } static void poolInit(STRING_POOL *pool) { pool->blocks = 0; pool->freeBlocks = 0; pool->start = 0; pool->ptr = 0; pool->end = 0; } static void poolClear(STRING_POOL *pool) { if (!pool->freeBlocks) pool->freeBlocks = pool->blocks; else { BLOCK *p = pool->blocks; while (p) { BLOCK *tem = p->next; p->next = pool->freeBlocks; pool->freeBlocks = p; p = tem; } } pool->blocks = 0; pool->start = 0; pool->ptr = 0; pool->end = 0; } static void poolDestroy(STRING_POOL *pool) { BLOCK *p = pool->blocks; while (p) { BLOCK *tem = p->next; free(p); p = tem; } pool->blocks = 0; p = 
pool->freeBlocks; while (p) { BLOCK *tem = p->next; free(p); p = tem; } pool->freeBlocks = 0; pool->ptr = 0; pool->start = 0; pool->end = 0; } static XML_Char *poolAppend(STRING_POOL *pool, const ENCODING *enc, const char *ptr, const char *end) { if (!pool->ptr && !poolGrow(pool)) return 0; for (;;) { XmlConvert(enc, &ptr, end, (ICHAR **)&(pool->ptr), (ICHAR *)pool->end); if (ptr == end) break; if (!poolGrow(pool)) return 0; } return pool->start; } static const XML_Char *poolCopyString(STRING_POOL *pool, const XML_Char *s) { do { if (!poolAppendChar(pool, *s)) return 0; } while (*s++); s = pool->start; poolFinish(pool); return s; } static const XML_Char *poolCopyStringN(STRING_POOL *pool, const XML_Char *s, int n) { if (!pool->ptr && !poolGrow(pool)) return 0; for (; n > 0; --n, s++) { if (!poolAppendChar(pool, *s)) return 0; } s = pool->start; poolFinish(pool); return s; } static XML_Char *poolStoreString(STRING_POOL *pool, const ENCODING *enc, const char *ptr, const char *end) { if (!poolAppend(pool, enc, ptr, end)) return 0; if (pool->ptr == pool->end && !poolGrow(pool)) return 0; *(pool->ptr)++ = 0; return pool->start; } static int poolGrow(STRING_POOL *pool) { if (pool->freeBlocks) { if (pool->start == 0) { pool->blocks = pool->freeBlocks; pool->freeBlocks = pool->freeBlocks->next; pool->blocks->next = 0; pool->start = pool->blocks->s; pool->end = pool->start + pool->blocks->size; pool->ptr = pool->start; return 1; } if (pool->end - pool->start < pool->freeBlocks->size) { BLOCK *tem = pool->freeBlocks->next; pool->freeBlocks->next = pool->blocks; pool->blocks = pool->freeBlocks; pool->freeBlocks = tem; memcpy(pool->blocks->s, pool->start, (pool->end - pool->start) * sizeof(XML_Char)); pool->ptr = pool->blocks->s + (pool->ptr - pool->start); pool->start = pool->blocks->s; pool->end = pool->start + pool->blocks->size; return 1; } } if (pool->blocks && pool->start == pool->blocks->s) { int blockSize = (pool->end - pool->start)*2; pool->blocks = 
realloc(pool->blocks, offsetof(BLOCK, s) + blockSize * sizeof(XML_Char)); if (!pool->blocks) return 0; pool->blocks->size = blockSize; pool->ptr = pool->blocks->s + (pool->ptr - pool->start); pool->start = pool->blocks->s; pool->end = pool->start + blockSize; } else { BLOCK *tem; int blockSize = pool->end - pool->start; if (blockSize < INIT_BLOCK_SIZE) blockSize = INIT_BLOCK_SIZE; else blockSize *= 2; tem = malloc(offsetof(BLOCK, s) + blockSize * sizeof(XML_Char)); if (!tem) return 0; tem->size = blockSize; tem->next = pool->blocks; pool->blocks = tem; if (pool->ptr != pool->start) memcpy(tem->s, pool->start, (pool->ptr - pool->start) * sizeof(XML_Char)); pool->ptr = tem->s + (pool->ptr - pool->start); pool->start = tem->s; pool->end = tem->s + blockSize; } return 1; } swish-e-2.4.7/src/expat/xmlparse/xmlparse.h0000775000077100017500000004704411166010106015603 00000000000000/* Copyright (c) 1998, 1999, 2000 Thai Open Source Software Center Ltd See the file copying.txt for copying permission. */ #ifndef XmlParse_INCLUDED #define XmlParse_INCLUDED 1 #ifdef __cplusplus extern "C" { #endif #ifndef XMLPARSEAPI #define XMLPARSEAPI /* as nothing */ #endif typedef void *XML_Parser; #ifdef XML_UNICODE_WCHAR_T /* XML_UNICODE_WCHAR_T will work only if sizeof(wchar_t) == 2 and wchar_t uses Unicode. */ /* Information is UTF-16 encoded as wchar_ts */ #ifndef XML_UNICODE #define XML_UNICODE #endif #include <stddef.h> typedef wchar_t XML_Char; typedef wchar_t XML_LChar; #else /* not XML_UNICODE_WCHAR_T */ #ifdef XML_UNICODE /* Information is UTF-16 encoded as unsigned shorts */ typedef unsigned short XML_Char; typedef char XML_LChar; #else /* not XML_UNICODE */ /* Information is UTF-8 encoded. */ typedef char XML_Char; typedef char XML_LChar; #endif /* not XML_UNICODE */ #endif /* not XML_UNICODE_WCHAR_T */ /* Constructs a new parser; encoding is the encoding specified by the external protocol or null if there is none specified. 
*/ XML_Parser XMLPARSEAPI XML_ParserCreate(const XML_Char *encoding); /* Constructs a new parser and namespace processor. Element type names and attribute names that belong to a namespace will be expanded; unprefixed attribute names are never expanded; unprefixed element type names are expanded only if there is a default namespace. The expanded name is the concatenation of the namespace URI, the namespace separator character, and the local part of the name. If the namespace separator is '\0' then the namespace URI and the local part will be concatenated without any separator. When a namespace is not declared, the name and prefix will be passed through without expansion. */ XML_Parser XMLPARSEAPI XML_ParserCreateNS(const XML_Char *encoding, XML_Char namespaceSeparator); /* atts is array of name/value pairs, terminated by 0; names and values are 0 terminated. */ typedef void (*XML_StartElementHandler)(void *userData, const XML_Char *name, const XML_Char **atts); typedef void (*XML_EndElementHandler)(void *userData, const XML_Char *name); /* s is not 0 terminated. */ typedef void (*XML_CharacterDataHandler)(void *userData, const XML_Char *s, int len); /* target and data are 0 terminated */ typedef void (*XML_ProcessingInstructionHandler)(void *userData, const XML_Char *target, const XML_Char *data); /* data is 0 terminated */ typedef void (*XML_CommentHandler)(void *userData, const XML_Char *data); typedef void (*XML_StartCdataSectionHandler)(void *userData); typedef void (*XML_EndCdataSectionHandler)(void *userData); /* This is called for any characters in the XML document for which there is no applicable handler. This includes both characters that are part of markup which is of a kind that is not reported (comments, markup declarations), or characters that are part of a construct which could be reported but for which no handler has been supplied. The characters are passed exactly as they were in the XML document except that they will be encoded in UTF-8. 
Line boundaries are not normalized. Note that a byte order mark character is not passed to the default handler. There are no guarantees about how characters are divided between calls to the default handler: for example, a comment might be split between multiple calls. */ typedef void (*XML_DefaultHandler)(void *userData, const XML_Char *s, int len); /* This is called for the start of the DOCTYPE declaration when the name of the DOCTYPE is encountered. */ typedef void (*XML_StartDoctypeDeclHandler)(void *userData, const XML_Char *doctypeName); /* This is called for the start of the DOCTYPE declaration when the closing > is encountered, but after processing any external subset. */ typedef void (*XML_EndDoctypeDeclHandler)(void *userData); /* This is called for a declaration of an unparsed (NDATA) entity. The base argument is whatever was set by XML_SetBase. The entityName, systemId and notationName arguments will never be null. The other arguments may be. */ typedef void (*XML_UnparsedEntityDeclHandler)(void *userData, const XML_Char *entityName, const XML_Char *base, const XML_Char *systemId, const XML_Char *publicId, const XML_Char *notationName); /* This is called for a declaration of notation. The base argument is whatever was set by XML_SetBase. The notationName will never be null. The other arguments can be. */ typedef void (*XML_NotationDeclHandler)(void *userData, const XML_Char *notationName, const XML_Char *base, const XML_Char *systemId, const XML_Char *publicId); typedef void (*XML_ExternalParsedEntityDeclHandler)(void *userData, const XML_Char *entityName, const XML_Char *base, const XML_Char *systemId, const XML_Char *publicId); typedef void (*XML_InternalParsedEntityDeclHandler)(void *userData, const XML_Char *entityName, const XML_Char *replacementText, int replacementTextLength); /* When namespace processing is enabled, these are called once for each namespace declaration. 
The calls to the start and end element handlers occur between the calls to the start and end namespace declaration handlers. For an xmlns attribute, prefix will be null. For an xmlns="" attribute, uri will be null. */ typedef void (*XML_StartNamespaceDeclHandler)(void *userData, const XML_Char *prefix, const XML_Char *uri); typedef void (*XML_EndNamespaceDeclHandler)(void *userData, const XML_Char *prefix); /* This is called if the document is not standalone (it has an external subset or a reference to a parameter entity, but does not have standalone="yes"). If this handler returns 0, then processing will not continue, and the parser will return an XML_ERROR_NOT_STANDALONE error. */ typedef int (*XML_NotStandaloneHandler)(void *userData); /* This is called for a reference to an external parsed general entity. The referenced entity is not automatically parsed. The application can parse it immediately or later using XML_ExternalEntityParserCreate. The parser argument is the parser parsing the entity containing the reference; it can be passed as the parser argument to XML_ExternalEntityParserCreate. The systemId argument is the system identifier as specified in the entity declaration; it will not be null. The base argument is the system identifier that should be used as the base for resolving systemId if systemId was relative; this is set by XML_SetBase; it may be null. The publicId argument is the public identifier as specified in the entity declaration, or null if none was specified; the whitespace in the public identifier will have been normalized as required by the XML spec. The context argument specifies the parsing context in the format expected by the context argument to XML_ExternalEntityParserCreate; context is valid only until the handler returns, so if the referenced entity is to be parsed later, it must be copied. The handler should return 0 if processing should not continue because of a fatal error in the handling of the external entity.
In this case the calling parser will return an XML_ERROR_EXTERNAL_ENTITY_HANDLING error. Note that unlike other handlers the first argument is the parser, not userData. */ typedef int (*XML_ExternalEntityRefHandler)(XML_Parser parser, const XML_Char *context, const XML_Char *base, const XML_Char *systemId, const XML_Char *publicId); /* This structure is filled in by the XML_UnknownEncodingHandler to provide information to the parser about encodings that are unknown to the parser. The map[b] member gives information about byte sequences whose first byte is b. If map[b] is c where c is >= 0, then b by itself encodes the Unicode scalar value c. If map[b] is -1, then the byte sequence is malformed. If map[b] is -n, where n >= 2, then b is the first byte of an n-byte sequence that encodes a single Unicode scalar value. The data member will be passed as the first argument to the convert function. The convert function is used to convert multibyte sequences; s will point to an n-byte sequence where map[(unsigned char)*s] == -n. The convert function must return the Unicode scalar value represented by this byte sequence or -1 if the byte sequence is malformed. The convert function may be null if the encoding is a single-byte encoding, that is if map[b] >= -1 for all bytes b. When the parser is finished with the encoding, then if release is not null, it will call release passing it the data member; once release has been called, the convert function will not be called again. Expat places certain restrictions on the encodings that are supported using this mechanism. 1. Every ASCII character that can appear in a well-formed XML document, other than the characters $@\^`{}~ must be represented by a single byte, and that byte must be the same byte that represents that character in ASCII. 2. No character may require more than 4 bytes to encode. 3.
All characters encoded must have Unicode scalar values <= 0xFFFF, (ie characters that would be encoded by surrogates in UTF-16 are not allowed). Note that this restriction doesn't apply to the built-in support for UTF-8 and UTF-16. 4. No Unicode character may be encoded by more than one distinct sequence of bytes. */ typedef struct { int map[256]; void *data; int (*convert)(void *data, const char *s); void (*release)(void *data); } XML_Encoding; /* This is called for an encoding that is unknown to the parser. The encodingHandlerData argument is that which was passed as the second argument to XML_SetUnknownEncodingHandler. The name argument gives the name of the encoding as specified in the encoding declaration. If the callback can provide information about the encoding, it must fill in the XML_Encoding structure, and return 1. Otherwise it must return 0. If info does not describe a suitable encoding, then the parser will return an XML_UNKNOWN_ENCODING error. */ typedef int (*XML_UnknownEncodingHandler)(void *encodingHandlerData, const XML_Char *name, XML_Encoding *info); void XMLPARSEAPI XML_SetElementHandler(XML_Parser parser, XML_StartElementHandler start, XML_EndElementHandler end); void XMLPARSEAPI XML_SetCharacterDataHandler(XML_Parser parser, XML_CharacterDataHandler handler); void XMLPARSEAPI XML_SetProcessingInstructionHandler(XML_Parser parser, XML_ProcessingInstructionHandler handler); void XMLPARSEAPI XML_SetCommentHandler(XML_Parser parser, XML_CommentHandler handler); void XMLPARSEAPI XML_SetCdataSectionHandler(XML_Parser parser, XML_StartCdataSectionHandler start, XML_EndCdataSectionHandler end); /* This sets the default handler and also inhibits expansion of internal entities. The entity reference will be passed to the default handler. */ void XMLPARSEAPI XML_SetDefaultHandler(XML_Parser parser, XML_DefaultHandler handler); /* This sets the default handler but does not inhibit expansion of internal entities. 
The entity reference will not be passed to the default handler. */ void XMLPARSEAPI XML_SetDefaultHandlerExpand(XML_Parser parser, XML_DefaultHandler handler); void XMLPARSEAPI XML_SetDoctypeDeclHandler(XML_Parser parser, XML_StartDoctypeDeclHandler start, XML_EndDoctypeDeclHandler end); void XMLPARSEAPI XML_SetUnparsedEntityDeclHandler(XML_Parser parser, XML_UnparsedEntityDeclHandler handler); void XMLPARSEAPI XML_SetNotationDeclHandler(XML_Parser parser, XML_NotationDeclHandler handler); void XMLPARSEAPI XML_SetExternalParsedEntityDeclHandler(XML_Parser parser, XML_ExternalParsedEntityDeclHandler handler); void XMLPARSEAPI XML_SetInternalParsedEntityDeclHandler(XML_Parser parser, XML_InternalParsedEntityDeclHandler handler); void XMLPARSEAPI XML_SetNamespaceDeclHandler(XML_Parser parser, XML_StartNamespaceDeclHandler start, XML_EndNamespaceDeclHandler end); void XMLPARSEAPI XML_SetNotStandaloneHandler(XML_Parser parser, XML_NotStandaloneHandler handler); void XMLPARSEAPI XML_SetExternalEntityRefHandler(XML_Parser parser, XML_ExternalEntityRefHandler handler); /* If a non-null value for arg is specified here, then it will be passed as the first argument to the external entity ref handler instead of the parser object. */ void XMLPARSEAPI XML_SetExternalEntityRefHandlerArg(XML_Parser, void *arg); void XMLPARSEAPI XML_SetUnknownEncodingHandler(XML_Parser parser, XML_UnknownEncodingHandler handler, void *encodingHandlerData); /* This can be called within a handler for a start element, end element, processing instruction or character data. It causes the corresponding markup to be passed to the default handler. */ void XMLPARSEAPI XML_DefaultCurrent(XML_Parser parser); /* This value is passed as the userData argument to callbacks. */ void XMLPARSEAPI XML_SetUserData(XML_Parser parser, void *userData); /* Returns the last value set by XML_SetUserData or null. 
*/ #define XML_GetUserData(parser) (*(void **)(parser)) /* This is equivalent to supplying an encoding argument to XML_ParserCreate. It must not be called after XML_Parse or XML_ParseBuffer. */ int XMLPARSEAPI XML_SetEncoding(XML_Parser parser, const XML_Char *encoding); /* If this function is called, then the parser will be passed as the first argument to callbacks instead of userData. The userData will still be accessible using XML_GetUserData. */ void XMLPARSEAPI XML_UseParserAsHandlerArg(XML_Parser parser); /* Sets the base to be used for resolving relative URIs in system identifiers in declarations. Resolving relative identifiers is left to the application: this value will be passed through as the base argument to the XML_ExternalEntityRefHandler, XML_NotationDeclHandler and XML_UnparsedEntityDeclHandler. The base argument will be copied. Returns zero if out of memory, non-zero otherwise. */ int XMLPARSEAPI XML_SetBase(XML_Parser parser, const XML_Char *base); const XML_Char XMLPARSEAPI * XML_GetBase(XML_Parser parser); /* Returns the number of the attribute/value pairs passed in last call to the XML_StartElementHandler that were specified in the start-tag rather than defaulted. Each attribute/value pair counts as 2; thus this corresponds to an index into the atts array passed to the XML_StartElementHandler. */ int XMLPARSEAPI XML_GetSpecifiedAttributeCount(XML_Parser parser); /* Returns the index of the ID attribute passed in the last call to XML_StartElementHandler, or -1 if there is no ID attribute. Each attribute/value pair counts as 2; thus this corresponds to an index into the atts array passed to the XML_StartElementHandler. */ int XMLPARSEAPI XML_GetIdAttributeIndex(XML_Parser parser); /* Parses some input. Returns 0 if a fatal error is detected. The last call to XML_Parse must have isFinal true; len may be zero for this call (or any other).
*/ int XMLPARSEAPI XML_Parse(XML_Parser parser, const char *s, int len, int isFinal); void XMLPARSEAPI * XML_GetBuffer(XML_Parser parser, int len); int XMLPARSEAPI XML_ParseBuffer(XML_Parser parser, int len, int isFinal); /* Creates an XML_Parser object that can parse an external general entity; context is a '\0'-terminated string specifying the parse context; encoding is a '\0'-terminated string giving the name of the externally specified encoding, or null if there is no externally specified encoding. The context string consists of a sequence of tokens separated by formfeeds (\f); a token consisting of a name specifies that the general entity of the name is open; a token of the form prefix=uri specifies the namespace for a particular prefix; a token of the form =uri specifies the default namespace. This can be called at any point after the first call to an ExternalEntityRefHandler so long as the parser has not yet been freed. The new parser is completely independent and may safely be used in a separate thread. The handlers and userData are initialized from the parser argument. Returns 0 if out of memory. Otherwise returns a new XML_Parser object. */ XML_Parser XMLPARSEAPI XML_ExternalEntityParserCreate(XML_Parser parser, const XML_Char *context, const XML_Char *encoding); enum XML_ParamEntityParsing { XML_PARAM_ENTITY_PARSING_NEVER, XML_PARAM_ENTITY_PARSING_UNLESS_STANDALONE, XML_PARAM_ENTITY_PARSING_ALWAYS }; /* Controls parsing of parameter entities (including the external DTD subset). If parsing of parameter entities is enabled, then references to external parameter entities (including the external DTD subset) will be passed to the handler set with XML_SetExternalEntityRefHandler. The context passed will be 0. Unlike external general entities, external parameter entities can only be parsed synchronously.
If the external parameter entity is to be parsed, it must be parsed during the call to the external entity ref handler: the complete sequence of XML_ExternalEntityParserCreate, XML_Parse/XML_ParseBuffer and XML_ParserFree calls must be made during this call. After XML_ExternalEntityParserCreate has been called to create the parser for the external parameter entity (context must be 0 for this call), it is illegal to make any calls on the old parser until XML_ParserFree has been called on the newly created parser. If the library has been compiled without support for parameter entity parsing (ie without XML_DTD being defined), then XML_SetParamEntityParsing will return 0 if parsing of parameter entities is requested; otherwise it will return non-zero. */ int XMLPARSEAPI XML_SetParamEntityParsing(XML_Parser parser, enum XML_ParamEntityParsing parsing); enum XML_Error { XML_ERROR_NONE, XML_ERROR_NO_MEMORY, XML_ERROR_SYNTAX, XML_ERROR_NO_ELEMENTS, XML_ERROR_INVALID_TOKEN, XML_ERROR_UNCLOSED_TOKEN, XML_ERROR_PARTIAL_CHAR, XML_ERROR_TAG_MISMATCH, XML_ERROR_DUPLICATE_ATTRIBUTE, XML_ERROR_JUNK_AFTER_DOC_ELEMENT, XML_ERROR_PARAM_ENTITY_REF, XML_ERROR_UNDEFINED_ENTITY, XML_ERROR_RECURSIVE_ENTITY_REF, XML_ERROR_ASYNC_ENTITY, XML_ERROR_BAD_CHAR_REF, XML_ERROR_BINARY_ENTITY_REF, XML_ERROR_ATTRIBUTE_EXTERNAL_ENTITY_REF, XML_ERROR_MISPLACED_XML_PI, XML_ERROR_UNKNOWN_ENCODING, XML_ERROR_INCORRECT_ENCODING, XML_ERROR_UNCLOSED_CDATA_SECTION, XML_ERROR_EXTERNAL_ENTITY_HANDLING, XML_ERROR_NOT_STANDALONE }; /* If XML_Parse or XML_ParseBuffer have returned 0, then XML_GetErrorCode returns information about the error. */ enum XML_Error XMLPARSEAPI XML_GetErrorCode(XML_Parser parser); /* These functions return information about the current parse location. They may be called when XML_Parse or XML_ParseBuffer return 0; in this case the location is the location of the character at which the error was detected. 
They may also be called from any other callback called to report some parse event; in this case the location is the location of the first of the sequence of characters that generated the event. */ int XMLPARSEAPI XML_GetCurrentLineNumber(XML_Parser parser); int XMLPARSEAPI XML_GetCurrentColumnNumber(XML_Parser parser); long XMLPARSEAPI XML_GetCurrentByteIndex(XML_Parser parser); /* Return the number of bytes in the current event. Returns 0 if the event is in an internal entity. */ int XMLPARSEAPI XML_GetCurrentByteCount(XML_Parser parser); /* For backwards compatibility with previous versions. */ #define XML_GetErrorLineNumber XML_GetCurrentLineNumber #define XML_GetErrorColumnNumber XML_GetCurrentColumnNumber #define XML_GetErrorByteIndex XML_GetCurrentByteIndex /* Frees memory used by the parser. */ void XMLPARSEAPI XML_ParserFree(XML_Parser parser); /* Returns a string describing the error. */ const XML_LChar XMLPARSEAPI *XML_ErrorString(int code); #ifdef __cplusplus } #endif #endif /* not XmlParse_INCLUDED */ swish-e-2.4.7/src/expat/xmltok/0000777000077100017500000000000011166013172013336 500000000000000swish-e-2.4.7/src/expat/xmltok/iasciitab.h0000775000077100017500000000344711166010107015363 00000000000000/* Copyright (c) 1998, 1999 Thai Open Source Software Center Ltd See the file copying.txt for copying permission.
*/ /* Like asciitab.h, except that 0xD has code BT_S rather than BT_CR */ /* 0x00 */ BT_NONXML, BT_NONXML, BT_NONXML, BT_NONXML, /* 0x04 */ BT_NONXML, BT_NONXML, BT_NONXML, BT_NONXML, /* 0x08 */ BT_NONXML, BT_S, BT_LF, BT_NONXML, /* 0x0C */ BT_NONXML, BT_S, BT_NONXML, BT_NONXML, /* 0x10 */ BT_NONXML, BT_NONXML, BT_NONXML, BT_NONXML, /* 0x14 */ BT_NONXML, BT_NONXML, BT_NONXML, BT_NONXML, /* 0x18 */ BT_NONXML, BT_NONXML, BT_NONXML, BT_NONXML, /* 0x1C */ BT_NONXML, BT_NONXML, BT_NONXML, BT_NONXML, /* 0x20 */ BT_S, BT_EXCL, BT_QUOT, BT_NUM, /* 0x24 */ BT_OTHER, BT_PERCNT, BT_AMP, BT_APOS, /* 0x28 */ BT_LPAR, BT_RPAR, BT_AST, BT_PLUS, /* 0x2C */ BT_COMMA, BT_MINUS, BT_NAME, BT_SOL, /* 0x30 */ BT_DIGIT, BT_DIGIT, BT_DIGIT, BT_DIGIT, /* 0x34 */ BT_DIGIT, BT_DIGIT, BT_DIGIT, BT_DIGIT, /* 0x38 */ BT_DIGIT, BT_DIGIT, BT_COLON, BT_SEMI, /* 0x3C */ BT_LT, BT_EQUALS, BT_GT, BT_QUEST, /* 0x40 */ BT_OTHER, BT_HEX, BT_HEX, BT_HEX, /* 0x44 */ BT_HEX, BT_HEX, BT_HEX, BT_NMSTRT, /* 0x48 */ BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, /* 0x4C */ BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, /* 0x50 */ BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, /* 0x54 */ BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, /* 0x58 */ BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, BT_LSQB, /* 0x5C */ BT_OTHER, BT_RSQB, BT_OTHER, BT_NMSTRT, /* 0x60 */ BT_OTHER, BT_HEX, BT_HEX, BT_HEX, /* 0x64 */ BT_HEX, BT_HEX, BT_HEX, BT_NMSTRT, /* 0x68 */ BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, /* 0x6C */ BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, /* 0x70 */ BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, /* 0x74 */ BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, /* 0x78 */ BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, BT_OTHER, /* 0x7C */ BT_VERBAR, BT_OTHER, BT_OTHER, BT_OTHER, swish-e-2.4.7/src/expat/xmltok/xmltok_impl.c0000775000077100017500000011765311166010106015771 00000000000000/* Copyright (c) 1998, 1999 Thai Open Source Software Center Ltd See the file copying.txt for copying permission. 
*/ #ifndef IS_INVALID_CHAR #define IS_INVALID_CHAR(enc, ptr, n) (0) #endif #define INVALID_LEAD_CASE(n, ptr, nextTokPtr) \ case BT_LEAD ## n: \ if (end - ptr < n) \ return XML_TOK_PARTIAL_CHAR; \ if (IS_INVALID_CHAR(enc, ptr, n)) { \ *(nextTokPtr) = (ptr); \ return XML_TOK_INVALID; \ } \ ptr += n; \ break; #define INVALID_CASES(ptr, nextTokPtr) \ INVALID_LEAD_CASE(2, ptr, nextTokPtr) \ INVALID_LEAD_CASE(3, ptr, nextTokPtr) \ INVALID_LEAD_CASE(4, ptr, nextTokPtr) \ case BT_NONXML: \ case BT_MALFORM: \ case BT_TRAIL: \ *(nextTokPtr) = (ptr); \ return XML_TOK_INVALID; #define CHECK_NAME_CASE(n, enc, ptr, end, nextTokPtr) \ case BT_LEAD ## n: \ if (end - ptr < n) \ return XML_TOK_PARTIAL_CHAR; \ if (!IS_NAME_CHAR(enc, ptr, n)) { \ *nextTokPtr = ptr; \ return XML_TOK_INVALID; \ } \ ptr += n; \ break; #define CHECK_NAME_CASES(enc, ptr, end, nextTokPtr) \ case BT_NONASCII: \ if (!IS_NAME_CHAR_MINBPC(enc, ptr)) { \ *nextTokPtr = ptr; \ return XML_TOK_INVALID; \ } \ case BT_NMSTRT: \ case BT_HEX: \ case BT_DIGIT: \ case BT_NAME: \ case BT_MINUS: \ ptr += MINBPC(enc); \ break; \ CHECK_NAME_CASE(2, enc, ptr, end, nextTokPtr) \ CHECK_NAME_CASE(3, enc, ptr, end, nextTokPtr) \ CHECK_NAME_CASE(4, enc, ptr, end, nextTokPtr) #define CHECK_NMSTRT_CASE(n, enc, ptr, end, nextTokPtr) \ case BT_LEAD ## n: \ if (end - ptr < n) \ return XML_TOK_PARTIAL_CHAR; \ if (!IS_NMSTRT_CHAR(enc, ptr, n)) { \ *nextTokPtr = ptr; \ return XML_TOK_INVALID; \ } \ ptr += n; \ break; #define CHECK_NMSTRT_CASES(enc, ptr, end, nextTokPtr) \ case BT_NONASCII: \ if (!IS_NMSTRT_CHAR_MINBPC(enc, ptr)) { \ *nextTokPtr = ptr; \ return XML_TOK_INVALID; \ } \ case BT_NMSTRT: \ case BT_HEX: \ ptr += MINBPC(enc); \ break; \ CHECK_NMSTRT_CASE(2, enc, ptr, end, nextTokPtr) \ CHECK_NMSTRT_CASE(3, enc, ptr, end, nextTokPtr) \ CHECK_NMSTRT_CASE(4, enc, ptr, end, nextTokPtr) #ifndef PREFIX #define PREFIX(ident) ident #endif /* ptr points to character following " */ switch (BYTE_TYPE(enc, ptr + MINBPC(enc))) { case BT_S: 
case BT_CR: case BT_LF: case BT_PERCNT: *nextTokPtr = ptr; return XML_TOK_INVALID; } /* fall through */ case BT_S: case BT_CR: case BT_LF: *nextTokPtr = ptr; return XML_TOK_DECL_OPEN; case BT_NMSTRT: case BT_HEX: ptr += MINBPC(enc); break; default: *nextTokPtr = ptr; return XML_TOK_INVALID; } } return XML_TOK_PARTIAL; } static int PREFIX(checkPiTarget)(const ENCODING *enc, const char *ptr, const char *end, int *tokPtr) { int upper = 0; *tokPtr = XML_TOK_PI; if (end - ptr != MINBPC(enc)*3) return 1; switch (BYTE_TO_ASCII(enc, ptr)) { case ASCII_x: break; case ASCII_X: upper = 1; break; default: return 1; } ptr += MINBPC(enc); switch (BYTE_TO_ASCII(enc, ptr)) { case ASCII_m: break; case ASCII_M: upper = 1; break; default: return 1; } ptr += MINBPC(enc); switch (BYTE_TO_ASCII(enc, ptr)) { case ASCII_l: break; case ASCII_L: upper = 1; break; default: return 1; } if (upper) return 0; *tokPtr = XML_TOK_XML_DECL; return 1; } /* ptr points to character following " 1) { size_t n = end - ptr; if (n & (MINBPC(enc) - 1)) { n &= ~(MINBPC(enc) - 1); if (n == 0) return XML_TOK_PARTIAL; end = ptr + n; } } switch (BYTE_TYPE(enc, ptr)) { case BT_RSQB: ptr += MINBPC(enc); if (ptr == end) return XML_TOK_PARTIAL; if (!CHAR_MATCHES(enc, ptr, ASCII_RSQB)) break; ptr += MINBPC(enc); if (ptr == end) return XML_TOK_PARTIAL; if (!CHAR_MATCHES(enc, ptr, ASCII_GT)) { ptr -= MINBPC(enc); break; } *nextTokPtr = ptr + MINBPC(enc); return XML_TOK_CDATA_SECT_CLOSE; case BT_CR: ptr += MINBPC(enc); if (ptr == end) return XML_TOK_PARTIAL; if (BYTE_TYPE(enc, ptr) == BT_LF) ptr += MINBPC(enc); *nextTokPtr = ptr; return XML_TOK_DATA_NEWLINE; case BT_LF: *nextTokPtr = ptr + MINBPC(enc); return XML_TOK_DATA_NEWLINE; INVALID_CASES(ptr, nextTokPtr) default: ptr += MINBPC(enc); break; } while (ptr != end) { switch (BYTE_TYPE(enc, ptr)) { #define LEAD_CASE(n) \ case BT_LEAD ## n: \ if (end - ptr < n || IS_INVALID_CHAR(enc, ptr, n)) { \ *nextTokPtr = ptr; \ return XML_TOK_DATA_CHARS; \ } \ ptr += n; \ break; 
LEAD_CASE(2) LEAD_CASE(3) LEAD_CASE(4) #undef LEAD_CASE case BT_NONXML: case BT_MALFORM: case BT_TRAIL: case BT_CR: case BT_LF: case BT_RSQB: *nextTokPtr = ptr; return XML_TOK_DATA_CHARS; default: ptr += MINBPC(enc); break; } } *nextTokPtr = ptr; return XML_TOK_DATA_CHARS; } /* ptr points to character following " 1) { size_t n = end - ptr; if (n & (MINBPC(enc) - 1)) { n &= ~(MINBPC(enc) - 1); if (n == 0) return XML_TOK_PARTIAL; end = ptr + n; } } switch (BYTE_TYPE(enc, ptr)) { case BT_LT: return PREFIX(scanLt)(enc, ptr + MINBPC(enc), end, nextTokPtr); case BT_AMP: return PREFIX(scanRef)(enc, ptr + MINBPC(enc), end, nextTokPtr); case BT_CR: ptr += MINBPC(enc); if (ptr == end) return XML_TOK_TRAILING_CR; if (BYTE_TYPE(enc, ptr) == BT_LF) ptr += MINBPC(enc); *nextTokPtr = ptr; return XML_TOK_DATA_NEWLINE; case BT_LF: *nextTokPtr = ptr + MINBPC(enc); return XML_TOK_DATA_NEWLINE; case BT_RSQB: ptr += MINBPC(enc); if (ptr == end) return XML_TOK_TRAILING_RSQB; if (!CHAR_MATCHES(enc, ptr, ASCII_RSQB)) break; ptr += MINBPC(enc); if (ptr == end) return XML_TOK_TRAILING_RSQB; if (!CHAR_MATCHES(enc, ptr, ASCII_GT)) { ptr -= MINBPC(enc); break; } *nextTokPtr = ptr; return XML_TOK_INVALID; INVALID_CASES(ptr, nextTokPtr) default: ptr += MINBPC(enc); break; } while (ptr != end) { switch (BYTE_TYPE(enc, ptr)) { #define LEAD_CASE(n) \ case BT_LEAD ## n: \ if (end - ptr < n || IS_INVALID_CHAR(enc, ptr, n)) { \ *nextTokPtr = ptr; \ return XML_TOK_DATA_CHARS; \ } \ ptr += n; \ break; LEAD_CASE(2) LEAD_CASE(3) LEAD_CASE(4) #undef LEAD_CASE case BT_RSQB: if (ptr + MINBPC(enc) != end) { if (!CHAR_MATCHES(enc, ptr + MINBPC(enc), ASCII_RSQB)) { ptr += MINBPC(enc); break; } if (ptr + 2*MINBPC(enc) != end) { if (!CHAR_MATCHES(enc, ptr + 2*MINBPC(enc), ASCII_GT)) { ptr += MINBPC(enc); break; } *nextTokPtr = ptr + 2*MINBPC(enc); return XML_TOK_INVALID; } } /* fall through */ case BT_AMP: case BT_LT: case BT_NONXML: case BT_MALFORM: case BT_TRAIL: case BT_CR: case BT_LF: *nextTokPtr = ptr; 
return XML_TOK_DATA_CHARS; default: ptr += MINBPC(enc); break; } } *nextTokPtr = ptr; return XML_TOK_DATA_CHARS; } /* ptr points to character following "%" */ static int PREFIX(scanPercent)(const ENCODING *enc, const char *ptr, const char *end, const char **nextTokPtr) { if (ptr == end) return XML_TOK_PARTIAL; switch (BYTE_TYPE(enc, ptr)) { CHECK_NMSTRT_CASES(enc, ptr, end, nextTokPtr) case BT_S: case BT_LF: case BT_CR: case BT_PERCNT: *nextTokPtr = ptr; return XML_TOK_PERCENT; default: *nextTokPtr = ptr; return XML_TOK_INVALID; } while (ptr != end) { switch (BYTE_TYPE(enc, ptr)) { CHECK_NAME_CASES(enc, ptr, end, nextTokPtr) case BT_SEMI: *nextTokPtr = ptr + MINBPC(enc); return XML_TOK_PARAM_ENTITY_REF; default: *nextTokPtr = ptr; return XML_TOK_INVALID; } } return XML_TOK_PARTIAL; } static int PREFIX(scanPoundName)(const ENCODING *enc, const char *ptr, const char *end, const char **nextTokPtr) { if (ptr == end) return XML_TOK_PARTIAL; switch (BYTE_TYPE(enc, ptr)) { CHECK_NMSTRT_CASES(enc, ptr, end, nextTokPtr) default: *nextTokPtr = ptr; return XML_TOK_INVALID; } while (ptr != end) { switch (BYTE_TYPE(enc, ptr)) { CHECK_NAME_CASES(enc, ptr, end, nextTokPtr) case BT_CR: case BT_LF: case BT_S: case BT_RPAR: case BT_GT: case BT_PERCNT: case BT_VERBAR: *nextTokPtr = ptr; return XML_TOK_POUND_NAME; default: *nextTokPtr = ptr; return XML_TOK_INVALID; } } return -XML_TOK_POUND_NAME; } static int PREFIX(scanLit)(int open, const ENCODING *enc, const char *ptr, const char *end, const char **nextTokPtr) { while (ptr != end) { int t = BYTE_TYPE(enc, ptr); switch (t) { INVALID_CASES(ptr, nextTokPtr) case BT_QUOT: case BT_APOS: ptr += MINBPC(enc); if (t != open) break; if (ptr == end) return -XML_TOK_LITERAL; *nextTokPtr = ptr; switch (BYTE_TYPE(enc, ptr)) { case BT_S: case BT_CR: case BT_LF: case BT_GT: case BT_PERCNT: case BT_LSQB: return XML_TOK_LITERAL; default: return XML_TOK_INVALID; } default: ptr += MINBPC(enc); break; } } return XML_TOK_PARTIAL; } static int 
PREFIX(prologTok)(const ENCODING *enc, const char *ptr, const char *end, const char **nextTokPtr) { int tok; if (ptr == end) return XML_TOK_NONE; if (MINBPC(enc) > 1) { size_t n = end - ptr; if (n & (MINBPC(enc) - 1)) { n &= ~(MINBPC(enc) - 1); if (n == 0) return XML_TOK_PARTIAL; end = ptr + n; } } switch (BYTE_TYPE(enc, ptr)) { case BT_QUOT: return PREFIX(scanLit)(BT_QUOT, enc, ptr + MINBPC(enc), end, nextTokPtr); case BT_APOS: return PREFIX(scanLit)(BT_APOS, enc, ptr + MINBPC(enc), end, nextTokPtr); case BT_LT: { ptr += MINBPC(enc); if (ptr == end) return XML_TOK_PARTIAL; switch (BYTE_TYPE(enc, ptr)) { case BT_EXCL: return PREFIX(scanDecl)(enc, ptr + MINBPC(enc), end, nextTokPtr); case BT_QUEST: return PREFIX(scanPi)(enc, ptr + MINBPC(enc), end, nextTokPtr); case BT_NMSTRT: case BT_HEX: case BT_NONASCII: case BT_LEAD2: case BT_LEAD3: case BT_LEAD4: *nextTokPtr = ptr - MINBPC(enc); return XML_TOK_INSTANCE_START; } *nextTokPtr = ptr; return XML_TOK_INVALID; } case BT_CR: if (ptr + MINBPC(enc) == end) return -XML_TOK_PROLOG_S; /* fall through */ case BT_S: case BT_LF: for (;;) { ptr += MINBPC(enc); if (ptr == end) break; switch (BYTE_TYPE(enc, ptr)) { case BT_S: case BT_LF: break; case BT_CR: /* don't split CR/LF pair */ if (ptr + MINBPC(enc) != end) break; /* fall through */ default: *nextTokPtr = ptr; return XML_TOK_PROLOG_S; } } *nextTokPtr = ptr; return XML_TOK_PROLOG_S; case BT_PERCNT: return PREFIX(scanPercent)(enc, ptr + MINBPC(enc), end, nextTokPtr); case BT_COMMA: *nextTokPtr = ptr + MINBPC(enc); return XML_TOK_COMMA; case BT_LSQB: *nextTokPtr = ptr + MINBPC(enc); return XML_TOK_OPEN_BRACKET; case BT_RSQB: ptr += MINBPC(enc); if (ptr == end) return -XML_TOK_CLOSE_BRACKET; if (CHAR_MATCHES(enc, ptr, ASCII_RSQB)) { if (ptr + MINBPC(enc) == end) return XML_TOK_PARTIAL; if (CHAR_MATCHES(enc, ptr + MINBPC(enc), ASCII_GT)) { *nextTokPtr = ptr + 2*MINBPC(enc); return XML_TOK_COND_SECT_CLOSE; } } *nextTokPtr = ptr; return XML_TOK_CLOSE_BRACKET; case BT_LPAR: 
*nextTokPtr = ptr + MINBPC(enc); return XML_TOK_OPEN_PAREN; case BT_RPAR: ptr += MINBPC(enc); if (ptr == end) return -XML_TOK_CLOSE_PAREN; switch (BYTE_TYPE(enc, ptr)) { case BT_AST: *nextTokPtr = ptr + MINBPC(enc); return XML_TOK_CLOSE_PAREN_ASTERISK; case BT_QUEST: *nextTokPtr = ptr + MINBPC(enc); return XML_TOK_CLOSE_PAREN_QUESTION; case BT_PLUS: *nextTokPtr = ptr + MINBPC(enc); return XML_TOK_CLOSE_PAREN_PLUS; case BT_CR: case BT_LF: case BT_S: case BT_GT: case BT_COMMA: case BT_VERBAR: case BT_RPAR: *nextTokPtr = ptr; return XML_TOK_CLOSE_PAREN; } *nextTokPtr = ptr; return XML_TOK_INVALID; case BT_VERBAR: *nextTokPtr = ptr + MINBPC(enc); return XML_TOK_OR; case BT_GT: *nextTokPtr = ptr + MINBPC(enc); return XML_TOK_DECL_CLOSE; case BT_NUM: return PREFIX(scanPoundName)(enc, ptr + MINBPC(enc), end, nextTokPtr); #define LEAD_CASE(n) \ case BT_LEAD ## n: \ if (end - ptr < n) \ return XML_TOK_PARTIAL_CHAR; \ if (IS_NMSTRT_CHAR(enc, ptr, n)) { \ ptr += n; \ tok = XML_TOK_NAME; \ break; \ } \ if (IS_NAME_CHAR(enc, ptr, n)) { \ ptr += n; \ tok = XML_TOK_NMTOKEN; \ break; \ } \ *nextTokPtr = ptr; \ return XML_TOK_INVALID; LEAD_CASE(2) LEAD_CASE(3) LEAD_CASE(4) #undef LEAD_CASE case BT_NMSTRT: case BT_HEX: tok = XML_TOK_NAME; ptr += MINBPC(enc); break; case BT_DIGIT: case BT_NAME: case BT_MINUS: #ifdef XML_NS case BT_COLON: #endif tok = XML_TOK_NMTOKEN; ptr += MINBPC(enc); break; case BT_NONASCII: if (IS_NMSTRT_CHAR_MINBPC(enc, ptr)) { ptr += MINBPC(enc); tok = XML_TOK_NAME; break; } if (IS_NAME_CHAR_MINBPC(enc, ptr)) { ptr += MINBPC(enc); tok = XML_TOK_NMTOKEN; break; } /* fall through */ default: *nextTokPtr = ptr; return XML_TOK_INVALID; } while (ptr != end) { switch (BYTE_TYPE(enc, ptr)) { CHECK_NAME_CASES(enc, ptr, end, nextTokPtr) case BT_GT: case BT_RPAR: case BT_COMMA: case BT_VERBAR: case BT_LSQB: case BT_PERCNT: case BT_S: case BT_CR: case BT_LF: *nextTokPtr = ptr; return tok; #ifdef XML_NS case BT_COLON: ptr += MINBPC(enc); switch (tok) { case XML_TOK_NAME: 
if (ptr == end) return XML_TOK_PARTIAL; tok = XML_TOK_PREFIXED_NAME; switch (BYTE_TYPE(enc, ptr)) { CHECK_NAME_CASES(enc, ptr, end, nextTokPtr) default: tok = XML_TOK_NMTOKEN; break; } break; case XML_TOK_PREFIXED_NAME: tok = XML_TOK_NMTOKEN; break; } break; #endif case BT_PLUS: if (tok == XML_TOK_NMTOKEN) { *nextTokPtr = ptr; return XML_TOK_INVALID; } *nextTokPtr = ptr + MINBPC(enc); return XML_TOK_NAME_PLUS; case BT_AST: if (tok == XML_TOK_NMTOKEN) { *nextTokPtr = ptr; return XML_TOK_INVALID; } *nextTokPtr = ptr + MINBPC(enc); return XML_TOK_NAME_ASTERISK; case BT_QUEST: if (tok == XML_TOK_NMTOKEN) { *nextTokPtr = ptr; return XML_TOK_INVALID; } *nextTokPtr = ptr + MINBPC(enc); return XML_TOK_NAME_QUESTION; default: *nextTokPtr = ptr; return XML_TOK_INVALID; } } return -tok; } static int PREFIX(attributeValueTok)(const ENCODING *enc, const char *ptr, const char *end, const char **nextTokPtr) { const char *start; if (ptr == end) return XML_TOK_NONE; start = ptr; while (ptr != end) { switch (BYTE_TYPE(enc, ptr)) { #define LEAD_CASE(n) \ case BT_LEAD ## n: ptr += n; break; LEAD_CASE(2) LEAD_CASE(3) LEAD_CASE(4) #undef LEAD_CASE case BT_AMP: if (ptr == start) return PREFIX(scanRef)(enc, ptr + MINBPC(enc), end, nextTokPtr); *nextTokPtr = ptr; return XML_TOK_DATA_CHARS; case BT_LT: /* this is for inside entity references */ *nextTokPtr = ptr; return XML_TOK_INVALID; case BT_LF: if (ptr == start) { *nextTokPtr = ptr + MINBPC(enc); return XML_TOK_DATA_NEWLINE; } *nextTokPtr = ptr; return XML_TOK_DATA_CHARS; case BT_CR: if (ptr == start) { ptr += MINBPC(enc); if (ptr == end) return XML_TOK_TRAILING_CR; if (BYTE_TYPE(enc, ptr) == BT_LF) ptr += MINBPC(enc); *nextTokPtr = ptr; return XML_TOK_DATA_NEWLINE; } *nextTokPtr = ptr; return XML_TOK_DATA_CHARS; case BT_S: if (ptr == start) { *nextTokPtr = ptr + MINBPC(enc); return XML_TOK_ATTRIBUTE_VALUE_S; } *nextTokPtr = ptr; return XML_TOK_DATA_CHARS; default: ptr += MINBPC(enc); break; } } *nextTokPtr = ptr; return 
XML_TOK_DATA_CHARS; } static int PREFIX(entityValueTok)(const ENCODING *enc, const char *ptr, const char *end, const char **nextTokPtr) { const char *start; if (ptr == end) return XML_TOK_NONE; start = ptr; while (ptr != end) { switch (BYTE_TYPE(enc, ptr)) { #define LEAD_CASE(n) \ case BT_LEAD ## n: ptr += n; break; LEAD_CASE(2) LEAD_CASE(3) LEAD_CASE(4) #undef LEAD_CASE case BT_AMP: if (ptr == start) return PREFIX(scanRef)(enc, ptr + MINBPC(enc), end, nextTokPtr); *nextTokPtr = ptr; return XML_TOK_DATA_CHARS; case BT_PERCNT: if (ptr == start) return PREFIX(scanPercent)(enc, ptr + MINBPC(enc), end, nextTokPtr); *nextTokPtr = ptr; return XML_TOK_DATA_CHARS; case BT_LF: if (ptr == start) { *nextTokPtr = ptr + MINBPC(enc); return XML_TOK_DATA_NEWLINE; } *nextTokPtr = ptr; return XML_TOK_DATA_CHARS; case BT_CR: if (ptr == start) { ptr += MINBPC(enc); if (ptr == end) return XML_TOK_TRAILING_CR; if (BYTE_TYPE(enc, ptr) == BT_LF) ptr += MINBPC(enc); *nextTokPtr = ptr; return XML_TOK_DATA_NEWLINE; } *nextTokPtr = ptr; return XML_TOK_DATA_CHARS; default: ptr += MINBPC(enc); break; } } *nextTokPtr = ptr; return XML_TOK_DATA_CHARS; } #ifdef XML_DTD static int PREFIX(ignoreSectionTok)(const ENCODING *enc, const char *ptr, const char *end, const char **nextTokPtr) { int level = 0; if (MINBPC(enc) > 1) { size_t n = end - ptr; if (n & (MINBPC(enc) - 1)) { n &= ~(MINBPC(enc) - 1); end = ptr + n; } } while (ptr != end) { switch (BYTE_TYPE(enc, ptr)) { INVALID_CASES(ptr, nextTokPtr) case BT_LT: if ((ptr += MINBPC(enc)) == end) return XML_TOK_PARTIAL; if (CHAR_MATCHES(enc, ptr, ASCII_EXCL)) { if ((ptr += MINBPC(enc)) == end) return XML_TOK_PARTIAL; if (CHAR_MATCHES(enc, ptr, ASCII_LSQB)) { ++level; ptr += MINBPC(enc); } } break; case BT_RSQB: if ((ptr += MINBPC(enc)) == end) return XML_TOK_PARTIAL; if (CHAR_MATCHES(enc, ptr, ASCII_RSQB)) { if ((ptr += MINBPC(enc)) == end) return XML_TOK_PARTIAL; if (CHAR_MATCHES(enc, ptr, ASCII_GT)) { ptr += MINBPC(enc); if (level == 0) { *nextTokPtr 
= ptr; return XML_TOK_IGNORE_SECT; } --level; } } break; default: ptr += MINBPC(enc); break; } } return XML_TOK_PARTIAL; } #endif /* XML_DTD */ static int PREFIX(isPublicId)(const ENCODING *enc, const char *ptr, const char *end, const char **badPtr) { ptr += MINBPC(enc); end -= MINBPC(enc); for (; ptr != end; ptr += MINBPC(enc)) { switch (BYTE_TYPE(enc, ptr)) { case BT_DIGIT: case BT_HEX: case BT_MINUS: case BT_APOS: case BT_LPAR: case BT_RPAR: case BT_PLUS: case BT_COMMA: case BT_SOL: case BT_EQUALS: case BT_QUEST: case BT_CR: case BT_LF: case BT_SEMI: case BT_EXCL: case BT_AST: case BT_PERCNT: case BT_NUM: #ifdef XML_NS case BT_COLON: #endif break; case BT_S: if (CHAR_MATCHES(enc, ptr, ASCII_TAB)) { *badPtr = ptr; return 0; } break; case BT_NAME: case BT_NMSTRT: if (!(BYTE_TO_ASCII(enc, ptr) & ~0x7f)) break; default: switch (BYTE_TO_ASCII(enc, ptr)) { case 0x24: /* $ */ case 0x40: /* @ */ break; default: *badPtr = ptr; return 0; } break; } } return 1; } /* This must only be called for a well-formed start-tag or empty element tag. Returns the number of attributes. Pointers to the first attsMax attributes are stored in atts. 
*/ static int PREFIX(getAtts)(const ENCODING *enc, const char *ptr, int attsMax, ATTRIBUTE *atts) { enum { other, inName, inValue } state = inName; int nAtts = 0; int open = 0; /* defined when state == inValue; initialization just to shut up compilers */ for (ptr += MINBPC(enc);; ptr += MINBPC(enc)) { switch (BYTE_TYPE(enc, ptr)) { #define START_NAME \ if (state == other) { \ if (nAtts < attsMax) { \ atts[nAtts].name = ptr; \ atts[nAtts].normalized = 1; \ } \ state = inName; \ } #define LEAD_CASE(n) \ case BT_LEAD ## n: START_NAME ptr += (n - MINBPC(enc)); break; LEAD_CASE(2) LEAD_CASE(3) LEAD_CASE(4) #undef LEAD_CASE case BT_NONASCII: case BT_NMSTRT: case BT_HEX: START_NAME break; #undef START_NAME case BT_QUOT: if (state != inValue) { if (nAtts < attsMax) atts[nAtts].valuePtr = ptr + MINBPC(enc); state = inValue; open = BT_QUOT; } else if (open == BT_QUOT) { state = other; if (nAtts < attsMax) atts[nAtts].valueEnd = ptr; nAtts++; } break; case BT_APOS: if (state != inValue) { if (nAtts < attsMax) atts[nAtts].valuePtr = ptr + MINBPC(enc); state = inValue; open = BT_APOS; } else if (open == BT_APOS) { state = other; if (nAtts < attsMax) atts[nAtts].valueEnd = ptr; nAtts++; } break; case BT_AMP: if (nAtts < attsMax) atts[nAtts].normalized = 0; break; case BT_S: if (state == inName) state = other; else if (state == inValue && nAtts < attsMax && atts[nAtts].normalized && (ptr == atts[nAtts].valuePtr || BYTE_TO_ASCII(enc, ptr) != ASCII_SPACE || BYTE_TO_ASCII(enc, ptr + MINBPC(enc)) == ASCII_SPACE || BYTE_TYPE(enc, ptr + MINBPC(enc)) == open)) atts[nAtts].normalized = 0; break; case BT_CR: case BT_LF: /* This case ensures that the first attribute name is counted. Apart from that we could just change state on the quote. 
*/ if (state == inName) state = other; else if (state == inValue && nAtts < attsMax) atts[nAtts].normalized = 0; break; case BT_GT: case BT_SOL: if (state != inValue) return nAtts; break; default: break; } } /* not reached */ } static int PREFIX(charRefNumber)(const ENCODING *enc, const char *ptr) { int result = 0; /* skip &# */ ptr += 2*MINBPC(enc); if (CHAR_MATCHES(enc, ptr, ASCII_x)) { for (ptr += MINBPC(enc); !CHAR_MATCHES(enc, ptr, ASCII_SEMI); ptr += MINBPC(enc)) { int c = BYTE_TO_ASCII(enc, ptr); switch (c) { case ASCII_0: case ASCII_1: case ASCII_2: case ASCII_3: case ASCII_4: case ASCII_5: case ASCII_6: case ASCII_7: case ASCII_8: case ASCII_9: result <<= 4; result |= (c - ASCII_0); break; case ASCII_A: case ASCII_B: case ASCII_C: case ASCII_D: case ASCII_E: case ASCII_F: result <<= 4; result += 10 + (c - ASCII_A); break; case ASCII_a: case ASCII_b: case ASCII_c: case ASCII_d: case ASCII_e: case ASCII_f: result <<= 4; result += 10 + (c - ASCII_a); break; } if (result >= 0x110000) return -1; } } else { for (; !CHAR_MATCHES(enc, ptr, ASCII_SEMI); ptr += MINBPC(enc)) { int c = BYTE_TO_ASCII(enc, ptr); result *= 10; result += (c - ASCII_0); if (result >= 0x110000) return -1; } } return checkCharRefNumber(result); } static int PREFIX(predefinedEntityName)(const ENCODING *enc, const char *ptr, const char *end) { switch ((end - ptr)/MINBPC(enc)) { case 2: if (CHAR_MATCHES(enc, ptr + MINBPC(enc), ASCII_t)) { switch (BYTE_TO_ASCII(enc, ptr)) { case ASCII_l: return ASCII_LT; case ASCII_g: return ASCII_GT; } } break; case 3: if (CHAR_MATCHES(enc, ptr, ASCII_a)) { ptr += MINBPC(enc); if (CHAR_MATCHES(enc, ptr, ASCII_m)) { ptr += MINBPC(enc); if (CHAR_MATCHES(enc, ptr, ASCII_p)) return ASCII_AMP; } } break; case 4: switch (BYTE_TO_ASCII(enc, ptr)) { case ASCII_q: ptr += MINBPC(enc); if (CHAR_MATCHES(enc, ptr, ASCII_u)) { ptr += MINBPC(enc); if (CHAR_MATCHES(enc, ptr, ASCII_o)) { ptr += MINBPC(enc); if (CHAR_MATCHES(enc, ptr, ASCII_t)) return ASCII_QUOT; } } break; case 
ASCII_a: ptr += MINBPC(enc); if (CHAR_MATCHES(enc, ptr, ASCII_p)) { ptr += MINBPC(enc); if (CHAR_MATCHES(enc, ptr, ASCII_o)) { ptr += MINBPC(enc); if (CHAR_MATCHES(enc, ptr, ASCII_s)) return ASCII_APOS; } } break; } } return 0; } static int PREFIX(sameName)(const ENCODING *enc, const char *ptr1, const char *ptr2) { for (;;) { switch (BYTE_TYPE(enc, ptr1)) { #define LEAD_CASE(n) \ case BT_LEAD ## n: \ if (*ptr1++ != *ptr2++) \ return 0; LEAD_CASE(4) LEAD_CASE(3) LEAD_CASE(2) #undef LEAD_CASE /* fall through */ if (*ptr1++ != *ptr2++) return 0; break; case BT_NONASCII: case BT_NMSTRT: #ifdef XML_NS case BT_COLON: #endif case BT_HEX: case BT_DIGIT: case BT_NAME: case BT_MINUS: if (*ptr2++ != *ptr1++) return 0; if (MINBPC(enc) > 1) { if (*ptr2++ != *ptr1++) return 0; if (MINBPC(enc) > 2) { if (*ptr2++ != *ptr1++) return 0; if (MINBPC(enc) > 3) { if (*ptr2++ != *ptr1++) return 0; } } } break; default: if (MINBPC(enc) == 1 && *ptr1 == *ptr2) return 1; switch (BYTE_TYPE(enc, ptr2)) { case BT_LEAD2: case BT_LEAD3: case BT_LEAD4: case BT_NONASCII: case BT_NMSTRT: #ifdef XML_NS case BT_COLON: #endif case BT_HEX: case BT_DIGIT: case BT_NAME: case BT_MINUS: return 0; default: return 1; } } } /* not reached */ } static int PREFIX(nameMatchesAscii)(const ENCODING *enc, const char *ptr1, const char *end1, const char *ptr2) { for (; *ptr2; ptr1 += MINBPC(enc), ptr2++) { if (ptr1 == end1) return 0; if (!CHAR_MATCHES(enc, ptr1, *ptr2)) return 0; } return ptr1 == end1; } static int PREFIX(nameLength)(const ENCODING *enc, const char *ptr) { const char *start = ptr; for (;;) { switch (BYTE_TYPE(enc, ptr)) { #define LEAD_CASE(n) \ case BT_LEAD ## n: ptr += n; break; LEAD_CASE(2) LEAD_CASE(3) LEAD_CASE(4) #undef LEAD_CASE case BT_NONASCII: case BT_NMSTRT: #ifdef XML_NS case BT_COLON: #endif case BT_HEX: case BT_DIGIT: case BT_NAME: case BT_MINUS: ptr += MINBPC(enc); break; default: return ptr - start; } } } static const char *PREFIX(skipS)(const ENCODING *enc, const char *ptr) { for (;;) 
{ switch (BYTE_TYPE(enc, ptr)) { case BT_LF: case BT_CR: case BT_S: ptr += MINBPC(enc); break; default: return ptr; } } } static void PREFIX(updatePosition)(const ENCODING *enc, const char *ptr, const char *end, POSITION *pos) { /* Compare with < rather than != so that a multibyte lead near the end of the buffer cannot step ptr past end and keep the loop running. */ while (ptr < end) { switch (BYTE_TYPE(enc, ptr)) { #define LEAD_CASE(n) \ case BT_LEAD ## n: \ ptr += n; \ break; LEAD_CASE(2) LEAD_CASE(3) LEAD_CASE(4) #undef LEAD_CASE case BT_LF: pos->columnNumber = (unsigned)-1; pos->lineNumber++; ptr += MINBPC(enc); break; case BT_CR: pos->lineNumber++; ptr += MINBPC(enc); if (ptr != end && BYTE_TYPE(enc, ptr) == BT_LF) ptr += MINBPC(enc); pos->columnNumber = (unsigned)-1; break; default: ptr += MINBPC(enc); break; } pos->columnNumber++; } } #undef DO_LEAD_CASE #undef MULTIBYTE_CASES #undef INVALID_CASES #undef CHECK_NAME_CASE #undef CHECK_NAME_CASES #undef CHECK_NMSTRT_CASE #undef CHECK_NMSTRT_CASES swish-e-2.4.7/src/expat/xmltok/nametab.h0000775000077100017500000001561211166010106015036 00000000000000static const unsigned namingBitmap[] = { 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0x00000000, 0x04000000, 0x87FFFFFE, 0x07FFFFFE, 0x00000000, 0x00000000, 0xFF7FFFFF, 0xFF7FFFFF, 0xFFFFFFFF, 0x7FF3FFFF, 0xFFFFFDFE, 0x7FFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFE00F, 0xFC31FFFF, 0x00FFFFFF, 0x00000000, 0xFFFF0000, 0xFFFFFFFF, 0xFFFFFFFF, 0xF80001FF, 0x00000003, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0xFFFFD740, 0xFFFFFFFB, 0x547F7FFF, 0x000FFFFD, 0xFFFFDFFE, 0xFFFFFFFF, 0xDFFEFFFF, 0xFFFFFFFF, 0xFFFF0003, 0xFFFFFFFF, 0xFFFF199F, 0x033FCFFF, 0x00000000, 0xFFFE0000, 0x027FFFFF, 0xFFFFFFFE, 0x0000007F, 0x00000000, 0xFFFF0000, 0x000707FF, 0x00000000, 0x07FFFFFE, 0x000007FE, 0xFFFE0000, 0xFFFFFFFF, 0x7CFFFFFF, 0x002F7FFF, 0x00000060, 0xFFFFFFE0, 0x23FFFFFF, 0xFF000000, 0x00000003, 0xFFF99FE0, 0x03C5FDFF, 0xB0000000, 0x00030003, 0xFFF987E0, 
0x036DFDFF, 0x5E000000, 0x001C0000, 0xFFFBAFE0, 0x23EDFDFF, 0x00000000, 0x00000001, 0xFFF99FE0, 0x23CDFDFF, 0xB0000000, 0x00000003, 0xD63DC7E0, 0x03BFC718, 0x00000000, 0x00000000, 0xFFFDDFE0, 0x03EFFDFF, 0x00000000, 0x00000003, 0xFFFDDFE0, 0x03EFFDFF, 0x40000000, 0x00000003, 0xFFFDDFE0, 0x03FFFDFF, 0x00000000, 0x00000003, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0xFFFFFFFE, 0x000D7FFF, 0x0000003F, 0x00000000, 0xFEF02596, 0x200D6CAE, 0x0000001F, 0x00000000, 0x00000000, 0x00000000, 0xFFFFFEFF, 0x000003FF, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0xFFFFFFFF, 0xFFFF003F, 0x007FFFFF, 0x0007DAED, 0x50000000, 0x82315001, 0x002C62AB, 0x40000000, 0xF580C900, 0x00000007, 0x02010800, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0x0FFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0x03FFFFFF, 0x3F3FFFFF, 0xFFFFFFFF, 0xAAFF3F3F, 0x3FFFFFFF, 0xFFFFFFFF, 0x5FDFFFFF, 0x0FCF1FDC, 0x1FDC1FFF, 0x00000000, 0x00004C40, 0x00000000, 0x00000000, 0x00000007, 0x00000000, 0x00000000, 0x00000000, 0x00000080, 0x000003FE, 0xFFFFFFFE, 0xFFFFFFFF, 0x001FFFFF, 0xFFFFFFFE, 0xFFFFFFFF, 0x07FFFFFF, 0xFFFFFFE0, 0x00001FFF, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0x0000003F, 0x00000000, 0x00000000, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0x0000000F, 0x00000000, 0x00000000, 0x00000000, 0x07FF6000, 0x87FFFFFE, 0x07FFFFFE, 0x00000000, 0x00800000, 0xFF7FFFFF, 0xFF7FFFFF, 0x00FFFFFF, 0x00000000, 0xFFFF0000, 0xFFFFFFFF, 0xFFFFFFFF, 0xF80001FF, 0x00030003, 0x00000000, 0xFFFFFFFF, 0xFFFFFFFF, 0x0000003F, 0x00000003, 0xFFFFD7C0, 0xFFFFFFFB, 0x547F7FFF, 0x000FFFFD, 0xFFFFDFFE, 0xFFFFFFFF, 0xDFFEFFFF, 0xFFFFFFFF, 0xFFFF007B, 0xFFFFFFFF, 0xFFFF199F, 0x033FCFFF, 0x00000000, 0xFFFE0000, 0x027FFFFF, 0xFFFFFFFE, 0xFFFE007F, 0xBBFFFFFB, 0xFFFF0016, 0x000707FF, 0x00000000, 0x07FFFFFE, 0x0007FFFF, 0xFFFF03FF, 0xFFFFFFFF, 0x7CFFFFFF, 0xFFEF7FFF, 
0x03FF3DFF, 0xFFFFFFEE, 0xF3FFFFFF, 0xFF1E3FFF, 0x0000FFCF, 0xFFF99FEE, 0xD3C5FDFF, 0xB080399F, 0x0003FFCF, 0xFFF987E4, 0xD36DFDFF, 0x5E003987, 0x001FFFC0, 0xFFFBAFEE, 0xF3EDFDFF, 0x00003BBF, 0x0000FFC1, 0xFFF99FEE, 0xF3CDFDFF, 0xB0C0398F, 0x0000FFC3, 0xD63DC7EC, 0xC3BFC718, 0x00803DC7, 0x0000FF80, 0xFFFDDFEE, 0xC3EFFDFF, 0x00603DDF, 0x0000FFC3, 0xFFFDDFEC, 0xC3EFFDFF, 0x40603DDF, 0x0000FFC3, 0xFFFDDFEC, 0xC3FFFDFF, 0x00803DCF, 0x0000FFC3, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0xFFFFFFFE, 0x07FF7FFF, 0x03FF7FFF, 0x00000000, 0xFEF02596, 0x3BFF6CAE, 0x03FF3F5F, 0x00000000, 0x03000000, 0xC2A003FF, 0xFFFFFEFF, 0xFFFE03FF, 0xFEBF0FDF, 0x02FE3FFF, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x1FFF0000, 0x00000002, 0x000000A0, 0x003EFFFE, 0xFFFFFFFE, 0xFFFFFFFF, 0x661FFFFF, 0xFFFFFFFE, 0xFFFFFFFF, 0x77FFFFFF, }; static const unsigned char nmstrtPages[] = { 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, 0x08, 0x00, 0x00, 0x09, 0x0A, 0x0B, 0x0C, 0x0D, 0x0E, 0x0F, 0x10, 0x11, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x12, 0x13, 0x00, 0x14, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x15, 0x16, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x17, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 
0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x18, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, }; static const unsigned char namePages[] = { 0x19, 0x03, 0x1A, 0x1B, 0x1C, 0x1D, 0x1E, 0x00, 0x00, 0x1F, 0x20, 0x21, 0x22, 0x23, 0x24, 0x25, 0x10, 0x11, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x12, 0x13, 0x26, 0x14, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x27, 0x16, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x17, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x18, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 
0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, }; swish-e-2.4.7/src/expat/xmltok/xmlrole.h0000775000077100017500000000466711166010107015122 00000000000000/* Copyright (c) 1998, 1999 Thai Open Source Software Center Ltd See the file copying.txt for copying permission. */ #ifndef XmlRole_INCLUDED #define XmlRole_INCLUDED 1 #include "xmltok.h" #ifdef __cplusplus extern "C" { #endif enum { XML_ROLE_ERROR = -1, XML_ROLE_NONE = 0, XML_ROLE_XML_DECL, XML_ROLE_INSTANCE_START, XML_ROLE_DOCTYPE_NAME, XML_ROLE_DOCTYPE_SYSTEM_ID, XML_ROLE_DOCTYPE_PUBLIC_ID, XML_ROLE_DOCTYPE_CLOSE, XML_ROLE_GENERAL_ENTITY_NAME, XML_ROLE_PARAM_ENTITY_NAME, XML_ROLE_ENTITY_VALUE, XML_ROLE_ENTITY_SYSTEM_ID, XML_ROLE_ENTITY_PUBLIC_ID, XML_ROLE_ENTITY_NOTATION_NAME, XML_ROLE_NOTATION_NAME, XML_ROLE_NOTATION_SYSTEM_ID, XML_ROLE_NOTATION_NO_SYSTEM_ID, XML_ROLE_NOTATION_PUBLIC_ID, XML_ROLE_ATTRIBUTE_NAME, XML_ROLE_ATTRIBUTE_TYPE_CDATA, XML_ROLE_ATTRIBUTE_TYPE_ID, XML_ROLE_ATTRIBUTE_TYPE_IDREF, XML_ROLE_ATTRIBUTE_TYPE_IDREFS, XML_ROLE_ATTRIBUTE_TYPE_ENTITY, XML_ROLE_ATTRIBUTE_TYPE_ENTITIES, XML_ROLE_ATTRIBUTE_TYPE_NMTOKEN, XML_ROLE_ATTRIBUTE_TYPE_NMTOKENS, XML_ROLE_ATTRIBUTE_ENUM_VALUE, XML_ROLE_ATTRIBUTE_NOTATION_VALUE, XML_ROLE_ATTLIST_ELEMENT_NAME, XML_ROLE_IMPLIED_ATTRIBUTE_VALUE, XML_ROLE_REQUIRED_ATTRIBUTE_VALUE, XML_ROLE_DEFAULT_ATTRIBUTE_VALUE, XML_ROLE_FIXED_ATTRIBUTE_VALUE, XML_ROLE_ELEMENT_NAME, XML_ROLE_CONTENT_ANY, XML_ROLE_CONTENT_EMPTY, XML_ROLE_CONTENT_PCDATA, XML_ROLE_GROUP_OPEN, XML_ROLE_GROUP_CLOSE, XML_ROLE_GROUP_CLOSE_REP, XML_ROLE_GROUP_CLOSE_OPT, XML_ROLE_GROUP_CLOSE_PLUS, XML_ROLE_GROUP_CHOICE, XML_ROLE_GROUP_SEQUENCE, XML_ROLE_CONTENT_ELEMENT, XML_ROLE_CONTENT_ELEMENT_REP, XML_ROLE_CONTENT_ELEMENT_OPT, XML_ROLE_CONTENT_ELEMENT_PLUS, #ifdef XML_DTD XML_ROLE_TEXT_DECL, XML_ROLE_IGNORE_SECT, XML_ROLE_INNER_PARAM_ENTITY_REF, #endif /* XML_DTD */ XML_ROLE_PARAM_ENTITY_REF, XML_ROLE_EXTERNAL_GENERAL_ENTITY_NO_NOTATION }; typedef struct prolog_state { int (*handler)(struct 
prolog_state *state, int tok, const char *ptr, const char *end, const ENCODING *enc); unsigned level; #ifdef XML_DTD unsigned includeLevel; int documentEntity; #endif /* XML_DTD */ } PROLOG_STATE; void XMLTOKAPI XmlPrologStateInit(PROLOG_STATE *); #ifdef XML_DTD void XMLTOKAPI XmlPrologStateInitExternalEntity(PROLOG_STATE *); #endif /* XML_DTD */ #define XmlTokenRole(state, tok, ptr, end, enc) \ (((state)->handler)(state, tok, ptr, end, enc)) #ifdef __cplusplus } #endif #endif /* not XmlRole_INCLUDED */ swish-e-2.4.7/src/expat/xmltok/ascii.h0000775000077100017500000000342711166010107014521 00000000000000/* Copyright (c) 1998, 1999 Thai Open Source Software Center Ltd See the file copying.txt for copying permission. */ #define ASCII_A 0x41 #define ASCII_B 0x42 #define ASCII_C 0x43 #define ASCII_D 0x44 #define ASCII_E 0x45 #define ASCII_F 0x46 #define ASCII_G 0x47 #define ASCII_H 0x48 #define ASCII_I 0x49 #define ASCII_J 0x4A #define ASCII_K 0x4B #define ASCII_L 0x4C #define ASCII_M 0x4D #define ASCII_N 0x4E #define ASCII_O 0x4F #define ASCII_P 0x50 #define ASCII_Q 0x51 #define ASCII_R 0x52 #define ASCII_S 0x53 #define ASCII_T 0x54 #define ASCII_U 0x55 #define ASCII_V 0x56 #define ASCII_W 0x57 #define ASCII_X 0x58 #define ASCII_Y 0x59 #define ASCII_Z 0x5A #define ASCII_a 0x61 #define ASCII_b 0x62 #define ASCII_c 0x63 #define ASCII_d 0x64 #define ASCII_e 0x65 #define ASCII_f 0x66 #define ASCII_g 0x67 #define ASCII_h 0x68 #define ASCII_i 0x69 #define ASCII_j 0x6A #define ASCII_k 0x6B #define ASCII_l 0x6C #define ASCII_m 0x6D #define ASCII_n 0x6E #define ASCII_o 0x6F #define ASCII_p 0x70 #define ASCII_q 0x71 #define ASCII_r 0x72 #define ASCII_s 0x73 #define ASCII_t 0x74 #define ASCII_u 0x75 #define ASCII_v 0x76 #define ASCII_w 0x77 #define ASCII_x 0x78 #define ASCII_y 0x79 #define ASCII_z 0x7A #define ASCII_0 0x30 #define ASCII_1 0x31 #define ASCII_2 0x32 #define ASCII_3 0x33 #define ASCII_4 0x34 #define ASCII_5 0x35 #define ASCII_6 0x36 #define ASCII_7 0x37 #define 
ASCII_8 0x38 #define ASCII_9 0x39 #define ASCII_TAB 0x09 #define ASCII_SPACE 0x20 #define ASCII_EXCL 0x21 #define ASCII_QUOT 0x22 #define ASCII_AMP 0x26 #define ASCII_APOS 0x27 #define ASCII_MINUS 0x2D #define ASCII_PERIOD 0x2E #define ASCII_COLON 0x3A #define ASCII_SEMI 0x3B #define ASCII_LT 0x3C #define ASCII_EQUALS 0x3D #define ASCII_GT 0x3E #define ASCII_LSQB 0x5B #define ASCII_RSQB 0x5D #define ASCII_UNDERSCORE 0x5F swish-e-2.4.7/src/expat/xmltok/xmltok.c0000775000077100017500000011162011166010106014734 00000000000000/* Copyright (c) 1998, 1999 Thai Open Source Software Center Ltd See the file copying.txt for copying permission. */ #include "xmldef.h" #include "xmltok.h" #include "nametab.h" #ifdef XML_DTD #define IGNORE_SECTION_TOK_VTABLE , PREFIX(ignoreSectionTok) #else #define IGNORE_SECTION_TOK_VTABLE /* as nothing */ #endif #define VTABLE1 \ { PREFIX(prologTok), PREFIX(contentTok), \ PREFIX(cdataSectionTok) IGNORE_SECTION_TOK_VTABLE }, \ { PREFIX(attributeValueTok), PREFIX(entityValueTok) }, \ PREFIX(sameName), \ PREFIX(nameMatchesAscii), \ PREFIX(nameLength), \ PREFIX(skipS), \ PREFIX(getAtts), \ PREFIX(charRefNumber), \ PREFIX(predefinedEntityName), \ PREFIX(updatePosition), \ PREFIX(isPublicId) #define VTABLE VTABLE1, PREFIX(toUtf8), PREFIX(toUtf16) #define UCS2_GET_NAMING(pages, hi, lo) \ (namingBitmap[(pages[hi] << 3) + ((lo) >> 5)] & (1 << ((lo) & 0x1F))) /* A 2 byte UTF-8 representation splits the character's 11 bits between the bottom 5 and 6 bits of the bytes. We need 8 bits to index into pages, 3 bits to add to that index and 5 bits to generate the mask. */ #define UTF8_GET_NAMING2(pages, byte) \ (namingBitmap[((pages)[(((byte)[0]) >> 2) & 7] << 3) \ + ((((byte)[0]) & 3) << 1) \ + ((((byte)[1]) >> 5) & 1)] \ & (1 << (((byte)[1]) & 0x1F))) /* A 3 byte UTF-8 representation splits the character's 16 bits between the bottom 4, 6 and 6 bits of the bytes. We need 8 bits to index into pages, 3 bits to add to that index and 5 bits to generate the mask. 
*/ #define UTF8_GET_NAMING3(pages, byte) \ (namingBitmap[((pages)[((((byte)[0]) & 0xF) << 4) \ + ((((byte)[1]) >> 2) & 0xF)] \ << 3) \ + ((((byte)[1]) & 3) << 1) \ + ((((byte)[2]) >> 5) & 1)] \ & (1 << (((byte)[2]) & 0x1F))) #define UTF8_GET_NAMING(pages, p, n) \ ((n) == 2 \ ? UTF8_GET_NAMING2(pages, (const unsigned char *)(p)) \ : ((n) == 3 \ ? UTF8_GET_NAMING3(pages, (const unsigned char *)(p)) \ : 0)) #define UTF8_INVALID3(p) \ ((*p) == 0xED \ ? (((p)[1] & 0x20) != 0) \ : ((*p) == 0xEF \ ? ((p)[1] == 0xBF && ((p)[2] == 0xBF || (p)[2] == 0xBE)) \ : 0)) #define UTF8_INVALID4(p) ((*p) == 0xF4 && ((p)[1] & 0x30) != 0) static int isNever(const ENCODING *enc, const char *p) { return 0; } static int utf8_isName2(const ENCODING *enc, const char *p) { return UTF8_GET_NAMING2(namePages, (const unsigned char *)p); } static int utf8_isName3(const ENCODING *enc, const char *p) { return UTF8_GET_NAMING3(namePages, (const unsigned char *)p); } #define utf8_isName4 isNever static int utf8_isNmstrt2(const ENCODING *enc, const char *p) { return UTF8_GET_NAMING2(nmstrtPages, (const unsigned char *)p); } static int utf8_isNmstrt3(const ENCODING *enc, const char *p) { return UTF8_GET_NAMING3(nmstrtPages, (const unsigned char *)p); } #define utf8_isNmstrt4 isNever #define utf8_isInvalid2 isNever static int utf8_isInvalid3(const ENCODING *enc, const char *p) { return UTF8_INVALID3((const unsigned char *)p); } static int utf8_isInvalid4(const ENCODING *enc, const char *p) { return UTF8_INVALID4((const unsigned char *)p); } struct normal_encoding { ENCODING enc; unsigned char type[256]; #ifdef XML_MIN_SIZE int (*byteType)(const ENCODING *, const char *); int (*isNameMin)(const ENCODING *, const char *); int (*isNmstrtMin)(const ENCODING *, const char *); int (*byteToAscii)(const ENCODING *, const char *); int (*charMatches)(const ENCODING *, const char *, int); #endif /* XML_MIN_SIZE */ int (*isName2)(const ENCODING *, const char *); int (*isName3)(const ENCODING *, const char *); int 
(*isName4)(const ENCODING *, const char *); int (*isNmstrt2)(const ENCODING *, const char *); int (*isNmstrt3)(const ENCODING *, const char *); int (*isNmstrt4)(const ENCODING *, const char *); int (*isInvalid2)(const ENCODING *, const char *); int (*isInvalid3)(const ENCODING *, const char *); int (*isInvalid4)(const ENCODING *, const char *); }; #ifdef XML_MIN_SIZE #define STANDARD_VTABLE(E) \ E ## byteType, \ E ## isNameMin, \ E ## isNmstrtMin, \ E ## byteToAscii, \ E ## charMatches, #else #define STANDARD_VTABLE(E) /* as nothing */ #endif #define NORMAL_VTABLE(E) \ E ## isName2, \ E ## isName3, \ E ## isName4, \ E ## isNmstrt2, \ E ## isNmstrt3, \ E ## isNmstrt4, \ E ## isInvalid2, \ E ## isInvalid3, \ E ## isInvalid4 static int checkCharRefNumber(int); #include "xmltok_impl.h" #include "ascii.h" #ifdef XML_MIN_SIZE #define sb_isNameMin isNever #define sb_isNmstrtMin isNever #endif #ifdef XML_MIN_SIZE #define MINBPC(enc) ((enc)->minBytesPerChar) #else /* minimum bytes per character */ #define MINBPC(enc) 1 #endif #define SB_BYTE_TYPE(enc, p) \ (((struct normal_encoding *)(enc))->type[(unsigned char)*(p)]) #ifdef XML_MIN_SIZE static int sb_byteType(const ENCODING *enc, const char *p) { return SB_BYTE_TYPE(enc, p); } #define BYTE_TYPE(enc, p) \ (((const struct normal_encoding *)(enc))->byteType(enc, p)) #else #define BYTE_TYPE(enc, p) SB_BYTE_TYPE(enc, p) #endif #ifdef XML_MIN_SIZE #define BYTE_TO_ASCII(enc, p) \ (((const struct normal_encoding *)(enc))->byteToAscii(enc, p)) static int sb_byteToAscii(const ENCODING *enc, const char *p) { return *p; } #else #define BYTE_TO_ASCII(enc, p) (*(p)) #endif #define IS_NAME_CHAR(enc, p, n) \ (((const struct normal_encoding *)(enc))->isName ## n(enc, p)) #define IS_NMSTRT_CHAR(enc, p, n) \ (((const struct normal_encoding *)(enc))->isNmstrt ## n(enc, p)) #define IS_INVALID_CHAR(enc, p, n) \ (((const struct normal_encoding *)(enc))->isInvalid ## n(enc, p)) #ifdef XML_MIN_SIZE #define IS_NAME_CHAR_MINBPC(enc, p) \ (((const 
struct normal_encoding *)(enc))->isNameMin(enc, p)) #define IS_NMSTRT_CHAR_MINBPC(enc, p) \ (((const struct normal_encoding *)(enc))->isNmstrtMin(enc, p)) #else #define IS_NAME_CHAR_MINBPC(enc, p) (0) #define IS_NMSTRT_CHAR_MINBPC(enc, p) (0) #endif #ifdef XML_MIN_SIZE #define CHAR_MATCHES(enc, p, c) \ (((const struct normal_encoding *)(enc))->charMatches(enc, p, c)) static int sb_charMatches(const ENCODING *enc, const char *p, int c) { return *p == c; } #else /* c is an ASCII character */ #define CHAR_MATCHES(enc, p, c) (*(p) == c) #endif #define PREFIX(ident) normal_ ## ident #include "xmltok_impl.c" #undef MINBPC #undef BYTE_TYPE #undef BYTE_TO_ASCII #undef CHAR_MATCHES #undef IS_NAME_CHAR #undef IS_NAME_CHAR_MINBPC #undef IS_NMSTRT_CHAR #undef IS_NMSTRT_CHAR_MINBPC #undef IS_INVALID_CHAR enum { /* UTF8_cvalN is value of masked first byte of N byte sequence */ UTF8_cval1 = 0x00, UTF8_cval2 = 0xc0, UTF8_cval3 = 0xe0, UTF8_cval4 = 0xf0 }; static void utf8_toUtf8(const ENCODING *enc, const char **fromP, const char *fromLim, char **toP, const char *toLim) { char *to; const char *from; if (fromLim - *fromP > toLim - *toP) { /* Avoid copying partial characters. 
*/ for (fromLim = *fromP + (toLim - *toP); fromLim > *fromP; fromLim--) if (((unsigned char)fromLim[-1] & 0xc0) != 0x80) break; } for (to = *toP, from = *fromP; from != fromLim; from++, to++) *to = *from; *fromP = from; *toP = to; } static void utf8_toUtf16(const ENCODING *enc, const char **fromP, const char *fromLim, unsigned short **toP, const unsigned short *toLim) { unsigned short *to = *toP; const char *from = *fromP; while (from != fromLim && to != toLim) { switch (((struct normal_encoding *)enc)->type[(unsigned char)*from]) { case BT_LEAD2: *to++ = ((from[0] & 0x1f) << 6) | (from[1] & 0x3f); from += 2; break; case BT_LEAD3: *to++ = ((from[0] & 0xf) << 12) | ((from[1] & 0x3f) << 6) | (from[2] & 0x3f); from += 3; break; case BT_LEAD4: { unsigned long n; if (to + 1 == toLim) break; n = ((from[0] & 0x7) << 18) | ((from[1] & 0x3f) << 12) | ((from[2] & 0x3f) << 6) | (from[3] & 0x3f); n -= 0x10000; to[0] = (unsigned short)((n >> 10) | 0xD800); to[1] = (unsigned short)((n & 0x3FF) | 0xDC00); to += 2; from += 4; } break; default: *to++ = *from++; break; } } *fromP = from; *toP = to; } #ifdef XML_NS static const struct normal_encoding utf8_encoding_ns = { { VTABLE1, utf8_toUtf8, utf8_toUtf16, 1, 1, 0 }, { #include "asciitab.h" #include "utf8tab.h" }, STANDARD_VTABLE(sb_) NORMAL_VTABLE(utf8_) }; #endif static const struct normal_encoding utf8_encoding = { { VTABLE1, utf8_toUtf8, utf8_toUtf16, 1, 1, 0 }, { #define BT_COLON BT_NMSTRT #include "asciitab.h" #undef BT_COLON #include "utf8tab.h" }, STANDARD_VTABLE(sb_) NORMAL_VTABLE(utf8_) }; #ifdef XML_NS static const struct normal_encoding internal_utf8_encoding_ns = { { VTABLE1, utf8_toUtf8, utf8_toUtf16, 1, 1, 0 }, { #include "iasciitab.h" #include "utf8tab.h" }, STANDARD_VTABLE(sb_) NORMAL_VTABLE(utf8_) }; #endif static const struct normal_encoding internal_utf8_encoding = { { VTABLE1, utf8_toUtf8, utf8_toUtf16, 1, 1, 0 }, { #define BT_COLON BT_NMSTRT #include "iasciitab.h" #undef BT_COLON #include "utf8tab.h" }, 
STANDARD_VTABLE(sb_) NORMAL_VTABLE(utf8_) }; static void latin1_toUtf8(const ENCODING *enc, const char **fromP, const char *fromLim, char **toP, const char *toLim) { for (;;) { unsigned char c; if (*fromP == fromLim) break; c = (unsigned char)**fromP; if (c & 0x80) { if (toLim - *toP < 2) break; *(*toP)++ = ((c >> 6) | UTF8_cval2); *(*toP)++ = ((c & 0x3f) | 0x80); (*fromP)++; } else { if (*toP == toLim) break; *(*toP)++ = *(*fromP)++; } } } static void latin1_toUtf16(const ENCODING *enc, const char **fromP, const char *fromLim, unsigned short **toP, const unsigned short *toLim) { while (*fromP != fromLim && *toP != toLim) *(*toP)++ = (unsigned char)*(*fromP)++; } #ifdef XML_NS static const struct normal_encoding latin1_encoding_ns = { { VTABLE1, latin1_toUtf8, latin1_toUtf16, 1, 0, 0 }, { #include "asciitab.h" #include "latin1tab.h" }, STANDARD_VTABLE(sb_) }; #endif static const struct normal_encoding latin1_encoding = { { VTABLE1, latin1_toUtf8, latin1_toUtf16, 1, 0, 0 }, { #define BT_COLON BT_NMSTRT #include "asciitab.h" #undef BT_COLON #include "latin1tab.h" }, STANDARD_VTABLE(sb_) }; static void ascii_toUtf8(const ENCODING *enc, const char **fromP, const char *fromLim, char **toP, const char *toLim) { while (*fromP != fromLim && *toP != toLim) *(*toP)++ = *(*fromP)++; } #ifdef XML_NS static const struct normal_encoding ascii_encoding_ns = { { VTABLE1, ascii_toUtf8, latin1_toUtf16, 1, 1, 0 }, { #include "asciitab.h" /* BT_NONXML == 0 */ }, STANDARD_VTABLE(sb_) }; #endif static const struct normal_encoding ascii_encoding = { { VTABLE1, ascii_toUtf8, latin1_toUtf16, 1, 1, 0 }, { #define BT_COLON BT_NMSTRT #include "asciitab.h" #undef BT_COLON /* BT_NONXML == 0 */ }, STANDARD_VTABLE(sb_) }; static int unicode_byte_type(char hi, char lo) { switch ((unsigned char)hi) { case 0xD8: case 0xD9: case 0xDA: case 0xDB: return BT_LEAD4; case 0xDC: case 0xDD: case 0xDE: case 0xDF: return BT_TRAIL; case 0xFF: switch ((unsigned char)lo) { case 0xFF: case 0xFE: return BT_NONXML; 
} break; } return BT_NONASCII; } #define DEFINE_UTF16_TO_UTF8(E) \ static \ void E ## toUtf8(const ENCODING *enc, \ const char **fromP, const char *fromLim, \ char **toP, const char *toLim) \ { \ const char *from; \ for (from = *fromP; from != fromLim; from += 2) { \ int plane; \ unsigned char lo2; \ unsigned char lo = GET_LO(from); \ unsigned char hi = GET_HI(from); \ switch (hi) { \ case 0: \ if (lo < 0x80) { \ if (*toP == toLim) { \ *fromP = from; \ return; \ } \ *(*toP)++ = lo; \ break; \ } \ /* fall through */ \ case 0x1: case 0x2: case 0x3: \ case 0x4: case 0x5: case 0x6: case 0x7: \ if (toLim - *toP < 2) { \ *fromP = from; \ return; \ } \ *(*toP)++ = ((lo >> 6) | (hi << 2) | UTF8_cval2); \ *(*toP)++ = ((lo & 0x3f) | 0x80); \ break; \ default: \ if (toLim - *toP < 3) { \ *fromP = from; \ return; \ } \ /* 16 bits divided 4, 6, 6 amongst 3 bytes */ \ *(*toP)++ = ((hi >> 4) | UTF8_cval3); \ *(*toP)++ = (((hi & 0xf) << 2) | (lo >> 6) | 0x80); \ *(*toP)++ = ((lo & 0x3f) | 0x80); \ break; \ case 0xD8: case 0xD9: case 0xDA: case 0xDB: \ if (toLim - *toP < 4) { \ *fromP = from; \ return; \ } \ plane = (((hi & 0x3) << 2) | ((lo >> 6) & 0x3)) + 1; \ *(*toP)++ = ((plane >> 2) | UTF8_cval4); \ *(*toP)++ = (((lo >> 2) & 0xF) | ((plane & 0x3) << 4) | 0x80); \ from += 2; \ lo2 = GET_LO(from); \ *(*toP)++ = (((lo & 0x3) << 4) \ | ((GET_HI(from) & 0x3) << 2) \ | (lo2 >> 6) \ | 0x80); \ *(*toP)++ = ((lo2 & 0x3f) | 0x80); \ break; \ } \ } \ *fromP = from; \ } #define DEFINE_UTF16_TO_UTF16(E) \ static \ void E ## toUtf16(const ENCODING *enc, \ const char **fromP, const char *fromLim, \ unsigned short **toP, const unsigned short *toLim) \ { \ /* Avoid copying first half only of surrogate */ \ if (fromLim - *fromP > ((toLim - *toP) << 1) \ && (GET_HI(fromLim - 2) & 0xF8) == 0xD8) \ fromLim -= 2; \ for (; *fromP != fromLim && *toP != toLim; *fromP += 2) \ *(*toP)++ = (GET_HI(*fromP) << 8) | GET_LO(*fromP); \ } #define SET2(ptr, ch) \ (((ptr)[0] = ((ch) & 0xff)), ((ptr)[1] = ((ch) 
>> 8))) #define GET_LO(ptr) ((unsigned char)(ptr)[0]) #define GET_HI(ptr) ((unsigned char)(ptr)[1]) DEFINE_UTF16_TO_UTF8(little2_) DEFINE_UTF16_TO_UTF16(little2_) #undef SET2 #undef GET_LO #undef GET_HI #define SET2(ptr, ch) \ (((ptr)[0] = ((ch) >> 8)), ((ptr)[1] = ((ch) & 0xFF))) #define GET_LO(ptr) ((unsigned char)(ptr)[1]) #define GET_HI(ptr) ((unsigned char)(ptr)[0]) DEFINE_UTF16_TO_UTF8(big2_) DEFINE_UTF16_TO_UTF16(big2_) #undef SET2 #undef GET_LO #undef GET_HI #define LITTLE2_BYTE_TYPE(enc, p) \ ((p)[1] == 0 \ ? ((struct normal_encoding *)(enc))->type[(unsigned char)*(p)] \ : unicode_byte_type((p)[1], (p)[0])) #define LITTLE2_BYTE_TO_ASCII(enc, p) ((p)[1] == 0 ? (p)[0] : -1) #define LITTLE2_CHAR_MATCHES(enc, p, c) ((p)[1] == 0 && (p)[0] == c) #define LITTLE2_IS_NAME_CHAR_MINBPC(enc, p) \ UCS2_GET_NAMING(namePages, (unsigned char)p[1], (unsigned char)p[0]) #define LITTLE2_IS_NMSTRT_CHAR_MINBPC(enc, p) \ UCS2_GET_NAMING(nmstrtPages, (unsigned char)p[1], (unsigned char)p[0]) #ifdef XML_MIN_SIZE static int little2_byteType(const ENCODING *enc, const char *p) { return LITTLE2_BYTE_TYPE(enc, p); } static int little2_byteToAscii(const ENCODING *enc, const char *p) { return LITTLE2_BYTE_TO_ASCII(enc, p); } static int little2_charMatches(const ENCODING *enc, const char *p, int c) { return LITTLE2_CHAR_MATCHES(enc, p, c); } static int little2_isNameMin(const ENCODING *enc, const char *p) { return LITTLE2_IS_NAME_CHAR_MINBPC(enc, p); } static int little2_isNmstrtMin(const ENCODING *enc, const char *p) { return LITTLE2_IS_NMSTRT_CHAR_MINBPC(enc, p); } #undef VTABLE #define VTABLE VTABLE1, little2_toUtf8, little2_toUtf16 #else /* not XML_MIN_SIZE */ #undef PREFIX #define PREFIX(ident) little2_ ## ident #define MINBPC(enc) 2 /* CHAR_MATCHES is guaranteed to have MINBPC bytes available. 
*/ #define BYTE_TYPE(enc, p) LITTLE2_BYTE_TYPE(enc, p) #define BYTE_TO_ASCII(enc, p) LITTLE2_BYTE_TO_ASCII(enc, p) #define CHAR_MATCHES(enc, p, c) LITTLE2_CHAR_MATCHES(enc, p, c) #define IS_NAME_CHAR(enc, p, n) 0 #define IS_NAME_CHAR_MINBPC(enc, p) LITTLE2_IS_NAME_CHAR_MINBPC(enc, p) #define IS_NMSTRT_CHAR(enc, p, n) (0) #define IS_NMSTRT_CHAR_MINBPC(enc, p) LITTLE2_IS_NMSTRT_CHAR_MINBPC(enc, p) #include "xmltok_impl.c" #undef MINBPC #undef BYTE_TYPE #undef BYTE_TO_ASCII #undef CHAR_MATCHES #undef IS_NAME_CHAR #undef IS_NAME_CHAR_MINBPC #undef IS_NMSTRT_CHAR #undef IS_NMSTRT_CHAR_MINBPC #undef IS_INVALID_CHAR #endif /* not XML_MIN_SIZE */ #ifdef XML_NS static const struct normal_encoding little2_encoding_ns = { { VTABLE, 2, 0, #if XML_BYTE_ORDER == 12 1 #else 0 #endif }, { #include "asciitab.h" #include "latin1tab.h" }, STANDARD_VTABLE(little2_) }; #endif static const struct normal_encoding little2_encoding = { { VTABLE, 2, 0, #if XML_BYTE_ORDER == 12 1 #else 0 #endif }, { #define BT_COLON BT_NMSTRT #include "asciitab.h" #undef BT_COLON #include "latin1tab.h" }, STANDARD_VTABLE(little2_) }; #if XML_BYTE_ORDER != 21 #ifdef XML_NS static const struct normal_encoding internal_little2_encoding_ns = { { VTABLE, 2, 0, 1 }, { #include "iasciitab.h" #include "latin1tab.h" }, STANDARD_VTABLE(little2_) }; #endif static const struct normal_encoding internal_little2_encoding = { { VTABLE, 2, 0, 1 }, { #define BT_COLON BT_NMSTRT #include "iasciitab.h" #undef BT_COLON #include "latin1tab.h" }, STANDARD_VTABLE(little2_) }; #endif #define BIG2_BYTE_TYPE(enc, p) \ ((p)[0] == 0 \ ? ((struct normal_encoding *)(enc))->type[(unsigned char)(p)[1]] \ : unicode_byte_type((p)[0], (p)[1])) #define BIG2_BYTE_TO_ASCII(enc, p) ((p)[0] == 0 ? 
(p)[1] : -1) #define BIG2_CHAR_MATCHES(enc, p, c) ((p)[0] == 0 && (p)[1] == c) #define BIG2_IS_NAME_CHAR_MINBPC(enc, p) \ UCS2_GET_NAMING(namePages, (unsigned char)p[0], (unsigned char)p[1]) #define BIG2_IS_NMSTRT_CHAR_MINBPC(enc, p) \ UCS2_GET_NAMING(nmstrtPages, (unsigned char)p[0], (unsigned char)p[1]) #ifdef XML_MIN_SIZE static int big2_byteType(const ENCODING *enc, const char *p) { return BIG2_BYTE_TYPE(enc, p); } static int big2_byteToAscii(const ENCODING *enc, const char *p) { return BIG2_BYTE_TO_ASCII(enc, p); } static int big2_charMatches(const ENCODING *enc, const char *p, int c) { return BIG2_CHAR_MATCHES(enc, p, c); } static int big2_isNameMin(const ENCODING *enc, const char *p) { return BIG2_IS_NAME_CHAR_MINBPC(enc, p); } static int big2_isNmstrtMin(const ENCODING *enc, const char *p) { return BIG2_IS_NMSTRT_CHAR_MINBPC(enc, p); } #undef VTABLE #define VTABLE VTABLE1, big2_toUtf8, big2_toUtf16 #else /* not XML_MIN_SIZE */ #undef PREFIX #define PREFIX(ident) big2_ ## ident #define MINBPC(enc) 2 /* CHAR_MATCHES is guaranteed to have MINBPC bytes available. 
*/ #define BYTE_TYPE(enc, p) BIG2_BYTE_TYPE(enc, p) #define BYTE_TO_ASCII(enc, p) BIG2_BYTE_TO_ASCII(enc, p) #define CHAR_MATCHES(enc, p, c) BIG2_CHAR_MATCHES(enc, p, c) #define IS_NAME_CHAR(enc, p, n) 0 #define IS_NAME_CHAR_MINBPC(enc, p) BIG2_IS_NAME_CHAR_MINBPC(enc, p) #define IS_NMSTRT_CHAR(enc, p, n) (0) #define IS_NMSTRT_CHAR_MINBPC(enc, p) BIG2_IS_NMSTRT_CHAR_MINBPC(enc, p) #include "xmltok_impl.c" #undef MINBPC #undef BYTE_TYPE #undef BYTE_TO_ASCII #undef CHAR_MATCHES #undef IS_NAME_CHAR #undef IS_NAME_CHAR_MINBPC #undef IS_NMSTRT_CHAR #undef IS_NMSTRT_CHAR_MINBPC #undef IS_INVALID_CHAR #endif /* not XML_MIN_SIZE */ #ifdef XML_NS static const struct normal_encoding big2_encoding_ns = { { VTABLE, 2, 0, #if XML_BYTE_ORDER == 21 1 #else 0 #endif }, { #include "asciitab.h" #include "latin1tab.h" }, STANDARD_VTABLE(big2_) }; #endif static const struct normal_encoding big2_encoding = { { VTABLE, 2, 0, #if XML_BYTE_ORDER == 21 1 #else 0 #endif }, { #define BT_COLON BT_NMSTRT #include "asciitab.h" #undef BT_COLON #include "latin1tab.h" }, STANDARD_VTABLE(big2_) }; #if XML_BYTE_ORDER != 12 #ifdef XML_NS static const struct normal_encoding internal_big2_encoding_ns = { { VTABLE, 2, 0, 1 }, { #include "iasciitab.h" #include "latin1tab.h" }, STANDARD_VTABLE(big2_) }; #endif static const struct normal_encoding internal_big2_encoding = { { VTABLE, 2, 0, 1 }, { #define BT_COLON BT_NMSTRT #include "iasciitab.h" #undef BT_COLON #include "latin1tab.h" }, STANDARD_VTABLE(big2_) }; #endif #undef PREFIX static int streqci(const char *s1, const char *s2) { for (;;) { char c1 = *s1++; char c2 = *s2++; if (ASCII_a <= c1 && c1 <= ASCII_z) c1 += ASCII_A - ASCII_a; if (ASCII_a <= c2 && c2 <= ASCII_z) c2 += ASCII_A - ASCII_a; if (c1 != c2) return 0; if (!c1) break; } return 1; } static void initUpdatePosition(const ENCODING *enc, const char *ptr, const char *end, POSITION *pos) { normal_updatePosition(&utf8_encoding.enc, ptr, end, pos); } static int toAscii(const ENCODING *enc, const 
char *ptr, const char *end) { char buf[1]; char *p = buf; XmlUtf8Convert(enc, &ptr, end, &p, p + 1); if (p == buf) return -1; else return buf[0]; } static int isSpace(int c) { switch (c) { case 0x20: case 0xD: case 0xA: case 0x9: return 1; } return 0; } /* Return 1 if there's just optional white space or there's an S followed by name=val. */ static int parsePseudoAttribute(const ENCODING *enc, const char *ptr, const char *end, const char **namePtr, const char **nameEndPtr, const char **valPtr, const char **nextTokPtr) { int c; char open; if (ptr == end) { *namePtr = 0; return 1; } if (!isSpace(toAscii(enc, ptr, end))) { *nextTokPtr = ptr; return 0; } do { ptr += enc->minBytesPerChar; } while (isSpace(toAscii(enc, ptr, end))); if (ptr == end) { *namePtr = 0; return 1; } *namePtr = ptr; for (;;) { c = toAscii(enc, ptr, end); if (c == -1) { *nextTokPtr = ptr; return 0; } if (c == ASCII_EQUALS) { *nameEndPtr = ptr; break; } if (isSpace(c)) { *nameEndPtr = ptr; do { ptr += enc->minBytesPerChar; } while (isSpace(c = toAscii(enc, ptr, end))); if (c != ASCII_EQUALS) { *nextTokPtr = ptr; return 0; } break; } ptr += enc->minBytesPerChar; } if (ptr == *namePtr) { *nextTokPtr = ptr; return 0; } ptr += enc->minBytesPerChar; c = toAscii(enc, ptr, end); while (isSpace(c)) { ptr += enc->minBytesPerChar; c = toAscii(enc, ptr, end); } if (c != ASCII_QUOT && c != ASCII_APOS) { *nextTokPtr = ptr; return 0; } open = c; ptr += enc->minBytesPerChar; *valPtr = ptr; for (;; ptr += enc->minBytesPerChar) { c = toAscii(enc, ptr, end); if (c == open) break; if (!(ASCII_a <= c && c <= ASCII_z) && !(ASCII_A <= c && c <= ASCII_Z) && !(ASCII_0 <= c && c <= ASCII_9) && c != ASCII_PERIOD && c != ASCII_MINUS && c != ASCII_UNDERSCORE) { *nextTokPtr = ptr; return 0; } } *nextTokPtr = ptr + enc->minBytesPerChar; return 1; } static const char KW_version[] = { ASCII_v, ASCII_e, ASCII_r, ASCII_s, ASCII_i, ASCII_o, ASCII_n, '\0' }; static const char KW_encoding[] = { ASCII_e, ASCII_n, ASCII_c, ASCII_o, 
ASCII_d, ASCII_i, ASCII_n, ASCII_g, '\0' }; static const char KW_standalone[] = { ASCII_s, ASCII_t, ASCII_a, ASCII_n, ASCII_d, ASCII_a, ASCII_l, ASCII_o, ASCII_n, ASCII_e, '\0' }; static const char KW_yes[] = { ASCII_y, ASCII_e, ASCII_s, '\0' }; static const char KW_no[] = { ASCII_n, ASCII_o, '\0' }; static int doParseXmlDecl(const ENCODING *(*encodingFinder)(const ENCODING *, const char *, const char *), int isGeneralTextEntity, const ENCODING *enc, const char *ptr, const char *end, const char **badPtr, const char **versionPtr, const char **encodingName, const ENCODING **encoding, int *standalone) { const char *val = 0; const char *name = 0; const char *nameEnd = 0; ptr += 5 * enc->minBytesPerChar; end -= 2 * enc->minBytesPerChar; if (!parsePseudoAttribute(enc, ptr, end, &name, &nameEnd, &val, &ptr) || !name) { *badPtr = ptr; return 0; } if (!XmlNameMatchesAscii(enc, name, nameEnd, KW_version)) { if (!isGeneralTextEntity) { *badPtr = name; return 0; } } else { if (versionPtr) *versionPtr = val; if (!parsePseudoAttribute(enc, ptr, end, &name, &nameEnd, &val, &ptr)) { *badPtr = ptr; return 0; } if (!name) { if (isGeneralTextEntity) { /* a TextDecl must have an EncodingDecl */ *badPtr = ptr; return 0; } return 1; } } if (XmlNameMatchesAscii(enc, name, nameEnd, KW_encoding)) { int c = toAscii(enc, val, end); if (!(ASCII_a <= c && c <= ASCII_z) && !(ASCII_A <= c && c <= ASCII_Z)) { *badPtr = val; return 0; } if (encodingName) *encodingName = val; if (encoding) *encoding = encodingFinder(enc, val, ptr - enc->minBytesPerChar); if (!parsePseudoAttribute(enc, ptr, end, &name, &nameEnd, &val, &ptr)) { *badPtr = ptr; return 0; } if (!name) return 1; } if (!XmlNameMatchesAscii(enc, name, nameEnd, KW_standalone) || isGeneralTextEntity) { *badPtr = name; return 0; } if (XmlNameMatchesAscii(enc, val, ptr - enc->minBytesPerChar, KW_yes)) { if (standalone) *standalone = 1; } else if (XmlNameMatchesAscii(enc, val, ptr - enc->minBytesPerChar, KW_no)) { if (standalone) *standalone = 
0; } else { *badPtr = val; return 0; } while (isSpace(toAscii(enc, ptr, end))) ptr += enc->minBytesPerChar; if (ptr != end) { *badPtr = ptr; return 0; } return 1; } static int checkCharRefNumber(int result) { switch (result >> 8) { case 0xD8: case 0xD9: case 0xDA: case 0xDB: case 0xDC: case 0xDD: case 0xDE: case 0xDF: return -1; case 0: if (latin1_encoding.type[result] == BT_NONXML) return -1; break; case 0xFF: if (result == 0xFFFE || result == 0xFFFF) return -1; break; } return result; } int XmlUtf8Encode(int c, char *buf) { enum { /* minN is minimum legal resulting value for N byte sequence */ min2 = 0x80, min3 = 0x800, min4 = 0x10000 }; if (c < 0) return 0; if (c < min2) { buf[0] = (c | UTF8_cval1); return 1; } if (c < min3) { buf[0] = ((c >> 6) | UTF8_cval2); buf[1] = ((c & 0x3f) | 0x80); return 2; } if (c < min4) { buf[0] = ((c >> 12) | UTF8_cval3); buf[1] = (((c >> 6) & 0x3f) | 0x80); buf[2] = ((c & 0x3f) | 0x80); return 3; } if (c < 0x110000) { buf[0] = ((c >> 18) | UTF8_cval4); buf[1] = (((c >> 12) & 0x3f) | 0x80); buf[2] = (((c >> 6) & 0x3f) | 0x80); buf[3] = ((c & 0x3f) | 0x80); return 4; } return 0; } int XmlUtf16Encode(int charNum, unsigned short *buf) { if (charNum < 0) return 0; if (charNum < 0x10000) { buf[0] = charNum; return 1; } if (charNum < 0x110000) { charNum -= 0x10000; buf[0] = (charNum >> 10) + 0xD800; buf[1] = (charNum & 0x3FF) + 0xDC00; return 2; } return 0; } struct unknown_encoding { struct normal_encoding normal; int (*convert)(void *userData, const char *p); void *userData; unsigned short utf16[256]; char utf8[256][4]; }; int XmlSizeOfUnknownEncoding(void) { return sizeof(struct unknown_encoding); } static int unknown_isName(const ENCODING *enc, const char *p) { int c = ((const struct unknown_encoding *)enc) ->convert(((const struct unknown_encoding *)enc)->userData, p); if (c & ~0xFFFF) return 0; return UCS2_GET_NAMING(namePages, c >> 8, c & 0xFF); } static int unknown_isNmstrt(const ENCODING *enc, const char *p) { int c = ((const 
struct unknown_encoding *)enc) ->convert(((const struct unknown_encoding *)enc)->userData, p); if (c & ~0xFFFF) return 0; return UCS2_GET_NAMING(nmstrtPages, c >> 8, c & 0xFF); } static int unknown_isInvalid(const ENCODING *enc, const char *p) { int c = ((const struct unknown_encoding *)enc) ->convert(((const struct unknown_encoding *)enc)->userData, p); return (c & ~0xFFFF) || checkCharRefNumber(c) < 0; } static void unknown_toUtf8(const ENCODING *enc, const char **fromP, const char *fromLim, char **toP, const char *toLim) { char buf[XML_UTF8_ENCODE_MAX]; for (;;) { const char *utf8; int n; if (*fromP == fromLim) break; utf8 = ((const struct unknown_encoding *)enc)->utf8[(unsigned char)**fromP]; n = *utf8++; if (n == 0) { int c = ((const struct unknown_encoding *)enc) ->convert(((const struct unknown_encoding *)enc)->userData, *fromP); n = XmlUtf8Encode(c, buf); if (n > toLim - *toP) break; utf8 = buf; *fromP += ((const struct normal_encoding *)enc)->type[(unsigned char)**fromP] - (BT_LEAD2 - 2); } else { if (n > toLim - *toP) break; (*fromP)++; } do { *(*toP)++ = *utf8++; } while (--n != 0); } } static void unknown_toUtf16(const ENCODING *enc, const char **fromP, const char *fromLim, unsigned short **toP, const unsigned short *toLim) { while (*fromP != fromLim && *toP != toLim) { unsigned short c = ((const struct unknown_encoding *)enc)->utf16[(unsigned char)**fromP]; if (c == 0) { c = (unsigned short)((const struct unknown_encoding *)enc) ->convert(((const struct unknown_encoding *)enc)->userData, *fromP); *fromP += ((const struct normal_encoding *)enc)->type[(unsigned char)**fromP] - (BT_LEAD2 - 2); } else (*fromP)++; *(*toP)++ = c; } } ENCODING * XmlInitUnknownEncoding(void *mem, int *table, int (*convert)(void *userData, const char *p), void *userData) { int i; struct unknown_encoding *e = mem; for (i = 0; i < (int)sizeof(struct normal_encoding); i++) ((char *)mem)[i] = ((char *)&latin1_encoding)[i]; for (i = 0; i < 128; i++) if (latin1_encoding.type[i] != 
BT_OTHER && latin1_encoding.type[i] != BT_NONXML && table[i] != i) return 0; for (i = 0; i < 256; i++) { int c = table[i]; if (c == -1) { e->normal.type[i] = BT_MALFORM; /* This shouldn't really get used. */ e->utf16[i] = 0xFFFF; e->utf8[i][0] = 1; e->utf8[i][1] = 0; } else if (c < 0) { if (c < -4) return 0; e->normal.type[i] = BT_LEAD2 - (c + 2); e->utf8[i][0] = 0; e->utf16[i] = 0; } else if (c < 0x80) { if (latin1_encoding.type[c] != BT_OTHER && latin1_encoding.type[c] != BT_NONXML && c != i) return 0; e->normal.type[i] = latin1_encoding.type[c]; e->utf8[i][0] = 1; e->utf8[i][1] = (char)c; e->utf16[i] = c == 0 ? 0xFFFF : c; } else if (checkCharRefNumber(c) < 0) { e->normal.type[i] = BT_NONXML; /* This shouldn't really get used. */ e->utf16[i] = 0xFFFF; e->utf8[i][0] = 1; e->utf8[i][1] = 0; } else { if (c > 0xFFFF) return 0; if (UCS2_GET_NAMING(nmstrtPages, c >> 8, c & 0xff)) e->normal.type[i] = BT_NMSTRT; else if (UCS2_GET_NAMING(namePages, c >> 8, c & 0xff)) e->normal.type[i] = BT_NAME; else e->normal.type[i] = BT_OTHER; e->utf8[i][0] = (char)XmlUtf8Encode(c, e->utf8[i] + 1); e->utf16[i] = c; } } e->userData = userData; e->convert = convert; if (convert) { e->normal.isName2 = unknown_isName; e->normal.isName3 = unknown_isName; e->normal.isName4 = unknown_isName; e->normal.isNmstrt2 = unknown_isNmstrt; e->normal.isNmstrt3 = unknown_isNmstrt; e->normal.isNmstrt4 = unknown_isNmstrt; e->normal.isInvalid2 = unknown_isInvalid; e->normal.isInvalid3 = unknown_isInvalid; e->normal.isInvalid4 = unknown_isInvalid; } e->normal.enc.utf8Convert = unknown_toUtf8; e->normal.enc.utf16Convert = unknown_toUtf16; return &(e->normal.enc); } /* If this enumeration is changed, getEncodingIndex and encodings must also be changed. 
*/ enum { UNKNOWN_ENC = -1, ISO_8859_1_ENC = 0, US_ASCII_ENC, UTF_8_ENC, UTF_16_ENC, UTF_16BE_ENC, UTF_16LE_ENC, /* must match encodingNames up to here */ NO_ENC }; static const char KW_ISO_8859_1[] = { ASCII_I, ASCII_S, ASCII_O, ASCII_MINUS, ASCII_8, ASCII_8, ASCII_5, ASCII_9, ASCII_MINUS, ASCII_1, '\0' }; static const char KW_US_ASCII[] = { ASCII_U, ASCII_S, ASCII_MINUS, ASCII_A, ASCII_S, ASCII_C, ASCII_I, ASCII_I, '\0' }; static const char KW_UTF_8[] = { ASCII_U, ASCII_T, ASCII_F, ASCII_MINUS, ASCII_8, '\0' }; static const char KW_UTF_16[] = { ASCII_U, ASCII_T, ASCII_F, ASCII_MINUS, ASCII_1, ASCII_6, '\0' }; static const char KW_UTF_16BE[] = { ASCII_U, ASCII_T, ASCII_F, ASCII_MINUS, ASCII_1, ASCII_6, ASCII_B, ASCII_E, '\0' }; static const char KW_UTF_16LE[] = { ASCII_U, ASCII_T, ASCII_F, ASCII_MINUS, ASCII_1, ASCII_6, ASCII_L, ASCII_E, '\0' }; static int getEncodingIndex(const char *name) { static const char *encodingNames[] = { KW_ISO_8859_1, KW_US_ASCII, KW_UTF_8, KW_UTF_16, KW_UTF_16BE, KW_UTF_16LE, }; int i; if (name == 0) return NO_ENC; for (i = 0; i < (int)(sizeof(encodingNames)/sizeof(encodingNames[0])); i++) if (streqci(name, encodingNames[i])) return i; return UNKNOWN_ENC; } /* For binary compatibility, we store the index of the encoding specified at initialization in the isUtf16 member. */ #define INIT_ENC_INDEX(enc) ((int)(enc)->initEnc.isUtf16) #define SET_INIT_ENC_INDEX(enc, i) ((enc)->initEnc.isUtf16 = (char)i) /* This is what detects the encoding. encodingTable maps from encoding indices to encodings; INIT_ENC_INDEX(enc) is the index of the external (protocol) specified encoding; state is XML_CONTENT_STATE if we're parsing an external text entity, and XML_PROLOG_STATE otherwise. 
*/ static int initScan(const ENCODING **encodingTable, const INIT_ENCODING *enc, int state, const char *ptr, const char *end, const char **nextTokPtr) { const ENCODING **encPtr; if (ptr == end) return XML_TOK_NONE; encPtr = enc->encPtr; if (ptr + 1 == end) { /* only a single byte available for auto-detection */ #ifndef XML_DTD /* FIXME */ /* a well-formed document entity must have more than one byte */ if (state != XML_CONTENT_STATE) return XML_TOK_PARTIAL; #endif /* so we're parsing an external text entity... */ /* if UTF-16 was externally specified, then we need at least 2 bytes */ switch (INIT_ENC_INDEX(enc)) { case UTF_16_ENC: case UTF_16LE_ENC: case UTF_16BE_ENC: return XML_TOK_PARTIAL; } switch ((unsigned char)*ptr) { case 0xFE: case 0xFF: case 0xEF: /* possibly first byte of UTF-8 BOM */ if (INIT_ENC_INDEX(enc) == ISO_8859_1_ENC && state == XML_CONTENT_STATE) break; /* fall through */ case 0x00: case 0x3C: return XML_TOK_PARTIAL; } } else { switch (((unsigned char)ptr[0] << 8) | (unsigned char)ptr[1]) { case 0xFEFF: if (INIT_ENC_INDEX(enc) == ISO_8859_1_ENC && state == XML_CONTENT_STATE) break; *nextTokPtr = ptr + 2; *encPtr = encodingTable[UTF_16BE_ENC]; return XML_TOK_BOM; /* 00 3C is handled in the default case */ case 0x3C00: if ((INIT_ENC_INDEX(enc) == UTF_16BE_ENC || INIT_ENC_INDEX(enc) == UTF_16_ENC) && state == XML_CONTENT_STATE) break; *encPtr = encodingTable[UTF_16LE_ENC]; return XmlTok(*encPtr, state, ptr, end, nextTokPtr); case 0xFFFE: if (INIT_ENC_INDEX(enc) == ISO_8859_1_ENC && state == XML_CONTENT_STATE) break; *nextTokPtr = ptr + 2; *encPtr = encodingTable[UTF_16LE_ENC]; return XML_TOK_BOM; case 0xEFBB: /* Maybe a UTF-8 BOM (EF BB BF) */ /* If there's an explicitly specified (external) encoding of ISO-8859-1 or some flavour of UTF-16 and this is an external text entity, don't look for the BOM, because it might be legal data.
*/ if (state == XML_CONTENT_STATE) { int e = INIT_ENC_INDEX(enc); if (e == ISO_8859_1_ENC || e == UTF_16BE_ENC || e == UTF_16LE_ENC || e == UTF_16_ENC) break; } if (ptr + 2 == end) return XML_TOK_PARTIAL; if ((unsigned char)ptr[2] == 0xBF) { /* skip the 3-byte BOM, as the UTF-16 cases skip 2 */ *nextTokPtr = ptr + 3; *encPtr = encodingTable[UTF_8_ENC]; return XML_TOK_BOM; } break; default: if (ptr[0] == '\0') { /* 0 isn't a legal data character. Furthermore a document entity can only start with ASCII characters. So the only way this can fail to be big-endian UTF-16 is if it's an external parsed general entity that's labelled as UTF-16LE. */ if (state == XML_CONTENT_STATE && INIT_ENC_INDEX(enc) == UTF_16LE_ENC) break; *encPtr = encodingTable[UTF_16BE_ENC]; return XmlTok(*encPtr, state, ptr, end, nextTokPtr); } else if (ptr[1] == '\0') { /* We could recover here in the case: - parsing an external entity - second byte is 0 - no externally specified encoding - no encoding declaration by assuming UTF-16LE. But we don't, because this would mean when presented just with a single byte, we couldn't reliably determine whether we needed further bytes.
*/ if (state == XML_CONTENT_STATE) break; *encPtr = encodingTable[UTF_16LE_ENC]; return XmlTok(*encPtr, state, ptr, end, nextTokPtr); } break; } } *encPtr = encodingTable[INIT_ENC_INDEX(enc)]; return XmlTok(*encPtr, state, ptr, end, nextTokPtr); } #define NS(x) x #define ns(x) x #include "xmltok_ns.c" #undef NS #undef ns #ifdef XML_NS #define NS(x) x ## NS #define ns(x) x ## _ns #include "xmltok_ns.c" #undef NS #undef ns ENCODING * XmlInitUnknownEncodingNS(void *mem, int *table, int (*convert)(void *userData, const char *p), void *userData) { ENCODING *enc = XmlInitUnknownEncoding(mem, table, convert, userData); if (enc) ((struct normal_encoding *)enc)->type[ASCII_COLON] = BT_COLON; return enc; } #endif /* XML_NS */ swish-e-2.4.7/src/expat/xmltok/asciitab.h0000775000077100017500000000334111166010107015203 00000000000000/* Copyright (c) 1998, 1999 Thai Open Source Software Center Ltd See the file copying.txt for copying permission. */ /* 0x00 */ BT_NONXML, BT_NONXML, BT_NONXML, BT_NONXML, /* 0x04 */ BT_NONXML, BT_NONXML, BT_NONXML, BT_NONXML, /* 0x08 */ BT_NONXML, BT_S, BT_LF, BT_NONXML, /* 0x0C */ BT_NONXML, BT_CR, BT_NONXML, BT_NONXML, /* 0x10 */ BT_NONXML, BT_NONXML, BT_NONXML, BT_NONXML, /* 0x14 */ BT_NONXML, BT_NONXML, BT_NONXML, BT_NONXML, /* 0x18 */ BT_NONXML, BT_NONXML, BT_NONXML, BT_NONXML, /* 0x1C */ BT_NONXML, BT_NONXML, BT_NONXML, BT_NONXML, /* 0x20 */ BT_S, BT_EXCL, BT_QUOT, BT_NUM, /* 0x24 */ BT_OTHER, BT_PERCNT, BT_AMP, BT_APOS, /* 0x28 */ BT_LPAR, BT_RPAR, BT_AST, BT_PLUS, /* 0x2C */ BT_COMMA, BT_MINUS, BT_NAME, BT_SOL, /* 0x30 */ BT_DIGIT, BT_DIGIT, BT_DIGIT, BT_DIGIT, /* 0x34 */ BT_DIGIT, BT_DIGIT, BT_DIGIT, BT_DIGIT, /* 0x38 */ BT_DIGIT, BT_DIGIT, BT_COLON, BT_SEMI, /* 0x3C */ BT_LT, BT_EQUALS, BT_GT, BT_QUEST, /* 0x40 */ BT_OTHER, BT_HEX, BT_HEX, BT_HEX, /* 0x44 */ BT_HEX, BT_HEX, BT_HEX, BT_NMSTRT, /* 0x48 */ BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, /* 0x4C */ BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, /* 0x50 */ BT_NMSTRT, BT_NMSTRT, 
BT_NMSTRT, BT_NMSTRT, /* 0x54 */ BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, /* 0x58 */ BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, BT_LSQB, /* 0x5C */ BT_OTHER, BT_RSQB, BT_OTHER, BT_NMSTRT, /* 0x60 */ BT_OTHER, BT_HEX, BT_HEX, BT_HEX, /* 0x64 */ BT_HEX, BT_HEX, BT_HEX, BT_NMSTRT, /* 0x68 */ BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, /* 0x6C */ BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, /* 0x70 */ BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, /* 0x74 */ BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, /* 0x78 */ BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, BT_OTHER, /* 0x7C */ BT_VERBAR, BT_OTHER, BT_OTHER, BT_OTHER, swish-e-2.4.7/src/expat/xmltok/xmltok.h0000775000077100017500000002347211166010107014751 00000000000000/* Copyright (c) 1998, 1999 Thai Open Source Software Center Ltd See the file copying.txt for copying permission. */ #ifndef XmlTok_INCLUDED #define XmlTok_INCLUDED 1 #ifdef __cplusplus extern "C" { #endif #ifndef XMLTOKAPI #define XMLTOKAPI /* as nothing */ #endif /* The following token may be returned by XmlContentTok */ #define XML_TOK_TRAILING_RSQB -5 /* ] or ]] at the end of the scan; might be start of illegal ]]> sequence */ /* The following tokens may be returned by both XmlPrologTok and XmlContentTok */ #define XML_TOK_NONE -4 /* The string to be scanned is empty */ #define XML_TOK_TRAILING_CR -3 /* A CR at the end of the scan; might be part of CRLF sequence */ #define XML_TOK_PARTIAL_CHAR -2 /* only part of a multibyte sequence */ #define XML_TOK_PARTIAL -1 /* only part of a token */ #define XML_TOK_INVALID 0 /* The following tokens are returned by XmlContentTok; some are also returned by XmlAttributeValueTok, XmlEntityTok, XmlCdataSectionTok */ #define XML_TOK_START_TAG_WITH_ATTS 1 #define XML_TOK_START_TAG_NO_ATTS 2 #define XML_TOK_EMPTY_ELEMENT_WITH_ATTS 3 /* empty element tag */ #define XML_TOK_EMPTY_ELEMENT_NO_ATTS 4 #define XML_TOK_END_TAG 5 #define XML_TOK_DATA_CHARS 6 #define XML_TOK_DATA_NEWLINE 7 #define XML_TOK_CDATA_SECT_OPEN 8 #define XML_TOK_ENTITY_REF 9 
#define XML_TOK_CHAR_REF 10 /* numeric character reference */ /* The following tokens may be returned by both XmlPrologTok and XmlContentTok */ #define XML_TOK_PI 11 /* processing instruction */ #define XML_TOK_XML_DECL 12 /* XML decl or text decl */ #define XML_TOK_COMMENT 13 #define XML_TOK_BOM 14 /* Byte order mark */ /* The following tokens are returned only by XmlPrologTok */ #define XML_TOK_PROLOG_S 15 #define XML_TOK_DECL_OPEN 16 /* <!foo */ #define XML_TOK_DECL_CLOSE 17 /* > */ #define XML_TOK_NAME 18 #define XML_TOK_NMTOKEN 19 #define XML_TOK_POUND_NAME 20 /* #name */ #define XML_TOK_OR 21 /* | */ #define XML_TOK_PERCENT 22 #define XML_TOK_OPEN_PAREN 23 #define XML_TOK_CLOSE_PAREN 24 #define XML_TOK_OPEN_BRACKET 25 #define XML_TOK_CLOSE_BRACKET 26 #define XML_TOK_LITERAL 27 #define XML_TOK_PARAM_ENTITY_REF 28 #define XML_TOK_INSTANCE_START 29 /* The following occur only in element type declarations */ #define XML_TOK_NAME_QUESTION 30 /* name? */ #define XML_TOK_NAME_ASTERISK 31 /* name* */ #define XML_TOK_NAME_PLUS 32 /* name+ */ #define XML_TOK_COND_SECT_OPEN 33 /* <![ */ #define XML_TOK_COND_SECT_CLOSE 34 /* ]]> */ #define XML_TOK_CLOSE_PAREN_QUESTION 35 /* )? */ #define XML_TOK_CLOSE_PAREN_ASTERISK 36 /* )* */ #define XML_TOK_CLOSE_PAREN_PLUS 37 /* )+ */ #define XML_TOK_COMMA 38 /* The following token is returned only by XmlAttributeValueTok */ #define XML_TOK_ATTRIBUTE_VALUE_S 39 /* The following token is returned only by XmlCdataSectionTok */ #define XML_TOK_CDATA_SECT_CLOSE 40 /* With namespace processing this is returned by XmlPrologTok for a name with a colon.
*/ #define XML_TOK_PREFIXED_NAME 41 #ifdef XML_DTD #define XML_TOK_IGNORE_SECT 42 #endif /* XML_DTD */ #ifdef XML_DTD #define XML_N_STATES 4 #else /* not XML_DTD */ #define XML_N_STATES 3 #endif /* not XML_DTD */ #define XML_PROLOG_STATE 0 #define XML_CONTENT_STATE 1 #define XML_CDATA_SECTION_STATE 2 #ifdef XML_DTD #define XML_IGNORE_SECTION_STATE 3 #endif /* XML_DTD */ #define XML_N_LITERAL_TYPES 2 #define XML_ATTRIBUTE_VALUE_LITERAL 0 #define XML_ENTITY_VALUE_LITERAL 1 /* The size of the buffer passed to XmlUtf8Encode must be at least this. */ #define XML_UTF8_ENCODE_MAX 4 /* The size of the buffer passed to XmlUtf16Encode must be at least this. */ #define XML_UTF16_ENCODE_MAX 2 typedef struct position { /* first line and first column are 0 not 1 */ unsigned long lineNumber; unsigned long columnNumber; } POSITION; typedef struct { const char *name; const char *valuePtr; const char *valueEnd; char normalized; } ATTRIBUTE; struct encoding; typedef struct encoding ENCODING; struct encoding { int (*scanners[XML_N_STATES])(const ENCODING *, const char *, const char *, const char **); int (*literalScanners[XML_N_LITERAL_TYPES])(const ENCODING *, const char *, const char *, const char **); int (*sameName)(const ENCODING *, const char *, const char *); int (*nameMatchesAscii)(const ENCODING *, const char *, const char *, const char *); int (*nameLength)(const ENCODING *, const char *); const char *(*skipS)(const ENCODING *, const char *); int (*getAtts)(const ENCODING *enc, const char *ptr, int attsMax, ATTRIBUTE *atts); int (*charRefNumber)(const ENCODING *enc, const char *ptr); int (*predefinedEntityName)(const ENCODING *, const char *, const char *); void (*updatePosition)(const ENCODING *, const char *ptr, const char *end, POSITION *); int (*isPublicId)(const ENCODING *enc, const char *ptr, const char *end, const char **badPtr); void (*utf8Convert)(const ENCODING *enc, const char **fromP, const char *fromLim, char **toP, const char *toLim); void (*utf16Convert)(const 
ENCODING *enc, const char **fromP, const char *fromLim, unsigned short **toP, const unsigned short *toLim); int minBytesPerChar; char isUtf8; char isUtf16; }; /* Scan the string starting at ptr until the end of the next complete token, but do not scan past eptr. Return an integer giving the type of token. Return XML_TOK_NONE when ptr == eptr; nextTokPtr will not be set. Return XML_TOK_PARTIAL when the string does not contain a complete token; nextTokPtr will not be set. Return XML_TOK_INVALID when the string does not start a valid token; nextTokPtr will be set to point to the character which made the token invalid. Otherwise the string starts with a valid token; nextTokPtr will be set to point to the character following the end of that token. Each data character counts as a single token, but adjacent data characters may be returned together. Similarly for characters in the prolog outside literals, comments and processing instructions. */ #define XmlTok(enc, state, ptr, end, nextTokPtr) \ (((enc)->scanners[state])(enc, ptr, end, nextTokPtr)) #define XmlPrologTok(enc, ptr, end, nextTokPtr) \ XmlTok(enc, XML_PROLOG_STATE, ptr, end, nextTokPtr) #define XmlContentTok(enc, ptr, end, nextTokPtr) \ XmlTok(enc, XML_CONTENT_STATE, ptr, end, nextTokPtr) #define XmlCdataSectionTok(enc, ptr, end, nextTokPtr) \ XmlTok(enc, XML_CDATA_SECTION_STATE, ptr, end, nextTokPtr) #ifdef XML_DTD #define XmlIgnoreSectionTok(enc, ptr, end, nextTokPtr) \ XmlTok(enc, XML_IGNORE_SECTION_STATE, ptr, end, nextTokPtr) #endif /* XML_DTD */ /* This is used for performing a 2nd-level tokenization on the content of a literal that has already been returned by XmlTok. 
*/ #define XmlLiteralTok(enc, literalType, ptr, end, nextTokPtr) \ (((enc)->literalScanners[literalType])(enc, ptr, end, nextTokPtr)) #define XmlAttributeValueTok(enc, ptr, end, nextTokPtr) \ XmlLiteralTok(enc, XML_ATTRIBUTE_VALUE_LITERAL, ptr, end, nextTokPtr) #define XmlEntityValueTok(enc, ptr, end, nextTokPtr) \ XmlLiteralTok(enc, XML_ENTITY_VALUE_LITERAL, ptr, end, nextTokPtr) #define XmlSameName(enc, ptr1, ptr2) (((enc)->sameName)(enc, ptr1, ptr2)) #define XmlNameMatchesAscii(enc, ptr1, end1, ptr2) \ (((enc)->nameMatchesAscii)(enc, ptr1, end1, ptr2)) #define XmlNameLength(enc, ptr) \ (((enc)->nameLength)(enc, ptr)) #define XmlSkipS(enc, ptr) \ (((enc)->skipS)(enc, ptr)) #define XmlGetAttributes(enc, ptr, attsMax, atts) \ (((enc)->getAtts)(enc, ptr, attsMax, atts)) #define XmlCharRefNumber(enc, ptr) \ (((enc)->charRefNumber)(enc, ptr)) #define XmlPredefinedEntityName(enc, ptr, end) \ (((enc)->predefinedEntityName)(enc, ptr, end)) #define XmlUpdatePosition(enc, ptr, end, pos) \ (((enc)->updatePosition)(enc, ptr, end, pos)) #define XmlIsPublicId(enc, ptr, end, badPtr) \ (((enc)->isPublicId)(enc, ptr, end, badPtr)) #define XmlUtf8Convert(enc, fromP, fromLim, toP, toLim) \ (((enc)->utf8Convert)(enc, fromP, fromLim, toP, toLim)) #define XmlUtf16Convert(enc, fromP, fromLim, toP, toLim) \ (((enc)->utf16Convert)(enc, fromP, fromLim, toP, toLim)) typedef struct { ENCODING initEnc; const ENCODING **encPtr; } INIT_ENCODING; int XMLTOKAPI XmlParseXmlDecl(int isGeneralTextEntity, const ENCODING *enc, const char *ptr, const char *end, const char **badPtr, const char **versionPtr, const char **encodingNamePtr, const ENCODING **namedEncodingPtr, int *standalonePtr); int XMLTOKAPI XmlInitEncoding(INIT_ENCODING *, const ENCODING **, const char *name); const ENCODING XMLTOKAPI *XmlGetUtf8InternalEncoding(void); const ENCODING XMLTOKAPI *XmlGetUtf16InternalEncoding(void); int XMLTOKAPI XmlUtf8Encode(int charNumber, char *buf); int XMLTOKAPI XmlUtf16Encode(int charNumber, unsigned 
short *buf); int XMLTOKAPI XmlSizeOfUnknownEncoding(void); ENCODING XMLTOKAPI * XmlInitUnknownEncoding(void *mem, int *table, int (*conv)(void *userData, const char *p), void *userData); int XMLTOKAPI XmlParseXmlDeclNS(int isGeneralTextEntity, const ENCODING *enc, const char *ptr, const char *end, const char **badPtr, const char **versionPtr, const char **encodingNamePtr, const ENCODING **namedEncodingPtr, int *standalonePtr); int XMLTOKAPI XmlInitEncodingNS(INIT_ENCODING *, const ENCODING **, const char *name); const ENCODING XMLTOKAPI *XmlGetUtf8InternalEncodingNS(void); const ENCODING XMLTOKAPI *XmlGetUtf16InternalEncodingNS(void); ENCODING XMLTOKAPI * XmlInitUnknownEncodingNS(void *mem, int *table, int (*conv)(void *userData, const char *p), void *userData); #ifdef __cplusplus } #endif #endif /* not XmlTok_INCLUDED */ swish-e-2.4.7/src/expat/xmltok/dllmain.c0000775000077100017500000000043711166010107015042 00000000000000/* Copyright (c) 1998, 1999 Thai Open Source Software Center Ltd See the file copying.txt for copying permission. */ #define STRICT 1 #define WIN32_LEAN_AND_MEAN 1 #include <windows.h> BOOL WINAPI DllMain(HANDLE hInst, ULONG ul_reason_for_call, LPVOID lpReserved) { return TRUE; } swish-e-2.4.7/src/expat/xmltok/xmldef.h0000775000077100017500000000222011166010107014676 00000000000000/* Copyright (c) 1998, 1999 Thai Open Source Software Center Ltd See the file copying.txt for copying permission. */ #include <string.h> #ifdef XML_WINLIB #define WIN32_LEAN_AND_MEAN #define STRICT #include <windows.h> #define malloc(x) HeapAlloc(GetProcessHeap(), 0, (x)) #define calloc(x, y) HeapAlloc(GetProcessHeap(), HEAP_ZERO_MEMORY, (x)*(y)) #define free(x) HeapFree(GetProcessHeap(), 0, (x)) #define realloc(x, y) HeapReAlloc(GetProcessHeap(), 0, x, y) #define abort() /* as nothing */ #else /* not XML_WINLIB */ #include <stdlib.h> #endif /* not XML_WINLIB */ /* This file can be used for any definitions needed in particular environments. 
*/ /* Mozilla specific defines */ #ifdef MOZILLA_CLIENT #include "nspr.h" #define malloc(x) PR_Malloc((size_t)(x)) #define realloc(x, y) PR_Realloc((x), (size_t)(y)) #define calloc(x, y) PR_Calloc((x),(y)) #define free(x) PR_Free(x) #if PR_BYTES_PER_INT != 4 #define int int32 #endif /* Enable Unicode string processing in expat. */ #ifndef XML_UNICODE #define XML_UNICODE #endif /* Enable external parameter entity parsing in expat */ #ifndef XML_DTD #define XML_DTD 1 #endif #endif /* MOZILLA_CLIENT */ swish-e-2.4.7/src/expat/xmltok/xmltok.dsp0000775000077100017500000001311711166010107015303 00000000000000# Microsoft Developer Studio Project File - Name="xmltok" - Package Owner=<4> # Microsoft Developer Studio Generated Build File, Format Version 6.00 # ** DO NOT EDIT ** # TARGTYPE "Win32 (x86) Dynamic-Link Library" 0x0102 CFG=xmltok - Win32 Release !MESSAGE This is not a valid makefile. To build this project using NMAKE, !MESSAGE use the Export Makefile command and run !MESSAGE !MESSAGE NMAKE /f "xmltok.mak". !MESSAGE !MESSAGE You can specify a configuration when running NMAKE !MESSAGE by defining the macro CFG on the command line. For example: !MESSAGE !MESSAGE NMAKE /f "xmltok.mak" CFG="xmltok - Win32 Release" !MESSAGE !MESSAGE Possible choices for configuration are: !MESSAGE !MESSAGE "xmltok - Win32 Release" (based on "Win32 (x86) Dynamic-Link Library") !MESSAGE "xmltok - Win32 Debug" (based on "Win32 (x86) Dynamic-Link Library") !MESSAGE # Begin Project # PROP AllowPerConfigDependencies 0 # PROP Scc_ProjName "" # PROP Scc_LocalPath "" CPP=cl.exe MTL=midl.exe RSC=rc.exe !IF "$(CFG)" == "xmltok - Win32 Release" # PROP BASE Use_MFC 0 # PROP BASE Use_Debug_Libraries 0 # PROP BASE Output_Dir ".\Release" # PROP BASE Intermediate_Dir ".\Release" # PROP BASE Target_Dir "." # PROP Use_MFC 0 # PROP Use_Debug_Libraries 0 # PROP Output_Dir ".\Release" # PROP Intermediate_Dir ".\Release" # PROP Ignore_Export_Lib 0 # PROP Target_Dir "." 
# ADD BASE CPP /nologo /MT /W3 /GX /O2 /D "WIN32" /D "NDEBUG" /D "_WINDOWS" /YX /c # ADD CPP /nologo /MT /W3 /GX /O2 /D "NDEBUG" /D "XML_NS" /D XMLTOKAPI=__declspec(dllexport) /D "WIN32" /D "_WINDOWS" /D "XML_DTD" /YX /FD /c # ADD BASE MTL /nologo /D "NDEBUG" /win32 # ADD MTL /nologo /D "NDEBUG" /mktyplib203 /win32 # ADD BASE RSC /l 0x809 /d "NDEBUG" # ADD RSC /l 0x809 /d "NDEBUG" BSC32=bscmake.exe # ADD BASE BSC32 /nologo # ADD BSC32 /nologo LINK32=link.exe # ADD BASE LINK32 kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib uuid.lib odbc32.lib odbccp32.lib /nologo /subsystem:windows /dll /machine:I386 # ADD LINK32 kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib uuid.lib odbc32.lib odbccp32.lib /nologo /entry:"DllMain" /subsystem:windows /dll /machine:I386 /out:"..\bin\xmltok.dll" /link50compat # SUBTRACT LINK32 /pdb:none !ELSEIF "$(CFG)" == "xmltok - Win32 Debug" # PROP BASE Use_MFC 0 # PROP BASE Use_Debug_Libraries 1 # PROP BASE Output_Dir ".\Debug" # PROP BASE Intermediate_Dir ".\Debug" # PROP BASE Target_Dir "." # PROP Use_MFC 0 # PROP Use_Debug_Libraries 1 # PROP Output_Dir ".\Debug" # PROP Intermediate_Dir ".\Debug" # PROP Ignore_Export_Lib 0 # PROP Target_Dir "." 
# ADD BASE CPP /nologo /MTd /W3 /Gm /GX /Zi /Od /D "WIN32" /D "_DEBUG" /D "_WINDOWS" /YX /c # ADD CPP /nologo /MTd /W3 /Gm /GX /ZI /Od /D "_DEBUG" /D XMLTOKAPI=__declspec(dllexport) /D "WIN32" /D "_WINDOWS" /D "XML_DTD" /D "XML_NS" /YX /FD /c # ADD BASE MTL /nologo /D "_DEBUG" /win32 # ADD MTL /nologo /D "_DEBUG" /mktyplib203 /win32 # ADD BASE RSC /l 0x809 /d "_DEBUG" # ADD RSC /l 0x809 /d "_DEBUG" BSC32=bscmake.exe # ADD BASE BSC32 /nologo # ADD BSC32 /nologo LINK32=link.exe # ADD BASE LINK32 kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib uuid.lib odbc32.lib odbccp32.lib /nologo /subsystem:windows /dll /debug /machine:I386 # ADD LINK32 kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib uuid.lib odbc32.lib odbccp32.lib /nologo /subsystem:windows /dll /debug /machine:I386 /out:"..\dbgbin\xmltok.dll" !ENDIF # Begin Target # Name "xmltok - Win32 Release" # Name "xmltok - Win32 Debug" # Begin Group "Source Files" # PROP Default_Filter "cpp;c;cxx;rc;def;r;odl;idl;hpj;bat;for;f90" # Begin Source File SOURCE=.\dllmain.c # End Source File # Begin Source File SOURCE=..\gennmtab\gennmtab.c !IF "$(CFG)" == "xmltok - Win32 Release" # PROP Ignore_Default_Tool 1 # Begin Custom Build - Creating nametab.h InputDir=\home\work\xmls\gennmtab OutDir=.\Release ProjDir=. InputPath=..\gennmtab\gennmtab.c "$(ProjDir)\nametab.h" : $(SOURCE) "$(INTDIR)" "$(OUTDIR)" $(InputDir)\$(OutDir)\gennmtab >$(ProjDir)\nametab.h # End Custom Build !ELSEIF "$(CFG)" == "xmltok - Win32 Debug" # PROP Ignore_Default_Tool 1 # Begin Custom Build - Creating nametab.h InputDir=\home\work\xmls\gennmtab OutDir=.\Debug ProjDir=. 
InputPath=..\gennmtab\gennmtab.c "$(ProjDir)\nametab.h" : $(SOURCE) "$(INTDIR)" "$(OUTDIR)" $(InputDir)\$(OutDir)\gennmtab >$(ProjDir)\nametab.h # End Custom Build !ENDIF # End Source File # Begin Source File SOURCE=.\xmlrole.c # End Source File # Begin Source File SOURCE=.\xmltok.c # End Source File # End Group # Begin Group "Header Files" # PROP Default_Filter "h;hpp;hxx;hm;inl;fi;fd" # Begin Source File SOURCE=.\asciitab.h # End Source File # Begin Source File SOURCE=.\iasciitab.h # End Source File # Begin Source File SOURCE=.\latin1tab.h # End Source File # Begin Source File SOURCE=.\nametab.h # End Source File # Begin Source File SOURCE=.\utf8tab.h # End Source File # Begin Source File SOURCE=.\xmldef.h # End Source File # Begin Source File SOURCE=.\xmlrole.h # End Source File # Begin Source File SOURCE=.\xmltok.h # End Source File # Begin Source File SOURCE=.\xmltok_impl.c # PROP BASE Exclude_From_Build 1 # PROP Exclude_From_Build 1 # End Source File # Begin Source File SOURCE=.\xmltok_impl.h # End Source File # Begin Source File SOURCE=.\xmltok_ns.c # PROP Exclude_From_Build 1 # End Source File # End Group # Begin Group "Resource Files" # PROP Default_Filter "ico;cur;bmp;dlg;rc2;rct;bin;cnt;rtf;gif;jpg;jpeg;jpe" # End Group # End Target # End Project swish-e-2.4.7/src/expat/xmltok/xmlrole.c0000775000077100017500000006752311166010107015115 00000000000000/* Copyright (c) 1998, 1999 Thai Open Source Software Center Ltd See the file copying.txt for copying permission. 
*/ #include "xmldef.h" #include "xmlrole.h" #include "ascii.h" /* Doesn't check: that ,| are not mixed in a model group content of literals */ static const char KW_ANY[] = { ASCII_A, ASCII_N, ASCII_Y, '\0' }; static const char KW_ATTLIST[] = { ASCII_A, ASCII_T, ASCII_T, ASCII_L, ASCII_I, ASCII_S, ASCII_T, '\0' }; static const char KW_CDATA[] = { ASCII_C, ASCII_D, ASCII_A, ASCII_T, ASCII_A, '\0' }; static const char KW_DOCTYPE[] = { ASCII_D, ASCII_O, ASCII_C, ASCII_T, ASCII_Y, ASCII_P, ASCII_E, '\0' }; static const char KW_ELEMENT[] = { ASCII_E, ASCII_L, ASCII_E, ASCII_M, ASCII_E, ASCII_N, ASCII_T, '\0' }; static const char KW_EMPTY[] = { ASCII_E, ASCII_M, ASCII_P, ASCII_T, ASCII_Y, '\0' }; static const char KW_ENTITIES[] = { ASCII_E, ASCII_N, ASCII_T, ASCII_I, ASCII_T, ASCII_I, ASCII_E, ASCII_S, '\0' }; static const char KW_ENTITY[] = { ASCII_E, ASCII_N, ASCII_T, ASCII_I, ASCII_T, ASCII_Y, '\0' }; static const char KW_FIXED[] = { ASCII_F, ASCII_I, ASCII_X, ASCII_E, ASCII_D, '\0' }; static const char KW_ID[] = { ASCII_I, ASCII_D, '\0' }; static const char KW_IDREF[] = { ASCII_I, ASCII_D, ASCII_R, ASCII_E, ASCII_F, '\0' }; static const char KW_IDREFS[] = { ASCII_I, ASCII_D, ASCII_R, ASCII_E, ASCII_F, ASCII_S, '\0' }; static const char KW_IGNORE[] = { ASCII_I, ASCII_G, ASCII_N, ASCII_O, ASCII_R, ASCII_E, '\0' }; static const char KW_IMPLIED[] = { ASCII_I, ASCII_M, ASCII_P, ASCII_L, ASCII_I, ASCII_E, ASCII_D, '\0' }; static const char KW_INCLUDE[] = { ASCII_I, ASCII_N, ASCII_C, ASCII_L, ASCII_U, ASCII_D, ASCII_E, '\0' }; static const char KW_NDATA[] = { ASCII_N, ASCII_D, ASCII_A, ASCII_T, ASCII_A, '\0' }; static const char KW_NMTOKEN[] = { ASCII_N, ASCII_M, ASCII_T, ASCII_O, ASCII_K, ASCII_E, ASCII_N, '\0' }; static const char KW_NMTOKENS[] = { ASCII_N, ASCII_M, ASCII_T, ASCII_O, ASCII_K, ASCII_E, ASCII_N, ASCII_S, '\0' }; static const char KW_NOTATION[] = { ASCII_N, ASCII_O, ASCII_T, ASCII_A, ASCII_T, ASCII_I, ASCII_O, ASCII_N, '\0' }; static const char KW_PCDATA[] = 
{ ASCII_P, ASCII_C, ASCII_D, ASCII_A, ASCII_T, ASCII_A, '\0' }; static const char KW_PUBLIC[] = { ASCII_P, ASCII_U, ASCII_B, ASCII_L, ASCII_I, ASCII_C, '\0' }; static const char KW_REQUIRED[] = { ASCII_R, ASCII_E, ASCII_Q, ASCII_U, ASCII_I, ASCII_R, ASCII_E, ASCII_D, '\0' }; static const char KW_SYSTEM[] = { ASCII_S, ASCII_Y, ASCII_S, ASCII_T, ASCII_E, ASCII_M, '\0' }; #ifndef MIN_BYTES_PER_CHAR #define MIN_BYTES_PER_CHAR(enc) ((enc)->minBytesPerChar) #endif #ifdef XML_DTD #define setTopLevel(state) \ ((state)->handler = ((state)->documentEntity \ ? internalSubset \ : externalSubset1)) #else /* not XML_DTD */ #define setTopLevel(state) ((state)->handler = internalSubset) #endif /* not XML_DTD */ typedef int PROLOG_HANDLER(PROLOG_STATE *state, int tok, const char *ptr, const char *end, const ENCODING *enc); static PROLOG_HANDLER prolog0, prolog1, prolog2, doctype0, doctype1, doctype2, doctype3, doctype4, doctype5, internalSubset, entity0, entity1, entity2, entity3, entity4, entity5, entity6, entity7, entity8, entity9, notation0, notation1, notation2, notation3, notation4, attlist0, attlist1, attlist2, attlist3, attlist4, attlist5, attlist6, attlist7, attlist8, attlist9, element0, element1, element2, element3, element4, element5, element6, element7, #ifdef XML_DTD externalSubset0, externalSubset1, condSect0, condSect1, condSect2, #endif /* XML_DTD */ declClose, error; static int common(PROLOG_STATE *state, int tok); static int prolog0(PROLOG_STATE *state, int tok, const char *ptr, const char *end, const ENCODING *enc) { switch (tok) { case XML_TOK_PROLOG_S: state->handler = prolog1; return XML_ROLE_NONE; case XML_TOK_XML_DECL: state->handler = prolog1; return XML_ROLE_XML_DECL; case XML_TOK_PI: state->handler = prolog1; return XML_ROLE_NONE; case XML_TOK_COMMENT: state->handler = prolog1; case XML_TOK_BOM: return XML_ROLE_NONE; case XML_TOK_DECL_OPEN: if (!XmlNameMatchesAscii(enc, ptr + 2 * MIN_BYTES_PER_CHAR(enc), end, KW_DOCTYPE)) break; state->handler = doctype0; 
return XML_ROLE_NONE; case XML_TOK_INSTANCE_START: state->handler = error; return XML_ROLE_INSTANCE_START; } return common(state, tok); } static int prolog1(PROLOG_STATE *state, int tok, const char *ptr, const char *end, const ENCODING *enc) { switch (tok) { case XML_TOK_PROLOG_S: return XML_ROLE_NONE; case XML_TOK_PI: case XML_TOK_COMMENT: case XML_TOK_BOM: return XML_ROLE_NONE; case XML_TOK_DECL_OPEN: if (!XmlNameMatchesAscii(enc, ptr + 2 * MIN_BYTES_PER_CHAR(enc), end, KW_DOCTYPE)) break; state->handler = doctype0; return XML_ROLE_NONE; case XML_TOK_INSTANCE_START: state->handler = error; return XML_ROLE_INSTANCE_START; } return common(state, tok); } static int prolog2(PROLOG_STATE *state, int tok, const char *ptr, const char *end, const ENCODING *enc) { switch (tok) { case XML_TOK_PROLOG_S: return XML_ROLE_NONE; case XML_TOK_PI: case XML_TOK_COMMENT: return XML_ROLE_NONE; case XML_TOK_INSTANCE_START: state->handler = error; return XML_ROLE_INSTANCE_START; } return common(state, tok); } static int doctype0(PROLOG_STATE *state, int tok, const char *ptr, const char *end, const ENCODING *enc) { switch (tok) { case XML_TOK_PROLOG_S: return XML_ROLE_NONE; case XML_TOK_NAME: case XML_TOK_PREFIXED_NAME: state->handler = doctype1; return XML_ROLE_DOCTYPE_NAME; } return common(state, tok); } static int doctype1(PROLOG_STATE *state, int tok, const char *ptr, const char *end, const ENCODING *enc) { switch (tok) { case XML_TOK_PROLOG_S: return XML_ROLE_NONE; case XML_TOK_OPEN_BRACKET: state->handler = internalSubset; return XML_ROLE_NONE; case XML_TOK_DECL_CLOSE: state->handler = prolog2; return XML_ROLE_DOCTYPE_CLOSE; case XML_TOK_NAME: if (XmlNameMatchesAscii(enc, ptr, end, KW_SYSTEM)) { state->handler = doctype3; return XML_ROLE_NONE; } if (XmlNameMatchesAscii(enc, ptr, end, KW_PUBLIC)) { state->handler = doctype2; return XML_ROLE_NONE; } break; } return common(state, tok); } static int doctype2(PROLOG_STATE *state, int tok, const char *ptr, const char *end, const 
ENCODING *enc) { switch (tok) { case XML_TOK_PROLOG_S: return XML_ROLE_NONE; case XML_TOK_LITERAL: state->handler = doctype3; return XML_ROLE_DOCTYPE_PUBLIC_ID; } return common(state, tok); } static int doctype3(PROLOG_STATE *state, int tok, const char *ptr, const char *end, const ENCODING *enc) { switch (tok) { case XML_TOK_PROLOG_S: return XML_ROLE_NONE; case XML_TOK_LITERAL: state->handler = doctype4; return XML_ROLE_DOCTYPE_SYSTEM_ID; } return common(state, tok); } static int doctype4(PROLOG_STATE *state, int tok, const char *ptr, const char *end, const ENCODING *enc) { switch (tok) { case XML_TOK_PROLOG_S: return XML_ROLE_NONE; case XML_TOK_OPEN_BRACKET: state->handler = internalSubset; return XML_ROLE_NONE; case XML_TOK_DECL_CLOSE: state->handler = prolog2; return XML_ROLE_DOCTYPE_CLOSE; } return common(state, tok); } static int doctype5(PROLOG_STATE *state, int tok, const char *ptr, const char *end, const ENCODING *enc) { switch (tok) { case XML_TOK_PROLOG_S: return XML_ROLE_NONE; case XML_TOK_DECL_CLOSE: state->handler = prolog2; return XML_ROLE_DOCTYPE_CLOSE; } return common(state, tok); } static int internalSubset(PROLOG_STATE *state, int tok, const char *ptr, const char *end, const ENCODING *enc) { switch (tok) { case XML_TOK_PROLOG_S: return XML_ROLE_NONE; case XML_TOK_DECL_OPEN: if (XmlNameMatchesAscii(enc, ptr + 2 * MIN_BYTES_PER_CHAR(enc), end, KW_ENTITY)) { state->handler = entity0; return XML_ROLE_NONE; } if (XmlNameMatchesAscii(enc, ptr + 2 * MIN_BYTES_PER_CHAR(enc), end, KW_ATTLIST)) { state->handler = attlist0; return XML_ROLE_NONE; } if (XmlNameMatchesAscii(enc, ptr + 2 * MIN_BYTES_PER_CHAR(enc), end, KW_ELEMENT)) { state->handler = element0; return XML_ROLE_NONE; } if (XmlNameMatchesAscii(enc, ptr + 2 * MIN_BYTES_PER_CHAR(enc), end, KW_NOTATION)) { state->handler = notation0; return XML_ROLE_NONE; } break; case XML_TOK_PI: case XML_TOK_COMMENT: return XML_ROLE_NONE; case XML_TOK_PARAM_ENTITY_REF: return XML_ROLE_PARAM_ENTITY_REF; case 
XML_TOK_CLOSE_BRACKET: state->handler = doctype5; return XML_ROLE_NONE; } return common(state, tok); } #ifdef XML_DTD static int externalSubset0(PROLOG_STATE *state, int tok, const char *ptr, const char *end, const ENCODING *enc) { state->handler = externalSubset1; if (tok == XML_TOK_XML_DECL) return XML_ROLE_TEXT_DECL; return externalSubset1(state, tok, ptr, end, enc); } static int externalSubset1(PROLOG_STATE *state, int tok, const char *ptr, const char *end, const ENCODING *enc) { switch (tok) { case XML_TOK_COND_SECT_OPEN: state->handler = condSect0; return XML_ROLE_NONE; case XML_TOK_COND_SECT_CLOSE: if (state->includeLevel == 0) break; state->includeLevel -= 1; return XML_ROLE_NONE; case XML_TOK_PROLOG_S: return XML_ROLE_NONE; case XML_TOK_CLOSE_BRACKET: break; case XML_TOK_NONE: if (state->includeLevel) break; return XML_ROLE_NONE; default: return internalSubset(state, tok, ptr, end, enc); } return common(state, tok); } #endif /* XML_DTD */ static int entity0(PROLOG_STATE *state, int tok, const char *ptr, const char *end, const ENCODING *enc) { switch (tok) { case XML_TOK_PROLOG_S: return XML_ROLE_NONE; case XML_TOK_PERCENT: state->handler = entity1; return XML_ROLE_NONE; case XML_TOK_NAME: state->handler = entity2; return XML_ROLE_GENERAL_ENTITY_NAME; } return common(state, tok); } static int entity1(PROLOG_STATE *state, int tok, const char *ptr, const char *end, const ENCODING *enc) { switch (tok) { case XML_TOK_PROLOG_S: return XML_ROLE_NONE; case XML_TOK_NAME: state->handler = entity7; return XML_ROLE_PARAM_ENTITY_NAME; } return common(state, tok); } static int entity2(PROLOG_STATE *state, int tok, const char *ptr, const char *end, const ENCODING *enc) { switch (tok) { case XML_TOK_PROLOG_S: return XML_ROLE_NONE; case XML_TOK_NAME: if (XmlNameMatchesAscii(enc, ptr, end, KW_SYSTEM)) { state->handler = entity4; return XML_ROLE_NONE; } if (XmlNameMatchesAscii(enc, ptr, end, KW_PUBLIC)) { state->handler = entity3; return XML_ROLE_NONE; } break; case 
XML_TOK_LITERAL: state->handler = declClose; return XML_ROLE_ENTITY_VALUE; } return common(state, tok); } static int entity3(PROLOG_STATE *state, int tok, const char *ptr, const char *end, const ENCODING *enc) { switch (tok) { case XML_TOK_PROLOG_S: return XML_ROLE_NONE; case XML_TOK_LITERAL: state->handler = entity4; return XML_ROLE_ENTITY_PUBLIC_ID; } return common(state, tok); } static int entity4(PROLOG_STATE *state, int tok, const char *ptr, const char *end, const ENCODING *enc) { switch (tok) { case XML_TOK_PROLOG_S: return XML_ROLE_NONE; case XML_TOK_LITERAL: state->handler = entity5; return XML_ROLE_ENTITY_SYSTEM_ID; } return common(state, tok); } static int entity5(PROLOG_STATE *state, int tok, const char *ptr, const char *end, const ENCODING *enc) { switch (tok) { case XML_TOK_PROLOG_S: return XML_ROLE_NONE; case XML_TOK_DECL_CLOSE: setTopLevel(state); return XML_ROLE_EXTERNAL_GENERAL_ENTITY_NO_NOTATION; case XML_TOK_NAME: if (XmlNameMatchesAscii(enc, ptr, end, KW_NDATA)) { state->handler = entity6; return XML_ROLE_NONE; } break; } return common(state, tok); } static int entity6(PROLOG_STATE *state, int tok, const char *ptr, const char *end, const ENCODING *enc) { switch (tok) { case XML_TOK_PROLOG_S: return XML_ROLE_NONE; case XML_TOK_NAME: state->handler = declClose; return XML_ROLE_ENTITY_NOTATION_NAME; } return common(state, tok); } static int entity7(PROLOG_STATE *state, int tok, const char *ptr, const char *end, const ENCODING *enc) { switch (tok) { case XML_TOK_PROLOG_S: return XML_ROLE_NONE; case XML_TOK_NAME: if (XmlNameMatchesAscii(enc, ptr, end, KW_SYSTEM)) { state->handler = entity9; return XML_ROLE_NONE; } if (XmlNameMatchesAscii(enc, ptr, end, KW_PUBLIC)) { state->handler = entity8; return XML_ROLE_NONE; } break; case XML_TOK_LITERAL: state->handler = declClose; return XML_ROLE_ENTITY_VALUE; } return common(state, tok); } static int entity8(PROLOG_STATE *state, int tok, const char *ptr, const char *end, const ENCODING *enc) { switch (tok) { 
case XML_TOK_PROLOG_S: return XML_ROLE_NONE; case XML_TOK_LITERAL: state->handler = entity9; return XML_ROLE_ENTITY_PUBLIC_ID; } return common(state, tok); } static int entity9(PROLOG_STATE *state, int tok, const char *ptr, const char *end, const ENCODING *enc) { switch (tok) { case XML_TOK_PROLOG_S: return XML_ROLE_NONE; case XML_TOK_LITERAL: state->handler = declClose; return XML_ROLE_ENTITY_SYSTEM_ID; } return common(state, tok); } static int notation0(PROLOG_STATE *state, int tok, const char *ptr, const char *end, const ENCODING *enc) { switch (tok) { case XML_TOK_PROLOG_S: return XML_ROLE_NONE; case XML_TOK_NAME: state->handler = notation1; return XML_ROLE_NOTATION_NAME; } return common(state, tok); } static int notation1(PROLOG_STATE *state, int tok, const char *ptr, const char *end, const ENCODING *enc) { switch (tok) { case XML_TOK_PROLOG_S: return XML_ROLE_NONE; case XML_TOK_NAME: if (XmlNameMatchesAscii(enc, ptr, end, KW_SYSTEM)) { state->handler = notation3; return XML_ROLE_NONE; } if (XmlNameMatchesAscii(enc, ptr, end, KW_PUBLIC)) { state->handler = notation2; return XML_ROLE_NONE; } break; } return common(state, tok); } static int notation2(PROLOG_STATE *state, int tok, const char *ptr, const char *end, const ENCODING *enc) { switch (tok) { case XML_TOK_PROLOG_S: return XML_ROLE_NONE; case XML_TOK_LITERAL: state->handler = notation4; return XML_ROLE_NOTATION_PUBLIC_ID; } return common(state, tok); } static int notation3(PROLOG_STATE *state, int tok, const char *ptr, const char *end, const ENCODING *enc) { switch (tok) { case XML_TOK_PROLOG_S: return XML_ROLE_NONE; case XML_TOK_LITERAL: state->handler = declClose; return XML_ROLE_NOTATION_SYSTEM_ID; } return common(state, tok); } static int notation4(PROLOG_STATE *state, int tok, const char *ptr, const char *end, const ENCODING *enc) { switch (tok) { case XML_TOK_PROLOG_S: return XML_ROLE_NONE; case XML_TOK_LITERAL: state->handler = declClose; return XML_ROLE_NOTATION_SYSTEM_ID; case XML_TOK_DECL_CLOSE: 
setTopLevel(state); return XML_ROLE_NOTATION_NO_SYSTEM_ID; } return common(state, tok); } static int attlist0(PROLOG_STATE *state, int tok, const char *ptr, const char *end, const ENCODING *enc) { switch (tok) { case XML_TOK_PROLOG_S: return XML_ROLE_NONE; case XML_TOK_NAME: case XML_TOK_PREFIXED_NAME: state->handler = attlist1; return XML_ROLE_ATTLIST_ELEMENT_NAME; } return common(state, tok); } static int attlist1(PROLOG_STATE *state, int tok, const char *ptr, const char *end, const ENCODING *enc) { switch (tok) { case XML_TOK_PROLOG_S: return XML_ROLE_NONE; case XML_TOK_DECL_CLOSE: setTopLevel(state); return XML_ROLE_NONE; case XML_TOK_NAME: case XML_TOK_PREFIXED_NAME: state->handler = attlist2; return XML_ROLE_ATTRIBUTE_NAME; } return common(state, tok); } static int attlist2(PROLOG_STATE *state, int tok, const char *ptr, const char *end, const ENCODING *enc) { switch (tok) { case XML_TOK_PROLOG_S: return XML_ROLE_NONE; case XML_TOK_NAME: { static const char *types[] = { KW_CDATA, KW_ID, KW_IDREF, KW_IDREFS, KW_ENTITY, KW_ENTITIES, KW_NMTOKEN, KW_NMTOKENS, }; int i; for (i = 0; i < (int)(sizeof(types)/sizeof(types[0])); i++) if (XmlNameMatchesAscii(enc, ptr, end, types[i])) { state->handler = attlist8; return XML_ROLE_ATTRIBUTE_TYPE_CDATA + i; } } if (XmlNameMatchesAscii(enc, ptr, end, KW_NOTATION)) { state->handler = attlist5; return XML_ROLE_NONE; } break; case XML_TOK_OPEN_PAREN: state->handler = attlist3; return XML_ROLE_NONE; } return common(state, tok); } static int attlist3(PROLOG_STATE *state, int tok, const char *ptr, const char *end, const ENCODING *enc) { switch (tok) { case XML_TOK_PROLOG_S: return XML_ROLE_NONE; case XML_TOK_NMTOKEN: case XML_TOK_NAME: case XML_TOK_PREFIXED_NAME: state->handler = attlist4; return XML_ROLE_ATTRIBUTE_ENUM_VALUE; } return common(state, tok); } static int attlist4(PROLOG_STATE *state, int tok, const char *ptr, const char *end, const ENCODING *enc) { switch (tok) { case XML_TOK_PROLOG_S: return XML_ROLE_NONE; case 
XML_TOK_CLOSE_PAREN: state->handler = attlist8; return XML_ROLE_NONE; case XML_TOK_OR: state->handler = attlist3; return XML_ROLE_NONE; } return common(state, tok); } static int attlist5(PROLOG_STATE *state, int tok, const char *ptr, const char *end, const ENCODING *enc) { switch (tok) { case XML_TOK_PROLOG_S: return XML_ROLE_NONE; case XML_TOK_OPEN_PAREN: state->handler = attlist6; return XML_ROLE_NONE; } return common(state, tok); } static int attlist6(PROLOG_STATE *state, int tok, const char *ptr, const char *end, const ENCODING *enc) { switch (tok) { case XML_TOK_PROLOG_S: return XML_ROLE_NONE; case XML_TOK_NAME: state->handler = attlist7; return XML_ROLE_ATTRIBUTE_NOTATION_VALUE; } return common(state, tok); } static int attlist7(PROLOG_STATE *state, int tok, const char *ptr, const char *end, const ENCODING *enc) { switch (tok) { case XML_TOK_PROLOG_S: return XML_ROLE_NONE; case XML_TOK_CLOSE_PAREN: state->handler = attlist8; return XML_ROLE_NONE; case XML_TOK_OR: state->handler = attlist6; return XML_ROLE_NONE; } return common(state, tok); } /* default value */ static int attlist8(PROLOG_STATE *state, int tok, const char *ptr, const char *end, const ENCODING *enc) { switch (tok) { case XML_TOK_PROLOG_S: return XML_ROLE_NONE; case XML_TOK_POUND_NAME: if (XmlNameMatchesAscii(enc, ptr + MIN_BYTES_PER_CHAR(enc), end, KW_IMPLIED)) { state->handler = attlist1; return XML_ROLE_IMPLIED_ATTRIBUTE_VALUE; } if (XmlNameMatchesAscii(enc, ptr + MIN_BYTES_PER_CHAR(enc), end, KW_REQUIRED)) { state->handler = attlist1; return XML_ROLE_REQUIRED_ATTRIBUTE_VALUE; } if (XmlNameMatchesAscii(enc, ptr + MIN_BYTES_PER_CHAR(enc), end, KW_FIXED)) { state->handler = attlist9; return XML_ROLE_NONE; } break; case XML_TOK_LITERAL: state->handler = attlist1; return XML_ROLE_DEFAULT_ATTRIBUTE_VALUE; } return common(state, tok); } static int attlist9(PROLOG_STATE *state, int tok, const char *ptr, const char *end, const ENCODING *enc) { switch (tok) { case XML_TOK_PROLOG_S: return 
XML_ROLE_NONE; case XML_TOK_LITERAL: state->handler = attlist1; return XML_ROLE_FIXED_ATTRIBUTE_VALUE; } return common(state, tok); } static int element0(PROLOG_STATE *state, int tok, const char *ptr, const char *end, const ENCODING *enc) { switch (tok) { case XML_TOK_PROLOG_S: return XML_ROLE_NONE; case XML_TOK_NAME: case XML_TOK_PREFIXED_NAME: state->handler = element1; return XML_ROLE_ELEMENT_NAME; } return common(state, tok); } static int element1(PROLOG_STATE *state, int tok, const char *ptr, const char *end, const ENCODING *enc) { switch (tok) { case XML_TOK_PROLOG_S: return XML_ROLE_NONE; case XML_TOK_NAME: if (XmlNameMatchesAscii(enc, ptr, end, KW_EMPTY)) { state->handler = declClose; return XML_ROLE_CONTENT_EMPTY; } if (XmlNameMatchesAscii(enc, ptr, end, KW_ANY)) { state->handler = declClose; return XML_ROLE_CONTENT_ANY; } break; case XML_TOK_OPEN_PAREN: state->handler = element2; state->level = 1; return XML_ROLE_GROUP_OPEN; } return common(state, tok); } static int element2(PROLOG_STATE *state, int tok, const char *ptr, const char *end, const ENCODING *enc) { switch (tok) { case XML_TOK_PROLOG_S: return XML_ROLE_NONE; case XML_TOK_POUND_NAME: if (XmlNameMatchesAscii(enc, ptr + MIN_BYTES_PER_CHAR(enc), end, KW_PCDATA)) { state->handler = element3; return XML_ROLE_CONTENT_PCDATA; } break; case XML_TOK_OPEN_PAREN: state->level = 2; state->handler = element6; return XML_ROLE_GROUP_OPEN; case XML_TOK_NAME: case XML_TOK_PREFIXED_NAME: state->handler = element7; return XML_ROLE_CONTENT_ELEMENT; case XML_TOK_NAME_QUESTION: state->handler = element7; return XML_ROLE_CONTENT_ELEMENT_OPT; case XML_TOK_NAME_ASTERISK: state->handler = element7; return XML_ROLE_CONTENT_ELEMENT_REP; case XML_TOK_NAME_PLUS: state->handler = element7; return XML_ROLE_CONTENT_ELEMENT_PLUS; } return common(state, tok); } static int element3(PROLOG_STATE *state, int tok, const char *ptr, const char *end, const ENCODING *enc) { switch (tok) { case XML_TOK_PROLOG_S: return XML_ROLE_NONE; case 
XML_TOK_CLOSE_PAREN: case XML_TOK_CLOSE_PAREN_ASTERISK: state->handler = declClose; return XML_ROLE_GROUP_CLOSE_REP; case XML_TOK_OR: state->handler = element4; return XML_ROLE_NONE; } return common(state, tok); } static int element4(PROLOG_STATE *state, int tok, const char *ptr, const char *end, const ENCODING *enc) { switch (tok) { case XML_TOK_PROLOG_S: return XML_ROLE_NONE; case XML_TOK_NAME: case XML_TOK_PREFIXED_NAME: state->handler = element5; return XML_ROLE_CONTENT_ELEMENT; } return common(state, tok); } static int element5(PROLOG_STATE *state, int tok, const char *ptr, const char *end, const ENCODING *enc) { switch (tok) { case XML_TOK_PROLOG_S: return XML_ROLE_NONE; case XML_TOK_CLOSE_PAREN_ASTERISK: state->handler = declClose; return XML_ROLE_GROUP_CLOSE_REP; case XML_TOK_OR: state->handler = element4; return XML_ROLE_NONE; } return common(state, tok); } static int element6(PROLOG_STATE *state, int tok, const char *ptr, const char *end, const ENCODING *enc) { switch (tok) { case XML_TOK_PROLOG_S: return XML_ROLE_NONE; case XML_TOK_OPEN_PAREN: state->level += 1; return XML_ROLE_GROUP_OPEN; case XML_TOK_NAME: case XML_TOK_PREFIXED_NAME: state->handler = element7; return XML_ROLE_CONTENT_ELEMENT; case XML_TOK_NAME_QUESTION: state->handler = element7; return XML_ROLE_CONTENT_ELEMENT_OPT; case XML_TOK_NAME_ASTERISK: state->handler = element7; return XML_ROLE_CONTENT_ELEMENT_REP; case XML_TOK_NAME_PLUS: state->handler = element7; return XML_ROLE_CONTENT_ELEMENT_PLUS; } return common(state, tok); } static int element7(PROLOG_STATE *state, int tok, const char *ptr, const char *end, const ENCODING *enc) { switch (tok) { case XML_TOK_PROLOG_S: return XML_ROLE_NONE; case XML_TOK_CLOSE_PAREN: state->level -= 1; if (state->level == 0) state->handler = declClose; return XML_ROLE_GROUP_CLOSE; case XML_TOK_CLOSE_PAREN_ASTERISK: state->level -= 1; if (state->level == 0) state->handler = declClose; return XML_ROLE_GROUP_CLOSE_REP; case XML_TOK_CLOSE_PAREN_QUESTION: 
state->level -= 1; if (state->level == 0) state->handler = declClose; return XML_ROLE_GROUP_CLOSE_OPT; case XML_TOK_CLOSE_PAREN_PLUS: state->level -= 1; if (state->level == 0) state->handler = declClose; return XML_ROLE_GROUP_CLOSE_PLUS; case XML_TOK_COMMA: state->handler = element6; return XML_ROLE_GROUP_SEQUENCE; case XML_TOK_OR: state->handler = element6; return XML_ROLE_GROUP_CHOICE; } return common(state, tok); } #ifdef XML_DTD static int condSect0(PROLOG_STATE *state, int tok, const char *ptr, const char *end, const ENCODING *enc) { switch (tok) { case XML_TOK_PROLOG_S: return XML_ROLE_NONE; case XML_TOK_NAME: if (XmlNameMatchesAscii(enc, ptr, end, KW_INCLUDE)) { state->handler = condSect1; return XML_ROLE_NONE; } if (XmlNameMatchesAscii(enc, ptr, end, KW_IGNORE)) { state->handler = condSect2; return XML_ROLE_NONE; } break; } return common(state, tok); } static int condSect1(PROLOG_STATE *state, int tok, const char *ptr, const char *end, const ENCODING *enc) { switch (tok) { case XML_TOK_PROLOG_S: return XML_ROLE_NONE; case XML_TOK_OPEN_BRACKET: state->handler = externalSubset1; state->includeLevel += 1; return XML_ROLE_NONE; } return common(state, tok); } static int condSect2(PROLOG_STATE *state, int tok, const char *ptr, const char *end, const ENCODING *enc) { switch (tok) { case XML_TOK_PROLOG_S: return XML_ROLE_NONE; case XML_TOK_OPEN_BRACKET: state->handler = externalSubset1; return XML_ROLE_IGNORE_SECT; } return common(state, tok); } #endif /* XML_DTD */ static int declClose(PROLOG_STATE *state, int tok, const char *ptr, const char *end, const ENCODING *enc) { switch (tok) { case XML_TOK_PROLOG_S: return XML_ROLE_NONE; case XML_TOK_DECL_CLOSE: setTopLevel(state); return XML_ROLE_NONE; } return common(state, tok); } #if 0 static int ignore(PROLOG_STATE *state, int tok, const char *ptr, const char *end, const ENCODING *enc) { switch (tok) { case XML_TOK_DECL_CLOSE: state->handler = internalSubset; return 0; default: return XML_ROLE_NONE; } return 
common(state, tok); } #endif static int error(PROLOG_STATE *state, int tok, const char *ptr, const char *end, const ENCODING *enc) { return XML_ROLE_NONE; } static int common(PROLOG_STATE *state, int tok) { #ifdef XML_DTD if (!state->documentEntity && tok == XML_TOK_PARAM_ENTITY_REF) return XML_ROLE_INNER_PARAM_ENTITY_REF; #endif state->handler = error; return XML_ROLE_ERROR; } void XmlPrologStateInit(PROLOG_STATE *state) { state->handler = prolog0; #ifdef XML_DTD state->documentEntity = 1; state->includeLevel = 0; #endif /* XML_DTD */ } #ifdef XML_DTD void XmlPrologStateInitExternalEntity(PROLOG_STATE *state) { state->handler = externalSubset0; state->documentEntity = 0; state->includeLevel = 0; } #endif /* XML_DTD */ swish-e-2.4.7/src/expat/xmltok/latin1tab.h0000775000077100017500000000342611166010107015307 00000000000000/* Copyright (c) 1998, 1999 Thai Open Source Software Center Ltd See the file copying.txt for copying permission. */ /* 0x80 */ BT_OTHER, BT_OTHER, BT_OTHER, BT_OTHER, /* 0x84 */ BT_OTHER, BT_OTHER, BT_OTHER, BT_OTHER, /* 0x88 */ BT_OTHER, BT_OTHER, BT_OTHER, BT_OTHER, /* 0x8C */ BT_OTHER, BT_OTHER, BT_OTHER, BT_OTHER, /* 0x90 */ BT_OTHER, BT_OTHER, BT_OTHER, BT_OTHER, /* 0x94 */ BT_OTHER, BT_OTHER, BT_OTHER, BT_OTHER, /* 0x98 */ BT_OTHER, BT_OTHER, BT_OTHER, BT_OTHER, /* 0x9C */ BT_OTHER, BT_OTHER, BT_OTHER, BT_OTHER, /* 0xA0 */ BT_OTHER, BT_OTHER, BT_OTHER, BT_OTHER, /* 0xA4 */ BT_OTHER, BT_OTHER, BT_OTHER, BT_OTHER, /* 0xA8 */ BT_OTHER, BT_OTHER, BT_NMSTRT, BT_OTHER, /* 0xAC */ BT_OTHER, BT_OTHER, BT_OTHER, BT_OTHER, /* 0xB0 */ BT_OTHER, BT_OTHER, BT_OTHER, BT_OTHER, /* 0xB4 */ BT_OTHER, BT_NMSTRT, BT_OTHER, BT_NAME, /* 0xB8 */ BT_OTHER, BT_OTHER, BT_NMSTRT, BT_OTHER, /* 0xBC */ BT_OTHER, BT_OTHER, BT_OTHER, BT_OTHER, /* 0xC0 */ BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, /* 0xC4 */ BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, /* 0xC8 */ BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, /* 0xCC */ BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, /* 0xD0 
*/ BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, /* 0xD4 */ BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, BT_OTHER, /* 0xD8 */ BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, /* 0xDC */ BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, /* 0xE0 */ BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, /* 0xE4 */ BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, /* 0xE8 */ BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, /* 0xEC */ BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, /* 0xF0 */ BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, /* 0xF4 */ BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, BT_OTHER, /* 0xF8 */ BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, /* 0xFC */ BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, swish-e-2.4.7/src/expat/xmltok/xmltok_ns.c0000775000077100017500000000456311166010106015443 00000000000000const ENCODING *NS(XmlGetUtf8InternalEncoding)(void) { return &ns(internal_utf8_encoding).enc; } const ENCODING *NS(XmlGetUtf16InternalEncoding)(void) { #if XML_BYTE_ORDER == 12 return &ns(internal_little2_encoding).enc; #elif XML_BYTE_ORDER == 21 return &ns(internal_big2_encoding).enc; #else const short n = 1; return *(const char *)&n ? 
&ns(internal_little2_encoding).enc : &ns(internal_big2_encoding).enc; #endif } static const ENCODING *NS(encodings)[] = { &ns(latin1_encoding).enc, &ns(ascii_encoding).enc, &ns(utf8_encoding).enc, &ns(big2_encoding).enc, &ns(big2_encoding).enc, &ns(little2_encoding).enc, &ns(utf8_encoding).enc /* NO_ENC */ }; static int NS(initScanProlog)(const ENCODING *enc, const char *ptr, const char *end, const char **nextTokPtr) { return initScan(NS(encodings), (const INIT_ENCODING *)enc, XML_PROLOG_STATE, ptr, end, nextTokPtr); } static int NS(initScanContent)(const ENCODING *enc, const char *ptr, const char *end, const char **nextTokPtr) { return initScan(NS(encodings), (const INIT_ENCODING *)enc, XML_CONTENT_STATE, ptr, end, nextTokPtr); } int NS(XmlInitEncoding)(INIT_ENCODING *p, const ENCODING **encPtr, const char *name) { int i = getEncodingIndex(name); if (i == UNKNOWN_ENC) return 0; SET_INIT_ENC_INDEX(p, i); p->initEnc.scanners[XML_PROLOG_STATE] = NS(initScanProlog); p->initEnc.scanners[XML_CONTENT_STATE] = NS(initScanContent); p->initEnc.updatePosition = initUpdatePosition; p->encPtr = encPtr; *encPtr = &(p->initEnc); return 1; } static const ENCODING *NS(findEncoding)(const ENCODING *enc, const char *ptr, const char *end) { #define ENCODING_MAX 128 char buf[ENCODING_MAX]; char *p = buf; int i; XmlUtf8Convert(enc, &ptr, end, &p, p + ENCODING_MAX - 1); if (ptr != end) return 0; *p = 0; if (streqci(buf, KW_UTF_16) && enc->minBytesPerChar == 2) return enc; i = getEncodingIndex(buf); if (i == UNKNOWN_ENC) return 0; return NS(encodings)[i]; } int NS(XmlParseXmlDecl)(int isGeneralTextEntity, const ENCODING *enc, const char *ptr, const char *end, const char **badPtr, const char **versionPtr, const char **encodingName, const ENCODING **encoding, int *standalone) { return doParseXmlDecl(NS(findEncoding), isGeneralTextEntity, enc, ptr, end, badPtr, versionPtr, encodingName, encoding, standalone); } 
swish-e-2.4.7/src/expat/xmltok/xmltok_impl.h0000775000077100017500000000123111166010106015756 00000000000000/* Copyright (c) 1998, 1999 Thai Open Source Software Center Ltd See the file copying.txt for copying permission. */ enum { BT_NONXML, BT_MALFORM, BT_LT, BT_AMP, BT_RSQB, BT_LEAD2, BT_LEAD3, BT_LEAD4, BT_TRAIL, BT_CR, BT_LF, BT_GT, BT_QUOT, BT_APOS, BT_EQUALS, BT_QUEST, BT_EXCL, BT_SOL, BT_SEMI, BT_NUM, BT_LSQB, BT_S, BT_NMSTRT, BT_COLON, BT_HEX, BT_DIGIT, BT_NAME, BT_MINUS, BT_OTHER, /* known not to be a name or name start character */ BT_NONASCII, /* might be a name or name start character */ BT_PERCNT, BT_LPAR, BT_RPAR, BT_AST, BT_PLUS, BT_COMMA, BT_VERBAR }; #include <stddef.h> swish-e-2.4.7/src/expat/xmltok/utf8tab.h0000775000077100017500000000334411166010107015004 00000000000000/* Copyright (c) 1998, 1999 Thai Open Source Software Center Ltd See the file copying.txt for copying permission. */ /* 0x80 */ BT_TRAIL, BT_TRAIL, BT_TRAIL, BT_TRAIL, /* 0x84 */ BT_TRAIL, BT_TRAIL, BT_TRAIL, BT_TRAIL, /* 0x88 */ BT_TRAIL, BT_TRAIL, BT_TRAIL, BT_TRAIL, /* 0x8C */ BT_TRAIL, BT_TRAIL, BT_TRAIL, BT_TRAIL, /* 0x90 */ BT_TRAIL, BT_TRAIL, BT_TRAIL, BT_TRAIL, /* 0x94 */ BT_TRAIL, BT_TRAIL, BT_TRAIL, BT_TRAIL, /* 0x98 */ BT_TRAIL, BT_TRAIL, BT_TRAIL, BT_TRAIL, /* 0x9C */ BT_TRAIL, BT_TRAIL, BT_TRAIL, BT_TRAIL, /* 0xA0 */ BT_TRAIL, BT_TRAIL, BT_TRAIL, BT_TRAIL, /* 0xA4 */ BT_TRAIL, BT_TRAIL, BT_TRAIL, BT_TRAIL, /* 0xA8 */ BT_TRAIL, BT_TRAIL, BT_TRAIL, BT_TRAIL, /* 0xAC */ BT_TRAIL, BT_TRAIL, BT_TRAIL, BT_TRAIL, /* 0xB0 */ BT_TRAIL, BT_TRAIL, BT_TRAIL, BT_TRAIL, /* 0xB4 */ BT_TRAIL, BT_TRAIL, BT_TRAIL, BT_TRAIL, /* 0xB8 */ BT_TRAIL, BT_TRAIL, BT_TRAIL, BT_TRAIL, /* 0xBC */ BT_TRAIL, BT_TRAIL, BT_TRAIL, BT_TRAIL, /* 0xC0 */ BT_LEAD2, BT_LEAD2, BT_LEAD2, BT_LEAD2, /* 0xC4 */ BT_LEAD2, BT_LEAD2, BT_LEAD2, BT_LEAD2, /* 0xC8 */ BT_LEAD2, BT_LEAD2, BT_LEAD2, BT_LEAD2, /* 0xCC */ BT_LEAD2, BT_LEAD2, BT_LEAD2, BT_LEAD2, /* 0xD0 */ BT_LEAD2, BT_LEAD2, BT_LEAD2, BT_LEAD2, /* 0xD4 */ BT_LEAD2, 
BT_LEAD2, BT_LEAD2, BT_LEAD2, /* 0xD8 */ BT_LEAD2, BT_LEAD2, BT_LEAD2, BT_LEAD2, /* 0xDC */ BT_LEAD2, BT_LEAD2, BT_LEAD2, BT_LEAD2, /* 0xE0 */ BT_LEAD3, BT_LEAD3, BT_LEAD3, BT_LEAD3, /* 0xE4 */ BT_LEAD3, BT_LEAD3, BT_LEAD3, BT_LEAD3, /* 0xE8 */ BT_LEAD3, BT_LEAD3, BT_LEAD3, BT_LEAD3, /* 0xEC */ BT_LEAD3, BT_LEAD3, BT_LEAD3, BT_LEAD3, /* 0xF0 */ BT_LEAD4, BT_LEAD4, BT_LEAD4, BT_LEAD4, /* 0xF4 */ BT_LEAD4, BT_NONXML, BT_NONXML, BT_NONXML, /* 0xF8 */ BT_NONXML, BT_NONXML, BT_NONXML, BT_NONXML, /* 0xFC */ BT_NONXML, BT_NONXML, BT_MALFORM, BT_MALFORM, swish-e-2.4.7/src/expat/xmlrole.c0000664000077100017500000006752311166010107013574 00000000000000/* Copyright (c) 1998, 1999 Thai Open Source Software Center Ltd See the file copying.txt for copying permission. */ #include "xmldef.h" #include "xmlrole.h" #include "ascii.h" /* Doesn't check: that ,| are not mixed in a model group content of literals */ static const char KW_ANY[] = { ASCII_A, ASCII_N, ASCII_Y, '\0' }; static const char KW_ATTLIST[] = { ASCII_A, ASCII_T, ASCII_T, ASCII_L, ASCII_I, ASCII_S, ASCII_T, '\0' }; static const char KW_CDATA[] = { ASCII_C, ASCII_D, ASCII_A, ASCII_T, ASCII_A, '\0' }; static const char KW_DOCTYPE[] = { ASCII_D, ASCII_O, ASCII_C, ASCII_T, ASCII_Y, ASCII_P, ASCII_E, '\0' }; static const char KW_ELEMENT[] = { ASCII_E, ASCII_L, ASCII_E, ASCII_M, ASCII_E, ASCII_N, ASCII_T, '\0' }; static const char KW_EMPTY[] = { ASCII_E, ASCII_M, ASCII_P, ASCII_T, ASCII_Y, '\0' }; static const char KW_ENTITIES[] = { ASCII_E, ASCII_N, ASCII_T, ASCII_I, ASCII_T, ASCII_I, ASCII_E, ASCII_S, '\0' }; static const char KW_ENTITY[] = { ASCII_E, ASCII_N, ASCII_T, ASCII_I, ASCII_T, ASCII_Y, '\0' }; static const char KW_FIXED[] = { ASCII_F, ASCII_I, ASCII_X, ASCII_E, ASCII_D, '\0' }; static const char KW_ID[] = { ASCII_I, ASCII_D, '\0' }; static const char KW_IDREF[] = { ASCII_I, ASCII_D, ASCII_R, ASCII_E, ASCII_F, '\0' }; static const char KW_IDREFS[] = { ASCII_I, ASCII_D, ASCII_R, ASCII_E, ASCII_F, ASCII_S, '\0' }; 
static const char KW_IGNORE[] = { ASCII_I, ASCII_G, ASCII_N, ASCII_O, ASCII_R, ASCII_E, '\0' }; static const char KW_IMPLIED[] = { ASCII_I, ASCII_M, ASCII_P, ASCII_L, ASCII_I, ASCII_E, ASCII_D, '\0' }; static const char KW_INCLUDE[] = { ASCII_I, ASCII_N, ASCII_C, ASCII_L, ASCII_U, ASCII_D, ASCII_E, '\0' }; static const char KW_NDATA[] = { ASCII_N, ASCII_D, ASCII_A, ASCII_T, ASCII_A, '\0' }; static const char KW_NMTOKEN[] = { ASCII_N, ASCII_M, ASCII_T, ASCII_O, ASCII_K, ASCII_E, ASCII_N, '\0' }; static const char KW_NMTOKENS[] = { ASCII_N, ASCII_M, ASCII_T, ASCII_O, ASCII_K, ASCII_E, ASCII_N, ASCII_S, '\0' }; static const char KW_NOTATION[] = { ASCII_N, ASCII_O, ASCII_T, ASCII_A, ASCII_T, ASCII_I, ASCII_O, ASCII_N, '\0' }; static const char KW_PCDATA[] = { ASCII_P, ASCII_C, ASCII_D, ASCII_A, ASCII_T, ASCII_A, '\0' }; static const char KW_PUBLIC[] = { ASCII_P, ASCII_U, ASCII_B, ASCII_L, ASCII_I, ASCII_C, '\0' }; static const char KW_REQUIRED[] = { ASCII_R, ASCII_E, ASCII_Q, ASCII_U, ASCII_I, ASCII_R, ASCII_E, ASCII_D, '\0' }; static const char KW_SYSTEM[] = { ASCII_S, ASCII_Y, ASCII_S, ASCII_T, ASCII_E, ASCII_M, '\0' }; #ifndef MIN_BYTES_PER_CHAR #define MIN_BYTES_PER_CHAR(enc) ((enc)->minBytesPerChar) #endif #ifdef XML_DTD #define setTopLevel(state) \ ((state)->handler = ((state)->documentEntity \ ? 
internalSubset \ : externalSubset1)) #else /* not XML_DTD */ #define setTopLevel(state) ((state)->handler = internalSubset) #endif /* not XML_DTD */ typedef int PROLOG_HANDLER(PROLOG_STATE *state, int tok, const char *ptr, const char *end, const ENCODING *enc); static PROLOG_HANDLER prolog0, prolog1, prolog2, doctype0, doctype1, doctype2, doctype3, doctype4, doctype5, internalSubset, entity0, entity1, entity2, entity3, entity4, entity5, entity6, entity7, entity8, entity9, notation0, notation1, notation2, notation3, notation4, attlist0, attlist1, attlist2, attlist3, attlist4, attlist5, attlist6, attlist7, attlist8, attlist9, element0, element1, element2, element3, element4, element5, element6, element7, #ifdef XML_DTD externalSubset0, externalSubset1, condSect0, condSect1, condSect2, #endif /* XML_DTD */ declClose, error; static int common(PROLOG_STATE *state, int tok); static int prolog0(PROLOG_STATE *state, int tok, const char *ptr, const char *end, const ENCODING *enc) { switch (tok) { case XML_TOK_PROLOG_S: state->handler = prolog1; return XML_ROLE_NONE; case XML_TOK_XML_DECL: state->handler = prolog1; return XML_ROLE_XML_DECL; case XML_TOK_PI: state->handler = prolog1; return XML_ROLE_NONE; case XML_TOK_COMMENT: state->handler = prolog1; case XML_TOK_BOM: return XML_ROLE_NONE; case XML_TOK_DECL_OPEN: if (!XmlNameMatchesAscii(enc, ptr + 2 * MIN_BYTES_PER_CHAR(enc), end, KW_DOCTYPE)) break; state->handler = doctype0; return XML_ROLE_NONE; case XML_TOK_INSTANCE_START: state->handler = error; return XML_ROLE_INSTANCE_START; } return common(state, tok); } static int prolog1(PROLOG_STATE *state, int tok, const char *ptr, const char *end, const ENCODING *enc) { switch (tok) { case XML_TOK_PROLOG_S: return XML_ROLE_NONE; case XML_TOK_PI: case XML_TOK_COMMENT: case XML_TOK_BOM: return XML_ROLE_NONE; case XML_TOK_DECL_OPEN: if (!XmlNameMatchesAscii(enc, ptr + 2 * MIN_BYTES_PER_CHAR(enc), end, KW_DOCTYPE)) break; state->handler = doctype0; return XML_ROLE_NONE; case 
XML_TOK_INSTANCE_START: state->handler = error; return XML_ROLE_INSTANCE_START; } return common(state, tok); } static int prolog2(PROLOG_STATE *state, int tok, const char *ptr, const char *end, const ENCODING *enc) { switch (tok) { case XML_TOK_PROLOG_S: return XML_ROLE_NONE; case XML_TOK_PI: case XML_TOK_COMMENT: return XML_ROLE_NONE; case XML_TOK_INSTANCE_START: state->handler = error; return XML_ROLE_INSTANCE_START; } return common(state, tok); } static int doctype0(PROLOG_STATE *state, int tok, const char *ptr, const char *end, const ENCODING *enc) { switch (tok) { case XML_TOK_PROLOG_S: return XML_ROLE_NONE; case XML_TOK_NAME: case XML_TOK_PREFIXED_NAME: state->handler = doctype1; return XML_ROLE_DOCTYPE_NAME; } return common(state, tok); } static int doctype1(PROLOG_STATE *state, int tok, const char *ptr, const char *end, const ENCODING *enc) { switch (tok) { case XML_TOK_PROLOG_S: return XML_ROLE_NONE; case XML_TOK_OPEN_BRACKET: state->handler = internalSubset; return XML_ROLE_NONE; case XML_TOK_DECL_CLOSE: state->handler = prolog2; return XML_ROLE_DOCTYPE_CLOSE; case XML_TOK_NAME: if (XmlNameMatchesAscii(enc, ptr, end, KW_SYSTEM)) { state->handler = doctype3; return XML_ROLE_NONE; } if (XmlNameMatchesAscii(enc, ptr, end, KW_PUBLIC)) { state->handler = doctype2; return XML_ROLE_NONE; } break; } return common(state, tok); } static int doctype2(PROLOG_STATE *state, int tok, const char *ptr, const char *end, const ENCODING *enc) { switch (tok) { case XML_TOK_PROLOG_S: return XML_ROLE_NONE; case XML_TOK_LITERAL: state->handler = doctype3; return XML_ROLE_DOCTYPE_PUBLIC_ID; } return common(state, tok); } static int doctype3(PROLOG_STATE *state, int tok, const char *ptr, const char *end, const ENCODING *enc) { switch (tok) { case XML_TOK_PROLOG_S: return XML_ROLE_NONE; case XML_TOK_LITERAL: state->handler = doctype4; return XML_ROLE_DOCTYPE_SYSTEM_ID; } return common(state, tok); } static int doctype4(PROLOG_STATE *state, int tok, const char *ptr, const char *end, 
const ENCODING *enc) { switch (tok) { case XML_TOK_PROLOG_S: return XML_ROLE_NONE; case XML_TOK_OPEN_BRACKET: state->handler = internalSubset; return XML_ROLE_NONE; case XML_TOK_DECL_CLOSE: state->handler = prolog2; return XML_ROLE_DOCTYPE_CLOSE; } return common(state, tok); } static int doctype5(PROLOG_STATE *state, int tok, const char *ptr, const char *end, const ENCODING *enc) { switch (tok) { case XML_TOK_PROLOG_S: return XML_ROLE_NONE; case XML_TOK_DECL_CLOSE: state->handler = prolog2; return XML_ROLE_DOCTYPE_CLOSE; } return common(state, tok); } static int internalSubset(PROLOG_STATE *state, int tok, const char *ptr, const char *end, const ENCODING *enc) { switch (tok) { case XML_TOK_PROLOG_S: return XML_ROLE_NONE; case XML_TOK_DECL_OPEN: if (XmlNameMatchesAscii(enc, ptr + 2 * MIN_BYTES_PER_CHAR(enc), end, KW_ENTITY)) { state->handler = entity0; return XML_ROLE_NONE; } if (XmlNameMatchesAscii(enc, ptr + 2 * MIN_BYTES_PER_CHAR(enc), end, KW_ATTLIST)) { state->handler = attlist0; return XML_ROLE_NONE; } if (XmlNameMatchesAscii(enc, ptr + 2 * MIN_BYTES_PER_CHAR(enc), end, KW_ELEMENT)) { state->handler = element0; return XML_ROLE_NONE; } if (XmlNameMatchesAscii(enc, ptr + 2 * MIN_BYTES_PER_CHAR(enc), end, KW_NOTATION)) { state->handler = notation0; return XML_ROLE_NONE; } break; case XML_TOK_PI: case XML_TOK_COMMENT: return XML_ROLE_NONE; case XML_TOK_PARAM_ENTITY_REF: return XML_ROLE_PARAM_ENTITY_REF; case XML_TOK_CLOSE_BRACKET: state->handler = doctype5; return XML_ROLE_NONE; } return common(state, tok); } #ifdef XML_DTD static int externalSubset0(PROLOG_STATE *state, int tok, const char *ptr, const char *end, const ENCODING *enc) { state->handler = externalSubset1; if (tok == XML_TOK_XML_DECL) return XML_ROLE_TEXT_DECL; return externalSubset1(state, tok, ptr, end, enc); } static int externalSubset1(PROLOG_STATE *state, int tok, const char *ptr, const char *end, const ENCODING *enc) { switch (tok) { case XML_TOK_COND_SECT_OPEN: state->handler = condSect0; 
return XML_ROLE_NONE; case XML_TOK_COND_SECT_CLOSE: if (state->includeLevel == 0) break; state->includeLevel -= 1; return XML_ROLE_NONE; case XML_TOK_PROLOG_S: return XML_ROLE_NONE; case XML_TOK_CLOSE_BRACKET: break; case XML_TOK_NONE: if (state->includeLevel) break; return XML_ROLE_NONE; default: return internalSubset(state, tok, ptr, end, enc); } return common(state, tok); } #endif /* XML_DTD */ static int entity0(PROLOG_STATE *state, int tok, const char *ptr, const char *end, const ENCODING *enc) { switch (tok) { case XML_TOK_PROLOG_S: return XML_ROLE_NONE; case XML_TOK_PERCENT: state->handler = entity1; return XML_ROLE_NONE; case XML_TOK_NAME: state->handler = entity2; return XML_ROLE_GENERAL_ENTITY_NAME; } return common(state, tok); } static int entity1(PROLOG_STATE *state, int tok, const char *ptr, const char *end, const ENCODING *enc) { switch (tok) { case XML_TOK_PROLOG_S: return XML_ROLE_NONE; case XML_TOK_NAME: state->handler = entity7; return XML_ROLE_PARAM_ENTITY_NAME; } return common(state, tok); } static int entity2(PROLOG_STATE *state, int tok, const char *ptr, const char *end, const ENCODING *enc) { switch (tok) { case XML_TOK_PROLOG_S: return XML_ROLE_NONE; case XML_TOK_NAME: if (XmlNameMatchesAscii(enc, ptr, end, KW_SYSTEM)) { state->handler = entity4; return XML_ROLE_NONE; } if (XmlNameMatchesAscii(enc, ptr, end, KW_PUBLIC)) { state->handler = entity3; return XML_ROLE_NONE; } break; case XML_TOK_LITERAL: state->handler = declClose; return XML_ROLE_ENTITY_VALUE; } return common(state, tok); } static int entity3(PROLOG_STATE *state, int tok, const char *ptr, const char *end, const ENCODING *enc) { switch (tok) { case XML_TOK_PROLOG_S: return XML_ROLE_NONE; case XML_TOK_LITERAL: state->handler = entity4; return XML_ROLE_ENTITY_PUBLIC_ID; } return common(state, tok); } static int entity4(PROLOG_STATE *state, int tok, const char *ptr, const char *end, const ENCODING *enc) { switch (tok) { case XML_TOK_PROLOG_S: return XML_ROLE_NONE; case 
XML_TOK_LITERAL: state->handler = entity5; return XML_ROLE_ENTITY_SYSTEM_ID; } return common(state, tok); } static int entity5(PROLOG_STATE *state, int tok, const char *ptr, const char *end, const ENCODING *enc) { switch (tok) { case XML_TOK_PROLOG_S: return XML_ROLE_NONE; case XML_TOK_DECL_CLOSE: setTopLevel(state); return XML_ROLE_EXTERNAL_GENERAL_ENTITY_NO_NOTATION; case XML_TOK_NAME: if (XmlNameMatchesAscii(enc, ptr, end, KW_NDATA)) { state->handler = entity6; return XML_ROLE_NONE; } break; } return common(state, tok); } static int entity6(PROLOG_STATE *state, int tok, const char *ptr, const char *end, const ENCODING *enc) { switch (tok) { case XML_TOK_PROLOG_S: return XML_ROLE_NONE; case XML_TOK_NAME: state->handler = declClose; return XML_ROLE_ENTITY_NOTATION_NAME; } return common(state, tok); } static int entity7(PROLOG_STATE *state, int tok, const char *ptr, const char *end, const ENCODING *enc) { switch (tok) { case XML_TOK_PROLOG_S: return XML_ROLE_NONE; case XML_TOK_NAME: if (XmlNameMatchesAscii(enc, ptr, end, KW_SYSTEM)) { state->handler = entity9; return XML_ROLE_NONE; } if (XmlNameMatchesAscii(enc, ptr, end, KW_PUBLIC)) { state->handler = entity8; return XML_ROLE_NONE; } break; case XML_TOK_LITERAL: state->handler = declClose; return XML_ROLE_ENTITY_VALUE; } return common(state, tok); } static int entity8(PROLOG_STATE *state, int tok, const char *ptr, const char *end, const ENCODING *enc) { switch (tok) { case XML_TOK_PROLOG_S: return XML_ROLE_NONE; case XML_TOK_LITERAL: state->handler = entity9; return XML_ROLE_ENTITY_PUBLIC_ID; } return common(state, tok); } static int entity9(PROLOG_STATE *state, int tok, const char *ptr, const char *end, const ENCODING *enc) { switch (tok) { case XML_TOK_PROLOG_S: return XML_ROLE_NONE; case XML_TOK_LITERAL: state->handler = declClose; return XML_ROLE_ENTITY_SYSTEM_ID; } return common(state, tok); } static int notation0(PROLOG_STATE *state, int tok, const char *ptr, const char *end, const ENCODING *enc) { switch 
(tok) { case XML_TOK_PROLOG_S: return XML_ROLE_NONE; case XML_TOK_NAME: state->handler = notation1; return XML_ROLE_NOTATION_NAME; } return common(state, tok); } static int notation1(PROLOG_STATE *state, int tok, const char *ptr, const char *end, const ENCODING *enc) { switch (tok) { case XML_TOK_PROLOG_S: return XML_ROLE_NONE; case XML_TOK_NAME: if (XmlNameMatchesAscii(enc, ptr, end, KW_SYSTEM)) { state->handler = notation3; return XML_ROLE_NONE; } if (XmlNameMatchesAscii(enc, ptr, end, KW_PUBLIC)) { state->handler = notation2; return XML_ROLE_NONE; } break; } return common(state, tok); } static int notation2(PROLOG_STATE *state, int tok, const char *ptr, const char *end, const ENCODING *enc) { switch (tok) { case XML_TOK_PROLOG_S: return XML_ROLE_NONE; case XML_TOK_LITERAL: state->handler = notation4; return XML_ROLE_NOTATION_PUBLIC_ID; } return common(state, tok); } static int notation3(PROLOG_STATE *state, int tok, const char *ptr, const char *end, const ENCODING *enc) { switch (tok) { case XML_TOK_PROLOG_S: return XML_ROLE_NONE; case XML_TOK_LITERAL: state->handler = declClose; return XML_ROLE_NOTATION_SYSTEM_ID; } return common(state, tok); } static int notation4(PROLOG_STATE *state, int tok, const char *ptr, const char *end, const ENCODING *enc) { switch (tok) { case XML_TOK_PROLOG_S: return XML_ROLE_NONE; case XML_TOK_LITERAL: state->handler = declClose; return XML_ROLE_NOTATION_SYSTEM_ID; case XML_TOK_DECL_CLOSE: setTopLevel(state); return XML_ROLE_NOTATION_NO_SYSTEM_ID; } return common(state, tok); } static int attlist0(PROLOG_STATE *state, int tok, const char *ptr, const char *end, const ENCODING *enc) { switch (tok) { case XML_TOK_PROLOG_S: return XML_ROLE_NONE; case XML_TOK_NAME: case XML_TOK_PREFIXED_NAME: state->handler = attlist1; return XML_ROLE_ATTLIST_ELEMENT_NAME; } return common(state, tok); } static int attlist1(PROLOG_STATE *state, int tok, const char *ptr, const char *end, const ENCODING *enc) { switch (tok) { case XML_TOK_PROLOG_S: return 
XML_ROLE_NONE; case XML_TOK_DECL_CLOSE: setTopLevel(state); return XML_ROLE_NONE; case XML_TOK_NAME: case XML_TOK_PREFIXED_NAME: state->handler = attlist2; return XML_ROLE_ATTRIBUTE_NAME; } return common(state, tok); } static int attlist2(PROLOG_STATE *state, int tok, const char *ptr, const char *end, const ENCODING *enc) { switch (tok) { case XML_TOK_PROLOG_S: return XML_ROLE_NONE; case XML_TOK_NAME: { static const char *types[] = { KW_CDATA, KW_ID, KW_IDREF, KW_IDREFS, KW_ENTITY, KW_ENTITIES, KW_NMTOKEN, KW_NMTOKENS, }; int i; for (i = 0; i < (int)(sizeof(types)/sizeof(types[0])); i++) if (XmlNameMatchesAscii(enc, ptr, end, types[i])) { state->handler = attlist8; return XML_ROLE_ATTRIBUTE_TYPE_CDATA + i; } } if (XmlNameMatchesAscii(enc, ptr, end, KW_NOTATION)) { state->handler = attlist5; return XML_ROLE_NONE; } break; case XML_TOK_OPEN_PAREN: state->handler = attlist3; return XML_ROLE_NONE; } return common(state, tok); } static int attlist3(PROLOG_STATE *state, int tok, const char *ptr, const char *end, const ENCODING *enc) { switch (tok) { case XML_TOK_PROLOG_S: return XML_ROLE_NONE; case XML_TOK_NMTOKEN: case XML_TOK_NAME: case XML_TOK_PREFIXED_NAME: state->handler = attlist4; return XML_ROLE_ATTRIBUTE_ENUM_VALUE; } return common(state, tok); } static int attlist4(PROLOG_STATE *state, int tok, const char *ptr, const char *end, const ENCODING *enc) { switch (tok) { case XML_TOK_PROLOG_S: return XML_ROLE_NONE; case XML_TOK_CLOSE_PAREN: state->handler = attlist8; return XML_ROLE_NONE; case XML_TOK_OR: state->handler = attlist3; return XML_ROLE_NONE; } return common(state, tok); } static int attlist5(PROLOG_STATE *state, int tok, const char *ptr, const char *end, const ENCODING *enc) { switch (tok) { case XML_TOK_PROLOG_S: return XML_ROLE_NONE; case XML_TOK_OPEN_PAREN: state->handler = attlist6; return XML_ROLE_NONE; } return common(state, tok); } static int attlist6(PROLOG_STATE *state, int tok, const char *ptr, const char *end, const ENCODING *enc) { switch 
(tok) { case XML_TOK_PROLOG_S: return XML_ROLE_NONE; case XML_TOK_NAME: state->handler = attlist7; return XML_ROLE_ATTRIBUTE_NOTATION_VALUE; } return common(state, tok); } static int attlist7(PROLOG_STATE *state, int tok, const char *ptr, const char *end, const ENCODING *enc) { switch (tok) { case XML_TOK_PROLOG_S: return XML_ROLE_NONE; case XML_TOK_CLOSE_PAREN: state->handler = attlist8; return XML_ROLE_NONE; case XML_TOK_OR: state->handler = attlist6; return XML_ROLE_NONE; } return common(state, tok); } /* default value */ static int attlist8(PROLOG_STATE *state, int tok, const char *ptr, const char *end, const ENCODING *enc) { switch (tok) { case XML_TOK_PROLOG_S: return XML_ROLE_NONE; case XML_TOK_POUND_NAME: if (XmlNameMatchesAscii(enc, ptr + MIN_BYTES_PER_CHAR(enc), end, KW_IMPLIED)) { state->handler = attlist1; return XML_ROLE_IMPLIED_ATTRIBUTE_VALUE; } if (XmlNameMatchesAscii(enc, ptr + MIN_BYTES_PER_CHAR(enc), end, KW_REQUIRED)) { state->handler = attlist1; return XML_ROLE_REQUIRED_ATTRIBUTE_VALUE; } if (XmlNameMatchesAscii(enc, ptr + MIN_BYTES_PER_CHAR(enc), end, KW_FIXED)) { state->handler = attlist9; return XML_ROLE_NONE; } break; case XML_TOK_LITERAL: state->handler = attlist1; return XML_ROLE_DEFAULT_ATTRIBUTE_VALUE; } return common(state, tok); } static int attlist9(PROLOG_STATE *state, int tok, const char *ptr, const char *end, const ENCODING *enc) { switch (tok) { case XML_TOK_PROLOG_S: return XML_ROLE_NONE; case XML_TOK_LITERAL: state->handler = attlist1; return XML_ROLE_FIXED_ATTRIBUTE_VALUE; } return common(state, tok); } static int element0(PROLOG_STATE *state, int tok, const char *ptr, const char *end, const ENCODING *enc) { switch (tok) { case XML_TOK_PROLOG_S: return XML_ROLE_NONE; case XML_TOK_NAME: case XML_TOK_PREFIXED_NAME: state->handler = element1; return XML_ROLE_ELEMENT_NAME; } return common(state, tok); } static int element1(PROLOG_STATE *state, int tok, const char *ptr, const char *end, const ENCODING *enc) { switch (tok) { case 
XML_TOK_PROLOG_S: return XML_ROLE_NONE; case XML_TOK_NAME: if (XmlNameMatchesAscii(enc, ptr, end, KW_EMPTY)) { state->handler = declClose; return XML_ROLE_CONTENT_EMPTY; } if (XmlNameMatchesAscii(enc, ptr, end, KW_ANY)) { state->handler = declClose; return XML_ROLE_CONTENT_ANY; } break; case XML_TOK_OPEN_PAREN: state->handler = element2; state->level = 1; return XML_ROLE_GROUP_OPEN; } return common(state, tok); } static int element2(PROLOG_STATE *state, int tok, const char *ptr, const char *end, const ENCODING *enc) { switch (tok) { case XML_TOK_PROLOG_S: return XML_ROLE_NONE; case XML_TOK_POUND_NAME: if (XmlNameMatchesAscii(enc, ptr + MIN_BYTES_PER_CHAR(enc), end, KW_PCDATA)) { state->handler = element3; return XML_ROLE_CONTENT_PCDATA; } break; case XML_TOK_OPEN_PAREN: state->level = 2; state->handler = element6; return XML_ROLE_GROUP_OPEN; case XML_TOK_NAME: case XML_TOK_PREFIXED_NAME: state->handler = element7; return XML_ROLE_CONTENT_ELEMENT; case XML_TOK_NAME_QUESTION: state->handler = element7; return XML_ROLE_CONTENT_ELEMENT_OPT; case XML_TOK_NAME_ASTERISK: state->handler = element7; return XML_ROLE_CONTENT_ELEMENT_REP; case XML_TOK_NAME_PLUS: state->handler = element7; return XML_ROLE_CONTENT_ELEMENT_PLUS; } return common(state, tok); } static int element3(PROLOG_STATE *state, int tok, const char *ptr, const char *end, const ENCODING *enc) { switch (tok) { case XML_TOK_PROLOG_S: return XML_ROLE_NONE; case XML_TOK_CLOSE_PAREN: case XML_TOK_CLOSE_PAREN_ASTERISK: state->handler = declClose; return XML_ROLE_GROUP_CLOSE_REP; case XML_TOK_OR: state->handler = element4; return XML_ROLE_NONE; } return common(state, tok); } static int element4(PROLOG_STATE *state, int tok, const char *ptr, const char *end, const ENCODING *enc) { switch (tok) { case XML_TOK_PROLOG_S: return XML_ROLE_NONE; case XML_TOK_NAME: case XML_TOK_PREFIXED_NAME: state->handler = element5; return XML_ROLE_CONTENT_ELEMENT; } return common(state, tok); } static int element5(PROLOG_STATE *state, 
int tok, const char *ptr, const char *end, const ENCODING *enc) { switch (tok) { case XML_TOK_PROLOG_S: return XML_ROLE_NONE; case XML_TOK_CLOSE_PAREN_ASTERISK: state->handler = declClose; return XML_ROLE_GROUP_CLOSE_REP; case XML_TOK_OR: state->handler = element4; return XML_ROLE_NONE; } return common(state, tok); } static int element6(PROLOG_STATE *state, int tok, const char *ptr, const char *end, const ENCODING *enc) { switch (tok) { case XML_TOK_PROLOG_S: return XML_ROLE_NONE; case XML_TOK_OPEN_PAREN: state->level += 1; return XML_ROLE_GROUP_OPEN; case XML_TOK_NAME: case XML_TOK_PREFIXED_NAME: state->handler = element7; return XML_ROLE_CONTENT_ELEMENT; case XML_TOK_NAME_QUESTION: state->handler = element7; return XML_ROLE_CONTENT_ELEMENT_OPT; case XML_TOK_NAME_ASTERISK: state->handler = element7; return XML_ROLE_CONTENT_ELEMENT_REP; case XML_TOK_NAME_PLUS: state->handler = element7; return XML_ROLE_CONTENT_ELEMENT_PLUS; } return common(state, tok); } static int element7(PROLOG_STATE *state, int tok, const char *ptr, const char *end, const ENCODING *enc) { switch (tok) { case XML_TOK_PROLOG_S: return XML_ROLE_NONE; case XML_TOK_CLOSE_PAREN: state->level -= 1; if (state->level == 0) state->handler = declClose; return XML_ROLE_GROUP_CLOSE; case XML_TOK_CLOSE_PAREN_ASTERISK: state->level -= 1; if (state->level == 0) state->handler = declClose; return XML_ROLE_GROUP_CLOSE_REP; case XML_TOK_CLOSE_PAREN_QUESTION: state->level -= 1; if (state->level == 0) state->handler = declClose; return XML_ROLE_GROUP_CLOSE_OPT; case XML_TOK_CLOSE_PAREN_PLUS: state->level -= 1; if (state->level == 0) state->handler = declClose; return XML_ROLE_GROUP_CLOSE_PLUS; case XML_TOK_COMMA: state->handler = element6; return XML_ROLE_GROUP_SEQUENCE; case XML_TOK_OR: state->handler = element6; return XML_ROLE_GROUP_CHOICE; } return common(state, tok); } #ifdef XML_DTD static int condSect0(PROLOG_STATE *state, int tok, const char *ptr, const char *end, const ENCODING *enc) { switch (tok) { case 
XML_TOK_PROLOG_S: return XML_ROLE_NONE; case XML_TOK_NAME: if (XmlNameMatchesAscii(enc, ptr, end, KW_INCLUDE)) { state->handler = condSect1; return XML_ROLE_NONE; } if (XmlNameMatchesAscii(enc, ptr, end, KW_IGNORE)) { state->handler = condSect2; return XML_ROLE_NONE; } break; } return common(state, tok); } static int condSect1(PROLOG_STATE *state, int tok, const char *ptr, const char *end, const ENCODING *enc) { switch (tok) { case XML_TOK_PROLOG_S: return XML_ROLE_NONE; case XML_TOK_OPEN_BRACKET: state->handler = externalSubset1; state->includeLevel += 1; return XML_ROLE_NONE; } return common(state, tok); } static int condSect2(PROLOG_STATE *state, int tok, const char *ptr, const char *end, const ENCODING *enc) { switch (tok) { case XML_TOK_PROLOG_S: return XML_ROLE_NONE; case XML_TOK_OPEN_BRACKET: state->handler = externalSubset1; return XML_ROLE_IGNORE_SECT; } return common(state, tok); } #endif /* XML_DTD */ static int declClose(PROLOG_STATE *state, int tok, const char *ptr, const char *end, const ENCODING *enc) { switch (tok) { case XML_TOK_PROLOG_S: return XML_ROLE_NONE; case XML_TOK_DECL_CLOSE: setTopLevel(state); return XML_ROLE_NONE; } return common(state, tok); } #if 0 static int ignore(PROLOG_STATE *state, int tok, const char *ptr, const char *end, const ENCODING *enc) { switch (tok) { case XML_TOK_DECL_CLOSE: state->handler = internalSubset; return 0; default: return XML_ROLE_NONE; } return common(state, tok); } #endif static int error(PROLOG_STATE *state, int tok, const char *ptr, const char *end, const ENCODING *enc) { return XML_ROLE_NONE; } static int common(PROLOG_STATE *state, int tok) { #ifdef XML_DTD if (!state->documentEntity && tok == XML_TOK_PARAM_ENTITY_REF) return XML_ROLE_INNER_PARAM_ENTITY_REF; #endif state->handler = error; return XML_ROLE_ERROR; } void XmlPrologStateInit(PROLOG_STATE *state) { state->handler = prolog0; #ifdef XML_DTD state->documentEntity = 1; state->includeLevel = 0; #endif /* XML_DTD */ } #ifdef XML_DTD void 
XmlPrologStateInitExternalEntity(PROLOG_STATE *state) { state->handler = externalSubset0; state->documentEntity = 0; state->includeLevel = 0; } #endif /* XML_DTD */ swish-e-2.4.7/src/xml.h0000664000077100017500000000215511166010110011556 00000000000000/* $Id: xml.h 1736 2005-05-12 15:41:22Z karman $ ** ** This file is part of Swish-e. Swish-e is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. Swish-e is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with Swish-e; if not, write to the Free Software Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA See the COPYING file that accompanies the Swish-e distribution for details of the GNU GPL and the special exception available for linking against the Swish-e library. ** Mon May 9 18:15:43 CDT 2005 ** added GPL ** The prototypes */ int countwords_XML (SWISH *sw, FileProp *fprop, FileRec *fi, char *buffer); swish-e-2.4.7/src/swregex.h0000775000077100017500000000313711166010110012446 00000000000000/* $Id: swregex.h 1736 2005-05-12 15:41:22Z karman $ This file is part of Swish-e. Swish-e is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. Swish-e is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. 
You should have received a copy of the GNU General Public License along with Swish-e; if not, write to the Free Software Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA See the COPYING file that accompanies the Swish-e distribution for details of the GNU GPL and the special exception available for linking against the Swish-e library. ** Mon May 9 18:15:43 CDT 2005 ** added GPL */ #ifndef __HasSeenModule_regex #define __HasSeenModule_regex 1 void add_regex_patterns( char *name, regex_list **reg_list, char **params, int regex_pattern ); void add_replace_expression( char *name, regex_list **reg_list, char *expression ); int match_regex_list( char *str, regex_list *regex, char *comment ); char *process_regex_list( char *str, regex_list *regex, int *matched ); void free_regex_list( regex_list **reg_list ); void add_regular_expression( regex_list **reg_list, char *pattern, char *replace, int cflags, int global, int negate ); #endif /* __HasSeenModule_regex */ swish-e-2.4.7/src/file.c0000664000077100017500000003724411166010110011677 00000000000000/* $Id: file.c 2049 2008-03-08 15:33:49Z moseley $ ** ** Copyright (C) 1995, 1996, 1997, 1998 Hewlett-Packard Company ** Originally by Kevin Hughes, kev@kevcom.com, 3/11/94 ** This file is part of Swish-e. Swish-e is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. Swish-e is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. 
You should have received a copy of the GNU General Public License along with Swish-e; if not, write to the Free Software Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA See the COPYING file that accompanies the Swish-e distribution for details of the GNU GPL and the special exception available for linking against the Swish-e library. ** Mon May 9 15:51:39 CDT 2005 ** added GPL **------------------------------------------------------------- ** Changed getdefaults to allow metaNames in the user ** configuration file ** G.Hill 4/16/97 ghill@library.berkeley.edu ** ** change sprintf to snprintf to avoid corruption, and use MAXSTRLEN from swish.h ** added safestrcpy() macro to avoid corruption from strcpy overflow ** SRE 11/17/99 ** ** added buffer size arg to grabStringValue - core dumping from overrun ** fixed logical OR and other problems pointed out by "gcc -Wall" ** SRE 2/22/00 ** ** counter modulo 128 had parens typo ** SRE 2/23/00 ** ** read stopwords from file ** Rainer Scherg (rasc) 2000-06-15 ** ** 2000-11-15 rasc ** file_properties retrieves last mod date, filesize, and evals some swish ** config flags for this file! ** ** 2001-02-12 rasc errormsg "print" changed... ** 2001-03-16 rasc truncateDoc [read_stream] (if doc to large, truncate... 
) ** 2001-03-17 rasc fprop enhanced by "real_filename" ** */ #ifdef HAVE_CONFIG_H #include "acconfig.h" #endif #ifdef HAVE_STDLIB_H #include <stdlib.h> #endif #ifdef HAVE_UNISTD_H #include <unistd.h> #endif #include "swish.h" #include "mem.h" #include "swstring.h" #include "file.h" #include "error.h" #include "list.h" #include "hash.h" #include "check.h" #include "index.h" #include "filter.h" #include "metanames.h" #ifndef HAVE_MKSTEMP #include "mkstemp.h" #endif /* Cough, hack, cough - convert slash to backslash for programs that are run via the shell */ #if defined(_WIN32) && !defined(__CYGWIN__) void make_windows_path( char *path ) { char *c; for ( c = path; *c; c++ ) if ( '/' == *c ) *c = '\\'; } #endif /* Win32 hack to get libexecdir at runtime */ /* Caller should free memory returned */ char * get_libexec(void){ char *fn; #if defined(_WIN32) && !defined(__CYGWIN__) char *tr; int pos; fn = emalloc(MAX_PATH+1); /* get the full name of the executable */ if(!GetModuleFileNameA(NULL,fn,MAX_PATH)) { efree( fn ); return(libexecdir); } /* get the base directory */ tr = strrchr(fn, '\\'); pos = tr - fn; fn[pos]='\0'; /* get the prefix directory */ tr = strrchr(fn, '\\'); pos = tr - fn; /* if we're in bin we'll assume prefix is up one level */ if(!strncasecmp(&fn[pos+1], "bin\0", 4)) fn[pos]='\0'; /* Tack on the libexecdir */ strcpy(fn+strlen(fn), "\\lib\\swish-e"); #else /* !_WIN32 */ #ifdef libexecdir fn = emalloc(strlen(libexecdir)+1); strcpy(fn,libexecdir); #else /* just in case we don't have libexecdir */ fn = emalloc(2); strcpy(fn,"."); #endif /* libexecdir */ #endif /* _WIN32 */ return(fn); } /* Flip any backslashes to forward slashes, and remove trailing slash */ void normalize_path(char *path) { int len = strlen( path ); char *c; /* For windows users */ for ( c = path; *c; c++ ) if ( '\\' == *c ) *c = '/'; while( len > 1 && path[len-1] == '/' ) { #if defined(_WIN32) && !defined(__CYGWIN__) /* c:/ must end with / but other directories must not */ if( path[1] == ':' && len == 3 ){ 
break; } else { path[len-1] = '\0'; len--; } #else path[len-1] = '\0'; len--; #endif } } /* Is a file a directory? */ int isdirectory(char *path) { struct stat stbuf; if (stat(path, &stbuf)) return 0; return ((stbuf.st_mode & S_IFMT) == S_IFDIR) ? 1 : 0; } /* Is a file a regular file? */ int isfile(char *path) { struct stat stbuf; if (stat(path, &stbuf)) return 0; return ((stbuf.st_mode & S_IFMT) == S_IFREG) ? 1 : 0; } /* Is a file a link? */ int islink(char *path) { #ifdef HAVE_LSTAT struct stat stbuf; if (lstat(path, &stbuf)) return 0; return ((stbuf.st_mode & S_IFLNK) == S_IFLNK) ? 1 : 0; #else return 0; #endif } /* Get the size, in bytes, of a file. ** Return -1 if there's a problem. */ int getsize(char *path) { struct stat stbuf; if (stat(path, &stbuf)) return -1; return stbuf.st_size; } /* * Invoke the methods of the current Indexing Data Source */ void indexpath(SWISH * sw, char *path) { /* invoke routine to index a "path" */ (*IndexingDataSource->indexpath_fn) (sw, path); } /* -- read file into a buffer -- truncate file if necessary (truncateDocSize) -- return: buffer -- 2001-03-16 rasc truncateDoc */ /* maybe some day this could be chunked reading? */ /* no, maybe some day this will go away... */ char *read_stream(SWISH *sw, FileProp *fprop, int is_text) { long c, offset; long bufferlen; unsigned char *buffer, *tmp = NULL; size_t bytes_read; long filelen = fprop->fsize; /* Number of bytes we think we need to read */ long max_size = sw->truncateDocSize; if ( filelen && !fprop->hasfilter ) { /* truncate doc? 
*/ if (max_size && ( max_size < filelen) ) filelen = sw->truncateDocSize; buffer = (unsigned char *)Mem_ZoneAlloc(sw->Index->perDocTmpZone, filelen + 1); *buffer = '\0'; bytes_read = fread(buffer, 1, filelen, fprop->fp); buffer[bytes_read] = '\0'; /* hopefully doesn't read more than filelen bytes ;) */ /* JFP - substitute null chars, VFC record may have null char in reclen word, try to discard them */ if ( !fprop->index_no_content && is_text && strlen( (char *)buffer ) < bytes_read ) { int i; int j = 0; int i_bytes_read = (int)bytes_read; for (i = 0; i < i_bytes_read; ++i) { if (buffer[i] == '\0') { buffer[i] = '\n'; j++; } } if ( j ) progwarn("Substituted %d embedded null character(s) in file '%s' with a newline\n", j, fprop->real_path); } /* Reset length of buffer -- fsize is used by the parsers to say how long the buffer is */ fprop->fsize = (long)bytes_read; /* should be the same as strlen if in text mode */ return (char *) buffer; } /* if (filelen) */ /* filelen was zero so we are reading from a handle */ /* * No, if filelen is zero and fprop->hasfilter is set, then we are * reading from a filter and need to read the entire stream in. * This broke when using -S prog and a zero length file came along. - moseley Mar 2005 */ bufferlen = RD_BUFFER_SIZE; buffer = (unsigned char *)Mem_ZoneAlloc(sw->Index->perDocTmpZone, bufferlen + 1); *buffer = '\0'; /* catches case where source file is zero bytes */ /* but filter may still produce output */ if ( !fprop->hasfilter ) return (char *)buffer; offset = 0; while ( 1 ) { c = fread(buffer + offset, 1, RD_BUFFER_SIZE, fprop->fp); offset += c; /* next place to write in the buffer */ /* truncate? */ if (max_size && (offset > max_size)) { offset = max_size; break; } /* more to read? 
*/ if ( c < RD_BUFFER_SIZE || feof( fprop->fp) ) break; /* make buffer larger */ tmp = (unsigned char *)Mem_ZoneAlloc(sw->Index->perDocTmpZone, bufferlen + RD_BUFFER_SIZE + 1); memcpy(tmp,buffer,bufferlen+1); buffer = tmp; bufferlen += RD_BUFFER_SIZE; } buffer[offset] = '\0'; fprop->fsize = offset; return (char *) buffer; } /* Sept 25, 2001 - moseley * Flush the file -- for use with -S prog, when either Truncate is in use, or * the parser aborted for some reason (e.g. !isoktitle). */ void flush_stream( FileProp *fprop ) { static char tmpbuf[4096]; int read; while ( fprop->bytes_read < fprop->fsize ) { if ( ( fprop->fsize - fprop->bytes_read ) > 4096 ) { if ( !(read = fread(tmpbuf, 1, 4096, fprop->fp))) break; fprop->bytes_read += read; } else { read = fread(tmpbuf, 1, fprop->fsize - fprop->bytes_read, fprop->fp); break; } } } /* Mar 27, 2001 - moseley * Separate out the creation of the file properties * */ FileProp *init_file_properties(void) { FileProp *fprop; fprop = (FileProp *) emalloc(sizeof(FileProp)); /* emalloc checks fail and aborts... */ memset( fprop, 0, sizeof(FileProp) ); return fprop; } /* Mar 27, 2001 - moseley * Separate out the adjusting of file properties by config settings * 2001-04-09 rasc changed filters */ void init_file_prop_settings(SWISH * sw, FileProp * fprop) { /* Basename of document path => document filename */ fprop->real_filename = str_basename(fprop->real_path); /* -- get Doc Type as is in IndexContents or Defaultcontents -- doctypes by jruiz */ /* Might already be set by a header in extpro.c */ if ( !fprop->doctype ) { /* Get the type by file extension -- or return NODOCTYPE */ fprop->doctype = getdoctype(fprop->real_path, sw->indexcontents); /* If was not set by getdoctype() then assign it the default parser */ /* This could still be NODOCTYPE, or it might be something set by DefaultContents */ if (fprop->doctype == NODOCTYPE) fprop->doctype = sw->DefaultDocType; } /* -- index just the filename (or doc title tags)? 
-- this param was "wrongly" named indextitleonly */ fprop->index_no_content = (sw->nocontentslist != NULL) && isoksuffix(fprop->real_path, sw->nocontentslist); /* -- Any filter for this file type? -- NULL = No Filter, (char *) path to filter prog. */ fprop->hasfilter = hasfilter(sw, fprop->real_path); fprop->stordesc = hasdescription(fprop->doctype, sw->storedescription); } /* -- file_properties -- Get/eval information about a file and return it. -- Some flags are calculated from swish configs for this "real_path" -- Structure has to be freed using free_file_properties -- 2000-11-15 rasc -- return: (FileProp *) -- A failed stat returns an empty (default) structure -- 2000-12 -- Added StoreDescription */ FileProp *file_properties(char *real_path, char *work_file, SWISH * sw) { FileProp *fprop; struct stat stbuf; /* create an initialized fprop structure */ fprop = init_file_properties(); /* Dup these, since the real_path may be reallocated by FileRules */ fprop->real_path = estrdup( real_path ); fprop->work_path = estrdup( work_file ? 
work_file : real_path ); fprop->orig_path = estrdup( real_path ); /* Stat the file */ /* This is really the wrong place for this, as it's really only useful for fs.c method */ /* for http.c it means the last mod date is the temp file date */ /* Probably this entire function isn't needed - moseley */ if (!stat(fprop->work_path, &stbuf)) { fprop->fsize = (long) stbuf.st_size; fprop->source_size = fprop->fsize; /* to report the size of the original file */ fprop->mtime = stbuf.st_mtime; } /* Now set various fprop settings based mostly on file name */ init_file_prop_settings(sw, fprop); #ifdef DEBUG fprintf(stderr, "file_properties: path=%s, (workpath=%s), fsize=%ld, last_mod=%ld Doctype: %d Filter: %p\n", fprop->real_path, fprop->work_path, (long) fprop->fsize, (long) fprop->mtime, fprop->doctype, fprop->filterprog); #endif return fprop; } /* -- Free FileProp structure -- unless no alloc for strings simple free structure */ void free_file_properties(FileProp * fprop) { efree( fprop->real_path ); efree( fprop->work_path ); efree( fprop->orig_path ); efree(fprop); } static char *temp_file_template = "XXXXXX"; /*********************************************************************** * Create a temporary file * * Call With: * *SWISH = to get at the TmpDir config setting which I don't like * *prefix = chars to prepend to the file name * **file_name_buffer = where to store address of file name * unlink = if true, will unlink file * if not unlinked, then caller must free the name * Return: * *FILE * modified file_name_buffer * * Will create temp files in the directory specified by environment vars * TMPDIR, TMP and TEMP, and by the config.h setting of TMPDIR in that order. * * Note: * It's expected that swish is not run suid, so * (getuid()==geteuid()) && (getgid()==getegid()) * is not checked. I'm not sure if that would choke on other platforms. 
* * * Source: * http://www.linuxdoc.org/HOWTO/Secure-Programs-HOWTO/avoid-race.html * * Questions: * Can non-unix OS unlink the file and continue to hold the fd? * ***********************************************************************/ FILE *create_tempfile(SWISH *sw, const char *f_mode, char *prefix, char **file_name_buffer, int remove_file_name ) { int temp_fd; mode_t old_mode; FILE *temp_file; char *file_name; int file_name_len; struct MOD_Index *idx = sw->Index; char *tmpdir = NULL; file_name_len = (prefix ? strlen(prefix) : 0) + strlen( temp_file_template ) + strlen( TEMP_FILE_PREFIX ); /* Perl is nice sometimes */ if ( !( tmpdir = getenv("TMPDIR")) ) if ( !(tmpdir = getenv("TMP")) ) if( !(tmpdir = getenv("TEMP")) ) tmpdir = idx->tmpdir; if ( tmpdir && !*tmpdir ) tmpdir = NULL; /* just in case it's the empty string */ if ( tmpdir ) file_name_len += strlen( tmpdir ) + 1; /* for path separator */ file_name = emalloc( file_name_len + 1 ); *file_name = '\0'; if ( tmpdir ) { strcat( file_name, tmpdir ); normalize_path( file_name ); strcat( file_name, "/" ); } strcat( file_name, TEMP_FILE_PREFIX ); if ( prefix ) strcat( file_name, prefix ); strcat( file_name, temp_file_template ); old_mode = umask(077); /* Create file with restrictive permissions */ temp_fd = mkstemp( file_name ); (void) umask(old_mode); if (temp_fd == -1) progerrno("Couldn't open temporary file '%s': ", file_name ); if (!(temp_file = fdopen(temp_fd, f_mode))) progerrno("Couldn't create temporary file '%s' file descriptor: ", file_name); if ( remove_file_name ) { if ( remove( file_name ) == -1 ) progerrno("Couldn't unlink temporary file '%s' :", file_name); efree( file_name ); } else *file_name_buffer = file_name; return temp_file; } swish-e-2.4.7/src/check.h0000664000077100017500000000250011166010110012025 00000000000000/* ** Copyright (C) 1995, 1996, 1997, 1998 Hewlett-Packard Company ** Originally by Kevin Hughes, kev@kevcom.com, 3/11/94 ** $Id: check.h 1736 2005-05-12 15:41:22Z karman $ This file is 
part of Swish-e. Swish-e is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. Swish-e is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with Swish-e; if not, write to the Free Software Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA See the COPYING file that accompanies the Swish-e distribution for details of the GNU GPL and the special exception available for linking against the Swish-e library. ** Mon May 9 18:19:34 CDT 2005 ** added GPL */ int isokword (SWISH *,char *,IndexFILE *); int getdoctype (char *filename, struct IndexContents *indexcontents); struct StoreDescription *hasdescription (int, struct StoreDescription *); swish-e-2.4.7/src/fhash.c0000664000077100017500000002363111166010110012044 00000000000000/* $Id: fhash.c 1946 2007-10-22 14:56:35Z karpet $ This file is part of Swish-e. Swish-e is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. Swish-e is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. 
You should have received a copy of the GNU General Public License along with Swish-e; if not, write to the Free Software Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA See the COPYING file that accompanies the Swish-e distribution for details of the GNU GPL and the special exception available for linking against the Swish-e library. ** Mon May 9 15:51:39 CDT 2005 ** added GPL */ /************************************************************************** ** 05-2002 jmruiz ** ** Very simple routines to maintain a hash database on disk. ** ** Default values (see fhash.h): ** FHASH_SIZE - Size of hash table ** ** Routines: ** FHASH *FHASH_Create(FILE *fp); ** Creates a db hash file in file fp ** Returns a pointer to hash db file ** ** FHASH *FHASH_Open(FILE *fp, sw_off_t start); ** Opens an existing hash db file in file fp, starting at offset ** start. ** Returns a pointer to hash db file ** sw_off_t FHASH_Close(FHASH *f); ** Closes and writes the hash table to the file ** Returns the pointer to the db hash file inside the file ** ** int FHASH_Insert(FHASH *f, unsigned char *key, int key_len, unsigned char *data, int data_len); ** Adds a new entry pair (key,data) to the hash db file. Length of ** key is key_len, length of data is data_len ** ** int FHASH_Search(FHASH *f, unsigned char *key, int key_len, unsigned char *data, int data_len); ** Searches for and returns the data for a given key of length key_len ** The data is returned in the data array that must be allocated by ** the caller. 
If data buffer is not long enough, data will ** be truncated ** Returns the copied length in data ** ** int FHASH_Update(FHASH *f, unsigned char *key, int key_len, unsigned char *data, int data_len); ** Updates data for a given key of length key_len ** Data must be of the same size as the original record ** Returns 0 (OK) or 1 (not OK) ** ** int FHASH_Delete(FHASH *f, unsigned char *key, int key_len); ** Deletes the entry for the given key of length key_len ** Returns 0 (OK) or 1 (not OK) ** ***************************************************************************/ #include #include #include #include "swish.h" #include "compress.h" #include "mem.h" #include "fhash.h" #include "error.h" FHASH *FHASH_Create(FILE *fp) { FHASH *f; unsigned int i; sw_off_t tmp = (sw_off_t)0; f = (FHASH *) emalloc(sizeof(FHASH)); /* Init hash table */ for(i = 0; i < FHASH_SIZE; i++) f->hash_offsets[i] = (sw_off_t)0; /* Go to the end of the file */ if(sw_fseek(fp,(sw_off_t)0,SEEK_END) !=0) progerrno("Failed to seek to eof: "); /* Get pointer to hash table */ f->start = sw_ftell(fp); /* Pack tmp */ tmp = PACKFILEOFFSET(tmp); /* Write an empty hash table - Preserve space on disk */ for(i = 0; i < FHASH_SIZE; i++) sw_fwrite((unsigned char *)&tmp,sizeof(tmp),1,fp); f->fp = fp; return f; } FHASH *FHASH_Open(FILE *fp, sw_off_t start) { FHASH *f; unsigned int i; sw_off_t tmp; f = (FHASH *) emalloc(sizeof(FHASH)); f->start = start; f->fp = fp; /* put file pointer at start of hash table */ sw_fseek(fp,start,SEEK_SET); /* Read hash table */ for(i = 0; i < FHASH_SIZE ; i++) { sw_fread((unsigned char *)&tmp,sizeof(tmp), 1, fp); f->hash_offsets[i] = UNPACKFILEOFFSET(tmp); } return f; } sw_off_t FHASH_Close(FHASH *f) { sw_off_t start = f->start; sw_off_t tmp; FILE *fp = f->fp; int i; /* put file pointer at start of hash table */ sw_fseek(fp,start,SEEK_SET); /* Write hash table */ for(i = 0; i < FHASH_SIZE ; i++) { tmp = PACKFILEOFFSET(f->hash_offsets[i]); sw_fwrite((unsigned char *)&tmp,sizeof(tmp), 1, 
fp); } /* release memory */ efree(f); /* Return offset to start table */ return start; } int FHASH_CompareKeys(unsigned char *key1, int key_len1, unsigned char *key2, int key_len2) { int rc; if(key_len1 > key_len2) rc = memcmp(key1,key2,key_len2); else rc = memcmp(key1,key2,key_len1); if(!rc) rc = key_len1 - key_len2; return rc; } unsigned int FHASH_hash(unsigned char *s, int len) { unsigned int hashval; for (hashval = 0; len; s++,len--) hashval = (int) ((unsigned char) *s) + 31 * hashval; return hashval % FHASH_SIZE; } int FHASH_Insert(FHASH *f, unsigned char *key, int key_len, unsigned char *data, int data_len) { unsigned int hashval = FHASH_hash(key,key_len); sw_off_t new,next; FILE *fp = f->fp; sw_fseek(fp,(sw_off_t)0,SEEK_END); new = sw_ftell(fp); next = f->hash_offsets[hashval]; next = PACKFILEOFFSET(next); sw_fwrite((unsigned char *)&next,sizeof(next), 1, fp); compress1(key_len,fp,fputc); sw_fwrite((unsigned char *)key, key_len, 1, fp); compress1(data_len,fp,fputc); sw_fwrite((unsigned char *)data, data_len, 1, fp); f->hash_offsets[hashval] = new; return 0; } int FHASH_Search(FHASH *f, unsigned char *key, int key_len, unsigned char *data, int data_len) { /* Calculate the hash value for the key passed in and look up the seek pointer */ unsigned int hashval = FHASH_hash(key,key_len); sw_off_t next = f->hash_offsets[hashval]; sw_off_t tmp; FILE *fp = f->fp; unsigned char stack_buffer[2048], *read_key; int read_key_len; int read_data_len; int retval; while(next) { if ( 0 != sw_fseek(fp,next,SEEK_SET) ) /* Will key be null terminated? 
*/ progerrno( "Failed to seek to offset %ld looking for key '%s' :", next, key ); retval = sw_fread((unsigned char *)&tmp,1,sizeof(tmp),fp); if (feof(fp)) progerrno( "eof() while attempting to read '%d' bytes from file hash: ", (int)sizeof(tmp) ); if ( sizeof(tmp) != retval ) progerrno( "Only read '%d' bytes but expected '%d' while reading file hash: ", retval, (int)sizeof(tmp) ); next = UNPACKFILEOFFSET(tmp); if((read_key_len = uncompress1(fp,fgetc)) > sizeof(stack_buffer)) read_key = emalloc(read_key_len); else read_key = stack_buffer; sw_fread((unsigned char *)read_key,read_key_len,1,fp); if(FHASH_CompareKeys(read_key, read_key_len, key, key_len) == 0) { read_data_len = uncompress1(fp,fgetc); if(read_data_len > data_len) read_data_len = data_len; sw_fread((unsigned char *)data,read_data_len,1,fp); if(read_key != stack_buffer) efree(read_key); return read_data_len; } if(read_key != stack_buffer) efree(read_key); } memset(data,0,data_len); return 0; } int FHASH_Update(FHASH *f, unsigned char *key, int key_len, unsigned char *data, int data_len) { unsigned int hashval = FHASH_hash(key,key_len); sw_off_t next = f->hash_offsets[hashval]; FILE *fp = f->fp; unsigned char stack_buffer[2048], *read_key; int read_key_len, read_data_len; sw_off_t tmp; while(next) { /* offsets come from sw_ftell() and are absolute, so seek from the start of the file as FHASH_Search does */ sw_fseek(fp,next,SEEK_SET); sw_fread((unsigned char *)&tmp,sizeof(tmp),1,fp); next = UNPACKFILEOFFSET(tmp); if((read_key_len = uncompress1(fp,fgetc)) > sizeof(stack_buffer)) read_key = emalloc(read_key_len); else read_key = stack_buffer; sw_fread((unsigned char *)read_key,read_key_len,1,fp); if(FHASH_CompareKeys(read_key, read_key_len, key, key_len) == 0) { read_data_len = uncompress1(fp,fgetc); if(read_data_len > data_len) read_data_len = data_len; sw_fwrite((unsigned char *)data,read_data_len,1,fp); if(read_key != stack_buffer) efree(read_key); return 0; } if(read_key != stack_buffer) efree(read_key); } return 1; } int FHASH_Delete(FHASH *f, unsigned char *key, int key_len) { unsigned int hashval = 
FHASH_hash(key,key_len); sw_off_t next = f->hash_offsets[hashval]; sw_off_t prev = 0; FILE *fp = f->fp; unsigned char stack_buffer[2048], *read_key; int read_key_len; sw_off_t tmp; while(next) { /* offsets come from sw_ftell() and are absolute, so seek from the start of the file as FHASH_Search does */ sw_fseek(fp,next,SEEK_SET); sw_fread((unsigned char *)&tmp,sizeof(tmp),1,fp); if((read_key_len = uncompress1(fp,fgetc)) > sizeof(stack_buffer)) read_key = emalloc(read_key_len); else read_key = stack_buffer; sw_fread((unsigned char *)read_key,read_key_len,1,fp); if(FHASH_CompareKeys(read_key, read_key_len, key, key_len) == 0) { next = UNPACKFILEOFFSET(tmp); if(!prev) f->hash_offsets[hashval] = next; else { sw_fseek(fp,prev,SEEK_SET); sw_fwrite((unsigned char *)&tmp,sizeof(tmp),1,fp); } if(read_key != stack_buffer) efree(read_key); return 0; } if(read_key != stack_buffer) efree(read_key); prev =next; next = UNPACKFILEOFFSET(tmp); } return 1; } swish-e-2.4.7/src/snowball/0000777000077100017500000000000011166013172012520 500000000000000swish-e-2.4.7/src/snowball/stem_en1.h0000664000077100017500000000050511166010110014307 00000000000000 /* This file was generated automatically by the Snowball to ANSI C compiler */ #ifdef __cplusplus extern "C" { #endif extern struct SN_env * porter_ISO_8859_1_create_env(void); extern void porter_ISO_8859_1_close_env(struct SN_env * z); extern int porter_ISO_8859_1_stem(struct SN_env * z); #ifdef __cplusplus } #endif swish-e-2.4.7/src/snowball/stem_pt.c0000664000077100017500000011273311166010110014251 00000000000000 /* This file was generated automatically by the Snowball to ANSI C compiler */ #include "header.h" #ifdef __cplusplus extern "C" { #endif extern int portuguese_ISO_8859_1_stem(struct SN_env * z); #ifdef __cplusplus } #endif static int r_residual_form(struct SN_env * z); static int r_residual_suffix(struct SN_env * z); static int r_verb_suffix(struct SN_env * z); static int r_standard_suffix(struct SN_env * z); static int r_R2(struct SN_env * z); static int r_R1(struct SN_env * z); static int r_RV(struct SN_env * z); static int 
r_mark_regions(struct SN_env * z); static int r_postlude(struct SN_env * z); static int r_prelude(struct SN_env * z); #ifdef __cplusplus extern "C" { #endif extern struct SN_env * portuguese_ISO_8859_1_create_env(void); extern void portuguese_ISO_8859_1_close_env(struct SN_env * z); #ifdef __cplusplus } #endif static const symbol s_0_1[1] = { 0xE3 }; static const symbol s_0_2[1] = { 0xF5 }; static const struct among a_0[3] = { /* 0 */ { 0, 0, -1, 3, 0}, /* 1 */ { 1, s_0_1, 0, 1, 0}, /* 2 */ { 1, s_0_2, 0, 2, 0} }; static const symbol s_1_1[2] = { 'a', '~' }; static const symbol s_1_2[2] = { 'o', '~' }; static const struct among a_1[3] = { /* 0 */ { 0, 0, -1, 3, 0}, /* 1 */ { 2, s_1_1, 0, 1, 0}, /* 2 */ { 2, s_1_2, 0, 2, 0} }; static const symbol s_2_0[2] = { 'i', 'c' }; static const symbol s_2_1[2] = { 'a', 'd' }; static const symbol s_2_2[2] = { 'o', 's' }; static const symbol s_2_3[2] = { 'i', 'v' }; static const struct among a_2[4] = { /* 0 */ { 2, s_2_0, -1, -1, 0}, /* 1 */ { 2, s_2_1, -1, -1, 0}, /* 2 */ { 2, s_2_2, -1, -1, 0}, /* 3 */ { 2, s_2_3, -1, 1, 0} }; static const symbol s_3_0[4] = { 'a', 'n', 't', 'e' }; static const symbol s_3_1[4] = { 'a', 'v', 'e', 'l' }; static const symbol s_3_2[4] = { 0xED, 'v', 'e', 'l' }; static const struct among a_3[3] = { /* 0 */ { 4, s_3_0, -1, 1, 0}, /* 1 */ { 4, s_3_1, -1, 1, 0}, /* 2 */ { 4, s_3_2, -1, 1, 0} }; static const symbol s_4_0[2] = { 'i', 'c' }; static const symbol s_4_1[4] = { 'a', 'b', 'i', 'l' }; static const symbol s_4_2[2] = { 'i', 'v' }; static const struct among a_4[3] = { /* 0 */ { 2, s_4_0, -1, 1, 0}, /* 1 */ { 4, s_4_1, -1, 1, 0}, /* 2 */ { 2, s_4_2, -1, 1, 0} }; static const symbol s_5_0[3] = { 'i', 'c', 'a' }; static const symbol s_5_1[5] = { 0xE2, 'n', 'c', 'i', 'a' }; static const symbol s_5_2[5] = { 0xEA, 'n', 'c', 'i', 'a' }; static const symbol s_5_3[3] = { 'i', 'r', 'a' }; static const symbol s_5_4[5] = { 'a', 'd', 'o', 'r', 'a' }; static const symbol s_5_5[3] = { 'o', 's', 'a' }; static 
const symbol s_5_6[4] = { 'i', 's', 't', 'a' }; static const symbol s_5_7[3] = { 'i', 'v', 'a' }; static const symbol s_5_8[3] = { 'e', 'z', 'a' }; static const symbol s_5_9[5] = { 'l', 'o', 'g', 0xED, 'a' }; static const symbol s_5_10[5] = { 'i', 'd', 'a', 'd', 'e' }; static const symbol s_5_11[4] = { 'a', 'n', 't', 'e' }; static const symbol s_5_12[5] = { 'm', 'e', 'n', 't', 'e' }; static const symbol s_5_13[6] = { 'a', 'm', 'e', 'n', 't', 'e' }; static const symbol s_5_14[4] = { 0xE1, 'v', 'e', 'l' }; static const symbol s_5_15[4] = { 0xED, 'v', 'e', 'l' }; static const symbol s_5_16[5] = { 'u', 'c', 'i', 0xF3, 'n' }; static const symbol s_5_17[3] = { 'i', 'c', 'o' }; static const symbol s_5_18[4] = { 'i', 's', 'm', 'o' }; static const symbol s_5_19[3] = { 'o', 's', 'o' }; static const symbol s_5_20[6] = { 'a', 'm', 'e', 'n', 't', 'o' }; static const symbol s_5_21[6] = { 'i', 'm', 'e', 'n', 't', 'o' }; static const symbol s_5_22[3] = { 'i', 'v', 'o' }; static const symbol s_5_23[5] = { 'a', 0xE7, 'a', '~', 'o' }; static const symbol s_5_24[4] = { 'a', 'd', 'o', 'r' }; static const symbol s_5_25[4] = { 'i', 'c', 'a', 's' }; static const symbol s_5_26[6] = { 0xEA, 'n', 'c', 'i', 'a', 's' }; static const symbol s_5_27[4] = { 'i', 'r', 'a', 's' }; static const symbol s_5_28[6] = { 'a', 'd', 'o', 'r', 'a', 's' }; static const symbol s_5_29[4] = { 'o', 's', 'a', 's' }; static const symbol s_5_30[5] = { 'i', 's', 't', 'a', 's' }; static const symbol s_5_31[4] = { 'i', 'v', 'a', 's' }; static const symbol s_5_32[4] = { 'e', 'z', 'a', 's' }; static const symbol s_5_33[6] = { 'l', 'o', 'g', 0xED, 'a', 's' }; static const symbol s_5_34[6] = { 'i', 'd', 'a', 'd', 'e', 's' }; static const symbol s_5_35[7] = { 'u', 'c', 'i', 'o', 'n', 'e', 's' }; static const symbol s_5_36[6] = { 'a', 'd', 'o', 'r', 'e', 's' }; static const symbol s_5_37[5] = { 'a', 'n', 't', 'e', 's' }; static const symbol s_5_38[6] = { 'a', 0xE7, 'o', '~', 'e', 's' }; static const symbol s_5_39[4] = { 'i', 
'c', 'o', 's' }; static const symbol s_5_40[5] = { 'i', 's', 'm', 'o', 's' }; static const symbol s_5_41[4] = { 'o', 's', 'o', 's' }; static const symbol s_5_42[7] = { 'a', 'm', 'e', 'n', 't', 'o', 's' }; static const symbol s_5_43[7] = { 'i', 'm', 'e', 'n', 't', 'o', 's' }; static const symbol s_5_44[4] = { 'i', 'v', 'o', 's' }; static const struct among a_5[45] = { /* 0 */ { 3, s_5_0, -1, 1, 0}, /* 1 */ { 5, s_5_1, -1, 1, 0}, /* 2 */ { 5, s_5_2, -1, 4, 0}, /* 3 */ { 3, s_5_3, -1, 9, 0}, /* 4 */ { 5, s_5_4, -1, 1, 0}, /* 5 */ { 3, s_5_5, -1, 1, 0}, /* 6 */ { 4, s_5_6, -1, 1, 0}, /* 7 */ { 3, s_5_7, -1, 8, 0}, /* 8 */ { 3, s_5_8, -1, 1, 0}, /* 9 */ { 5, s_5_9, -1, 2, 0}, /* 10 */ { 5, s_5_10, -1, 7, 0}, /* 11 */ { 4, s_5_11, -1, 1, 0}, /* 12 */ { 5, s_5_12, -1, 6, 0}, /* 13 */ { 6, s_5_13, 12, 5, 0}, /* 14 */ { 4, s_5_14, -1, 1, 0}, /* 15 */ { 4, s_5_15, -1, 1, 0}, /* 16 */ { 5, s_5_16, -1, 3, 0}, /* 17 */ { 3, s_5_17, -1, 1, 0}, /* 18 */ { 4, s_5_18, -1, 1, 0}, /* 19 */ { 3, s_5_19, -1, 1, 0}, /* 20 */ { 6, s_5_20, -1, 1, 0}, /* 21 */ { 6, s_5_21, -1, 1, 0}, /* 22 */ { 3, s_5_22, -1, 8, 0}, /* 23 */ { 5, s_5_23, -1, 1, 0}, /* 24 */ { 4, s_5_24, -1, 1, 0}, /* 25 */ { 4, s_5_25, -1, 1, 0}, /* 26 */ { 6, s_5_26, -1, 4, 0}, /* 27 */ { 4, s_5_27, -1, 9, 0}, /* 28 */ { 6, s_5_28, -1, 1, 0}, /* 29 */ { 4, s_5_29, -1, 1, 0}, /* 30 */ { 5, s_5_30, -1, 1, 0}, /* 31 */ { 4, s_5_31, -1, 8, 0}, /* 32 */ { 4, s_5_32, -1, 1, 0}, /* 33 */ { 6, s_5_33, -1, 2, 0}, /* 34 */ { 6, s_5_34, -1, 7, 0}, /* 35 */ { 7, s_5_35, -1, 3, 0}, /* 36 */ { 6, s_5_36, -1, 1, 0}, /* 37 */ { 5, s_5_37, -1, 1, 0}, /* 38 */ { 6, s_5_38, -1, 1, 0}, /* 39 */ { 4, s_5_39, -1, 1, 0}, /* 40 */ { 5, s_5_40, -1, 1, 0}, /* 41 */ { 4, s_5_41, -1, 1, 0}, /* 42 */ { 7, s_5_42, -1, 1, 0}, /* 43 */ { 7, s_5_43, -1, 1, 0}, /* 44 */ { 4, s_5_44, -1, 8, 0} }; static const symbol s_6_0[3] = { 'a', 'd', 'a' }; static const symbol s_6_1[3] = { 'i', 'd', 'a' }; static const symbol s_6_2[2] = { 'i', 'a' }; static const 
symbol s_6_3[4] = { 'a', 'r', 'i', 'a' }; static const symbol s_6_4[4] = { 'e', 'r', 'i', 'a' }; static const symbol s_6_5[4] = { 'i', 'r', 'i', 'a' }; static const symbol s_6_6[3] = { 'a', 'r', 'a' }; static const symbol s_6_7[3] = { 'e', 'r', 'a' }; static const symbol s_6_8[3] = { 'i', 'r', 'a' }; static const symbol s_6_9[3] = { 'a', 'v', 'a' }; static const symbol s_6_10[4] = { 'a', 's', 's', 'e' }; static const symbol s_6_11[4] = { 'e', 's', 's', 'e' }; static const symbol s_6_12[4] = { 'i', 's', 's', 'e' }; static const symbol s_6_13[4] = { 'a', 's', 't', 'e' }; static const symbol s_6_14[4] = { 'e', 's', 't', 'e' }; static const symbol s_6_15[4] = { 'i', 's', 't', 'e' }; static const symbol s_6_16[2] = { 'e', 'i' }; static const symbol s_6_17[4] = { 'a', 'r', 'e', 'i' }; static const symbol s_6_18[4] = { 'e', 'r', 'e', 'i' }; static const symbol s_6_19[4] = { 'i', 'r', 'e', 'i' }; static const symbol s_6_20[2] = { 'a', 'm' }; static const symbol s_6_21[3] = { 'i', 'a', 'm' }; static const symbol s_6_22[5] = { 'a', 'r', 'i', 'a', 'm' }; static const symbol s_6_23[5] = { 'e', 'r', 'i', 'a', 'm' }; static const symbol s_6_24[5] = { 'i', 'r', 'i', 'a', 'm' }; static const symbol s_6_25[4] = { 'a', 'r', 'a', 'm' }; static const symbol s_6_26[4] = { 'e', 'r', 'a', 'm' }; static const symbol s_6_27[4] = { 'i', 'r', 'a', 'm' }; static const symbol s_6_28[4] = { 'a', 'v', 'a', 'm' }; static const symbol s_6_29[2] = { 'e', 'm' }; static const symbol s_6_30[4] = { 'a', 'r', 'e', 'm' }; static const symbol s_6_31[4] = { 'e', 'r', 'e', 'm' }; static const symbol s_6_32[4] = { 'i', 'r', 'e', 'm' }; static const symbol s_6_33[5] = { 'a', 's', 's', 'e', 'm' }; static const symbol s_6_34[5] = { 'e', 's', 's', 'e', 'm' }; static const symbol s_6_35[5] = { 'i', 's', 's', 'e', 'm' }; static const symbol s_6_36[3] = { 'a', 'd', 'o' }; static const symbol s_6_37[3] = { 'i', 'd', 'o' }; static const symbol s_6_38[4] = { 'a', 'n', 'd', 'o' }; static const symbol s_6_39[4] = { 'e', 
'n', 'd', 'o' }; static const symbol s_6_40[4] = { 'i', 'n', 'd', 'o' }; static const symbol s_6_41[5] = { 'a', 'r', 'a', '~', 'o' }; static const symbol s_6_42[5] = { 'e', 'r', 'a', '~', 'o' }; static const symbol s_6_43[5] = { 'i', 'r', 'a', '~', 'o' }; static const symbol s_6_44[2] = { 'a', 'r' }; static const symbol s_6_45[2] = { 'e', 'r' }; static const symbol s_6_46[2] = { 'i', 'r' }; static const symbol s_6_47[2] = { 'a', 's' }; static const symbol s_6_48[4] = { 'a', 'd', 'a', 's' }; static const symbol s_6_49[4] = { 'i', 'd', 'a', 's' }; static const symbol s_6_50[3] = { 'i', 'a', 's' }; static const symbol s_6_51[5] = { 'a', 'r', 'i', 'a', 's' }; static const symbol s_6_52[5] = { 'e', 'r', 'i', 'a', 's' }; static const symbol s_6_53[5] = { 'i', 'r', 'i', 'a', 's' }; static const symbol s_6_54[4] = { 'a', 'r', 'a', 's' }; static const symbol s_6_55[4] = { 'e', 'r', 'a', 's' }; static const symbol s_6_56[4] = { 'i', 'r', 'a', 's' }; static const symbol s_6_57[4] = { 'a', 'v', 'a', 's' }; static const symbol s_6_58[2] = { 'e', 's' }; static const symbol s_6_59[5] = { 'a', 'r', 'd', 'e', 's' }; static const symbol s_6_60[5] = { 'e', 'r', 'd', 'e', 's' }; static const symbol s_6_61[5] = { 'i', 'r', 'd', 'e', 's' }; static const symbol s_6_62[4] = { 'a', 'r', 'e', 's' }; static const symbol s_6_63[4] = { 'e', 'r', 'e', 's' }; static const symbol s_6_64[4] = { 'i', 'r', 'e', 's' }; static const symbol s_6_65[5] = { 'a', 's', 's', 'e', 's' }; static const symbol s_6_66[5] = { 'e', 's', 's', 'e', 's' }; static const symbol s_6_67[5] = { 'i', 's', 's', 'e', 's' }; static const symbol s_6_68[5] = { 'a', 's', 't', 'e', 's' }; static const symbol s_6_69[5] = { 'e', 's', 't', 'e', 's' }; static const symbol s_6_70[5] = { 'i', 's', 't', 'e', 's' }; static const symbol s_6_71[2] = { 'i', 's' }; static const symbol s_6_72[3] = { 'a', 'i', 's' }; static const symbol s_6_73[3] = { 'e', 'i', 's' }; static const symbol s_6_74[5] = { 'a', 'r', 'e', 'i', 's' }; static const 
symbol s_6_75[5] = { 'e', 'r', 'e', 'i', 's' }; static const symbol s_6_76[5] = { 'i', 'r', 'e', 'i', 's' }; static const symbol s_6_77[5] = { 0xE1, 'r', 'e', 'i', 's' }; static const symbol s_6_78[5] = { 0xE9, 'r', 'e', 'i', 's' }; static const symbol s_6_79[5] = { 0xED, 'r', 'e', 'i', 's' }; static const symbol s_6_80[6] = { 0xE1, 's', 's', 'e', 'i', 's' }; static const symbol s_6_81[6] = { 0xE9, 's', 's', 'e', 'i', 's' }; static const symbol s_6_82[6] = { 0xED, 's', 's', 'e', 'i', 's' }; static const symbol s_6_83[5] = { 0xE1, 'v', 'e', 'i', 's' }; static const symbol s_6_84[4] = { 0xED, 'e', 'i', 's' }; static const symbol s_6_85[6] = { 'a', 'r', 0xED, 'e', 'i', 's' }; static const symbol s_6_86[6] = { 'e', 'r', 0xED, 'e', 'i', 's' }; static const symbol s_6_87[6] = { 'i', 'r', 0xED, 'e', 'i', 's' }; static const symbol s_6_88[4] = { 'a', 'd', 'o', 's' }; static const symbol s_6_89[4] = { 'i', 'd', 'o', 's' }; static const symbol s_6_90[4] = { 'a', 'm', 'o', 's' }; static const symbol s_6_91[6] = { 0xE1, 'r', 'a', 'm', 'o', 's' }; static const symbol s_6_92[6] = { 0xE9, 'r', 'a', 'm', 'o', 's' }; static const symbol s_6_93[6] = { 0xED, 'r', 'a', 'm', 'o', 's' }; static const symbol s_6_94[6] = { 0xE1, 'v', 'a', 'm', 'o', 's' }; static const symbol s_6_95[5] = { 0xED, 'a', 'm', 'o', 's' }; static const symbol s_6_96[7] = { 'a', 'r', 0xED, 'a', 'm', 'o', 's' }; static const symbol s_6_97[7] = { 'e', 'r', 0xED, 'a', 'm', 'o', 's' }; static const symbol s_6_98[7] = { 'i', 'r', 0xED, 'a', 'm', 'o', 's' }; static const symbol s_6_99[4] = { 'e', 'm', 'o', 's' }; static const symbol s_6_100[6] = { 'a', 'r', 'e', 'm', 'o', 's' }; static const symbol s_6_101[6] = { 'e', 'r', 'e', 'm', 'o', 's' }; static const symbol s_6_102[6] = { 'i', 'r', 'e', 'm', 'o', 's' }; static const symbol s_6_103[7] = { 0xE1, 's', 's', 'e', 'm', 'o', 's' }; static const symbol s_6_104[7] = { 0xEA, 's', 's', 'e', 'm', 'o', 's' }; static const symbol s_6_105[7] = { 0xED, 's', 's', 'e', 'm', 'o', 
's' }; static const symbol s_6_106[4] = { 'i', 'm', 'o', 's' }; static const symbol s_6_107[5] = { 'a', 'r', 'm', 'o', 's' }; static const symbol s_6_108[5] = { 'e', 'r', 'm', 'o', 's' }; static const symbol s_6_109[5] = { 'i', 'r', 'm', 'o', 's' }; static const symbol s_6_110[4] = { 0xE1, 'm', 'o', 's' }; static const symbol s_6_111[4] = { 'a', 'r', 0xE1, 's' }; static const symbol s_6_112[4] = { 'e', 'r', 0xE1, 's' }; static const symbol s_6_113[4] = { 'i', 'r', 0xE1, 's' }; static const symbol s_6_114[2] = { 'e', 'u' }; static const symbol s_6_115[2] = { 'i', 'u' }; static const symbol s_6_116[2] = { 'o', 'u' }; static const symbol s_6_117[3] = { 'a', 'r', 0xE1 }; static const symbol s_6_118[3] = { 'e', 'r', 0xE1 }; static const symbol s_6_119[3] = { 'i', 'r', 0xE1 }; static const struct among a_6[120] = { /* 0 */ { 3, s_6_0, -1, 1, 0}, /* 1 */ { 3, s_6_1, -1, 1, 0}, /* 2 */ { 2, s_6_2, -1, 1, 0}, /* 3 */ { 4, s_6_3, 2, 1, 0}, /* 4 */ { 4, s_6_4, 2, 1, 0}, /* 5 */ { 4, s_6_5, 2, 1, 0}, /* 6 */ { 3, s_6_6, -1, 1, 0}, /* 7 */ { 3, s_6_7, -1, 1, 0}, /* 8 */ { 3, s_6_8, -1, 1, 0}, /* 9 */ { 3, s_6_9, -1, 1, 0}, /* 10 */ { 4, s_6_10, -1, 1, 0}, /* 11 */ { 4, s_6_11, -1, 1, 0}, /* 12 */ { 4, s_6_12, -1, 1, 0}, /* 13 */ { 4, s_6_13, -1, 1, 0}, /* 14 */ { 4, s_6_14, -1, 1, 0}, /* 15 */ { 4, s_6_15, -1, 1, 0}, /* 16 */ { 2, s_6_16, -1, 1, 0}, /* 17 */ { 4, s_6_17, 16, 1, 0}, /* 18 */ { 4, s_6_18, 16, 1, 0}, /* 19 */ { 4, s_6_19, 16, 1, 0}, /* 20 */ { 2, s_6_20, -1, 1, 0}, /* 21 */ { 3, s_6_21, 20, 1, 0}, /* 22 */ { 5, s_6_22, 21, 1, 0}, /* 23 */ { 5, s_6_23, 21, 1, 0}, /* 24 */ { 5, s_6_24, 21, 1, 0}, /* 25 */ { 4, s_6_25, 20, 1, 0}, /* 26 */ { 4, s_6_26, 20, 1, 0}, /* 27 */ { 4, s_6_27, 20, 1, 0}, /* 28 */ { 4, s_6_28, 20, 1, 0}, /* 29 */ { 2, s_6_29, -1, 1, 0}, /* 30 */ { 4, s_6_30, 29, 1, 0}, /* 31 */ { 4, s_6_31, 29, 1, 0}, /* 32 */ { 4, s_6_32, 29, 1, 0}, /* 33 */ { 5, s_6_33, 29, 1, 0}, /* 34 */ { 5, s_6_34, 29, 1, 0}, /* 35 */ { 5, s_6_35, 29, 1, 0}, /* 36 */ { 3, 
s_6_36, -1, 1, 0}, /* 37 */ { 3, s_6_37, -1, 1, 0}, /* 38 */ { 4, s_6_38, -1, 1, 0}, /* 39 */ { 4, s_6_39, -1, 1, 0}, /* 40 */ { 4, s_6_40, -1, 1, 0}, /* 41 */ { 5, s_6_41, -1, 1, 0}, /* 42 */ { 5, s_6_42, -1, 1, 0}, /* 43 */ { 5, s_6_43, -1, 1, 0}, /* 44 */ { 2, s_6_44, -1, 1, 0}, /* 45 */ { 2, s_6_45, -1, 1, 0}, /* 46 */ { 2, s_6_46, -1, 1, 0}, /* 47 */ { 2, s_6_47, -1, 1, 0}, /* 48 */ { 4, s_6_48, 47, 1, 0}, /* 49 */ { 4, s_6_49, 47, 1, 0}, /* 50 */ { 3, s_6_50, 47, 1, 0}, /* 51 */ { 5, s_6_51, 50, 1, 0}, /* 52 */ { 5, s_6_52, 50, 1, 0}, /* 53 */ { 5, s_6_53, 50, 1, 0}, /* 54 */ { 4, s_6_54, 47, 1, 0}, /* 55 */ { 4, s_6_55, 47, 1, 0}, /* 56 */ { 4, s_6_56, 47, 1, 0}, /* 57 */ { 4, s_6_57, 47, 1, 0}, /* 58 */ { 2, s_6_58, -1, 1, 0}, /* 59 */ { 5, s_6_59, 58, 1, 0}, /* 60 */ { 5, s_6_60, 58, 1, 0}, /* 61 */ { 5, s_6_61, 58, 1, 0}, /* 62 */ { 4, s_6_62, 58, 1, 0}, /* 63 */ { 4, s_6_63, 58, 1, 0}, /* 64 */ { 4, s_6_64, 58, 1, 0}, /* 65 */ { 5, s_6_65, 58, 1, 0}, /* 66 */ { 5, s_6_66, 58, 1, 0}, /* 67 */ { 5, s_6_67, 58, 1, 0}, /* 68 */ { 5, s_6_68, 58, 1, 0}, /* 69 */ { 5, s_6_69, 58, 1, 0}, /* 70 */ { 5, s_6_70, 58, 1, 0}, /* 71 */ { 2, s_6_71, -1, 1, 0}, /* 72 */ { 3, s_6_72, 71, 1, 0}, /* 73 */ { 3, s_6_73, 71, 1, 0}, /* 74 */ { 5, s_6_74, 73, 1, 0}, /* 75 */ { 5, s_6_75, 73, 1, 0}, /* 76 */ { 5, s_6_76, 73, 1, 0}, /* 77 */ { 5, s_6_77, 73, 1, 0}, /* 78 */ { 5, s_6_78, 73, 1, 0}, /* 79 */ { 5, s_6_79, 73, 1, 0}, /* 80 */ { 6, s_6_80, 73, 1, 0}, /* 81 */ { 6, s_6_81, 73, 1, 0}, /* 82 */ { 6, s_6_82, 73, 1, 0}, /* 83 */ { 5, s_6_83, 73, 1, 0}, /* 84 */ { 4, s_6_84, 73, 1, 0}, /* 85 */ { 6, s_6_85, 84, 1, 0}, /* 86 */ { 6, s_6_86, 84, 1, 0}, /* 87 */ { 6, s_6_87, 84, 1, 0}, /* 88 */ { 4, s_6_88, -1, 1, 0}, /* 89 */ { 4, s_6_89, -1, 1, 0}, /* 90 */ { 4, s_6_90, -1, 1, 0}, /* 91 */ { 6, s_6_91, 90, 1, 0}, /* 92 */ { 6, s_6_92, 90, 1, 0}, /* 93 */ { 6, s_6_93, 90, 1, 0}, /* 94 */ { 6, s_6_94, 90, 1, 0}, /* 95 */ { 5, s_6_95, 90, 1, 0}, /* 96 */ { 7, s_6_96, 95, 1, 0}, 
/* 97 */ { 7, s_6_97, 95, 1, 0}, /* 98 */ { 7, s_6_98, 95, 1, 0}, /* 99 */ { 4, s_6_99, -1, 1, 0}, /*100 */ { 6, s_6_100, 99, 1, 0}, /*101 */ { 6, s_6_101, 99, 1, 0}, /*102 */ { 6, s_6_102, 99, 1, 0}, /*103 */ { 7, s_6_103, 99, 1, 0}, /*104 */ { 7, s_6_104, 99, 1, 0}, /*105 */ { 7, s_6_105, 99, 1, 0}, /*106 */ { 4, s_6_106, -1, 1, 0}, /*107 */ { 5, s_6_107, -1, 1, 0}, /*108 */ { 5, s_6_108, -1, 1, 0}, /*109 */ { 5, s_6_109, -1, 1, 0}, /*110 */ { 4, s_6_110, -1, 1, 0}, /*111 */ { 4, s_6_111, -1, 1, 0}, /*112 */ { 4, s_6_112, -1, 1, 0}, /*113 */ { 4, s_6_113, -1, 1, 0}, /*114 */ { 2, s_6_114, -1, 1, 0}, /*115 */ { 2, s_6_115, -1, 1, 0}, /*116 */ { 2, s_6_116, -1, 1, 0}, /*117 */ { 3, s_6_117, -1, 1, 0}, /*118 */ { 3, s_6_118, -1, 1, 0}, /*119 */ { 3, s_6_119, -1, 1, 0} }; static const symbol s_7_0[1] = { 'a' }; static const symbol s_7_1[1] = { 'i' }; static const symbol s_7_2[1] = { 'o' }; static const symbol s_7_3[2] = { 'o', 's' }; static const symbol s_7_4[1] = { 0xE1 }; static const symbol s_7_5[1] = { 0xED }; static const symbol s_7_6[1] = { 0xF3 }; static const struct among a_7[7] = { /* 0 */ { 1, s_7_0, -1, 1, 0}, /* 1 */ { 1, s_7_1, -1, 1, 0}, /* 2 */ { 1, s_7_2, -1, 1, 0}, /* 3 */ { 2, s_7_3, -1, 1, 0}, /* 4 */ { 1, s_7_4, -1, 1, 0}, /* 5 */ { 1, s_7_5, -1, 1, 0}, /* 6 */ { 1, s_7_6, -1, 1, 0} }; static const symbol s_8_0[1] = { 'e' }; static const symbol s_8_1[1] = { 0xE7 }; static const symbol s_8_2[1] = { 0xE9 }; static const symbol s_8_3[1] = { 0xEA }; static const struct among a_8[4] = { /* 0 */ { 1, s_8_0, -1, 1, 0}, /* 1 */ { 1, s_8_1, -1, 2, 0}, /* 2 */ { 1, s_8_2, -1, 1, 0}, /* 3 */ { 1, s_8_3, -1, 1, 0} }; static const unsigned char g_v[] = { 17, 65, 16, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 19, 12, 2 }; static const symbol s_0[] = { 'a', '~' }; static const symbol s_1[] = { 'o', '~' }; static const symbol s_2[] = { 0xE3 }; static const symbol s_3[] = { 0xF5 }; static const symbol s_4[] = { 'l', 'o', 'g' }; static const symbol s_5[] = { 'u' }; 
static const symbol s_6[] = { 'e', 'n', 't', 'e' }; static const symbol s_7[] = { 'a', 't' }; static const symbol s_8[] = { 'a', 't' }; static const symbol s_9[] = { 'e' }; static const symbol s_10[] = { 'i', 'r' }; static const symbol s_11[] = { 'u' }; static const symbol s_12[] = { 'g' }; static const symbol s_13[] = { 'i' }; static const symbol s_14[] = { 'c' }; static const symbol s_15[] = { 'c' }; static const symbol s_16[] = { 'i' }; static const symbol s_17[] = { 'c' }; static int r_prelude(struct SN_env * z) { int among_var; while(1) { /* repeat, line 36 */ int c1 = z->c; z->bra = z->c; /* [, line 37 */ if (z->c >= z->l || (z->p[z->c + 0] != 227 && z->p[z->c + 0] != 245)) among_var = 3; else among_var = find_among(z, a_0, 3); /* substring, line 37 */ if (!(among_var)) goto lab0; z->ket = z->c; /* ], line 37 */ switch(among_var) { case 0: goto lab0; case 1: { int ret = slice_from_s(z, 2, s_0); /* <-, line 38 */ if (ret < 0) return ret; } break; case 2: { int ret = slice_from_s(z, 2, s_1); /* <-, line 39 */ if (ret < 0) return ret; } break; case 3: if (z->c >= z->l) goto lab0; z->c++; /* next, line 40 */ break; } continue; lab0: z->c = c1; break; } return 1; } static int r_mark_regions(struct SN_env * z) { z->I[0] = z->l; z->I[1] = z->l; z->I[2] = z->l; { int c1 = z->c; /* do, line 50 */ { int c2 = z->c; /* or, line 52 */ if (in_grouping(z, g_v, 97, 250, 0)) goto lab2; { int c3 = z->c; /* or, line 51 */ if (out_grouping(z, g_v, 97, 250, 0)) goto lab4; { /* gopast */ /* grouping v, line 51 */ int ret = out_grouping(z, g_v, 97, 250, 1); if (ret < 0) goto lab4; z->c += ret; } goto lab3; lab4: z->c = c3; if (in_grouping(z, g_v, 97, 250, 0)) goto lab2; { /* gopast */ /* non v, line 51 */ int ret = in_grouping(z, g_v, 97, 250, 1); if (ret < 0) goto lab2; z->c += ret; } } lab3: goto lab1; lab2: z->c = c2; if (out_grouping(z, g_v, 97, 250, 0)) goto lab0; { int c4 = z->c; /* or, line 53 */ if (out_grouping(z, g_v, 97, 250, 0)) goto lab6; { /* gopast */ /* grouping v, 
line 53 */ int ret = out_grouping(z, g_v, 97, 250, 1); if (ret < 0) goto lab6; z->c += ret; } goto lab5; lab6: z->c = c4; if (in_grouping(z, g_v, 97, 250, 0)) goto lab0; if (z->c >= z->l) goto lab0; z->c++; /* next, line 53 */ } lab5: ; } lab1: z->I[0] = z->c; /* setmark pV, line 54 */ lab0: z->c = c1; } { int c5 = z->c; /* do, line 56 */ { /* gopast */ /* grouping v, line 57 */ int ret = out_grouping(z, g_v, 97, 250, 1); if (ret < 0) goto lab7; z->c += ret; } { /* gopast */ /* non v, line 57 */ int ret = in_grouping(z, g_v, 97, 250, 1); if (ret < 0) goto lab7; z->c += ret; } z->I[1] = z->c; /* setmark p1, line 57 */ { /* gopast */ /* grouping v, line 58 */ int ret = out_grouping(z, g_v, 97, 250, 1); if (ret < 0) goto lab7; z->c += ret; } { /* gopast */ /* non v, line 58 */ int ret = in_grouping(z, g_v, 97, 250, 1); if (ret < 0) goto lab7; z->c += ret; } z->I[2] = z->c; /* setmark p2, line 58 */ lab7: z->c = c5; } return 1; } static int r_postlude(struct SN_env * z) { int among_var; while(1) { /* repeat, line 62 */ int c1 = z->c; z->bra = z->c; /* [, line 63 */ if (z->c + 1 >= z->l || z->p[z->c + 1] != 126) among_var = 3; else among_var = find_among(z, a_1, 3); /* substring, line 63 */ if (!(among_var)) goto lab0; z->ket = z->c; /* ], line 63 */ switch(among_var) { case 0: goto lab0; case 1: { int ret = slice_from_s(z, 1, s_2); /* <-, line 64 */ if (ret < 0) return ret; } break; case 2: { int ret = slice_from_s(z, 1, s_3); /* <-, line 65 */ if (ret < 0) return ret; } break; case 3: if (z->c >= z->l) goto lab0; z->c++; /* next, line 66 */ break; } continue; lab0: z->c = c1; break; } return 1; } static int r_RV(struct SN_env * z) { if (!(z->I[0] <= z->c)) return 0; return 1; } static int r_R1(struct SN_env * z) { if (!(z->I[1] <= z->c)) return 0; return 1; } static int r_R2(struct SN_env * z) { if (!(z->I[2] <= z->c)) return 0; return 1; } static int r_standard_suffix(struct SN_env * z) { int among_var; z->ket = z->c; /* [, line 77 */ if (z->c - 2 <= z->lb || 
z->p[z->c - 1] >> 5 != 3 || !((839714 >> (z->p[z->c - 1] & 0x1f)) & 1)) return 0; among_var = find_among_b(z, a_5, 45); /* substring, line 77 */ if (!(among_var)) return 0; z->bra = z->c; /* ], line 77 */ switch(among_var) { case 0: return 0; case 1: { int ret = r_R2(z); if (ret == 0) return 0; /* call R2, line 93 */ if (ret < 0) return ret; } { int ret = slice_del(z); /* delete, line 93 */ if (ret < 0) return ret; } break; case 2: { int ret = r_R2(z); if (ret == 0) return 0; /* call R2, line 98 */ if (ret < 0) return ret; } { int ret = slice_from_s(z, 3, s_4); /* <-, line 98 */ if (ret < 0) return ret; } break; case 3: { int ret = r_R2(z); if (ret == 0) return 0; /* call R2, line 102 */ if (ret < 0) return ret; } { int ret = slice_from_s(z, 1, s_5); /* <-, line 102 */ if (ret < 0) return ret; } break; case 4: { int ret = r_R2(z); if (ret == 0) return 0; /* call R2, line 106 */ if (ret < 0) return ret; } { int ret = slice_from_s(z, 4, s_6); /* <-, line 106 */ if (ret < 0) return ret; } break; case 5: { int ret = r_R1(z); if (ret == 0) return 0; /* call R1, line 110 */ if (ret < 0) return ret; } { int ret = slice_del(z); /* delete, line 110 */ if (ret < 0) return ret; } { int m_keep = z->l - z->c;/* (void) m_keep;*/ /* try, line 111 */ z->ket = z->c; /* [, line 112 */ if (z->c - 1 <= z->lb || z->p[z->c - 1] >> 5 != 3 || !((4718616 >> (z->p[z->c - 1] & 0x1f)) & 1)) { z->c = z->l - m_keep; goto lab0; } among_var = find_among_b(z, a_2, 4); /* substring, line 112 */ if (!(among_var)) { z->c = z->l - m_keep; goto lab0; } z->bra = z->c; /* ], line 112 */ { int ret = r_R2(z); if (ret == 0) { z->c = z->l - m_keep; goto lab0; } /* call R2, line 112 */ if (ret < 0) return ret; } { int ret = slice_del(z); /* delete, line 112 */ if (ret < 0) return ret; } switch(among_var) { case 0: { z->c = z->l - m_keep; goto lab0; } case 1: z->ket = z->c; /* [, line 113 */ if (!(eq_s_b(z, 2, s_7))) { z->c = z->l - m_keep; goto lab0; } z->bra = z->c; /* ], line 113 */ { int ret = r_R2(z); if 
(ret == 0) { z->c = z->l - m_keep; goto lab0; } /* call R2, line 113 */ if (ret < 0) return ret; } { int ret = slice_del(z); /* delete, line 113 */ if (ret < 0) return ret; } break; } lab0: ; } break; case 6: { int ret = r_R2(z); if (ret == 0) return 0; /* call R2, line 122 */ if (ret < 0) return ret; } { int ret = slice_del(z); /* delete, line 122 */ if (ret < 0) return ret; } { int m_keep = z->l - z->c;/* (void) m_keep;*/ /* try, line 123 */ z->ket = z->c; /* [, line 124 */ if (z->c - 3 <= z->lb || (z->p[z->c - 1] != 101 && z->p[z->c - 1] != 108)) { z->c = z->l - m_keep; goto lab1; } among_var = find_among_b(z, a_3, 3); /* substring, line 124 */ if (!(among_var)) { z->c = z->l - m_keep; goto lab1; } z->bra = z->c; /* ], line 124 */ switch(among_var) { case 0: { z->c = z->l - m_keep; goto lab1; } case 1: { int ret = r_R2(z); if (ret == 0) { z->c = z->l - m_keep; goto lab1; } /* call R2, line 127 */ if (ret < 0) return ret; } { int ret = slice_del(z); /* delete, line 127 */ if (ret < 0) return ret; } break; } lab1: ; } break; case 7: { int ret = r_R2(z); if (ret == 0) return 0; /* call R2, line 134 */ if (ret < 0) return ret; } { int ret = slice_del(z); /* delete, line 134 */ if (ret < 0) return ret; } { int m_keep = z->l - z->c;/* (void) m_keep;*/ /* try, line 135 */ z->ket = z->c; /* [, line 136 */ if (z->c - 1 <= z->lb || z->p[z->c - 1] >> 5 != 3 || !((4198408 >> (z->p[z->c - 1] & 0x1f)) & 1)) { z->c = z->l - m_keep; goto lab2; } among_var = find_among_b(z, a_4, 3); /* substring, line 136 */ if (!(among_var)) { z->c = z->l - m_keep; goto lab2; } z->bra = z->c; /* ], line 136 */ switch(among_var) { case 0: { z->c = z->l - m_keep; goto lab2; } case 1: { int ret = r_R2(z); if (ret == 0) { z->c = z->l - m_keep; goto lab2; } /* call R2, line 139 */ if (ret < 0) return ret; } { int ret = slice_del(z); /* delete, line 139 */ if (ret < 0) return ret; } break; } lab2: ; } break; case 8: { int ret = r_R2(z); if (ret == 0) return 0; /* call R2, line 146 */ if (ret < 0) 
return ret; } { int ret = slice_del(z); /* delete, line 146 */ if (ret < 0) return ret; } { int m_keep = z->l - z->c;/* (void) m_keep;*/ /* try, line 147 */ z->ket = z->c; /* [, line 148 */ if (!(eq_s_b(z, 2, s_8))) { z->c = z->l - m_keep; goto lab3; } z->bra = z->c; /* ], line 148 */ { int ret = r_R2(z); if (ret == 0) { z->c = z->l - m_keep; goto lab3; } /* call R2, line 148 */ if (ret < 0) return ret; } { int ret = slice_del(z); /* delete, line 148 */ if (ret < 0) return ret; } lab3: ; } break; case 9: { int ret = r_RV(z); if (ret == 0) return 0; /* call RV, line 153 */ if (ret < 0) return ret; } if (!(eq_s_b(z, 1, s_9))) return 0; { int ret = slice_from_s(z, 2, s_10); /* <-, line 154 */ if (ret < 0) return ret; } break; } return 1; } static int r_verb_suffix(struct SN_env * z) { int among_var; { int mlimit; /* setlimit, line 159 */ int m1 = z->l - z->c; (void)m1; if (z->c < z->I[0]) return 0; z->c = z->I[0]; /* tomark, line 159 */ mlimit = z->lb; z->lb = z->c; z->c = z->l - m1; z->ket = z->c; /* [, line 160 */ among_var = find_among_b(z, a_6, 120); /* substring, line 160 */ if (!(among_var)) { z->lb = mlimit; return 0; } z->bra = z->c; /* ], line 160 */ switch(among_var) { case 0: { z->lb = mlimit; return 0; } case 1: { int ret = slice_del(z); /* delete, line 179 */ if (ret < 0) return ret; } break; } z->lb = mlimit; } return 1; } static int r_residual_suffix(struct SN_env * z) { int among_var; z->ket = z->c; /* [, line 184 */ among_var = find_among_b(z, a_7, 7); /* substring, line 184 */ if (!(among_var)) return 0; z->bra = z->c; /* ], line 184 */ switch(among_var) { case 0: return 0; case 1: { int ret = r_RV(z); if (ret == 0) return 0; /* call RV, line 187 */ if (ret < 0) return ret; } { int ret = slice_del(z); /* delete, line 187 */ if (ret < 0) return ret; } break; } return 1; } static int r_residual_form(struct SN_env * z) { int among_var; z->ket = z->c; /* [, line 192 */ among_var = find_among_b(z, a_8, 4); /* substring, line 192 */ if (!(among_var)) 
return 0; z->bra = z->c; /* ], line 192 */ switch(among_var) { case 0: return 0; case 1: { int ret = r_RV(z); if (ret == 0) return 0; /* call RV, line 194 */ if (ret < 0) return ret; } { int ret = slice_del(z); /* delete, line 194 */ if (ret < 0) return ret; } z->ket = z->c; /* [, line 194 */ { int m1 = z->l - z->c; (void)m1; /* or, line 194 */ if (!(eq_s_b(z, 1, s_11))) goto lab1; z->bra = z->c; /* ], line 194 */ { int m_test = z->l - z->c; /* test, line 194 */ if (!(eq_s_b(z, 1, s_12))) goto lab1; z->c = z->l - m_test; } goto lab0; lab1: z->c = z->l - m1; if (!(eq_s_b(z, 1, s_13))) return 0; z->bra = z->c; /* ], line 195 */ { int m_test = z->l - z->c; /* test, line 195 */ if (!(eq_s_b(z, 1, s_14))) return 0; z->c = z->l - m_test; } } lab0: { int ret = r_RV(z); if (ret == 0) return 0; /* call RV, line 195 */ if (ret < 0) return ret; } { int ret = slice_del(z); /* delete, line 195 */ if (ret < 0) return ret; } break; case 2: { int ret = slice_from_s(z, 1, s_15); /* <-, line 196 */ if (ret < 0) return ret; } break; } return 1; } extern int portuguese_ISO_8859_1_stem(struct SN_env * z) { { int c1 = z->c; /* do, line 202 */ { int ret = r_prelude(z); if (ret == 0) goto lab0; /* call prelude, line 202 */ if (ret < 0) return ret; } lab0: z->c = c1; } { int c2 = z->c; /* do, line 203 */ { int ret = r_mark_regions(z); if (ret == 0) goto lab1; /* call mark_regions, line 203 */ if (ret < 0) return ret; } lab1: z->c = c2; } z->lb = z->c; z->c = z->l; /* backwards, line 204 */ { int m3 = z->l - z->c; (void)m3; /* do, line 205 */ { int m4 = z->l - z->c; (void)m4; /* or, line 209 */ { int m5 = z->l - z->c; (void)m5; /* and, line 207 */ { int m6 = z->l - z->c; (void)m6; /* or, line 206 */ { int ret = r_standard_suffix(z); if (ret == 0) goto lab6; /* call standard_suffix, line 206 */ if (ret < 0) return ret; } goto lab5; lab6: z->c = z->l - m6; { int ret = r_verb_suffix(z); if (ret == 0) goto lab4; /* call verb_suffix, line 206 */ if (ret < 0) return ret; } } lab5: z->c = z->l - 
m5; { int m7 = z->l - z->c; (void)m7; /* do, line 207 */ z->ket = z->c; /* [, line 207 */ if (!(eq_s_b(z, 1, s_16))) goto lab7; z->bra = z->c; /* ], line 207 */ { int m_test = z->l - z->c; /* test, line 207 */ if (!(eq_s_b(z, 1, s_17))) goto lab7; z->c = z->l - m_test; } { int ret = r_RV(z); if (ret == 0) goto lab7; /* call RV, line 207 */ if (ret < 0) return ret; } { int ret = slice_del(z); /* delete, line 207 */ if (ret < 0) return ret; } lab7: z->c = z->l - m7; } } goto lab3; lab4: z->c = z->l - m4; { int ret = r_residual_suffix(z); if (ret == 0) goto lab2; /* call residual_suffix, line 209 */ if (ret < 0) return ret; } } lab3: lab2: z->c = z->l - m3; } { int m8 = z->l - z->c; (void)m8; /* do, line 211 */ { int ret = r_residual_form(z); if (ret == 0) goto lab8; /* call residual_form, line 211 */ if (ret < 0) return ret; } lab8: z->c = z->l - m8; } z->c = z->lb; { int c9 = z->c; /* do, line 213 */ { int ret = r_postlude(z); if (ret == 0) goto lab9; /* call postlude, line 213 */ if (ret < 0) return ret; } lab9: z->c = c9; } return 1; } extern struct SN_env * portuguese_ISO_8859_1_create_env(void) { return SN_create_env(0, 3, 0); } extern void portuguese_ISO_8859_1_close_env(struct SN_env * z) { SN_close_env(z, 0); } swish-e-2.4.7/src/snowball/Makefile.am0000664000077100017500000000102411166010110014454 00000000000000AM_CPPFLAGS = -I"$(srcdir)" noinst_LTLIBRARIES = libsnowball.la libsnowball_la_SOURCES = \ api.c \ api.h \ utilities.c \ header.h \ stem_en1.c \ stem_en1.h \ stem_en2.c \ stem_en2.h \ stem_es.c \ stem_es.h \ stem_fr.c \ stem_fr.h \ stem_it.c \ stem_it.h \ stem_pt.c \ stem_pt.h \ stem_de.c \ stem_de.h \ stem_nl.c \ stem_nl.h \ stem_no.c \ stem_no.h \ stem_se.c \ stem_se.h \ stem_dk.c \ stem_dk.h \ stem_ru.c \ stem_ru.h \ stem_fi.c \ stem_fi.h \ stem_ro.c \ stem_ro.h \ stem_hu.c \ stem_hu.h swish-e-2.4.7/src/snowball/stem_es.c0000664000077100017500000012013311166010107014234 00000000000000 /* This file was generated automatically by the Snowball to ANSI 
C compiler */ #include "header.h" #ifdef __cplusplus extern "C" { #endif extern int spanish_ISO_8859_1_stem(struct SN_env * z); #ifdef __cplusplus } #endif static int r_residual_suffix(struct SN_env * z); static int r_verb_suffix(struct SN_env * z); static int r_y_verb_suffix(struct SN_env * z); static int r_standard_suffix(struct SN_env * z); static int r_attached_pronoun(struct SN_env * z); static int r_R2(struct SN_env * z); static int r_R1(struct SN_env * z); static int r_RV(struct SN_env * z); static int r_mark_regions(struct SN_env * z); static int r_postlude(struct SN_env * z); #ifdef __cplusplus extern "C" { #endif extern struct SN_env * spanish_ISO_8859_1_create_env(void); extern void spanish_ISO_8859_1_close_env(struct SN_env * z); #ifdef __cplusplus } #endif static const symbol s_0_1[1] = { 0xE1 }; static const symbol s_0_2[1] = { 0xE9 }; static const symbol s_0_3[1] = { 0xED }; static const symbol s_0_4[1] = { 0xF3 }; static const symbol s_0_5[1] = { 0xFA }; static const struct among a_0[6] = { /* 0 */ { 0, 0, -1, 6, 0}, /* 1 */ { 1, s_0_1, 0, 1, 0}, /* 2 */ { 1, s_0_2, 0, 2, 0}, /* 3 */ { 1, s_0_3, 0, 3, 0}, /* 4 */ { 1, s_0_4, 0, 4, 0}, /* 5 */ { 1, s_0_5, 0, 5, 0} }; static const symbol s_1_0[2] = { 'l', 'a' }; static const symbol s_1_1[4] = { 's', 'e', 'l', 'a' }; static const symbol s_1_2[2] = { 'l', 'e' }; static const symbol s_1_3[2] = { 'm', 'e' }; static const symbol s_1_4[2] = { 's', 'e' }; static const symbol s_1_5[2] = { 'l', 'o' }; static const symbol s_1_6[4] = { 's', 'e', 'l', 'o' }; static const symbol s_1_7[3] = { 'l', 'a', 's' }; static const symbol s_1_8[5] = { 's', 'e', 'l', 'a', 's' }; static const symbol s_1_9[3] = { 'l', 'e', 's' }; static const symbol s_1_10[3] = { 'l', 'o', 's' }; static const symbol s_1_11[5] = { 's', 'e', 'l', 'o', 's' }; static const symbol s_1_12[3] = { 'n', 'o', 's' }; static const struct among a_1[13] = { /* 0 */ { 2, s_1_0, -1, -1, 0}, /* 1 */ { 4, s_1_1, 0, -1, 0}, /* 2 */ { 2, s_1_2, -1, -1, 0}, /* 3 */ 
{ 2, s_1_3, -1, -1, 0}, /* 4 */ { 2, s_1_4, -1, -1, 0}, /* 5 */ { 2, s_1_5, -1, -1, 0}, /* 6 */ { 4, s_1_6, 5, -1, 0}, /* 7 */ { 3, s_1_7, -1, -1, 0}, /* 8 */ { 5, s_1_8, 7, -1, 0}, /* 9 */ { 3, s_1_9, -1, -1, 0}, /* 10 */ { 3, s_1_10, -1, -1, 0}, /* 11 */ { 5, s_1_11, 10, -1, 0}, /* 12 */ { 3, s_1_12, -1, -1, 0} }; static const symbol s_2_0[4] = { 'a', 'n', 'd', 'o' }; static const symbol s_2_1[5] = { 'i', 'e', 'n', 'd', 'o' }; static const symbol s_2_2[5] = { 'y', 'e', 'n', 'd', 'o' }; static const symbol s_2_3[4] = { 0xE1, 'n', 'd', 'o' }; static const symbol s_2_4[5] = { 'i', 0xE9, 'n', 'd', 'o' }; static const symbol s_2_5[2] = { 'a', 'r' }; static const symbol s_2_6[2] = { 'e', 'r' }; static const symbol s_2_7[2] = { 'i', 'r' }; static const symbol s_2_8[2] = { 0xE1, 'r' }; static const symbol s_2_9[2] = { 0xE9, 'r' }; static const symbol s_2_10[2] = { 0xED, 'r' }; static const struct among a_2[11] = { /* 0 */ { 4, s_2_0, -1, 6, 0}, /* 1 */ { 5, s_2_1, -1, 6, 0}, /* 2 */ { 5, s_2_2, -1, 7, 0}, /* 3 */ { 4, s_2_3, -1, 2, 0}, /* 4 */ { 5, s_2_4, -1, 1, 0}, /* 5 */ { 2, s_2_5, -1, 6, 0}, /* 6 */ { 2, s_2_6, -1, 6, 0}, /* 7 */ { 2, s_2_7, -1, 6, 0}, /* 8 */ { 2, s_2_8, -1, 3, 0}, /* 9 */ { 2, s_2_9, -1, 4, 0}, /* 10 */ { 2, s_2_10, -1, 5, 0} }; static const symbol s_3_0[2] = { 'i', 'c' }; static const symbol s_3_1[2] = { 'a', 'd' }; static const symbol s_3_2[2] = { 'o', 's' }; static const symbol s_3_3[2] = { 'i', 'v' }; static const struct among a_3[4] = { /* 0 */ { 2, s_3_0, -1, -1, 0}, /* 1 */ { 2, s_3_1, -1, -1, 0}, /* 2 */ { 2, s_3_2, -1, -1, 0}, /* 3 */ { 2, s_3_3, -1, 1, 0} }; static const symbol s_4_0[4] = { 'a', 'b', 'l', 'e' }; static const symbol s_4_1[4] = { 'i', 'b', 'l', 'e' }; static const symbol s_4_2[4] = { 'a', 'n', 't', 'e' }; static const struct among a_4[3] = { /* 0 */ { 4, s_4_0, -1, 1, 0}, /* 1 */ { 4, s_4_1, -1, 1, 0}, /* 2 */ { 4, s_4_2, -1, 1, 0} }; static const symbol s_5_0[2] = { 'i', 'c' }; static const symbol s_5_1[4] = { 'a', 'b', 
'i', 'l' }; static const symbol s_5_2[2] = { 'i', 'v' }; static const struct among a_5[3] = { /* 0 */ { 2, s_5_0, -1, 1, 0}, /* 1 */ { 4, s_5_1, -1, 1, 0}, /* 2 */ { 2, s_5_2, -1, 1, 0} }; static const symbol s_6_0[3] = { 'i', 'c', 'a' }; static const symbol s_6_1[5] = { 'a', 'n', 'c', 'i', 'a' }; static const symbol s_6_2[5] = { 'e', 'n', 'c', 'i', 'a' }; static const symbol s_6_3[5] = { 'a', 'd', 'o', 'r', 'a' }; static const symbol s_6_4[3] = { 'o', 's', 'a' }; static const symbol s_6_5[4] = { 'i', 's', 't', 'a' }; static const symbol s_6_6[3] = { 'i', 'v', 'a' }; static const symbol s_6_7[4] = { 'a', 'n', 'z', 'a' }; static const symbol s_6_8[5] = { 'l', 'o', 'g', 0xED, 'a' }; static const symbol s_6_9[4] = { 'i', 'd', 'a', 'd' }; static const symbol s_6_10[4] = { 'a', 'b', 'l', 'e' }; static const symbol s_6_11[4] = { 'i', 'b', 'l', 'e' }; static const symbol s_6_12[4] = { 'a', 'n', 't', 'e' }; static const symbol s_6_13[5] = { 'm', 'e', 'n', 't', 'e' }; static const symbol s_6_14[6] = { 'a', 'm', 'e', 'n', 't', 'e' }; static const symbol s_6_15[5] = { 'a', 'c', 'i', 0xF3, 'n' }; static const symbol s_6_16[5] = { 'u', 'c', 'i', 0xF3, 'n' }; static const symbol s_6_17[3] = { 'i', 'c', 'o' }; static const symbol s_6_18[4] = { 'i', 's', 'm', 'o' }; static const symbol s_6_19[3] = { 'o', 's', 'o' }; static const symbol s_6_20[7] = { 'a', 'm', 'i', 'e', 'n', 't', 'o' }; static const symbol s_6_21[7] = { 'i', 'm', 'i', 'e', 'n', 't', 'o' }; static const symbol s_6_22[3] = { 'i', 'v', 'o' }; static const symbol s_6_23[4] = { 'a', 'd', 'o', 'r' }; static const symbol s_6_24[4] = { 'i', 'c', 'a', 's' }; static const symbol s_6_25[6] = { 'a', 'n', 'c', 'i', 'a', 's' }; static const symbol s_6_26[6] = { 'e', 'n', 'c', 'i', 'a', 's' }; static const symbol s_6_27[6] = { 'a', 'd', 'o', 'r', 'a', 's' }; static const symbol s_6_28[4] = { 'o', 's', 'a', 's' }; static const symbol s_6_29[5] = { 'i', 's', 't', 'a', 's' }; static const symbol s_6_30[4] = { 'i', 'v', 'a', 's' }; 
static const symbol s_6_31[5] = { 'a', 'n', 'z', 'a', 's' }; static const symbol s_6_32[6] = { 'l', 'o', 'g', 0xED, 'a', 's' }; static const symbol s_6_33[6] = { 'i', 'd', 'a', 'd', 'e', 's' }; static const symbol s_6_34[5] = { 'a', 'b', 'l', 'e', 's' }; static const symbol s_6_35[5] = { 'i', 'b', 'l', 'e', 's' }; static const symbol s_6_36[7] = { 'a', 'c', 'i', 'o', 'n', 'e', 's' }; static const symbol s_6_37[7] = { 'u', 'c', 'i', 'o', 'n', 'e', 's' }; static const symbol s_6_38[6] = { 'a', 'd', 'o', 'r', 'e', 's' }; static const symbol s_6_39[5] = { 'a', 'n', 't', 'e', 's' }; static const symbol s_6_40[4] = { 'i', 'c', 'o', 's' }; static const symbol s_6_41[5] = { 'i', 's', 'm', 'o', 's' }; static const symbol s_6_42[4] = { 'o', 's', 'o', 's' }; static const symbol s_6_43[8] = { 'a', 'm', 'i', 'e', 'n', 't', 'o', 's' }; static const symbol s_6_44[8] = { 'i', 'm', 'i', 'e', 'n', 't', 'o', 's' }; static const symbol s_6_45[4] = { 'i', 'v', 'o', 's' }; static const struct among a_6[46] = { /* 0 */ { 3, s_6_0, -1, 1, 0}, /* 1 */ { 5, s_6_1, -1, 2, 0}, /* 2 */ { 5, s_6_2, -1, 5, 0}, /* 3 */ { 5, s_6_3, -1, 2, 0}, /* 4 */ { 3, s_6_4, -1, 1, 0}, /* 5 */ { 4, s_6_5, -1, 1, 0}, /* 6 */ { 3, s_6_6, -1, 9, 0}, /* 7 */ { 4, s_6_7, -1, 1, 0}, /* 8 */ { 5, s_6_8, -1, 3, 0}, /* 9 */ { 4, s_6_9, -1, 8, 0}, /* 10 */ { 4, s_6_10, -1, 1, 0}, /* 11 */ { 4, s_6_11, -1, 1, 0}, /* 12 */ { 4, s_6_12, -1, 2, 0}, /* 13 */ { 5, s_6_13, -1, 7, 0}, /* 14 */ { 6, s_6_14, 13, 6, 0}, /* 15 */ { 5, s_6_15, -1, 2, 0}, /* 16 */ { 5, s_6_16, -1, 4, 0}, /* 17 */ { 3, s_6_17, -1, 1, 0}, /* 18 */ { 4, s_6_18, -1, 1, 0}, /* 19 */ { 3, s_6_19, -1, 1, 0}, /* 20 */ { 7, s_6_20, -1, 1, 0}, /* 21 */ { 7, s_6_21, -1, 1, 0}, /* 22 */ { 3, s_6_22, -1, 9, 0}, /* 23 */ { 4, s_6_23, -1, 2, 0}, /* 24 */ { 4, s_6_24, -1, 1, 0}, /* 25 */ { 6, s_6_25, -1, 2, 0}, /* 26 */ { 6, s_6_26, -1, 5, 0}, /* 27 */ { 6, s_6_27, -1, 2, 0}, /* 28 */ { 4, s_6_28, -1, 1, 0}, /* 29 */ { 5, s_6_29, -1, 1, 0}, /* 30 */ { 4, s_6_30, -1, 
9, 0}, /* 31 */ { 5, s_6_31, -1, 1, 0}, /* 32 */ { 6, s_6_32, -1, 3, 0}, /* 33 */ { 6, s_6_33, -1, 8, 0}, /* 34 */ { 5, s_6_34, -1, 1, 0}, /* 35 */ { 5, s_6_35, -1, 1, 0}, /* 36 */ { 7, s_6_36, -1, 2, 0}, /* 37 */ { 7, s_6_37, -1, 4, 0}, /* 38 */ { 6, s_6_38, -1, 2, 0}, /* 39 */ { 5, s_6_39, -1, 2, 0}, /* 40 */ { 4, s_6_40, -1, 1, 0}, /* 41 */ { 5, s_6_41, -1, 1, 0}, /* 42 */ { 4, s_6_42, -1, 1, 0}, /* 43 */ { 8, s_6_43, -1, 1, 0}, /* 44 */ { 8, s_6_44, -1, 1, 0}, /* 45 */ { 4, s_6_45, -1, 9, 0} }; static const symbol s_7_0[2] = { 'y', 'a' }; static const symbol s_7_1[2] = { 'y', 'e' }; static const symbol s_7_2[3] = { 'y', 'a', 'n' }; static const symbol s_7_3[3] = { 'y', 'e', 'n' }; static const symbol s_7_4[5] = { 'y', 'e', 'r', 'o', 'n' }; static const symbol s_7_5[5] = { 'y', 'e', 'n', 'd', 'o' }; static const symbol s_7_6[2] = { 'y', 'o' }; static const symbol s_7_7[3] = { 'y', 'a', 's' }; static const symbol s_7_8[3] = { 'y', 'e', 's' }; static const symbol s_7_9[4] = { 'y', 'a', 'i', 's' }; static const symbol s_7_10[5] = { 'y', 'a', 'm', 'o', 's' }; static const symbol s_7_11[2] = { 'y', 0xF3 }; static const struct among a_7[12] = { /* 0 */ { 2, s_7_0, -1, 1, 0}, /* 1 */ { 2, s_7_1, -1, 1, 0}, /* 2 */ { 3, s_7_2, -1, 1, 0}, /* 3 */ { 3, s_7_3, -1, 1, 0}, /* 4 */ { 5, s_7_4, -1, 1, 0}, /* 5 */ { 5, s_7_5, -1, 1, 0}, /* 6 */ { 2, s_7_6, -1, 1, 0}, /* 7 */ { 3, s_7_7, -1, 1, 0}, /* 8 */ { 3, s_7_8, -1, 1, 0}, /* 9 */ { 4, s_7_9, -1, 1, 0}, /* 10 */ { 5, s_7_10, -1, 1, 0}, /* 11 */ { 2, s_7_11, -1, 1, 0} }; static const symbol s_8_0[3] = { 'a', 'b', 'a' }; static const symbol s_8_1[3] = { 'a', 'd', 'a' }; static const symbol s_8_2[3] = { 'i', 'd', 'a' }; static const symbol s_8_3[3] = { 'a', 'r', 'a' }; static const symbol s_8_4[4] = { 'i', 'e', 'r', 'a' }; static const symbol s_8_5[2] = { 0xED, 'a' }; static const symbol s_8_6[4] = { 'a', 'r', 0xED, 'a' }; static const symbol s_8_7[4] = { 'e', 'r', 0xED, 'a' }; static const symbol s_8_8[4] = { 'i', 'r', 0xED, 
'a' }; static const symbol s_8_9[2] = { 'a', 'd' }; static const symbol s_8_10[2] = { 'e', 'd' }; static const symbol s_8_11[2] = { 'i', 'd' }; static const symbol s_8_12[3] = { 'a', 's', 'e' }; static const symbol s_8_13[4] = { 'i', 'e', 's', 'e' }; static const symbol s_8_14[4] = { 'a', 's', 't', 'e' }; static const symbol s_8_15[4] = { 'i', 's', 't', 'e' }; static const symbol s_8_16[2] = { 'a', 'n' }; static const symbol s_8_17[4] = { 'a', 'b', 'a', 'n' }; static const symbol s_8_18[4] = { 'a', 'r', 'a', 'n' }; static const symbol s_8_19[5] = { 'i', 'e', 'r', 'a', 'n' }; static const symbol s_8_20[3] = { 0xED, 'a', 'n' }; static const symbol s_8_21[5] = { 'a', 'r', 0xED, 'a', 'n' }; static const symbol s_8_22[5] = { 'e', 'r', 0xED, 'a', 'n' }; static const symbol s_8_23[5] = { 'i', 'r', 0xED, 'a', 'n' }; static const symbol s_8_24[2] = { 'e', 'n' }; static const symbol s_8_25[4] = { 'a', 's', 'e', 'n' }; static const symbol s_8_26[5] = { 'i', 'e', 's', 'e', 'n' }; static const symbol s_8_27[4] = { 'a', 'r', 'o', 'n' }; static const symbol s_8_28[5] = { 'i', 'e', 'r', 'o', 'n' }; static const symbol s_8_29[4] = { 'a', 'r', 0xE1, 'n' }; static const symbol s_8_30[4] = { 'e', 'r', 0xE1, 'n' }; static const symbol s_8_31[4] = { 'i', 'r', 0xE1, 'n' }; static const symbol s_8_32[3] = { 'a', 'd', 'o' }; static const symbol s_8_33[3] = { 'i', 'd', 'o' }; static const symbol s_8_34[4] = { 'a', 'n', 'd', 'o' }; static const symbol s_8_35[5] = { 'i', 'e', 'n', 'd', 'o' }; static const symbol s_8_36[2] = { 'a', 'r' }; static const symbol s_8_37[2] = { 'e', 'r' }; static const symbol s_8_38[2] = { 'i', 'r' }; static const symbol s_8_39[2] = { 'a', 's' }; static const symbol s_8_40[4] = { 'a', 'b', 'a', 's' }; static const symbol s_8_41[4] = { 'a', 'd', 'a', 's' }; static const symbol s_8_42[4] = { 'i', 'd', 'a', 's' }; static const symbol s_8_43[4] = { 'a', 'r', 'a', 's' }; static const symbol s_8_44[5] = { 'i', 'e', 'r', 'a', 's' }; static const symbol s_8_45[3] = { 0xED, 
'a', 's' }; static const symbol s_8_46[5] = { 'a', 'r', 0xED, 'a', 's' }; static const symbol s_8_47[5] = { 'e', 'r', 0xED, 'a', 's' }; static const symbol s_8_48[5] = { 'i', 'r', 0xED, 'a', 's' }; static const symbol s_8_49[2] = { 'e', 's' }; static const symbol s_8_50[4] = { 'a', 's', 'e', 's' }; static const symbol s_8_51[5] = { 'i', 'e', 's', 'e', 's' }; static const symbol s_8_52[5] = { 'a', 'b', 'a', 'i', 's' }; static const symbol s_8_53[5] = { 'a', 'r', 'a', 'i', 's' }; static const symbol s_8_54[6] = { 'i', 'e', 'r', 'a', 'i', 's' }; static const symbol s_8_55[4] = { 0xED, 'a', 'i', 's' }; static const symbol s_8_56[6] = { 'a', 'r', 0xED, 'a', 'i', 's' }; static const symbol s_8_57[6] = { 'e', 'r', 0xED, 'a', 'i', 's' }; static const symbol s_8_58[6] = { 'i', 'r', 0xED, 'a', 'i', 's' }; static const symbol s_8_59[5] = { 'a', 's', 'e', 'i', 's' }; static const symbol s_8_60[6] = { 'i', 'e', 's', 'e', 'i', 's' }; static const symbol s_8_61[6] = { 'a', 's', 't', 'e', 'i', 's' }; static const symbol s_8_62[6] = { 'i', 's', 't', 'e', 'i', 's' }; static const symbol s_8_63[3] = { 0xE1, 'i', 's' }; static const symbol s_8_64[3] = { 0xE9, 'i', 's' }; static const symbol s_8_65[5] = { 'a', 'r', 0xE9, 'i', 's' }; static const symbol s_8_66[5] = { 'e', 'r', 0xE9, 'i', 's' }; static const symbol s_8_67[5] = { 'i', 'r', 0xE9, 'i', 's' }; static const symbol s_8_68[4] = { 'a', 'd', 'o', 's' }; static const symbol s_8_69[4] = { 'i', 'd', 'o', 's' }; static const symbol s_8_70[4] = { 'a', 'm', 'o', 's' }; static const symbol s_8_71[6] = { 0xE1, 'b', 'a', 'm', 'o', 's' }; static const symbol s_8_72[6] = { 0xE1, 'r', 'a', 'm', 'o', 's' }; static const symbol s_8_73[7] = { 'i', 0xE9, 'r', 'a', 'm', 'o', 's' }; static const symbol s_8_74[5] = { 0xED, 'a', 'm', 'o', 's' }; static const symbol s_8_75[7] = { 'a', 'r', 0xED, 'a', 'm', 'o', 's' }; static const symbol s_8_76[7] = { 'e', 'r', 0xED, 'a', 'm', 'o', 's' }; static const symbol s_8_77[7] = { 'i', 'r', 0xED, 'a', 'm', 
'o', 's' }; static const symbol s_8_78[4] = { 'e', 'm', 'o', 's' }; static const symbol s_8_79[6] = { 'a', 'r', 'e', 'm', 'o', 's' }; static const symbol s_8_80[6] = { 'e', 'r', 'e', 'm', 'o', 's' }; static const symbol s_8_81[6] = { 'i', 'r', 'e', 'm', 'o', 's' }; static const symbol s_8_82[6] = { 0xE1, 's', 'e', 'm', 'o', 's' }; static const symbol s_8_83[7] = { 'i', 0xE9, 's', 'e', 'm', 'o', 's' }; static const symbol s_8_84[4] = { 'i', 'm', 'o', 's' }; static const symbol s_8_85[4] = { 'a', 'r', 0xE1, 's' }; static const symbol s_8_86[4] = { 'e', 'r', 0xE1, 's' }; static const symbol s_8_87[4] = { 'i', 'r', 0xE1, 's' }; static const symbol s_8_88[2] = { 0xED, 's' }; static const symbol s_8_89[3] = { 'a', 'r', 0xE1 }; static const symbol s_8_90[3] = { 'e', 'r', 0xE1 }; static const symbol s_8_91[3] = { 'i', 'r', 0xE1 }; static const symbol s_8_92[3] = { 'a', 'r', 0xE9 }; static const symbol s_8_93[3] = { 'e', 'r', 0xE9 }; static const symbol s_8_94[3] = { 'i', 'r', 0xE9 }; static const symbol s_8_95[2] = { 'i', 0xF3 }; static const struct among a_8[96] = { /* 0 */ { 3, s_8_0, -1, 2, 0}, /* 1 */ { 3, s_8_1, -1, 2, 0}, /* 2 */ { 3, s_8_2, -1, 2, 0}, /* 3 */ { 3, s_8_3, -1, 2, 0}, /* 4 */ { 4, s_8_4, -1, 2, 0}, /* 5 */ { 2, s_8_5, -1, 2, 0}, /* 6 */ { 4, s_8_6, 5, 2, 0}, /* 7 */ { 4, s_8_7, 5, 2, 0}, /* 8 */ { 4, s_8_8, 5, 2, 0}, /* 9 */ { 2, s_8_9, -1, 2, 0}, /* 10 */ { 2, s_8_10, -1, 2, 0}, /* 11 */ { 2, s_8_11, -1, 2, 0}, /* 12 */ { 3, s_8_12, -1, 2, 0}, /* 13 */ { 4, s_8_13, -1, 2, 0}, /* 14 */ { 4, s_8_14, -1, 2, 0}, /* 15 */ { 4, s_8_15, -1, 2, 0}, /* 16 */ { 2, s_8_16, -1, 2, 0}, /* 17 */ { 4, s_8_17, 16, 2, 0}, /* 18 */ { 4, s_8_18, 16, 2, 0}, /* 19 */ { 5, s_8_19, 16, 2, 0}, /* 20 */ { 3, s_8_20, 16, 2, 0}, /* 21 */ { 5, s_8_21, 20, 2, 0}, /* 22 */ { 5, s_8_22, 20, 2, 0}, /* 23 */ { 5, s_8_23, 20, 2, 0}, /* 24 */ { 2, s_8_24, -1, 1, 0}, /* 25 */ { 4, s_8_25, 24, 2, 0}, /* 26 */ { 5, s_8_26, 24, 2, 0}, /* 27 */ { 4, s_8_27, -1, 2, 0}, /* 28 */ { 5, s_8_28, 
-1, 2, 0}, /* 29 */ { 4, s_8_29, -1, 2, 0}, /* 30 */ { 4, s_8_30, -1, 2, 0}, /* 31 */ { 4, s_8_31, -1, 2, 0}, /* 32 */ { 3, s_8_32, -1, 2, 0}, /* 33 */ { 3, s_8_33, -1, 2, 0}, /* 34 */ { 4, s_8_34, -1, 2, 0}, /* 35 */ { 5, s_8_35, -1, 2, 0}, /* 36 */ { 2, s_8_36, -1, 2, 0}, /* 37 */ { 2, s_8_37, -1, 2, 0}, /* 38 */ { 2, s_8_38, -1, 2, 0}, /* 39 */ { 2, s_8_39, -1, 2, 0}, /* 40 */ { 4, s_8_40, 39, 2, 0}, /* 41 */ { 4, s_8_41, 39, 2, 0}, /* 42 */ { 4, s_8_42, 39, 2, 0}, /* 43 */ { 4, s_8_43, 39, 2, 0}, /* 44 */ { 5, s_8_44, 39, 2, 0}, /* 45 */ { 3, s_8_45, 39, 2, 0}, /* 46 */ { 5, s_8_46, 45, 2, 0}, /* 47 */ { 5, s_8_47, 45, 2, 0}, /* 48 */ { 5, s_8_48, 45, 2, 0}, /* 49 */ { 2, s_8_49, -1, 1, 0}, /* 50 */ { 4, s_8_50, 49, 2, 0}, /* 51 */ { 5, s_8_51, 49, 2, 0}, /* 52 */ { 5, s_8_52, -1, 2, 0}, /* 53 */ { 5, s_8_53, -1, 2, 0}, /* 54 */ { 6, s_8_54, -1, 2, 0}, /* 55 */ { 4, s_8_55, -1, 2, 0}, /* 56 */ { 6, s_8_56, 55, 2, 0}, /* 57 */ { 6, s_8_57, 55, 2, 0}, /* 58 */ { 6, s_8_58, 55, 2, 0}, /* 59 */ { 5, s_8_59, -1, 2, 0}, /* 60 */ { 6, s_8_60, -1, 2, 0}, /* 61 */ { 6, s_8_61, -1, 2, 0}, /* 62 */ { 6, s_8_62, -1, 2, 0}, /* 63 */ { 3, s_8_63, -1, 2, 0}, /* 64 */ { 3, s_8_64, -1, 1, 0}, /* 65 */ { 5, s_8_65, 64, 2, 0}, /* 66 */ { 5, s_8_66, 64, 2, 0}, /* 67 */ { 5, s_8_67, 64, 2, 0}, /* 68 */ { 4, s_8_68, -1, 2, 0}, /* 69 */ { 4, s_8_69, -1, 2, 0}, /* 70 */ { 4, s_8_70, -1, 2, 0}, /* 71 */ { 6, s_8_71, 70, 2, 0}, /* 72 */ { 6, s_8_72, 70, 2, 0}, /* 73 */ { 7, s_8_73, 70, 2, 0}, /* 74 */ { 5, s_8_74, 70, 2, 0}, /* 75 */ { 7, s_8_75, 74, 2, 0}, /* 76 */ { 7, s_8_76, 74, 2, 0}, /* 77 */ { 7, s_8_77, 74, 2, 0}, /* 78 */ { 4, s_8_78, -1, 1, 0}, /* 79 */ { 6, s_8_79, 78, 2, 0}, /* 80 */ { 6, s_8_80, 78, 2, 0}, /* 81 */ { 6, s_8_81, 78, 2, 0}, /* 82 */ { 6, s_8_82, 78, 2, 0}, /* 83 */ { 7, s_8_83, 78, 2, 0}, /* 84 */ { 4, s_8_84, -1, 2, 0}, /* 85 */ { 4, s_8_85, -1, 2, 0}, /* 86 */ { 4, s_8_86, -1, 2, 0}, /* 87 */ { 4, s_8_87, -1, 2, 0}, /* 88 */ { 2, s_8_88, -1, 2, 0}, /* 89 */ 
{ 3, s_8_89, -1, 2, 0}, /* 90 */ { 3, s_8_90, -1, 2, 0}, /* 91 */ { 3, s_8_91, -1, 2, 0}, /* 92 */ { 3, s_8_92, -1, 2, 0}, /* 93 */ { 3, s_8_93, -1, 2, 0}, /* 94 */ { 3, s_8_94, -1, 2, 0}, /* 95 */ { 2, s_8_95, -1, 2, 0} }; static const symbol s_9_0[1] = { 'a' }; static const symbol s_9_1[1] = { 'e' }; static const symbol s_9_2[1] = { 'o' }; static const symbol s_9_3[2] = { 'o', 's' }; static const symbol s_9_4[1] = { 0xE1 }; static const symbol s_9_5[1] = { 0xE9 }; static const symbol s_9_6[1] = { 0xED }; static const symbol s_9_7[1] = { 0xF3 }; static const struct among a_9[8] = { /* 0 */ { 1, s_9_0, -1, 1, 0}, /* 1 */ { 1, s_9_1, -1, 2, 0}, /* 2 */ { 1, s_9_2, -1, 1, 0}, /* 3 */ { 2, s_9_3, -1, 1, 0}, /* 4 */ { 1, s_9_4, -1, 1, 0}, /* 5 */ { 1, s_9_5, -1, 2, 0}, /* 6 */ { 1, s_9_6, -1, 1, 0}, /* 7 */ { 1, s_9_7, -1, 1, 0} }; static const unsigned char g_v[] = { 17, 65, 16, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 17, 4, 10 }; static const symbol s_0[] = { 'a' }; static const symbol s_1[] = { 'e' }; static const symbol s_2[] = { 'i' }; static const symbol s_3[] = { 'o' }; static const symbol s_4[] = { 'u' }; static const symbol s_5[] = { 'i', 'e', 'n', 'd', 'o' }; static const symbol s_6[] = { 'a', 'n', 'd', 'o' }; static const symbol s_7[] = { 'a', 'r' }; static const symbol s_8[] = { 'e', 'r' }; static const symbol s_9[] = { 'i', 'r' }; static const symbol s_10[] = { 'u' }; static const symbol s_11[] = { 'i', 'c' }; static const symbol s_12[] = { 'l', 'o', 'g' }; static const symbol s_13[] = { 'u' }; static const symbol s_14[] = { 'e', 'n', 't', 'e' }; static const symbol s_15[] = { 'a', 't' }; static const symbol s_16[] = { 'a', 't' }; static const symbol s_17[] = { 'u' }; static const symbol s_18[] = { 'u' }; static const symbol s_19[] = { 'g' }; static const symbol s_20[] = { 'u' }; static const symbol s_21[] = { 'g' }; static int r_mark_regions(struct SN_env * z) { z->I[0] = z->l; z->I[1] = z->l; z->I[2] = z->l; { int c1 = z->c; /* do, line 37 */ { int c2 
= z->c; /* or, line 39 */ if (in_grouping(z, g_v, 97, 252, 0)) goto lab2; { int c3 = z->c; /* or, line 38 */ if (out_grouping(z, g_v, 97, 252, 0)) goto lab4; { /* gopast */ /* grouping v, line 38 */ int ret = out_grouping(z, g_v, 97, 252, 1); if (ret < 0) goto lab4; z->c += ret; } goto lab3; lab4: z->c = c3; if (in_grouping(z, g_v, 97, 252, 0)) goto lab2; { /* gopast */ /* non v, line 38 */ int ret = in_grouping(z, g_v, 97, 252, 1); if (ret < 0) goto lab2; z->c += ret; } } lab3: goto lab1; lab2: z->c = c2; if (out_grouping(z, g_v, 97, 252, 0)) goto lab0; { int c4 = z->c; /* or, line 40 */ if (out_grouping(z, g_v, 97, 252, 0)) goto lab6; { /* gopast */ /* grouping v, line 40 */ int ret = out_grouping(z, g_v, 97, 252, 1); if (ret < 0) goto lab6; z->c += ret; } goto lab5; lab6: z->c = c4; if (in_grouping(z, g_v, 97, 252, 0)) goto lab0; if (z->c >= z->l) goto lab0; z->c++; /* next, line 40 */ } lab5: ; } lab1: z->I[0] = z->c; /* setmark pV, line 41 */ lab0: z->c = c1; } { int c5 = z->c; /* do, line 43 */ { /* gopast */ /* grouping v, line 44 */ int ret = out_grouping(z, g_v, 97, 252, 1); if (ret < 0) goto lab7; z->c += ret; } { /* gopast */ /* non v, line 44 */ int ret = in_grouping(z, g_v, 97, 252, 1); if (ret < 0) goto lab7; z->c += ret; } z->I[1] = z->c; /* setmark p1, line 44 */ { /* gopast */ /* grouping v, line 45 */ int ret = out_grouping(z, g_v, 97, 252, 1); if (ret < 0) goto lab7; z->c += ret; } { /* gopast */ /* non v, line 45 */ int ret = in_grouping(z, g_v, 97, 252, 1); if (ret < 0) goto lab7; z->c += ret; } z->I[2] = z->c; /* setmark p2, line 45 */ lab7: z->c = c5; } return 1; } static int r_postlude(struct SN_env * z) { int among_var; while(1) { /* repeat, line 49 */ int c1 = z->c; z->bra = z->c; /* [, line 50 */ if (z->c >= z->l || z->p[z->c + 0] >> 5 != 7 || !((67641858 >> (z->p[z->c + 0] & 0x1f)) & 1)) among_var = 6; else among_var = find_among(z, a_0, 6); /* substring, line 50 */ if (!(among_var)) goto lab0; z->ket = z->c; /* ], line 50 */ 
switch(among_var) { case 0: goto lab0; case 1: { int ret = slice_from_s(z, 1, s_0); /* <-, line 51 */ if (ret < 0) return ret; } break; case 2: { int ret = slice_from_s(z, 1, s_1); /* <-, line 52 */ if (ret < 0) return ret; } break; case 3: { int ret = slice_from_s(z, 1, s_2); /* <-, line 53 */ if (ret < 0) return ret; } break; case 4: { int ret = slice_from_s(z, 1, s_3); /* <-, line 54 */ if (ret < 0) return ret; } break; case 5: { int ret = slice_from_s(z, 1, s_4); /* <-, line 55 */ if (ret < 0) return ret; } break; case 6: if (z->c >= z->l) goto lab0; z->c++; /* next, line 57 */ break; } continue; lab0: z->c = c1; break; } return 1; } static int r_RV(struct SN_env * z) { if (!(z->I[0] <= z->c)) return 0; return 1; } static int r_R1(struct SN_env * z) { if (!(z->I[1] <= z->c)) return 0; return 1; } static int r_R2(struct SN_env * z) { if (!(z->I[2] <= z->c)) return 0; return 1; } static int r_attached_pronoun(struct SN_env * z) { int among_var; z->ket = z->c; /* [, line 68 */ if (z->c - 1 <= z->lb || z->p[z->c - 1] >> 5 != 3 || !((557090 >> (z->p[z->c - 1] & 0x1f)) & 1)) return 0; if (!(find_among_b(z, a_1, 13))) return 0; /* substring, line 68 */ z->bra = z->c; /* ], line 68 */ if (z->c - 1 <= z->lb || (z->p[z->c - 1] != 111 && z->p[z->c - 1] != 114)) return 0; among_var = find_among_b(z, a_2, 11); /* substring, line 72 */ if (!(among_var)) return 0; { int ret = r_RV(z); if (ret == 0) return 0; /* call RV, line 72 */ if (ret < 0) return ret; } switch(among_var) { case 0: return 0; case 1: z->bra = z->c; /* ], line 73 */ { int ret = slice_from_s(z, 5, s_5); /* <-, line 73 */ if (ret < 0) return ret; } break; case 2: z->bra = z->c; /* ], line 74 */ { int ret = slice_from_s(z, 4, s_6); /* <-, line 74 */ if (ret < 0) return ret; } break; case 3: z->bra = z->c; /* ], line 75 */ { int ret = slice_from_s(z, 2, s_7); /* <-, line 75 */ if (ret < 0) return ret; } break; case 4: z->bra = z->c; /* ], line 76 */ { int ret = slice_from_s(z, 2, s_8); /* <-, line 76 */ if (ret 
< 0) return ret; } break; case 5: z->bra = z->c; /* ], line 77 */ { int ret = slice_from_s(z, 2, s_9); /* <-, line 77 */ if (ret < 0) return ret; } break; case 6: { int ret = slice_del(z); /* delete, line 81 */ if (ret < 0) return ret; } break; case 7: if (!(eq_s_b(z, 1, s_10))) return 0; { int ret = slice_del(z); /* delete, line 82 */ if (ret < 0) return ret; } break; } return 1; } static int r_standard_suffix(struct SN_env * z) { int among_var; z->ket = z->c; /* [, line 87 */ if (z->c - 2 <= z->lb || z->p[z->c - 1] >> 5 != 3 || !((835634 >> (z->p[z->c - 1] & 0x1f)) & 1)) return 0; among_var = find_among_b(z, a_6, 46); /* substring, line 87 */ if (!(among_var)) return 0; z->bra = z->c; /* ], line 87 */ switch(among_var) { case 0: return 0; case 1: { int ret = r_R2(z); if (ret == 0) return 0; /* call R2, line 99 */ if (ret < 0) return ret; } { int ret = slice_del(z); /* delete, line 99 */ if (ret < 0) return ret; } break; case 2: { int ret = r_R2(z); if (ret == 0) return 0; /* call R2, line 105 */ if (ret < 0) return ret; } { int ret = slice_del(z); /* delete, line 105 */ if (ret < 0) return ret; } { int m_keep = z->l - z->c;/* (void) m_keep;*/ /* try, line 106 */ z->ket = z->c; /* [, line 106 */ if (!(eq_s_b(z, 2, s_11))) { z->c = z->l - m_keep; goto lab0; } z->bra = z->c; /* ], line 106 */ { int ret = r_R2(z); if (ret == 0) { z->c = z->l - m_keep; goto lab0; } /* call R2, line 106 */ if (ret < 0) return ret; } { int ret = slice_del(z); /* delete, line 106 */ if (ret < 0) return ret; } lab0: ; } break; case 3: { int ret = r_R2(z); if (ret == 0) return 0; /* call R2, line 111 */ if (ret < 0) return ret; } { int ret = slice_from_s(z, 3, s_12); /* <-, line 111 */ if (ret < 0) return ret; } break; case 4: { int ret = r_R2(z); if (ret == 0) return 0; /* call R2, line 115 */ if (ret < 0) return ret; } { int ret = slice_from_s(z, 1, s_13); /* <-, line 115 */ if (ret < 0) return ret; } break; case 5: { int ret = r_R2(z); if (ret == 0) return 0; /* call R2, line 119 */ if 
(ret < 0) return ret; } { int ret = slice_from_s(z, 4, s_14); /* <-, line 119 */ if (ret < 0) return ret; } break; case 6: { int ret = r_R1(z); if (ret == 0) return 0; /* call R1, line 123 */ if (ret < 0) return ret; } { int ret = slice_del(z); /* delete, line 123 */ if (ret < 0) return ret; } { int m_keep = z->l - z->c;/* (void) m_keep;*/ /* try, line 124 */ z->ket = z->c; /* [, line 125 */ if (z->c - 1 <= z->lb || z->p[z->c - 1] >> 5 != 3 || !((4718616 >> (z->p[z->c - 1] & 0x1f)) & 1)) { z->c = z->l - m_keep; goto lab1; } among_var = find_among_b(z, a_3, 4); /* substring, line 125 */ if (!(among_var)) { z->c = z->l - m_keep; goto lab1; } z->bra = z->c; /* ], line 125 */ { int ret = r_R2(z); if (ret == 0) { z->c = z->l - m_keep; goto lab1; } /* call R2, line 125 */ if (ret < 0) return ret; } { int ret = slice_del(z); /* delete, line 125 */ if (ret < 0) return ret; } switch(among_var) { case 0: { z->c = z->l - m_keep; goto lab1; } case 1: z->ket = z->c; /* [, line 126 */ if (!(eq_s_b(z, 2, s_15))) { z->c = z->l - m_keep; goto lab1; } z->bra = z->c; /* ], line 126 */ { int ret = r_R2(z); if (ret == 0) { z->c = z->l - m_keep; goto lab1; } /* call R2, line 126 */ if (ret < 0) return ret; } { int ret = slice_del(z); /* delete, line 126 */ if (ret < 0) return ret; } break; } lab1: ; } break; case 7: { int ret = r_R2(z); if (ret == 0) return 0; /* call R2, line 135 */ if (ret < 0) return ret; } { int ret = slice_del(z); /* delete, line 135 */ if (ret < 0) return ret; } { int m_keep = z->l - z->c;/* (void) m_keep;*/ /* try, line 136 */ z->ket = z->c; /* [, line 137 */ if (z->c - 3 <= z->lb || z->p[z->c - 1] != 101) { z->c = z->l - m_keep; goto lab2; } among_var = find_among_b(z, a_4, 3); /* substring, line 137 */ if (!(among_var)) { z->c = z->l - m_keep; goto lab2; } z->bra = z->c; /* ], line 137 */ switch(among_var) { case 0: { z->c = z->l - m_keep; goto lab2; } case 1: { int ret = r_R2(z); if (ret == 0) { z->c = z->l - m_keep; goto lab2; } /* call R2, line 140 */ if 
(ret < 0) return ret; } { int ret = slice_del(z); /* delete, line 140 */ if (ret < 0) return ret; } break; } lab2: ; } break; case 8: { int ret = r_R2(z); if (ret == 0) return 0; /* call R2, line 147 */ if (ret < 0) return ret; } { int ret = slice_del(z); /* delete, line 147 */ if (ret < 0) return ret; } { int m_keep = z->l - z->c;/* (void) m_keep;*/ /* try, line 148 */ z->ket = z->c; /* [, line 149 */ if (z->c - 1 <= z->lb || z->p[z->c - 1] >> 5 != 3 || !((4198408 >> (z->p[z->c - 1] & 0x1f)) & 1)) { z->c = z->l - m_keep; goto lab3; } among_var = find_among_b(z, a_5, 3); /* substring, line 149 */ if (!(among_var)) { z->c = z->l - m_keep; goto lab3; } z->bra = z->c; /* ], line 149 */ switch(among_var) { case 0: { z->c = z->l - m_keep; goto lab3; } case 1: { int ret = r_R2(z); if (ret == 0) { z->c = z->l - m_keep; goto lab3; } /* call R2, line 152 */ if (ret < 0) return ret; } { int ret = slice_del(z); /* delete, line 152 */ if (ret < 0) return ret; } break; } lab3: ; } break; case 9: { int ret = r_R2(z); if (ret == 0) return 0; /* call R2, line 159 */ if (ret < 0) return ret; } { int ret = slice_del(z); /* delete, line 159 */ if (ret < 0) return ret; } { int m_keep = z->l - z->c;/* (void) m_keep;*/ /* try, line 160 */ z->ket = z->c; /* [, line 161 */ if (!(eq_s_b(z, 2, s_16))) { z->c = z->l - m_keep; goto lab4; } z->bra = z->c; /* ], line 161 */ { int ret = r_R2(z); if (ret == 0) { z->c = z->l - m_keep; goto lab4; } /* call R2, line 161 */ if (ret < 0) return ret; } { int ret = slice_del(z); /* delete, line 161 */ if (ret < 0) return ret; } lab4: ; } break; } return 1; } static int r_y_verb_suffix(struct SN_env * z) { int among_var; { int mlimit; /* setlimit, line 168 */ int m1 = z->l - z->c; (void)m1; if (z->c < z->I[0]) return 0; z->c = z->I[0]; /* tomark, line 168 */ mlimit = z->lb; z->lb = z->c; z->c = z->l - m1; z->ket = z->c; /* [, line 168 */ among_var = find_among_b(z, a_7, 12); /* substring, line 168 */ if (!(among_var)) { z->lb = mlimit; return 0; } z->bra 
= z->c; /* ], line 168 */ z->lb = mlimit; } switch(among_var) { case 0: return 0; case 1: if (!(eq_s_b(z, 1, s_17))) return 0; { int ret = slice_del(z); /* delete, line 171 */ if (ret < 0) return ret; } break; } return 1; } static int r_verb_suffix(struct SN_env * z) { int among_var; { int mlimit; /* setlimit, line 176 */ int m1 = z->l - z->c; (void)m1; if (z->c < z->I[0]) return 0; z->c = z->I[0]; /* tomark, line 176 */ mlimit = z->lb; z->lb = z->c; z->c = z->l - m1; z->ket = z->c; /* [, line 176 */ among_var = find_among_b(z, a_8, 96); /* substring, line 176 */ if (!(among_var)) { z->lb = mlimit; return 0; } z->bra = z->c; /* ], line 176 */ z->lb = mlimit; } switch(among_var) { case 0: return 0; case 1: { int m_keep = z->l - z->c;/* (void) m_keep;*/ /* try, line 179 */ if (!(eq_s_b(z, 1, s_18))) { z->c = z->l - m_keep; goto lab0; } { int m_test = z->l - z->c; /* test, line 179 */ if (!(eq_s_b(z, 1, s_19))) { z->c = z->l - m_keep; goto lab0; } z->c = z->l - m_test; } lab0: ; } z->bra = z->c; /* ], line 179 */ { int ret = slice_del(z); /* delete, line 179 */ if (ret < 0) return ret; } break; case 2: { int ret = slice_del(z); /* delete, line 200 */ if (ret < 0) return ret; } break; } return 1; } static int r_residual_suffix(struct SN_env * z) { int among_var; z->ket = z->c; /* [, line 205 */ among_var = find_among_b(z, a_9, 8); /* substring, line 205 */ if (!(among_var)) return 0; z->bra = z->c; /* ], line 205 */ switch(among_var) { case 0: return 0; case 1: { int ret = r_RV(z); if (ret == 0) return 0; /* call RV, line 208 */ if (ret < 0) return ret; } { int ret = slice_del(z); /* delete, line 208 */ if (ret < 0) return ret; } break; case 2: { int ret = r_RV(z); if (ret == 0) return 0; /* call RV, line 210 */ if (ret < 0) return ret; } { int ret = slice_del(z); /* delete, line 210 */ if (ret < 0) return ret; } { int m_keep = z->l - z->c;/* (void) m_keep;*/ /* try, line 210 */ z->ket = z->c; /* [, line 210 */ if (!(eq_s_b(z, 1, s_20))) { z->c = z->l - m_keep; goto 
lab0; } z->bra = z->c; /* ], line 210 */ { int m_test = z->l - z->c; /* test, line 210 */ if (!(eq_s_b(z, 1, s_21))) { z->c = z->l - m_keep; goto lab0; } z->c = z->l - m_test; } { int ret = r_RV(z); if (ret == 0) { z->c = z->l - m_keep; goto lab0; } /* call RV, line 210 */ if (ret < 0) return ret; } { int ret = slice_del(z); /* delete, line 210 */ if (ret < 0) return ret; } lab0: ; } break; } return 1; } extern int spanish_ISO_8859_1_stem(struct SN_env * z) { { int c1 = z->c; /* do, line 216 */ { int ret = r_mark_regions(z); if (ret == 0) goto lab0; /* call mark_regions, line 216 */ if (ret < 0) return ret; } lab0: z->c = c1; } z->lb = z->c; z->c = z->l; /* backwards, line 217 */ { int m2 = z->l - z->c; (void)m2; /* do, line 218 */ { int ret = r_attached_pronoun(z); if (ret == 0) goto lab1; /* call attached_pronoun, line 218 */ if (ret < 0) return ret; } lab1: z->c = z->l - m2; } { int m3 = z->l - z->c; (void)m3; /* do, line 219 */ { int m4 = z->l - z->c; (void)m4; /* or, line 219 */ { int ret = r_standard_suffix(z); if (ret == 0) goto lab4; /* call standard_suffix, line 219 */ if (ret < 0) return ret; } goto lab3; lab4: z->c = z->l - m4; { int ret = r_y_verb_suffix(z); if (ret == 0) goto lab5; /* call y_verb_suffix, line 220 */ if (ret < 0) return ret; } goto lab3; lab5: z->c = z->l - m4; { int ret = r_verb_suffix(z); if (ret == 0) goto lab2; /* call verb_suffix, line 221 */ if (ret < 0) return ret; } } lab3: lab2: z->c = z->l - m3; } { int m5 = z->l - z->c; (void)m5; /* do, line 223 */ { int ret = r_residual_suffix(z); if (ret == 0) goto lab6; /* call residual_suffix, line 223 */ if (ret < 0) return ret; } lab6: z->c = z->l - m5; } z->c = z->lb; { int c6 = z->c; /* do, line 225 */ { int ret = r_postlude(z); if (ret == 0) goto lab7; /* call postlude, line 225 */ if (ret < 0) return ret; } lab7: z->c = c6; } return 1; } extern struct SN_env * spanish_ISO_8859_1_create_env(void) { return SN_create_env(0, 3, 0); } extern void spanish_ISO_8859_1_close_env(struct 
SN_env * z) { SN_close_env(z, 0); }

/* swish-e-2.4.7/src/snowball/stem_hu.c */

/* This file was generated automatically by the Snowball to ANSI C compiler */

#include "header.h"

#ifdef __cplusplus
extern "C" {
#endif
extern int hungarian_ISO_8859_1_stem(struct SN_env * z);
#ifdef __cplusplus
}
#endif
static int r_double(struct SN_env * z);
static int r_undouble(struct SN_env * z);
static int r_factive(struct SN_env * z);
static int r_instrum(struct SN_env * z);
static int r_plur_owner(struct SN_env * z);
static int r_sing_owner(struct SN_env * z);
static int r_owned(struct SN_env * z);
static int r_plural(struct SN_env * z);
static int r_case_other(struct SN_env * z);
static int r_case_special(struct SN_env * z);
static int r_case(struct SN_env * z);
static int r_v_ending(struct SN_env * z);
static int r_R1(struct SN_env * z);
static int r_mark_regions(struct SN_env * z);
#ifdef __cplusplus
extern "C" {
#endif
extern struct SN_env * hungarian_ISO_8859_1_create_env(void);
extern void hungarian_ISO_8859_1_close_env(struct SN_env * z);
#ifdef __cplusplus
}
#endif
static const symbol s_0_0[2] = { 'c', 's' };
static const symbol s_0_1[3] = { 'd', 'z', 's' };
static const symbol s_0_2[2] = { 'g', 'y' };
static const symbol s_0_3[2] = { 'l', 'y' };
static const symbol s_0_4[2] = { 'n', 'y' };
static const symbol s_0_5[2] = { 's', 'z' };
static const symbol s_0_6[2] = { 't', 'y' };
static const symbol s_0_7[2] = { 'z', 's' };

static const struct among a_0[8] =
{
/* 0 */ { 2, s_0_0, -1, -1, 0},
/* 1 */ { 3, s_0_1, -1, -1, 0},
/* 2 */ { 2, s_0_2, -1, -1, 0},
/* 3 */ { 2, s_0_3, -1, -1, 0},
/* 4 */ { 2, s_0_4, -1, -1, 0},
/* 5 */ { 2, s_0_5, -1, -1, 0},
/* 6 */ { 2, s_0_6, -1, -1, 0},
/* 7 */ { 2, s_0_7, -1, -1, 0}
};

static const symbol s_1_0[1] = { 0xE1 };
static const symbol s_1_1[1] = { 0xE9 };

static const struct among a_1[2] =
{
/* 0 */ { 1, s_1_0, -1, 1, 0},
/* 1 */ { 1, s_1_1, -1, 2, 0}
};

static const symbol
s_2_0[2] = { 'b', 'b' }; static const symbol s_2_1[2] = { 'c', 'c' }; static const symbol s_2_2[2] = { 'd', 'd' }; static const symbol s_2_3[2] = { 'f', 'f' }; static const symbol s_2_4[2] = { 'g', 'g' }; static const symbol s_2_5[2] = { 'j', 'j' }; static const symbol s_2_6[2] = { 'k', 'k' }; static const symbol s_2_7[2] = { 'l', 'l' }; static const symbol s_2_8[2] = { 'm', 'm' }; static const symbol s_2_9[2] = { 'n', 'n' }; static const symbol s_2_10[2] = { 'p', 'p' }; static const symbol s_2_11[2] = { 'r', 'r' }; static const symbol s_2_12[3] = { 'c', 'c', 's' }; static const symbol s_2_13[2] = { 's', 's' }; static const symbol s_2_14[3] = { 'z', 'z', 's' }; static const symbol s_2_15[2] = { 't', 't' }; static const symbol s_2_16[2] = { 'v', 'v' }; static const symbol s_2_17[3] = { 'g', 'g', 'y' }; static const symbol s_2_18[3] = { 'l', 'l', 'y' }; static const symbol s_2_19[3] = { 'n', 'n', 'y' }; static const symbol s_2_20[3] = { 't', 't', 'y' }; static const symbol s_2_21[3] = { 's', 's', 'z' }; static const symbol s_2_22[2] = { 'z', 'z' }; static const struct among a_2[23] = { /* 0 */ { 2, s_2_0, -1, -1, 0}, /* 1 */ { 2, s_2_1, -1, -1, 0}, /* 2 */ { 2, s_2_2, -1, -1, 0}, /* 3 */ { 2, s_2_3, -1, -1, 0}, /* 4 */ { 2, s_2_4, -1, -1, 0}, /* 5 */ { 2, s_2_5, -1, -1, 0}, /* 6 */ { 2, s_2_6, -1, -1, 0}, /* 7 */ { 2, s_2_7, -1, -1, 0}, /* 8 */ { 2, s_2_8, -1, -1, 0}, /* 9 */ { 2, s_2_9, -1, -1, 0}, /* 10 */ { 2, s_2_10, -1, -1, 0}, /* 11 */ { 2, s_2_11, -1, -1, 0}, /* 12 */ { 3, s_2_12, -1, -1, 0}, /* 13 */ { 2, s_2_13, -1, -1, 0}, /* 14 */ { 3, s_2_14, -1, -1, 0}, /* 15 */ { 2, s_2_15, -1, -1, 0}, /* 16 */ { 2, s_2_16, -1, -1, 0}, /* 17 */ { 3, s_2_17, -1, -1, 0}, /* 18 */ { 3, s_2_18, -1, -1, 0}, /* 19 */ { 3, s_2_19, -1, -1, 0}, /* 20 */ { 3, s_2_20, -1, -1, 0}, /* 21 */ { 3, s_2_21, -1, -1, 0}, /* 22 */ { 2, s_2_22, -1, -1, 0} }; static const symbol s_3_0[2] = { 'a', 'l' }; static const symbol s_3_1[2] = { 'e', 'l' }; static const struct among a_3[2] = { /* 0 */ 
{ 2, s_3_0, -1, 1, 0}, /* 1 */ { 2, s_3_1, -1, 2, 0} }; static const symbol s_4_0[2] = { 'b', 'a' }; static const symbol s_4_1[2] = { 'r', 'a' }; static const symbol s_4_2[2] = { 'b', 'e' }; static const symbol s_4_3[2] = { 'r', 'e' }; static const symbol s_4_4[2] = { 'i', 'g' }; static const symbol s_4_5[3] = { 'n', 'a', 'k' }; static const symbol s_4_6[3] = { 'n', 'e', 'k' }; static const symbol s_4_7[3] = { 'v', 'a', 'l' }; static const symbol s_4_8[3] = { 'v', 'e', 'l' }; static const symbol s_4_9[2] = { 'u', 'l' }; static const symbol s_4_10[3] = { 'n', 0xE1, 'l' }; static const symbol s_4_11[3] = { 'n', 0xE9, 'l' }; static const symbol s_4_12[3] = { 'b', 0xF3, 'l' }; static const symbol s_4_13[3] = { 'r', 0xF3, 'l' }; static const symbol s_4_14[3] = { 't', 0xF3, 'l' }; static const symbol s_4_15[3] = { 'b', 0xF5, 'l' }; static const symbol s_4_16[3] = { 'r', 0xF5, 'l' }; static const symbol s_4_17[3] = { 't', 0xF5, 'l' }; static const symbol s_4_18[2] = { 0xFC, 'l' }; static const symbol s_4_19[1] = { 'n' }; static const symbol s_4_20[2] = { 'a', 'n' }; static const symbol s_4_21[3] = { 'b', 'a', 'n' }; static const symbol s_4_22[2] = { 'e', 'n' }; static const symbol s_4_23[3] = { 'b', 'e', 'n' }; static const symbol s_4_24[6] = { 'k', 0xE9, 'p', 'p', 'e', 'n' }; static const symbol s_4_25[2] = { 'o', 'n' }; static const symbol s_4_26[2] = { 0xF6, 'n' }; static const symbol s_4_27[4] = { 'k', 0xE9, 'p', 'p' }; static const symbol s_4_28[3] = { 'k', 'o', 'r' }; static const symbol s_4_29[1] = { 't' }; static const symbol s_4_30[2] = { 'a', 't' }; static const symbol s_4_31[2] = { 'e', 't' }; static const symbol s_4_32[4] = { 'k', 0xE9, 'n', 't' }; static const symbol s_4_33[6] = { 'a', 'n', 'k', 0xE9, 'n', 't' }; static const symbol s_4_34[6] = { 'e', 'n', 'k', 0xE9, 'n', 't' }; static const symbol s_4_35[6] = { 'o', 'n', 'k', 0xE9, 'n', 't' }; static const symbol s_4_36[2] = { 'o', 't' }; static const symbol s_4_37[3] = { 0xE9, 'r', 't' }; static const 
symbol s_4_38[2] = { 0xF6, 't' }; static const symbol s_4_39[3] = { 'h', 'e', 'z' }; static const symbol s_4_40[3] = { 'h', 'o', 'z' }; static const symbol s_4_41[3] = { 'h', 0xF6, 'z' }; static const symbol s_4_42[2] = { 'v', 0xE1 }; static const symbol s_4_43[2] = { 'v', 0xE9 }; static const struct among a_4[44] = { /* 0 */ { 2, s_4_0, -1, -1, 0}, /* 1 */ { 2, s_4_1, -1, -1, 0}, /* 2 */ { 2, s_4_2, -1, -1, 0}, /* 3 */ { 2, s_4_3, -1, -1, 0}, /* 4 */ { 2, s_4_4, -1, -1, 0}, /* 5 */ { 3, s_4_5, -1, -1, 0}, /* 6 */ { 3, s_4_6, -1, -1, 0}, /* 7 */ { 3, s_4_7, -1, -1, 0}, /* 8 */ { 3, s_4_8, -1, -1, 0}, /* 9 */ { 2, s_4_9, -1, -1, 0}, /* 10 */ { 3, s_4_10, -1, -1, 0}, /* 11 */ { 3, s_4_11, -1, -1, 0}, /* 12 */ { 3, s_4_12, -1, -1, 0}, /* 13 */ { 3, s_4_13, -1, -1, 0}, /* 14 */ { 3, s_4_14, -1, -1, 0}, /* 15 */ { 3, s_4_15, -1, -1, 0}, /* 16 */ { 3, s_4_16, -1, -1, 0}, /* 17 */ { 3, s_4_17, -1, -1, 0}, /* 18 */ { 2, s_4_18, -1, -1, 0}, /* 19 */ { 1, s_4_19, -1, -1, 0}, /* 20 */ { 2, s_4_20, 19, -1, 0}, /* 21 */ { 3, s_4_21, 20, -1, 0}, /* 22 */ { 2, s_4_22, 19, -1, 0}, /* 23 */ { 3, s_4_23, 22, -1, 0}, /* 24 */ { 6, s_4_24, 22, -1, 0}, /* 25 */ { 2, s_4_25, 19, -1, 0}, /* 26 */ { 2, s_4_26, 19, -1, 0}, /* 27 */ { 4, s_4_27, -1, -1, 0}, /* 28 */ { 3, s_4_28, -1, -1, 0}, /* 29 */ { 1, s_4_29, -1, -1, 0}, /* 30 */ { 2, s_4_30, 29, -1, 0}, /* 31 */ { 2, s_4_31, 29, -1, 0}, /* 32 */ { 4, s_4_32, 29, -1, 0}, /* 33 */ { 6, s_4_33, 32, -1, 0}, /* 34 */ { 6, s_4_34, 32, -1, 0}, /* 35 */ { 6, s_4_35, 32, -1, 0}, /* 36 */ { 2, s_4_36, 29, -1, 0}, /* 37 */ { 3, s_4_37, 29, -1, 0}, /* 38 */ { 2, s_4_38, 29, -1, 0}, /* 39 */ { 3, s_4_39, -1, -1, 0}, /* 40 */ { 3, s_4_40, -1, -1, 0}, /* 41 */ { 3, s_4_41, -1, -1, 0}, /* 42 */ { 2, s_4_42, -1, -1, 0}, /* 43 */ { 2, s_4_43, -1, -1, 0} }; static const symbol s_5_0[2] = { 0xE1, 'n' }; static const symbol s_5_1[2] = { 0xE9, 'n' }; static const symbol s_5_2[6] = { 0xE1, 'n', 'k', 0xE9, 'n', 't' }; static const struct among a_5[3] = { /* 0 
*/ { 2, s_5_0, -1, 2, 0}, /* 1 */ { 2, s_5_1, -1, 1, 0}, /* 2 */ { 6, s_5_2, -1, 3, 0} }; static const symbol s_6_0[4] = { 's', 't', 'u', 'l' }; static const symbol s_6_1[5] = { 'a', 's', 't', 'u', 'l' }; static const symbol s_6_2[5] = { 0xE1, 's', 't', 'u', 'l' }; static const symbol s_6_3[4] = { 's', 't', 0xFC, 'l' }; static const symbol s_6_4[5] = { 'e', 's', 't', 0xFC, 'l' }; static const symbol s_6_5[5] = { 0xE9, 's', 't', 0xFC, 'l' }; static const struct among a_6[6] = { /* 0 */ { 4, s_6_0, -1, 2, 0}, /* 1 */ { 5, s_6_1, 0, 1, 0}, /* 2 */ { 5, s_6_2, 0, 3, 0}, /* 3 */ { 4, s_6_3, -1, 2, 0}, /* 4 */ { 5, s_6_4, 3, 1, 0}, /* 5 */ { 5, s_6_5, 3, 4, 0} }; static const symbol s_7_0[1] = { 0xE1 }; static const symbol s_7_1[1] = { 0xE9 }; static const struct among a_7[2] = { /* 0 */ { 1, s_7_0, -1, 1, 0}, /* 1 */ { 1, s_7_1, -1, 2, 0} }; static const symbol s_8_0[1] = { 'k' }; static const symbol s_8_1[2] = { 'a', 'k' }; static const symbol s_8_2[2] = { 'e', 'k' }; static const symbol s_8_3[2] = { 'o', 'k' }; static const symbol s_8_4[2] = { 0xE1, 'k' }; static const symbol s_8_5[2] = { 0xE9, 'k' }; static const symbol s_8_6[2] = { 0xF6, 'k' }; static const struct among a_8[7] = { /* 0 */ { 1, s_8_0, -1, 7, 0}, /* 1 */ { 2, s_8_1, 0, 4, 0}, /* 2 */ { 2, s_8_2, 0, 6, 0}, /* 3 */ { 2, s_8_3, 0, 5, 0}, /* 4 */ { 2, s_8_4, 0, 1, 0}, /* 5 */ { 2, s_8_5, 0, 2, 0}, /* 6 */ { 2, s_8_6, 0, 3, 0} }; static const symbol s_9_0[2] = { 0xE9, 'i' }; static const symbol s_9_1[3] = { 0xE1, 0xE9, 'i' }; static const symbol s_9_2[3] = { 0xE9, 0xE9, 'i' }; static const symbol s_9_3[1] = { 0xE9 }; static const symbol s_9_4[2] = { 'k', 0xE9 }; static const symbol s_9_5[3] = { 'a', 'k', 0xE9 }; static const symbol s_9_6[3] = { 'e', 'k', 0xE9 }; static const symbol s_9_7[3] = { 'o', 'k', 0xE9 }; static const symbol s_9_8[3] = { 0xE1, 'k', 0xE9 }; static const symbol s_9_9[3] = { 0xE9, 'k', 0xE9 }; static const symbol s_9_10[3] = { 0xF6, 'k', 0xE9 }; static const symbol s_9_11[2] = { 0xE9, 
0xE9 }; static const struct among a_9[12] = { /* 0 */ { 2, s_9_0, -1, 7, 0}, /* 1 */ { 3, s_9_1, 0, 6, 0}, /* 2 */ { 3, s_9_2, 0, 5, 0}, /* 3 */ { 1, s_9_3, -1, 9, 0}, /* 4 */ { 2, s_9_4, 3, 4, 0}, /* 5 */ { 3, s_9_5, 4, 1, 0}, /* 6 */ { 3, s_9_6, 4, 1, 0}, /* 7 */ { 3, s_9_7, 4, 1, 0}, /* 8 */ { 3, s_9_8, 4, 3, 0}, /* 9 */ { 3, s_9_9, 4, 2, 0}, /* 10 */ { 3, s_9_10, 4, 1, 0}, /* 11 */ { 2, s_9_11, 3, 8, 0} }; static const symbol s_10_0[1] = { 'a' }; static const symbol s_10_1[2] = { 'j', 'a' }; static const symbol s_10_2[1] = { 'd' }; static const symbol s_10_3[2] = { 'a', 'd' }; static const symbol s_10_4[2] = { 'e', 'd' }; static const symbol s_10_5[2] = { 'o', 'd' }; static const symbol s_10_6[2] = { 0xE1, 'd' }; static const symbol s_10_7[2] = { 0xE9, 'd' }; static const symbol s_10_8[2] = { 0xF6, 'd' }; static const symbol s_10_9[1] = { 'e' }; static const symbol s_10_10[2] = { 'j', 'e' }; static const symbol s_10_11[2] = { 'n', 'k' }; static const symbol s_10_12[3] = { 'u', 'n', 'k' }; static const symbol s_10_13[3] = { 0xE1, 'n', 'k' }; static const symbol s_10_14[3] = { 0xE9, 'n', 'k' }; static const symbol s_10_15[3] = { 0xFC, 'n', 'k' }; static const symbol s_10_16[2] = { 'u', 'k' }; static const symbol s_10_17[3] = { 'j', 'u', 'k' }; static const symbol s_10_18[4] = { 0xE1, 'j', 'u', 'k' }; static const symbol s_10_19[2] = { 0xFC, 'k' }; static const symbol s_10_20[3] = { 'j', 0xFC, 'k' }; static const symbol s_10_21[4] = { 0xE9, 'j', 0xFC, 'k' }; static const symbol s_10_22[1] = { 'm' }; static const symbol s_10_23[2] = { 'a', 'm' }; static const symbol s_10_24[2] = { 'e', 'm' }; static const symbol s_10_25[2] = { 'o', 'm' }; static const symbol s_10_26[2] = { 0xE1, 'm' }; static const symbol s_10_27[2] = { 0xE9, 'm' }; static const symbol s_10_28[1] = { 'o' }; static const symbol s_10_29[1] = { 0xE1 }; static const symbol s_10_30[1] = { 0xE9 }; static const struct among a_10[31] = { /* 0 */ { 1, s_10_0, -1, 18, 0}, /* 1 */ { 2, s_10_1, 0, 17, 0}, /* 2 
*/ { 1, s_10_2, -1, 16, 0}, /* 3 */ { 2, s_10_3, 2, 13, 0}, /* 4 */ { 2, s_10_4, 2, 13, 0}, /* 5 */ { 2, s_10_5, 2, 13, 0}, /* 6 */ { 2, s_10_6, 2, 14, 0}, /* 7 */ { 2, s_10_7, 2, 15, 0}, /* 8 */ { 2, s_10_8, 2, 13, 0}, /* 9 */ { 1, s_10_9, -1, 18, 0}, /* 10 */ { 2, s_10_10, 9, 17, 0}, /* 11 */ { 2, s_10_11, -1, 4, 0}, /* 12 */ { 3, s_10_12, 11, 1, 0}, /* 13 */ { 3, s_10_13, 11, 2, 0}, /* 14 */ { 3, s_10_14, 11, 3, 0}, /* 15 */ { 3, s_10_15, 11, 1, 0}, /* 16 */ { 2, s_10_16, -1, 8, 0}, /* 17 */ { 3, s_10_17, 16, 7, 0}, /* 18 */ { 4, s_10_18, 17, 5, 0}, /* 19 */ { 2, s_10_19, -1, 8, 0}, /* 20 */ { 3, s_10_20, 19, 7, 0}, /* 21 */ { 4, s_10_21, 20, 6, 0}, /* 22 */ { 1, s_10_22, -1, 12, 0}, /* 23 */ { 2, s_10_23, 22, 9, 0}, /* 24 */ { 2, s_10_24, 22, 9, 0}, /* 25 */ { 2, s_10_25, 22, 9, 0}, /* 26 */ { 2, s_10_26, 22, 10, 0}, /* 27 */ { 2, s_10_27, 22, 11, 0}, /* 28 */ { 1, s_10_28, -1, 18, 0}, /* 29 */ { 1, s_10_29, -1, 19, 0}, /* 30 */ { 1, s_10_30, -1, 20, 0} }; static const symbol s_11_0[2] = { 'i', 'd' }; static const symbol s_11_1[3] = { 'a', 'i', 'd' }; static const symbol s_11_2[4] = { 'j', 'a', 'i', 'd' }; static const symbol s_11_3[3] = { 'e', 'i', 'd' }; static const symbol s_11_4[4] = { 'j', 'e', 'i', 'd' }; static const symbol s_11_5[3] = { 0xE1, 'i', 'd' }; static const symbol s_11_6[3] = { 0xE9, 'i', 'd' }; static const symbol s_11_7[1] = { 'i' }; static const symbol s_11_8[2] = { 'a', 'i' }; static const symbol s_11_9[3] = { 'j', 'a', 'i' }; static const symbol s_11_10[2] = { 'e', 'i' }; static const symbol s_11_11[3] = { 'j', 'e', 'i' }; static const symbol s_11_12[2] = { 0xE1, 'i' }; static const symbol s_11_13[2] = { 0xE9, 'i' }; static const symbol s_11_14[4] = { 'i', 't', 'e', 'k' }; static const symbol s_11_15[5] = { 'e', 'i', 't', 'e', 'k' }; static const symbol s_11_16[6] = { 'j', 'e', 'i', 't', 'e', 'k' }; static const symbol s_11_17[5] = { 0xE9, 'i', 't', 'e', 'k' }; static const symbol s_11_18[2] = { 'i', 'k' }; static const symbol s_11_19[3] 
= { 'a', 'i', 'k' }; static const symbol s_11_20[4] = { 'j', 'a', 'i', 'k' }; static const symbol s_11_21[3] = { 'e', 'i', 'k' }; static const symbol s_11_22[4] = { 'j', 'e', 'i', 'k' }; static const symbol s_11_23[3] = { 0xE1, 'i', 'k' }; static const symbol s_11_24[3] = { 0xE9, 'i', 'k' }; static const symbol s_11_25[3] = { 'i', 'n', 'k' }; static const symbol s_11_26[4] = { 'a', 'i', 'n', 'k' }; static const symbol s_11_27[5] = { 'j', 'a', 'i', 'n', 'k' }; static const symbol s_11_28[4] = { 'e', 'i', 'n', 'k' }; static const symbol s_11_29[5] = { 'j', 'e', 'i', 'n', 'k' }; static const symbol s_11_30[4] = { 0xE1, 'i', 'n', 'k' }; static const symbol s_11_31[4] = { 0xE9, 'i', 'n', 'k' }; static const symbol s_11_32[5] = { 'a', 'i', 't', 'o', 'k' }; static const symbol s_11_33[6] = { 'j', 'a', 'i', 't', 'o', 'k' }; static const symbol s_11_34[5] = { 0xE1, 'i', 't', 'o', 'k' }; static const symbol s_11_35[2] = { 'i', 'm' }; static const symbol s_11_36[3] = { 'a', 'i', 'm' }; static const symbol s_11_37[4] = { 'j', 'a', 'i', 'm' }; static const symbol s_11_38[3] = { 'e', 'i', 'm' }; static const symbol s_11_39[4] = { 'j', 'e', 'i', 'm' }; static const symbol s_11_40[3] = { 0xE1, 'i', 'm' }; static const symbol s_11_41[3] = { 0xE9, 'i', 'm' }; static const struct among a_11[42] = { /* 0 */ { 2, s_11_0, -1, 10, 0}, /* 1 */ { 3, s_11_1, 0, 9, 0}, /* 2 */ { 4, s_11_2, 1, 6, 0}, /* 3 */ { 3, s_11_3, 0, 9, 0}, /* 4 */ { 4, s_11_4, 3, 6, 0}, /* 5 */ { 3, s_11_5, 0, 7, 0}, /* 6 */ { 3, s_11_6, 0, 8, 0}, /* 7 */ { 1, s_11_7, -1, 15, 0}, /* 8 */ { 2, s_11_8, 7, 14, 0}, /* 9 */ { 3, s_11_9, 8, 11, 0}, /* 10 */ { 2, s_11_10, 7, 14, 0}, /* 11 */ { 3, s_11_11, 10, 11, 0}, /* 12 */ { 2, s_11_12, 7, 12, 0}, /* 13 */ { 2, s_11_13, 7, 13, 0}, /* 14 */ { 4, s_11_14, -1, 24, 0}, /* 15 */ { 5, s_11_15, 14, 21, 0}, /* 16 */ { 6, s_11_16, 15, 20, 0}, /* 17 */ { 5, s_11_17, 14, 23, 0}, /* 18 */ { 2, s_11_18, -1, 29, 0}, /* 19 */ { 3, s_11_19, 18, 26, 0}, /* 20 */ { 4, s_11_20, 19, 25, 0}, 
/* 21 */ { 3, s_11_21, 18, 26, 0}, /* 22 */ { 4, s_11_22, 21, 25, 0}, /* 23 */ { 3, s_11_23, 18, 27, 0}, /* 24 */ { 3, s_11_24, 18, 28, 0}, /* 25 */ { 3, s_11_25, -1, 20, 0}, /* 26 */ { 4, s_11_26, 25, 17, 0}, /* 27 */ { 5, s_11_27, 26, 16, 0}, /* 28 */ { 4, s_11_28, 25, 17, 0}, /* 29 */ { 5, s_11_29, 28, 16, 0}, /* 30 */ { 4, s_11_30, 25, 18, 0}, /* 31 */ { 4, s_11_31, 25, 19, 0}, /* 32 */ { 5, s_11_32, -1, 21, 0}, /* 33 */ { 6, s_11_33, 32, 20, 0}, /* 34 */ { 5, s_11_34, -1, 22, 0}, /* 35 */ { 2, s_11_35, -1, 5, 0}, /* 36 */ { 3, s_11_36, 35, 4, 0}, /* 37 */ { 4, s_11_37, 36, 1, 0}, /* 38 */ { 3, s_11_38, 35, 4, 0}, /* 39 */ { 4, s_11_39, 38, 1, 0}, /* 40 */ { 3, s_11_40, 35, 2, 0}, /* 41 */ { 3, s_11_41, 35, 3, 0} }; static const unsigned char g_v[] = { 17, 65, 16, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 17, 52, 14 }; static const symbol s_0[] = { 'a' }; static const symbol s_1[] = { 'e' }; static const symbol s_2[] = { 'e' }; static const symbol s_3[] = { 'a' }; static const symbol s_4[] = { 'a' }; static const symbol s_5[] = { 'a' }; static const symbol s_6[] = { 'e' }; static const symbol s_7[] = { 'a' }; static const symbol s_8[] = { 'e' }; static const symbol s_9[] = { 'e' }; static const symbol s_10[] = { 'a' }; static const symbol s_11[] = { 'e' }; static const symbol s_12[] = { 'a' }; static const symbol s_13[] = { 'e' }; static const symbol s_14[] = { 'a' }; static const symbol s_15[] = { 'e' }; static const symbol s_16[] = { 'a' }; static const symbol s_17[] = { 'e' }; static const symbol s_18[] = { 'a' }; static const symbol s_19[] = { 'e' }; static const symbol s_20[] = { 'a' }; static const symbol s_21[] = { 'e' }; static const symbol s_22[] = { 'a' }; static const symbol s_23[] = { 'e' }; static const symbol s_24[] = { 'a' }; static const symbol s_25[] = { 'e' }; static const symbol s_26[] = { 'a' }; static const symbol s_27[] = { 'e' }; static const symbol s_28[] = { 'a' }; static const symbol s_29[] = { 'e' }; static const symbol s_30[] = { 'a' 
}; static const symbol s_31[] = { 'e' }; static const symbol s_32[] = { 'a' }; static const symbol s_33[] = { 'e' }; static const symbol s_34[] = { 'a' }; static const symbol s_35[] = { 'e' }; static int r_mark_regions(struct SN_env * z) { z->I[0] = z->l; { int c1 = z->c; /* or, line 51 */ if (in_grouping(z, g_v, 97, 252, 0)) goto lab1; if (in_grouping(z, g_v, 97, 252, 1) < 0) goto lab1; /* goto */ /* non v, line 48 */ { int c2 = z->c; /* or, line 49 */ if (z->c + 1 >= z->l || z->p[z->c + 1] >> 5 != 3 || !((101187584 >> (z->p[z->c + 1] & 0x1f)) & 1)) goto lab3; if (!(find_among(z, a_0, 8))) goto lab3; /* among, line 49 */ goto lab2; lab3: z->c = c2; if (z->c >= z->l) goto lab1; z->c++; /* next, line 49 */ } lab2: z->I[0] = z->c; /* setmark p1, line 50 */ goto lab0; lab1: z->c = c1; if (out_grouping(z, g_v, 97, 252, 0)) return 0; { /* gopast */ /* grouping v, line 53 */ int ret = out_grouping(z, g_v, 97, 252, 1); if (ret < 0) return 0; z->c += ret; } z->I[0] = z->c; /* setmark p1, line 53 */ } lab0: return 1; } static int r_R1(struct SN_env * z) { if (!(z->I[0] <= z->c)) return 0; return 1; } static int r_v_ending(struct SN_env * z) { int among_var; z->ket = z->c; /* [, line 61 */ if (z->c <= z->lb || (z->p[z->c - 1] != 225 && z->p[z->c - 1] != 233)) return 0; among_var = find_among_b(z, a_1, 2); /* substring, line 61 */ if (!(among_var)) return 0; z->bra = z->c; /* ], line 61 */ { int ret = r_R1(z); if (ret == 0) return 0; /* call R1, line 61 */ if (ret < 0) return ret; } switch(among_var) { case 0: return 0; case 1: { int ret = slice_from_s(z, 1, s_0); /* <-, line 62 */ if (ret < 0) return ret; } break; case 2: { int ret = slice_from_s(z, 1, s_1); /* <-, line 63 */ if (ret < 0) return ret; } break; } return 1; } static int r_double(struct SN_env * z) { { int m_test = z->l - z->c; /* test, line 68 */ if (z->c - 1 <= z->lb || z->p[z->c - 1] >> 5 != 3 || !((106790108 >> (z->p[z->c - 1] & 0x1f)) & 1)) return 0; if (!(find_among_b(z, a_2, 23))) return 0; /* among, line 
68 */ z->c = z->l - m_test; } return 1; } static int r_undouble(struct SN_env * z) { if (z->c <= z->lb) return 0; z->c--; /* next, line 73 */ z->ket = z->c; /* [, line 73 */ { int ret = z->c - 1; if (z->lb > ret || ret > z->l) return 0; z->c = ret; /* hop, line 73 */ } z->bra = z->c; /* ], line 73 */ { int ret = slice_del(z); /* delete, line 73 */ if (ret < 0) return ret; } return 1; } static int r_instrum(struct SN_env * z) { int among_var; z->ket = z->c; /* [, line 77 */ if (z->c - 1 <= z->lb || z->p[z->c - 1] != 108) return 0; among_var = find_among_b(z, a_3, 2); /* substring, line 77 */ if (!(among_var)) return 0; z->bra = z->c; /* ], line 77 */ { int ret = r_R1(z); if (ret == 0) return 0; /* call R1, line 77 */ if (ret < 0) return ret; } switch(among_var) { case 0: return 0; case 1: { int ret = r_double(z); if (ret == 0) return 0; /* call double, line 78 */ if (ret < 0) return ret; } break; case 2: { int ret = r_double(z); if (ret == 0) return 0; /* call double, line 79 */ if (ret < 0) return ret; } break; } { int ret = slice_del(z); /* delete, line 81 */ if (ret < 0) return ret; } { int ret = r_undouble(z); if (ret == 0) return 0; /* call undouble, line 82 */ if (ret < 0) return ret; } return 1; } static int r_case(struct SN_env * z) { z->ket = z->c; /* [, line 87 */ if (!(find_among_b(z, a_4, 44))) return 0; /* substring, line 87 */ z->bra = z->c; /* ], line 87 */ { int ret = r_R1(z); if (ret == 0) return 0; /* call R1, line 87 */ if (ret < 0) return ret; } { int ret = slice_del(z); /* delete, line 111 */ if (ret < 0) return ret; } { int ret = r_v_ending(z); if (ret == 0) return 0; /* call v_ending, line 112 */ if (ret < 0) return ret; } return 1; } static int r_case_special(struct SN_env * z) { int among_var; z->ket = z->c; /* [, line 116 */ if (z->c - 1 <= z->lb || (z->p[z->c - 1] != 110 && z->p[z->c - 1] != 116)) return 0; among_var = find_among_b(z, a_5, 3); /* substring, line 116 */ if (!(among_var)) return 0; z->bra = z->c; /* ], line 116 */ { int ret 
= r_R1(z); if (ret == 0) return 0; /* call R1, line 116 */ if (ret < 0) return ret; } switch(among_var) { case 0: return 0; case 1: { int ret = slice_from_s(z, 1, s_2); /* <-, line 117 */ if (ret < 0) return ret; } break; case 2: { int ret = slice_from_s(z, 1, s_3); /* <-, line 118 */ if (ret < 0) return ret; } break; case 3: { int ret = slice_from_s(z, 1, s_4); /* <-, line 119 */ if (ret < 0) return ret; } break; } return 1; } static int r_case_other(struct SN_env * z) { int among_var; z->ket = z->c; /* [, line 124 */ if (z->c - 3 <= z->lb || z->p[z->c - 1] != 108) return 0; among_var = find_among_b(z, a_6, 6); /* substring, line 124 */ if (!(among_var)) return 0; z->bra = z->c; /* ], line 124 */ { int ret = r_R1(z); if (ret == 0) return 0; /* call R1, line 124 */ if (ret < 0) return ret; } switch(among_var) { case 0: return 0; case 1: { int ret = slice_del(z); /* delete, line 125 */ if (ret < 0) return ret; } break; case 2: { int ret = slice_del(z); /* delete, line 126 */ if (ret < 0) return ret; } break; case 3: { int ret = slice_from_s(z, 1, s_5); /* <-, line 127 */ if (ret < 0) return ret; } break; case 4: { int ret = slice_from_s(z, 1, s_6); /* <-, line 128 */ if (ret < 0) return ret; } break; } return 1; } static int r_factive(struct SN_env * z) { int among_var; z->ket = z->c; /* [, line 133 */ if (z->c <= z->lb || (z->p[z->c - 1] != 225 && z->p[z->c - 1] != 233)) return 0; among_var = find_among_b(z, a_7, 2); /* substring, line 133 */ if (!(among_var)) return 0; z->bra = z->c; /* ], line 133 */ { int ret = r_R1(z); if (ret == 0) return 0; /* call R1, line 133 */ if (ret < 0) return ret; } switch(among_var) { case 0: return 0; case 1: { int ret = r_double(z); if (ret == 0) return 0; /* call double, line 134 */ if (ret < 0) return ret; } break; case 2: { int ret = r_double(z); if (ret == 0) return 0; /* call double, line 135 */ if (ret < 0) return ret; } break; } { int ret = slice_del(z); /* delete, line 137 */ if (ret < 0) return ret; } { int ret = 
r_undouble(z); if (ret == 0) return 0; /* call undouble, line 138 */ if (ret < 0) return ret; } return 1; } static int r_plural(struct SN_env * z) { int among_var; z->ket = z->c; /* [, line 142 */ if (z->c <= z->lb || z->p[z->c - 1] != 107) return 0; among_var = find_among_b(z, a_8, 7); /* substring, line 142 */ if (!(among_var)) return 0; z->bra = z->c; /* ], line 142 */ { int ret = r_R1(z); if (ret == 0) return 0; /* call R1, line 142 */ if (ret < 0) return ret; } switch(among_var) { case 0: return 0; case 1: { int ret = slice_from_s(z, 1, s_7); /* <-, line 143 */ if (ret < 0) return ret; } break; case 2: { int ret = slice_from_s(z, 1, s_8); /* <-, line 144 */ if (ret < 0) return ret; } break; case 3: { int ret = slice_del(z); /* delete, line 145 */ if (ret < 0) return ret; } break; case 4: { int ret = slice_del(z); /* delete, line 146 */ if (ret < 0) return ret; } break; case 5: { int ret = slice_del(z); /* delete, line 147 */ if (ret < 0) return ret; } break; case 6: { int ret = slice_del(z); /* delete, line 148 */ if (ret < 0) return ret; } break; case 7: { int ret = slice_del(z); /* delete, line 149 */ if (ret < 0) return ret; } break; } return 1; } static int r_owned(struct SN_env * z) { int among_var; z->ket = z->c; /* [, line 154 */ if (z->c <= z->lb || (z->p[z->c - 1] != 105 && z->p[z->c - 1] != 233)) return 0; among_var = find_among_b(z, a_9, 12); /* substring, line 154 */ if (!(among_var)) return 0; z->bra = z->c; /* ], line 154 */ { int ret = r_R1(z); if (ret == 0) return 0; /* call R1, line 154 */ if (ret < 0) return ret; } switch(among_var) { case 0: return 0; case 1: { int ret = slice_del(z); /* delete, line 155 */ if (ret < 0) return ret; } break; case 2: { int ret = slice_from_s(z, 1, s_9); /* <-, line 156 */ if (ret < 0) return ret; } break; case 3: { int ret = slice_from_s(z, 1, s_10); /* <-, line 157 */ if (ret < 0) return ret; } break; case 4: { int ret = slice_del(z); /* delete, line 158 */ if (ret < 0) return ret; } break; case 5: { int ret 
= slice_from_s(z, 1, s_11); /* <-, line 159 */ if (ret < 0) return ret; } break; case 6: { int ret = slice_from_s(z, 1, s_12); /* <-, line 160 */ if (ret < 0) return ret; } break; case 7: { int ret = slice_del(z); /* delete, line 161 */ if (ret < 0) return ret; } break; case 8: { int ret = slice_from_s(z, 1, s_13); /* <-, line 162 */ if (ret < 0) return ret; } break; case 9: { int ret = slice_del(z); /* delete, line 163 */ if (ret < 0) return ret; } break; } return 1; } static int r_sing_owner(struct SN_env * z) { int among_var; z->ket = z->c; /* [, line 168 */ among_var = find_among_b(z, a_10, 31); /* substring, line 168 */ if (!(among_var)) return 0; z->bra = z->c; /* ], line 168 */ { int ret = r_R1(z); if (ret == 0) return 0; /* call R1, line 168 */ if (ret < 0) return ret; } switch(among_var) { case 0: return 0; case 1: { int ret = slice_del(z); /* delete, line 169 */ if (ret < 0) return ret; } break; case 2: { int ret = slice_from_s(z, 1, s_14); /* <-, line 170 */ if (ret < 0) return ret; } break; case 3: { int ret = slice_from_s(z, 1, s_15); /* <-, line 171 */ if (ret < 0) return ret; } break; case 4: { int ret = slice_del(z); /* delete, line 172 */ if (ret < 0) return ret; } break; case 5: { int ret = slice_from_s(z, 1, s_16); /* <-, line 173 */ if (ret < 0) return ret; } break; case 6: { int ret = slice_from_s(z, 1, s_17); /* <-, line 174 */ if (ret < 0) return ret; } break; case 7: { int ret = slice_del(z); /* delete, line 175 */ if (ret < 0) return ret; } break; case 8: { int ret = slice_del(z); /* delete, line 176 */ if (ret < 0) return ret; } break; case 9: { int ret = slice_del(z); /* delete, line 177 */ if (ret < 0) return ret; } break; case 10: { int ret = slice_from_s(z, 1, s_18); /* <-, line 178 */ if (ret < 0) return ret; } break; case 11: { int ret = slice_from_s(z, 1, s_19); /* <-, line 179 */ if (ret < 0) return ret; } break; case 12: { int ret = slice_del(z); /* delete, line 180 */ if (ret < 0) return ret; } break; case 13: { int ret = 
slice_del(z); /* delete, line 181 */ if (ret < 0) return ret; } break; case 14: { int ret = slice_from_s(z, 1, s_20); /* <-, line 182 */ if (ret < 0) return ret; } break; case 15: { int ret = slice_from_s(z, 1, s_21); /* <-, line 183 */ if (ret < 0) return ret; } break; case 16: { int ret = slice_del(z); /* delete, line 184 */ if (ret < 0) return ret; } break; case 17: { int ret = slice_del(z); /* delete, line 185 */ if (ret < 0) return ret; } break; case 18: { int ret = slice_del(z); /* delete, line 186 */ if (ret < 0) return ret; } break; case 19: { int ret = slice_from_s(z, 1, s_22); /* <-, line 187 */ if (ret < 0) return ret; } break; case 20: { int ret = slice_from_s(z, 1, s_23); /* <-, line 188 */ if (ret < 0) return ret; } break; } return 1; } static int r_plur_owner(struct SN_env * z) { int among_var; z->ket = z->c; /* [, line 193 */ if (z->c <= z->lb || z->p[z->c - 1] >> 5 != 3 || !((10768 >> (z->p[z->c - 1] & 0x1f)) & 1)) return 0; among_var = find_among_b(z, a_11, 42); /* substring, line 193 */ if (!(among_var)) return 0; z->bra = z->c; /* ], line 193 */ { int ret = r_R1(z); if (ret == 0) return 0; /* call R1, line 193 */ if (ret < 0) return ret; } switch(among_var) { case 0: return 0; case 1: { int ret = slice_del(z); /* delete, line 194 */ if (ret < 0) return ret; } break; case 2: { int ret = slice_from_s(z, 1, s_24); /* <-, line 195 */ if (ret < 0) return ret; } break; case 3: { int ret = slice_from_s(z, 1, s_25); /* <-, line 196 */ if (ret < 0) return ret; } break; case 4: { int ret = slice_del(z); /* delete, line 197 */ if (ret < 0) return ret; } break; case 5: { int ret = slice_del(z); /* delete, line 198 */ if (ret < 0) return ret; } break; case 6: { int ret = slice_del(z); /* delete, line 199 */ if (ret < 0) return ret; } break; case 7: { int ret = slice_from_s(z, 1, s_26); /* <-, line 200 */ if (ret < 0) return ret; } break; case 8: { int ret = slice_from_s(z, 1, s_27); /* <-, line 201 */ if (ret < 0) return ret; } break; case 9: { int ret = 
slice_del(z); /* delete, line 202 */ if (ret < 0) return ret; } break; case 10: { int ret = slice_del(z); /* delete, line 203 */ if (ret < 0) return ret; } break; case 11: { int ret = slice_del(z); /* delete, line 204 */ if (ret < 0) return ret; } break; case 12: { int ret = slice_from_s(z, 1, s_28); /* <-, line 205 */ if (ret < 0) return ret; } break; case 13: { int ret = slice_from_s(z, 1, s_29); /* <-, line 206 */ if (ret < 0) return ret; } break; case 14: { int ret = slice_del(z); /* delete, line 207 */ if (ret < 0) return ret; } break; case 15: { int ret = slice_del(z); /* delete, line 208 */ if (ret < 0) return ret; } break; case 16: { int ret = slice_del(z); /* delete, line 209 */ if (ret < 0) return ret; } break; case 17: { int ret = slice_del(z); /* delete, line 210 */ if (ret < 0) return ret; } break; case 18: { int ret = slice_from_s(z, 1, s_30); /* <-, line 211 */ if (ret < 0) return ret; } break; case 19: { int ret = slice_from_s(z, 1, s_31); /* <-, line 212 */ if (ret < 0) return ret; } break; case 20: { int ret = slice_del(z); /* delete, line 214 */ if (ret < 0) return ret; } break; case 21: { int ret = slice_del(z); /* delete, line 215 */ if (ret < 0) return ret; } break; case 22: { int ret = slice_from_s(z, 1, s_32); /* <-, line 216 */ if (ret < 0) return ret; } break; case 23: { int ret = slice_from_s(z, 1, s_33); /* <-, line 217 */ if (ret < 0) return ret; } break; case 24: { int ret = slice_del(z); /* delete, line 218 */ if (ret < 0) return ret; } break; case 25: { int ret = slice_del(z); /* delete, line 219 */ if (ret < 0) return ret; } break; case 26: { int ret = slice_del(z); /* delete, line 220 */ if (ret < 0) return ret; } break; case 27: { int ret = slice_from_s(z, 1, s_34); /* <-, line 221 */ if (ret < 0) return ret; } break; case 28: { int ret = slice_from_s(z, 1, s_35); /* <-, line 222 */ if (ret < 0) return ret; } break; case 29: { int ret = slice_del(z); /* delete, line 223 */ if (ret < 0) return ret; } break; } return 1; } extern int 
hungarian_ISO_8859_1_stem(struct SN_env * z) { { int c1 = z->c; /* do, line 229 */ { int ret = r_mark_regions(z); if (ret == 0) goto lab0; /* call mark_regions, line 229 */ if (ret < 0) return ret; } lab0: z->c = c1; } z->lb = z->c; z->c = z->l; /* backwards, line 230 */ { int m2 = z->l - z->c; (void)m2; /* do, line 231 */ { int ret = r_instrum(z); if (ret == 0) goto lab1; /* call instrum, line 231 */ if (ret < 0) return ret; } lab1: z->c = z->l - m2; } { int m3 = z->l - z->c; (void)m3; /* do, line 232 */ { int ret = r_case(z); if (ret == 0) goto lab2; /* call case, line 232 */ if (ret < 0) return ret; } lab2: z->c = z->l - m3; } { int m4 = z->l - z->c; (void)m4; /* do, line 233 */ { int ret = r_case_special(z); if (ret == 0) goto lab3; /* call case_special, line 233 */ if (ret < 0) return ret; } lab3: z->c = z->l - m4; } { int m5 = z->l - z->c; (void)m5; /* do, line 234 */ { int ret = r_case_other(z); if (ret == 0) goto lab4; /* call case_other, line 234 */ if (ret < 0) return ret; } lab4: z->c = z->l - m5; } { int m6 = z->l - z->c; (void)m6; /* do, line 235 */ { int ret = r_factive(z); if (ret == 0) goto lab5; /* call factive, line 235 */ if (ret < 0) return ret; } lab5: z->c = z->l - m6; } { int m7 = z->l - z->c; (void)m7; /* do, line 236 */ { int ret = r_owned(z); if (ret == 0) goto lab6; /* call owned, line 236 */ if (ret < 0) return ret; } lab6: z->c = z->l - m7; } { int m8 = z->l - z->c; (void)m8; /* do, line 237 */ { int ret = r_sing_owner(z); if (ret == 0) goto lab7; /* call sing_owner, line 237 */ if (ret < 0) return ret; } lab7: z->c = z->l - m8; } { int m9 = z->l - z->c; (void)m9; /* do, line 238 */ { int ret = r_plur_owner(z); if (ret == 0) goto lab8; /* call plur_owner, line 238 */ if (ret < 0) return ret; } lab8: z->c = z->l - m9; } { int m10 = z->l - z->c; (void)m10; /* do, line 239 */ { int ret = r_plural(z); if (ret == 0) goto lab9; /* call plural, line 239 */ if (ret < 0) return ret; } lab9: z->c = z->l - m10; } z->c = z->lb; return 1; } extern 
struct SN_env * hungarian_ISO_8859_1_create_env(void) { return SN_create_env(0, 1, 0); } extern void hungarian_ISO_8859_1_close_env(struct SN_env * z) { SN_close_env(z, 0); } swish-e-2.4.7/src/snowball/stem_no.c0000664000077100017500000002337011166010110014240 00000000000000 /* This file was generated automatically by the Snowball to ANSI C compiler */ #include "header.h" #ifdef __cplusplus extern "C" { #endif extern int norwegian_ISO_8859_1_stem(struct SN_env * z); #ifdef __cplusplus } #endif static int r_other_suffix(struct SN_env * z); static int r_consonant_pair(struct SN_env * z); static int r_main_suffix(struct SN_env * z); static int r_mark_regions(struct SN_env * z); #ifdef __cplusplus extern "C" { #endif extern struct SN_env * norwegian_ISO_8859_1_create_env(void); extern void norwegian_ISO_8859_1_close_env(struct SN_env * z); #ifdef __cplusplus } #endif static const symbol s_0_0[1] = { 'a' }; static const symbol s_0_1[1] = { 'e' }; static const symbol s_0_2[3] = { 'e', 'd', 'e' }; static const symbol s_0_3[4] = { 'a', 'n', 'd', 'e' }; static const symbol s_0_4[4] = { 'e', 'n', 'd', 'e' }; static const symbol s_0_5[3] = { 'a', 'n', 'e' }; static const symbol s_0_6[3] = { 'e', 'n', 'e' }; static const symbol s_0_7[6] = { 'h', 'e', 't', 'e', 'n', 'e' }; static const symbol s_0_8[4] = { 'e', 'r', 't', 'e' }; static const symbol s_0_9[2] = { 'e', 'n' }; static const symbol s_0_10[5] = { 'h', 'e', 't', 'e', 'n' }; static const symbol s_0_11[2] = { 'a', 'r' }; static const symbol s_0_12[2] = { 'e', 'r' }; static const symbol s_0_13[5] = { 'h', 'e', 't', 'e', 'r' }; static const symbol s_0_14[1] = { 's' }; static const symbol s_0_15[2] = { 'a', 's' }; static const symbol s_0_16[2] = { 'e', 's' }; static const symbol s_0_17[4] = { 'e', 'd', 'e', 's' }; static const symbol s_0_18[5] = { 'e', 'n', 'd', 'e', 's' }; static const symbol s_0_19[4] = { 'e', 'n', 'e', 's' }; static const symbol s_0_20[7] = { 'h', 'e', 't', 'e', 'n', 'e', 's' }; static const symbol 
s_0_21[3] = { 'e', 'n', 's' }; static const symbol s_0_22[6] = { 'h', 'e', 't', 'e', 'n', 's' }; static const symbol s_0_23[3] = { 'e', 'r', 's' }; static const symbol s_0_24[3] = { 'e', 't', 's' }; static const symbol s_0_25[2] = { 'e', 't' }; static const symbol s_0_26[3] = { 'h', 'e', 't' }; static const symbol s_0_27[3] = { 'e', 'r', 't' }; static const symbol s_0_28[3] = { 'a', 's', 't' }; static const struct among a_0[29] = { /* 0 */ { 1, s_0_0, -1, 1, 0}, /* 1 */ { 1, s_0_1, -1, 1, 0}, /* 2 */ { 3, s_0_2, 1, 1, 0}, /* 3 */ { 4, s_0_3, 1, 1, 0}, /* 4 */ { 4, s_0_4, 1, 1, 0}, /* 5 */ { 3, s_0_5, 1, 1, 0}, /* 6 */ { 3, s_0_6, 1, 1, 0}, /* 7 */ { 6, s_0_7, 6, 1, 0}, /* 8 */ { 4, s_0_8, 1, 3, 0}, /* 9 */ { 2, s_0_9, -1, 1, 0}, /* 10 */ { 5, s_0_10, 9, 1, 0}, /* 11 */ { 2, s_0_11, -1, 1, 0}, /* 12 */ { 2, s_0_12, -1, 1, 0}, /* 13 */ { 5, s_0_13, 12, 1, 0}, /* 14 */ { 1, s_0_14, -1, 2, 0}, /* 15 */ { 2, s_0_15, 14, 1, 0}, /* 16 */ { 2, s_0_16, 14, 1, 0}, /* 17 */ { 4, s_0_17, 16, 1, 0}, /* 18 */ { 5, s_0_18, 16, 1, 0}, /* 19 */ { 4, s_0_19, 16, 1, 0}, /* 20 */ { 7, s_0_20, 19, 1, 0}, /* 21 */ { 3, s_0_21, 14, 1, 0}, /* 22 */ { 6, s_0_22, 21, 1, 0}, /* 23 */ { 3, s_0_23, 14, 1, 0}, /* 24 */ { 3, s_0_24, 14, 1, 0}, /* 25 */ { 2, s_0_25, -1, 1, 0}, /* 26 */ { 3, s_0_26, 25, 1, 0}, /* 27 */ { 3, s_0_27, -1, 3, 0}, /* 28 */ { 3, s_0_28, -1, 1, 0} }; static const symbol s_1_0[2] = { 'd', 't' }; static const symbol s_1_1[2] = { 'v', 't' }; static const struct among a_1[2] = { /* 0 */ { 2, s_1_0, -1, -1, 0}, /* 1 */ { 2, s_1_1, -1, -1, 0} }; static const symbol s_2_0[3] = { 'l', 'e', 'g' }; static const symbol s_2_1[4] = { 'e', 'l', 'e', 'g' }; static const symbol s_2_2[2] = { 'i', 'g' }; static const symbol s_2_3[3] = { 'e', 'i', 'g' }; static const symbol s_2_4[3] = { 'l', 'i', 'g' }; static const symbol s_2_5[4] = { 'e', 'l', 'i', 'g' }; static const symbol s_2_6[3] = { 'e', 'l', 's' }; static const symbol s_2_7[3] = { 'l', 'o', 'v' }; static const symbol s_2_8[4] = { 
'e', 'l', 'o', 'v' }; static const symbol s_2_9[4] = { 's', 'l', 'o', 'v' }; static const symbol s_2_10[7] = { 'h', 'e', 't', 's', 'l', 'o', 'v' }; static const struct among a_2[11] = { /* 0 */ { 3, s_2_0, -1, 1, 0}, /* 1 */ { 4, s_2_1, 0, 1, 0}, /* 2 */ { 2, s_2_2, -1, 1, 0}, /* 3 */ { 3, s_2_3, 2, 1, 0}, /* 4 */ { 3, s_2_4, 2, 1, 0}, /* 5 */ { 4, s_2_5, 4, 1, 0}, /* 6 */ { 3, s_2_6, -1, 1, 0}, /* 7 */ { 3, s_2_7, -1, 1, 0}, /* 8 */ { 4, s_2_8, 7, 1, 0}, /* 9 */ { 4, s_2_9, 7, 1, 0}, /* 10 */ { 7, s_2_10, 9, 1, 0} }; static const unsigned char g_v[] = { 17, 65, 16, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 48, 0, 128 }; static const unsigned char g_s_ending[] = { 119, 125, 149, 1 }; static const symbol s_0[] = { 'k' }; static const symbol s_1[] = { 'e', 'r' }; static int r_mark_regions(struct SN_env * z) { z->I[0] = z->l; { int c_test = z->c; /* test, line 30 */ { int ret = z->c + 3; if (0 > ret || ret > z->l) return 0; z->c = ret; /* hop, line 30 */ } z->I[1] = z->c; /* setmark x, line 30 */ z->c = c_test; } if (out_grouping(z, g_v, 97, 248, 1) < 0) return 0; /* goto */ /* grouping v, line 31 */ { /* gopast */ /* non v, line 31 */ int ret = in_grouping(z, g_v, 97, 248, 1); if (ret < 0) return 0; z->c += ret; } z->I[0] = z->c; /* setmark p1, line 31 */ /* try, line 32 */ if (!(z->I[0] < z->I[1])) goto lab0; z->I[0] = z->I[1]; lab0: return 1; } static int r_main_suffix(struct SN_env * z) { int among_var; { int mlimit; /* setlimit, line 38 */ int m1 = z->l - z->c; (void)m1; if (z->c < z->I[0]) return 0; z->c = z->I[0]; /* tomark, line 38 */ mlimit = z->lb; z->lb = z->c; z->c = z->l - m1; z->ket = z->c; /* [, line 38 */ if (z->c <= z->lb || z->p[z->c - 1] >> 5 != 3 || !((1851426 >> (z->p[z->c - 1] & 0x1f)) & 1)) { z->lb = mlimit; return 0; } among_var = find_among_b(z, a_0, 29); /* substring, line 38 */ if (!(among_var)) { z->lb = mlimit; return 0; } z->bra = z->c; /* ], line 38 */ z->lb = mlimit; } switch(among_var) { case 0: return 0; case 1: { int ret = slice_del(z); 
/* delete, line 44 */ if (ret < 0) return ret; } break; case 2: { int m2 = z->l - z->c; (void)m2; /* or, line 46 */ if (in_grouping_b(z, g_s_ending, 98, 122, 0)) goto lab1; goto lab0; lab1: z->c = z->l - m2; if (!(eq_s_b(z, 1, s_0))) return 0; if (out_grouping_b(z, g_v, 97, 248, 0)) return 0; } lab0: { int ret = slice_del(z); /* delete, line 46 */ if (ret < 0) return ret; } break; case 3: { int ret = slice_from_s(z, 2, s_1); /* <-, line 48 */ if (ret < 0) return ret; } break; } return 1; } static int r_consonant_pair(struct SN_env * z) { { int m_test = z->l - z->c; /* test, line 53 */ { int mlimit; /* setlimit, line 54 */ int m1 = z->l - z->c; (void)m1; if (z->c < z->I[0]) return 0; z->c = z->I[0]; /* tomark, line 54 */ mlimit = z->lb; z->lb = z->c; z->c = z->l - m1; z->ket = z->c; /* [, line 54 */ if (z->c - 1 <= z->lb || z->p[z->c - 1] != 116) { z->lb = mlimit; return 0; } if (!(find_among_b(z, a_1, 2))) { z->lb = mlimit; return 0; } /* substring, line 54 */ z->bra = z->c; /* ], line 54 */ z->lb = mlimit; } z->c = z->l - m_test; } if (z->c <= z->lb) return 0; z->c--; /* next, line 59 */ z->bra = z->c; /* ], line 59 */ { int ret = slice_del(z); /* delete, line 59 */ if (ret < 0) return ret; } return 1; } static int r_other_suffix(struct SN_env * z) { int among_var; { int mlimit; /* setlimit, line 63 */ int m1 = z->l - z->c; (void)m1; if (z->c < z->I[0]) return 0; z->c = z->I[0]; /* tomark, line 63 */ mlimit = z->lb; z->lb = z->c; z->c = z->l - m1; z->ket = z->c; /* [, line 63 */ if (z->c - 1 <= z->lb || z->p[z->c - 1] >> 5 != 3 || !((4718720 >> (z->p[z->c - 1] & 0x1f)) & 1)) { z->lb = mlimit; return 0; } among_var = find_among_b(z, a_2, 11); /* substring, line 63 */ if (!(among_var)) { z->lb = mlimit; return 0; } z->bra = z->c; /* ], line 63 */ z->lb = mlimit; } switch(among_var) { case 0: return 0; case 1: { int ret = slice_del(z); /* delete, line 67 */ if (ret < 0) return ret; } break; } return 1; } extern int norwegian_ISO_8859_1_stem(struct SN_env * z) { { int 
c1 = z->c; /* do, line 74 */ { int ret = r_mark_regions(z); if (ret == 0) goto lab0; /* call mark_regions, line 74 */ if (ret < 0) return ret; } lab0: z->c = c1; } z->lb = z->c; z->c = z->l; /* backwards, line 75 */ { int m2 = z->l - z->c; (void)m2; /* do, line 76 */ { int ret = r_main_suffix(z); if (ret == 0) goto lab1; /* call main_suffix, line 76 */ if (ret < 0) return ret; } lab1: z->c = z->l - m2; } { int m3 = z->l - z->c; (void)m3; /* do, line 77 */ { int ret = r_consonant_pair(z); if (ret == 0) goto lab2; /* call consonant_pair, line 77 */ if (ret < 0) return ret; } lab2: z->c = z->l - m3; } { int m4 = z->l - z->c; (void)m4; /* do, line 78 */ { int ret = r_other_suffix(z); if (ret == 0) goto lab3; /* call other_suffix, line 78 */ if (ret < 0) return ret; } lab3: z->c = z->l - m4; } z->c = z->lb; return 1; } extern struct SN_env * norwegian_ISO_8859_1_create_env(void) { return SN_create_env(0, 2, 0); } extern void norwegian_ISO_8859_1_close_env(struct SN_env * z) { SN_close_env(z, 0); } swish-e-2.4.7/src/snowball/stem_en2.c0000664000077100017500000011345311166010107014320 00000000000000 /* This file was generated automatically by the Snowball to ANSI C compiler */ #include "header.h" #ifdef __cplusplus extern "C" { #endif extern int english_ISO_8859_1_stem(struct SN_env * z); #ifdef __cplusplus } #endif static int r_exception2(struct SN_env * z); static int r_exception1(struct SN_env * z); static int r_Step_5(struct SN_env * z); static int r_Step_4(struct SN_env * z); static int r_Step_3(struct SN_env * z); static int r_Step_2(struct SN_env * z); static int r_Step_1c(struct SN_env * z); static int r_Step_1b(struct SN_env * z); static int r_Step_1a(struct SN_env * z); static int r_R2(struct SN_env * z); static int r_R1(struct SN_env * z); static int r_shortv(struct SN_env * z); static int r_mark_regions(struct SN_env * z); static int r_postlude(struct SN_env * z); static int r_prelude(struct SN_env * z); #ifdef __cplusplus extern "C" { #endif extern struct 
SN_env * english_ISO_8859_1_create_env(void); extern void english_ISO_8859_1_close_env(struct SN_env * z); #ifdef __cplusplus } #endif static const symbol s_0_0[5] = { 'a', 'r', 's', 'e', 'n' }; static const symbol s_0_1[6] = { 'c', 'o', 'm', 'm', 'u', 'n' }; static const symbol s_0_2[5] = { 'g', 'e', 'n', 'e', 'r' }; static const struct among a_0[3] = { /* 0 */ { 5, s_0_0, -1, -1, 0}, /* 1 */ { 6, s_0_1, -1, -1, 0}, /* 2 */ { 5, s_0_2, -1, -1, 0} }; static const symbol s_1_0[1] = { '\'' }; static const symbol s_1_1[3] = { '\'', 's', '\'' }; static const symbol s_1_2[2] = { '\'', 's' }; static const struct among a_1[3] = { /* 0 */ { 1, s_1_0, -1, 1, 0}, /* 1 */ { 3, s_1_1, 0, 1, 0}, /* 2 */ { 2, s_1_2, -1, 1, 0} }; static const symbol s_2_0[3] = { 'i', 'e', 'd' }; static const symbol s_2_1[1] = { 's' }; static const symbol s_2_2[3] = { 'i', 'e', 's' }; static const symbol s_2_3[4] = { 's', 's', 'e', 's' }; static const symbol s_2_4[2] = { 's', 's' }; static const symbol s_2_5[2] = { 'u', 's' }; static const struct among a_2[6] = { /* 0 */ { 3, s_2_0, -1, 2, 0}, /* 1 */ { 1, s_2_1, -1, 3, 0}, /* 2 */ { 3, s_2_2, 1, 2, 0}, /* 3 */ { 4, s_2_3, 1, 1, 0}, /* 4 */ { 2, s_2_4, 1, -1, 0}, /* 5 */ { 2, s_2_5, 1, -1, 0} }; static const symbol s_3_1[2] = { 'b', 'b' }; static const symbol s_3_2[2] = { 'd', 'd' }; static const symbol s_3_3[2] = { 'f', 'f' }; static const symbol s_3_4[2] = { 'g', 'g' }; static const symbol s_3_5[2] = { 'b', 'l' }; static const symbol s_3_6[2] = { 'm', 'm' }; static const symbol s_3_7[2] = { 'n', 'n' }; static const symbol s_3_8[2] = { 'p', 'p' }; static const symbol s_3_9[2] = { 'r', 'r' }; static const symbol s_3_10[2] = { 'a', 't' }; static const symbol s_3_11[2] = { 't', 't' }; static const symbol s_3_12[2] = { 'i', 'z' }; static const struct among a_3[13] = { /* 0 */ { 0, 0, -1, 3, 0}, /* 1 */ { 2, s_3_1, 0, 2, 0}, /* 2 */ { 2, s_3_2, 0, 2, 0}, /* 3 */ { 2, s_3_3, 0, 2, 0}, /* 4 */ { 2, s_3_4, 0, 2, 0}, /* 5 */ { 2, s_3_5, 0, 1, 0}, /* 6 */ 
{ 2, s_3_6, 0, 2, 0}, /* 7 */ { 2, s_3_7, 0, 2, 0}, /* 8 */ { 2, s_3_8, 0, 2, 0}, /* 9 */ { 2, s_3_9, 0, 2, 0}, /* 10 */ { 2, s_3_10, 0, 1, 0}, /* 11 */ { 2, s_3_11, 0, 2, 0}, /* 12 */ { 2, s_3_12, 0, 1, 0} }; static const symbol s_4_0[2] = { 'e', 'd' }; static const symbol s_4_1[3] = { 'e', 'e', 'd' }; static const symbol s_4_2[3] = { 'i', 'n', 'g' }; static const symbol s_4_3[4] = { 'e', 'd', 'l', 'y' }; static const symbol s_4_4[5] = { 'e', 'e', 'd', 'l', 'y' }; static const symbol s_4_5[5] = { 'i', 'n', 'g', 'l', 'y' }; static const struct among a_4[6] = { /* 0 */ { 2, s_4_0, -1, 2, 0}, /* 1 */ { 3, s_4_1, 0, 1, 0}, /* 2 */ { 3, s_4_2, -1, 2, 0}, /* 3 */ { 4, s_4_3, -1, 2, 0}, /* 4 */ { 5, s_4_4, 3, 1, 0}, /* 5 */ { 5, s_4_5, -1, 2, 0} }; static const symbol s_5_0[4] = { 'a', 'n', 'c', 'i' }; static const symbol s_5_1[4] = { 'e', 'n', 'c', 'i' }; static const symbol s_5_2[3] = { 'o', 'g', 'i' }; static const symbol s_5_3[2] = { 'l', 'i' }; static const symbol s_5_4[3] = { 'b', 'l', 'i' }; static const symbol s_5_5[4] = { 'a', 'b', 'l', 'i' }; static const symbol s_5_6[4] = { 'a', 'l', 'l', 'i' }; static const symbol s_5_7[5] = { 'f', 'u', 'l', 'l', 'i' }; static const symbol s_5_8[6] = { 'l', 'e', 's', 's', 'l', 'i' }; static const symbol s_5_9[5] = { 'o', 'u', 's', 'l', 'i' }; static const symbol s_5_10[5] = { 'e', 'n', 't', 'l', 'i' }; static const symbol s_5_11[5] = { 'a', 'l', 'i', 't', 'i' }; static const symbol s_5_12[6] = { 'b', 'i', 'l', 'i', 't', 'i' }; static const symbol s_5_13[5] = { 'i', 'v', 'i', 't', 'i' }; static const symbol s_5_14[6] = { 't', 'i', 'o', 'n', 'a', 'l' }; static const symbol s_5_15[7] = { 'a', 't', 'i', 'o', 'n', 'a', 'l' }; static const symbol s_5_16[5] = { 'a', 'l', 'i', 's', 'm' }; static const symbol s_5_17[5] = { 'a', 't', 'i', 'o', 'n' }; static const symbol s_5_18[7] = { 'i', 'z', 'a', 't', 'i', 'o', 'n' }; static const symbol s_5_19[4] = { 'i', 'z', 'e', 'r' }; static const symbol s_5_20[4] = { 'a', 't', 'o', 'r' }; 
static const symbol s_5_21[7] = { 'i', 'v', 'e', 'n', 'e', 's', 's' }; static const symbol s_5_22[7] = { 'f', 'u', 'l', 'n', 'e', 's', 's' }; static const symbol s_5_23[7] = { 'o', 'u', 's', 'n', 'e', 's', 's' }; static const struct among a_5[24] = { /* 0 */ { 4, s_5_0, -1, 3, 0}, /* 1 */ { 4, s_5_1, -1, 2, 0}, /* 2 */ { 3, s_5_2, -1, 13, 0}, /* 3 */ { 2, s_5_3, -1, 16, 0}, /* 4 */ { 3, s_5_4, 3, 12, 0}, /* 5 */ { 4, s_5_5, 4, 4, 0}, /* 6 */ { 4, s_5_6, 3, 8, 0}, /* 7 */ { 5, s_5_7, 3, 14, 0}, /* 8 */ { 6, s_5_8, 3, 15, 0}, /* 9 */ { 5, s_5_9, 3, 10, 0}, /* 10 */ { 5, s_5_10, 3, 5, 0}, /* 11 */ { 5, s_5_11, -1, 8, 0}, /* 12 */ { 6, s_5_12, -1, 12, 0}, /* 13 */ { 5, s_5_13, -1, 11, 0}, /* 14 */ { 6, s_5_14, -1, 1, 0}, /* 15 */ { 7, s_5_15, 14, 7, 0}, /* 16 */ { 5, s_5_16, -1, 8, 0}, /* 17 */ { 5, s_5_17, -1, 7, 0}, /* 18 */ { 7, s_5_18, 17, 6, 0}, /* 19 */ { 4, s_5_19, -1, 6, 0}, /* 20 */ { 4, s_5_20, -1, 7, 0}, /* 21 */ { 7, s_5_21, -1, 11, 0}, /* 22 */ { 7, s_5_22, -1, 9, 0}, /* 23 */ { 7, s_5_23, -1, 10, 0} }; static const symbol s_6_0[5] = { 'i', 'c', 'a', 't', 'e' }; static const symbol s_6_1[5] = { 'a', 't', 'i', 'v', 'e' }; static const symbol s_6_2[5] = { 'a', 'l', 'i', 'z', 'e' }; static const symbol s_6_3[5] = { 'i', 'c', 'i', 't', 'i' }; static const symbol s_6_4[4] = { 'i', 'c', 'a', 'l' }; static const symbol s_6_5[6] = { 't', 'i', 'o', 'n', 'a', 'l' }; static const symbol s_6_6[7] = { 'a', 't', 'i', 'o', 'n', 'a', 'l' }; static const symbol s_6_7[3] = { 'f', 'u', 'l' }; static const symbol s_6_8[4] = { 'n', 'e', 's', 's' }; static const struct among a_6[9] = { /* 0 */ { 5, s_6_0, -1, 4, 0}, /* 1 */ { 5, s_6_1, -1, 6, 0}, /* 2 */ { 5, s_6_2, -1, 3, 0}, /* 3 */ { 5, s_6_3, -1, 4, 0}, /* 4 */ { 4, s_6_4, -1, 4, 0}, /* 5 */ { 6, s_6_5, -1, 1, 0}, /* 6 */ { 7, s_6_6, 5, 2, 0}, /* 7 */ { 3, s_6_7, -1, 5, 0}, /* 8 */ { 4, s_6_8, -1, 5, 0} }; static const symbol s_7_0[2] = { 'i', 'c' }; static const symbol s_7_1[4] = { 'a', 'n', 'c', 'e' }; static const symbol 
s_7_2[4] = { 'e', 'n', 'c', 'e' }; static const symbol s_7_3[4] = { 'a', 'b', 'l', 'e' }; static const symbol s_7_4[4] = { 'i', 'b', 'l', 'e' }; static const symbol s_7_5[3] = { 'a', 't', 'e' }; static const symbol s_7_6[3] = { 'i', 'v', 'e' }; static const symbol s_7_7[3] = { 'i', 'z', 'e' }; static const symbol s_7_8[3] = { 'i', 't', 'i' }; static const symbol s_7_9[2] = { 'a', 'l' }; static const symbol s_7_10[3] = { 'i', 's', 'm' }; static const symbol s_7_11[3] = { 'i', 'o', 'n' }; static const symbol s_7_12[2] = { 'e', 'r' }; static const symbol s_7_13[3] = { 'o', 'u', 's' }; static const symbol s_7_14[3] = { 'a', 'n', 't' }; static const symbol s_7_15[3] = { 'e', 'n', 't' }; static const symbol s_7_16[4] = { 'm', 'e', 'n', 't' }; static const symbol s_7_17[5] = { 'e', 'm', 'e', 'n', 't' }; static const struct among a_7[18] = { /* 0 */ { 2, s_7_0, -1, 1, 0}, /* 1 */ { 4, s_7_1, -1, 1, 0}, /* 2 */ { 4, s_7_2, -1, 1, 0}, /* 3 */ { 4, s_7_3, -1, 1, 0}, /* 4 */ { 4, s_7_4, -1, 1, 0}, /* 5 */ { 3, s_7_5, -1, 1, 0}, /* 6 */ { 3, s_7_6, -1, 1, 0}, /* 7 */ { 3, s_7_7, -1, 1, 0}, /* 8 */ { 3, s_7_8, -1, 1, 0}, /* 9 */ { 2, s_7_9, -1, 1, 0}, /* 10 */ { 3, s_7_10, -1, 1, 0}, /* 11 */ { 3, s_7_11, -1, 2, 0}, /* 12 */ { 2, s_7_12, -1, 1, 0}, /* 13 */ { 3, s_7_13, -1, 1, 0}, /* 14 */ { 3, s_7_14, -1, 1, 0}, /* 15 */ { 3, s_7_15, -1, 1, 0}, /* 16 */ { 4, s_7_16, 15, 1, 0}, /* 17 */ { 5, s_7_17, 16, 1, 0} }; static const symbol s_8_0[1] = { 'e' }; static const symbol s_8_1[1] = { 'l' }; static const struct among a_8[2] = { /* 0 */ { 1, s_8_0, -1, 1, 0}, /* 1 */ { 1, s_8_1, -1, 2, 0} }; static const symbol s_9_0[7] = { 's', 'u', 'c', 'c', 'e', 'e', 'd' }; static const symbol s_9_1[7] = { 'p', 'r', 'o', 'c', 'e', 'e', 'd' }; static const symbol s_9_2[6] = { 'e', 'x', 'c', 'e', 'e', 'd' }; static const symbol s_9_3[7] = { 'c', 'a', 'n', 'n', 'i', 'n', 'g' }; static const symbol s_9_4[6] = { 'i', 'n', 'n', 'i', 'n', 'g' }; static const symbol s_9_5[7] = { 'e', 'a', 'r', 'r', 
'i', 'n', 'g' }; static const symbol s_9_6[7] = { 'h', 'e', 'r', 'r', 'i', 'n', 'g' }; static const symbol s_9_7[6] = { 'o', 'u', 't', 'i', 'n', 'g' }; static const struct among a_9[8] = { /* 0 */ { 7, s_9_0, -1, -1, 0}, /* 1 */ { 7, s_9_1, -1, -1, 0}, /* 2 */ { 6, s_9_2, -1, -1, 0}, /* 3 */ { 7, s_9_3, -1, -1, 0}, /* 4 */ { 6, s_9_4, -1, -1, 0}, /* 5 */ { 7, s_9_5, -1, -1, 0}, /* 6 */ { 7, s_9_6, -1, -1, 0}, /* 7 */ { 6, s_9_7, -1, -1, 0} }; static const symbol s_10_0[5] = { 'a', 'n', 'd', 'e', 's' }; static const symbol s_10_1[5] = { 'a', 't', 'l', 'a', 's' }; static const symbol s_10_2[4] = { 'b', 'i', 'a', 's' }; static const symbol s_10_3[6] = { 'c', 'o', 's', 'm', 'o', 's' }; static const symbol s_10_4[5] = { 'd', 'y', 'i', 'n', 'g' }; static const symbol s_10_5[5] = { 'e', 'a', 'r', 'l', 'y' }; static const symbol s_10_6[6] = { 'g', 'e', 'n', 't', 'l', 'y' }; static const symbol s_10_7[4] = { 'h', 'o', 'w', 'e' }; static const symbol s_10_8[4] = { 'i', 'd', 'l', 'y' }; static const symbol s_10_9[5] = { 'l', 'y', 'i', 'n', 'g' }; static const symbol s_10_10[4] = { 'n', 'e', 'w', 's' }; static const symbol s_10_11[4] = { 'o', 'n', 'l', 'y' }; static const symbol s_10_12[6] = { 's', 'i', 'n', 'g', 'l', 'y' }; static const symbol s_10_13[5] = { 's', 'k', 'i', 'e', 's' }; static const symbol s_10_14[4] = { 's', 'k', 'i', 's' }; static const symbol s_10_15[3] = { 's', 'k', 'y' }; static const symbol s_10_16[5] = { 't', 'y', 'i', 'n', 'g' }; static const symbol s_10_17[4] = { 'u', 'g', 'l', 'y' }; static const struct among a_10[18] = { /* 0 */ { 5, s_10_0, -1, -1, 0}, /* 1 */ { 5, s_10_1, -1, -1, 0}, /* 2 */ { 4, s_10_2, -1, -1, 0}, /* 3 */ { 6, s_10_3, -1, -1, 0}, /* 4 */ { 5, s_10_4, -1, 3, 0}, /* 5 */ { 5, s_10_5, -1, 9, 0}, /* 6 */ { 6, s_10_6, -1, 7, 0}, /* 7 */ { 4, s_10_7, -1, -1, 0}, /* 8 */ { 4, s_10_8, -1, 6, 0}, /* 9 */ { 5, s_10_9, -1, 4, 0}, /* 10 */ { 4, s_10_10, -1, -1, 0}, /* 11 */ { 4, s_10_11, -1, 10, 0}, /* 12 */ { 6, s_10_12, -1, 11, 0}, /* 13 
*/ { 5, s_10_13, -1, 2, 0}, /* 14 */ { 4, s_10_14, -1, 1, 0}, /* 15 */ { 3, s_10_15, -1, -1, 0}, /* 16 */ { 5, s_10_16, -1, 5, 0}, /* 17 */ { 4, s_10_17, -1, 8, 0} }; static const unsigned char g_v[] = { 17, 65, 16, 1 }; static const unsigned char g_v_WXY[] = { 1, 17, 65, 208, 1 }; static const unsigned char g_valid_LI[] = { 55, 141, 2 }; static const symbol s_0[] = { '\'' }; static const symbol s_1[] = { 'y' }; static const symbol s_2[] = { 'Y' }; static const symbol s_3[] = { 'y' }; static const symbol s_4[] = { 'Y' }; static const symbol s_5[] = { 's', 's' }; static const symbol s_6[] = { 'i' }; static const symbol s_7[] = { 'i', 'e' }; static const symbol s_8[] = { 'e', 'e' }; static const symbol s_9[] = { 'e' }; static const symbol s_10[] = { 'e' }; static const symbol s_11[] = { 'y' }; static const symbol s_12[] = { 'Y' }; static const symbol s_13[] = { 'i' }; static const symbol s_14[] = { 't', 'i', 'o', 'n' }; static const symbol s_15[] = { 'e', 'n', 'c', 'e' }; static const symbol s_16[] = { 'a', 'n', 'c', 'e' }; static const symbol s_17[] = { 'a', 'b', 'l', 'e' }; static const symbol s_18[] = { 'e', 'n', 't' }; static const symbol s_19[] = { 'i', 'z', 'e' }; static const symbol s_20[] = { 'a', 't', 'e' }; static const symbol s_21[] = { 'a', 'l' }; static const symbol s_22[] = { 'f', 'u', 'l' }; static const symbol s_23[] = { 'o', 'u', 's' }; static const symbol s_24[] = { 'i', 'v', 'e' }; static const symbol s_25[] = { 'b', 'l', 'e' }; static const symbol s_26[] = { 'l' }; static const symbol s_27[] = { 'o', 'g' }; static const symbol s_28[] = { 'f', 'u', 'l' }; static const symbol s_29[] = { 'l', 'e', 's', 's' }; static const symbol s_30[] = { 't', 'i', 'o', 'n' }; static const symbol s_31[] = { 'a', 't', 'e' }; static const symbol s_32[] = { 'a', 'l' }; static const symbol s_33[] = { 'i', 'c' }; static const symbol s_34[] = { 's' }; static const symbol s_35[] = { 't' }; static const symbol s_36[] = { 'l' }; static const symbol s_37[] = { 's', 'k', 'i' 
}; static const symbol s_38[] = { 's', 'k', 'y' }; static const symbol s_39[] = { 'd', 'i', 'e' }; static const symbol s_40[] = { 'l', 'i', 'e' }; static const symbol s_41[] = { 't', 'i', 'e' }; static const symbol s_42[] = { 'i', 'd', 'l' }; static const symbol s_43[] = { 'g', 'e', 'n', 't', 'l' }; static const symbol s_44[] = { 'u', 'g', 'l', 'i' }; static const symbol s_45[] = { 'e', 'a', 'r', 'l', 'i' }; static const symbol s_46[] = { 'o', 'n', 'l', 'i' }; static const symbol s_47[] = { 's', 'i', 'n', 'g', 'l' }; static const symbol s_48[] = { 'Y' }; static const symbol s_49[] = { 'y' }; static int r_prelude(struct SN_env * z) { z->B[0] = 0; /* unset Y_found, line 26 */ { int c1 = z->c; /* do, line 27 */ z->bra = z->c; /* [, line 27 */ if (!(eq_s(z, 1, s_0))) goto lab0; z->ket = z->c; /* ], line 27 */ { int ret = slice_del(z); /* delete, line 27 */ if (ret < 0) return ret; } lab0: z->c = c1; } { int c2 = z->c; /* do, line 28 */ z->bra = z->c; /* [, line 28 */ if (!(eq_s(z, 1, s_1))) goto lab1; z->ket = z->c; /* ], line 28 */ { int ret = slice_from_s(z, 1, s_2); /* <-, line 28 */ if (ret < 0) return ret; } z->B[0] = 1; /* set Y_found, line 28 */ lab1: z->c = c2; } { int c3 = z->c; /* do, line 29 */ while(1) { /* repeat, line 29 */ int c4 = z->c; while(1) { /* goto, line 29 */ int c5 = z->c; if (in_grouping(z, g_v, 97, 121, 0)) goto lab4; z->bra = z->c; /* [, line 29 */ if (!(eq_s(z, 1, s_3))) goto lab4; z->ket = z->c; /* ], line 29 */ z->c = c5; break; lab4: z->c = c5; if (z->c >= z->l) goto lab3; z->c++; /* goto, line 29 */ } { int ret = slice_from_s(z, 1, s_4); /* <-, line 29 */ if (ret < 0) return ret; } z->B[0] = 1; /* set Y_found, line 29 */ continue; lab3: z->c = c4; break; } z->c = c3; } return 1; } static int r_mark_regions(struct SN_env * z) { z->I[0] = z->l; z->I[1] = z->l; { int c1 = z->c; /* do, line 35 */ { int c2 = z->c; /* or, line 41 */ if (z->c + 4 >= z->l || z->p[z->c + 4] >> 5 != 3 || !((2375680 >> (z->p[z->c + 4] & 0x1f)) & 1)) goto lab2; if 
(!(find_among(z, a_0, 3))) goto lab2; /* among, line 36 */ goto lab1; lab2: z->c = c2; { /* gopast */ /* grouping v, line 41 */ int ret = out_grouping(z, g_v, 97, 121, 1); if (ret < 0) goto lab0; z->c += ret; } { /* gopast */ /* non v, line 41 */ int ret = in_grouping(z, g_v, 97, 121, 1); if (ret < 0) goto lab0; z->c += ret; } } lab1: z->I[0] = z->c; /* setmark p1, line 42 */ { /* gopast */ /* grouping v, line 43 */ int ret = out_grouping(z, g_v, 97, 121, 1); if (ret < 0) goto lab0; z->c += ret; } { /* gopast */ /* non v, line 43 */ int ret = in_grouping(z, g_v, 97, 121, 1); if (ret < 0) goto lab0; z->c += ret; } z->I[1] = z->c; /* setmark p2, line 43 */ lab0: z->c = c1; } return 1; } static int r_shortv(struct SN_env * z) { { int m1 = z->l - z->c; (void)m1; /* or, line 51 */ if (out_grouping_b(z, g_v_WXY, 89, 121, 0)) goto lab1; if (in_grouping_b(z, g_v, 97, 121, 0)) goto lab1; if (out_grouping_b(z, g_v, 97, 121, 0)) goto lab1; goto lab0; lab1: z->c = z->l - m1; if (out_grouping_b(z, g_v, 97, 121, 0)) return 0; if (in_grouping_b(z, g_v, 97, 121, 0)) return 0; if (z->c > z->lb) return 0; /* atlimit, line 52 */ } lab0: return 1; } static int r_R1(struct SN_env * z) { if (!(z->I[0] <= z->c)) return 0; return 1; } static int r_R2(struct SN_env * z) { if (!(z->I[1] <= z->c)) return 0; return 1; } static int r_Step_1a(struct SN_env * z) { int among_var; { int m_keep = z->l - z->c;/* (void) m_keep;*/ /* try, line 59 */ z->ket = z->c; /* [, line 60 */ if (z->c <= z->lb || (z->p[z->c - 1] != 39 && z->p[z->c - 1] != 115)) { z->c = z->l - m_keep; goto lab0; } among_var = find_among_b(z, a_1, 3); /* substring, line 60 */ if (!(among_var)) { z->c = z->l - m_keep; goto lab0; } z->bra = z->c; /* ], line 60 */ switch(among_var) { case 0: { z->c = z->l - m_keep; goto lab0; } case 1: { int ret = slice_del(z); /* delete, line 62 */ if (ret < 0) return ret; } break; } lab0: ; } z->ket = z->c; /* [, line 65 */ if (z->c <= z->lb || (z->p[z->c - 1] != 100 && z->p[z->c - 1] != 115)) 
return 0; among_var = find_among_b(z, a_2, 6); /* substring, line 65 */ if (!(among_var)) return 0; z->bra = z->c; /* ], line 65 */ switch(among_var) { case 0: return 0; case 1: { int ret = slice_from_s(z, 2, s_5); /* <-, line 66 */ if (ret < 0) return ret; } break; case 2: { int m1 = z->l - z->c; (void)m1; /* or, line 68 */ { int ret = z->c - 2; if (z->lb > ret || ret > z->l) goto lab2; z->c = ret; /* hop, line 68 */ } { int ret = slice_from_s(z, 1, s_6); /* <-, line 68 */ if (ret < 0) return ret; } goto lab1; lab2: z->c = z->l - m1; { int ret = slice_from_s(z, 2, s_7); /* <-, line 68 */ if (ret < 0) return ret; } } lab1: break; case 3: if (z->c <= z->lb) return 0; z->c--; /* next, line 69 */ { /* gopast */ /* grouping v, line 69 */ int ret = out_grouping_b(z, g_v, 97, 121, 1); if (ret < 0) return 0; z->c -= ret; } { int ret = slice_del(z); /* delete, line 69 */ if (ret < 0) return ret; } break; } return 1; } static int r_Step_1b(struct SN_env * z) { int among_var; z->ket = z->c; /* [, line 75 */ if (z->c - 1 <= z->lb || z->p[z->c - 1] >> 5 != 3 || !((33554576 >> (z->p[z->c - 1] & 0x1f)) & 1)) return 0; among_var = find_among_b(z, a_4, 6); /* substring, line 75 */ if (!(among_var)) return 0; z->bra = z->c; /* ], line 75 */ switch(among_var) { case 0: return 0; case 1: { int ret = r_R1(z); if (ret == 0) return 0; /* call R1, line 77 */ if (ret < 0) return ret; } { int ret = slice_from_s(z, 2, s_8); /* <-, line 77 */ if (ret < 0) return ret; } break; case 2: { int m_test = z->l - z->c; /* test, line 80 */ { /* gopast */ /* grouping v, line 80 */ int ret = out_grouping_b(z, g_v, 97, 121, 1); if (ret < 0) return 0; z->c -= ret; } z->c = z->l - m_test; } { int ret = slice_del(z); /* delete, line 80 */ if (ret < 0) return ret; } { int m_test = z->l - z->c; /* test, line 81 */ if (z->c - 1 <= z->lb || z->p[z->c - 1] >> 5 != 3 || !((68514004 >> (z->p[z->c - 1] & 0x1f)) & 1)) among_var = 3; else among_var = find_among_b(z, a_3, 13); /* substring, line 81 */ if 
(!(among_var)) return 0; z->c = z->l - m_test; } switch(among_var) { case 0: return 0; case 1: { int c_keep = z->c; int ret = insert_s(z, z->c, z->c, 1, s_9); /* <+, line 83 */ z->c = c_keep; if (ret < 0) return ret; } break; case 2: z->ket = z->c; /* [, line 86 */ if (z->c <= z->lb) return 0; z->c--; /* next, line 86 */ z->bra = z->c; /* ], line 86 */ { int ret = slice_del(z); /* delete, line 86 */ if (ret < 0) return ret; } break; case 3: if (z->c != z->I[0]) return 0; /* atmark, line 87 */ { int m_test = z->l - z->c; /* test, line 87 */ { int ret = r_shortv(z); if (ret == 0) return 0; /* call shortv, line 87 */ if (ret < 0) return ret; } z->c = z->l - m_test; } { int c_keep = z->c; int ret = insert_s(z, z->c, z->c, 1, s_10); /* <+, line 87 */ z->c = c_keep; if (ret < 0) return ret; } break; } break; } return 1; } static int r_Step_1c(struct SN_env * z) { z->ket = z->c; /* [, line 94 */ { int m1 = z->l - z->c; (void)m1; /* or, line 94 */ if (!(eq_s_b(z, 1, s_11))) goto lab1; goto lab0; lab1: z->c = z->l - m1; if (!(eq_s_b(z, 1, s_12))) return 0; } lab0: z->bra = z->c; /* ], line 94 */ if (out_grouping_b(z, g_v, 97, 121, 0)) return 0; { int m2 = z->l - z->c; (void)m2; /* not, line 95 */ if (z->c > z->lb) goto lab2; /* atlimit, line 95 */ return 0; lab2: z->c = z->l - m2; } { int ret = slice_from_s(z, 1, s_13); /* <-, line 96 */ if (ret < 0) return ret; } return 1; } static int r_Step_2(struct SN_env * z) { int among_var; z->ket = z->c; /* [, line 100 */ if (z->c - 1 <= z->lb || z->p[z->c - 1] >> 5 != 3 || !((815616 >> (z->p[z->c - 1] & 0x1f)) & 1)) return 0; among_var = find_among_b(z, a_5, 24); /* substring, line 100 */ if (!(among_var)) return 0; z->bra = z->c; /* ], line 100 */ { int ret = r_R1(z); if (ret == 0) return 0; /* call R1, line 100 */ if (ret < 0) return ret; } switch(among_var) { case 0: return 0; case 1: { int ret = slice_from_s(z, 4, s_14); /* <-, line 101 */ if (ret < 0) return ret; } break; case 2: { int ret = slice_from_s(z, 4, s_15); /* <-, 
line 102 */ if (ret < 0) return ret; } break; case 3: { int ret = slice_from_s(z, 4, s_16); /* <-, line 103 */ if (ret < 0) return ret; } break; case 4: { int ret = slice_from_s(z, 4, s_17); /* <-, line 104 */ if (ret < 0) return ret; } break; case 5: { int ret = slice_from_s(z, 3, s_18); /* <-, line 105 */ if (ret < 0) return ret; } break; case 6: { int ret = slice_from_s(z, 3, s_19); /* <-, line 107 */ if (ret < 0) return ret; } break; case 7: { int ret = slice_from_s(z, 3, s_20); /* <-, line 109 */ if (ret < 0) return ret; } break; case 8: { int ret = slice_from_s(z, 2, s_21); /* <-, line 111 */ if (ret < 0) return ret; } break; case 9: { int ret = slice_from_s(z, 3, s_22); /* <-, line 112 */ if (ret < 0) return ret; } break; case 10: { int ret = slice_from_s(z, 3, s_23); /* <-, line 114 */ if (ret < 0) return ret; } break; case 11: { int ret = slice_from_s(z, 3, s_24); /* <-, line 116 */ if (ret < 0) return ret; } break; case 12: { int ret = slice_from_s(z, 3, s_25); /* <-, line 118 */ if (ret < 0) return ret; } break; case 13: if (!(eq_s_b(z, 1, s_26))) return 0; { int ret = slice_from_s(z, 2, s_27); /* <-, line 119 */ if (ret < 0) return ret; } break; case 14: { int ret = slice_from_s(z, 3, s_28); /* <-, line 120 */ if (ret < 0) return ret; } break; case 15: { int ret = slice_from_s(z, 4, s_29); /* <-, line 121 */ if (ret < 0) return ret; } break; case 16: if (in_grouping_b(z, g_valid_LI, 99, 116, 0)) return 0; { int ret = slice_del(z); /* delete, line 122 */ if (ret < 0) return ret; } break; } return 1; } static int r_Step_3(struct SN_env * z) { int among_var; z->ket = z->c; /* [, line 127 */ if (z->c - 2 <= z->lb || z->p[z->c - 1] >> 5 != 3 || !((528928 >> (z->p[z->c - 1] & 0x1f)) & 1)) return 0; among_var = find_among_b(z, a_6, 9); /* substring, line 127 */ if (!(among_var)) return 0; z->bra = z->c; /* ], line 127 */ { int ret = r_R1(z); if (ret == 0) return 0; /* call R1, line 127 */ if (ret < 0) return ret; } switch(among_var) { case 0: return 0; case 1: 
{ int ret = slice_from_s(z, 4, s_30); /* <-, line 128 */ if (ret < 0) return ret; } break; case 2: { int ret = slice_from_s(z, 3, s_31); /* <-, line 129 */ if (ret < 0) return ret; } break; case 3: { int ret = slice_from_s(z, 2, s_32); /* <-, line 130 */ if (ret < 0) return ret; } break; case 4: { int ret = slice_from_s(z, 2, s_33); /* <-, line 132 */ if (ret < 0) return ret; } break; case 5: { int ret = slice_del(z); /* delete, line 134 */ if (ret < 0) return ret; } break; case 6: { int ret = r_R2(z); if (ret == 0) return 0; /* call R2, line 136 */ if (ret < 0) return ret; } { int ret = slice_del(z); /* delete, line 136 */ if (ret < 0) return ret; } break; } return 1; } static int r_Step_4(struct SN_env * z) { int among_var; z->ket = z->c; /* [, line 141 */ if (z->c - 1 <= z->lb || z->p[z->c - 1] >> 5 != 3 || !((1864232 >> (z->p[z->c - 1] & 0x1f)) & 1)) return 0; among_var = find_among_b(z, a_7, 18); /* substring, line 141 */ if (!(among_var)) return 0; z->bra = z->c; /* ], line 141 */ { int ret = r_R2(z); if (ret == 0) return 0; /* call R2, line 141 */ if (ret < 0) return ret; } switch(among_var) { case 0: return 0; case 1: { int ret = slice_del(z); /* delete, line 144 */ if (ret < 0) return ret; } break; case 2: { int m1 = z->l - z->c; (void)m1; /* or, line 145 */ if (!(eq_s_b(z, 1, s_34))) goto lab1; goto lab0; lab1: z->c = z->l - m1; if (!(eq_s_b(z, 1, s_35))) return 0; } lab0: { int ret = slice_del(z); /* delete, line 145 */ if (ret < 0) return ret; } break; } return 1; } static int r_Step_5(struct SN_env * z) { int among_var; z->ket = z->c; /* [, line 150 */ if (z->c <= z->lb || (z->p[z->c - 1] != 101 && z->p[z->c - 1] != 108)) return 0; among_var = find_among_b(z, a_8, 2); /* substring, line 150 */ if (!(among_var)) return 0; z->bra = z->c; /* ], line 150 */ switch(among_var) { case 0: return 0; case 1: { int m1 = z->l - z->c; (void)m1; /* or, line 151 */ { int ret = r_R2(z); if (ret == 0) goto lab1; /* call R2, line 151 */ if (ret < 0) return ret; } goto 
lab0; lab1: z->c = z->l - m1; { int ret = r_R1(z); if (ret == 0) return 0; /* call R1, line 151 */ if (ret < 0) return ret; } { int m2 = z->l - z->c; (void)m2; /* not, line 151 */ { int ret = r_shortv(z); if (ret == 0) goto lab2; /* call shortv, line 151 */ if (ret < 0) return ret; } return 0; lab2: z->c = z->l - m2; } } lab0: { int ret = slice_del(z); /* delete, line 151 */ if (ret < 0) return ret; } break; case 2: { int ret = r_R2(z); if (ret == 0) return 0; /* call R2, line 152 */ if (ret < 0) return ret; } if (!(eq_s_b(z, 1, s_36))) return 0; { int ret = slice_del(z); /* delete, line 152 */ if (ret < 0) return ret; } break; } return 1; } static int r_exception2(struct SN_env * z) { z->ket = z->c; /* [, line 158 */ if (z->c - 5 <= z->lb || (z->p[z->c - 1] != 100 && z->p[z->c - 1] != 103)) return 0; if (!(find_among_b(z, a_9, 8))) return 0; /* substring, line 158 */ z->bra = z->c; /* ], line 158 */ if (z->c > z->lb) return 0; /* atlimit, line 158 */ return 1; } static int r_exception1(struct SN_env * z) { int among_var; z->bra = z->c; /* [, line 170 */ if (z->c + 2 >= z->l || z->p[z->c + 2] >> 5 != 3 || !((42750482 >> (z->p[z->c + 2] & 0x1f)) & 1)) return 0; among_var = find_among(z, a_10, 18); /* substring, line 170 */ if (!(among_var)) return 0; z->ket = z->c; /* ], line 170 */ if (z->c < z->l) return 0; /* atlimit, line 170 */ switch(among_var) { case 0: return 0; case 1: { int ret = slice_from_s(z, 3, s_37); /* <-, line 174 */ if (ret < 0) return ret; } break; case 2: { int ret = slice_from_s(z, 3, s_38); /* <-, line 175 */ if (ret < 0) return ret; } break; case 3: { int ret = slice_from_s(z, 3, s_39); /* <-, line 176 */ if (ret < 0) return ret; } break; case 4: { int ret = slice_from_s(z, 3, s_40); /* <-, line 177 */ if (ret < 0) return ret; } break; case 5: { int ret = slice_from_s(z, 3, s_41); /* <-, line 178 */ if (ret < 0) return ret; } break; case 6: { int ret = slice_from_s(z, 3, s_42); /* <-, line 182 */ if (ret < 0) return ret; } break; case 7: { int 
ret = slice_from_s(z, 5, s_43); /* <-, line 183 */ if (ret < 0) return ret; } break; case 8: { int ret = slice_from_s(z, 4, s_44); /* <-, line 184 */ if (ret < 0) return ret; } break; case 9: { int ret = slice_from_s(z, 5, s_45); /* <-, line 185 */ if (ret < 0) return ret; } break; case 10: { int ret = slice_from_s(z, 4, s_46); /* <-, line 186 */ if (ret < 0) return ret; } break; case 11: { int ret = slice_from_s(z, 5, s_47); /* <-, line 187 */ if (ret < 0) return ret; } break; } return 1; } static int r_postlude(struct SN_env * z) { if (!(z->B[0])) return 0; /* Boolean test Y_found, line 203 */ while(1) { /* repeat, line 203 */ int c1 = z->c; while(1) { /* goto, line 203 */ int c2 = z->c; z->bra = z->c; /* [, line 203 */ if (!(eq_s(z, 1, s_48))) goto lab1; z->ket = z->c; /* ], line 203 */ z->c = c2; break; lab1: z->c = c2; if (z->c >= z->l) goto lab0; z->c++; /* goto, line 203 */ } { int ret = slice_from_s(z, 1, s_49); /* <-, line 203 */ if (ret < 0) return ret; } continue; lab0: z->c = c1; break; } return 1; } extern int english_ISO_8859_1_stem(struct SN_env * z) { { int c1 = z->c; /* or, line 207 */ { int ret = r_exception1(z); if (ret == 0) goto lab1; /* call exception1, line 207 */ if (ret < 0) return ret; } goto lab0; lab1: z->c = c1; { int c2 = z->c; /* not, line 208 */ { int ret = z->c + 3; if (0 > ret || ret > z->l) goto lab3; z->c = ret; /* hop, line 208 */ } goto lab2; lab3: z->c = c2; } goto lab0; lab2: z->c = c1; { int c3 = z->c; /* do, line 209 */ { int ret = r_prelude(z); if (ret == 0) goto lab4; /* call prelude, line 209 */ if (ret < 0) return ret; } lab4: z->c = c3; } { int c4 = z->c; /* do, line 210 */ { int ret = r_mark_regions(z); if (ret == 0) goto lab5; /* call mark_regions, line 210 */ if (ret < 0) return ret; } lab5: z->c = c4; } z->lb = z->c; z->c = z->l; /* backwards, line 211 */ { int m5 = z->l - z->c; (void)m5; /* do, line 213 */ { int ret = r_Step_1a(z); if (ret == 0) goto lab6; /* call Step_1a, line 213 */ if (ret < 0) return ret; } 
lab6: z->c = z->l - m5; } { int m6 = z->l - z->c; (void)m6; /* or, line 215 */ { int ret = r_exception2(z); if (ret == 0) goto lab8; /* call exception2, line 215 */ if (ret < 0) return ret; } goto lab7; lab8: z->c = z->l - m6; { int m7 = z->l - z->c; (void)m7; /* do, line 217 */ { int ret = r_Step_1b(z); if (ret == 0) goto lab9; /* call Step_1b, line 217 */ if (ret < 0) return ret; } lab9: z->c = z->l - m7; } { int m8 = z->l - z->c; (void)m8; /* do, line 218 */ { int ret = r_Step_1c(z); if (ret == 0) goto lab10; /* call Step_1c, line 218 */ if (ret < 0) return ret; } lab10: z->c = z->l - m8; } { int m9 = z->l - z->c; (void)m9; /* do, line 220 */ { int ret = r_Step_2(z); if (ret == 0) goto lab11; /* call Step_2, line 220 */ if (ret < 0) return ret; } lab11: z->c = z->l - m9; } { int m10 = z->l - z->c; (void)m10; /* do, line 221 */ { int ret = r_Step_3(z); if (ret == 0) goto lab12; /* call Step_3, line 221 */ if (ret < 0) return ret; } lab12: z->c = z->l - m10; } { int m11 = z->l - z->c; (void)m11; /* do, line 222 */ { int ret = r_Step_4(z); if (ret == 0) goto lab13; /* call Step_4, line 222 */ if (ret < 0) return ret; } lab13: z->c = z->l - m11; } { int m12 = z->l - z->c; (void)m12; /* do, line 224 */ { int ret = r_Step_5(z); if (ret == 0) goto lab14; /* call Step_5, line 224 */ if (ret < 0) return ret; } lab14: z->c = z->l - m12; } } lab7: z->c = z->lb; { int c13 = z->c; /* do, line 227 */ { int ret = r_postlude(z); if (ret == 0) goto lab15; /* call postlude, line 227 */ if (ret < 0) return ret; } lab15: z->c = c13; } } lab0: return 1; } extern struct SN_env * english_ISO_8859_1_create_env(void) { return SN_create_env(0, 2, 1); } extern void english_ISO_8859_1_close_env(struct SN_env * z) { SN_close_env(z, 0); } swish-e-2.4.7/src/snowball/stem_fi.h0000664000077100017500000000051011166010110014216 00000000000000 /* This file was generated automatically by the Snowball to ANSI C compiler */ #ifdef __cplusplus extern "C" { #endif extern struct SN_env * 
finnish_ISO_8859_1_create_env(void); extern void finnish_ISO_8859_1_close_env(struct SN_env * z); extern int finnish_ISO_8859_1_stem(struct SN_env * z); #ifdef __cplusplus } #endif swish-e-2.4.7/src/snowball/stem_nl.c0000664000077100017500000005046611166010110014243 00000000000000 /* This file was generated automatically by the Snowball to ANSI C compiler */ #include "header.h" #ifdef __cplusplus extern "C" { #endif extern int dutch_ISO_8859_1_stem(struct SN_env * z); #ifdef __cplusplus } #endif static int r_standard_suffix(struct SN_env * z); static int r_undouble(struct SN_env * z); static int r_R2(struct SN_env * z); static int r_R1(struct SN_env * z); static int r_mark_regions(struct SN_env * z); static int r_en_ending(struct SN_env * z); static int r_e_ending(struct SN_env * z); static int r_postlude(struct SN_env * z); static int r_prelude(struct SN_env * z); #ifdef __cplusplus extern "C" { #endif extern struct SN_env * dutch_ISO_8859_1_create_env(void); extern void dutch_ISO_8859_1_close_env(struct SN_env * z); #ifdef __cplusplus } #endif static const symbol s_0_1[1] = { 0xE1 }; static const symbol s_0_2[1] = { 0xE4 }; static const symbol s_0_3[1] = { 0xE9 }; static const symbol s_0_4[1] = { 0xEB }; static const symbol s_0_5[1] = { 0xED }; static const symbol s_0_6[1] = { 0xEF }; static const symbol s_0_7[1] = { 0xF3 }; static const symbol s_0_8[1] = { 0xF6 }; static const symbol s_0_9[1] = { 0xFA }; static const symbol s_0_10[1] = { 0xFC }; static const struct among a_0[11] = { /* 0 */ { 0, 0, -1, 6, 0}, /* 1 */ { 1, s_0_1, 0, 1, 0}, /* 2 */ { 1, s_0_2, 0, 1, 0}, /* 3 */ { 1, s_0_3, 0, 2, 0}, /* 4 */ { 1, s_0_4, 0, 2, 0}, /* 5 */ { 1, s_0_5, 0, 3, 0}, /* 6 */ { 1, s_0_6, 0, 3, 0}, /* 7 */ { 1, s_0_7, 0, 4, 0}, /* 8 */ { 1, s_0_8, 0, 4, 0}, /* 9 */ { 1, s_0_9, 0, 5, 0}, /* 10 */ { 1, s_0_10, 0, 5, 0} }; static const symbol s_1_1[1] = { 'I' }; static const symbol s_1_2[1] = { 'Y' }; static const struct among a_1[3] = { /* 0 */ { 0, 0, -1, 3, 0}, /* 1 */ { 1, 
s_1_1, 0, 2, 0}, /* 2 */ { 1, s_1_2, 0, 1, 0} }; static const symbol s_2_0[2] = { 'd', 'd' }; static const symbol s_2_1[2] = { 'k', 'k' }; static const symbol s_2_2[2] = { 't', 't' }; static const struct among a_2[3] = { /* 0 */ { 2, s_2_0, -1, -1, 0}, /* 1 */ { 2, s_2_1, -1, -1, 0}, /* 2 */ { 2, s_2_2, -1, -1, 0} }; static const symbol s_3_0[3] = { 'e', 'n', 'e' }; static const symbol s_3_1[2] = { 's', 'e' }; static const symbol s_3_2[2] = { 'e', 'n' }; static const symbol s_3_3[5] = { 'h', 'e', 'd', 'e', 'n' }; static const symbol s_3_4[1] = { 's' }; static const struct among a_3[5] = { /* 0 */ { 3, s_3_0, -1, 2, 0}, /* 1 */ { 2, s_3_1, -1, 3, 0}, /* 2 */ { 2, s_3_2, -1, 2, 0}, /* 3 */ { 5, s_3_3, 2, 1, 0}, /* 4 */ { 1, s_3_4, -1, 3, 0} }; static const symbol s_4_0[3] = { 'e', 'n', 'd' }; static const symbol s_4_1[2] = { 'i', 'g' }; static const symbol s_4_2[3] = { 'i', 'n', 'g' }; static const symbol s_4_3[4] = { 'l', 'i', 'j', 'k' }; static const symbol s_4_4[4] = { 'b', 'a', 'a', 'r' }; static const symbol s_4_5[3] = { 'b', 'a', 'r' }; static const struct among a_4[6] = { /* 0 */ { 3, s_4_0, -1, 1, 0}, /* 1 */ { 2, s_4_1, -1, 2, 0}, /* 2 */ { 3, s_4_2, -1, 1, 0}, /* 3 */ { 4, s_4_3, -1, 3, 0}, /* 4 */ { 4, s_4_4, -1, 4, 0}, /* 5 */ { 3, s_4_5, -1, 5, 0} }; static const symbol s_5_0[2] = { 'a', 'a' }; static const symbol s_5_1[2] = { 'e', 'e' }; static const symbol s_5_2[2] = { 'o', 'o' }; static const symbol s_5_3[2] = { 'u', 'u' }; static const struct among a_5[4] = { /* 0 */ { 2, s_5_0, -1, -1, 0}, /* 1 */ { 2, s_5_1, -1, -1, 0}, /* 2 */ { 2, s_5_2, -1, -1, 0}, /* 3 */ { 2, s_5_3, -1, -1, 0} }; static const unsigned char g_v[] = { 17, 65, 16, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 128 }; static const unsigned char g_v_I[] = { 1, 0, 0, 17, 65, 16, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 128 }; static const unsigned char g_v_j[] = { 17, 67, 16, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 128 }; static const symbol s_0[] = { 'a' }; static const symbol s_1[] = { 'e' 
}; static const symbol s_2[] = { 'i' }; static const symbol s_3[] = { 'o' }; static const symbol s_4[] = { 'u' }; static const symbol s_5[] = { 'y' }; static const symbol s_6[] = { 'Y' }; static const symbol s_7[] = { 'i' }; static const symbol s_8[] = { 'I' }; static const symbol s_9[] = { 'y' }; static const symbol s_10[] = { 'Y' }; static const symbol s_11[] = { 'y' }; static const symbol s_12[] = { 'i' }; static const symbol s_13[] = { 'e' }; static const symbol s_14[] = { 'g', 'e', 'm' }; static const symbol s_15[] = { 'h', 'e', 'i', 'd' }; static const symbol s_16[] = { 'h', 'e', 'i', 'd' }; static const symbol s_17[] = { 'c' }; static const symbol s_18[] = { 'e', 'n' }; static const symbol s_19[] = { 'i', 'g' }; static const symbol s_20[] = { 'e' }; static const symbol s_21[] = { 'e' }; static int r_prelude(struct SN_env * z) { int among_var; { int c_test = z->c; /* test, line 42 */ while(1) { /* repeat, line 42 */ int c1 = z->c; z->bra = z->c; /* [, line 43 */ if (z->c >= z->l || z->p[z->c + 0] >> 5 != 7 || !((340306450 >> (z->p[z->c + 0] & 0x1f)) & 1)) among_var = 6; else among_var = find_among(z, a_0, 11); /* substring, line 43 */ if (!(among_var)) goto lab0; z->ket = z->c; /* ], line 43 */ switch(among_var) { case 0: goto lab0; case 1: { int ret = slice_from_s(z, 1, s_0); /* <-, line 45 */ if (ret < 0) return ret; } break; case 2: { int ret = slice_from_s(z, 1, s_1); /* <-, line 47 */ if (ret < 0) return ret; } break; case 3: { int ret = slice_from_s(z, 1, s_2); /* <-, line 49 */ if (ret < 0) return ret; } break; case 4: { int ret = slice_from_s(z, 1, s_3); /* <-, line 51 */ if (ret < 0) return ret; } break; case 5: { int ret = slice_from_s(z, 1, s_4); /* <-, line 53 */ if (ret < 0) return ret; } break; case 6: if (z->c >= z->l) goto lab0; z->c++; /* next, line 54 */ break; } continue; lab0: z->c = c1; break; } z->c = c_test; } { int c_keep = z->c; /* try, line 57 */ z->bra = z->c; /* [, line 57 */ if (!(eq_s(z, 1, s_5))) { z->c = c_keep; goto lab1; } 
z->ket = z->c; /* ], line 57 */ { int ret = slice_from_s(z, 1, s_6); /* <-, line 57 */ if (ret < 0) return ret; } lab1: ; } while(1) { /* repeat, line 58 */ int c2 = z->c; while(1) { /* goto, line 58 */ int c3 = z->c; if (in_grouping(z, g_v, 97, 232, 0)) goto lab3; z->bra = z->c; /* [, line 59 */ { int c4 = z->c; /* or, line 59 */ if (!(eq_s(z, 1, s_7))) goto lab5; z->ket = z->c; /* ], line 59 */ if (in_grouping(z, g_v, 97, 232, 0)) goto lab5; { int ret = slice_from_s(z, 1, s_8); /* <-, line 59 */ if (ret < 0) return ret; } goto lab4; lab5: z->c = c4; if (!(eq_s(z, 1, s_9))) goto lab3; z->ket = z->c; /* ], line 60 */ { int ret = slice_from_s(z, 1, s_10); /* <-, line 60 */ if (ret < 0) return ret; } } lab4: z->c = c3; break; lab3: z->c = c3; if (z->c >= z->l) goto lab2; z->c++; /* goto, line 58 */ } continue; lab2: z->c = c2; break; } return 1; } static int r_mark_regions(struct SN_env * z) { z->I[0] = z->l; z->I[1] = z->l; { /* gopast */ /* grouping v, line 69 */ int ret = out_grouping(z, g_v, 97, 232, 1); if (ret < 0) return 0; z->c += ret; } { /* gopast */ /* non v, line 69 */ int ret = in_grouping(z, g_v, 97, 232, 1); if (ret < 0) return 0; z->c += ret; } z->I[0] = z->c; /* setmark p1, line 69 */ /* try, line 70 */ if (!(z->I[0] < 3)) goto lab0; z->I[0] = 3; lab0: { /* gopast */ /* grouping v, line 71 */ int ret = out_grouping(z, g_v, 97, 232, 1); if (ret < 0) return 0; z->c += ret; } { /* gopast */ /* non v, line 71 */ int ret = in_grouping(z, g_v, 97, 232, 1); if (ret < 0) return 0; z->c += ret; } z->I[1] = z->c; /* setmark p2, line 71 */ return 1; } static int r_postlude(struct SN_env * z) { int among_var; while(1) { /* repeat, line 75 */ int c1 = z->c; z->bra = z->c; /* [, line 77 */ if (z->c >= z->l || (z->p[z->c + 0] != 73 && z->p[z->c + 0] != 89)) among_var = 3; else among_var = find_among(z, a_1, 3); /* substring, line 77 */ if (!(among_var)) goto lab0; z->ket = z->c; /* ], line 77 */ switch(among_var) { case 0: goto lab0; case 1: { int ret = 
slice_from_s(z, 1, s_11); /* <-, line 78 */ if (ret < 0) return ret; } break; case 2: { int ret = slice_from_s(z, 1, s_12); /* <-, line 79 */ if (ret < 0) return ret; } break; case 3: if (z->c >= z->l) goto lab0; z->c++; /* next, line 80 */ break; } continue; lab0: z->c = c1; break; } return 1; } static int r_R1(struct SN_env * z) { if (!(z->I[0] <= z->c)) return 0; return 1; } static int r_R2(struct SN_env * z) { if (!(z->I[1] <= z->c)) return 0; return 1; } static int r_undouble(struct SN_env * z) { { int m_test = z->l - z->c; /* test, line 91 */ if (z->c - 1 <= z->lb || z->p[z->c - 1] >> 5 != 3 || !((1050640 >> (z->p[z->c - 1] & 0x1f)) & 1)) return 0; if (!(find_among_b(z, a_2, 3))) return 0; /* among, line 91 */ z->c = z->l - m_test; } z->ket = z->c; /* [, line 91 */ if (z->c <= z->lb) return 0; z->c--; /* next, line 91 */ z->bra = z->c; /* ], line 91 */ { int ret = slice_del(z); /* delete, line 91 */ if (ret < 0) return ret; } return 1; } static int r_e_ending(struct SN_env * z) { z->B[0] = 0; /* unset e_found, line 95 */ z->ket = z->c; /* [, line 96 */ if (!(eq_s_b(z, 1, s_13))) return 0; z->bra = z->c; /* ], line 96 */ { int ret = r_R1(z); if (ret == 0) return 0; /* call R1, line 96 */ if (ret < 0) return ret; } { int m_test = z->l - z->c; /* test, line 96 */ if (out_grouping_b(z, g_v, 97, 232, 0)) return 0; z->c = z->l - m_test; } { int ret = slice_del(z); /* delete, line 96 */ if (ret < 0) return ret; } z->B[0] = 1; /* set e_found, line 97 */ { int ret = r_undouble(z); if (ret == 0) return 0; /* call undouble, line 98 */ if (ret < 0) return ret; } return 1; } static int r_en_ending(struct SN_env * z) { { int ret = r_R1(z); if (ret == 0) return 0; /* call R1, line 102 */ if (ret < 0) return ret; } { int m1 = z->l - z->c; (void)m1; /* and, line 102 */ if (out_grouping_b(z, g_v, 97, 232, 0)) return 0; z->c = z->l - m1; { int m2 = z->l - z->c; (void)m2; /* not, line 102 */ if (!(eq_s_b(z, 3, s_14))) goto lab0; return 0; lab0: z->c = z->l - m2; } } { int ret = 
slice_del(z); /* delete, line 102 */ if (ret < 0) return ret; } { int ret = r_undouble(z); if (ret == 0) return 0; /* call undouble, line 103 */ if (ret < 0) return ret; } return 1; } static int r_standard_suffix(struct SN_env * z) { int among_var; { int m1 = z->l - z->c; (void)m1; /* do, line 107 */ z->ket = z->c; /* [, line 108 */ if (z->c <= z->lb || z->p[z->c - 1] >> 5 != 3 || !((540704 >> (z->p[z->c - 1] & 0x1f)) & 1)) goto lab0; among_var = find_among_b(z, a_3, 5); /* substring, line 108 */ if (!(among_var)) goto lab0; z->bra = z->c; /* ], line 108 */ switch(among_var) { case 0: goto lab0; case 1: { int ret = r_R1(z); if (ret == 0) goto lab0; /* call R1, line 110 */ if (ret < 0) return ret; } { int ret = slice_from_s(z, 4, s_15); /* <-, line 110 */ if (ret < 0) return ret; } break; case 2: { int ret = r_en_ending(z); if (ret == 0) goto lab0; /* call en_ending, line 113 */ if (ret < 0) return ret; } break; case 3: { int ret = r_R1(z); if (ret == 0) goto lab0; /* call R1, line 116 */ if (ret < 0) return ret; } if (out_grouping_b(z, g_v_j, 97, 232, 0)) goto lab0; { int ret = slice_del(z); /* delete, line 116 */ if (ret < 0) return ret; } break; } lab0: z->c = z->l - m1; } { int m2 = z->l - z->c; (void)m2; /* do, line 120 */ { int ret = r_e_ending(z); if (ret == 0) goto lab1; /* call e_ending, line 120 */ if (ret < 0) return ret; } lab1: z->c = z->l - m2; } { int m3 = z->l - z->c; (void)m3; /* do, line 122 */ z->ket = z->c; /* [, line 122 */ if (!(eq_s_b(z, 4, s_16))) goto lab2; z->bra = z->c; /* ], line 122 */ { int ret = r_R2(z); if (ret == 0) goto lab2; /* call R2, line 122 */ if (ret < 0) return ret; } { int m4 = z->l - z->c; (void)m4; /* not, line 122 */ if (!(eq_s_b(z, 1, s_17))) goto lab3; goto lab2; lab3: z->c = z->l - m4; } { int ret = slice_del(z); /* delete, line 122 */ if (ret < 0) return ret; } z->ket = z->c; /* [, line 123 */ if (!(eq_s_b(z, 2, s_18))) goto lab2; z->bra = z->c; /* ], line 123 */ { int ret = r_en_ending(z); if (ret == 0) goto lab2; 
/* call en_ending, line 123 */ if (ret < 0) return ret; } lab2: z->c = z->l - m3; } { int m5 = z->l - z->c; (void)m5; /* do, line 126 */ z->ket = z->c; /* [, line 127 */ if (z->c - 1 <= z->lb || z->p[z->c - 1] >> 5 != 3 || !((264336 >> (z->p[z->c - 1] & 0x1f)) & 1)) goto lab4; among_var = find_among_b(z, a_4, 6); /* substring, line 127 */ if (!(among_var)) goto lab4; z->bra = z->c; /* ], line 127 */ switch(among_var) { case 0: goto lab4; case 1: { int ret = r_R2(z); if (ret == 0) goto lab4; /* call R2, line 129 */ if (ret < 0) return ret; } { int ret = slice_del(z); /* delete, line 129 */ if (ret < 0) return ret; } { int m6 = z->l - z->c; (void)m6; /* or, line 130 */ z->ket = z->c; /* [, line 130 */ if (!(eq_s_b(z, 2, s_19))) goto lab6; z->bra = z->c; /* ], line 130 */ { int ret = r_R2(z); if (ret == 0) goto lab6; /* call R2, line 130 */ if (ret < 0) return ret; } { int m7 = z->l - z->c; (void)m7; /* not, line 130 */ if (!(eq_s_b(z, 1, s_20))) goto lab7; goto lab6; lab7: z->c = z->l - m7; } { int ret = slice_del(z); /* delete, line 130 */ if (ret < 0) return ret; } goto lab5; lab6: z->c = z->l - m6; { int ret = r_undouble(z); if (ret == 0) goto lab4; /* call undouble, line 130 */ if (ret < 0) return ret; } } lab5: break; case 2: { int ret = r_R2(z); if (ret == 0) goto lab4; /* call R2, line 133 */ if (ret < 0) return ret; } { int m8 = z->l - z->c; (void)m8; /* not, line 133 */ if (!(eq_s_b(z, 1, s_21))) goto lab8; goto lab4; lab8: z->c = z->l - m8; } { int ret = slice_del(z); /* delete, line 133 */ if (ret < 0) return ret; } break; case 3: { int ret = r_R2(z); if (ret == 0) goto lab4; /* call R2, line 136 */ if (ret < 0) return ret; } { int ret = slice_del(z); /* delete, line 136 */ if (ret < 0) return ret; } { int ret = r_e_ending(z); if (ret == 0) goto lab4; /* call e_ending, line 136 */ if (ret < 0) return ret; } break; case 4: { int ret = r_R2(z); if (ret == 0) goto lab4; /* call R2, line 139 */ if (ret < 0) return ret; } { int ret = slice_del(z); /* delete, 
line 139 */ if (ret < 0) return ret; } break; case 5: { int ret = r_R2(z); if (ret == 0) goto lab4; /* call R2, line 142 */ if (ret < 0) return ret; } if (!(z->B[0])) goto lab4; /* Boolean test e_found, line 142 */ { int ret = slice_del(z); /* delete, line 142 */ if (ret < 0) return ret; } break; } lab4: z->c = z->l - m5; } { int m9 = z->l - z->c; (void)m9; /* do, line 146 */ if (out_grouping_b(z, g_v_I, 73, 232, 0)) goto lab9; { int m_test = z->l - z->c; /* test, line 148 */ if (z->c - 1 <= z->lb || z->p[z->c - 1] >> 5 != 3 || !((2129954 >> (z->p[z->c - 1] & 0x1f)) & 1)) goto lab9; if (!(find_among_b(z, a_5, 4))) goto lab9; /* among, line 149 */ if (out_grouping_b(z, g_v, 97, 232, 0)) goto lab9; z->c = z->l - m_test; } z->ket = z->c; /* [, line 152 */ if (z->c <= z->lb) goto lab9; z->c--; /* next, line 152 */ z->bra = z->c; /* ], line 152 */ { int ret = slice_del(z); /* delete, line 152 */ if (ret < 0) return ret; } lab9: z->c = z->l - m9; } return 1; } extern int dutch_ISO_8859_1_stem(struct SN_env * z) { { int c1 = z->c; /* do, line 159 */ { int ret = r_prelude(z); if (ret == 0) goto lab0; /* call prelude, line 159 */ if (ret < 0) return ret; } lab0: z->c = c1; } { int c2 = z->c; /* do, line 160 */ { int ret = r_mark_regions(z); if (ret == 0) goto lab1; /* call mark_regions, line 160 */ if (ret < 0) return ret; } lab1: z->c = c2; } z->lb = z->c; z->c = z->l; /* backwards, line 161 */ { int m3 = z->l - z->c; (void)m3; /* do, line 162 */ { int ret = r_standard_suffix(z); if (ret == 0) goto lab2; /* call standard_suffix, line 162 */ if (ret < 0) return ret; } lab2: z->c = z->l - m3; } z->c = z->lb; { int c4 = z->c; /* do, line 163 */ { int ret = r_postlude(z); if (ret == 0) goto lab3; /* call postlude, line 163 */ if (ret < 0) return ret; } lab3: z->c = c4; } return 1; } extern struct SN_env * dutch_ISO_8859_1_create_env(void) { return SN_create_env(0, 2, 1); } extern void dutch_ISO_8859_1_close_env(struct SN_env * z) { SN_close_env(z, 0); } 
swish-e-2.4.7/src/snowball/utilities.c0000664000077100017500000003134311166010107014614 00000000000000 #include <stdio.h> #include <stdlib.h> #include <string.h> #include "header.h" #define unless(C) if(!(C)) #define CREATE_SIZE 1 extern symbol * create_s(void) { symbol * p; void * mem = malloc(HEAD + (CREATE_SIZE + 1) * sizeof(symbol)); if (mem == NULL) return NULL; p = (symbol *) (HEAD + (char *) mem); CAPACITY(p) = CREATE_SIZE; SET_SIZE(p, CREATE_SIZE); return p; } extern void lose_s(symbol * p) { if (p == NULL) return; free((char *) p - HEAD); } /* new_p = skip_utf8(p, c, lb, l, n); skips n characters forwards from p + c if n +ve, or n characters backwards from p + c - 1 if n -ve. new_p is the new position, or -1 on failure. -- used to implement hop and next in the utf8 case. */ extern int skip_utf8(const symbol * p, int c, int lb, int l, int n) { int b; if (n >= 0) { for (; n > 0; n--) { if (c >= l) return -1; b = p[c++]; if (b >= 0xC0) { /* 1100 0000 */ while (c < l) { b = p[c]; if (b >= 0xC0 || b < 0x80) break; /* break unless b is 10------ */ c++; } } } } else { for (; n < 0; n++) { if (c <= lb) return -1; b = p[--c]; if (b >= 0x80) { /* 1000 0000 */ while (c > lb) { b = p[c]; if (b >= 0xC0) break; /* 1100 0000 */ c--; } } } } return c; } /* Code for character groupings: utf8 cases */ static int get_utf8(const symbol * p, int c, int l, int * slot) { int b0, b1; if (c >= l) return 0; b0 = p[c++]; if (b0 < 0xC0 || c == l) { /* 1100 0000 */ * slot = b0; return 1; } b1 = p[c++]; if (b0 < 0xE0 || c == l) { /* 1110 0000 */ * slot = (b0 & 0x1F) << 6 | (b1 & 0x3F); return 2; } * slot = (b0 & 0xF) << 12 | (b1 & 0x3F) << 6 | (p[c] & 0x3F); return 3; } static int get_b_utf8(const symbol * p, int c, int lb, int * slot) { int b0, b1; if (c <= lb) return 0; b0 = p[--c]; if (b0 < 0x80 || c == lb) { /* 1000 0000 */ * slot = b0; return 1; } b1 = p[--c]; if (b1 >= 0xC0 || c == lb) { /* 1100 0000 */ * slot = (b1 & 0x1F) << 6 | (b0 & 
0x3F); return 3; } extern int in_grouping_U(struct SN_env * z, const unsigned char * s, int min, int max, int repeat) { do { int ch; int w = get_utf8(z->p, z->c, z->l, & ch); unless (w) return -1; if (ch > max || (ch -= min) < 0 || (s[ch >> 3] & (0X1 << (ch & 0X7))) == 0) return w; z->c += w; } while (repeat); return 0; } extern int in_grouping_b_U(struct SN_env * z, const unsigned char * s, int min, int max, int repeat) { do { int ch; int w = get_b_utf8(z->p, z->c, z->lb, & ch); unless (w) return -1; if (ch > max || (ch -= min) < 0 || (s[ch >> 3] & (0X1 << (ch & 0X7))) == 0) return w; z->c -= w; } while (repeat); return 0; } extern int out_grouping_U(struct SN_env * z, const unsigned char * s, int min, int max, int repeat) { do { int ch; int w = get_utf8(z->p, z->c, z->l, & ch); unless (w) return -1; unless (ch > max || (ch -= min) < 0 || (s[ch >> 3] & (0X1 << (ch & 0X7))) == 0) return w; z->c += w; } while (repeat); return 0; } extern int out_grouping_b_U(struct SN_env * z, const unsigned char * s, int min, int max, int repeat) { do { int ch; int w = get_b_utf8(z->p, z->c, z->lb, & ch); unless (w) return -1; unless (ch > max || (ch -= min) < 0 || (s[ch >> 3] & (0X1 << (ch & 0X7))) == 0) return w; z->c -= w; } while (repeat); return 0; } /* Code for character groupings: non-utf8 cases */ extern int in_grouping(struct SN_env * z, const unsigned char * s, int min, int max, int repeat) { do { int ch; if (z->c >= z->l) return -1; ch = z->p[z->c]; if (ch > max || (ch -= min) < 0 || (s[ch >> 3] & (0X1 << (ch & 0X7))) == 0) return 1; z->c++; } while (repeat); return 0; } extern int in_grouping_b(struct SN_env * z, const unsigned char * s, int min, int max, int repeat) { do { int ch; if (z->c <= z->lb) return -1; ch = z->p[z->c - 1]; if (ch > max || (ch -= min) < 0 || (s[ch >> 3] & (0X1 << (ch & 0X7))) == 0) return 1; z->c--; } while (repeat); return 0; } extern int out_grouping(struct SN_env * z, const unsigned char * s, int min, int max, int repeat) { do { int ch; if 
(z->c >= z->l) return -1; ch = z->p[z->c]; unless (ch > max || (ch -= min) < 0 || (s[ch >> 3] & (0X1 << (ch & 0X7))) == 0) return 1; z->c++; } while (repeat); return 0; } extern int out_grouping_b(struct SN_env * z, const unsigned char * s, int min, int max, int repeat) { do { int ch; if (z->c <= z->lb) return -1; ch = z->p[z->c - 1]; unless (ch > max || (ch -= min) < 0 || (s[ch >> 3] & (0X1 << (ch & 0X7))) == 0) return 1; z->c--; } while (repeat); return 0; } extern int eq_s(struct SN_env * z, int s_size, const symbol * s) { if (z->l - z->c < s_size || memcmp(z->p + z->c, s, s_size * sizeof(symbol)) != 0) return 0; z->c += s_size; return 1; } extern int eq_s_b(struct SN_env * z, int s_size, const symbol * s) { if (z->c - z->lb < s_size || memcmp(z->p + z->c - s_size, s, s_size * sizeof(symbol)) != 0) return 0; z->c -= s_size; return 1; } extern int eq_v(struct SN_env * z, const symbol * p) { return eq_s(z, SIZE(p), p); } extern int eq_v_b(struct SN_env * z, const symbol * p) { return eq_s_b(z, SIZE(p), p); } extern int find_among(struct SN_env * z, const struct among * v, int v_size) { int i = 0; int j = v_size; int c = z->c; int l = z->l; symbol * q = z->p + c; const struct among * w; int common_i = 0; int common_j = 0; int first_key_inspected = 0; while(1) { int k = i + ((j - i) >> 1); int diff = 0; int common = common_i < common_j ? common_i : common_j; /* smaller */ w = v + k; { int i2; for (i2 = common; i2 < w->s_size; i2++) { if (c + common == l) { diff = -1; break; } diff = q[common] - w->s[i2]; if (diff != 0) break; common++; } } if (diff < 0) { j = k; common_j = common; } else { i = k; common_i = common; } if (j - i <= 1) { if (i > 0) break; /* v->s has been inspected */ if (j == i) break; /* only one item in v */ /* - but now we need to go round once more to get v->s inspected. This looks messy, but is actually the optimal approach. 
*/ if (first_key_inspected) break; first_key_inspected = 1; } } while(1) { w = v + i; if (common_i >= w->s_size) { z->c = c + w->s_size; if (w->function == 0) return w->result; { int res = w->function(z); z->c = c + w->s_size; if (res) return w->result; } } i = w->substring_i; if (i < 0) return 0; } } /* find_among_b is for backwards processing. Same comments apply */ extern int find_among_b(struct SN_env * z, const struct among * v, int v_size) { int i = 0; int j = v_size; int c = z->c; int lb = z->lb; symbol * q = z->p + c - 1; const struct among * w; int common_i = 0; int common_j = 0; int first_key_inspected = 0; while(1) { int k = i + ((j - i) >> 1); int diff = 0; int common = common_i < common_j ? common_i : common_j; w = v + k; { int i2; for (i2 = w->s_size - 1 - common; i2 >= 0; i2--) { if (c - common == lb) { diff = -1; break; } diff = q[- common] - w->s[i2]; if (diff != 0) break; common++; } } if (diff < 0) { j = k; common_j = common; } else { i = k; common_i = common; } if (j - i <= 1) { if (i > 0) break; if (j == i) break; if (first_key_inspected) break; first_key_inspected = 1; } } while(1) { w = v + i; if (common_i >= w->s_size) { z->c = c - w->s_size; if (w->function == 0) return w->result; { int res = w->function(z); z->c = c - w->s_size; if (res) return w->result; } } i = w->substring_i; if (i < 0) return 0; } } /* Increase the size of the buffer pointed to by p to at least n symbols. * If insufficient memory, returns NULL and frees the old buffer. */ static symbol * increase_size(symbol * p, int n) { symbol * q; int new_size = n + 20; void * mem = realloc((char *) p - HEAD, HEAD + (new_size + 1) * sizeof(symbol)); if (mem == NULL) { lose_s(p); return NULL; } q = (symbol *) (HEAD + (char *)mem); CAPACITY(q) = new_size; return q; } /* to replace symbols between c_bra and c_ket in z->p by the s_size symbols at s. Returns 0 on success, -1 on error. Also, frees z->p (and sets it to NULL) on error. 
*/ extern int replace_s(struct SN_env * z, int c_bra, int c_ket, int s_size, const symbol * s, int * adjptr) { int adjustment; int len; if (z->p == NULL) { z->p = create_s(); if (z->p == NULL) return -1; } adjustment = s_size - (c_ket - c_bra); len = SIZE(z->p); if (adjustment != 0) { if (adjustment + len > CAPACITY(z->p)) { z->p = increase_size(z->p, adjustment + len); if (z->p == NULL) return -1; } memmove(z->p + c_ket + adjustment, z->p + c_ket, (len - c_ket) * sizeof(symbol)); SET_SIZE(z->p, adjustment + len); z->l += adjustment; if (z->c >= c_ket) z->c += adjustment; else if (z->c > c_bra) z->c = c_bra; } unless (s_size == 0) memmove(z->p + c_bra, s, s_size * sizeof(symbol)); if (adjptr != NULL) *adjptr = adjustment; return 0; } static int slice_check(struct SN_env * z) { if (z->bra < 0 || z->bra > z->ket || z->ket > z->l || z->p == NULL || z->l > SIZE(z->p)) /* this line could be removed */ { #if 0 fprintf(stderr, "faulty slice operation:\n"); debug(z, -1, 0); #endif return -1; } return 0; } extern int slice_from_s(struct SN_env * z, int s_size, const symbol * s) { if (slice_check(z)) return -1; return replace_s(z, z->bra, z->ket, s_size, s, NULL); } extern int slice_from_v(struct SN_env * z, const symbol * p) { return slice_from_s(z, SIZE(p), p); } extern int slice_del(struct SN_env * z) { return slice_from_s(z, 0, 0); } extern int insert_s(struct SN_env * z, int bra, int ket, int s_size, const symbol * s) { int adjustment; if (replace_s(z, bra, ket, s_size, s, &adjustment)) return -1; if (bra <= z->bra) z->bra += adjustment; if (bra <= z->ket) z->ket += adjustment; return 0; } extern int insert_v(struct SN_env * z, int bra, int ket, const symbol * p) { int adjustment; if (replace_s(z, bra, ket, SIZE(p), p, &adjustment)) return -1; if (bra <= z->bra) z->bra += adjustment; if (bra <= z->ket) z->ket += adjustment; return 0; } extern symbol * slice_to(struct SN_env * z, symbol * p) { if (slice_check(z)) { lose_s(p); return NULL; } { int len = z->ket - z->bra; 
if (CAPACITY(p) < len) { p = increase_size(p, len); if (p == NULL) return NULL; } memmove(p, z->p + z->bra, len * sizeof(symbol)); SET_SIZE(p, len); } return p; } extern symbol * assign_to(struct SN_env * z, symbol * p) { int len = z->l; if (CAPACITY(p) < len) { p = increase_size(p, len); if (p == NULL) return NULL; } memmove(p, z->p, len * sizeof(symbol)); SET_SIZE(p, len); return p; } #if 0 extern void debug(struct SN_env * z, int number, int line_count) { int i; int limit = SIZE(z->p); /*if (number >= 0) printf("%3d (line %4d): '", number, line_count);*/ if (number >= 0) printf("%3d (line %4d): [%d]'", number, line_count,limit); for (i = 0; i <= limit; i++) { if (z->lb == i) printf("{"); if (z->bra == i) printf("["); if (z->c == i) printf("|"); if (z->ket == i) printf("]"); if (z->l == i) printf("}"); if (i < limit) { int ch = z->p[i]; if (ch == 0) ch = '#'; printf("%c", ch); } } printf("'\n"); } #endif swish-e-2.4.7/src/snowball/stem_hu.h0000664000077100017500000000051611166010110014242 00000000000000 /* This file was generated automatically by the Snowball to ANSI C compiler */ #ifdef __cplusplus extern "C" { #endif extern struct SN_env * hungarian_ISO_8859_1_create_env(void); extern void hungarian_ISO_8859_1_close_env(struct SN_env * z); extern int hungarian_ISO_8859_1_stem(struct SN_env * z); #ifdef __cplusplus } #endif swish-e-2.4.7/src/snowball/stem_ro.c0000664000077100017500000010776611166010110014260 00000000000000 /* This file was generated automatically by the Snowball to ANSI C compiler */ #include "header.h" #ifdef __cplusplus extern "C" { #endif extern int romanian_ISO_8859_2_stem(struct SN_env * z); #ifdef __cplusplus } #endif static int r_vowel_suffix(struct SN_env * z); static int r_verb_suffix(struct SN_env * z); static int r_combo_suffix(struct SN_env * z); static int r_standard_suffix(struct SN_env * z); static int r_step_0(struct SN_env * z); static int r_R2(struct SN_env * z); static int r_R1(struct SN_env * z); static int r_RV(struct SN_env 
* z); static int r_mark_regions(struct SN_env * z); static int r_postlude(struct SN_env * z); static int r_prelude(struct SN_env * z); #ifdef __cplusplus extern "C" { #endif extern struct SN_env * romanian_ISO_8859_2_create_env(void); extern void romanian_ISO_8859_2_close_env(struct SN_env * z); #ifdef __cplusplus } #endif static const symbol s_0_1[1] = { 'I' }; static const symbol s_0_2[1] = { 'U' }; static const struct among a_0[3] = { /* 0 */ { 0, 0, -1, 3, 0}, /* 1 */ { 1, s_0_1, 0, 1, 0}, /* 2 */ { 1, s_0_2, 0, 2, 0} }; static const symbol s_1_0[2] = { 'e', 'a' }; static const symbol s_1_1[4] = { 'a', 0xFE, 'i', 'a' }; static const symbol s_1_2[3] = { 'a', 'u', 'a' }; static const symbol s_1_3[3] = { 'i', 'u', 'a' }; static const symbol s_1_4[4] = { 'a', 0xFE, 'i', 'e' }; static const symbol s_1_5[3] = { 'e', 'l', 'e' }; static const symbol s_1_6[3] = { 'i', 'l', 'e' }; static const symbol s_1_7[4] = { 'i', 'i', 'l', 'e' }; static const symbol s_1_8[3] = { 'i', 'e', 'i' }; static const symbol s_1_9[4] = { 'a', 't', 'e', 'i' }; static const symbol s_1_10[2] = { 'i', 'i' }; static const symbol s_1_11[4] = { 'u', 'l', 'u', 'i' }; static const symbol s_1_12[2] = { 'u', 'l' }; static const symbol s_1_13[4] = { 'e', 'l', 'o', 'r' }; static const symbol s_1_14[4] = { 'i', 'l', 'o', 'r' }; static const symbol s_1_15[5] = { 'i', 'i', 'l', 'o', 'r' }; static const struct among a_1[16] = { /* 0 */ { 2, s_1_0, -1, 3, 0}, /* 1 */ { 4, s_1_1, -1, 7, 0}, /* 2 */ { 3, s_1_2, -1, 2, 0}, /* 3 */ { 3, s_1_3, -1, 4, 0}, /* 4 */ { 4, s_1_4, -1, 7, 0}, /* 5 */ { 3, s_1_5, -1, 3, 0}, /* 6 */ { 3, s_1_6, -1, 5, 0}, /* 7 */ { 4, s_1_7, 6, 4, 0}, /* 8 */ { 3, s_1_8, -1, 4, 0}, /* 9 */ { 4, s_1_9, -1, 6, 0}, /* 10 */ { 2, s_1_10, -1, 4, 0}, /* 11 */ { 4, s_1_11, -1, 1, 0}, /* 12 */ { 2, s_1_12, -1, 1, 0}, /* 13 */ { 4, s_1_13, -1, 3, 0}, /* 14 */ { 4, s_1_14, -1, 4, 0}, /* 15 */ { 5, s_1_15, 14, 4, 0} }; static const symbol s_2_0[5] = { 'i', 'c', 'a', 'l', 'a' }; static const symbol 
s_2_1[5] = { 'i', 'c', 'i', 'v', 'a' }; static const symbol s_2_2[5] = { 'a', 't', 'i', 'v', 'a' }; static const symbol s_2_3[5] = { 'i', 't', 'i', 'v', 'a' }; static const symbol s_2_4[5] = { 'i', 'c', 'a', 'l', 'e' }; static const symbol s_2_5[6] = { 'a', 0xFE, 'i', 'u', 'n', 'e' }; static const symbol s_2_6[6] = { 'i', 0xFE, 'i', 'u', 'n', 'e' }; static const symbol s_2_7[6] = { 'a', 't', 'o', 'a', 'r', 'e' }; static const symbol s_2_8[6] = { 'i', 't', 'o', 'a', 'r', 'e' }; static const symbol s_2_9[6] = { 0xE3, 't', 'o', 'a', 'r', 'e' }; static const symbol s_2_10[7] = { 'i', 'c', 'i', 't', 'a', 't', 'e' }; static const symbol s_2_11[9] = { 'a', 'b', 'i', 'l', 'i', 't', 'a', 't', 'e' }; static const symbol s_2_12[9] = { 'i', 'b', 'i', 'l', 'i', 't', 'a', 't', 'e' }; static const symbol s_2_13[7] = { 'i', 'v', 'i', 't', 'a', 't', 'e' }; static const symbol s_2_14[5] = { 'i', 'c', 'i', 'v', 'e' }; static const symbol s_2_15[5] = { 'a', 't', 'i', 'v', 'e' }; static const symbol s_2_16[5] = { 'i', 't', 'i', 'v', 'e' }; static const symbol s_2_17[5] = { 'i', 'c', 'a', 'l', 'i' }; static const symbol s_2_18[5] = { 'a', 't', 'o', 'r', 'i' }; static const symbol s_2_19[7] = { 'i', 'c', 'a', 't', 'o', 'r', 'i' }; static const symbol s_2_20[5] = { 'i', 't', 'o', 'r', 'i' }; static const symbol s_2_21[5] = { 0xE3, 't', 'o', 'r', 'i' }; static const symbol s_2_22[7] = { 'i', 'c', 'i', 't', 'a', 't', 'i' }; static const symbol s_2_23[9] = { 'a', 'b', 'i', 'l', 'i', 't', 'a', 't', 'i' }; static const symbol s_2_24[7] = { 'i', 'v', 'i', 't', 'a', 't', 'i' }; static const symbol s_2_25[5] = { 'i', 'c', 'i', 'v', 'i' }; static const symbol s_2_26[5] = { 'a', 't', 'i', 'v', 'i' }; static const symbol s_2_27[5] = { 'i', 't', 'i', 'v', 'i' }; static const symbol s_2_28[6] = { 'i', 'c', 'i', 't', 0xE3, 'i' }; static const symbol s_2_29[8] = { 'a', 'b', 'i', 'l', 'i', 't', 0xE3, 'i' }; static const symbol s_2_30[6] = { 'i', 'v', 'i', 't', 0xE3, 'i' }; static const symbol s_2_31[7] = 
{ 'i', 'c', 'i', 't', 0xE3, 0xFE, 'i' }; static const symbol s_2_32[9] = { 'a', 'b', 'i', 'l', 'i', 't', 0xE3, 0xFE, 'i' }; static const symbol s_2_33[7] = { 'i', 'v', 'i', 't', 0xE3, 0xFE, 'i' }; static const symbol s_2_34[4] = { 'i', 'c', 'a', 'l' }; static const symbol s_2_35[4] = { 'a', 't', 'o', 'r' }; static const symbol s_2_36[6] = { 'i', 'c', 'a', 't', 'o', 'r' }; static const symbol s_2_37[4] = { 'i', 't', 'o', 'r' }; static const symbol s_2_38[4] = { 0xE3, 't', 'o', 'r' }; static const symbol s_2_39[4] = { 'i', 'c', 'i', 'v' }; static const symbol s_2_40[4] = { 'a', 't', 'i', 'v' }; static const symbol s_2_41[4] = { 'i', 't', 'i', 'v' }; static const symbol s_2_42[5] = { 'i', 'c', 'a', 'l', 0xE3 }; static const symbol s_2_43[5] = { 'i', 'c', 'i', 'v', 0xE3 }; static const symbol s_2_44[5] = { 'a', 't', 'i', 'v', 0xE3 }; static const symbol s_2_45[5] = { 'i', 't', 'i', 'v', 0xE3 }; static const struct among a_2[46] = { /* 0 */ { 5, s_2_0, -1, 4, 0}, /* 1 */ { 5, s_2_1, -1, 4, 0}, /* 2 */ { 5, s_2_2, -1, 5, 0}, /* 3 */ { 5, s_2_3, -1, 6, 0}, /* 4 */ { 5, s_2_4, -1, 4, 0}, /* 5 */ { 6, s_2_5, -1, 5, 0}, /* 6 */ { 6, s_2_6, -1, 6, 0}, /* 7 */ { 6, s_2_7, -1, 5, 0}, /* 8 */ { 6, s_2_8, -1, 6, 0}, /* 9 */ { 6, s_2_9, -1, 5, 0}, /* 10 */ { 7, s_2_10, -1, 4, 0}, /* 11 */ { 9, s_2_11, -1, 1, 0}, /* 12 */ { 9, s_2_12, -1, 2, 0}, /* 13 */ { 7, s_2_13, -1, 3, 0}, /* 14 */ { 5, s_2_14, -1, 4, 0}, /* 15 */ { 5, s_2_15, -1, 5, 0}, /* 16 */ { 5, s_2_16, -1, 6, 0}, /* 17 */ { 5, s_2_17, -1, 4, 0}, /* 18 */ { 5, s_2_18, -1, 5, 0}, /* 19 */ { 7, s_2_19, 18, 4, 0}, /* 20 */ { 5, s_2_20, -1, 6, 0}, /* 21 */ { 5, s_2_21, -1, 5, 0}, /* 22 */ { 7, s_2_22, -1, 4, 0}, /* 23 */ { 9, s_2_23, -1, 1, 0}, /* 24 */ { 7, s_2_24, -1, 3, 0}, /* 25 */ { 5, s_2_25, -1, 4, 0}, /* 26 */ { 5, s_2_26, -1, 5, 0}, /* 27 */ { 5, s_2_27, -1, 6, 0}, /* 28 */ { 6, s_2_28, -1, 4, 0}, /* 29 */ { 8, s_2_29, -1, 1, 0}, /* 30 */ { 6, s_2_30, -1, 3, 0}, /* 31 */ { 7, s_2_31, -1, 4, 0}, /* 32 */ { 9, s_2_32, 
-1, 1, 0}, /* 33 */ { 7, s_2_33, -1, 3, 0}, /* 34 */ { 4, s_2_34, -1, 4, 0}, /* 35 */ { 4, s_2_35, -1, 5, 0}, /* 36 */ { 6, s_2_36, 35, 4, 0}, /* 37 */ { 4, s_2_37, -1, 6, 0}, /* 38 */ { 4, s_2_38, -1, 5, 0}, /* 39 */ { 4, s_2_39, -1, 4, 0}, /* 40 */ { 4, s_2_40, -1, 5, 0}, /* 41 */ { 4, s_2_41, -1, 6, 0}, /* 42 */ { 5, s_2_42, -1, 4, 0}, /* 43 */ { 5, s_2_43, -1, 4, 0}, /* 44 */ { 5, s_2_44, -1, 5, 0}, /* 45 */ { 5, s_2_45, -1, 6, 0} }; static const symbol s_3_0[3] = { 'i', 'c', 'a' }; static const symbol s_3_1[5] = { 'a', 'b', 'i', 'l', 'a' }; static const symbol s_3_2[5] = { 'i', 'b', 'i', 'l', 'a' }; static const symbol s_3_3[4] = { 'o', 'a', 's', 'a' }; static const symbol s_3_4[3] = { 'a', 't', 'a' }; static const symbol s_3_5[3] = { 'i', 't', 'a' }; static const symbol s_3_6[4] = { 'a', 'n', 't', 'a' }; static const symbol s_3_7[4] = { 'i', 's', 't', 'a' }; static const symbol s_3_8[3] = { 'u', 't', 'a' }; static const symbol s_3_9[3] = { 'i', 'v', 'a' }; static const symbol s_3_10[2] = { 'i', 'c' }; static const symbol s_3_11[3] = { 'i', 'c', 'e' }; static const symbol s_3_12[5] = { 'a', 'b', 'i', 'l', 'e' }; static const symbol s_3_13[5] = { 'i', 'b', 'i', 'l', 'e' }; static const symbol s_3_14[4] = { 'i', 's', 'm', 'e' }; static const symbol s_3_15[4] = { 'i', 'u', 'n', 'e' }; static const symbol s_3_16[4] = { 'o', 'a', 's', 'e' }; static const symbol s_3_17[3] = { 'a', 't', 'e' }; static const symbol s_3_18[5] = { 'i', 't', 'a', 't', 'e' }; static const symbol s_3_19[3] = { 'i', 't', 'e' }; static const symbol s_3_20[4] = { 'a', 'n', 't', 'e' }; static const symbol s_3_21[4] = { 'i', 's', 't', 'e' }; static const symbol s_3_22[3] = { 'u', 't', 'e' }; static const symbol s_3_23[3] = { 'i', 'v', 'e' }; static const symbol s_3_24[3] = { 'i', 'c', 'i' }; static const symbol s_3_25[5] = { 'a', 'b', 'i', 'l', 'i' }; static const symbol s_3_26[5] = { 'i', 'b', 'i', 'l', 'i' }; static const symbol s_3_27[4] = { 'i', 'u', 'n', 'i' }; static const symbol s_3_28[5] 
= { 'a', 't', 'o', 'r', 'i' }; static const symbol s_3_29[3] = { 'o', 's', 'i' }; static const symbol s_3_30[3] = { 'a', 't', 'i' }; static const symbol s_3_31[5] = { 'i', 't', 'a', 't', 'i' }; static const symbol s_3_32[3] = { 'i', 't', 'i' }; static const symbol s_3_33[4] = { 'a', 'n', 't', 'i' }; static const symbol s_3_34[4] = { 'i', 's', 't', 'i' }; static const symbol s_3_35[3] = { 'u', 't', 'i' }; static const symbol s_3_36[4] = { 'i', 0xBA, 't', 'i' }; static const symbol s_3_37[3] = { 'i', 'v', 'i' }; static const symbol s_3_38[3] = { 'o', 0xBA, 'i' }; static const symbol s_3_39[4] = { 'i', 't', 0xE3, 'i' }; static const symbol s_3_40[5] = { 'i', 't', 0xE3, 0xFE, 'i' }; static const symbol s_3_41[4] = { 'a', 'b', 'i', 'l' }; static const symbol s_3_42[4] = { 'i', 'b', 'i', 'l' }; static const symbol s_3_43[3] = { 'i', 's', 'm' }; static const symbol s_3_44[4] = { 'a', 't', 'o', 'r' }; static const symbol s_3_45[2] = { 'o', 's' }; static const symbol s_3_46[2] = { 'a', 't' }; static const symbol s_3_47[2] = { 'i', 't' }; static const symbol s_3_48[3] = { 'a', 'n', 't' }; static const symbol s_3_49[3] = { 'i', 's', 't' }; static const symbol s_3_50[2] = { 'u', 't' }; static const symbol s_3_51[2] = { 'i', 'v' }; static const symbol s_3_52[3] = { 'i', 'c', 0xE3 }; static const symbol s_3_53[5] = { 'a', 'b', 'i', 'l', 0xE3 }; static const symbol s_3_54[5] = { 'i', 'b', 'i', 'l', 0xE3 }; static const symbol s_3_55[4] = { 'o', 'a', 's', 0xE3 }; static const symbol s_3_56[3] = { 'a', 't', 0xE3 }; static const symbol s_3_57[3] = { 'i', 't', 0xE3 }; static const symbol s_3_58[4] = { 'a', 'n', 't', 0xE3 }; static const symbol s_3_59[4] = { 'i', 's', 't', 0xE3 }; static const symbol s_3_60[3] = { 'u', 't', 0xE3 }; static const symbol s_3_61[3] = { 'i', 'v', 0xE3 }; static const struct among a_3[62] = { /* 0 */ { 3, s_3_0, -1, 1, 0}, /* 1 */ { 5, s_3_1, -1, 1, 0}, /* 2 */ { 5, s_3_2, -1, 1, 0}, /* 3 */ { 4, s_3_3, -1, 1, 0}, /* 4 */ { 3, s_3_4, -1, 1, 0}, /* 5 */ { 3, 
s_3_5, -1, 1, 0}, /* 6 */ { 4, s_3_6, -1, 1, 0}, /* 7 */ { 4, s_3_7, -1, 3, 0}, /* 8 */ { 3, s_3_8, -1, 1, 0}, /* 9 */ { 3, s_3_9, -1, 1, 0}, /* 10 */ { 2, s_3_10, -1, 1, 0}, /* 11 */ { 3, s_3_11, -1, 1, 0}, /* 12 */ { 5, s_3_12, -1, 1, 0}, /* 13 */ { 5, s_3_13, -1, 1, 0}, /* 14 */ { 4, s_3_14, -1, 3, 0}, /* 15 */ { 4, s_3_15, -1, 2, 0}, /* 16 */ { 4, s_3_16, -1, 1, 0}, /* 17 */ { 3, s_3_17, -1, 1, 0}, /* 18 */ { 5, s_3_18, 17, 1, 0}, /* 19 */ { 3, s_3_19, -1, 1, 0}, /* 20 */ { 4, s_3_20, -1, 1, 0}, /* 21 */ { 4, s_3_21, -1, 3, 0}, /* 22 */ { 3, s_3_22, -1, 1, 0}, /* 23 */ { 3, s_3_23, -1, 1, 0}, /* 24 */ { 3, s_3_24, -1, 1, 0}, /* 25 */ { 5, s_3_25, -1, 1, 0}, /* 26 */ { 5, s_3_26, -1, 1, 0}, /* 27 */ { 4, s_3_27, -1, 2, 0}, /* 28 */ { 5, s_3_28, -1, 1, 0}, /* 29 */ { 3, s_3_29, -1, 1, 0}, /* 30 */ { 3, s_3_30, -1, 1, 0}, /* 31 */ { 5, s_3_31, 30, 1, 0}, /* 32 */ { 3, s_3_32, -1, 1, 0}, /* 33 */ { 4, s_3_33, -1, 1, 0}, /* 34 */ { 4, s_3_34, -1, 3, 0}, /* 35 */ { 3, s_3_35, -1, 1, 0}, /* 36 */ { 4, s_3_36, -1, 3, 0}, /* 37 */ { 3, s_3_37, -1, 1, 0}, /* 38 */ { 3, s_3_38, -1, 1, 0}, /* 39 */ { 4, s_3_39, -1, 1, 0}, /* 40 */ { 5, s_3_40, -1, 1, 0}, /* 41 */ { 4, s_3_41, -1, 1, 0}, /* 42 */ { 4, s_3_42, -1, 1, 0}, /* 43 */ { 3, s_3_43, -1, 3, 0}, /* 44 */ { 4, s_3_44, -1, 1, 0}, /* 45 */ { 2, s_3_45, -1, 1, 0}, /* 46 */ { 2, s_3_46, -1, 1, 0}, /* 47 */ { 2, s_3_47, -1, 1, 0}, /* 48 */ { 3, s_3_48, -1, 1, 0}, /* 49 */ { 3, s_3_49, -1, 3, 0}, /* 50 */ { 2, s_3_50, -1, 1, 0}, /* 51 */ { 2, s_3_51, -1, 1, 0}, /* 52 */ { 3, s_3_52, -1, 1, 0}, /* 53 */ { 5, s_3_53, -1, 1, 0}, /* 54 */ { 5, s_3_54, -1, 1, 0}, /* 55 */ { 4, s_3_55, -1, 1, 0}, /* 56 */ { 3, s_3_56, -1, 1, 0}, /* 57 */ { 3, s_3_57, -1, 1, 0}, /* 58 */ { 4, s_3_58, -1, 1, 0}, /* 59 */ { 4, s_3_59, -1, 3, 0}, /* 60 */ { 3, s_3_60, -1, 1, 0}, /* 61 */ { 3, s_3_61, -1, 1, 0} }; static const symbol s_4_0[2] = { 'e', 'a' }; static const symbol s_4_1[2] = { 'i', 'a' }; static const symbol s_4_2[3] = { 'e', 's', 'c' }; 
static const symbol s_4_3[3] = { 0xE3, 's', 'c' }; static const symbol s_4_4[3] = { 'i', 'n', 'd' }; static const symbol s_4_5[3] = { 0xE2, 'n', 'd' }; static const symbol s_4_6[3] = { 'a', 'r', 'e' }; static const symbol s_4_7[3] = { 'e', 'r', 'e' }; static const symbol s_4_8[3] = { 'i', 'r', 'e' }; static const symbol s_4_9[3] = { 0xE2, 'r', 'e' }; static const symbol s_4_10[2] = { 's', 'e' }; static const symbol s_4_11[3] = { 'a', 's', 'e' }; static const symbol s_4_12[4] = { 's', 'e', 's', 'e' }; static const symbol s_4_13[3] = { 'i', 's', 'e' }; static const symbol s_4_14[3] = { 'u', 's', 'e' }; static const symbol s_4_15[3] = { 0xE2, 's', 'e' }; static const symbol s_4_16[4] = { 'e', 0xBA, 't', 'e' }; static const symbol s_4_17[4] = { 0xE3, 0xBA, 't', 'e' }; static const symbol s_4_18[3] = { 'e', 'z', 'e' }; static const symbol s_4_19[2] = { 'a', 'i' }; static const symbol s_4_20[3] = { 'e', 'a', 'i' }; static const symbol s_4_21[3] = { 'i', 'a', 'i' }; static const symbol s_4_22[3] = { 's', 'e', 'i' }; static const symbol s_4_23[4] = { 'e', 0xBA, 't', 'i' }; static const symbol s_4_24[4] = { 0xE3, 0xBA, 't', 'i' }; static const symbol s_4_25[2] = { 'u', 'i' }; static const symbol s_4_26[3] = { 'e', 'z', 'i' }; static const symbol s_4_27[3] = { 'a', 0xBA, 'i' }; static const symbol s_4_28[4] = { 's', 'e', 0xBA, 'i' }; static const symbol s_4_29[5] = { 'a', 's', 'e', 0xBA, 'i' }; static const symbol s_4_30[6] = { 's', 'e', 's', 'e', 0xBA, 'i' }; static const symbol s_4_31[5] = { 'i', 's', 'e', 0xBA, 'i' }; static const symbol s_4_32[5] = { 'u', 's', 'e', 0xBA, 'i' }; static const symbol s_4_33[5] = { 0xE2, 's', 'e', 0xBA, 'i' }; static const symbol s_4_34[3] = { 'i', 0xBA, 'i' }; static const symbol s_4_35[3] = { 'u', 0xBA, 'i' }; static const symbol s_4_36[3] = { 0xE2, 0xBA, 'i' }; static const symbol s_4_37[2] = { 0xE2, 'i' }; static const symbol s_4_38[3] = { 'a', 0xFE, 'i' }; static const symbol s_4_39[4] = { 'e', 'a', 0xFE, 'i' }; static const symbol 
s_4_40[4] = { 'i', 'a', 0xFE, 'i' }; static const symbol s_4_41[3] = { 'e', 0xFE, 'i' }; static const symbol s_4_42[3] = { 'i', 0xFE, 'i' }; static const symbol s_4_43[3] = { 0xE2, 0xFE, 'i' }; static const symbol s_4_44[5] = { 'a', 'r', 0xE3, 0xFE, 'i' }; static const symbol s_4_45[6] = { 's', 'e', 'r', 0xE3, 0xFE, 'i' }; static const symbol s_4_46[7] = { 'a', 's', 'e', 'r', 0xE3, 0xFE, 'i' }; static const symbol s_4_47[8] = { 's', 'e', 's', 'e', 'r', 0xE3, 0xFE, 'i' }; static const symbol s_4_48[7] = { 'i', 's', 'e', 'r', 0xE3, 0xFE, 'i' }; static const symbol s_4_49[7] = { 'u', 's', 'e', 'r', 0xE3, 0xFE, 'i' }; static const symbol s_4_50[7] = { 0xE2, 's', 'e', 'r', 0xE3, 0xFE, 'i' }; static const symbol s_4_51[5] = { 'i', 'r', 0xE3, 0xFE, 'i' }; static const symbol s_4_52[5] = { 'u', 'r', 0xE3, 0xFE, 'i' }; static const symbol s_4_53[5] = { 0xE2, 'r', 0xE3, 0xFE, 'i' }; static const symbol s_4_54[2] = { 'a', 'm' }; static const symbol s_4_55[3] = { 'e', 'a', 'm' }; static const symbol s_4_56[3] = { 'i', 'a', 'm' }; static const symbol s_4_57[2] = { 'e', 'm' }; static const symbol s_4_58[4] = { 'a', 's', 'e', 'm' }; static const symbol s_4_59[5] = { 's', 'e', 's', 'e', 'm' }; static const symbol s_4_60[4] = { 'i', 's', 'e', 'm' }; static const symbol s_4_61[4] = { 'u', 's', 'e', 'm' }; static const symbol s_4_62[4] = { 0xE2, 's', 'e', 'm' }; static const symbol s_4_63[2] = { 'i', 'm' }; static const symbol s_4_64[2] = { 0xE2, 'm' }; static const symbol s_4_65[2] = { 0xE3, 'm' }; static const symbol s_4_66[4] = { 'a', 'r', 0xE3, 'm' }; static const symbol s_4_67[5] = { 's', 'e', 'r', 0xE3, 'm' }; static const symbol s_4_68[6] = { 'a', 's', 'e', 'r', 0xE3, 'm' }; static const symbol s_4_69[7] = { 's', 'e', 's', 'e', 'r', 0xE3, 'm' }; static const symbol s_4_70[6] = { 'i', 's', 'e', 'r', 0xE3, 'm' }; static const symbol s_4_71[6] = { 'u', 's', 'e', 'r', 0xE3, 'm' }; static const symbol s_4_72[6] = { 0xE2, 's', 'e', 'r', 0xE3, 'm' }; static const symbol s_4_73[4] = { 
'i', 'r', 0xE3, 'm' }; static const symbol s_4_74[4] = { 'u', 'r', 0xE3, 'm' }; static const symbol s_4_75[4] = { 0xE2, 'r', 0xE3, 'm' }; static const symbol s_4_76[2] = { 'a', 'u' }; static const symbol s_4_77[3] = { 'e', 'a', 'u' }; static const symbol s_4_78[3] = { 'i', 'a', 'u' }; static const symbol s_4_79[4] = { 'i', 'n', 'd', 'u' }; static const symbol s_4_80[4] = { 0xE2, 'n', 'd', 'u' }; static const symbol s_4_81[2] = { 'e', 'z' }; static const symbol s_4_82[5] = { 'e', 'a', 's', 'c', 0xE3 }; static const symbol s_4_83[3] = { 'a', 'r', 0xE3 }; static const symbol s_4_84[4] = { 's', 'e', 'r', 0xE3 }; static const symbol s_4_85[5] = { 'a', 's', 'e', 'r', 0xE3 }; static const symbol s_4_86[6] = { 's', 'e', 's', 'e', 'r', 0xE3 }; static const symbol s_4_87[5] = { 'i', 's', 'e', 'r', 0xE3 }; static const symbol s_4_88[5] = { 'u', 's', 'e', 'r', 0xE3 }; static const symbol s_4_89[5] = { 0xE2, 's', 'e', 'r', 0xE3 }; static const symbol s_4_90[3] = { 'i', 'r', 0xE3 }; static const symbol s_4_91[3] = { 'u', 'r', 0xE3 }; static const symbol s_4_92[3] = { 0xE2, 'r', 0xE3 }; static const symbol s_4_93[4] = { 'e', 'a', 'z', 0xE3 }; static const struct among a_4[94] = { /* 0 */ { 2, s_4_0, -1, 1, 0}, /* 1 */ { 2, s_4_1, -1, 1, 0}, /* 2 */ { 3, s_4_2, -1, 1, 0}, /* 3 */ { 3, s_4_3, -1, 1, 0}, /* 4 */ { 3, s_4_4, -1, 1, 0}, /* 5 */ { 3, s_4_5, -1, 1, 0}, /* 6 */ { 3, s_4_6, -1, 1, 0}, /* 7 */ { 3, s_4_7, -1, 1, 0}, /* 8 */ { 3, s_4_8, -1, 1, 0}, /* 9 */ { 3, s_4_9, -1, 1, 0}, /* 10 */ { 2, s_4_10, -1, 2, 0}, /* 11 */ { 3, s_4_11, 10, 1, 0}, /* 12 */ { 4, s_4_12, 10, 2, 0}, /* 13 */ { 3, s_4_13, 10, 1, 0}, /* 14 */ { 3, s_4_14, 10, 1, 0}, /* 15 */ { 3, s_4_15, 10, 1, 0}, /* 16 */ { 4, s_4_16, -1, 1, 0}, /* 17 */ { 4, s_4_17, -1, 1, 0}, /* 18 */ { 3, s_4_18, -1, 1, 0}, /* 19 */ { 2, s_4_19, -1, 1, 0}, /* 20 */ { 3, s_4_20, 19, 1, 0}, /* 21 */ { 3, s_4_21, 19, 1, 0}, /* 22 */ { 3, s_4_22, -1, 2, 0}, /* 23 */ { 4, s_4_23, -1, 1, 0}, /* 24 */ { 4, s_4_24, -1, 1, 0}, /* 25 */ { 
2, s_4_25, -1, 1, 0}, /* 26 */ { 3, s_4_26, -1, 1, 0}, /* 27 */ { 3, s_4_27, -1, 1, 0}, /* 28 */ { 4, s_4_28, -1, 2, 0}, /* 29 */ { 5, s_4_29, 28, 1, 0}, /* 30 */ { 6, s_4_30, 28, 2, 0}, /* 31 */ { 5, s_4_31, 28, 1, 0}, /* 32 */ { 5, s_4_32, 28, 1, 0}, /* 33 */ { 5, s_4_33, 28, 1, 0}, /* 34 */ { 3, s_4_34, -1, 1, 0}, /* 35 */ { 3, s_4_35, -1, 1, 0}, /* 36 */ { 3, s_4_36, -1, 1, 0}, /* 37 */ { 2, s_4_37, -1, 1, 0}, /* 38 */ { 3, s_4_38, -1, 2, 0}, /* 39 */ { 4, s_4_39, 38, 1, 0}, /* 40 */ { 4, s_4_40, 38, 1, 0}, /* 41 */ { 3, s_4_41, -1, 2, 0}, /* 42 */ { 3, s_4_42, -1, 2, 0}, /* 43 */ { 3, s_4_43, -1, 2, 0}, /* 44 */ { 5, s_4_44, -1, 1, 0}, /* 45 */ { 6, s_4_45, -1, 2, 0}, /* 46 */ { 7, s_4_46, 45, 1, 0}, /* 47 */ { 8, s_4_47, 45, 2, 0}, /* 48 */ { 7, s_4_48, 45, 1, 0}, /* 49 */ { 7, s_4_49, 45, 1, 0}, /* 50 */ { 7, s_4_50, 45, 1, 0}, /* 51 */ { 5, s_4_51, -1, 1, 0}, /* 52 */ { 5, s_4_52, -1, 1, 0}, /* 53 */ { 5, s_4_53, -1, 1, 0}, /* 54 */ { 2, s_4_54, -1, 1, 0}, /* 55 */ { 3, s_4_55, 54, 1, 0}, /* 56 */ { 3, s_4_56, 54, 1, 0}, /* 57 */ { 2, s_4_57, -1, 2, 0}, /* 58 */ { 4, s_4_58, 57, 1, 0}, /* 59 */ { 5, s_4_59, 57, 2, 0}, /* 60 */ { 4, s_4_60, 57, 1, 0}, /* 61 */ { 4, s_4_61, 57, 1, 0}, /* 62 */ { 4, s_4_62, 57, 1, 0}, /* 63 */ { 2, s_4_63, -1, 2, 0}, /* 64 */ { 2, s_4_64, -1, 2, 0}, /* 65 */ { 2, s_4_65, -1, 2, 0}, /* 66 */ { 4, s_4_66, 65, 1, 0}, /* 67 */ { 5, s_4_67, 65, 2, 0}, /* 68 */ { 6, s_4_68, 67, 1, 0}, /* 69 */ { 7, s_4_69, 67, 2, 0}, /* 70 */ { 6, s_4_70, 67, 1, 0}, /* 71 */ { 6, s_4_71, 67, 1, 0}, /* 72 */ { 6, s_4_72, 67, 1, 0}, /* 73 */ { 4, s_4_73, 65, 1, 0}, /* 74 */ { 4, s_4_74, 65, 1, 0}, /* 75 */ { 4, s_4_75, 65, 1, 0}, /* 76 */ { 2, s_4_76, -1, 1, 0}, /* 77 */ { 3, s_4_77, 76, 1, 0}, /* 78 */ { 3, s_4_78, 76, 1, 0}, /* 79 */ { 4, s_4_79, -1, 1, 0}, /* 80 */ { 4, s_4_80, -1, 1, 0}, /* 81 */ { 2, s_4_81, -1, 1, 0}, /* 82 */ { 5, s_4_82, -1, 1, 0}, /* 83 */ { 3, s_4_83, -1, 1, 0}, /* 84 */ { 4, s_4_84, -1, 2, 0}, /* 85 */ { 5, s_4_85, 84, 1, 
0}, /* 86 */ { 6, s_4_86, 84, 2, 0}, /* 87 */ { 5, s_4_87, 84, 1, 0}, /* 88 */ { 5, s_4_88, 84, 1, 0}, /* 89 */ { 5, s_4_89, 84, 1, 0}, /* 90 */ { 3, s_4_90, -1, 1, 0}, /* 91 */ { 3, s_4_91, -1, 1, 0}, /* 92 */ { 3, s_4_92, -1, 1, 0}, /* 93 */ { 4, s_4_93, -1, 1, 0} }; static const symbol s_5_0[1] = { 'a' }; static const symbol s_5_1[1] = { 'e' }; static const symbol s_5_2[2] = { 'i', 'e' }; static const symbol s_5_3[1] = { 'i' }; static const symbol s_5_4[1] = { 0xE3 }; static const struct among a_5[5] = { /* 0 */ { 1, s_5_0, -1, 1, 0}, /* 1 */ { 1, s_5_1, -1, 1, 0}, /* 2 */ { 2, s_5_2, 1, 1, 0}, /* 3 */ { 1, s_5_3, -1, 1, 0}, /* 4 */ { 1, s_5_4, -1, 1, 0} }; static const unsigned char g_v[] = { 17, 65, 16, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 6, 32 }; static const symbol s_0[] = { 'u' }; static const symbol s_1[] = { 'U' }; static const symbol s_2[] = { 'i' }; static const symbol s_3[] = { 'I' }; static const symbol s_4[] = { 'i' }; static const symbol s_5[] = { 'u' }; static const symbol s_6[] = { 'a' }; static const symbol s_7[] = { 'e' }; static const symbol s_8[] = { 'i' }; static const symbol s_9[] = { 'a', 'b' }; static const symbol s_10[] = { 'i' }; static const symbol s_11[] = { 'a', 't' }; static const symbol s_12[] = { 'a', 0xFE, 'i' }; static const symbol s_13[] = { 'a', 'b', 'i', 'l' }; static const symbol s_14[] = { 'i', 'b', 'i', 'l' }; static const symbol s_15[] = { 'i', 'v' }; static const symbol s_16[] = { 'i', 'c' }; static const symbol s_17[] = { 'a', 't' }; static const symbol s_18[] = { 'i', 't' }; static const symbol s_19[] = { 0xFE }; static const symbol s_20[] = { 't' }; static const symbol s_21[] = { 'i', 's', 't' }; static const symbol s_22[] = { 'u' }; static int r_prelude(struct SN_env * z) { while(1) { /* repeat, line 32 */ int c1 = z->c; while(1) { /* goto, line 32 */ int c2 = z->c; if (in_grouping(z, g_v, 97, 238, 0)) goto lab1; z->bra = z->c; /* [, line 33 */ { int c3 = z->c; /* or, line 33 */ if (!(eq_s(z, 1, s_0))) goto lab3; 
z->ket = z->c; /* ], line 33 */ if (in_grouping(z, g_v, 97, 238, 0)) goto lab3; { int ret = slice_from_s(z, 1, s_1); /* <-, line 33 */ if (ret < 0) return ret; } goto lab2; lab3: z->c = c3; if (!(eq_s(z, 1, s_2))) goto lab1; z->ket = z->c; /* ], line 34 */ if (in_grouping(z, g_v, 97, 238, 0)) goto lab1; { int ret = slice_from_s(z, 1, s_3); /* <-, line 34 */ if (ret < 0) return ret; } } lab2: z->c = c2; break; lab1: z->c = c2; if (z->c >= z->l) goto lab0; z->c++; /* goto, line 32 */ } continue; lab0: z->c = c1; break; } return 1; } static int r_mark_regions(struct SN_env * z) { z->I[0] = z->l; z->I[1] = z->l; z->I[2] = z->l; { int c1 = z->c; /* do, line 44 */ { int c2 = z->c; /* or, line 46 */ if (in_grouping(z, g_v, 97, 238, 0)) goto lab2; { int c3 = z->c; /* or, line 45 */ if (out_grouping(z, g_v, 97, 238, 0)) goto lab4; { /* gopast */ /* grouping v, line 45 */ int ret = out_grouping(z, g_v, 97, 238, 1); if (ret < 0) goto lab4; z->c += ret; } goto lab3; lab4: z->c = c3; if (in_grouping(z, g_v, 97, 238, 0)) goto lab2; { /* gopast */ /* non v, line 45 */ int ret = in_grouping(z, g_v, 97, 238, 1); if (ret < 0) goto lab2; z->c += ret; } } lab3: goto lab1; lab2: z->c = c2; if (out_grouping(z, g_v, 97, 238, 0)) goto lab0; { int c4 = z->c; /* or, line 47 */ if (out_grouping(z, g_v, 97, 238, 0)) goto lab6; { /* gopast */ /* grouping v, line 47 */ int ret = out_grouping(z, g_v, 97, 238, 1); if (ret < 0) goto lab6; z->c += ret; } goto lab5; lab6: z->c = c4; if (in_grouping(z, g_v, 97, 238, 0)) goto lab0; if (z->c >= z->l) goto lab0; z->c++; /* next, line 47 */ } lab5: ; } lab1: z->I[0] = z->c; /* setmark pV, line 48 */ lab0: z->c = c1; } { int c5 = z->c; /* do, line 50 */ { /* gopast */ /* grouping v, line 51 */ int ret = out_grouping(z, g_v, 97, 238, 1); if (ret < 0) goto lab7; z->c += ret; } { /* gopast */ /* non v, line 51 */ int ret = in_grouping(z, g_v, 97, 238, 1); if (ret < 0) goto lab7; z->c += ret; } z->I[1] = z->c; /* setmark p1, line 51 */ { /* gopast */ /* 
grouping v, line 52 */ int ret = out_grouping(z, g_v, 97, 238, 1); if (ret < 0) goto lab7; z->c += ret; } { /* gopast */ /* non v, line 52 */ int ret = in_grouping(z, g_v, 97, 238, 1); if (ret < 0) goto lab7; z->c += ret; } z->I[2] = z->c; /* setmark p2, line 52 */ lab7: z->c = c5; } return 1; } static int r_postlude(struct SN_env * z) { int among_var; while(1) { /* repeat, line 56 */ int c1 = z->c; z->bra = z->c; /* [, line 58 */ if (z->c >= z->l || (z->p[z->c + 0] != 73 && z->p[z->c + 0] != 85)) among_var = 3; else among_var = find_among(z, a_0, 3); /* substring, line 58 */ if (!(among_var)) goto lab0; z->ket = z->c; /* ], line 58 */ switch(among_var) { case 0: goto lab0; case 1: { int ret = slice_from_s(z, 1, s_4); /* <-, line 59 */ if (ret < 0) return ret; } break; case 2: { int ret = slice_from_s(z, 1, s_5); /* <-, line 60 */ if (ret < 0) return ret; } break; case 3: if (z->c >= z->l) goto lab0; z->c++; /* next, line 61 */ break; } continue; lab0: z->c = c1; break; } return 1; } static int r_RV(struct SN_env * z) { if (!(z->I[0] <= z->c)) return 0; return 1; } static int r_R1(struct SN_env * z) { if (!(z->I[1] <= z->c)) return 0; return 1; } static int r_R2(struct SN_env * z) { if (!(z->I[2] <= z->c)) return 0; return 1; } static int r_step_0(struct SN_env * z) { int among_var; z->ket = z->c; /* [, line 73 */ if (z->c - 1 <= z->lb || z->p[z->c - 1] >> 5 != 3 || !((266786 >> (z->p[z->c - 1] & 0x1f)) & 1)) return 0; among_var = find_among_b(z, a_1, 16); /* substring, line 73 */ if (!(among_var)) return 0; z->bra = z->c; /* ], line 73 */ { int ret = r_R1(z); if (ret == 0) return 0; /* call R1, line 73 */ if (ret < 0) return ret; } switch(among_var) { case 0: return 0; case 1: { int ret = slice_del(z); /* delete, line 75 */ if (ret < 0) return ret; } break; case 2: { int ret = slice_from_s(z, 1, s_6); /* <-, line 77 */ if (ret < 0) return ret; } break; case 3: { int ret = slice_from_s(z, 1, s_7); /* <-, line 79 */ if (ret < 0) return ret; } break; case 4: { int 
ret = slice_from_s(z, 1, s_8); /* <-, line 81 */ if (ret < 0) return ret; } break; case 5: { int m1 = z->l - z->c; (void)m1; /* not, line 83 */ if (!(eq_s_b(z, 2, s_9))) goto lab0; return 0; lab0: z->c = z->l - m1; } { int ret = slice_from_s(z, 1, s_10); /* <-, line 83 */ if (ret < 0) return ret; } break; case 6: { int ret = slice_from_s(z, 2, s_11); /* <-, line 85 */ if (ret < 0) return ret; } break; case 7: { int ret = slice_from_s(z, 3, s_12); /* <-, line 87 */ if (ret < 0) return ret; } break; } return 1; } static int r_combo_suffix(struct SN_env * z) { int among_var; { int m_test = z->l - z->c; /* test, line 91 */ z->ket = z->c; /* [, line 92 */ among_var = find_among_b(z, a_2, 46); /* substring, line 92 */ if (!(among_var)) return 0; z->bra = z->c; /* ], line 92 */ { int ret = r_R1(z); if (ret == 0) return 0; /* call R1, line 92 */ if (ret < 0) return ret; } switch(among_var) { case 0: return 0; case 1: { int ret = slice_from_s(z, 4, s_13); /* <-, line 101 */ if (ret < 0) return ret; } break; case 2: { int ret = slice_from_s(z, 4, s_14); /* <-, line 104 */ if (ret < 0) return ret; } break; case 3: { int ret = slice_from_s(z, 2, s_15); /* <-, line 107 */ if (ret < 0) return ret; } break; case 4: { int ret = slice_from_s(z, 2, s_16); /* <-, line 113 */ if (ret < 0) return ret; } break; case 5: { int ret = slice_from_s(z, 2, s_17); /* <-, line 118 */ if (ret < 0) return ret; } break; case 6: { int ret = slice_from_s(z, 2, s_18); /* <-, line 122 */ if (ret < 0) return ret; } break; } z->B[0] = 1; /* set standard_suffix_removed, line 125 */ z->c = z->l - m_test; } return 1; } static int r_standard_suffix(struct SN_env * z) { int among_var; z->B[0] = 0; /* unset standard_suffix_removed, line 130 */ while(1) { /* repeat, line 131 */ int m1 = z->l - z->c; (void)m1; { int ret = r_combo_suffix(z); if (ret == 0) goto lab0; /* call combo_suffix, line 131 */ if (ret < 0) return ret; } continue; lab0: z->c = z->l - m1; break; } z->ket = z->c; /* [, line 132 */ among_var = 
find_among_b(z, a_3, 62); /* substring, line 132 */ if (!(among_var)) return 0; z->bra = z->c; /* ], line 132 */ { int ret = r_R2(z); if (ret == 0) return 0; /* call R2, line 132 */ if (ret < 0) return ret; } switch(among_var) { case 0: return 0; case 1: { int ret = slice_del(z); /* delete, line 149 */ if (ret < 0) return ret; } break; case 2: if (!(eq_s_b(z, 1, s_19))) return 0; z->bra = z->c; /* ], line 152 */ { int ret = slice_from_s(z, 1, s_20); /* <-, line 152 */ if (ret < 0) return ret; } break; case 3: { int ret = slice_from_s(z, 3, s_21); /* <-, line 156 */ if (ret < 0) return ret; } break; } z->B[0] = 1; /* set standard_suffix_removed, line 160 */ return 1; } static int r_verb_suffix(struct SN_env * z) { int among_var; { int mlimit; /* setlimit, line 164 */ int m1 = z->l - z->c; (void)m1; if (z->c < z->I[0]) return 0; z->c = z->I[0]; /* tomark, line 164 */ mlimit = z->lb; z->lb = z->c; z->c = z->l - m1; z->ket = z->c; /* [, line 165 */ among_var = find_among_b(z, a_4, 94); /* substring, line 165 */ if (!(among_var)) { z->lb = mlimit; return 0; } z->bra = z->c; /* ], line 165 */ switch(among_var) { case 0: { z->lb = mlimit; return 0; } case 1: { int m2 = z->l - z->c; (void)m2; /* or, line 200 */ if (out_grouping_b(z, g_v, 97, 238, 0)) goto lab1; goto lab0; lab1: z->c = z->l - m2; if (!(eq_s_b(z, 1, s_22))) { z->lb = mlimit; return 0; } } lab0: { int ret = slice_del(z); /* delete, line 200 */ if (ret < 0) return ret; } break; case 2: { int ret = slice_del(z); /* delete, line 214 */ if (ret < 0) return ret; } break; } z->lb = mlimit; } return 1; } static int r_vowel_suffix(struct SN_env * z) { int among_var; z->ket = z->c; /* [, line 219 */ among_var = find_among_b(z, a_5, 5); /* substring, line 219 */ if (!(among_var)) return 0; z->bra = z->c; /* ], line 219 */ { int ret = r_RV(z); if (ret == 0) return 0; /* call RV, line 219 */ if (ret < 0) return ret; } switch(among_var) { case 0: return 0; case 1: { int ret = slice_del(z); /* delete, line 220 */ if (ret < 
0) return ret; } break; } return 1; } extern int romanian_ISO_8859_2_stem(struct SN_env * z) { { int c1 = z->c; /* do, line 226 */ { int ret = r_prelude(z); if (ret == 0) goto lab0; /* call prelude, line 226 */ if (ret < 0) return ret; } lab0: z->c = c1; } { int c2 = z->c; /* do, line 227 */ { int ret = r_mark_regions(z); if (ret == 0) goto lab1; /* call mark_regions, line 227 */ if (ret < 0) return ret; } lab1: z->c = c2; } z->lb = z->c; z->c = z->l; /* backwards, line 228 */ { int m3 = z->l - z->c; (void)m3; /* do, line 229 */ { int ret = r_step_0(z); if (ret == 0) goto lab2; /* call step_0, line 229 */ if (ret < 0) return ret; } lab2: z->c = z->l - m3; } { int m4 = z->l - z->c; (void)m4; /* do, line 230 */ { int ret = r_standard_suffix(z); if (ret == 0) goto lab3; /* call standard_suffix, line 230 */ if (ret < 0) return ret; } lab3: z->c = z->l - m4; } { int m5 = z->l - z->c; (void)m5; /* do, line 231 */ { int m6 = z->l - z->c; (void)m6; /* or, line 231 */ if (!(z->B[0])) goto lab6; /* Boolean test standard_suffix_removed, line 231 */ goto lab5; lab6: z->c = z->l - m6; { int ret = r_verb_suffix(z); if (ret == 0) goto lab4; /* call verb_suffix, line 231 */ if (ret < 0) return ret; } } lab5: lab4: z->c = z->l - m5; } { int m7 = z->l - z->c; (void)m7; /* do, line 232 */ { int ret = r_vowel_suffix(z); if (ret == 0) goto lab7; /* call vowel_suffix, line 232 */ if (ret < 0) return ret; } lab7: z->c = z->l - m7; } z->c = z->lb; { int c8 = z->c; /* do, line 234 */ { int ret = r_postlude(z); if (ret == 0) goto lab8; /* call postlude, line 234 */ if (ret < 0) return ret; } lab8: z->c = c8; } return 1; } extern struct SN_env * romanian_ISO_8859_2_create_env(void) { return SN_create_env(0, 3, 1); } extern void romanian_ISO_8859_2_close_env(struct SN_env * z) { SN_close_env(z, 0); } swish-e-2.4.7/src/snowball/stem_se.h0000664000077100017500000000051011166010107014235 00000000000000 /* This file was generated automatically by the Snowball to ANSI C compiler */ #ifdef 
__cplusplus extern "C" { #endif extern struct SN_env * swedish_ISO_8859_1_create_env(void); extern void swedish_ISO_8859_1_close_env(struct SN_env * z); extern int swedish_ISO_8859_1_stem(struct SN_env * z); #ifdef __cplusplus } #endif swish-e-2.4.7/src/snowball/stem_ru.h0000664000077100017500000000047411166010110014257 00000000000000 /* This file was generated automatically by the Snowball to ANSI C compiler */ #ifdef __cplusplus extern "C" { #endif extern struct SN_env * russian_KOI8_R_create_env(void); extern void russian_KOI8_R_close_env(struct SN_env * z); extern int russian_KOI8_R_stem(struct SN_env * z); #ifdef __cplusplus } #endif swish-e-2.4.7/src/snowball/api.h0000664000077100017500000000146011166010107013354 00000000000000#ifndef STEMMER_API_H #define STEMMER_API_H 1 typedef unsigned char symbol; /* Or replace 'char' above with 'short' for 16 bit characters. More precisely, replace 'char' with whatever type guarantees the character width you need. Note however that sizeof(symbol) should divide HEAD, defined in header.h as 2*sizeof(int), without remainder, otherwise there is an alignment problem. In the unlikely event of a problem here, consult Martin Porter. 
*/ struct SN_env { symbol * p; int c; int l; int lb; int bra; int ket; symbol * * S; int * I; unsigned char * B; }; extern struct SN_env * SN_create_env(int S_size, int I_size, int B_size); extern void SN_close_env(struct SN_env * z, int S_size); extern int SN_set_current(struct SN_env * z, int size, const symbol * s); #endif swish-e-2.4.7/src/snowball/stem_ru.c0000664000077100017500000005716711166010110014265 00000000000000 /* This file was generated automatically by the Snowball to ANSI C compiler */ #include "header.h" #ifdef __cplusplus extern "C" { #endif extern int russian_KOI8_R_stem(struct SN_env * z); #ifdef __cplusplus } #endif static int r_tidy_up(struct SN_env * z); static int r_derivational(struct SN_env * z); static int r_noun(struct SN_env * z); static int r_verb(struct SN_env * z); static int r_reflexive(struct SN_env * z); static int r_adjectival(struct SN_env * z); static int r_adjective(struct SN_env * z); static int r_perfective_gerund(struct SN_env * z); static int r_R2(struct SN_env * z); static int r_mark_regions(struct SN_env * z); #ifdef __cplusplus extern "C" { #endif extern struct SN_env * russian_KOI8_R_create_env(void); extern void russian_KOI8_R_close_env(struct SN_env * z); #ifdef __cplusplus } #endif static const symbol s_0_0[3] = { 0xD7, 0xDB, 0xC9 }; static const symbol s_0_1[4] = { 0xC9, 0xD7, 0xDB, 0xC9 }; static const symbol s_0_2[4] = { 0xD9, 0xD7, 0xDB, 0xC9 }; static const symbol s_0_3[1] = { 0xD7 }; static const symbol s_0_4[2] = { 0xC9, 0xD7 }; static const symbol s_0_5[2] = { 0xD9, 0xD7 }; static const symbol s_0_6[5] = { 0xD7, 0xDB, 0xC9, 0xD3, 0xD8 }; static const symbol s_0_7[6] = { 0xC9, 0xD7, 0xDB, 0xC9, 0xD3, 0xD8 }; static const symbol s_0_8[6] = { 0xD9, 0xD7, 0xDB, 0xC9, 0xD3, 0xD8 }; static const struct among a_0[9] = { /* 0 */ { 3, s_0_0, -1, 1, 0}, /* 1 */ { 4, s_0_1, 0, 2, 0}, /* 2 */ { 4, s_0_2, 0, 2, 0}, /* 3 */ { 1, s_0_3, -1, 1, 0}, /* 4 */ { 2, s_0_4, 3, 2, 0}, /* 5 */ { 2, s_0_5, 3, 2, 0}, /* 6 */ { 5, 
s_0_6, -1, 1, 0}, /* 7 */ { 6, s_0_7, 6, 2, 0}, /* 8 */ { 6, s_0_8, 6, 2, 0} }; static const symbol s_1_0[2] = { 0xC0, 0xC0 }; static const symbol s_1_1[2] = { 0xC5, 0xC0 }; static const symbol s_1_2[2] = { 0xCF, 0xC0 }; static const symbol s_1_3[2] = { 0xD5, 0xC0 }; static const symbol s_1_4[2] = { 0xC5, 0xC5 }; static const symbol s_1_5[2] = { 0xC9, 0xC5 }; static const symbol s_1_6[2] = { 0xCF, 0xC5 }; static const symbol s_1_7[2] = { 0xD9, 0xC5 }; static const symbol s_1_8[2] = { 0xC9, 0xC8 }; static const symbol s_1_9[2] = { 0xD9, 0xC8 }; static const symbol s_1_10[3] = { 0xC9, 0xCD, 0xC9 }; static const symbol s_1_11[3] = { 0xD9, 0xCD, 0xC9 }; static const symbol s_1_12[2] = { 0xC5, 0xCA }; static const symbol s_1_13[2] = { 0xC9, 0xCA }; static const symbol s_1_14[2] = { 0xCF, 0xCA }; static const symbol s_1_15[2] = { 0xD9, 0xCA }; static const symbol s_1_16[2] = { 0xC5, 0xCD }; static const symbol s_1_17[2] = { 0xC9, 0xCD }; static const symbol s_1_18[2] = { 0xCF, 0xCD }; static const symbol s_1_19[2] = { 0xD9, 0xCD }; static const symbol s_1_20[3] = { 0xC5, 0xC7, 0xCF }; static const symbol s_1_21[3] = { 0xCF, 0xC7, 0xCF }; static const symbol s_1_22[2] = { 0xC1, 0xD1 }; static const symbol s_1_23[2] = { 0xD1, 0xD1 }; static const symbol s_1_24[3] = { 0xC5, 0xCD, 0xD5 }; static const symbol s_1_25[3] = { 0xCF, 0xCD, 0xD5 }; static const struct among a_1[26] = { /* 0 */ { 2, s_1_0, -1, 1, 0}, /* 1 */ { 2, s_1_1, -1, 1, 0}, /* 2 */ { 2, s_1_2, -1, 1, 0}, /* 3 */ { 2, s_1_3, -1, 1, 0}, /* 4 */ { 2, s_1_4, -1, 1, 0}, /* 5 */ { 2, s_1_5, -1, 1, 0}, /* 6 */ { 2, s_1_6, -1, 1, 0}, /* 7 */ { 2, s_1_7, -1, 1, 0}, /* 8 */ { 2, s_1_8, -1, 1, 0}, /* 9 */ { 2, s_1_9, -1, 1, 0}, /* 10 */ { 3, s_1_10, -1, 1, 0}, /* 11 */ { 3, s_1_11, -1, 1, 0}, /* 12 */ { 2, s_1_12, -1, 1, 0}, /* 13 */ { 2, s_1_13, -1, 1, 0}, /* 14 */ { 2, s_1_14, -1, 1, 0}, /* 15 */ { 2, s_1_15, -1, 1, 0}, /* 16 */ { 2, s_1_16, -1, 1, 0}, /* 17 */ { 2, s_1_17, -1, 1, 0}, /* 18 */ { 2, s_1_18, -1, 1, 0}, 
/* 19 */ { 2, s_1_19, -1, 1, 0}, /* 20 */ { 3, s_1_20, -1, 1, 0}, /* 21 */ { 3, s_1_21, -1, 1, 0}, /* 22 */ { 2, s_1_22, -1, 1, 0}, /* 23 */ { 2, s_1_23, -1, 1, 0}, /* 24 */ { 3, s_1_24, -1, 1, 0}, /* 25 */ { 3, s_1_25, -1, 1, 0} }; static const symbol s_2_0[2] = { 0xC5, 0xCD }; static const symbol s_2_1[2] = { 0xCE, 0xCE }; static const symbol s_2_2[2] = { 0xD7, 0xDB }; static const symbol s_2_3[3] = { 0xC9, 0xD7, 0xDB }; static const symbol s_2_4[3] = { 0xD9, 0xD7, 0xDB }; static const symbol s_2_5[1] = { 0xDD }; static const symbol s_2_6[2] = { 0xC0, 0xDD }; static const symbol s_2_7[3] = { 0xD5, 0xC0, 0xDD }; static const struct among a_2[8] = { /* 0 */ { 2, s_2_0, -1, 1, 0}, /* 1 */ { 2, s_2_1, -1, 1, 0}, /* 2 */ { 2, s_2_2, -1, 1, 0}, /* 3 */ { 3, s_2_3, 2, 2, 0}, /* 4 */ { 3, s_2_4, 2, 2, 0}, /* 5 */ { 1, s_2_5, -1, 1, 0}, /* 6 */ { 2, s_2_6, 5, 1, 0}, /* 7 */ { 3, s_2_7, 6, 2, 0} }; static const symbol s_3_0[2] = { 0xD3, 0xD1 }; static const symbol s_3_1[2] = { 0xD3, 0xD8 }; static const struct among a_3[2] = { /* 0 */ { 2, s_3_0, -1, 1, 0}, /* 1 */ { 2, s_3_1, -1, 1, 0} }; static const symbol s_4_0[1] = { 0xC0 }; static const symbol s_4_1[2] = { 0xD5, 0xC0 }; static const symbol s_4_2[2] = { 0xCC, 0xC1 }; static const symbol s_4_3[3] = { 0xC9, 0xCC, 0xC1 }; static const symbol s_4_4[3] = { 0xD9, 0xCC, 0xC1 }; static const symbol s_4_5[2] = { 0xCE, 0xC1 }; static const symbol s_4_6[3] = { 0xC5, 0xCE, 0xC1 }; static const symbol s_4_7[3] = { 0xC5, 0xD4, 0xC5 }; static const symbol s_4_8[3] = { 0xC9, 0xD4, 0xC5 }; static const symbol s_4_9[3] = { 0xCA, 0xD4, 0xC5 }; static const symbol s_4_10[4] = { 0xC5, 0xCA, 0xD4, 0xC5 }; static const symbol s_4_11[4] = { 0xD5, 0xCA, 0xD4, 0xC5 }; static const symbol s_4_12[2] = { 0xCC, 0xC9 }; static const symbol s_4_13[3] = { 0xC9, 0xCC, 0xC9 }; static const symbol s_4_14[3] = { 0xD9, 0xCC, 0xC9 }; static const symbol s_4_15[1] = { 0xCA }; static const symbol s_4_16[2] = { 0xC5, 0xCA }; static const symbol s_4_17[2] = { 
0xD5, 0xCA }; static const symbol s_4_18[1] = { 0xCC }; static const symbol s_4_19[2] = { 0xC9, 0xCC }; static const symbol s_4_20[2] = { 0xD9, 0xCC }; static const symbol s_4_21[2] = { 0xC5, 0xCD }; static const symbol s_4_22[2] = { 0xC9, 0xCD }; static const symbol s_4_23[2] = { 0xD9, 0xCD }; static const symbol s_4_24[1] = { 0xCE }; static const symbol s_4_25[2] = { 0xC5, 0xCE }; static const symbol s_4_26[2] = { 0xCC, 0xCF }; static const symbol s_4_27[3] = { 0xC9, 0xCC, 0xCF }; static const symbol s_4_28[3] = { 0xD9, 0xCC, 0xCF }; static const symbol s_4_29[2] = { 0xCE, 0xCF }; static const symbol s_4_30[3] = { 0xC5, 0xCE, 0xCF }; static const symbol s_4_31[3] = { 0xCE, 0xCE, 0xCF }; static const symbol s_4_32[2] = { 0xC0, 0xD4 }; static const symbol s_4_33[3] = { 0xD5, 0xC0, 0xD4 }; static const symbol s_4_34[2] = { 0xC5, 0xD4 }; static const symbol s_4_35[3] = { 0xD5, 0xC5, 0xD4 }; static const symbol s_4_36[2] = { 0xC9, 0xD4 }; static const symbol s_4_37[2] = { 0xD1, 0xD4 }; static const symbol s_4_38[2] = { 0xD9, 0xD4 }; static const symbol s_4_39[2] = { 0xD4, 0xD8 }; static const symbol s_4_40[3] = { 0xC9, 0xD4, 0xD8 }; static const symbol s_4_41[3] = { 0xD9, 0xD4, 0xD8 }; static const symbol s_4_42[3] = { 0xC5, 0xDB, 0xD8 }; static const symbol s_4_43[3] = { 0xC9, 0xDB, 0xD8 }; static const symbol s_4_44[2] = { 0xCE, 0xD9 }; static const symbol s_4_45[3] = { 0xC5, 0xCE, 0xD9 }; static const struct among a_4[46] = { /* 0 */ { 1, s_4_0, -1, 2, 0}, /* 1 */ { 2, s_4_1, 0, 2, 0}, /* 2 */ { 2, s_4_2, -1, 1, 0}, /* 3 */ { 3, s_4_3, 2, 2, 0}, /* 4 */ { 3, s_4_4, 2, 2, 0}, /* 5 */ { 2, s_4_5, -1, 1, 0}, /* 6 */ { 3, s_4_6, 5, 2, 0}, /* 7 */ { 3, s_4_7, -1, 1, 0}, /* 8 */ { 3, s_4_8, -1, 2, 0}, /* 9 */ { 3, s_4_9, -1, 1, 0}, /* 10 */ { 4, s_4_10, 9, 2, 0}, /* 11 */ { 4, s_4_11, 9, 2, 0}, /* 12 */ { 2, s_4_12, -1, 1, 0}, /* 13 */ { 3, s_4_13, 12, 2, 0}, /* 14 */ { 3, s_4_14, 12, 2, 0}, /* 15 */ { 1, s_4_15, -1, 1, 0}, /* 16 */ { 2, s_4_16, 15, 2, 0}, /* 17 */ { 2, 
s_4_17, 15, 2, 0}, /* 18 */ { 1, s_4_18, -1, 1, 0}, /* 19 */ { 2, s_4_19, 18, 2, 0}, /* 20 */ { 2, s_4_20, 18, 2, 0}, /* 21 */ { 2, s_4_21, -1, 1, 0}, /* 22 */ { 2, s_4_22, -1, 2, 0}, /* 23 */ { 2, s_4_23, -1, 2, 0}, /* 24 */ { 1, s_4_24, -1, 1, 0}, /* 25 */ { 2, s_4_25, 24, 2, 0}, /* 26 */ { 2, s_4_26, -1, 1, 0}, /* 27 */ { 3, s_4_27, 26, 2, 0}, /* 28 */ { 3, s_4_28, 26, 2, 0}, /* 29 */ { 2, s_4_29, -1, 1, 0}, /* 30 */ { 3, s_4_30, 29, 2, 0}, /* 31 */ { 3, s_4_31, 29, 1, 0}, /* 32 */ { 2, s_4_32, -1, 1, 0}, /* 33 */ { 3, s_4_33, 32, 2, 0}, /* 34 */ { 2, s_4_34, -1, 1, 0}, /* 35 */ { 3, s_4_35, 34, 2, 0}, /* 36 */ { 2, s_4_36, -1, 2, 0}, /* 37 */ { 2, s_4_37, -1, 2, 0}, /* 38 */ { 2, s_4_38, -1, 2, 0}, /* 39 */ { 2, s_4_39, -1, 1, 0}, /* 40 */ { 3, s_4_40, 39, 2, 0}, /* 41 */ { 3, s_4_41, 39, 2, 0}, /* 42 */ { 3, s_4_42, -1, 1, 0}, /* 43 */ { 3, s_4_43, -1, 2, 0}, /* 44 */ { 2, s_4_44, -1, 1, 0}, /* 45 */ { 3, s_4_45, 44, 2, 0} }; static const symbol s_5_0[1] = { 0xC0 }; static const symbol s_5_1[2] = { 0xC9, 0xC0 }; static const symbol s_5_2[2] = { 0xD8, 0xC0 }; static const symbol s_5_3[1] = { 0xC1 }; static const symbol s_5_4[1] = { 0xC5 }; static const symbol s_5_5[2] = { 0xC9, 0xC5 }; static const symbol s_5_6[2] = { 0xD8, 0xC5 }; static const symbol s_5_7[2] = { 0xC1, 0xC8 }; static const symbol s_5_8[2] = { 0xD1, 0xC8 }; static const symbol s_5_9[3] = { 0xC9, 0xD1, 0xC8 }; static const symbol s_5_10[1] = { 0xC9 }; static const symbol s_5_11[2] = { 0xC5, 0xC9 }; static const symbol s_5_12[2] = { 0xC9, 0xC9 }; static const symbol s_5_13[3] = { 0xC1, 0xCD, 0xC9 }; static const symbol s_5_14[3] = { 0xD1, 0xCD, 0xC9 }; static const symbol s_5_15[4] = { 0xC9, 0xD1, 0xCD, 0xC9 }; static const symbol s_5_16[1] = { 0xCA }; static const symbol s_5_17[2] = { 0xC5, 0xCA }; static const symbol s_5_18[3] = { 0xC9, 0xC5, 0xCA }; static const symbol s_5_19[2] = { 0xC9, 0xCA }; static const symbol s_5_20[2] = { 0xCF, 0xCA }; static const symbol s_5_21[2] = { 0xC1, 0xCD }; 
static const symbol s_5_22[2] = { 0xC5, 0xCD }; static const symbol s_5_23[3] = { 0xC9, 0xC5, 0xCD }; static const symbol s_5_24[2] = { 0xCF, 0xCD }; static const symbol s_5_25[2] = { 0xD1, 0xCD }; static const symbol s_5_26[3] = { 0xC9, 0xD1, 0xCD }; static const symbol s_5_27[1] = { 0xCF }; static const symbol s_5_28[1] = { 0xD1 }; static const symbol s_5_29[2] = { 0xC9, 0xD1 }; static const symbol s_5_30[2] = { 0xD8, 0xD1 }; static const symbol s_5_31[1] = { 0xD5 }; static const symbol s_5_32[2] = { 0xC5, 0xD7 }; static const symbol s_5_33[2] = { 0xCF, 0xD7 }; static const symbol s_5_34[1] = { 0xD8 }; static const symbol s_5_35[1] = { 0xD9 }; static const struct among a_5[36] = { /* 0 */ { 1, s_5_0, -1, 1, 0}, /* 1 */ { 2, s_5_1, 0, 1, 0}, /* 2 */ { 2, s_5_2, 0, 1, 0}, /* 3 */ { 1, s_5_3, -1, 1, 0}, /* 4 */ { 1, s_5_4, -1, 1, 0}, /* 5 */ { 2, s_5_5, 4, 1, 0}, /* 6 */ { 2, s_5_6, 4, 1, 0}, /* 7 */ { 2, s_5_7, -1, 1, 0}, /* 8 */ { 2, s_5_8, -1, 1, 0}, /* 9 */ { 3, s_5_9, 8, 1, 0}, /* 10 */ { 1, s_5_10, -1, 1, 0}, /* 11 */ { 2, s_5_11, 10, 1, 0}, /* 12 */ { 2, s_5_12, 10, 1, 0}, /* 13 */ { 3, s_5_13, 10, 1, 0}, /* 14 */ { 3, s_5_14, 10, 1, 0}, /* 15 */ { 4, s_5_15, 14, 1, 0}, /* 16 */ { 1, s_5_16, -1, 1, 0}, /* 17 */ { 2, s_5_17, 16, 1, 0}, /* 18 */ { 3, s_5_18, 17, 1, 0}, /* 19 */ { 2, s_5_19, 16, 1, 0}, /* 20 */ { 2, s_5_20, 16, 1, 0}, /* 21 */ { 2, s_5_21, -1, 1, 0}, /* 22 */ { 2, s_5_22, -1, 1, 0}, /* 23 */ { 3, s_5_23, 22, 1, 0}, /* 24 */ { 2, s_5_24, -1, 1, 0}, /* 25 */ { 2, s_5_25, -1, 1, 0}, /* 26 */ { 3, s_5_26, 25, 1, 0}, /* 27 */ { 1, s_5_27, -1, 1, 0}, /* 28 */ { 1, s_5_28, -1, 1, 0}, /* 29 */ { 2, s_5_29, 28, 1, 0}, /* 30 */ { 2, s_5_30, 28, 1, 0}, /* 31 */ { 1, s_5_31, -1, 1, 0}, /* 32 */ { 2, s_5_32, -1, 1, 0}, /* 33 */ { 2, s_5_33, -1, 1, 0}, /* 34 */ { 1, s_5_34, -1, 1, 0}, /* 35 */ { 1, s_5_35, -1, 1, 0} }; static const symbol s_6_0[3] = { 0xCF, 0xD3, 0xD4 }; static const symbol s_6_1[4] = { 0xCF, 0xD3, 0xD4, 0xD8 }; static const struct among 
a_6[2] = { /* 0 */ { 3, s_6_0, -1, 1, 0}, /* 1 */ { 4, s_6_1, -1, 1, 0} }; static const symbol s_7_0[4] = { 0xC5, 0xCA, 0xDB, 0xC5 }; static const symbol s_7_1[1] = { 0xCE }; static const symbol s_7_2[1] = { 0xD8 }; static const symbol s_7_3[3] = { 0xC5, 0xCA, 0xDB }; static const struct among a_7[4] = { /* 0 */ { 4, s_7_0, -1, 1, 0}, /* 1 */ { 1, s_7_1, -1, 2, 0}, /* 2 */ { 1, s_7_2, -1, 3, 0}, /* 3 */ { 3, s_7_3, -1, 1, 0} }; static const unsigned char g_v[] = { 35, 130, 34, 18 }; static const symbol s_0[] = { 0xC1 }; static const symbol s_1[] = { 0xD1 }; static const symbol s_2[] = { 0xC1 }; static const symbol s_3[] = { 0xD1 }; static const symbol s_4[] = { 0xC1 }; static const symbol s_5[] = { 0xD1 }; static const symbol s_6[] = { 0xCE }; static const symbol s_7[] = { 0xCE }; static const symbol s_8[] = { 0xCE }; static const symbol s_9[] = { 0xC9 }; static int r_mark_regions(struct SN_env * z) { z->I[0] = z->l; z->I[1] = z->l; { int c1 = z->c; /* do, line 63 */ { /* gopast */ /* grouping v, line 64 */ int ret = out_grouping(z, g_v, 192, 220, 1); if (ret < 0) goto lab0; z->c += ret; } z->I[0] = z->c; /* setmark pV, line 64 */ { /* gopast */ /* non v, line 64 */ int ret = in_grouping(z, g_v, 192, 220, 1); if (ret < 0) goto lab0; z->c += ret; } { /* gopast */ /* grouping v, line 65 */ int ret = out_grouping(z, g_v, 192, 220, 1); if (ret < 0) goto lab0; z->c += ret; } { /* gopast */ /* non v, line 65 */ int ret = in_grouping(z, g_v, 192, 220, 1); if (ret < 0) goto lab0; z->c += ret; } z->I[1] = z->c; /* setmark p2, line 65 */ lab0: z->c = c1; } return 1; } static int r_R2(struct SN_env * z) { if (!(z->I[1] <= z->c)) return 0; return 1; } static int r_perfective_gerund(struct SN_env * z) { int among_var; z->ket = z->c; /* [, line 74 */ if (z->c <= z->lb || z->p[z->c - 1] >> 5 != 6 || !((25166336 >> (z->p[z->c - 1] & 0x1f)) & 1)) return 0; among_var = find_among_b(z, a_0, 9); /* substring, line 74 */ if (!(among_var)) return 0; z->bra = z->c; /* ], line 74 */ 
switch(among_var) { case 0: return 0; case 1: { int m1 = z->l - z->c; (void)m1; /* or, line 78 */ if (!(eq_s_b(z, 1, s_0))) goto lab1; goto lab0; lab1: z->c = z->l - m1; if (!(eq_s_b(z, 1, s_1))) return 0; } lab0: { int ret = slice_del(z); /* delete, line 78 */ if (ret < 0) return ret; } break; case 2: { int ret = slice_del(z); /* delete, line 85 */ if (ret < 0) return ret; } break; } return 1; } static int r_adjective(struct SN_env * z) { int among_var; z->ket = z->c; /* [, line 90 */ if (z->c - 1 <= z->lb || z->p[z->c - 1] >> 5 != 6 || !((2271009 >> (z->p[z->c - 1] & 0x1f)) & 1)) return 0; among_var = find_among_b(z, a_1, 26); /* substring, line 90 */ if (!(among_var)) return 0; z->bra = z->c; /* ], line 90 */ switch(among_var) { case 0: return 0; case 1: { int ret = slice_del(z); /* delete, line 99 */ if (ret < 0) return ret; } break; } return 1; } static int r_adjectival(struct SN_env * z) { int among_var; { int ret = r_adjective(z); if (ret == 0) return 0; /* call adjective, line 104 */ if (ret < 0) return ret; } { int m_keep = z->l - z->c;/* (void) m_keep;*/ /* try, line 111 */ z->ket = z->c; /* [, line 112 */ if (z->c <= z->lb || z->p[z->c - 1] >> 5 != 6 || !((671113216 >> (z->p[z->c - 1] & 0x1f)) & 1)) { z->c = z->l - m_keep; goto lab0; } among_var = find_among_b(z, a_2, 8); /* substring, line 112 */ if (!(among_var)) { z->c = z->l - m_keep; goto lab0; } z->bra = z->c; /* ], line 112 */ switch(among_var) { case 0: { z->c = z->l - m_keep; goto lab0; } case 1: { int m1 = z->l - z->c; (void)m1; /* or, line 117 */ if (!(eq_s_b(z, 1, s_2))) goto lab2; goto lab1; lab2: z->c = z->l - m1; if (!(eq_s_b(z, 1, s_3))) { z->c = z->l - m_keep; goto lab0; } } lab1: { int ret = slice_del(z); /* delete, line 117 */ if (ret < 0) return ret; } break; case 2: { int ret = slice_del(z); /* delete, line 124 */ if (ret < 0) return ret; } break; } lab0: ; } return 1; } static int r_reflexive(struct SN_env * z) { int among_var; z->ket = z->c; /* [, line 131 */ if (z->c - 1 <= z->lb 
|| (z->p[z->c - 1] != 209 && z->p[z->c - 1] != 216)) return 0; among_var = find_among_b(z, a_3, 2); /* substring, line 131 */ if (!(among_var)) return 0; z->bra = z->c; /* ], line 131 */ switch(among_var) { case 0: return 0; case 1: { int ret = slice_del(z); /* delete, line 134 */ if (ret < 0) return ret; } break; } return 1; } static int r_verb(struct SN_env * z) { int among_var; z->ket = z->c; /* [, line 139 */ if (z->c <= z->lb || z->p[z->c - 1] >> 5 != 6 || !((51443235 >> (z->p[z->c - 1] & 0x1f)) & 1)) return 0; among_var = find_among_b(z, a_4, 46); /* substring, line 139 */ if (!(among_var)) return 0; z->bra = z->c; /* ], line 139 */ switch(among_var) { case 0: return 0; case 1: { int m1 = z->l - z->c; (void)m1; /* or, line 145 */ if (!(eq_s_b(z, 1, s_4))) goto lab1; goto lab0; lab1: z->c = z->l - m1; if (!(eq_s_b(z, 1, s_5))) return 0; } lab0: { int ret = slice_del(z); /* delete, line 145 */ if (ret < 0) return ret; } break; case 2: { int ret = slice_del(z); /* delete, line 153 */ if (ret < 0) return ret; } break; } return 1; } static int r_noun(struct SN_env * z) { int among_var; z->ket = z->c; /* [, line 162 */ if (z->c <= z->lb || z->p[z->c - 1] >> 5 != 6 || !((60991267 >> (z->p[z->c - 1] & 0x1f)) & 1)) return 0; among_var = find_among_b(z, a_5, 36); /* substring, line 162 */ if (!(among_var)) return 0; z->bra = z->c; /* ], line 162 */ switch(among_var) { case 0: return 0; case 1: { int ret = slice_del(z); /* delete, line 169 */ if (ret < 0) return ret; } break; } return 1; } static int r_derivational(struct SN_env * z) { int among_var; z->ket = z->c; /* [, line 178 */ if (z->c - 2 <= z->lb || (z->p[z->c - 1] != 212 && z->p[z->c - 1] != 216)) return 0; among_var = find_among_b(z, a_6, 2); /* substring, line 178 */ if (!(among_var)) return 0; z->bra = z->c; /* ], line 178 */ { int ret = r_R2(z); if (ret == 0) return 0; /* call R2, line 178 */ if (ret < 0) return ret; } switch(among_var) { case 0: return 0; case 1: { int ret = slice_del(z); /* delete, line 
181 */ if (ret < 0) return ret; } break; } return 1; } static int r_tidy_up(struct SN_env * z) { int among_var; z->ket = z->c; /* [, line 186 */ if (z->c <= z->lb || z->p[z->c - 1] >> 5 != 6 || !((151011360 >> (z->p[z->c - 1] & 0x1f)) & 1)) return 0; among_var = find_among_b(z, a_7, 4); /* substring, line 186 */ if (!(among_var)) return 0; z->bra = z->c; /* ], line 186 */ switch(among_var) { case 0: return 0; case 1: { int ret = slice_del(z); /* delete, line 190 */ if (ret < 0) return ret; } z->ket = z->c; /* [, line 191 */ if (!(eq_s_b(z, 1, s_6))) return 0; z->bra = z->c; /* ], line 191 */ if (!(eq_s_b(z, 1, s_7))) return 0; { int ret = slice_del(z); /* delete, line 191 */ if (ret < 0) return ret; } break; case 2: if (!(eq_s_b(z, 1, s_8))) return 0; { int ret = slice_del(z); /* delete, line 194 */ if (ret < 0) return ret; } break; case 3: { int ret = slice_del(z); /* delete, line 196 */ if (ret < 0) return ret; } break; } return 1; } extern int russian_KOI8_R_stem(struct SN_env * z) { { int c1 = z->c; /* do, line 203 */ { int ret = r_mark_regions(z); if (ret == 0) goto lab0; /* call mark_regions, line 203 */ if (ret < 0) return ret; } lab0: z->c = c1; } z->lb = z->c; z->c = z->l; /* backwards, line 204 */ { int mlimit; /* setlimit, line 204 */ int m2 = z->l - z->c; (void)m2; if (z->c < z->I[0]) return 0; z->c = z->I[0]; /* tomark, line 204 */ mlimit = z->lb; z->lb = z->c; z->c = z->l - m2; { int m3 = z->l - z->c; (void)m3; /* do, line 205 */ { int m4 = z->l - z->c; (void)m4; /* or, line 206 */ { int ret = r_perfective_gerund(z); if (ret == 0) goto lab3; /* call perfective_gerund, line 206 */ if (ret < 0) return ret; } goto lab2; lab3: z->c = z->l - m4; { int m_keep = z->l - z->c;/* (void) m_keep;*/ /* try, line 207 */ { int ret = r_reflexive(z); if (ret == 0) { z->c = z->l - m_keep; goto lab4; } /* call reflexive, line 207 */ if (ret < 0) return ret; } lab4: ; } { int m5 = z->l - z->c; (void)m5; /* or, line 208 */ { int ret = r_adjectival(z); if (ret == 0) goto 
lab6; /* call adjectival, line 208 */ if (ret < 0) return ret; } goto lab5; lab6: z->c = z->l - m5; { int ret = r_verb(z); if (ret == 0) goto lab7; /* call verb, line 208 */ if (ret < 0) return ret; } goto lab5; lab7: z->c = z->l - m5; { int ret = r_noun(z); if (ret == 0) goto lab1; /* call noun, line 208 */ if (ret < 0) return ret; } } lab5: ; } lab2: lab1: z->c = z->l - m3; } { int m_keep = z->l - z->c;/* (void) m_keep;*/ /* try, line 211 */ z->ket = z->c; /* [, line 211 */ if (!(eq_s_b(z, 1, s_9))) { z->c = z->l - m_keep; goto lab8; } z->bra = z->c; /* ], line 211 */ { int ret = slice_del(z); /* delete, line 211 */ if (ret < 0) return ret; } lab8: ; } { int m6 = z->l - z->c; (void)m6; /* do, line 214 */ { int ret = r_derivational(z); if (ret == 0) goto lab9; /* call derivational, line 214 */ if (ret < 0) return ret; } lab9: z->c = z->l - m6; } { int m7 = z->l - z->c; (void)m7; /* do, line 215 */ { int ret = r_tidy_up(z); if (ret == 0) goto lab10; /* call tidy_up, line 215 */ if (ret < 0) return ret; } lab10: z->c = z->l - m7; } z->lb = mlimit; } z->c = z->lb; return 1; } extern struct SN_env * russian_KOI8_R_create_env(void) { return SN_create_env(0, 2, 0); } extern void russian_KOI8_R_close_env(struct SN_env * z) { SN_close_env(z, 0); } swish-e-2.4.7/src/snowball/stem_en2.h0000664000077100017500000000051011166010110014304 00000000000000 /* This file was generated automatically by the Snowball to ANSI C compiler */ #ifdef __cplusplus extern "C" { #endif extern struct SN_env * english_ISO_8859_1_create_env(void); extern void english_ISO_8859_1_close_env(struct SN_env * z); extern int english_ISO_8859_1_stem(struct SN_env * z); #ifdef __cplusplus } #endif swish-e-2.4.7/src/snowball/stem_dk.h0000664000077100017500000000050511166010110014222 00000000000000 /* This file was generated automatically by the Snowball to ANSI C compiler */ #ifdef __cplusplus extern "C" { #endif extern struct SN_env * danish_ISO_8859_1_create_env(void); extern void 
danish_ISO_8859_1_close_env(struct SN_env * z); extern int danish_ISO_8859_1_stem(struct SN_env * z); #ifdef __cplusplus } #endif swish-e-2.4.7/src/snowball/header.h0000664000077100017500000000466711166010110014041 00000000000000 #include <limits.h> #include "api.h" #define MAXINT INT_MAX #define MININT INT_MIN #define HEAD 2*sizeof(int) #define SIZE(p) ((int *)(p))[-1] #define SET_SIZE(p, n) ((int *)(p))[-1] = n #define CAPACITY(p) ((int *)(p))[-2] struct among { int s_size; /* number of chars in string */ const symbol * s; /* search string */ int substring_i;/* index to longest matching substring */ int result; /* result of the lookup */ int (* function)(struct SN_env *); }; extern symbol * create_s(void); extern void lose_s(symbol * p); extern int skip_utf8(const symbol * p, int c, int lb, int l, int n); extern int in_grouping_U(struct SN_env * z, const unsigned char * s, int min, int max, int repeat); extern int in_grouping_b_U(struct SN_env * z, const unsigned char * s, int min, int max, int repeat); extern int out_grouping_U(struct SN_env * z, const unsigned char * s, int min, int max, int repeat); extern int out_grouping_b_U(struct SN_env * z, const unsigned char * s, int min, int max, int repeat); extern int in_grouping(struct SN_env * z, const unsigned char * s, int min, int max, int repeat); extern int in_grouping_b(struct SN_env * z, const unsigned char * s, int min, int max, int repeat); extern int out_grouping(struct SN_env * z, const unsigned char * s, int min, int max, int repeat); extern int out_grouping_b(struct SN_env * z, const unsigned char * s, int min, int max, int repeat); extern int eq_s(struct SN_env * z, int s_size, const symbol * s); extern int eq_s_b(struct SN_env * z, int s_size, const symbol * s); extern int eq_v(struct SN_env * z, const symbol * p); extern int eq_v_b(struct SN_env * z, const symbol * p); extern int find_among(struct SN_env * z, const struct among * v, int v_size); extern int find_among_b(struct SN_env * z, const struct among * 
v, int v_size); extern int replace_s(struct SN_env * z, int c_bra, int c_ket, int s_size, const symbol * s, int * adjustment); extern int slice_from_s(struct SN_env * z, int s_size, const symbol * s); extern int slice_from_v(struct SN_env * z, const symbol * p); extern int slice_del(struct SN_env * z); extern int insert_s(struct SN_env * z, int bra, int ket, int s_size, const symbol * s); extern int insert_v(struct SN_env * z, int bra, int ket, const symbol * p); extern symbol * slice_to(struct SN_env * z, symbol * p); extern symbol * assign_to(struct SN_env * z, symbol * p); extern void debug(struct SN_env * z, int number, int line_count); swish-e-2.4.7/src/snowball/stem_it.h0000664000077100017500000000051011166010107014242 00000000000000 /* This file was generated automatically by the Snowball to ANSI C compiler */ #ifdef __cplusplus extern "C" { #endif extern struct SN_env * italian_ISO_8859_1_create_env(void); extern void italian_ISO_8859_1_close_env(struct SN_env * z); extern int italian_ISO_8859_1_stem(struct SN_env * z); #ifdef __cplusplus } #endif swish-e-2.4.7/src/snowball/api.c0000664000077100017500000000255011166010107013350 00000000000000 #include <stdlib.h> /* for calloc, free */ #include "header.h" extern struct SN_env * SN_create_env(int S_size, int I_size, int B_size) { struct SN_env * z = (struct SN_env *) calloc(1, sizeof(struct SN_env)); if (z == NULL) return NULL; z->p = create_s(); if (z->p == NULL) goto error; if (S_size) { int i; z->S = (symbol * *) calloc(S_size, sizeof(symbol *)); if (z->S == NULL) goto error; for (i = 0; i < S_size; i++) { z->S[i] = create_s(); if (z->S[i] == NULL) goto error; } } if (I_size) { z->I = (int *) calloc(I_size, sizeof(int)); if (z->I == NULL) goto error; } if (B_size) { z->B = (unsigned char *) calloc(B_size, sizeof(unsigned char)); if (z->B == NULL) goto error; } return z; error: SN_close_env(z, S_size); return NULL; } extern void SN_close_env(struct SN_env * z, int S_size) { if (z == NULL) return; if (S_size) { int i; 
for (i = 0; i < S_size; i++) { lose_s(z->S[i]); } free(z->S); } free(z->I); free(z->B); if (z->p) lose_s(z->p); free(z); } extern int SN_set_current(struct SN_env * z, int size, const symbol * s) { int err = replace_s(z, 0, z->l, size, s, NULL); z->c = 0; return err; } swish-e-2.4.7/src/snowball/stem_es.h0000664000077100017500000000051011166010107014235 00000000000000 /* This file was generated automatically by the Snowball to ANSI C compiler */ #ifdef __cplusplus extern "C" { #endif extern struct SN_env * spanish_ISO_8859_1_create_env(void); extern void spanish_ISO_8859_1_close_env(struct SN_env * z); extern int spanish_ISO_8859_1_stem(struct SN_env * z); #ifdef __cplusplus } #endif swish-e-2.4.7/src/snowball/stem_fr.c0000664000077100017500000013454611166010110014243 00000000000000 /* This file was generated automatically by the Snowball to ANSI C compiler */ #include "header.h" #ifdef __cplusplus extern "C" { #endif extern int french_ISO_8859_1_stem(struct SN_env * z); #ifdef __cplusplus } #endif static int r_un_accent(struct SN_env * z); static int r_un_double(struct SN_env * z); static int r_residual_suffix(struct SN_env * z); static int r_verb_suffix(struct SN_env * z); static int r_i_verb_suffix(struct SN_env * z); static int r_standard_suffix(struct SN_env * z); static int r_R2(struct SN_env * z); static int r_R1(struct SN_env * z); static int r_RV(struct SN_env * z); static int r_mark_regions(struct SN_env * z); static int r_postlude(struct SN_env * z); static int r_prelude(struct SN_env * z); #ifdef __cplusplus extern "C" { #endif extern struct SN_env * french_ISO_8859_1_create_env(void); extern void french_ISO_8859_1_close_env(struct SN_env * z); #ifdef __cplusplus } #endif static const symbol s_0_0[3] = { 'c', 'o', 'l' }; static const symbol s_0_1[3] = { 'p', 'a', 'r' }; static const symbol s_0_2[3] = { 't', 'a', 'p' }; static const struct among a_0[3] = { /* 0 */ { 3, s_0_0, -1, -1, 0}, /* 1 */ { 3, s_0_1, -1, -1, 0}, /* 2 */ { 3, s_0_2, -1, -1, 0} }; 
static const symbol s_1_1[1] = { 'I' }; static const symbol s_1_2[1] = { 'U' }; static const symbol s_1_3[1] = { 'Y' }; static const struct among a_1[4] = { /* 0 */ { 0, 0, -1, 4, 0}, /* 1 */ { 1, s_1_1, 0, 1, 0}, /* 2 */ { 1, s_1_2, 0, 2, 0}, /* 3 */ { 1, s_1_3, 0, 3, 0} }; static const symbol s_2_0[3] = { 'i', 'q', 'U' }; static const symbol s_2_1[3] = { 'a', 'b', 'l' }; static const symbol s_2_2[3] = { 'I', 0xE8, 'r' }; static const symbol s_2_3[3] = { 'i', 0xE8, 'r' }; static const symbol s_2_4[3] = { 'e', 'u', 's' }; static const symbol s_2_5[2] = { 'i', 'v' }; static const struct among a_2[6] = { /* 0 */ { 3, s_2_0, -1, 3, 0}, /* 1 */ { 3, s_2_1, -1, 3, 0}, /* 2 */ { 3, s_2_2, -1, 4, 0}, /* 3 */ { 3, s_2_3, -1, 4, 0}, /* 4 */ { 3, s_2_4, -1, 2, 0}, /* 5 */ { 2, s_2_5, -1, 1, 0} }; static const symbol s_3_0[2] = { 'i', 'c' }; static const symbol s_3_1[4] = { 'a', 'b', 'i', 'l' }; static const symbol s_3_2[2] = { 'i', 'v' }; static const struct among a_3[3] = { /* 0 */ { 2, s_3_0, -1, 2, 0}, /* 1 */ { 4, s_3_1, -1, 1, 0}, /* 2 */ { 2, s_3_2, -1, 3, 0} }; static const symbol s_4_0[4] = { 'i', 'q', 'U', 'e' }; static const symbol s_4_1[6] = { 'a', 't', 'r', 'i', 'c', 'e' }; static const symbol s_4_2[4] = { 'a', 'n', 'c', 'e' }; static const symbol s_4_3[4] = { 'e', 'n', 'c', 'e' }; static const symbol s_4_4[5] = { 'l', 'o', 'g', 'i', 'e' }; static const symbol s_4_5[4] = { 'a', 'b', 'l', 'e' }; static const symbol s_4_6[4] = { 'i', 's', 'm', 'e' }; static const symbol s_4_7[4] = { 'e', 'u', 's', 'e' }; static const symbol s_4_8[4] = { 'i', 's', 't', 'e' }; static const symbol s_4_9[3] = { 'i', 'v', 'e' }; static const symbol s_4_10[2] = { 'i', 'f' }; static const symbol s_4_11[5] = { 'u', 's', 'i', 'o', 'n' }; static const symbol s_4_12[5] = { 'a', 't', 'i', 'o', 'n' }; static const symbol s_4_13[5] = { 'u', 't', 'i', 'o', 'n' }; static const symbol s_4_14[5] = { 'a', 't', 'e', 'u', 'r' }; static const symbol s_4_15[5] = { 'i', 'q', 'U', 'e', 's' }; static const 
symbol s_4_16[7] = { 'a', 't', 'r', 'i', 'c', 'e', 's' }; static const symbol s_4_17[5] = { 'a', 'n', 'c', 'e', 's' }; static const symbol s_4_18[5] = { 'e', 'n', 'c', 'e', 's' }; static const symbol s_4_19[6] = { 'l', 'o', 'g', 'i', 'e', 's' }; static const symbol s_4_20[5] = { 'a', 'b', 'l', 'e', 's' }; static const symbol s_4_21[5] = { 'i', 's', 'm', 'e', 's' }; static const symbol s_4_22[5] = { 'e', 'u', 's', 'e', 's' }; static const symbol s_4_23[5] = { 'i', 's', 't', 'e', 's' }; static const symbol s_4_24[4] = { 'i', 'v', 'e', 's' }; static const symbol s_4_25[3] = { 'i', 'f', 's' }; static const symbol s_4_26[6] = { 'u', 's', 'i', 'o', 'n', 's' }; static const symbol s_4_27[6] = { 'a', 't', 'i', 'o', 'n', 's' }; static const symbol s_4_28[6] = { 'u', 't', 'i', 'o', 'n', 's' }; static const symbol s_4_29[6] = { 'a', 't', 'e', 'u', 'r', 's' }; static const symbol s_4_30[5] = { 'm', 'e', 'n', 't', 's' }; static const symbol s_4_31[6] = { 'e', 'm', 'e', 'n', 't', 's' }; static const symbol s_4_32[9] = { 'i', 's', 's', 'e', 'm', 'e', 'n', 't', 's' }; static const symbol s_4_33[4] = { 'i', 't', 0xE9, 's' }; static const symbol s_4_34[4] = { 'm', 'e', 'n', 't' }; static const symbol s_4_35[5] = { 'e', 'm', 'e', 'n', 't' }; static const symbol s_4_36[8] = { 'i', 's', 's', 'e', 'm', 'e', 'n', 't' }; static const symbol s_4_37[6] = { 'a', 'm', 'm', 'e', 'n', 't' }; static const symbol s_4_38[6] = { 'e', 'm', 'm', 'e', 'n', 't' }; static const symbol s_4_39[3] = { 'a', 'u', 'x' }; static const symbol s_4_40[4] = { 'e', 'a', 'u', 'x' }; static const symbol s_4_41[3] = { 'e', 'u', 'x' }; static const symbol s_4_42[3] = { 'i', 't', 0xE9 }; static const struct among a_4[43] = { /* 0 */ { 4, s_4_0, -1, 1, 0}, /* 1 */ { 6, s_4_1, -1, 2, 0}, /* 2 */ { 4, s_4_2, -1, 1, 0}, /* 3 */ { 4, s_4_3, -1, 5, 0}, /* 4 */ { 5, s_4_4, -1, 3, 0}, /* 5 */ { 4, s_4_5, -1, 1, 0}, /* 6 */ { 4, s_4_6, -1, 1, 0}, /* 7 */ { 4, s_4_7, -1, 11, 0}, /* 8 */ { 4, s_4_8, -1, 1, 0}, /* 9 */ { 3, s_4_9, 
-1, 8, 0}, /* 10 */ { 2, s_4_10, -1, 8, 0}, /* 11 */ { 5, s_4_11, -1, 4, 0}, /* 12 */ { 5, s_4_12, -1, 2, 0}, /* 13 */ { 5, s_4_13, -1, 4, 0}, /* 14 */ { 5, s_4_14, -1, 2, 0}, /* 15 */ { 5, s_4_15, -1, 1, 0}, /* 16 */ { 7, s_4_16, -1, 2, 0}, /* 17 */ { 5, s_4_17, -1, 1, 0}, /* 18 */ { 5, s_4_18, -1, 5, 0}, /* 19 */ { 6, s_4_19, -1, 3, 0}, /* 20 */ { 5, s_4_20, -1, 1, 0}, /* 21 */ { 5, s_4_21, -1, 1, 0}, /* 22 */ { 5, s_4_22, -1, 11, 0}, /* 23 */ { 5, s_4_23, -1, 1, 0}, /* 24 */ { 4, s_4_24, -1, 8, 0}, /* 25 */ { 3, s_4_25, -1, 8, 0}, /* 26 */ { 6, s_4_26, -1, 4, 0}, /* 27 */ { 6, s_4_27, -1, 2, 0}, /* 28 */ { 6, s_4_28, -1, 4, 0}, /* 29 */ { 6, s_4_29, -1, 2, 0}, /* 30 */ { 5, s_4_30, -1, 15, 0}, /* 31 */ { 6, s_4_31, 30, 6, 0}, /* 32 */ { 9, s_4_32, 31, 12, 0}, /* 33 */ { 4, s_4_33, -1, 7, 0}, /* 34 */ { 4, s_4_34, -1, 15, 0}, /* 35 */ { 5, s_4_35, 34, 6, 0}, /* 36 */ { 8, s_4_36, 35, 12, 0}, /* 37 */ { 6, s_4_37, 34, 13, 0}, /* 38 */ { 6, s_4_38, 34, 14, 0}, /* 39 */ { 3, s_4_39, -1, 10, 0}, /* 40 */ { 4, s_4_40, 39, 9, 0}, /* 41 */ { 3, s_4_41, -1, 1, 0}, /* 42 */ { 3, s_4_42, -1, 7, 0} }; static const symbol s_5_0[3] = { 'i', 'r', 'a' }; static const symbol s_5_1[2] = { 'i', 'e' }; static const symbol s_5_2[4] = { 'i', 's', 's', 'e' }; static const symbol s_5_3[7] = { 'i', 's', 's', 'a', 'n', 't', 'e' }; static const symbol s_5_4[1] = { 'i' }; static const symbol s_5_5[4] = { 'i', 'r', 'a', 'i' }; static const symbol s_5_6[2] = { 'i', 'r' }; static const symbol s_5_7[4] = { 'i', 'r', 'a', 's' }; static const symbol s_5_8[3] = { 'i', 'e', 's' }; static const symbol s_5_9[4] = { 0xEE, 'm', 'e', 's' }; static const symbol s_5_10[5] = { 'i', 's', 's', 'e', 's' }; static const symbol s_5_11[8] = { 'i', 's', 's', 'a', 'n', 't', 'e', 's' }; static const symbol s_5_12[4] = { 0xEE, 't', 'e', 's' }; static const symbol s_5_13[2] = { 'i', 's' }; static const symbol s_5_14[5] = { 'i', 'r', 'a', 'i', 's' }; static const symbol s_5_15[6] = { 'i', 's', 's', 'a', 'i', 's' }; 
static const symbol s_5_16[6] = { 'i', 'r', 'i', 'o', 'n', 's' }; static const symbol s_5_17[7] = { 'i', 's', 's', 'i', 'o', 'n', 's' }; static const symbol s_5_18[5] = { 'i', 'r', 'o', 'n', 's' }; static const symbol s_5_19[6] = { 'i', 's', 's', 'o', 'n', 's' }; static const symbol s_5_20[7] = { 'i', 's', 's', 'a', 'n', 't', 's' }; static const symbol s_5_21[2] = { 'i', 't' }; static const symbol s_5_22[5] = { 'i', 'r', 'a', 'i', 't' }; static const symbol s_5_23[6] = { 'i', 's', 's', 'a', 'i', 't' }; static const symbol s_5_24[6] = { 'i', 's', 's', 'a', 'n', 't' }; static const symbol s_5_25[7] = { 'i', 'r', 'a', 'I', 'e', 'n', 't' }; static const symbol s_5_26[8] = { 'i', 's', 's', 'a', 'I', 'e', 'n', 't' }; static const symbol s_5_27[5] = { 'i', 'r', 'e', 'n', 't' }; static const symbol s_5_28[6] = { 'i', 's', 's', 'e', 'n', 't' }; static const symbol s_5_29[5] = { 'i', 'r', 'o', 'n', 't' }; static const symbol s_5_30[2] = { 0xEE, 't' }; static const symbol s_5_31[5] = { 'i', 'r', 'i', 'e', 'z' }; static const symbol s_5_32[6] = { 'i', 's', 's', 'i', 'e', 'z' }; static const symbol s_5_33[4] = { 'i', 'r', 'e', 'z' }; static const symbol s_5_34[5] = { 'i', 's', 's', 'e', 'z' }; static const struct among a_5[35] = { /* 0 */ { 3, s_5_0, -1, 1, 0}, /* 1 */ { 2, s_5_1, -1, 1, 0}, /* 2 */ { 4, s_5_2, -1, 1, 0}, /* 3 */ { 7, s_5_3, -1, 1, 0}, /* 4 */ { 1, s_5_4, -1, 1, 0}, /* 5 */ { 4, s_5_5, 4, 1, 0}, /* 6 */ { 2, s_5_6, -1, 1, 0}, /* 7 */ { 4, s_5_7, -1, 1, 0}, /* 8 */ { 3, s_5_8, -1, 1, 0}, /* 9 */ { 4, s_5_9, -1, 1, 0}, /* 10 */ { 5, s_5_10, -1, 1, 0}, /* 11 */ { 8, s_5_11, -1, 1, 0}, /* 12 */ { 4, s_5_12, -1, 1, 0}, /* 13 */ { 2, s_5_13, -1, 1, 0}, /* 14 */ { 5, s_5_14, 13, 1, 0}, /* 15 */ { 6, s_5_15, 13, 1, 0}, /* 16 */ { 6, s_5_16, -1, 1, 0}, /* 17 */ { 7, s_5_17, -1, 1, 0}, /* 18 */ { 5, s_5_18, -1, 1, 0}, /* 19 */ { 6, s_5_19, -1, 1, 0}, /* 20 */ { 7, s_5_20, -1, 1, 0}, /* 21 */ { 2, s_5_21, -1, 1, 0}, /* 22 */ { 5, s_5_22, 21, 1, 0}, /* 23 */ { 6, s_5_23, 
21, 1, 0}, /* 24 */ { 6, s_5_24, -1, 1, 0}, /* 25 */ { 7, s_5_25, -1, 1, 0}, /* 26 */ { 8, s_5_26, -1, 1, 0}, /* 27 */ { 5, s_5_27, -1, 1, 0}, /* 28 */ { 6, s_5_28, -1, 1, 0}, /* 29 */ { 5, s_5_29, -1, 1, 0}, /* 30 */ { 2, s_5_30, -1, 1, 0}, /* 31 */ { 5, s_5_31, -1, 1, 0}, /* 32 */ { 6, s_5_32, -1, 1, 0}, /* 33 */ { 4, s_5_33, -1, 1, 0}, /* 34 */ { 5, s_5_34, -1, 1, 0} }; static const symbol s_6_0[1] = { 'a' }; static const symbol s_6_1[3] = { 'e', 'r', 'a' }; static const symbol s_6_2[4] = { 'a', 's', 's', 'e' }; static const symbol s_6_3[4] = { 'a', 'n', 't', 'e' }; static const symbol s_6_4[2] = { 0xE9, 'e' }; static const symbol s_6_5[2] = { 'a', 'i' }; static const symbol s_6_6[4] = { 'e', 'r', 'a', 'i' }; static const symbol s_6_7[2] = { 'e', 'r' }; static const symbol s_6_8[2] = { 'a', 's' }; static const symbol s_6_9[4] = { 'e', 'r', 'a', 's' }; static const symbol s_6_10[4] = { 0xE2, 'm', 'e', 's' }; static const symbol s_6_11[5] = { 'a', 's', 's', 'e', 's' }; static const symbol s_6_12[5] = { 'a', 'n', 't', 'e', 's' }; static const symbol s_6_13[4] = { 0xE2, 't', 'e', 's' }; static const symbol s_6_14[3] = { 0xE9, 'e', 's' }; static const symbol s_6_15[3] = { 'a', 'i', 's' }; static const symbol s_6_16[5] = { 'e', 'r', 'a', 'i', 's' }; static const symbol s_6_17[4] = { 'i', 'o', 'n', 's' }; static const symbol s_6_18[6] = { 'e', 'r', 'i', 'o', 'n', 's' }; static const symbol s_6_19[7] = { 'a', 's', 's', 'i', 'o', 'n', 's' }; static const symbol s_6_20[5] = { 'e', 'r', 'o', 'n', 's' }; static const symbol s_6_21[4] = { 'a', 'n', 't', 's' }; static const symbol s_6_22[2] = { 0xE9, 's' }; static const symbol s_6_23[3] = { 'a', 'i', 't' }; static const symbol s_6_24[5] = { 'e', 'r', 'a', 'i', 't' }; static const symbol s_6_25[3] = { 'a', 'n', 't' }; static const symbol s_6_26[5] = { 'a', 'I', 'e', 'n', 't' }; static const symbol s_6_27[7] = { 'e', 'r', 'a', 'I', 'e', 'n', 't' }; static const symbol s_6_28[5] = { 0xE8, 'r', 'e', 'n', 't' }; static const 
symbol s_6_29[6] = { 'a', 's', 's', 'e', 'n', 't' }; static const symbol s_6_30[5] = { 'e', 'r', 'o', 'n', 't' }; static const symbol s_6_31[2] = { 0xE2, 't' }; static const symbol s_6_32[2] = { 'e', 'z' }; static const symbol s_6_33[3] = { 'i', 'e', 'z' }; static const symbol s_6_34[5] = { 'e', 'r', 'i', 'e', 'z' }; static const symbol s_6_35[6] = { 'a', 's', 's', 'i', 'e', 'z' }; static const symbol s_6_36[4] = { 'e', 'r', 'e', 'z' }; static const symbol s_6_37[1] = { 0xE9 }; static const struct among a_6[38] = { /* 0 */ { 1, s_6_0, -1, 3, 0}, /* 1 */ { 3, s_6_1, 0, 2, 0}, /* 2 */ { 4, s_6_2, -1, 3, 0}, /* 3 */ { 4, s_6_3, -1, 3, 0}, /* 4 */ { 2, s_6_4, -1, 2, 0}, /* 5 */ { 2, s_6_5, -1, 3, 0}, /* 6 */ { 4, s_6_6, 5, 2, 0}, /* 7 */ { 2, s_6_7, -1, 2, 0}, /* 8 */ { 2, s_6_8, -1, 3, 0}, /* 9 */ { 4, s_6_9, 8, 2, 0}, /* 10 */ { 4, s_6_10, -1, 3, 0}, /* 11 */ { 5, s_6_11, -1, 3, 0}, /* 12 */ { 5, s_6_12, -1, 3, 0}, /* 13 */ { 4, s_6_13, -1, 3, 0}, /* 14 */ { 3, s_6_14, -1, 2, 0}, /* 15 */ { 3, s_6_15, -1, 3, 0}, /* 16 */ { 5, s_6_16, 15, 2, 0}, /* 17 */ { 4, s_6_17, -1, 1, 0}, /* 18 */ { 6, s_6_18, 17, 2, 0}, /* 19 */ { 7, s_6_19, 17, 3, 0}, /* 20 */ { 5, s_6_20, -1, 2, 0}, /* 21 */ { 4, s_6_21, -1, 3, 0}, /* 22 */ { 2, s_6_22, -1, 2, 0}, /* 23 */ { 3, s_6_23, -1, 3, 0}, /* 24 */ { 5, s_6_24, 23, 2, 0}, /* 25 */ { 3, s_6_25, -1, 3, 0}, /* 26 */ { 5, s_6_26, -1, 3, 0}, /* 27 */ { 7, s_6_27, 26, 2, 0}, /* 28 */ { 5, s_6_28, -1, 2, 0}, /* 29 */ { 6, s_6_29, -1, 3, 0}, /* 30 */ { 5, s_6_30, -1, 2, 0}, /* 31 */ { 2, s_6_31, -1, 3, 0}, /* 32 */ { 2, s_6_32, -1, 2, 0}, /* 33 */ { 3, s_6_33, 32, 2, 0}, /* 34 */ { 5, s_6_34, 33, 2, 0}, /* 35 */ { 6, s_6_35, 33, 3, 0}, /* 36 */ { 4, s_6_36, 32, 2, 0}, /* 37 */ { 1, s_6_37, -1, 2, 0} }; static const symbol s_7_0[1] = { 'e' }; static const symbol s_7_1[4] = { 'I', 0xE8, 'r', 'e' }; static const symbol s_7_2[4] = { 'i', 0xE8, 'r', 'e' }; static const symbol s_7_3[3] = { 'i', 'o', 'n' }; static const symbol s_7_4[3] = { 'I', 'e', 
'r' }; static const symbol s_7_5[3] = { 'i', 'e', 'r' }; static const symbol s_7_6[1] = { 0xEB }; static const struct among a_7[7] = { /* 0 */ { 1, s_7_0, -1, 3, 0}, /* 1 */ { 4, s_7_1, 0, 2, 0}, /* 2 */ { 4, s_7_2, 0, 2, 0}, /* 3 */ { 3, s_7_3, -1, 1, 0}, /* 4 */ { 3, s_7_4, -1, 2, 0}, /* 5 */ { 3, s_7_5, -1, 2, 0}, /* 6 */ { 1, s_7_6, -1, 4, 0} }; static const symbol s_8_0[3] = { 'e', 'l', 'l' }; static const symbol s_8_1[4] = { 'e', 'i', 'l', 'l' }; static const symbol s_8_2[3] = { 'e', 'n', 'n' }; static const symbol s_8_3[3] = { 'o', 'n', 'n' }; static const symbol s_8_4[3] = { 'e', 't', 't' }; static const struct among a_8[5] = { /* 0 */ { 3, s_8_0, -1, -1, 0}, /* 1 */ { 4, s_8_1, -1, -1, 0}, /* 2 */ { 3, s_8_2, -1, -1, 0}, /* 3 */ { 3, s_8_3, -1, -1, 0}, /* 4 */ { 3, s_8_4, -1, -1, 0} }; static const unsigned char g_v[] = { 17, 65, 16, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 128, 130, 103, 8, 5 }; static const unsigned char g_keep_with_s[] = { 1, 65, 20, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 128 }; static const symbol s_0[] = { 'u' }; static const symbol s_1[] = { 'U' }; static const symbol s_2[] = { 'i' }; static const symbol s_3[] = { 'I' }; static const symbol s_4[] = { 'y' }; static const symbol s_5[] = { 'Y' }; static const symbol s_6[] = { 'y' }; static const symbol s_7[] = { 'Y' }; static const symbol s_8[] = { 'q' }; static const symbol s_9[] = { 'u' }; static const symbol s_10[] = { 'U' }; static const symbol s_11[] = { 'i' }; static const symbol s_12[] = { 'u' }; static const symbol s_13[] = { 'y' }; static const symbol s_14[] = { 'i', 'c' }; static const symbol s_15[] = { 'i', 'q', 'U' }; static const symbol s_16[] = { 'l', 'o', 'g' }; static const symbol s_17[] = { 'u' }; static const symbol s_18[] = { 'e', 'n', 't' }; static const symbol s_19[] = { 'a', 't' }; static const symbol s_20[] = { 'e', 'u', 'x' }; static const symbol s_21[] = { 'i' }; static const symbol s_22[] = { 'a', 'b', 'l' }; static const symbol s_23[] = { 'i', 'q', 'U' }; static 
const symbol s_24[] = { 'a', 't' }; static const symbol s_25[] = { 'i', 'c' }; static const symbol s_26[] = { 'i', 'q', 'U' }; static const symbol s_27[] = { 'e', 'a', 'u' }; static const symbol s_28[] = { 'a', 'l' }; static const symbol s_29[] = { 'e', 'u', 'x' }; static const symbol s_30[] = { 'a', 'n', 't' }; static const symbol s_31[] = { 'e', 'n', 't' }; static const symbol s_32[] = { 'e' }; static const symbol s_33[] = { 's' }; static const symbol s_34[] = { 's' }; static const symbol s_35[] = { 't' }; static const symbol s_36[] = { 'i' }; static const symbol s_37[] = { 'g', 'u' }; static const symbol s_38[] = { 0xE9 }; static const symbol s_39[] = { 0xE8 }; static const symbol s_40[] = { 'e' }; static const symbol s_41[] = { 'Y' }; static const symbol s_42[] = { 'i' }; static const symbol s_43[] = { 0xE7 }; static const symbol s_44[] = { 'c' }; static int r_prelude(struct SN_env * z) { while(1) { /* repeat, line 38 */ int c1 = z->c; while(1) { /* goto, line 38 */ int c2 = z->c; { int c3 = z->c; /* or, line 44 */ if (in_grouping(z, g_v, 97, 251, 0)) goto lab3; z->bra = z->c; /* [, line 40 */ { int c4 = z->c; /* or, line 40 */ if (!(eq_s(z, 1, s_0))) goto lab5; z->ket = z->c; /* ], line 40 */ if (in_grouping(z, g_v, 97, 251, 0)) goto lab5; { int ret = slice_from_s(z, 1, s_1); /* <-, line 40 */ if (ret < 0) return ret; } goto lab4; lab5: z->c = c4; if (!(eq_s(z, 1, s_2))) goto lab6; z->ket = z->c; /* ], line 41 */ if (in_grouping(z, g_v, 97, 251, 0)) goto lab6; { int ret = slice_from_s(z, 1, s_3); /* <-, line 41 */ if (ret < 0) return ret; } goto lab4; lab6: z->c = c4; if (!(eq_s(z, 1, s_4))) goto lab3; z->ket = z->c; /* ], line 42 */ { int ret = slice_from_s(z, 1, s_5); /* <-, line 42 */ if (ret < 0) return ret; } } lab4: goto lab2; lab3: z->c = c3; z->bra = z->c; /* [, line 45 */ if (!(eq_s(z, 1, s_6))) goto lab7; z->ket = z->c; /* ], line 45 */ if (in_grouping(z, g_v, 97, 251, 0)) goto lab7; { int ret = slice_from_s(z, 1, s_7); /* <-, line 45 */ if (ret < 0) 
return ret; } goto lab2; lab7: z->c = c3; if (!(eq_s(z, 1, s_8))) goto lab1; z->bra = z->c; /* [, line 47 */ if (!(eq_s(z, 1, s_9))) goto lab1; z->ket = z->c; /* ], line 47 */ { int ret = slice_from_s(z, 1, s_10); /* <-, line 47 */ if (ret < 0) return ret; } } lab2: z->c = c2; break; lab1: z->c = c2; if (z->c >= z->l) goto lab0; z->c++; /* goto, line 38 */ } continue; lab0: z->c = c1; break; } return 1; } static int r_mark_regions(struct SN_env * z) { z->I[0] = z->l; z->I[1] = z->l; z->I[2] = z->l; { int c1 = z->c; /* do, line 56 */ { int c2 = z->c; /* or, line 58 */ if (in_grouping(z, g_v, 97, 251, 0)) goto lab2; if (in_grouping(z, g_v, 97, 251, 0)) goto lab2; if (z->c >= z->l) goto lab2; z->c++; /* next, line 57 */ goto lab1; lab2: z->c = c2; if (z->c + 2 >= z->l || z->p[z->c + 2] >> 5 != 3 || !((331776 >> (z->p[z->c + 2] & 0x1f)) & 1)) goto lab3; if (!(find_among(z, a_0, 3))) goto lab3; /* among, line 59 */ goto lab1; lab3: z->c = c2; if (z->c >= z->l) goto lab0; z->c++; /* next, line 66 */ { /* gopast */ /* grouping v, line 66 */ int ret = out_grouping(z, g_v, 97, 251, 1); if (ret < 0) goto lab0; z->c += ret; } } lab1: z->I[0] = z->c; /* setmark pV, line 67 */ lab0: z->c = c1; } { int c3 = z->c; /* do, line 69 */ { /* gopast */ /* grouping v, line 70 */ int ret = out_grouping(z, g_v, 97, 251, 1); if (ret < 0) goto lab4; z->c += ret; } { /* gopast */ /* non v, line 70 */ int ret = in_grouping(z, g_v, 97, 251, 1); if (ret < 0) goto lab4; z->c += ret; } z->I[1] = z->c; /* setmark p1, line 70 */ { /* gopast */ /* grouping v, line 71 */ int ret = out_grouping(z, g_v, 97, 251, 1); if (ret < 0) goto lab4; z->c += ret; } { /* gopast */ /* non v, line 71 */ int ret = in_grouping(z, g_v, 97, 251, 1); if (ret < 0) goto lab4; z->c += ret; } z->I[2] = z->c; /* setmark p2, line 71 */ lab4: z->c = c3; } return 1; } static int r_postlude(struct SN_env * z) { int among_var; while(1) { /* repeat, line 75 */ int c1 = z->c; z->bra = z->c; /* [, line 77 */ if (z->c >= z->l || 
z->p[z->c + 0] >> 5 != 2 || !((35652096 >> (z->p[z->c + 0] & 0x1f)) & 1)) among_var = 4; else among_var = find_among(z, a_1, 4); /* substring, line 77 */ if (!(among_var)) goto lab0; z->ket = z->c; /* ], line 77 */ switch(among_var) { case 0: goto lab0; case 1: { int ret = slice_from_s(z, 1, s_11); /* <-, line 78 */ if (ret < 0) return ret; } break; case 2: { int ret = slice_from_s(z, 1, s_12); /* <-, line 79 */ if (ret < 0) return ret; } break; case 3: { int ret = slice_from_s(z, 1, s_13); /* <-, line 80 */ if (ret < 0) return ret; } break; case 4: if (z->c >= z->l) goto lab0; z->c++; /* next, line 81 */ break; } continue; lab0: z->c = c1; break; } return 1; } static int r_RV(struct SN_env * z) { if (!(z->I[0] <= z->c)) return 0; return 1; } static int r_R1(struct SN_env * z) { if (!(z->I[1] <= z->c)) return 0; return 1; } static int r_R2(struct SN_env * z) { if (!(z->I[2] <= z->c)) return 0; return 1; } static int r_standard_suffix(struct SN_env * z) { int among_var; z->ket = z->c; /* [, line 92 */ among_var = find_among_b(z, a_4, 43); /* substring, line 92 */ if (!(among_var)) return 0; z->bra = z->c; /* ], line 92 */ switch(among_var) { case 0: return 0; case 1: { int ret = r_R2(z); if (ret == 0) return 0; /* call R2, line 96 */ if (ret < 0) return ret; } { int ret = slice_del(z); /* delete, line 96 */ if (ret < 0) return ret; } break; case 2: { int ret = r_R2(z); if (ret == 0) return 0; /* call R2, line 99 */ if (ret < 0) return ret; } { int ret = slice_del(z); /* delete, line 99 */ if (ret < 0) return ret; } { int m_keep = z->l - z->c;/* (void) m_keep;*/ /* try, line 100 */ z->ket = z->c; /* [, line 100 */ if (!(eq_s_b(z, 2, s_14))) { z->c = z->l - m_keep; goto lab0; } z->bra = z->c; /* ], line 100 */ { int m1 = z->l - z->c; (void)m1; /* or, line 100 */ { int ret = r_R2(z); if (ret == 0) goto lab2; /* call R2, line 100 */ if (ret < 0) return ret; } { int ret = slice_del(z); /* delete, line 100 */ if (ret < 0) return ret; } goto lab1; lab2: z->c = z->l - m1; { 
int ret = slice_from_s(z, 3, s_15); /* <-, line 100 */ if (ret < 0) return ret; } } lab1: lab0: ; } break; case 3: { int ret = r_R2(z); if (ret == 0) return 0; /* call R2, line 104 */ if (ret < 0) return ret; } { int ret = slice_from_s(z, 3, s_16); /* <-, line 104 */ if (ret < 0) return ret; } break; case 4: { int ret = r_R2(z); if (ret == 0) return 0; /* call R2, line 107 */ if (ret < 0) return ret; } { int ret = slice_from_s(z, 1, s_17); /* <-, line 107 */ if (ret < 0) return ret; } break; case 5: { int ret = r_R2(z); if (ret == 0) return 0; /* call R2, line 110 */ if (ret < 0) return ret; } { int ret = slice_from_s(z, 3, s_18); /* <-, line 110 */ if (ret < 0) return ret; } break; case 6: { int ret = r_RV(z); if (ret == 0) return 0; /* call RV, line 114 */ if (ret < 0) return ret; } { int ret = slice_del(z); /* delete, line 114 */ if (ret < 0) return ret; } { int m_keep = z->l - z->c;/* (void) m_keep;*/ /* try, line 115 */ z->ket = z->c; /* [, line 116 */ among_var = find_among_b(z, a_2, 6); /* substring, line 116 */ if (!(among_var)) { z->c = z->l - m_keep; goto lab3; } z->bra = z->c; /* ], line 116 */ switch(among_var) { case 0: { z->c = z->l - m_keep; goto lab3; } case 1: { int ret = r_R2(z); if (ret == 0) { z->c = z->l - m_keep; goto lab3; } /* call R2, line 117 */ if (ret < 0) return ret; } { int ret = slice_del(z); /* delete, line 117 */ if (ret < 0) return ret; } z->ket = z->c; /* [, line 117 */ if (!(eq_s_b(z, 2, s_19))) { z->c = z->l - m_keep; goto lab3; } z->bra = z->c; /* ], line 117 */ { int ret = r_R2(z); if (ret == 0) { z->c = z->l - m_keep; goto lab3; } /* call R2, line 117 */ if (ret < 0) return ret; } { int ret = slice_del(z); /* delete, line 117 */ if (ret < 0) return ret; } break; case 2: { int m2 = z->l - z->c; (void)m2; /* or, line 118 */ { int ret = r_R2(z); if (ret == 0) goto lab5; /* call R2, line 118 */ if (ret < 0) return ret; } { int ret = slice_del(z); /* delete, line 118 */ if (ret < 0) return ret; } goto lab4; lab5: z->c = z->l - m2; 
{ int ret = r_R1(z); if (ret == 0) { z->c = z->l - m_keep; goto lab3; } /* call R1, line 118 */ if (ret < 0) return ret; } { int ret = slice_from_s(z, 3, s_20); /* <-, line 118 */ if (ret < 0) return ret; } } lab4: break; case 3: { int ret = r_R2(z); if (ret == 0) { z->c = z->l - m_keep; goto lab3; } /* call R2, line 120 */ if (ret < 0) return ret; } { int ret = slice_del(z); /* delete, line 120 */ if (ret < 0) return ret; } break; case 4: { int ret = r_RV(z); if (ret == 0) { z->c = z->l - m_keep; goto lab3; } /* call RV, line 122 */ if (ret < 0) return ret; } { int ret = slice_from_s(z, 1, s_21); /* <-, line 122 */ if (ret < 0) return ret; } break; } lab3: ; } break; case 7: { int ret = r_R2(z); if (ret == 0) return 0; /* call R2, line 129 */ if (ret < 0) return ret; } { int ret = slice_del(z); /* delete, line 129 */ if (ret < 0) return ret; } { int m_keep = z->l - z->c;/* (void) m_keep;*/ /* try, line 130 */ z->ket = z->c; /* [, line 131 */ if (z->c - 1 <= z->lb || z->p[z->c - 1] >> 5 != 3 || !((4198408 >> (z->p[z->c - 1] & 0x1f)) & 1)) { z->c = z->l - m_keep; goto lab6; } among_var = find_among_b(z, a_3, 3); /* substring, line 131 */ if (!(among_var)) { z->c = z->l - m_keep; goto lab6; } z->bra = z->c; /* ], line 131 */ switch(among_var) { case 0: { z->c = z->l - m_keep; goto lab6; } case 1: { int m3 = z->l - z->c; (void)m3; /* or, line 132 */ { int ret = r_R2(z); if (ret == 0) goto lab8; /* call R2, line 132 */ if (ret < 0) return ret; } { int ret = slice_del(z); /* delete, line 132 */ if (ret < 0) return ret; } goto lab7; lab8: z->c = z->l - m3; { int ret = slice_from_s(z, 3, s_22); /* <-, line 132 */ if (ret < 0) return ret; } } lab7: break; case 2: { int m4 = z->l - z->c; (void)m4; /* or, line 133 */ { int ret = r_R2(z); if (ret == 0) goto lab10; /* call R2, line 133 */ if (ret < 0) return ret; } { int ret = slice_del(z); /* delete, line 133 */ if (ret < 0) return ret; } goto lab9; lab10: z->c = z->l - m4; { int ret = slice_from_s(z, 3, s_23); /* <-, line 
133 */ if (ret < 0) return ret; } } lab9: break; case 3: { int ret = r_R2(z); if (ret == 0) { z->c = z->l - m_keep; goto lab6; } /* call R2, line 134 */ if (ret < 0) return ret; } { int ret = slice_del(z); /* delete, line 134 */ if (ret < 0) return ret; } break; } lab6: ; } break; case 8: { int ret = r_R2(z); if (ret == 0) return 0; /* call R2, line 141 */ if (ret < 0) return ret; } { int ret = slice_del(z); /* delete, line 141 */ if (ret < 0) return ret; } { int m_keep = z->l - z->c;/* (void) m_keep;*/ /* try, line 142 */ z->ket = z->c; /* [, line 142 */ if (!(eq_s_b(z, 2, s_24))) { z->c = z->l - m_keep; goto lab11; } z->bra = z->c; /* ], line 142 */ { int ret = r_R2(z); if (ret == 0) { z->c = z->l - m_keep; goto lab11; } /* call R2, line 142 */ if (ret < 0) return ret; } { int ret = slice_del(z); /* delete, line 142 */ if (ret < 0) return ret; } z->ket = z->c; /* [, line 142 */ if (!(eq_s_b(z, 2, s_25))) { z->c = z->l - m_keep; goto lab11; } z->bra = z->c; /* ], line 142 */ { int m5 = z->l - z->c; (void)m5; /* or, line 142 */ { int ret = r_R2(z); if (ret == 0) goto lab13; /* call R2, line 142 */ if (ret < 0) return ret; } { int ret = slice_del(z); /* delete, line 142 */ if (ret < 0) return ret; } goto lab12; lab13: z->c = z->l - m5; { int ret = slice_from_s(z, 3, s_26); /* <-, line 142 */ if (ret < 0) return ret; } } lab12: lab11: ; } break; case 9: { int ret = slice_from_s(z, 3, s_27); /* <-, line 144 */ if (ret < 0) return ret; } break; case 10: { int ret = r_R1(z); if (ret == 0) return 0; /* call R1, line 145 */ if (ret < 0) return ret; } { int ret = slice_from_s(z, 2, s_28); /* <-, line 145 */ if (ret < 0) return ret; } break; case 11: { int m6 = z->l - z->c; (void)m6; /* or, line 147 */ { int ret = r_R2(z); if (ret == 0) goto lab15; /* call R2, line 147 */ if (ret < 0) return ret; } { int ret = slice_del(z); /* delete, line 147 */ if (ret < 0) return ret; } goto lab14; lab15: z->c = z->l - m6; { int ret = r_R1(z); if (ret == 0) return 0; /* call R1, line 147 
*/ if (ret < 0) return ret; } { int ret = slice_from_s(z, 3, s_29); /* <-, line 147 */ if (ret < 0) return ret; } } lab14: break; case 12: { int ret = r_R1(z); if (ret == 0) return 0; /* call R1, line 150 */ if (ret < 0) return ret; } if (out_grouping_b(z, g_v, 97, 251, 0)) return 0; { int ret = slice_del(z); /* delete, line 150 */ if (ret < 0) return ret; } break; case 13: { int ret = r_RV(z); if (ret == 0) return 0; /* call RV, line 155 */ if (ret < 0) return ret; } { int ret = slice_from_s(z, 3, s_30); /* <-, line 155 */ if (ret < 0) return ret; } return 0; /* fail, line 155 */ break; case 14: { int ret = r_RV(z); if (ret == 0) return 0; /* call RV, line 156 */ if (ret < 0) return ret; } { int ret = slice_from_s(z, 3, s_31); /* <-, line 156 */ if (ret < 0) return ret; } return 0; /* fail, line 156 */ break; case 15: { int m_test = z->l - z->c; /* test, line 158 */ if (in_grouping_b(z, g_v, 97, 251, 0)) return 0; { int ret = r_RV(z); if (ret == 0) return 0; /* call RV, line 158 */ if (ret < 0) return ret; } z->c = z->l - m_test; } { int ret = slice_del(z); /* delete, line 158 */ if (ret < 0) return ret; } return 0; /* fail, line 158 */ break; } return 1; } static int r_i_verb_suffix(struct SN_env * z) { int among_var; { int mlimit; /* setlimit, line 163 */ int m1 = z->l - z->c; (void)m1; if (z->c < z->I[0]) return 0; z->c = z->I[0]; /* tomark, line 163 */ mlimit = z->lb; z->lb = z->c; z->c = z->l - m1; z->ket = z->c; /* [, line 164 */ if (z->c <= z->lb || z->p[z->c - 1] >> 5 != 3 || !((68944418 >> (z->p[z->c - 1] & 0x1f)) & 1)) { z->lb = mlimit; return 0; } among_var = find_among_b(z, a_5, 35); /* substring, line 164 */ if (!(among_var)) { z->lb = mlimit; return 0; } z->bra = z->c; /* ], line 164 */ switch(among_var) { case 0: { z->lb = mlimit; return 0; } case 1: if (out_grouping_b(z, g_v, 97, 251, 0)) { z->lb = mlimit; return 0; } { int ret = slice_del(z); /* delete, line 170 */ if (ret < 0) return ret; } break; } z->lb = mlimit; } return 1; } static int 
r_verb_suffix(struct SN_env * z) { int among_var; { int mlimit; /* setlimit, line 174 */ int m1 = z->l - z->c; (void)m1; if (z->c < z->I[0]) return 0; z->c = z->I[0]; /* tomark, line 174 */ mlimit = z->lb; z->lb = z->c; z->c = z->l - m1; z->ket = z->c; /* [, line 175 */ among_var = find_among_b(z, a_6, 38); /* substring, line 175 */ if (!(among_var)) { z->lb = mlimit; return 0; } z->bra = z->c; /* ], line 175 */ switch(among_var) { case 0: { z->lb = mlimit; return 0; } case 1: { int ret = r_R2(z); if (ret == 0) { z->lb = mlimit; return 0; } /* call R2, line 177 */ if (ret < 0) return ret; } { int ret = slice_del(z); /* delete, line 177 */ if (ret < 0) return ret; } break; case 2: { int ret = slice_del(z); /* delete, line 185 */ if (ret < 0) return ret; } break; case 3: { int ret = slice_del(z); /* delete, line 190 */ if (ret < 0) return ret; } { int m_keep = z->l - z->c;/* (void) m_keep;*/ /* try, line 191 */ z->ket = z->c; /* [, line 191 */ if (!(eq_s_b(z, 1, s_32))) { z->c = z->l - m_keep; goto lab0; } z->bra = z->c; /* ], line 191 */ { int ret = slice_del(z); /* delete, line 191 */ if (ret < 0) return ret; } lab0: ; } break; } z->lb = mlimit; } return 1; } static int r_residual_suffix(struct SN_env * z) { int among_var; { int m_keep = z->l - z->c;/* (void) m_keep;*/ /* try, line 199 */ z->ket = z->c; /* [, line 199 */ if (!(eq_s_b(z, 1, s_33))) { z->c = z->l - m_keep; goto lab0; } z->bra = z->c; /* ], line 199 */ { int m_test = z->l - z->c; /* test, line 199 */ if (out_grouping_b(z, g_keep_with_s, 97, 232, 0)) { z->c = z->l - m_keep; goto lab0; } z->c = z->l - m_test; } { int ret = slice_del(z); /* delete, line 199 */ if (ret < 0) return ret; } lab0: ; } { int mlimit; /* setlimit, line 200 */ int m1 = z->l - z->c; (void)m1; if (z->c < z->I[0]) return 0; z->c = z->I[0]; /* tomark, line 200 */ mlimit = z->lb; z->lb = z->c; z->c = z->l - m1; z->ket = z->c; /* [, line 201 */ among_var = find_among_b(z, a_7, 7); /* substring, line 201 */ if (!(among_var)) { z->lb = 
mlimit; return 0; } z->bra = z->c; /* ], line 201 */ switch(among_var) { case 0: { z->lb = mlimit; return 0; } case 1: { int ret = r_R2(z); if (ret == 0) { z->lb = mlimit; return 0; } /* call R2, line 202 */ if (ret < 0) return ret; } { int m2 = z->l - z->c; (void)m2; /* or, line 202 */ if (!(eq_s_b(z, 1, s_34))) goto lab2; goto lab1; lab2: z->c = z->l - m2; if (!(eq_s_b(z, 1, s_35))) { z->lb = mlimit; return 0; } } lab1: { int ret = slice_del(z); /* delete, line 202 */ if (ret < 0) return ret; } break; case 2: { int ret = slice_from_s(z, 1, s_36); /* <-, line 204 */ if (ret < 0) return ret; } break; case 3: { int ret = slice_del(z); /* delete, line 205 */ if (ret < 0) return ret; } break; case 4: if (!(eq_s_b(z, 2, s_37))) { z->lb = mlimit; return 0; } { int ret = slice_del(z); /* delete, line 206 */ if (ret < 0) return ret; } break; } z->lb = mlimit; } return 1; } static int r_un_double(struct SN_env * z) { { int m_test = z->l - z->c; /* test, line 212 */ if (z->c - 2 <= z->lb || z->p[z->c - 1] >> 5 != 3 || !((1069056 >> (z->p[z->c - 1] & 0x1f)) & 1)) return 0; if (!(find_among_b(z, a_8, 5))) return 0; /* among, line 212 */ z->c = z->l - m_test; } z->ket = z->c; /* [, line 212 */ if (z->c <= z->lb) return 0; z->c--; /* next, line 212 */ z->bra = z->c; /* ], line 212 */ { int ret = slice_del(z); /* delete, line 212 */ if (ret < 0) return ret; } return 1; } static int r_un_accent(struct SN_env * z) { { int i = 1; while(1) { /* atleast, line 216 */ if (out_grouping_b(z, g_v, 97, 251, 0)) goto lab0; i--; continue; lab0: break; } if (i > 0) return 0; } z->ket = z->c; /* [, line 217 */ { int m1 = z->l - z->c; (void)m1; /* or, line 217 */ if (!(eq_s_b(z, 1, s_38))) goto lab2; goto lab1; lab2: z->c = z->l - m1; if (!(eq_s_b(z, 1, s_39))) return 0; } lab1: z->bra = z->c; /* ], line 217 */ { int ret = slice_from_s(z, 1, s_40); /* <-, line 217 */ if (ret < 0) return ret; } return 1; } extern int french_ISO_8859_1_stem(struct SN_env * z) { { int c1 = z->c; /* do, line 223 */ 
{ int ret = r_prelude(z); if (ret == 0) goto lab0; /* call prelude, line 223 */ if (ret < 0) return ret; } lab0: z->c = c1; } { int c2 = z->c; /* do, line 224 */ { int ret = r_mark_regions(z); if (ret == 0) goto lab1; /* call mark_regions, line 224 */ if (ret < 0) return ret; } lab1: z->c = c2; } z->lb = z->c; z->c = z->l; /* backwards, line 225 */ { int m3 = z->l - z->c; (void)m3; /* do, line 227 */ { int m4 = z->l - z->c; (void)m4; /* or, line 237 */ { int m5 = z->l - z->c; (void)m5; /* and, line 233 */ { int m6 = z->l - z->c; (void)m6; /* or, line 229 */ { int ret = r_standard_suffix(z); if (ret == 0) goto lab6; /* call standard_suffix, line 229 */ if (ret < 0) return ret; } goto lab5; lab6: z->c = z->l - m6; { int ret = r_i_verb_suffix(z); if (ret == 0) goto lab7; /* call i_verb_suffix, line 230 */ if (ret < 0) return ret; } goto lab5; lab7: z->c = z->l - m6; { int ret = r_verb_suffix(z); if (ret == 0) goto lab4; /* call verb_suffix, line 231 */ if (ret < 0) return ret; } } lab5: z->c = z->l - m5; { int m_keep = z->l - z->c;/* (void) m_keep;*/ /* try, line 234 */ z->ket = z->c; /* [, line 234 */ { int m7 = z->l - z->c; (void)m7; /* or, line 234 */ if (!(eq_s_b(z, 1, s_41))) goto lab10; z->bra = z->c; /* ], line 234 */ { int ret = slice_from_s(z, 1, s_42); /* <-, line 234 */ if (ret < 0) return ret; } goto lab9; lab10: z->c = z->l - m7; if (!(eq_s_b(z, 1, s_43))) { z->c = z->l - m_keep; goto lab8; } z->bra = z->c; /* ], line 235 */ { int ret = slice_from_s(z, 1, s_44); /* <-, line 235 */ if (ret < 0) return ret; } } lab9: lab8: ; } } goto lab3; lab4: z->c = z->l - m4; { int ret = r_residual_suffix(z); if (ret == 0) goto lab2; /* call residual_suffix, line 238 */ if (ret < 0) return ret; } } lab3: lab2: z->c = z->l - m3; } { int m8 = z->l - z->c; (void)m8; /* do, line 243 */ { int ret = r_un_double(z); if (ret == 0) goto lab11; /* call un_double, line 243 */ if (ret < 0) return ret; } lab11: z->c = z->l - m8; } { int m9 = z->l - z->c; (void)m9; /* do, line 244 */ 
{ int ret = r_un_accent(z); if (ret == 0) goto lab12; /* call un_accent, line 244 */ if (ret < 0) return ret; } lab12: z->c = z->l - m9; } z->c = z->lb; { int c10 = z->c; /* do, line 246 */ { int ret = r_postlude(z); if (ret == 0) goto lab13; /* call postlude, line 246 */ if (ret < 0) return ret; } lab13: z->c = c10; } return 1; } extern struct SN_env * french_ISO_8859_1_create_env(void) { return SN_create_env(0, 3, 0); } extern void french_ISO_8859_1_close_env(struct SN_env * z) { SN_close_env(z, 0); } swish-e-2.4.7/src/snowball/stem_dk.c0000664000077100017500000002636511166010110014231 00000000000000 /* This file was generated automatically by the Snowball to ANSI C compiler */ #include "header.h" #ifdef __cplusplus extern "C" { #endif extern int danish_ISO_8859_1_stem(struct SN_env * z); #ifdef __cplusplus } #endif static int r_undouble(struct SN_env * z); static int r_other_suffix(struct SN_env * z); static int r_consonant_pair(struct SN_env * z); static int r_main_suffix(struct SN_env * z); static int r_mark_regions(struct SN_env * z); #ifdef __cplusplus extern "C" { #endif extern struct SN_env * danish_ISO_8859_1_create_env(void); extern void danish_ISO_8859_1_close_env(struct SN_env * z); #ifdef __cplusplus } #endif static const symbol s_0_0[3] = { 'h', 'e', 'd' }; static const symbol s_0_1[5] = { 'e', 't', 'h', 'e', 'd' }; static const symbol s_0_2[4] = { 'e', 'r', 'e', 'd' }; static const symbol s_0_3[1] = { 'e' }; static const symbol s_0_4[5] = { 'e', 'r', 'e', 'd', 'e' }; static const symbol s_0_5[4] = { 'e', 'n', 'd', 'e' }; static const symbol s_0_6[6] = { 'e', 'r', 'e', 'n', 'd', 'e' }; static const symbol s_0_7[3] = { 'e', 'n', 'e' }; static const symbol s_0_8[4] = { 'e', 'r', 'n', 'e' }; static const symbol s_0_9[3] = { 'e', 'r', 'e' }; static const symbol s_0_10[2] = { 'e', 'n' }; static const symbol s_0_11[5] = { 'h', 'e', 'd', 'e', 'n' }; static const symbol s_0_12[4] = { 'e', 'r', 'e', 'n' }; static const symbol s_0_13[2] = { 'e', 'r' }; static 
const symbol s_0_14[5] = { 'h', 'e', 'd', 'e', 'r' }; static const symbol s_0_15[4] = { 'e', 'r', 'e', 'r' }; static const symbol s_0_16[1] = { 's' }; static const symbol s_0_17[4] = { 'h', 'e', 'd', 's' }; static const symbol s_0_18[2] = { 'e', 's' }; static const symbol s_0_19[5] = { 'e', 'n', 'd', 'e', 's' }; static const symbol s_0_20[7] = { 'e', 'r', 'e', 'n', 'd', 'e', 's' }; static const symbol s_0_21[4] = { 'e', 'n', 'e', 's' }; static const symbol s_0_22[5] = { 'e', 'r', 'n', 'e', 's' }; static const symbol s_0_23[4] = { 'e', 'r', 'e', 's' }; static const symbol s_0_24[3] = { 'e', 'n', 's' }; static const symbol s_0_25[6] = { 'h', 'e', 'd', 'e', 'n', 's' }; static const symbol s_0_26[5] = { 'e', 'r', 'e', 'n', 's' }; static const symbol s_0_27[3] = { 'e', 'r', 's' }; static const symbol s_0_28[3] = { 'e', 't', 's' }; static const symbol s_0_29[5] = { 'e', 'r', 'e', 't', 's' }; static const symbol s_0_30[2] = { 'e', 't' }; static const symbol s_0_31[4] = { 'e', 'r', 'e', 't' }; static const struct among a_0[32] = { /* 0 */ { 3, s_0_0, -1, 1, 0}, /* 1 */ { 5, s_0_1, 0, 1, 0}, /* 2 */ { 4, s_0_2, -1, 1, 0}, /* 3 */ { 1, s_0_3, -1, 1, 0}, /* 4 */ { 5, s_0_4, 3, 1, 0}, /* 5 */ { 4, s_0_5, 3, 1, 0}, /* 6 */ { 6, s_0_6, 5, 1, 0}, /* 7 */ { 3, s_0_7, 3, 1, 0}, /* 8 */ { 4, s_0_8, 3, 1, 0}, /* 9 */ { 3, s_0_9, 3, 1, 0}, /* 10 */ { 2, s_0_10, -1, 1, 0}, /* 11 */ { 5, s_0_11, 10, 1, 0}, /* 12 */ { 4, s_0_12, 10, 1, 0}, /* 13 */ { 2, s_0_13, -1, 1, 0}, /* 14 */ { 5, s_0_14, 13, 1, 0}, /* 15 */ { 4, s_0_15, 13, 1, 0}, /* 16 */ { 1, s_0_16, -1, 2, 0}, /* 17 */ { 4, s_0_17, 16, 1, 0}, /* 18 */ { 2, s_0_18, 16, 1, 0}, /* 19 */ { 5, s_0_19, 18, 1, 0}, /* 20 */ { 7, s_0_20, 19, 1, 0}, /* 21 */ { 4, s_0_21, 18, 1, 0}, /* 22 */ { 5, s_0_22, 18, 1, 0}, /* 23 */ { 4, s_0_23, 18, 1, 0}, /* 24 */ { 3, s_0_24, 16, 1, 0}, /* 25 */ { 6, s_0_25, 24, 1, 0}, /* 26 */ { 5, s_0_26, 24, 1, 0}, /* 27 */ { 3, s_0_27, 16, 1, 0}, /* 28 */ { 3, s_0_28, 16, 1, 0}, /* 29 */ { 5, s_0_29, 28, 1, 
0}, /* 30 */ { 2, s_0_30, -1, 1, 0}, /* 31 */ { 4, s_0_31, 30, 1, 0} }; static const symbol s_1_0[2] = { 'g', 'd' }; static const symbol s_1_1[2] = { 'd', 't' }; static const symbol s_1_2[2] = { 'g', 't' }; static const symbol s_1_3[2] = { 'k', 't' }; static const struct among a_1[4] = { /* 0 */ { 2, s_1_0, -1, -1, 0}, /* 1 */ { 2, s_1_1, -1, -1, 0}, /* 2 */ { 2, s_1_2, -1, -1, 0}, /* 3 */ { 2, s_1_3, -1, -1, 0} }; static const symbol s_2_0[2] = { 'i', 'g' }; static const symbol s_2_1[3] = { 'l', 'i', 'g' }; static const symbol s_2_2[4] = { 'e', 'l', 'i', 'g' }; static const symbol s_2_3[3] = { 'e', 'l', 's' }; static const symbol s_2_4[4] = { 'l', 0xF8, 's', 't' }; static const struct among a_2[5] = { /* 0 */ { 2, s_2_0, -1, 1, 0}, /* 1 */ { 3, s_2_1, 0, 1, 0}, /* 2 */ { 4, s_2_2, 1, 1, 0}, /* 3 */ { 3, s_2_3, -1, 1, 0}, /* 4 */ { 4, s_2_4, -1, 2, 0} }; static const unsigned char g_v[] = { 17, 65, 16, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 48, 0, 128 }; static const unsigned char g_s_ending[] = { 239, 254, 42, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 16 }; static const symbol s_0[] = { 's', 't' }; static const symbol s_1[] = { 'i', 'g' }; static const symbol s_2[] = { 'l', 0xF8, 's' }; static int r_mark_regions(struct SN_env * z) { z->I[0] = z->l; { int c_test = z->c; /* test, line 33 */ { int ret = z->c + 3; if (0 > ret || ret > z->l) return 0; z->c = ret; /* hop, line 33 */ } z->I[1] = z->c; /* setmark x, line 33 */ z->c = c_test; } if (out_grouping(z, g_v, 97, 248, 1) < 0) return 0; /* goto */ /* grouping v, line 34 */ { /* gopast */ /* non v, line 34 */ int ret = in_grouping(z, g_v, 97, 248, 1); if (ret < 0) return 0; z->c += ret; } z->I[0] = z->c; /* setmark p1, line 34 */ /* try, line 35 */ if (!(z->I[0] < z->I[1])) goto lab0; z->I[0] = z->I[1]; lab0: return 1; } static int r_main_suffix(struct SN_env * z) { int among_var; { int mlimit; /* setlimit, line 41 */ int m1 = z->l - z->c; (void)m1; if (z->c < z->I[0]) return 0; z->c = z->I[0]; /* tomark, line 41 */ 
mlimit = z->lb; z->lb = z->c; z->c = z->l - m1; z->ket = z->c; /* [, line 41 */ if (z->c <= z->lb || z->p[z->c - 1] >> 5 != 3 || !((1851440 >> (z->p[z->c - 1] & 0x1f)) & 1)) { z->lb = mlimit; return 0; } among_var = find_among_b(z, a_0, 32); /* substring, line 41 */ if (!(among_var)) { z->lb = mlimit; return 0; } z->bra = z->c; /* ], line 41 */ z->lb = mlimit; } switch(among_var) { case 0: return 0; case 1: { int ret = slice_del(z); /* delete, line 48 */ if (ret < 0) return ret; } break; case 2: if (in_grouping_b(z, g_s_ending, 97, 229, 0)) return 0; { int ret = slice_del(z); /* delete, line 50 */ if (ret < 0) return ret; } break; } return 1; } static int r_consonant_pair(struct SN_env * z) { { int m_test = z->l - z->c; /* test, line 55 */ { int mlimit; /* setlimit, line 56 */ int m1 = z->l - z->c; (void)m1; if (z->c < z->I[0]) return 0; z->c = z->I[0]; /* tomark, line 56 */ mlimit = z->lb; z->lb = z->c; z->c = z->l - m1; z->ket = z->c; /* [, line 56 */ if (z->c - 1 <= z->lb || (z->p[z->c - 1] != 100 && z->p[z->c - 1] != 116)) { z->lb = mlimit; return 0; } if (!(find_among_b(z, a_1, 4))) { z->lb = mlimit; return 0; } /* substring, line 56 */ z->bra = z->c; /* ], line 56 */ z->lb = mlimit; } z->c = z->l - m_test; } if (z->c <= z->lb) return 0; z->c--; /* next, line 62 */ z->bra = z->c; /* ], line 62 */ { int ret = slice_del(z); /* delete, line 62 */ if (ret < 0) return ret; } return 1; } static int r_other_suffix(struct SN_env * z) { int among_var; { int m1 = z->l - z->c; (void)m1; /* do, line 66 */ z->ket = z->c; /* [, line 66 */ if (!(eq_s_b(z, 2, s_0))) goto lab0; z->bra = z->c; /* ], line 66 */ if (!(eq_s_b(z, 2, s_1))) goto lab0; { int ret = slice_del(z); /* delete, line 66 */ if (ret < 0) return ret; } lab0: z->c = z->l - m1; } { int mlimit; /* setlimit, line 67 */ int m2 = z->l - z->c; (void)m2; if (z->c < z->I[0]) return 0; z->c = z->I[0]; /* tomark, line 67 */ mlimit = z->lb; z->lb = z->c; z->c = z->l - m2; z->ket = z->c; /* [, line 67 */ if (z->c - 1 <= 
z->lb || z->p[z->c - 1] >> 5 != 3 || !((1572992 >> (z->p[z->c - 1] & 0x1f)) & 1)) { z->lb = mlimit; return 0; } among_var = find_among_b(z, a_2, 5); /* substring, line 67 */ if (!(among_var)) { z->lb = mlimit; return 0; } z->bra = z->c; /* ], line 67 */ z->lb = mlimit; } switch(among_var) { case 0: return 0; case 1: { int ret = slice_del(z); /* delete, line 70 */ if (ret < 0) return ret; } { int m3 = z->l - z->c; (void)m3; /* do, line 70 */ { int ret = r_consonant_pair(z); if (ret == 0) goto lab1; /* call consonant_pair, line 70 */ if (ret < 0) return ret; } lab1: z->c = z->l - m3; } break; case 2: { int ret = slice_from_s(z, 3, s_2); /* <-, line 72 */ if (ret < 0) return ret; } break; } return 1; } static int r_undouble(struct SN_env * z) { { int mlimit; /* setlimit, line 76 */ int m1 = z->l - z->c; (void)m1; if (z->c < z->I[0]) return 0; z->c = z->I[0]; /* tomark, line 76 */ mlimit = z->lb; z->lb = z->c; z->c = z->l - m1; z->ket = z->c; /* [, line 76 */ if (out_grouping_b(z, g_v, 97, 248, 0)) { z->lb = mlimit; return 0; } z->bra = z->c; /* ], line 76 */ z->S[0] = slice_to(z, z->S[0]); /* -> ch, line 76 */ if (z->S[0] == 0) return -1; /* -> ch, line 76 */ z->lb = mlimit; } if (!(eq_v_b(z, z->S[0]))) return 0; /* name ch, line 77 */ { int ret = slice_del(z); /* delete, line 78 */ if (ret < 0) return ret; } return 1; } extern int danish_ISO_8859_1_stem(struct SN_env * z) { { int c1 = z->c; /* do, line 84 */ { int ret = r_mark_regions(z); if (ret == 0) goto lab0; /* call mark_regions, line 84 */ if (ret < 0) return ret; } lab0: z->c = c1; } z->lb = z->c; z->c = z->l; /* backwards, line 85 */ { int m2 = z->l - z->c; (void)m2; /* do, line 86 */ { int ret = r_main_suffix(z); if (ret == 0) goto lab1; /* call main_suffix, line 86 */ if (ret < 0) return ret; } lab1: z->c = z->l - m2; } { int m3 = z->l - z->c; (void)m3; /* do, line 87 */ { int ret = r_consonant_pair(z); if (ret == 0) goto lab2; /* call consonant_pair, line 87 */ if (ret < 0) return ret; } lab2: z->c = z->l 
- m3; } { int m4 = z->l - z->c; (void)m4; /* do, line 88 */ { int ret = r_other_suffix(z); if (ret == 0) goto lab3; /* call other_suffix, line 88 */ if (ret < 0) return ret; } lab3: z->c = z->l - m4; } { int m5 = z->l - z->c; (void)m5; /* do, line 89 */ { int ret = r_undouble(z); if (ret == 0) goto lab4; /* call undouble, line 89 */ if (ret < 0) return ret; } lab4: z->c = z->l - m5; } z->c = z->lb; return 1; } extern struct SN_env * danish_ISO_8859_1_create_env(void) { return SN_create_env(1, 2, 0); } extern void danish_ISO_8859_1_close_env(struct SN_env * z) { SN_close_env(z, 1); } swish-e-2.4.7/src/snowball/stem_de.c0000664000077100017500000004072011166010107014220 00000000000000 /* This file was generated automatically by the Snowball to ANSI C compiler */ #include "header.h" #ifdef __cplusplus extern "C" { #endif extern int german_ISO_8859_1_stem(struct SN_env * z); #ifdef __cplusplus } #endif static int r_standard_suffix(struct SN_env * z); static int r_R2(struct SN_env * z); static int r_R1(struct SN_env * z); static int r_mark_regions(struct SN_env * z); static int r_postlude(struct SN_env * z); static int r_prelude(struct SN_env * z); #ifdef __cplusplus extern "C" { #endif extern struct SN_env * german_ISO_8859_1_create_env(void); extern void german_ISO_8859_1_close_env(struct SN_env * z); #ifdef __cplusplus } #endif static const symbol s_0_1[1] = { 'U' }; static const symbol s_0_2[1] = { 'Y' }; static const symbol s_0_3[1] = { 0xE4 }; static const symbol s_0_4[1] = { 0xF6 }; static const symbol s_0_5[1] = { 0xFC }; static const struct among a_0[6] = { /* 0 */ { 0, 0, -1, 6, 0}, /* 1 */ { 1, s_0_1, 0, 2, 0}, /* 2 */ { 1, s_0_2, 0, 1, 0}, /* 3 */ { 1, s_0_3, 0, 3, 0}, /* 4 */ { 1, s_0_4, 0, 4, 0}, /* 5 */ { 1, s_0_5, 0, 5, 0} }; static const symbol s_1_0[1] = { 'e' }; static const symbol s_1_1[2] = { 'e', 'm' }; static const symbol s_1_2[2] = { 'e', 'n' }; static const symbol s_1_3[3] = { 'e', 'r', 'n' }; static const symbol s_1_4[2] = { 'e', 'r' }; static 
const symbol s_1_5[1] = { 's' }; static const symbol s_1_6[2] = { 'e', 's' }; static const struct among a_1[7] = { /* 0 */ { 1, s_1_0, -1, 1, 0}, /* 1 */ { 2, s_1_1, -1, 1, 0}, /* 2 */ { 2, s_1_2, -1, 1, 0}, /* 3 */ { 3, s_1_3, -1, 1, 0}, /* 4 */ { 2, s_1_4, -1, 1, 0}, /* 5 */ { 1, s_1_5, -1, 2, 0}, /* 6 */ { 2, s_1_6, 5, 1, 0} }; static const symbol s_2_0[2] = { 'e', 'n' }; static const symbol s_2_1[2] = { 'e', 'r' }; static const symbol s_2_2[2] = { 's', 't' }; static const symbol s_2_3[3] = { 'e', 's', 't' }; static const struct among a_2[4] = { /* 0 */ { 2, s_2_0, -1, 1, 0}, /* 1 */ { 2, s_2_1, -1, 1, 0}, /* 2 */ { 2, s_2_2, -1, 2, 0}, /* 3 */ { 3, s_2_3, 2, 1, 0} }; static const symbol s_3_0[2] = { 'i', 'g' }; static const symbol s_3_1[4] = { 'l', 'i', 'c', 'h' }; static const struct among a_3[2] = { /* 0 */ { 2, s_3_0, -1, 1, 0}, /* 1 */ { 4, s_3_1, -1, 1, 0} }; static const symbol s_4_0[3] = { 'e', 'n', 'd' }; static const symbol s_4_1[2] = { 'i', 'g' }; static const symbol s_4_2[3] = { 'u', 'n', 'g' }; static const symbol s_4_3[4] = { 'l', 'i', 'c', 'h' }; static const symbol s_4_4[4] = { 'i', 's', 'c', 'h' }; static const symbol s_4_5[2] = { 'i', 'k' }; static const symbol s_4_6[4] = { 'h', 'e', 'i', 't' }; static const symbol s_4_7[4] = { 'k', 'e', 'i', 't' }; static const struct among a_4[8] = { /* 0 */ { 3, s_4_0, -1, 1, 0}, /* 1 */ { 2, s_4_1, -1, 2, 0}, /* 2 */ { 3, s_4_2, -1, 1, 0}, /* 3 */ { 4, s_4_3, -1, 3, 0}, /* 4 */ { 4, s_4_4, -1, 2, 0}, /* 5 */ { 2, s_4_5, -1, 2, 0}, /* 6 */ { 4, s_4_6, -1, 3, 0}, /* 7 */ { 4, s_4_7, -1, 4, 0} }; static const unsigned char g_v[] = { 17, 65, 16, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 8, 0, 32, 8 }; static const unsigned char g_s_ending[] = { 117, 30, 5 }; static const unsigned char g_st_ending[] = { 117, 30, 4 }; static const symbol s_0[] = { 0xDF }; static const symbol s_1[] = { 's', 's' }; static const symbol s_2[] = { 'u' }; static const symbol s_3[] = { 'U' }; static const symbol s_4[] = { 'y' }; static 
const symbol s_5[] = { 'Y' }; static const symbol s_6[] = { 'y' }; static const symbol s_7[] = { 'u' }; static const symbol s_8[] = { 'a' }; static const symbol s_9[] = { 'o' }; static const symbol s_10[] = { 'u' }; static const symbol s_11[] = { 'i', 'g' }; static const symbol s_12[] = { 'e' }; static const symbol s_13[] = { 'e' }; static const symbol s_14[] = { 'e', 'r' }; static const symbol s_15[] = { 'e', 'n' }; static int r_prelude(struct SN_env * z) { { int c_test = z->c; /* test, line 30 */ while(1) { /* repeat, line 30 */ int c1 = z->c; { int c2 = z->c; /* or, line 33 */ z->bra = z->c; /* [, line 32 */ if (!(eq_s(z, 1, s_0))) goto lab2; z->ket = z->c; /* ], line 32 */ { int ret = slice_from_s(z, 2, s_1); /* <-, line 32 */ if (ret < 0) return ret; } goto lab1; lab2: z->c = c2; if (z->c >= z->l) goto lab0; z->c++; /* next, line 33 */ } lab1: continue; lab0: z->c = c1; break; } z->c = c_test; } while(1) { /* repeat, line 36 */ int c3 = z->c; while(1) { /* goto, line 36 */ int c4 = z->c; if (in_grouping(z, g_v, 97, 252, 0)) goto lab4; z->bra = z->c; /* [, line 37 */ { int c5 = z->c; /* or, line 37 */ if (!(eq_s(z, 1, s_2))) goto lab6; z->ket = z->c; /* ], line 37 */ if (in_grouping(z, g_v, 97, 252, 0)) goto lab6; { int ret = slice_from_s(z, 1, s_3); /* <-, line 37 */ if (ret < 0) return ret; } goto lab5; lab6: z->c = c5; if (!(eq_s(z, 1, s_4))) goto lab4; z->ket = z->c; /* ], line 38 */ if (in_grouping(z, g_v, 97, 252, 0)) goto lab4; { int ret = slice_from_s(z, 1, s_5); /* <-, line 38 */ if (ret < 0) return ret; } } lab5: z->c = c4; break; lab4: z->c = c4; if (z->c >= z->l) goto lab3; z->c++; /* goto, line 36 */ } continue; lab3: z->c = c3; break; } return 1; } static int r_mark_regions(struct SN_env * z) { z->I[0] = z->l; z->I[1] = z->l; { int c_test = z->c; /* test, line 47 */ { int ret = z->c + 3; if (0 > ret || ret > z->l) return 0; z->c = ret; /* hop, line 47 */ } z->I[2] = z->c; /* setmark x, line 47 */ z->c = c_test; } { /* gopast */ /* grouping v, line 
49 */ int ret = out_grouping(z, g_v, 97, 252, 1); if (ret < 0) return 0; z->c += ret; } { /* gopast */ /* non v, line 49 */ int ret = in_grouping(z, g_v, 97, 252, 1); if (ret < 0) return 0; z->c += ret; } z->I[0] = z->c; /* setmark p1, line 49 */ /* try, line 50 */ if (!(z->I[0] < z->I[2])) goto lab0; z->I[0] = z->I[2]; lab0: { /* gopast */ /* grouping v, line 51 */ int ret = out_grouping(z, g_v, 97, 252, 1); if (ret < 0) return 0; z->c += ret; } { /* gopast */ /* non v, line 51 */ int ret = in_grouping(z, g_v, 97, 252, 1); if (ret < 0) return 0; z->c += ret; } z->I[1] = z->c; /* setmark p2, line 51 */ return 1; } static int r_postlude(struct SN_env * z) { int among_var; while(1) { /* repeat, line 55 */ int c1 = z->c; z->bra = z->c; /* [, line 57 */ among_var = find_among(z, a_0, 6); /* substring, line 57 */ if (!(among_var)) goto lab0; z->ket = z->c; /* ], line 57 */ switch(among_var) { case 0: goto lab0; case 1: { int ret = slice_from_s(z, 1, s_6); /* <-, line 58 */ if (ret < 0) return ret; } break; case 2: { int ret = slice_from_s(z, 1, s_7); /* <-, line 59 */ if (ret < 0) return ret; } break; case 3: { int ret = slice_from_s(z, 1, s_8); /* <-, line 60 */ if (ret < 0) return ret; } break; case 4: { int ret = slice_from_s(z, 1, s_9); /* <-, line 61 */ if (ret < 0) return ret; } break; case 5: { int ret = slice_from_s(z, 1, s_10); /* <-, line 62 */ if (ret < 0) return ret; } break; case 6: if (z->c >= z->l) goto lab0; z->c++; /* next, line 63 */ break; } continue; lab0: z->c = c1; break; } return 1; } static int r_R1(struct SN_env * z) { if (!(z->I[0] <= z->c)) return 0; return 1; } static int r_R2(struct SN_env * z) { if (!(z->I[1] <= z->c)) return 0; return 1; } static int r_standard_suffix(struct SN_env * z) { int among_var; { int m1 = z->l - z->c; (void)m1; /* do, line 74 */ z->ket = z->c; /* [, line 75 */ if (z->c <= z->lb || z->p[z->c - 1] >> 5 != 3 || !((811040 >> (z->p[z->c - 1] & 0x1f)) & 1)) goto lab0; among_var = find_among_b(z, a_1, 7); /* substring, 
line 75 */ if (!(among_var)) goto lab0; z->bra = z->c; /* ], line 75 */ { int ret = r_R1(z); if (ret == 0) goto lab0; /* call R1, line 75 */ if (ret < 0) return ret; } switch(among_var) { case 0: goto lab0; case 1: { int ret = slice_del(z); /* delete, line 77 */ if (ret < 0) return ret; } break; case 2: if (in_grouping_b(z, g_s_ending, 98, 116, 0)) goto lab0; { int ret = slice_del(z); /* delete, line 80 */ if (ret < 0) return ret; } break; } lab0: z->c = z->l - m1; } { int m2 = z->l - z->c; (void)m2; /* do, line 84 */ z->ket = z->c; /* [, line 85 */ if (z->c - 1 <= z->lb || z->p[z->c - 1] >> 5 != 3 || !((1327104 >> (z->p[z->c - 1] & 0x1f)) & 1)) goto lab1; among_var = find_among_b(z, a_2, 4); /* substring, line 85 */ if (!(among_var)) goto lab1; z->bra = z->c; /* ], line 85 */ { int ret = r_R1(z); if (ret == 0) goto lab1; /* call R1, line 85 */ if (ret < 0) return ret; } switch(among_var) { case 0: goto lab1; case 1: { int ret = slice_del(z); /* delete, line 87 */ if (ret < 0) return ret; } break; case 2: if (in_grouping_b(z, g_st_ending, 98, 116, 0)) goto lab1; { int ret = z->c - 3; if (z->lb > ret || ret > z->l) goto lab1; z->c = ret; /* hop, line 90 */ } { int ret = slice_del(z); /* delete, line 90 */ if (ret < 0) return ret; } break; } lab1: z->c = z->l - m2; } { int m3 = z->l - z->c; (void)m3; /* do, line 94 */ z->ket = z->c; /* [, line 95 */ if (z->c - 1 <= z->lb || z->p[z->c - 1] >> 5 != 3 || !((1051024 >> (z->p[z->c - 1] & 0x1f)) & 1)) goto lab2; among_var = find_among_b(z, a_4, 8); /* substring, line 95 */ if (!(among_var)) goto lab2; z->bra = z->c; /* ], line 95 */ { int ret = r_R2(z); if (ret == 0) goto lab2; /* call R2, line 95 */ if (ret < 0) return ret; } switch(among_var) { case 0: goto lab2; case 1: { int ret = slice_del(z); /* delete, line 97 */ if (ret < 0) return ret; } { int m_keep = z->l - z->c;/* (void) m_keep;*/ /* try, line 98 */ z->ket = z->c; /* [, line 98 */ if (!(eq_s_b(z, 2, s_11))) { z->c = z->l - m_keep; goto lab3; } z->bra = z->c; /* 
], line 98 */ { int m4 = z->l - z->c; (void)m4; /* not, line 98 */ if (!(eq_s_b(z, 1, s_12))) goto lab4; { z->c = z->l - m_keep; goto lab3; } lab4: z->c = z->l - m4; } { int ret = r_R2(z); if (ret == 0) { z->c = z->l - m_keep; goto lab3; } /* call R2, line 98 */ if (ret < 0) return ret; } { int ret = slice_del(z); /* delete, line 98 */ if (ret < 0) return ret; } lab3: ; } break; case 2: { int m5 = z->l - z->c; (void)m5; /* not, line 101 */ if (!(eq_s_b(z, 1, s_13))) goto lab5; goto lab2; lab5: z->c = z->l - m5; } { int ret = slice_del(z); /* delete, line 101 */ if (ret < 0) return ret; } break; case 3: { int ret = slice_del(z); /* delete, line 104 */ if (ret < 0) return ret; } { int m_keep = z->l - z->c;/* (void) m_keep;*/ /* try, line 105 */ z->ket = z->c; /* [, line 106 */ { int m6 = z->l - z->c; (void)m6; /* or, line 106 */ if (!(eq_s_b(z, 2, s_14))) goto lab8; goto lab7; lab8: z->c = z->l - m6; if (!(eq_s_b(z, 2, s_15))) { z->c = z->l - m_keep; goto lab6; } } lab7: z->bra = z->c; /* ], line 106 */ { int ret = r_R1(z); if (ret == 0) { z->c = z->l - m_keep; goto lab6; } /* call R1, line 106 */ if (ret < 0) return ret; } { int ret = slice_del(z); /* delete, line 106 */ if (ret < 0) return ret; } lab6: ; } break; case 4: { int ret = slice_del(z); /* delete, line 110 */ if (ret < 0) return ret; } { int m_keep = z->l - z->c;/* (void) m_keep;*/ /* try, line 111 */ z->ket = z->c; /* [, line 112 */ if (z->c - 1 <= z->lb || (z->p[z->c - 1] != 103 && z->p[z->c - 1] != 104)) { z->c = z->l - m_keep; goto lab9; } among_var = find_among_b(z, a_3, 2); /* substring, line 112 */ if (!(among_var)) { z->c = z->l - m_keep; goto lab9; } z->bra = z->c; /* ], line 112 */ { int ret = r_R2(z); if (ret == 0) { z->c = z->l - m_keep; goto lab9; } /* call R2, line 112 */ if (ret < 0) return ret; } switch(among_var) { case 0: { z->c = z->l - m_keep; goto lab9; } case 1: { int ret = slice_del(z); /* delete, line 114 */ if (ret < 0) return ret; } break; } lab9: ; } break; } lab2: z->c = z->l - 
m3; } return 1; } extern int german_ISO_8859_1_stem(struct SN_env * z) { { int c1 = z->c; /* do, line 125 */ { int ret = r_prelude(z); if (ret == 0) goto lab0; /* call prelude, line 125 */ if (ret < 0) return ret; } lab0: z->c = c1; } { int c2 = z->c; /* do, line 126 */ { int ret = r_mark_regions(z); if (ret == 0) goto lab1; /* call mark_regions, line 126 */ if (ret < 0) return ret; } lab1: z->c = c2; } z->lb = z->c; z->c = z->l; /* backwards, line 127 */ { int m3 = z->l - z->c; (void)m3; /* do, line 128 */ { int ret = r_standard_suffix(z); if (ret == 0) goto lab2; /* call standard_suffix, line 128 */ if (ret < 0) return ret; } lab2: z->c = z->l - m3; } z->c = z->lb; { int c4 = z->c; /* do, line 129 */ { int ret = r_postlude(z); if (ret == 0) goto lab3; /* call postlude, line 129 */ if (ret < 0) return ret; } lab3: z->c = c4; } return 1; } extern struct SN_env * german_ISO_8859_1_create_env(void) { return SN_create_env(0, 3, 0); } extern void german_ISO_8859_1_close_env(struct SN_env * z) { SN_close_env(z, 0); } swish-e-2.4.7/src/snowball/Makefile.in0000664000077100017500000003661211166010107014506 00000000000000# Makefile.in generated by automake 1.9.6 from Makefile.am. # @configure_input@ # Copyright (C) 1994, 1995, 1996, 1997, 1998, 1999, 2000, 2001, 2002, # 2003, 2004, 2005 Free Software Foundation, Inc. # This Makefile.in is free software; the Free Software Foundation # gives unlimited permission to copy and/or distribute it, # with or without modifications, as long as this notice is preserved. # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY, to the extent permitted by law; without # even the implied warranty of MERCHANTABILITY or FITNESS FOR A # PARTICULAR PURPOSE. @SET_MAKE@ srcdir = @srcdir@ top_srcdir = @top_srcdir@ VPATH = @srcdir@ pkgdatadir = $(datadir)/@PACKAGE@ pkglibdir = $(libdir)/@PACKAGE@ pkgincludedir = $(includedir)/@PACKAGE@ top_builddir = ../.. 
am__cd = CDPATH="$${ZSH_VERSION+.}$(PATH_SEPARATOR)" && cd INSTALL = @INSTALL@ install_sh_DATA = $(install_sh) -c -m 644 install_sh_PROGRAM = $(install_sh) -c install_sh_SCRIPT = $(install_sh) -c INSTALL_HEADER = $(INSTALL_DATA) transform = $(program_transform_name) NORMAL_INSTALL = : PRE_INSTALL = : POST_INSTALL = : NORMAL_UNINSTALL = : PRE_UNINSTALL = : POST_UNINSTALL = : build_triplet = @build@ host_triplet = @host@ subdir = src/snowball DIST_COMMON = $(srcdir)/Makefile.am $(srcdir)/Makefile.in ACLOCAL_M4 = $(top_srcdir)/aclocal.m4 am__aclocal_m4_deps = $(top_srcdir)/config/acinclude.m4 \ $(top_srcdir)/configure.in am__configure_deps = $(am__aclocal_m4_deps) $(CONFIGURE_DEPENDENCIES) \ $(ACLOCAL_M4) mkinstalldirs = $(SHELL) $(top_srcdir)/config/mkinstalldirs CONFIG_HEADER = $(top_builddir)/src/acconfig.h CONFIG_CLEAN_FILES = LTLIBRARIES = $(noinst_LTLIBRARIES) libsnowball_la_LIBADD = am_libsnowball_la_OBJECTS = api.lo utilities.lo stem_en1.lo \ stem_en2.lo stem_es.lo stem_fr.lo stem_it.lo stem_pt.lo \ stem_de.lo stem_nl.lo stem_no.lo stem_se.lo stem_dk.lo \ stem_ru.lo stem_fi.lo stem_ro.lo stem_hu.lo libsnowball_la_OBJECTS = $(am_libsnowball_la_OBJECTS) DEFAULT_INCLUDES = -I. 
-I$(srcdir) -I$(top_builddir)/src depcomp = $(SHELL) $(top_srcdir)/config/depcomp am__depfiles_maybe = depfiles COMPILE = $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) \ $(CPPFLAGS) $(AM_CFLAGS) $(CFLAGS) LTCOMPILE = $(LIBTOOL) --tag=CC --mode=compile $(CC) $(DEFS) \ $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) \ $(AM_CFLAGS) $(CFLAGS) CCLD = $(CC) LINK = $(LIBTOOL) --tag=CC --mode=link $(CCLD) $(AM_CFLAGS) $(CFLAGS) \ $(AM_LDFLAGS) $(LDFLAGS) -o $@ SOURCES = $(libsnowball_la_SOURCES) DIST_SOURCES = $(libsnowball_la_SOURCES) ETAGS = etags CTAGS = ctags DISTFILES = $(DIST_COMMON) $(DIST_SOURCES) $(TEXINFOS) $(EXTRA_DIST) ACLOCAL = @ACLOCAL@ ALLOCA = @ALLOCA@ AMDEP_FALSE = @AMDEP_FALSE@ AMDEP_TRUE = @AMDEP_TRUE@ AMTAR = @AMTAR@ AR = @AR@ AS = @AS@ AUTOCONF = @AUTOCONF@ AUTOHEADER = @AUTOHEADER@ AUTOMAKE = @AUTOMAKE@ AWK = @AWK@ BTREE_OBJS = @BTREE_OBJS@ BUILDDOCS_FALSE = @BUILDDOCS_FALSE@ BUILDDOCS_TRUE = @BUILDDOCS_TRUE@ CC = @CC@ CCDEPMODE = @CCDEPMODE@ CFLAGS = @CFLAGS@ CPP = @CPP@ CPPFLAGS = @CPPFLAGS@ CXX = @CXX@ CXXCPP = @CXXCPP@ CXXDEPMODE = @CXXDEPMODE@ CXXFLAGS = @CXXFLAGS@ CYGPATH_W = @CYGPATH_W@ DEFS = @DEFS@ DEPDIR = @DEPDIR@ DLLTOOL = @DLLTOOL@ ECHO = @ECHO@ ECHO_C = @ECHO_C@ ECHO_N = @ECHO_N@ ECHO_T = @ECHO_T@ EGREP = @EGREP@ EXEEXT = @EXEEXT@ F77 = @F77@ FFLAGS = @FFLAGS@ INSTALLDOCS_FALSE = @INSTALLDOCS_FALSE@ INSTALLDOCS_TRUE = @INSTALLDOCS_TRUE@ INSTALL_DATA = @INSTALL_DATA@ INSTALL_PROGRAM = @INSTALL_PROGRAM@ INSTALL_SCRIPT = @INSTALL_SCRIPT@ INSTALL_STRIP_PROGRAM = @INSTALL_STRIP_PROGRAM@ LARGEFILES_MACROS = @LARGEFILES_MACROS@ LDFLAGS = @LDFLAGS@ LIBOBJS = @LIBOBJS@ LIBS = @LIBS@ LIBTOOL = @LIBTOOL@ LIBXML2_CFLAGS = @LIBXML2_CFLAGS@ LIBXML2_LIB = @LIBXML2_LIB@ LIBXML2_OBJS = @LIBXML2_OBJS@ LIBXML_REQUIRED_VERSION = @LIBXML_REQUIRED_VERSION@ LN_S = @LN_S@ LTLIBOBJS = @LTLIBOBJS@ MAINT = @MAINT@ MAINTAINER_MODE_FALSE = @MAINTAINER_MODE_FALSE@ MAINTAINER_MODE_TRUE = @MAINTAINER_MODE_TRUE@ MAKEINFO = @MAKEINFO@ OBJDUMP = 
@OBJDUMP@ OBJEXT = @OBJEXT@ PACKAGE = @PACKAGE@ PACKAGE_BUGREPORT = @PACKAGE_BUGREPORT@ PACKAGE_NAME = @PACKAGE_NAME@ PACKAGE_STRING = @PACKAGE_STRING@ PACKAGE_TARNAME = @PACKAGE_TARNAME@ PACKAGE_VERSION = @PACKAGE_VERSION@ PATH_SEPARATOR = @PATH_SEPARATOR@ PCRE_CFLAGS = @PCRE_CFLAGS@ PCRE_CONFIG = @PCRE_CONFIG@ PCRE_LIBS = @PCRE_LIBS@ PCRE_REQUIRED_VERSION = @PCRE_REQUIRED_VERSION@ PERL = @PERL@ POD2MAN = @POD2MAN@ RANLIB = @RANLIB@ SET_MAKE = @SET_MAKE@ SHELL = @SHELL@ STRIP = @STRIP@ SWISH_WEB = @SWISH_WEB@ VERSION = @VERSION@ XML2_CONFIG = @XML2_CONFIG@ Z_CFLAGS = @Z_CFLAGS@ Z_LIBS = @Z_LIBS@ ac_ct_AR = @ac_ct_AR@ ac_ct_AS = @ac_ct_AS@ ac_ct_CC = @ac_ct_CC@ ac_ct_CXX = @ac_ct_CXX@ ac_ct_DLLTOOL = @ac_ct_DLLTOOL@ ac_ct_F77 = @ac_ct_F77@ ac_ct_OBJDUMP = @ac_ct_OBJDUMP@ ac_ct_RANLIB = @ac_ct_RANLIB@ ac_ct_STRIP = @ac_ct_STRIP@ am__fastdepCC_FALSE = @am__fastdepCC_FALSE@ am__fastdepCC_TRUE = @am__fastdepCC_TRUE@ am__fastdepCXX_FALSE = @am__fastdepCXX_FALSE@ am__fastdepCXX_TRUE = @am__fastdepCXX_TRUE@ am__include = @am__include@ am__leading_dot = @am__leading_dot@ am__quote = @am__quote@ am__tar = @am__tar@ am__untar = @am__untar@ bindir = @bindir@ build = @build@ build_alias = @build_alias@ build_cpu = @build_cpu@ build_os = @build_os@ build_vendor = @build_vendor@ datadir = @datadir@ exec_prefix = @exec_prefix@ host = @host@ host_alias = @host_alias@ host_cpu = @host_cpu@ host_os = @host_os@ host_vendor = @host_vendor@ includedir = @includedir@ infodir = @infodir@ install_sh = @install_sh@ libdir = @libdir@ libexecdir = @libexecdir@ localstatedir = @localstatedir@ mandir = @mandir@ mkdir_p = @mkdir_p@ oldincludedir = @oldincludedir@ prefix = @prefix@ program_transform_name = @program_transform_name@ sbindir = @sbindir@ sharedstatedir = @sharedstatedir@ sysconfdir = @sysconfdir@ target_alias = @target_alias@ AM_CPPFLAGS = -I"$(srcdir)" noinst_LTLIBRARIES = libsnowball.la libsnowball_la_SOURCES = \ api.c \ api.h \ utilities.c \ header.h \ stem_en1.c \ stem_en1.h \ 
stem_en2.c \ stem_en2.h \ stem_es.c \ stem_es.h \ stem_fr.c \ stem_fr.h \ stem_it.c \ stem_it.h \ stem_pt.c \ stem_pt.h \ stem_de.c \ stem_de.h \ stem_nl.c \ stem_nl.h \ stem_no.c \ stem_no.h \ stem_se.c \ stem_se.h \ stem_dk.c \ stem_dk.h \ stem_ru.c \ stem_ru.h \ stem_fi.c \ stem_fi.h \ stem_ro.c \ stem_ro.h \ stem_hu.c \ stem_hu.h all: all-am .SUFFIXES: .SUFFIXES: .c .lo .o .obj $(srcdir)/Makefile.in: @MAINTAINER_MODE_TRUE@ $(srcdir)/Makefile.am $(am__configure_deps) @for dep in $?; do \ case '$(am__configure_deps)' in \ *$$dep*) \ cd $(top_builddir) && $(MAKE) $(AM_MAKEFLAGS) am--refresh \ && exit 0; \ exit 1;; \ esac; \ done; \ echo ' cd $(top_srcdir) && $(AUTOMAKE) --foreign src/snowball/Makefile'; \ cd $(top_srcdir) && \ $(AUTOMAKE) --foreign src/snowball/Makefile .PRECIOUS: Makefile Makefile: $(srcdir)/Makefile.in $(top_builddir)/config.status @case '$?' in \ *config.status*) \ cd $(top_builddir) && $(MAKE) $(AM_MAKEFLAGS) am--refresh;; \ *) \ echo ' cd $(top_builddir) && $(SHELL) ./config.status $(subdir)/$@ $(am__depfiles_maybe)'; \ cd $(top_builddir) && $(SHELL) ./config.status $(subdir)/$@ $(am__depfiles_maybe);; \ esac; $(top_builddir)/config.status: $(top_srcdir)/configure $(CONFIG_STATUS_DEPENDENCIES) cd $(top_builddir) && $(MAKE) $(AM_MAKEFLAGS) am--refresh $(top_srcdir)/configure: @MAINTAINER_MODE_TRUE@ $(am__configure_deps) cd $(top_builddir) && $(MAKE) $(AM_MAKEFLAGS) am--refresh $(ACLOCAL_M4): @MAINTAINER_MODE_TRUE@ $(am__aclocal_m4_deps) cd $(top_builddir) && $(MAKE) $(AM_MAKEFLAGS) am--refresh clean-noinstLTLIBRARIES: -test -z "$(noinst_LTLIBRARIES)" || rm -f $(noinst_LTLIBRARIES) @list='$(noinst_LTLIBRARIES)'; for p in $$list; do \ dir="`echo $$p | sed -e 's|/[^/]*$$||'`"; \ test "$$dir" != "$$p" || dir=.; \ echo "rm -f \"$${dir}/so_locations\""; \ rm -f "$${dir}/so_locations"; \ done libsnowball.la: $(libsnowball_la_OBJECTS) $(libsnowball_la_DEPENDENCIES) $(LINK) $(libsnowball_la_LDFLAGS) $(libsnowball_la_OBJECTS) $(libsnowball_la_LIBADD) 
$(LIBS) mostlyclean-compile: -rm -f *.$(OBJEXT) distclean-compile: -rm -f *.tab.c @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/api.Plo@am__quote@ @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/stem_de.Plo@am__quote@ @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/stem_dk.Plo@am__quote@ @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/stem_en1.Plo@am__quote@ @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/stem_en2.Plo@am__quote@ @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/stem_es.Plo@am__quote@ @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/stem_fi.Plo@am__quote@ @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/stem_fr.Plo@am__quote@ @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/stem_hu.Plo@am__quote@ @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/stem_it.Plo@am__quote@ @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/stem_nl.Plo@am__quote@ @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/stem_no.Plo@am__quote@ @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/stem_pt.Plo@am__quote@ @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/stem_ro.Plo@am__quote@ @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/stem_ru.Plo@am__quote@ @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/stem_se.Plo@am__quote@ @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/utilities.Plo@am__quote@ .c.o: @am__fastdepCC_TRUE@ if $(COMPILE) -MT $@ -MD -MP -MF "$(DEPDIR)/$*.Tpo" -c -o $@ $<; \ @am__fastdepCC_TRUE@ then mv -f "$(DEPDIR)/$*.Tpo" "$(DEPDIR)/$*.Po"; else rm -f "$(DEPDIR)/$*.Tpo"; exit 1; fi @AMDEP_TRUE@@am__fastdepCC_FALSE@ source='$<' object='$@' libtool=no @AMDEPBACKSLASH@ @AMDEP_TRUE@@am__fastdepCC_FALSE@ DEPDIR=$(DEPDIR) $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@ @am__fastdepCC_FALSE@ $(COMPILE) -c $< .c.obj: @am__fastdepCC_TRUE@ if $(COMPILE) -MT $@ -MD -MP -MF "$(DEPDIR)/$*.Tpo" -c -o $@ `$(CYGPATH_W) '$<'`; \ @am__fastdepCC_TRUE@ then mv -f "$(DEPDIR)/$*.Tpo" "$(DEPDIR)/$*.Po"; else rm -f "$(DEPDIR)/$*.Tpo"; exit 1; fi @AMDEP_TRUE@@am__fastdepCC_FALSE@ source='$<' 
object='$@' libtool=no @AMDEPBACKSLASH@ @AMDEP_TRUE@@am__fastdepCC_FALSE@ DEPDIR=$(DEPDIR) $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@ @am__fastdepCC_FALSE@ $(COMPILE) -c `$(CYGPATH_W) '$<'` .c.lo: @am__fastdepCC_TRUE@ if $(LTCOMPILE) -MT $@ -MD -MP -MF "$(DEPDIR)/$*.Tpo" -c -o $@ $<; \ @am__fastdepCC_TRUE@ then mv -f "$(DEPDIR)/$*.Tpo" "$(DEPDIR)/$*.Plo"; else rm -f "$(DEPDIR)/$*.Tpo"; exit 1; fi @AMDEP_TRUE@@am__fastdepCC_FALSE@ source='$<' object='$@' libtool=yes @AMDEPBACKSLASH@ @AMDEP_TRUE@@am__fastdepCC_FALSE@ DEPDIR=$(DEPDIR) $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@ @am__fastdepCC_FALSE@ $(LTCOMPILE) -c -o $@ $< mostlyclean-libtool: -rm -f *.lo clean-libtool: -rm -rf .libs _libs distclean-libtool: -rm -f libtool uninstall-info-am: ID: $(HEADERS) $(SOURCES) $(LISP) $(TAGS_FILES) list='$(SOURCES) $(HEADERS) $(LISP) $(TAGS_FILES)'; \ unique=`for i in $$list; do \ if test -f "$$i"; then echo $$i; else echo $(srcdir)/$$i; fi; \ done | \ $(AWK) ' { files[$$0] = 1; } \ END { for (i in files) print i; }'`; \ mkid -fID $$unique tags: TAGS TAGS: $(HEADERS) $(SOURCES) $(TAGS_DEPENDENCIES) \ $(TAGS_FILES) $(LISP) tags=; \ here=`pwd`; \ list='$(SOURCES) $(HEADERS) $(LISP) $(TAGS_FILES)'; \ unique=`for i in $$list; do \ if test -f "$$i"; then echo $$i; else echo $(srcdir)/$$i; fi; \ done | \ $(AWK) ' { files[$$0] = 1; } \ END { for (i in files) print i; }'`; \ if test -z "$(ETAGS_ARGS)$$tags$$unique"; then :; else \ test -n "$$unique" || unique=$$empty_fix; \ $(ETAGS) $(ETAGSFLAGS) $(AM_ETAGSFLAGS) $(ETAGS_ARGS) \ $$tags $$unique; \ fi ctags: CTAGS CTAGS: $(HEADERS) $(SOURCES) $(TAGS_DEPENDENCIES) \ $(TAGS_FILES) $(LISP) tags=; \ here=`pwd`; \ list='$(SOURCES) $(HEADERS) $(LISP) $(TAGS_FILES)'; \ unique=`for i in $$list; do \ if test -f "$$i"; then echo $$i; else echo $(srcdir)/$$i; fi; \ done | \ $(AWK) ' { files[$$0] = 1; } \ END { for (i in files) print i; }'`; \ test -z "$(CTAGS_ARGS)$$tags$$unique" \ || $(CTAGS) $(CTAGSFLAGS) $(AM_CTAGSFLAGS) $(CTAGS_ARGS) \ $$tags 
$$unique GTAGS: here=`$(am__cd) $(top_builddir) && pwd` \ && cd $(top_srcdir) \ && gtags -i $(GTAGS_ARGS) $$here distclean-tags: -rm -f TAGS ID GTAGS GRTAGS GSYMS GPATH tags distdir: $(DISTFILES) @srcdirstrip=`echo "$(srcdir)" | sed 's|.|.|g'`; \ topsrcdirstrip=`echo "$(top_srcdir)" | sed 's|.|.|g'`; \ list='$(DISTFILES)'; for file in $$list; do \ case $$file in \ $(srcdir)/*) file=`echo "$$file" | sed "s|^$$srcdirstrip/||"`;; \ $(top_srcdir)/*) file=`echo "$$file" | sed "s|^$$topsrcdirstrip/|$(top_builddir)/|"`;; \ esac; \ if test -f $$file || test -d $$file; then d=.; else d=$(srcdir); fi; \ dir=`echo "$$file" | sed -e 's,/[^/]*$$,,'`; \ if test "$$dir" != "$$file" && test "$$dir" != "."; then \ dir="/$$dir"; \ $(mkdir_p) "$(distdir)$$dir"; \ else \ dir=''; \ fi; \ if test -d $$d/$$file; then \ if test -d $(srcdir)/$$file && test $$d != $(srcdir); then \ cp -pR $(srcdir)/$$file $(distdir)$$dir || exit 1; \ fi; \ cp -pR $$d/$$file $(distdir)$$dir || exit 1; \ else \ test -f $(distdir)/$$file \ || cp -p $$d/$$file $(distdir)/$$file \ || exit 1; \ fi; \ done check-am: all-am check: check-am all-am: Makefile $(LTLIBRARIES) installdirs: install: install-am install-exec: install-exec-am install-data: install-data-am uninstall: uninstall-am install-am: all-am @$(MAKE) $(AM_MAKEFLAGS) install-exec-am install-data-am installcheck: installcheck-am install-strip: $(MAKE) $(AM_MAKEFLAGS) INSTALL_PROGRAM="$(INSTALL_STRIP_PROGRAM)" \ install_sh_PROGRAM="$(INSTALL_STRIP_PROGRAM)" INSTALL_STRIP_FLAG=-s \ `test -z '$(STRIP)' || \ echo "INSTALL_PROGRAM_ENV=STRIPPROG='$(STRIP)'"` install mostlyclean-generic: clean-generic: distclean-generic: -test -z "$(CONFIG_CLEAN_FILES)" || rm -f $(CONFIG_CLEAN_FILES) maintainer-clean-generic: @echo "This command is intended for maintainers to use" @echo "it deletes files that may require special tools to rebuild." 
clean: clean-am clean-am: clean-generic clean-libtool clean-noinstLTLIBRARIES \ mostlyclean-am distclean: distclean-am -rm -rf ./$(DEPDIR) -rm -f Makefile distclean-am: clean-am distclean-compile distclean-generic \ distclean-libtool distclean-tags dvi: dvi-am dvi-am: html: html-am info: info-am info-am: install-data-am: install-exec-am: install-info: install-info-am install-man: installcheck-am: maintainer-clean: maintainer-clean-am -rm -rf ./$(DEPDIR) -rm -f Makefile maintainer-clean-am: distclean-am maintainer-clean-generic mostlyclean: mostlyclean-am mostlyclean-am: mostlyclean-compile mostlyclean-generic \ mostlyclean-libtool pdf: pdf-am pdf-am: ps: ps-am ps-am: uninstall-am: uninstall-info-am .PHONY: CTAGS GTAGS all all-am check check-am clean clean-generic \ clean-libtool clean-noinstLTLIBRARIES ctags distclean \ distclean-compile distclean-generic distclean-libtool \ distclean-tags distdir dvi dvi-am html html-am info info-am \ install install-am install-data install-data-am install-exec \ install-exec-am install-info install-info-am install-man \ install-strip installcheck installcheck-am installdirs \ maintainer-clean maintainer-clean-generic mostlyclean \ mostlyclean-compile mostlyclean-generic mostlyclean-libtool \ pdf pdf-am ps ps-am tags uninstall uninstall-am \ uninstall-info-am # Tell versions [3.59,3.63) of GNU make to not export all variables. # Otherwise a system limit (for SysV at least) may be exceeded. 
.NOEXPORT: swish-e-2.4.7/src/snowball/stem_fi.c0000664000077100017500000006235111166010110014224 00000000000000 /* This file was generated automatically by the Snowball to ANSI C compiler */ #include "header.h" #ifdef __cplusplus extern "C" { #endif extern int finnish_ISO_8859_1_stem(struct SN_env * z); #ifdef __cplusplus } #endif static int r_tidy(struct SN_env * z); static int r_other_endings(struct SN_env * z); static int r_t_plural(struct SN_env * z); static int r_i_plural(struct SN_env * z); static int r_case_ending(struct SN_env * z); static int r_VI(struct SN_env * z); static int r_LONG(struct SN_env * z); static int r_possessive(struct SN_env * z); static int r_particle_etc(struct SN_env * z); static int r_R2(struct SN_env * z); static int r_mark_regions(struct SN_env * z); #ifdef __cplusplus extern "C" { #endif extern struct SN_env * finnish_ISO_8859_1_create_env(void); extern void finnish_ISO_8859_1_close_env(struct SN_env * z); #ifdef __cplusplus } #endif static const symbol s_0_0[2] = { 'p', 'a' }; static const symbol s_0_1[3] = { 's', 't', 'i' }; static const symbol s_0_2[4] = { 'k', 'a', 'a', 'n' }; static const symbol s_0_3[3] = { 'h', 'a', 'n' }; static const symbol s_0_4[3] = { 'k', 'i', 'n' }; static const symbol s_0_5[3] = { 'h', 0xE4, 'n' }; static const symbol s_0_6[4] = { 'k', 0xE4, 0xE4, 'n' }; static const symbol s_0_7[2] = { 'k', 'o' }; static const symbol s_0_8[2] = { 'p', 0xE4 }; static const symbol s_0_9[2] = { 'k', 0xF6 }; static const struct among a_0[10] = { /* 0 */ { 2, s_0_0, -1, 1, 0}, /* 1 */ { 3, s_0_1, -1, 2, 0}, /* 2 */ { 4, s_0_2, -1, 1, 0}, /* 3 */ { 3, s_0_3, -1, 1, 0}, /* 4 */ { 3, s_0_4, -1, 1, 0}, /* 5 */ { 3, s_0_5, -1, 1, 0}, /* 6 */ { 4, s_0_6, -1, 1, 0}, /* 7 */ { 2, s_0_7, -1, 1, 0}, /* 8 */ { 2, s_0_8, -1, 1, 0}, /* 9 */ { 2, s_0_9, -1, 1, 0} }; static const symbol s_1_0[3] = { 'l', 'l', 'a' }; static const symbol s_1_1[2] = { 'n', 'a' }; static const symbol s_1_2[3] = { 's', 's', 'a' }; static const symbol s_1_3[2] 
= { 't', 'a' }; static const symbol s_1_4[3] = { 'l', 't', 'a' }; static const symbol s_1_5[3] = { 's', 't', 'a' }; static const struct among a_1[6] = { /* 0 */ { 3, s_1_0, -1, -1, 0}, /* 1 */ { 2, s_1_1, -1, -1, 0}, /* 2 */ { 3, s_1_2, -1, -1, 0}, /* 3 */ { 2, s_1_3, -1, -1, 0}, /* 4 */ { 3, s_1_4, 3, -1, 0}, /* 5 */ { 3, s_1_5, 3, -1, 0} }; static const symbol s_2_0[3] = { 'l', 'l', 0xE4 }; static const symbol s_2_1[2] = { 'n', 0xE4 }; static const symbol s_2_2[3] = { 's', 's', 0xE4 }; static const symbol s_2_3[2] = { 't', 0xE4 }; static const symbol s_2_4[3] = { 'l', 't', 0xE4 }; static const symbol s_2_5[3] = { 's', 't', 0xE4 }; static const struct among a_2[6] = { /* 0 */ { 3, s_2_0, -1, -1, 0}, /* 1 */ { 2, s_2_1, -1, -1, 0}, /* 2 */ { 3, s_2_2, -1, -1, 0}, /* 3 */ { 2, s_2_3, -1, -1, 0}, /* 4 */ { 3, s_2_4, 3, -1, 0}, /* 5 */ { 3, s_2_5, 3, -1, 0} }; static const symbol s_3_0[3] = { 'l', 'l', 'e' }; static const symbol s_3_1[3] = { 'i', 'n', 'e' }; static const struct among a_3[2] = { /* 0 */ { 3, s_3_0, -1, -1, 0}, /* 1 */ { 3, s_3_1, -1, -1, 0} }; static const symbol s_4_0[3] = { 'n', 's', 'a' }; static const symbol s_4_1[3] = { 'm', 'm', 'e' }; static const symbol s_4_2[3] = { 'n', 'n', 'e' }; static const symbol s_4_3[2] = { 'n', 'i' }; static const symbol s_4_4[2] = { 's', 'i' }; static const symbol s_4_5[2] = { 'a', 'n' }; static const symbol s_4_6[2] = { 'e', 'n' }; static const symbol s_4_7[2] = { 0xE4, 'n' }; static const symbol s_4_8[3] = { 'n', 's', 0xE4 }; static const struct among a_4[9] = { /* 0 */ { 3, s_4_0, -1, 3, 0}, /* 1 */ { 3, s_4_1, -1, 3, 0}, /* 2 */ { 3, s_4_2, -1, 3, 0}, /* 3 */ { 2, s_4_3, -1, 2, 0}, /* 4 */ { 2, s_4_4, -1, 1, 0}, /* 5 */ { 2, s_4_5, -1, 4, 0}, /* 6 */ { 2, s_4_6, -1, 6, 0}, /* 7 */ { 2, s_4_7, -1, 5, 0}, /* 8 */ { 3, s_4_8, -1, 3, 0} }; static const symbol s_5_0[2] = { 'a', 'a' }; static const symbol s_5_1[2] = { 'e', 'e' }; static const symbol s_5_2[2] = { 'i', 'i' }; static const symbol s_5_3[2] = { 'o', 'o' }; 
static const symbol s_5_4[2] = { 'u', 'u' }; static const symbol s_5_5[2] = { 0xE4, 0xE4 }; static const symbol s_5_6[2] = { 0xF6, 0xF6 }; static const struct among a_5[7] = { /* 0 */ { 2, s_5_0, -1, -1, 0}, /* 1 */ { 2, s_5_1, -1, -1, 0}, /* 2 */ { 2, s_5_2, -1, -1, 0}, /* 3 */ { 2, s_5_3, -1, -1, 0}, /* 4 */ { 2, s_5_4, -1, -1, 0}, /* 5 */ { 2, s_5_5, -1, -1, 0}, /* 6 */ { 2, s_5_6, -1, -1, 0} }; static const symbol s_6_0[1] = { 'a' }; static const symbol s_6_1[3] = { 'l', 'l', 'a' }; static const symbol s_6_2[2] = { 'n', 'a' }; static const symbol s_6_3[3] = { 's', 's', 'a' }; static const symbol s_6_4[2] = { 't', 'a' }; static const symbol s_6_5[3] = { 'l', 't', 'a' }; static const symbol s_6_6[3] = { 's', 't', 'a' }; static const symbol s_6_7[3] = { 't', 't', 'a' }; static const symbol s_6_8[3] = { 'l', 'l', 'e' }; static const symbol s_6_9[3] = { 'i', 'n', 'e' }; static const symbol s_6_10[3] = { 'k', 's', 'i' }; static const symbol s_6_11[1] = { 'n' }; static const symbol s_6_12[3] = { 'h', 'a', 'n' }; static const symbol s_6_13[3] = { 'd', 'e', 'n' }; static const symbol s_6_14[4] = { 's', 'e', 'e', 'n' }; static const symbol s_6_15[3] = { 'h', 'e', 'n' }; static const symbol s_6_16[4] = { 't', 't', 'e', 'n' }; static const symbol s_6_17[3] = { 'h', 'i', 'n' }; static const symbol s_6_18[4] = { 's', 'i', 'i', 'n' }; static const symbol s_6_19[3] = { 'h', 'o', 'n' }; static const symbol s_6_20[3] = { 'h', 0xE4, 'n' }; static const symbol s_6_21[3] = { 'h', 0xF6, 'n' }; static const symbol s_6_22[1] = { 0xE4 }; static const symbol s_6_23[3] = { 'l', 'l', 0xE4 }; static const symbol s_6_24[2] = { 'n', 0xE4 }; static const symbol s_6_25[3] = { 's', 's', 0xE4 }; static const symbol s_6_26[2] = { 't', 0xE4 }; static const symbol s_6_27[3] = { 'l', 't', 0xE4 }; static const symbol s_6_28[3] = { 's', 't', 0xE4 }; static const symbol s_6_29[3] = { 't', 't', 0xE4 }; static const struct among a_6[30] = { /* 0 */ { 1, s_6_0, -1, 8, 0}, /* 1 */ { 3, s_6_1, 0, -1, 0}, /* 
2 */ { 2, s_6_2, 0, -1, 0}, /* 3 */ { 3, s_6_3, 0, -1, 0}, /* 4 */ { 2, s_6_4, 0, -1, 0}, /* 5 */ { 3, s_6_5, 4, -1, 0}, /* 6 */ { 3, s_6_6, 4, -1, 0}, /* 7 */ { 3, s_6_7, 4, 9, 0}, /* 8 */ { 3, s_6_8, -1, -1, 0}, /* 9 */ { 3, s_6_9, -1, -1, 0}, /* 10 */ { 3, s_6_10, -1, -1, 0}, /* 11 */ { 1, s_6_11, -1, 7, 0}, /* 12 */ { 3, s_6_12, 11, 1, 0}, /* 13 */ { 3, s_6_13, 11, -1, r_VI}, /* 14 */ { 4, s_6_14, 11, -1, r_LONG}, /* 15 */ { 3, s_6_15, 11, 2, 0}, /* 16 */ { 4, s_6_16, 11, -1, r_VI}, /* 17 */ { 3, s_6_17, 11, 3, 0}, /* 18 */ { 4, s_6_18, 11, -1, r_VI}, /* 19 */ { 3, s_6_19, 11, 4, 0}, /* 20 */ { 3, s_6_20, 11, 5, 0}, /* 21 */ { 3, s_6_21, 11, 6, 0}, /* 22 */ { 1, s_6_22, -1, 8, 0}, /* 23 */ { 3, s_6_23, 22, -1, 0}, /* 24 */ { 2, s_6_24, 22, -1, 0}, /* 25 */ { 3, s_6_25, 22, -1, 0}, /* 26 */ { 2, s_6_26, 22, -1, 0}, /* 27 */ { 3, s_6_27, 26, -1, 0}, /* 28 */ { 3, s_6_28, 26, -1, 0}, /* 29 */ { 3, s_6_29, 26, 9, 0} }; static const symbol s_7_0[3] = { 'e', 'j', 'a' }; static const symbol s_7_1[3] = { 'm', 'm', 'a' }; static const symbol s_7_2[4] = { 'i', 'm', 'm', 'a' }; static const symbol s_7_3[3] = { 'm', 'p', 'a' }; static const symbol s_7_4[4] = { 'i', 'm', 'p', 'a' }; static const symbol s_7_5[3] = { 'm', 'm', 'i' }; static const symbol s_7_6[4] = { 'i', 'm', 'm', 'i' }; static const symbol s_7_7[3] = { 'm', 'p', 'i' }; static const symbol s_7_8[4] = { 'i', 'm', 'p', 'i' }; static const symbol s_7_9[3] = { 'e', 'j', 0xE4 }; static const symbol s_7_10[3] = { 'm', 'm', 0xE4 }; static const symbol s_7_11[4] = { 'i', 'm', 'm', 0xE4 }; static const symbol s_7_12[3] = { 'm', 'p', 0xE4 }; static const symbol s_7_13[4] = { 'i', 'm', 'p', 0xE4 }; static const struct among a_7[14] = { /* 0 */ { 3, s_7_0, -1, -1, 0}, /* 1 */ { 3, s_7_1, -1, 1, 0}, /* 2 */ { 4, s_7_2, 1, -1, 0}, /* 3 */ { 3, s_7_3, -1, 1, 0}, /* 4 */ { 4, s_7_4, 3, -1, 0}, /* 5 */ { 3, s_7_5, -1, 1, 0}, /* 6 */ { 4, s_7_6, 5, -1, 0}, /* 7 */ { 3, s_7_7, -1, 1, 0}, /* 8 */ { 4, s_7_8, 7, -1, 0}, /* 9 */ { 
3, s_7_9, -1, -1, 0}, /* 10 */ { 3, s_7_10, -1, 1, 0}, /* 11 */ { 4, s_7_11, 10, -1, 0}, /* 12 */ { 3, s_7_12, -1, 1, 0}, /* 13 */ { 4, s_7_13, 12, -1, 0} }; static const symbol s_8_0[1] = { 'i' }; static const symbol s_8_1[1] = { 'j' }; static const struct among a_8[2] = { /* 0 */ { 1, s_8_0, -1, -1, 0}, /* 1 */ { 1, s_8_1, -1, -1, 0} }; static const symbol s_9_0[3] = { 'm', 'm', 'a' }; static const symbol s_9_1[4] = { 'i', 'm', 'm', 'a' }; static const struct among a_9[2] = { /* 0 */ { 3, s_9_0, -1, 1, 0}, /* 1 */ { 4, s_9_1, 0, -1, 0} }; static const unsigned char g_AEI[] = { 17, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 8 }; static const unsigned char g_V1[] = { 17, 65, 16, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 8, 0, 32 }; static const unsigned char g_V2[] = { 17, 65, 16, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 8, 0, 32 }; static const unsigned char g_particle_end[] = { 17, 97, 24, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 8, 0, 32 }; static const symbol s_0[] = { 'k' }; static const symbol s_1[] = { 'k', 's', 'e' }; static const symbol s_2[] = { 'k', 's', 'i' }; static const symbol s_3[] = { 'i' }; static const symbol s_4[] = { 'a' }; static const symbol s_5[] = { 'e' }; static const symbol s_6[] = { 'i' }; static const symbol s_7[] = { 'o' }; static const symbol s_8[] = { 0xE4 }; static const symbol s_9[] = { 0xF6 }; static const symbol s_10[] = { 'i', 'e' }; static const symbol s_11[] = { 'e' }; static const symbol s_12[] = { 'p', 'o' }; static const symbol s_13[] = { 't' }; static const symbol s_14[] = { 'p', 'o' }; static const symbol s_15[] = { 'j' }; static const symbol s_16[] = { 'o' }; static const symbol s_17[] = { 'u' }; static const symbol s_18[] = { 'o' }; static const symbol s_19[] = { 'j' }; static int r_mark_regions(struct SN_env * z) { z->I[0] = z->l; z->I[1] = z->l; if (out_grouping(z, g_V1, 97, 246, 1) < 0) return 0; /* goto */ /* grouping V1, line 46 */ { /* gopast */ /* non V1, line 46 */ int ret = in_grouping(z, g_V1, 97, 246, 1); if (ret 
< 0) return 0; z->c += ret; } z->I[0] = z->c; /* setmark p1, line 46 */ if (out_grouping(z, g_V1, 97, 246, 1) < 0) return 0; /* goto */ /* grouping V1, line 47 */ { /* gopast */ /* non V1, line 47 */ int ret = in_grouping(z, g_V1, 97, 246, 1); if (ret < 0) return 0; z->c += ret; } z->I[1] = z->c; /* setmark p2, line 47 */ return 1; } static int r_R2(struct SN_env * z) { if (!(z->I[1] <= z->c)) return 0; return 1; } static int r_particle_etc(struct SN_env * z) { int among_var; { int mlimit; /* setlimit, line 55 */ int m1 = z->l - z->c; (void)m1; if (z->c < z->I[0]) return 0; z->c = z->I[0]; /* tomark, line 55 */ mlimit = z->lb; z->lb = z->c; z->c = z->l - m1; z->ket = z->c; /* [, line 55 */ among_var = find_among_b(z, a_0, 10); /* substring, line 55 */ if (!(among_var)) { z->lb = mlimit; return 0; } z->bra = z->c; /* ], line 55 */ z->lb = mlimit; } switch(among_var) { case 0: return 0; case 1: if (in_grouping_b(z, g_particle_end, 97, 246, 0)) return 0; break; case 2: { int ret = r_R2(z); if (ret == 0) return 0; /* call R2, line 64 */ if (ret < 0) return ret; } break; } { int ret = slice_del(z); /* delete, line 66 */ if (ret < 0) return ret; } return 1; } static int r_possessive(struct SN_env * z) { int among_var; { int mlimit; /* setlimit, line 69 */ int m1 = z->l - z->c; (void)m1; if (z->c < z->I[0]) return 0; z->c = z->I[0]; /* tomark, line 69 */ mlimit = z->lb; z->lb = z->c; z->c = z->l - m1; z->ket = z->c; /* [, line 69 */ among_var = find_among_b(z, a_4, 9); /* substring, line 69 */ if (!(among_var)) { z->lb = mlimit; return 0; } z->bra = z->c; /* ], line 69 */ z->lb = mlimit; } switch(among_var) { case 0: return 0; case 1: { int m2 = z->l - z->c; (void)m2; /* not, line 72 */ if (!(eq_s_b(z, 1, s_0))) goto lab0; return 0; lab0: z->c = z->l - m2; } { int ret = slice_del(z); /* delete, line 72 */ if (ret < 0) return ret; } break; case 2: { int ret = slice_del(z); /* delete, line 74 */ if (ret < 0) return ret; } z->ket = z->c; /* [, line 74 */ if (!(eq_s_b(z, 3, 
s_1))) return 0; z->bra = z->c; /* ], line 74 */ { int ret = slice_from_s(z, 3, s_2); /* <-, line 74 */ if (ret < 0) return ret; } break; case 3: { int ret = slice_del(z); /* delete, line 78 */ if (ret < 0) return ret; } break; case 4: if (z->c - 1 <= z->lb || z->p[z->c - 1] != 97) return 0; if (!(find_among_b(z, a_1, 6))) return 0; /* among, line 81 */ { int ret = slice_del(z); /* delete, line 81 */ if (ret < 0) return ret; } break; case 5: if (z->c - 1 <= z->lb || z->p[z->c - 1] != 228) return 0; if (!(find_among_b(z, a_2, 6))) return 0; /* among, line 83 */ { int ret = slice_del(z); /* delete, line 84 */ if (ret < 0) return ret; } break; case 6: if (z->c - 2 <= z->lb || z->p[z->c - 1] != 101) return 0; if (!(find_among_b(z, a_3, 2))) return 0; /* among, line 86 */ { int ret = slice_del(z); /* delete, line 86 */ if (ret < 0) return ret; } break; } return 1; } static int r_LONG(struct SN_env * z) { if (!(find_among_b(z, a_5, 7))) return 0; /* among, line 91 */ return 1; } static int r_VI(struct SN_env * z) { if (!(eq_s_b(z, 1, s_3))) return 0; if (in_grouping_b(z, g_V2, 97, 246, 0)) return 0; return 1; } static int r_case_ending(struct SN_env * z) { int among_var; { int mlimit; /* setlimit, line 96 */ int m1 = z->l - z->c; (void)m1; if (z->c < z->I[0]) return 0; z->c = z->I[0]; /* tomark, line 96 */ mlimit = z->lb; z->lb = z->c; z->c = z->l - m1; z->ket = z->c; /* [, line 96 */ among_var = find_among_b(z, a_6, 30); /* substring, line 96 */ if (!(among_var)) { z->lb = mlimit; return 0; } z->bra = z->c; /* ], line 96 */ z->lb = mlimit; } switch(among_var) { case 0: return 0; case 1: if (!(eq_s_b(z, 1, s_4))) return 0; break; case 2: if (!(eq_s_b(z, 1, s_5))) return 0; break; case 3: if (!(eq_s_b(z, 1, s_6))) return 0; break; case 4: if (!(eq_s_b(z, 1, s_7))) return 0; break; case 5: if (!(eq_s_b(z, 1, s_8))) return 0; break; case 6: if (!(eq_s_b(z, 1, s_9))) return 0; break; case 7: { int m_keep = z->l - z->c;/* (void) m_keep;*/ /* try, line 111 */ { int m2 = z->l - 
z->c; (void)m2; /* and, line 113 */ { int m3 = z->l - z->c; (void)m3; /* or, line 112 */ { int ret = r_LONG(z); if (ret == 0) goto lab2; /* call LONG, line 111 */ if (ret < 0) return ret; } goto lab1; lab2: z->c = z->l - m3; if (!(eq_s_b(z, 2, s_10))) { z->c = z->l - m_keep; goto lab0; } } lab1: z->c = z->l - m2; if (z->c <= z->lb) { z->c = z->l - m_keep; goto lab0; } z->c--; /* next, line 113 */ } z->bra = z->c; /* ], line 113 */ lab0: ; } break; case 8: if (in_grouping_b(z, g_V1, 97, 246, 0)) return 0; if (out_grouping_b(z, g_V1, 97, 246, 0)) return 0; break; case 9: if (!(eq_s_b(z, 1, s_11))) return 0; break; } { int ret = slice_del(z); /* delete, line 138 */ if (ret < 0) return ret; } z->B[0] = 1; /* set ending_removed, line 139 */ return 1; } static int r_other_endings(struct SN_env * z) { int among_var; { int mlimit; /* setlimit, line 142 */ int m1 = z->l - z->c; (void)m1; if (z->c < z->I[1]) return 0; z->c = z->I[1]; /* tomark, line 142 */ mlimit = z->lb; z->lb = z->c; z->c = z->l - m1; z->ket = z->c; /* [, line 142 */ among_var = find_among_b(z, a_7, 14); /* substring, line 142 */ if (!(among_var)) { z->lb = mlimit; return 0; } z->bra = z->c; /* ], line 142 */ z->lb = mlimit; } switch(among_var) { case 0: return 0; case 1: { int m2 = z->l - z->c; (void)m2; /* not, line 146 */ if (!(eq_s_b(z, 2, s_12))) goto lab0; return 0; lab0: z->c = z->l - m2; } break; } { int ret = slice_del(z); /* delete, line 151 */ if (ret < 0) return ret; } return 1; } static int r_i_plural(struct SN_env * z) { { int mlimit; /* setlimit, line 154 */ int m1 = z->l - z->c; (void)m1; if (z->c < z->I[0]) return 0; z->c = z->I[0]; /* tomark, line 154 */ mlimit = z->lb; z->lb = z->c; z->c = z->l - m1; z->ket = z->c; /* [, line 154 */ if (z->c <= z->lb || (z->p[z->c - 1] != 105 && z->p[z->c - 1] != 106)) { z->lb = mlimit; return 0; } if (!(find_among_b(z, a_8, 2))) { z->lb = mlimit; return 0; } /* substring, line 154 */ z->bra = z->c; /* ], line 154 */ z->lb = mlimit; } { int ret = 
slice_del(z); /* delete, line 158 */ if (ret < 0) return ret; } return 1; } static int r_t_plural(struct SN_env * z) { int among_var; { int mlimit; /* setlimit, line 161 */ int m1 = z->l - z->c; (void)m1; if (z->c < z->I[0]) return 0; z->c = z->I[0]; /* tomark, line 161 */ mlimit = z->lb; z->lb = z->c; z->c = z->l - m1; z->ket = z->c; /* [, line 162 */ if (!(eq_s_b(z, 1, s_13))) { z->lb = mlimit; return 0; } z->bra = z->c; /* ], line 162 */ { int m_test = z->l - z->c; /* test, line 162 */ if (in_grouping_b(z, g_V1, 97, 246, 0)) { z->lb = mlimit; return 0; } z->c = z->l - m_test; } { int ret = slice_del(z); /* delete, line 163 */ if (ret < 0) return ret; } z->lb = mlimit; } { int mlimit; /* setlimit, line 165 */ int m2 = z->l - z->c; (void)m2; if (z->c < z->I[1]) return 0; z->c = z->I[1]; /* tomark, line 165 */ mlimit = z->lb; z->lb = z->c; z->c = z->l - m2; z->ket = z->c; /* [, line 165 */ if (z->c - 2 <= z->lb || z->p[z->c - 1] != 97) { z->lb = mlimit; return 0; } among_var = find_among_b(z, a_9, 2); /* substring, line 165 */ if (!(among_var)) { z->lb = mlimit; return 0; } z->bra = z->c; /* ], line 165 */ z->lb = mlimit; } switch(among_var) { case 0: return 0; case 1: { int m3 = z->l - z->c; (void)m3; /* not, line 167 */ if (!(eq_s_b(z, 2, s_14))) goto lab0; return 0; lab0: z->c = z->l - m3; } break; } { int ret = slice_del(z); /* delete, line 170 */ if (ret < 0) return ret; } return 1; } static int r_tidy(struct SN_env * z) { { int mlimit; /* setlimit, line 173 */ int m1 = z->l - z->c; (void)m1; if (z->c < z->I[0]) return 0; z->c = z->I[0]; /* tomark, line 173 */ mlimit = z->lb; z->lb = z->c; z->c = z->l - m1; { int m2 = z->l - z->c; (void)m2; /* do, line 174 */ { int m3 = z->l - z->c; (void)m3; /* and, line 174 */ { int ret = r_LONG(z); if (ret == 0) goto lab0; /* call LONG, line 174 */ if (ret < 0) return ret; } z->c = z->l - m3; z->ket = z->c; /* [, line 174 */ if (z->c <= z->lb) goto lab0; z->c--; /* next, line 174 */ z->bra = z->c; /* ], line 174 */ { int 
ret = slice_del(z); /* delete, line 174 */ if (ret < 0) return ret; } } lab0: z->c = z->l - m2; } { int m4 = z->l - z->c; (void)m4; /* do, line 175 */ z->ket = z->c; /* [, line 175 */ if (in_grouping_b(z, g_AEI, 97, 228, 0)) goto lab1; z->bra = z->c; /* ], line 175 */ if (out_grouping_b(z, g_V1, 97, 246, 0)) goto lab1; { int ret = slice_del(z); /* delete, line 175 */ if (ret < 0) return ret; } lab1: z->c = z->l - m4; } { int m5 = z->l - z->c; (void)m5; /* do, line 176 */ z->ket = z->c; /* [, line 176 */ if (!(eq_s_b(z, 1, s_15))) goto lab2; z->bra = z->c; /* ], line 176 */ { int m6 = z->l - z->c; (void)m6; /* or, line 176 */ if (!(eq_s_b(z, 1, s_16))) goto lab4; goto lab3; lab4: z->c = z->l - m6; if (!(eq_s_b(z, 1, s_17))) goto lab2; } lab3: { int ret = slice_del(z); /* delete, line 176 */ if (ret < 0) return ret; } lab2: z->c = z->l - m5; } { int m7 = z->l - z->c; (void)m7; /* do, line 177 */ z->ket = z->c; /* [, line 177 */ if (!(eq_s_b(z, 1, s_18))) goto lab5; z->bra = z->c; /* ], line 177 */ if (!(eq_s_b(z, 1, s_19))) goto lab5; { int ret = slice_del(z); /* delete, line 177 */ if (ret < 0) return ret; } lab5: z->c = z->l - m7; } z->lb = mlimit; } if (in_grouping_b(z, g_V1, 97, 246, 1) < 0) return 0; /* goto */ /* non V1, line 179 */ z->ket = z->c; /* [, line 179 */ if (z->c <= z->lb) return 0; z->c--; /* next, line 179 */ z->bra = z->c; /* ], line 179 */ z->S[0] = slice_to(z, z->S[0]); /* -> x, line 179 */ if (z->S[0] == 0) return -1; /* -> x, line 179 */ if (!(eq_v_b(z, z->S[0]))) return 0; /* name x, line 179 */ { int ret = slice_del(z); /* delete, line 179 */ if (ret < 0) return ret; } return 1; } extern int finnish_ISO_8859_1_stem(struct SN_env * z) { { int c1 = z->c; /* do, line 185 */ { int ret = r_mark_regions(z); if (ret == 0) goto lab0; /* call mark_regions, line 185 */ if (ret < 0) return ret; } lab0: z->c = c1; } z->B[0] = 0; /* unset ending_removed, line 186 */ z->lb = z->c; z->c = z->l; /* backwards, line 187 */ { int m2 = z->l - z->c; (void)m2; /* 
do, line 188 */ { int ret = r_particle_etc(z); if (ret == 0) goto lab1; /* call particle_etc, line 188 */ if (ret < 0) return ret; } lab1: z->c = z->l - m2; } { int m3 = z->l - z->c; (void)m3; /* do, line 189 */ { int ret = r_possessive(z); if (ret == 0) goto lab2; /* call possessive, line 189 */ if (ret < 0) return ret; } lab2: z->c = z->l - m3; } { int m4 = z->l - z->c; (void)m4; /* do, line 190 */ { int ret = r_case_ending(z); if (ret == 0) goto lab3; /* call case_ending, line 190 */ if (ret < 0) return ret; } lab3: z->c = z->l - m4; } { int m5 = z->l - z->c; (void)m5; /* do, line 191 */ { int ret = r_other_endings(z); if (ret == 0) goto lab4; /* call other_endings, line 191 */ if (ret < 0) return ret; } lab4: z->c = z->l - m5; } { int m6 = z->l - z->c; (void)m6; /* or, line 192 */ if (!(z->B[0])) goto lab6; /* Boolean test ending_removed, line 192 */ { int m7 = z->l - z->c; (void)m7; /* do, line 192 */ { int ret = r_i_plural(z); if (ret == 0) goto lab7; /* call i_plural, line 192 */ if (ret < 0) return ret; } lab7: z->c = z->l - m7; } goto lab5; lab6: z->c = z->l - m6; { int m8 = z->l - z->c; (void)m8; /* do, line 192 */ { int ret = r_t_plural(z); if (ret == 0) goto lab8; /* call t_plural, line 192 */ if (ret < 0) return ret; } lab8: z->c = z->l - m8; } } lab5: { int m9 = z->l - z->c; (void)m9; /* do, line 193 */ { int ret = r_tidy(z); if (ret == 0) goto lab9; /* call tidy, line 193 */ if (ret < 0) return ret; } lab9: z->c = z->l - m9; } z->c = z->lb; return 1; } extern struct SN_env * finnish_ISO_8859_1_create_env(void) { return SN_create_env(1, 2, 1); } extern void finnish_ISO_8859_1_close_env(struct SN_env * z) { SN_close_env(z, 1); } swish-e-2.4.7/src/snowball/stem_nl.h0000664000077100017500000000050211166010110014232 00000000000000 /* This file was generated automatically by the Snowball to ANSI C compiler */ #ifdef __cplusplus extern "C" { #endif extern struct SN_env * dutch_ISO_8859_1_create_env(void); extern void dutch_ISO_8859_1_close_env(struct SN_env 
* z); extern int dutch_ISO_8859_1_stem(struct SN_env * z); #ifdef __cplusplus } #endif swish-e-2.4.7/src/snowball/stem_en1.c0000664000077100017500000006054311166010107014320 00000000000000 /* This file was generated automatically by the Snowball to ANSI C compiler */ #include "header.h" #ifdef __cplusplus extern "C" { #endif extern int porter_ISO_8859_1_stem(struct SN_env * z); #ifdef __cplusplus } #endif static int r_Step_5b(struct SN_env * z); static int r_Step_5a(struct SN_env * z); static int r_Step_4(struct SN_env * z); static int r_Step_3(struct SN_env * z); static int r_Step_2(struct SN_env * z); static int r_Step_1c(struct SN_env * z); static int r_Step_1b(struct SN_env * z); static int r_Step_1a(struct SN_env * z); static int r_R2(struct SN_env * z); static int r_R1(struct SN_env * z); static int r_shortv(struct SN_env * z); #ifdef __cplusplus extern "C" { #endif extern struct SN_env * porter_ISO_8859_1_create_env(void); extern void porter_ISO_8859_1_close_env(struct SN_env * z); #ifdef __cplusplus } #endif static const symbol s_0_0[1] = { 's' }; static const symbol s_0_1[3] = { 'i', 'e', 's' }; static const symbol s_0_2[4] = { 's', 's', 'e', 's' }; static const symbol s_0_3[2] = { 's', 's' }; static const struct among a_0[4] = { /* 0 */ { 1, s_0_0, -1, 3, 0}, /* 1 */ { 3, s_0_1, 0, 2, 0}, /* 2 */ { 4, s_0_2, 0, 1, 0}, /* 3 */ { 2, s_0_3, 0, -1, 0} }; static const symbol s_1_1[2] = { 'b', 'b' }; static const symbol s_1_2[2] = { 'd', 'd' }; static const symbol s_1_3[2] = { 'f', 'f' }; static const symbol s_1_4[2] = { 'g', 'g' }; static const symbol s_1_5[2] = { 'b', 'l' }; static const symbol s_1_6[2] = { 'm', 'm' }; static const symbol s_1_7[2] = { 'n', 'n' }; static const symbol s_1_8[2] = { 'p', 'p' }; static const symbol s_1_9[2] = { 'r', 'r' }; static const symbol s_1_10[2] = { 'a', 't' }; static const symbol s_1_11[2] = { 't', 't' }; static const symbol s_1_12[2] = { 'i', 'z' }; static const struct among a_1[13] = { /* 0 */ { 0, 0, -1, 3, 0}, /* 1 */ 
{ 2, s_1_1, 0, 2, 0}, /* 2 */ { 2, s_1_2, 0, 2, 0}, /* 3 */ { 2, s_1_3, 0, 2, 0}, /* 4 */ { 2, s_1_4, 0, 2, 0}, /* 5 */ { 2, s_1_5, 0, 1, 0}, /* 6 */ { 2, s_1_6, 0, 2, 0}, /* 7 */ { 2, s_1_7, 0, 2, 0}, /* 8 */ { 2, s_1_8, 0, 2, 0}, /* 9 */ { 2, s_1_9, 0, 2, 0}, /* 10 */ { 2, s_1_10, 0, 1, 0}, /* 11 */ { 2, s_1_11, 0, 2, 0}, /* 12 */ { 2, s_1_12, 0, 1, 0} }; static const symbol s_2_0[2] = { 'e', 'd' }; static const symbol s_2_1[3] = { 'e', 'e', 'd' }; static const symbol s_2_2[3] = { 'i', 'n', 'g' }; static const struct among a_2[3] = { /* 0 */ { 2, s_2_0, -1, 2, 0}, /* 1 */ { 3, s_2_1, 0, 1, 0}, /* 2 */ { 3, s_2_2, -1, 2, 0} }; static const symbol s_3_0[4] = { 'a', 'n', 'c', 'i' }; static const symbol s_3_1[4] = { 'e', 'n', 'c', 'i' }; static const symbol s_3_2[4] = { 'a', 'b', 'l', 'i' }; static const symbol s_3_3[3] = { 'e', 'l', 'i' }; static const symbol s_3_4[4] = { 'a', 'l', 'l', 'i' }; static const symbol s_3_5[5] = { 'o', 'u', 's', 'l', 'i' }; static const symbol s_3_6[5] = { 'e', 'n', 't', 'l', 'i' }; static const symbol s_3_7[5] = { 'a', 'l', 'i', 't', 'i' }; static const symbol s_3_8[6] = { 'b', 'i', 'l', 'i', 't', 'i' }; static const symbol s_3_9[5] = { 'i', 'v', 'i', 't', 'i' }; static const symbol s_3_10[6] = { 't', 'i', 'o', 'n', 'a', 'l' }; static const symbol s_3_11[7] = { 'a', 't', 'i', 'o', 'n', 'a', 'l' }; static const symbol s_3_12[5] = { 'a', 'l', 'i', 's', 'm' }; static const symbol s_3_13[5] = { 'a', 't', 'i', 'o', 'n' }; static const symbol s_3_14[7] = { 'i', 'z', 'a', 't', 'i', 'o', 'n' }; static const symbol s_3_15[4] = { 'i', 'z', 'e', 'r' }; static const symbol s_3_16[4] = { 'a', 't', 'o', 'r' }; static const symbol s_3_17[7] = { 'i', 'v', 'e', 'n', 'e', 's', 's' }; static const symbol s_3_18[7] = { 'f', 'u', 'l', 'n', 'e', 's', 's' }; static const symbol s_3_19[7] = { 'o', 'u', 's', 'n', 'e', 's', 's' }; static const struct among a_3[20] = { /* 0 */ { 4, s_3_0, -1, 3, 0}, /* 1 */ { 4, s_3_1, -1, 2, 0}, /* 2 */ { 4, s_3_2, -1, 4, 0}, /* 
3 */ { 3, s_3_3, -1, 6, 0}, /* 4 */ { 4, s_3_4, -1, 9, 0}, /* 5 */ { 5, s_3_5, -1, 12, 0}, /* 6 */ { 5, s_3_6, -1, 5, 0}, /* 7 */ { 5, s_3_7, -1, 10, 0}, /* 8 */ { 6, s_3_8, -1, 14, 0}, /* 9 */ { 5, s_3_9, -1, 13, 0}, /* 10 */ { 6, s_3_10, -1, 1, 0}, /* 11 */ { 7, s_3_11, 10, 8, 0}, /* 12 */ { 5, s_3_12, -1, 10, 0}, /* 13 */ { 5, s_3_13, -1, 8, 0}, /* 14 */ { 7, s_3_14, 13, 7, 0}, /* 15 */ { 4, s_3_15, -1, 7, 0}, /* 16 */ { 4, s_3_16, -1, 8, 0}, /* 17 */ { 7, s_3_17, -1, 13, 0}, /* 18 */ { 7, s_3_18, -1, 11, 0}, /* 19 */ { 7, s_3_19, -1, 12, 0} }; static const symbol s_4_0[5] = { 'i', 'c', 'a', 't', 'e' }; static const symbol s_4_1[5] = { 'a', 't', 'i', 'v', 'e' }; static const symbol s_4_2[5] = { 'a', 'l', 'i', 'z', 'e' }; static const symbol s_4_3[5] = { 'i', 'c', 'i', 't', 'i' }; static const symbol s_4_4[4] = { 'i', 'c', 'a', 'l' }; static const symbol s_4_5[3] = { 'f', 'u', 'l' }; static const symbol s_4_6[4] = { 'n', 'e', 's', 's' }; static const struct among a_4[7] = { /* 0 */ { 5, s_4_0, -1, 2, 0}, /* 1 */ { 5, s_4_1, -1, 3, 0}, /* 2 */ { 5, s_4_2, -1, 1, 0}, /* 3 */ { 5, s_4_3, -1, 2, 0}, /* 4 */ { 4, s_4_4, -1, 2, 0}, /* 5 */ { 3, s_4_5, -1, 3, 0}, /* 6 */ { 4, s_4_6, -1, 3, 0} }; static const symbol s_5_0[2] = { 'i', 'c' }; static const symbol s_5_1[4] = { 'a', 'n', 'c', 'e' }; static const symbol s_5_2[4] = { 'e', 'n', 'c', 'e' }; static const symbol s_5_3[4] = { 'a', 'b', 'l', 'e' }; static const symbol s_5_4[4] = { 'i', 'b', 'l', 'e' }; static const symbol s_5_5[3] = { 'a', 't', 'e' }; static const symbol s_5_6[3] = { 'i', 'v', 'e' }; static const symbol s_5_7[3] = { 'i', 'z', 'e' }; static const symbol s_5_8[3] = { 'i', 't', 'i' }; static const symbol s_5_9[2] = { 'a', 'l' }; static const symbol s_5_10[3] = { 'i', 's', 'm' }; static const symbol s_5_11[3] = { 'i', 'o', 'n' }; static const symbol s_5_12[2] = { 'e', 'r' }; static const symbol s_5_13[3] = { 'o', 'u', 's' }; static const symbol s_5_14[3] = { 'a', 'n', 't' }; static const symbol s_5_15[3] 
= { 'e', 'n', 't' }; static const symbol s_5_16[4] = { 'm', 'e', 'n', 't' }; static const symbol s_5_17[5] = { 'e', 'm', 'e', 'n', 't' }; static const symbol s_5_18[2] = { 'o', 'u' }; static const struct among a_5[19] = { /* 0 */ { 2, s_5_0, -1, 1, 0}, /* 1 */ { 4, s_5_1, -1, 1, 0}, /* 2 */ { 4, s_5_2, -1, 1, 0}, /* 3 */ { 4, s_5_3, -1, 1, 0}, /* 4 */ { 4, s_5_4, -1, 1, 0}, /* 5 */ { 3, s_5_5, -1, 1, 0}, /* 6 */ { 3, s_5_6, -1, 1, 0}, /* 7 */ { 3, s_5_7, -1, 1, 0}, /* 8 */ { 3, s_5_8, -1, 1, 0}, /* 9 */ { 2, s_5_9, -1, 1, 0}, /* 10 */ { 3, s_5_10, -1, 1, 0}, /* 11 */ { 3, s_5_11, -1, 2, 0}, /* 12 */ { 2, s_5_12, -1, 1, 0}, /* 13 */ { 3, s_5_13, -1, 1, 0}, /* 14 */ { 3, s_5_14, -1, 1, 0}, /* 15 */ { 3, s_5_15, -1, 1, 0}, /* 16 */ { 4, s_5_16, 15, 1, 0}, /* 17 */ { 5, s_5_17, 16, 1, 0}, /* 18 */ { 2, s_5_18, -1, 1, 0} }; static const unsigned char g_v[] = { 17, 65, 16, 1 }; static const unsigned char g_v_WXY[] = { 1, 17, 65, 208, 1 }; static const symbol s_0[] = { 's', 's' }; static const symbol s_1[] = { 'i' }; static const symbol s_2[] = { 'e', 'e' }; static const symbol s_3[] = { 'e' }; static const symbol s_4[] = { 'e' }; static const symbol s_5[] = { 'y' }; static const symbol s_6[] = { 'Y' }; static const symbol s_7[] = { 'i' }; static const symbol s_8[] = { 't', 'i', 'o', 'n' }; static const symbol s_9[] = { 'e', 'n', 'c', 'e' }; static const symbol s_10[] = { 'a', 'n', 'c', 'e' }; static const symbol s_11[] = { 'a', 'b', 'l', 'e' }; static const symbol s_12[] = { 'e', 'n', 't' }; static const symbol s_13[] = { 'e' }; static const symbol s_14[] = { 'i', 'z', 'e' }; static const symbol s_15[] = { 'a', 't', 'e' }; static const symbol s_16[] = { 'a', 'l' }; static const symbol s_17[] = { 'a', 'l' }; static const symbol s_18[] = { 'f', 'u', 'l' }; static const symbol s_19[] = { 'o', 'u', 's' }; static const symbol s_20[] = { 'i', 'v', 'e' }; static const symbol s_21[] = { 'b', 'l', 'e' }; static const symbol s_22[] = { 'a', 'l' }; static const symbol s_23[] = { 
'i', 'c' }; static const symbol s_24[] = { 's' }; static const symbol s_25[] = { 't' }; static const symbol s_26[] = { 'e' }; static const symbol s_27[] = { 'l' }; static const symbol s_28[] = { 'l' }; static const symbol s_29[] = { 'y' }; static const symbol s_30[] = { 'Y' }; static const symbol s_31[] = { 'y' }; static const symbol s_32[] = { 'Y' }; static const symbol s_33[] = { 'Y' }; static const symbol s_34[] = { 'y' }; static int r_shortv(struct SN_env * z) { if (out_grouping_b(z, g_v_WXY, 89, 121, 0)) return 0; if (in_grouping_b(z, g_v, 97, 121, 0)) return 0; if (out_grouping_b(z, g_v, 97, 121, 0)) return 0; return 1; } static int r_R1(struct SN_env * z) { if (!(z->I[0] <= z->c)) return 0; return 1; } static int r_R2(struct SN_env * z) { if (!(z->I[1] <= z->c)) return 0; return 1; } static int r_Step_1a(struct SN_env * z) { int among_var; z->ket = z->c; /* [, line 25 */ if (z->c <= z->lb || z->p[z->c - 1] != 115) return 0; among_var = find_among_b(z, a_0, 4); /* substring, line 25 */ if (!(among_var)) return 0; z->bra = z->c; /* ], line 25 */ switch(among_var) { case 0: return 0; case 1: { int ret = slice_from_s(z, 2, s_0); /* <-, line 26 */ if (ret < 0) return ret; } break; case 2: { int ret = slice_from_s(z, 1, s_1); /* <-, line 27 */ if (ret < 0) return ret; } break; case 3: { int ret = slice_del(z); /* delete, line 29 */ if (ret < 0) return ret; } break; } return 1; } static int r_Step_1b(struct SN_env * z) { int among_var; z->ket = z->c; /* [, line 34 */ if (z->c - 1 <= z->lb || (z->p[z->c - 1] != 100 && z->p[z->c - 1] != 103)) return 0; among_var = find_among_b(z, a_2, 3); /* substring, line 34 */ if (!(among_var)) return 0; z->bra = z->c; /* ], line 34 */ switch(among_var) { case 0: return 0; case 1: { int ret = r_R1(z); if (ret == 0) return 0; /* call R1, line 35 */ if (ret < 0) return ret; } { int ret = slice_from_s(z, 2, s_2); /* <-, line 35 */ if (ret < 0) return ret; } break; case 2: { int m_test = z->l - z->c; /* test, line 38 */ { /* gopast */ 
/* grouping v, line 38 */ int ret = out_grouping_b(z, g_v, 97, 121, 1); if (ret < 0) return 0; z->c -= ret; } z->c = z->l - m_test; } { int ret = slice_del(z); /* delete, line 38 */ if (ret < 0) return ret; } { int m_test = z->l - z->c; /* test, line 39 */ if (z->c - 1 <= z->lb || z->p[z->c - 1] >> 5 != 3 || !((68514004 >> (z->p[z->c - 1] & 0x1f)) & 1)) among_var = 3; else among_var = find_among_b(z, a_1, 13); /* substring, line 39 */ if (!(among_var)) return 0; z->c = z->l - m_test; } switch(among_var) { case 0: return 0; case 1: { int c_keep = z->c; int ret = insert_s(z, z->c, z->c, 1, s_3); /* <+, line 41 */ z->c = c_keep; if (ret < 0) return ret; } break; case 2: z->ket = z->c; /* [, line 44 */ if (z->c <= z->lb) return 0; z->c--; /* next, line 44 */ z->bra = z->c; /* ], line 44 */ { int ret = slice_del(z); /* delete, line 44 */ if (ret < 0) return ret; } break; case 3: if (z->c != z->I[0]) return 0; /* atmark, line 45 */ { int m_test = z->l - z->c; /* test, line 45 */ { int ret = r_shortv(z); if (ret == 0) return 0; /* call shortv, line 45 */ if (ret < 0) return ret; } z->c = z->l - m_test; } { int c_keep = z->c; int ret = insert_s(z, z->c, z->c, 1, s_4); /* <+, line 45 */ z->c = c_keep; if (ret < 0) return ret; } break; } break; } return 1; } static int r_Step_1c(struct SN_env * z) { z->ket = z->c; /* [, line 52 */ { int m1 = z->l - z->c; (void)m1; /* or, line 52 */ if (!(eq_s_b(z, 1, s_5))) goto lab1; goto lab0; lab1: z->c = z->l - m1; if (!(eq_s_b(z, 1, s_6))) return 0; } lab0: z->bra = z->c; /* ], line 52 */ { /* gopast */ /* grouping v, line 53 */ int ret = out_grouping_b(z, g_v, 97, 121, 1); if (ret < 0) return 0; z->c -= ret; } { int ret = slice_from_s(z, 1, s_7); /* <-, line 54 */ if (ret < 0) return ret; } return 1; } static int r_Step_2(struct SN_env * z) { int among_var; z->ket = z->c; /* [, line 58 */ if (z->c - 2 <= z->lb || z->p[z->c - 1] >> 5 != 3 || !((815616 >> (z->p[z->c - 1] & 0x1f)) & 1)) return 0; among_var = find_among_b(z, a_3, 20); /* 
substring, line 58 */ if (!(among_var)) return 0; z->bra = z->c; /* ], line 58 */ { int ret = r_R1(z); if (ret == 0) return 0; /* call R1, line 58 */ if (ret < 0) return ret; } switch(among_var) { case 0: return 0; case 1: { int ret = slice_from_s(z, 4, s_8); /* <-, line 59 */ if (ret < 0) return ret; } break; case 2: { int ret = slice_from_s(z, 4, s_9); /* <-, line 60 */ if (ret < 0) return ret; } break; case 3: { int ret = slice_from_s(z, 4, s_10); /* <-, line 61 */ if (ret < 0) return ret; } break; case 4: { int ret = slice_from_s(z, 4, s_11); /* <-, line 62 */ if (ret < 0) return ret; } break; case 5: { int ret = slice_from_s(z, 3, s_12); /* <-, line 63 */ if (ret < 0) return ret; } break; case 6: { int ret = slice_from_s(z, 1, s_13); /* <-, line 64 */ if (ret < 0) return ret; } break; case 7: { int ret = slice_from_s(z, 3, s_14); /* <-, line 66 */ if (ret < 0) return ret; } break; case 8: { int ret = slice_from_s(z, 3, s_15); /* <-, line 68 */ if (ret < 0) return ret; } break; case 9: { int ret = slice_from_s(z, 2, s_16); /* <-, line 69 */ if (ret < 0) return ret; } break; case 10: { int ret = slice_from_s(z, 2, s_17); /* <-, line 71 */ if (ret < 0) return ret; } break; case 11: { int ret = slice_from_s(z, 3, s_18); /* <-, line 72 */ if (ret < 0) return ret; } break; case 12: { int ret = slice_from_s(z, 3, s_19); /* <-, line 74 */ if (ret < 0) return ret; } break; case 13: { int ret = slice_from_s(z, 3, s_20); /* <-, line 76 */ if (ret < 0) return ret; } break; case 14: { int ret = slice_from_s(z, 3, s_21); /* <-, line 77 */ if (ret < 0) return ret; } break; } return 1; } static int r_Step_3(struct SN_env * z) { int among_var; z->ket = z->c; /* [, line 82 */ if (z->c - 2 <= z->lb || z->p[z->c - 1] >> 5 != 3 || !((528928 >> (z->p[z->c - 1] & 0x1f)) & 1)) return 0; among_var = find_among_b(z, a_4, 7); /* substring, line 82 */ if (!(among_var)) return 0; z->bra = z->c; /* ], line 82 */ { int ret = r_R1(z); if (ret == 0) return 0; /* call R1, line 82 */ if (ret < 
0) return ret; } switch(among_var) { case 0: return 0; case 1: { int ret = slice_from_s(z, 2, s_22); /* <-, line 83 */ if (ret < 0) return ret; } break; case 2: { int ret = slice_from_s(z, 2, s_23); /* <-, line 85 */ if (ret < 0) return ret; } break; case 3: { int ret = slice_del(z); /* delete, line 87 */ if (ret < 0) return ret; } break; } return 1; } static int r_Step_4(struct SN_env * z) { int among_var; z->ket = z->c; /* [, line 92 */ if (z->c - 1 <= z->lb || z->p[z->c - 1] >> 5 != 3 || !((3961384 >> (z->p[z->c - 1] & 0x1f)) & 1)) return 0; among_var = find_among_b(z, a_5, 19); /* substring, line 92 */ if (!(among_var)) return 0; z->bra = z->c; /* ], line 92 */ { int ret = r_R2(z); if (ret == 0) return 0; /* call R2, line 92 */ if (ret < 0) return ret; } switch(among_var) { case 0: return 0; case 1: { int ret = slice_del(z); /* delete, line 95 */ if (ret < 0) return ret; } break; case 2: { int m1 = z->l - z->c; (void)m1; /* or, line 96 */ if (!(eq_s_b(z, 1, s_24))) goto lab1; goto lab0; lab1: z->c = z->l - m1; if (!(eq_s_b(z, 1, s_25))) return 0; } lab0: { int ret = slice_del(z); /* delete, line 96 */ if (ret < 0) return ret; } break; } return 1; } static int r_Step_5a(struct SN_env * z) { z->ket = z->c; /* [, line 101 */ if (!(eq_s_b(z, 1, s_26))) return 0; z->bra = z->c; /* ], line 101 */ { int m1 = z->l - z->c; (void)m1; /* or, line 102 */ { int ret = r_R2(z); if (ret == 0) goto lab1; /* call R2, line 102 */ if (ret < 0) return ret; } goto lab0; lab1: z->c = z->l - m1; { int ret = r_R1(z); if (ret == 0) return 0; /* call R1, line 102 */ if (ret < 0) return ret; } { int m2 = z->l - z->c; (void)m2; /* not, line 102 */ { int ret = r_shortv(z); if (ret == 0) goto lab2; /* call shortv, line 102 */ if (ret < 0) return ret; } return 0; lab2: z->c = z->l - m2; } } lab0: { int ret = slice_del(z); /* delete, line 103 */ if (ret < 0) return ret; } return 1; } static int r_Step_5b(struct SN_env * z) { z->ket = z->c; /* [, line 107 */ if (!(eq_s_b(z, 1, s_27))) return 0; 
z->bra = z->c; /* ], line 107 */ { int ret = r_R2(z); if (ret == 0) return 0; /* call R2, line 108 */ if (ret < 0) return ret; } if (!(eq_s_b(z, 1, s_28))) return 0; { int ret = slice_del(z); /* delete, line 109 */ if (ret < 0) return ret; } return 1; } extern int porter_ISO_8859_1_stem(struct SN_env * z) { z->B[0] = 0; /* unset Y_found, line 115 */ { int c1 = z->c; /* do, line 116 */ z->bra = z->c; /* [, line 116 */ if (!(eq_s(z, 1, s_29))) goto lab0; z->ket = z->c; /* ], line 116 */ { int ret = slice_from_s(z, 1, s_30); /* <-, line 116 */ if (ret < 0) return ret; } z->B[0] = 1; /* set Y_found, line 116 */ lab0: z->c = c1; } { int c2 = z->c; /* do, line 117 */ while(1) { /* repeat, line 117 */ int c3 = z->c; while(1) { /* goto, line 117 */ int c4 = z->c; if (in_grouping(z, g_v, 97, 121, 0)) goto lab3; z->bra = z->c; /* [, line 117 */ if (!(eq_s(z, 1, s_31))) goto lab3; z->ket = z->c; /* ], line 117 */ z->c = c4; break; lab3: z->c = c4; if (z->c >= z->l) goto lab2; z->c++; /* goto, line 117 */ } { int ret = slice_from_s(z, 1, s_32); /* <-, line 117 */ if (ret < 0) return ret; } z->B[0] = 1; /* set Y_found, line 117 */ continue; lab2: z->c = c3; break; } z->c = c2; } z->I[0] = z->l; z->I[1] = z->l; { int c5 = z->c; /* do, line 121 */ { /* gopast */ /* grouping v, line 122 */ int ret = out_grouping(z, g_v, 97, 121, 1); if (ret < 0) goto lab4; z->c += ret; } { /* gopast */ /* non v, line 122 */ int ret = in_grouping(z, g_v, 97, 121, 1); if (ret < 0) goto lab4; z->c += ret; } z->I[0] = z->c; /* setmark p1, line 122 */ { /* gopast */ /* grouping v, line 123 */ int ret = out_grouping(z, g_v, 97, 121, 1); if (ret < 0) goto lab4; z->c += ret; } { /* gopast */ /* non v, line 123 */ int ret = in_grouping(z, g_v, 97, 121, 1); if (ret < 0) goto lab4; z->c += ret; } z->I[1] = z->c; /* setmark p2, line 123 */ lab4: z->c = c5; } z->lb = z->c; z->c = z->l; /* backwards, line 126 */ { int m6 = z->l - z->c; (void)m6; /* do, line 127 */ { int ret = r_Step_1a(z); if (ret == 0) goto 
lab5; /* call Step_1a, line 127 */ if (ret < 0) return ret; } lab5: z->c = z->l - m6; } { int m7 = z->l - z->c; (void)m7; /* do, line 128 */ { int ret = r_Step_1b(z); if (ret == 0) goto lab6; /* call Step_1b, line 128 */ if (ret < 0) return ret; } lab6: z->c = z->l - m7; } { int m8 = z->l - z->c; (void)m8; /* do, line 129 */ { int ret = r_Step_1c(z); if (ret == 0) goto lab7; /* call Step_1c, line 129 */ if (ret < 0) return ret; } lab7: z->c = z->l - m8; } { int m9 = z->l - z->c; (void)m9; /* do, line 130 */ { int ret = r_Step_2(z); if (ret == 0) goto lab8; /* call Step_2, line 130 */ if (ret < 0) return ret; } lab8: z->c = z->l - m9; } { int m10 = z->l - z->c; (void)m10; /* do, line 131 */ { int ret = r_Step_3(z); if (ret == 0) goto lab9; /* call Step_3, line 131 */ if (ret < 0) return ret; } lab9: z->c = z->l - m10; } { int m11 = z->l - z->c; (void)m11; /* do, line 132 */ { int ret = r_Step_4(z); if (ret == 0) goto lab10; /* call Step_4, line 132 */ if (ret < 0) return ret; } lab10: z->c = z->l - m11; } { int m12 = z->l - z->c; (void)m12; /* do, line 133 */ { int ret = r_Step_5a(z); if (ret == 0) goto lab11; /* call Step_5a, line 133 */ if (ret < 0) return ret; } lab11: z->c = z->l - m12; } { int m13 = z->l - z->c; (void)m13; /* do, line 134 */ { int ret = r_Step_5b(z); if (ret == 0) goto lab12; /* call Step_5b, line 134 */ if (ret < 0) return ret; } lab12: z->c = z->l - m13; } z->c = z->lb; { int c14 = z->c; /* do, line 137 */ if (!(z->B[0])) goto lab13; /* Boolean test Y_found, line 137 */ while(1) { /* repeat, line 137 */ int c15 = z->c; while(1) { /* goto, line 137 */ int c16 = z->c; z->bra = z->c; /* [, line 137 */ if (!(eq_s(z, 1, s_33))) goto lab15; z->ket = z->c; /* ], line 137 */ z->c = c16; break; lab15: z->c = c16; if (z->c >= z->l) goto lab14; z->c++; /* goto, line 137 */ } { int ret = slice_from_s(z, 1, s_34); /* <-, line 137 */ if (ret < 0) return ret; } continue; lab14: z->c = c15; break; } lab13: z->c = c14; } return 1; } extern struct SN_env * 
porter_ISO_8859_1_create_env(void) { return SN_create_env(0, 2, 1); } extern void porter_ISO_8859_1_close_env(struct SN_env * z) { SN_close_env(z, 0); } swish-e-2.4.7/src/snowball/stem_fr.h0000664000077100017500000000050511166010110014233 00000000000000 /* This file was generated automatically by the Snowball to ANSI C compiler */ #ifdef __cplusplus extern "C" { #endif extern struct SN_env * french_ISO_8859_1_create_env(void); extern void french_ISO_8859_1_close_env(struct SN_env * z); extern int french_ISO_8859_1_stem(struct SN_env * z); #ifdef __cplusplus } #endif swish-e-2.4.7/src/snowball/stem_de.h0000664000077100017500000000050511166010110014214 00000000000000 /* This file was generated automatically by the Snowball to ANSI C compiler */ #ifdef __cplusplus extern "C" { #endif extern struct SN_env * german_ISO_8859_1_create_env(void); extern void german_ISO_8859_1_close_env(struct SN_env * z); extern int german_ISO_8859_1_stem(struct SN_env * z); #ifdef __cplusplus } #endif swish-e-2.4.7/src/snowball/stem_no.h0000664000077100017500000000051611166010110014242 00000000000000 /* This file was generated automatically by the Snowball to ANSI C compiler */ #ifdef __cplusplus extern "C" { #endif extern struct SN_env * norwegian_ISO_8859_1_create_env(void); extern void norwegian_ISO_8859_1_close_env(struct SN_env * z); extern int norwegian_ISO_8859_1_stem(struct SN_env * z); #ifdef __cplusplus } #endif swish-e-2.4.7/src/snowball/stem_pt.h0000664000077100017500000000052111166010110014245 00000000000000 /* This file was generated automatically by the Snowball to ANSI C compiler */ #ifdef __cplusplus extern "C" { #endif extern struct SN_env * portuguese_ISO_8859_1_create_env(void); extern void portuguese_ISO_8859_1_close_env(struct SN_env * z); extern int portuguese_ISO_8859_1_stem(struct SN_env * z); #ifdef __cplusplus } #endif swish-e-2.4.7/src/snowball/stem_se.c0000664000077100017500000002453511166010107014245 00000000000000 /* This file was generated automatically by 
the Snowball to ANSI C compiler */ #include "header.h" #ifdef __cplusplus extern "C" { #endif extern int swedish_ISO_8859_1_stem(struct SN_env * z); #ifdef __cplusplus } #endif static int r_other_suffix(struct SN_env * z); static int r_consonant_pair(struct SN_env * z); static int r_main_suffix(struct SN_env * z); static int r_mark_regions(struct SN_env * z); #ifdef __cplusplus extern "C" { #endif extern struct SN_env * swedish_ISO_8859_1_create_env(void); extern void swedish_ISO_8859_1_close_env(struct SN_env * z); #ifdef __cplusplus } #endif static const symbol s_0_0[1] = { 'a' }; static const symbol s_0_1[4] = { 'a', 'r', 'n', 'a' }; static const symbol s_0_2[4] = { 'e', 'r', 'n', 'a' }; static const symbol s_0_3[7] = { 'h', 'e', 't', 'e', 'r', 'n', 'a' }; static const symbol s_0_4[4] = { 'o', 'r', 'n', 'a' }; static const symbol s_0_5[2] = { 'a', 'd' }; static const symbol s_0_6[1] = { 'e' }; static const symbol s_0_7[3] = { 'a', 'd', 'e' }; static const symbol s_0_8[4] = { 'a', 'n', 'd', 'e' }; static const symbol s_0_9[4] = { 'a', 'r', 'n', 'e' }; static const symbol s_0_10[3] = { 'a', 'r', 'e' }; static const symbol s_0_11[4] = { 'a', 's', 't', 'e' }; static const symbol s_0_12[2] = { 'e', 'n' }; static const symbol s_0_13[5] = { 'a', 'n', 'd', 'e', 'n' }; static const symbol s_0_14[4] = { 'a', 'r', 'e', 'n' }; static const symbol s_0_15[5] = { 'h', 'e', 't', 'e', 'n' }; static const symbol s_0_16[3] = { 'e', 'r', 'n' }; static const symbol s_0_17[2] = { 'a', 'r' }; static const symbol s_0_18[2] = { 'e', 'r' }; static const symbol s_0_19[5] = { 'h', 'e', 't', 'e', 'r' }; static const symbol s_0_20[2] = { 'o', 'r' }; static const symbol s_0_21[1] = { 's' }; static const symbol s_0_22[2] = { 'a', 's' }; static const symbol s_0_23[5] = { 'a', 'r', 'n', 'a', 's' }; static const symbol s_0_24[5] = { 'e', 'r', 'n', 'a', 's' }; static const symbol s_0_25[5] = { 'o', 'r', 'n', 'a', 's' }; static const symbol s_0_26[2] = { 'e', 's' }; static const symbol s_0_27[4] = 
{ 'a', 'd', 'e', 's' }; static const symbol s_0_28[5] = { 'a', 'n', 'd', 'e', 's' }; static const symbol s_0_29[3] = { 'e', 'n', 's' }; static const symbol s_0_30[5] = { 'a', 'r', 'e', 'n', 's' }; static const symbol s_0_31[6] = { 'h', 'e', 't', 'e', 'n', 's' }; static const symbol s_0_32[4] = { 'e', 'r', 'n', 's' }; static const symbol s_0_33[2] = { 'a', 't' }; static const symbol s_0_34[5] = { 'a', 'n', 'd', 'e', 't' }; static const symbol s_0_35[3] = { 'h', 'e', 't' }; static const symbol s_0_36[3] = { 'a', 's', 't' }; static const struct among a_0[37] = { /* 0 */ { 1, s_0_0, -1, 1, 0}, /* 1 */ { 4, s_0_1, 0, 1, 0}, /* 2 */ { 4, s_0_2, 0, 1, 0}, /* 3 */ { 7, s_0_3, 2, 1, 0}, /* 4 */ { 4, s_0_4, 0, 1, 0}, /* 5 */ { 2, s_0_5, -1, 1, 0}, /* 6 */ { 1, s_0_6, -1, 1, 0}, /* 7 */ { 3, s_0_7, 6, 1, 0}, /* 8 */ { 4, s_0_8, 6, 1, 0}, /* 9 */ { 4, s_0_9, 6, 1, 0}, /* 10 */ { 3, s_0_10, 6, 1, 0}, /* 11 */ { 4, s_0_11, 6, 1, 0}, /* 12 */ { 2, s_0_12, -1, 1, 0}, /* 13 */ { 5, s_0_13, 12, 1, 0}, /* 14 */ { 4, s_0_14, 12, 1, 0}, /* 15 */ { 5, s_0_15, 12, 1, 0}, /* 16 */ { 3, s_0_16, -1, 1, 0}, /* 17 */ { 2, s_0_17, -1, 1, 0}, /* 18 */ { 2, s_0_18, -1, 1, 0}, /* 19 */ { 5, s_0_19, 18, 1, 0}, /* 20 */ { 2, s_0_20, -1, 1, 0}, /* 21 */ { 1, s_0_21, -1, 2, 0}, /* 22 */ { 2, s_0_22, 21, 1, 0}, /* 23 */ { 5, s_0_23, 22, 1, 0}, /* 24 */ { 5, s_0_24, 22, 1, 0}, /* 25 */ { 5, s_0_25, 22, 1, 0}, /* 26 */ { 2, s_0_26, 21, 1, 0}, /* 27 */ { 4, s_0_27, 26, 1, 0}, /* 28 */ { 5, s_0_28, 26, 1, 0}, /* 29 */ { 3, s_0_29, 21, 1, 0}, /* 30 */ { 5, s_0_30, 29, 1, 0}, /* 31 */ { 6, s_0_31, 29, 1, 0}, /* 32 */ { 4, s_0_32, 21, 1, 0}, /* 33 */ { 2, s_0_33, -1, 1, 0}, /* 34 */ { 5, s_0_34, -1, 1, 0}, /* 35 */ { 3, s_0_35, -1, 1, 0}, /* 36 */ { 3, s_0_36, -1, 1, 0} }; static const symbol s_1_0[2] = { 'd', 'd' }; static const symbol s_1_1[2] = { 'g', 'd' }; static const symbol s_1_2[2] = { 'n', 'n' }; static const symbol s_1_3[2] = { 'd', 't' }; static const symbol s_1_4[2] = { 'g', 't' }; static const 
symbol s_1_5[2] = { 'k', 't' }; static const symbol s_1_6[2] = { 't', 't' }; static const struct among a_1[7] = { /* 0 */ { 2, s_1_0, -1, -1, 0}, /* 1 */ { 2, s_1_1, -1, -1, 0}, /* 2 */ { 2, s_1_2, -1, -1, 0}, /* 3 */ { 2, s_1_3, -1, -1, 0}, /* 4 */ { 2, s_1_4, -1, -1, 0}, /* 5 */ { 2, s_1_5, -1, -1, 0}, /* 6 */ { 2, s_1_6, -1, -1, 0} }; static const symbol s_2_0[2] = { 'i', 'g' }; static const symbol s_2_1[3] = { 'l', 'i', 'g' }; static const symbol s_2_2[3] = { 'e', 'l', 's' }; static const symbol s_2_3[5] = { 'f', 'u', 'l', 'l', 't' }; static const symbol s_2_4[4] = { 'l', 0xF6, 's', 't' }; static const struct among a_2[5] = { /* 0 */ { 2, s_2_0, -1, 1, 0}, /* 1 */ { 3, s_2_1, 0, 1, 0}, /* 2 */ { 3, s_2_2, -1, 1, 0}, /* 3 */ { 5, s_2_3, -1, 3, 0}, /* 4 */ { 4, s_2_4, -1, 2, 0} }; static const unsigned char g_v[] = { 17, 65, 16, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 24, 0, 32 }; static const unsigned char g_s_ending[] = { 119, 127, 149 }; static const symbol s_0[] = { 'l', 0xF6, 's' }; static const symbol s_1[] = { 'f', 'u', 'l', 'l' }; static int r_mark_regions(struct SN_env * z) { z->I[0] = z->l; { int c_test = z->c; /* test, line 29 */ { int ret = z->c + 3; if (0 > ret || ret > z->l) return 0; z->c = ret; /* hop, line 29 */ } z->I[1] = z->c; /* setmark x, line 29 */ z->c = c_test; } if (out_grouping(z, g_v, 97, 246, 1) < 0) return 0; /* goto */ /* grouping v, line 30 */ { /* gopast */ /* non v, line 30 */ int ret = in_grouping(z, g_v, 97, 246, 1); if (ret < 0) return 0; z->c += ret; } z->I[0] = z->c; /* setmark p1, line 30 */ /* try, line 31 */ if (!(z->I[0] < z->I[1])) goto lab0; z->I[0] = z->I[1]; lab0: return 1; } static int r_main_suffix(struct SN_env * z) { int among_var; { int mlimit; /* setlimit, line 37 */ int m1 = z->l - z->c; (void)m1; if (z->c < z->I[0]) return 0; z->c = z->I[0]; /* tomark, line 37 */ mlimit = z->lb; z->lb = z->c; z->c = z->l - m1; z->ket = z->c; /* [, line 37 */ if (z->c <= z->lb || z->p[z->c - 1] >> 5 != 3 || !((1851442 >> 
(z->p[z->c - 1] & 0x1f)) & 1)) { z->lb = mlimit; return 0; } among_var = find_among_b(z, a_0, 37); /* substring, line 37 */ if (!(among_var)) { z->lb = mlimit; return 0; } z->bra = z->c; /* ], line 37 */ z->lb = mlimit; } switch(among_var) { case 0: return 0; case 1: { int ret = slice_del(z); /* delete, line 44 */ if (ret < 0) return ret; } break; case 2: if (in_grouping_b(z, g_s_ending, 98, 121, 0)) return 0; { int ret = slice_del(z); /* delete, line 46 */ if (ret < 0) return ret; } break; } return 1; } static int r_consonant_pair(struct SN_env * z) { { int mlimit; /* setlimit, line 50 */ int m1 = z->l - z->c; (void)m1; if (z->c < z->I[0]) return 0; z->c = z->I[0]; /* tomark, line 50 */ mlimit = z->lb; z->lb = z->c; z->c = z->l - m1; { int m2 = z->l - z->c; (void)m2; /* and, line 52 */ if (z->c - 1 <= z->lb || z->p[z->c - 1] >> 5 != 3 || !((1064976 >> (z->p[z->c - 1] & 0x1f)) & 1)) { z->lb = mlimit; return 0; } if (!(find_among_b(z, a_1, 7))) { z->lb = mlimit; return 0; } /* among, line 51 */ z->c = z->l - m2; z->ket = z->c; /* [, line 52 */ if (z->c <= z->lb) { z->lb = mlimit; return 0; } z->c--; /* next, line 52 */ z->bra = z->c; /* ], line 52 */ { int ret = slice_del(z); /* delete, line 52 */ if (ret < 0) return ret; } } z->lb = mlimit; } return 1; } static int r_other_suffix(struct SN_env * z) { int among_var; { int mlimit; /* setlimit, line 55 */ int m1 = z->l - z->c; (void)m1; if (z->c < z->I[0]) return 0; z->c = z->I[0]; /* tomark, line 55 */ mlimit = z->lb; z->lb = z->c; z->c = z->l - m1; z->ket = z->c; /* [, line 56 */ if (z->c - 1 <= z->lb || z->p[z->c - 1] >> 5 != 3 || !((1572992 >> (z->p[z->c - 1] & 0x1f)) & 1)) { z->lb = mlimit; return 0; } among_var = find_among_b(z, a_2, 5); /* substring, line 56 */ if (!(among_var)) { z->lb = mlimit; return 0; } z->bra = z->c; /* ], line 56 */ switch(among_var) { case 0: { z->lb = mlimit; return 0; } case 1: { int ret = slice_del(z); /* delete, line 57 */ if (ret < 0) return ret; } break; case 2: { int ret = 
slice_from_s(z, 3, s_0); /* <-, line 58 */ if (ret < 0) return ret; } break; case 3: { int ret = slice_from_s(z, 4, s_1); /* <-, line 59 */ if (ret < 0) return ret; } break; } z->lb = mlimit; } return 1; } extern int swedish_ISO_8859_1_stem(struct SN_env * z) { { int c1 = z->c; /* do, line 66 */ { int ret = r_mark_regions(z); if (ret == 0) goto lab0; /* call mark_regions, line 66 */ if (ret < 0) return ret; } lab0: z->c = c1; } z->lb = z->c; z->c = z->l; /* backwards, line 67 */ { int m2 = z->l - z->c; (void)m2; /* do, line 68 */ { int ret = r_main_suffix(z); if (ret == 0) goto lab1; /* call main_suffix, line 68 */ if (ret < 0) return ret; } lab1: z->c = z->l - m2; } { int m3 = z->l - z->c; (void)m3; /* do, line 69 */ { int ret = r_consonant_pair(z); if (ret == 0) goto lab2; /* call consonant_pair, line 69 */ if (ret < 0) return ret; } lab2: z->c = z->l - m3; } { int m4 = z->l - z->c; (void)m4; /* do, line 70 */ { int ret = r_other_suffix(z); if (ret == 0) goto lab3; /* call other_suffix, line 70 */ if (ret < 0) return ret; } lab3: z->c = z->l - m4; } z->c = z->lb; return 1; } extern struct SN_env * swedish_ISO_8859_1_create_env(void) { return SN_create_env(0, 2, 0); } extern void swedish_ISO_8859_1_close_env(struct SN_env * z) { SN_close_env(z, 0); } swish-e-2.4.7/src/snowball/stem_ro.h0000664000077100017500000000051311166010110014243 00000000000000 /* This file was generated automatically by the Snowball to ANSI C compiler */ #ifdef __cplusplus extern "C" { #endif extern struct SN_env * romanian_ISO_8859_2_create_env(void); extern void romanian_ISO_8859_2_close_env(struct SN_env * z); extern int romanian_ISO_8859_2_stem(struct SN_env * z); #ifdef __cplusplus } #endif swish-e-2.4.7/src/snowball/stem_it.c0000664000077100017500000011607211166010107014250 00000000000000 /* This file was generated automatically by the Snowball to ANSI C compiler */ #include "header.h" #ifdef __cplusplus extern "C" { #endif extern int italian_ISO_8859_1_stem(struct SN_env * z); #ifdef 
__cplusplus } #endif static int r_vowel_suffix(struct SN_env * z); static int r_verb_suffix(struct SN_env * z); static int r_standard_suffix(struct SN_env * z); static int r_attached_pronoun(struct SN_env * z); static int r_R2(struct SN_env * z); static int r_R1(struct SN_env * z); static int r_RV(struct SN_env * z); static int r_mark_regions(struct SN_env * z); static int r_postlude(struct SN_env * z); static int r_prelude(struct SN_env * z); #ifdef __cplusplus extern "C" { #endif extern struct SN_env * italian_ISO_8859_1_create_env(void); extern void italian_ISO_8859_1_close_env(struct SN_env * z); #ifdef __cplusplus } #endif static const symbol s_0_1[2] = { 'q', 'u' }; static const symbol s_0_2[1] = { 0xE1 }; static const symbol s_0_3[1] = { 0xE9 }; static const symbol s_0_4[1] = { 0xED }; static const symbol s_0_5[1] = { 0xF3 }; static const symbol s_0_6[1] = { 0xFA }; static const struct among a_0[7] = { /* 0 */ { 0, 0, -1, 7, 0}, /* 1 */ { 2, s_0_1, 0, 6, 0}, /* 2 */ { 1, s_0_2, 0, 1, 0}, /* 3 */ { 1, s_0_3, 0, 2, 0}, /* 4 */ { 1, s_0_4, 0, 3, 0}, /* 5 */ { 1, s_0_5, 0, 4, 0}, /* 6 */ { 1, s_0_6, 0, 5, 0} }; static const symbol s_1_1[1] = { 'I' }; static const symbol s_1_2[1] = { 'U' }; static const struct among a_1[3] = { /* 0 */ { 0, 0, -1, 3, 0}, /* 1 */ { 1, s_1_1, 0, 1, 0}, /* 2 */ { 1, s_1_2, 0, 2, 0} }; static const symbol s_2_0[2] = { 'l', 'a' }; static const symbol s_2_1[4] = { 'c', 'e', 'l', 'a' }; static const symbol s_2_2[6] = { 'g', 'l', 'i', 'e', 'l', 'a' }; static const symbol s_2_3[4] = { 'm', 'e', 'l', 'a' }; static const symbol s_2_4[4] = { 't', 'e', 'l', 'a' }; static const symbol s_2_5[4] = { 'v', 'e', 'l', 'a' }; static const symbol s_2_6[2] = { 'l', 'e' }; static const symbol s_2_7[4] = { 'c', 'e', 'l', 'e' }; static const symbol s_2_8[6] = { 'g', 'l', 'i', 'e', 'l', 'e' }; static const symbol s_2_9[4] = { 'm', 'e', 'l', 'e' }; static const symbol s_2_10[4] = { 't', 'e', 'l', 'e' }; static const symbol s_2_11[4] = { 'v', 'e', 'l', 'e' }; 
static const symbol s_2_12[2] = { 'n', 'e' }; static const symbol s_2_13[4] = { 'c', 'e', 'n', 'e' }; static const symbol s_2_14[6] = { 'g', 'l', 'i', 'e', 'n', 'e' }; static const symbol s_2_15[4] = { 'm', 'e', 'n', 'e' }; static const symbol s_2_16[4] = { 's', 'e', 'n', 'e' }; static const symbol s_2_17[4] = { 't', 'e', 'n', 'e' }; static const symbol s_2_18[4] = { 'v', 'e', 'n', 'e' }; static const symbol s_2_19[2] = { 'c', 'i' }; static const symbol s_2_20[2] = { 'l', 'i' }; static const symbol s_2_21[4] = { 'c', 'e', 'l', 'i' }; static const symbol s_2_22[6] = { 'g', 'l', 'i', 'e', 'l', 'i' }; static const symbol s_2_23[4] = { 'm', 'e', 'l', 'i' }; static const symbol s_2_24[4] = { 't', 'e', 'l', 'i' }; static const symbol s_2_25[4] = { 'v', 'e', 'l', 'i' }; static const symbol s_2_26[3] = { 'g', 'l', 'i' }; static const symbol s_2_27[2] = { 'm', 'i' }; static const symbol s_2_28[2] = { 's', 'i' }; static const symbol s_2_29[2] = { 't', 'i' }; static const symbol s_2_30[2] = { 'v', 'i' }; static const symbol s_2_31[2] = { 'l', 'o' }; static const symbol s_2_32[4] = { 'c', 'e', 'l', 'o' }; static const symbol s_2_33[6] = { 'g', 'l', 'i', 'e', 'l', 'o' }; static const symbol s_2_34[4] = { 'm', 'e', 'l', 'o' }; static const symbol s_2_35[4] = { 't', 'e', 'l', 'o' }; static const symbol s_2_36[4] = { 'v', 'e', 'l', 'o' }; static const struct among a_2[37] = { /* 0 */ { 2, s_2_0, -1, -1, 0}, /* 1 */ { 4, s_2_1, 0, -1, 0}, /* 2 */ { 6, s_2_2, 0, -1, 0}, /* 3 */ { 4, s_2_3, 0, -1, 0}, /* 4 */ { 4, s_2_4, 0, -1, 0}, /* 5 */ { 4, s_2_5, 0, -1, 0}, /* 6 */ { 2, s_2_6, -1, -1, 0}, /* 7 */ { 4, s_2_7, 6, -1, 0}, /* 8 */ { 6, s_2_8, 6, -1, 0}, /* 9 */ { 4, s_2_9, 6, -1, 0}, /* 10 */ { 4, s_2_10, 6, -1, 0}, /* 11 */ { 4, s_2_11, 6, -1, 0}, /* 12 */ { 2, s_2_12, -1, -1, 0}, /* 13 */ { 4, s_2_13, 12, -1, 0}, /* 14 */ { 6, s_2_14, 12, -1, 0}, /* 15 */ { 4, s_2_15, 12, -1, 0}, /* 16 */ { 4, s_2_16, 12, -1, 0}, /* 17 */ { 4, s_2_17, 12, -1, 0}, /* 18 */ { 4, s_2_18, 12, -1, 0}, 
/* 19 */ { 2, s_2_19, -1, -1, 0}, /* 20 */ { 2, s_2_20, -1, -1, 0}, /* 21 */ { 4, s_2_21, 20, -1, 0}, /* 22 */ { 6, s_2_22, 20, -1, 0}, /* 23 */ { 4, s_2_23, 20, -1, 0}, /* 24 */ { 4, s_2_24, 20, -1, 0}, /* 25 */ { 4, s_2_25, 20, -1, 0}, /* 26 */ { 3, s_2_26, 20, -1, 0}, /* 27 */ { 2, s_2_27, -1, -1, 0}, /* 28 */ { 2, s_2_28, -1, -1, 0}, /* 29 */ { 2, s_2_29, -1, -1, 0}, /* 30 */ { 2, s_2_30, -1, -1, 0}, /* 31 */ { 2, s_2_31, -1, -1, 0}, /* 32 */ { 4, s_2_32, 31, -1, 0}, /* 33 */ { 6, s_2_33, 31, -1, 0}, /* 34 */ { 4, s_2_34, 31, -1, 0}, /* 35 */ { 4, s_2_35, 31, -1, 0}, /* 36 */ { 4, s_2_36, 31, -1, 0} }; static const symbol s_3_0[4] = { 'a', 'n', 'd', 'o' }; static const symbol s_3_1[4] = { 'e', 'n', 'd', 'o' }; static const symbol s_3_2[2] = { 'a', 'r' }; static const symbol s_3_3[2] = { 'e', 'r' }; static const symbol s_3_4[2] = { 'i', 'r' }; static const struct among a_3[5] = { /* 0 */ { 4, s_3_0, -1, 1, 0}, /* 1 */ { 4, s_3_1, -1, 1, 0}, /* 2 */ { 2, s_3_2, -1, 2, 0}, /* 3 */ { 2, s_3_3, -1, 2, 0}, /* 4 */ { 2, s_3_4, -1, 2, 0} }; static const symbol s_4_0[2] = { 'i', 'c' }; static const symbol s_4_1[4] = { 'a', 'b', 'i', 'l' }; static const symbol s_4_2[2] = { 'o', 's' }; static const symbol s_4_3[2] = { 'i', 'v' }; static const struct among a_4[4] = { /* 0 */ { 2, s_4_0, -1, -1, 0}, /* 1 */ { 4, s_4_1, -1, -1, 0}, /* 2 */ { 2, s_4_2, -1, -1, 0}, /* 3 */ { 2, s_4_3, -1, 1, 0} }; static const symbol s_5_0[2] = { 'i', 'c' }; static const symbol s_5_1[4] = { 'a', 'b', 'i', 'l' }; static const symbol s_5_2[2] = { 'i', 'v' }; static const struct among a_5[3] = { /* 0 */ { 2, s_5_0, -1, 1, 0}, /* 1 */ { 4, s_5_1, -1, 1, 0}, /* 2 */ { 2, s_5_2, -1, 1, 0} }; static const symbol s_6_0[3] = { 'i', 'c', 'a' }; static const symbol s_6_1[5] = { 'l', 'o', 'g', 'i', 'a' }; static const symbol s_6_2[3] = { 'o', 's', 'a' }; static const symbol s_6_3[4] = { 'i', 's', 't', 'a' }; static const symbol s_6_4[3] = { 'i', 'v', 'a' }; static const symbol s_6_5[4] = { 'a', 'n', 'z', 
'a' }; static const symbol s_6_6[4] = { 'e', 'n', 'z', 'a' }; static const symbol s_6_7[3] = { 'i', 'c', 'e' }; static const symbol s_6_8[6] = { 'a', 't', 'r', 'i', 'c', 'e' }; static const symbol s_6_9[4] = { 'i', 'c', 'h', 'e' }; static const symbol s_6_10[5] = { 'l', 'o', 'g', 'i', 'e' }; static const symbol s_6_11[5] = { 'a', 'b', 'i', 'l', 'e' }; static const symbol s_6_12[5] = { 'i', 'b', 'i', 'l', 'e' }; static const symbol s_6_13[6] = { 'u', 's', 'i', 'o', 'n', 'e' }; static const symbol s_6_14[6] = { 'a', 'z', 'i', 'o', 'n', 'e' }; static const symbol s_6_15[6] = { 'u', 'z', 'i', 'o', 'n', 'e' }; static const symbol s_6_16[5] = { 'a', 't', 'o', 'r', 'e' }; static const symbol s_6_17[3] = { 'o', 's', 'e' }; static const symbol s_6_18[4] = { 'a', 'n', 't', 'e' }; static const symbol s_6_19[5] = { 'm', 'e', 'n', 't', 'e' }; static const symbol s_6_20[6] = { 'a', 'm', 'e', 'n', 't', 'e' }; static const symbol s_6_21[4] = { 'i', 's', 't', 'e' }; static const symbol s_6_22[3] = { 'i', 'v', 'e' }; static const symbol s_6_23[4] = { 'a', 'n', 'z', 'e' }; static const symbol s_6_24[4] = { 'e', 'n', 'z', 'e' }; static const symbol s_6_25[3] = { 'i', 'c', 'i' }; static const symbol s_6_26[6] = { 'a', 't', 'r', 'i', 'c', 'i' }; static const symbol s_6_27[4] = { 'i', 'c', 'h', 'i' }; static const symbol s_6_28[5] = { 'a', 'b', 'i', 'l', 'i' }; static const symbol s_6_29[5] = { 'i', 'b', 'i', 'l', 'i' }; static const symbol s_6_30[4] = { 'i', 's', 'm', 'i' }; static const symbol s_6_31[6] = { 'u', 's', 'i', 'o', 'n', 'i' }; static const symbol s_6_32[6] = { 'a', 'z', 'i', 'o', 'n', 'i' }; static const symbol s_6_33[6] = { 'u', 'z', 'i', 'o', 'n', 'i' }; static const symbol s_6_34[5] = { 'a', 't', 'o', 'r', 'i' }; static const symbol s_6_35[3] = { 'o', 's', 'i' }; static const symbol s_6_36[4] = { 'a', 'n', 't', 'i' }; static const symbol s_6_37[6] = { 'a', 'm', 'e', 'n', 't', 'i' }; static const symbol s_6_38[6] = { 'i', 'm', 'e', 'n', 't', 'i' }; static const symbol 
s_6_39[4] = { 'i', 's', 't', 'i' }; static const symbol s_6_40[3] = { 'i', 'v', 'i' }; static const symbol s_6_41[3] = { 'i', 'c', 'o' }; static const symbol s_6_42[4] = { 'i', 's', 'm', 'o' }; static const symbol s_6_43[3] = { 'o', 's', 'o' }; static const symbol s_6_44[6] = { 'a', 'm', 'e', 'n', 't', 'o' }; static const symbol s_6_45[6] = { 'i', 'm', 'e', 'n', 't', 'o' }; static const symbol s_6_46[3] = { 'i', 'v', 'o' }; static const symbol s_6_47[3] = { 'i', 't', 0xE0 }; static const symbol s_6_48[4] = { 'i', 's', 't', 0xE0 }; static const symbol s_6_49[4] = { 'i', 's', 't', 0xE8 }; static const symbol s_6_50[4] = { 'i', 's', 't', 0xEC }; static const struct among a_6[51] = { /* 0 */ { 3, s_6_0, -1, 1, 0}, /* 1 */ { 5, s_6_1, -1, 3, 0}, /* 2 */ { 3, s_6_2, -1, 1, 0}, /* 3 */ { 4, s_6_3, -1, 1, 0}, /* 4 */ { 3, s_6_4, -1, 9, 0}, /* 5 */ { 4, s_6_5, -1, 1, 0}, /* 6 */ { 4, s_6_6, -1, 5, 0}, /* 7 */ { 3, s_6_7, -1, 1, 0}, /* 8 */ { 6, s_6_8, 7, 1, 0}, /* 9 */ { 4, s_6_9, -1, 1, 0}, /* 10 */ { 5, s_6_10, -1, 3, 0}, /* 11 */ { 5, s_6_11, -1, 1, 0}, /* 12 */ { 5, s_6_12, -1, 1, 0}, /* 13 */ { 6, s_6_13, -1, 4, 0}, /* 14 */ { 6, s_6_14, -1, 2, 0}, /* 15 */ { 6, s_6_15, -1, 4, 0}, /* 16 */ { 5, s_6_16, -1, 2, 0}, /* 17 */ { 3, s_6_17, -1, 1, 0}, /* 18 */ { 4, s_6_18, -1, 1, 0}, /* 19 */ { 5, s_6_19, -1, 1, 0}, /* 20 */ { 6, s_6_20, 19, 7, 0}, /* 21 */ { 4, s_6_21, -1, 1, 0}, /* 22 */ { 3, s_6_22, -1, 9, 0}, /* 23 */ { 4, s_6_23, -1, 1, 0}, /* 24 */ { 4, s_6_24, -1, 5, 0}, /* 25 */ { 3, s_6_25, -1, 1, 0}, /* 26 */ { 6, s_6_26, 25, 1, 0}, /* 27 */ { 4, s_6_27, -1, 1, 0}, /* 28 */ { 5, s_6_28, -1, 1, 0}, /* 29 */ { 5, s_6_29, -1, 1, 0}, /* 30 */ { 4, s_6_30, -1, 1, 0}, /* 31 */ { 6, s_6_31, -1, 4, 0}, /* 32 */ { 6, s_6_32, -1, 2, 0}, /* 33 */ { 6, s_6_33, -1, 4, 0}, /* 34 */ { 5, s_6_34, -1, 2, 0}, /* 35 */ { 3, s_6_35, -1, 1, 0}, /* 36 */ { 4, s_6_36, -1, 1, 0}, /* 37 */ { 6, s_6_37, -1, 6, 0}, /* 38 */ { 6, s_6_38, -1, 6, 0}, /* 39 */ { 4, s_6_39, -1, 1, 0}, /* 40 */ { 
3, s_6_40, -1, 9, 0}, /* 41 */ { 3, s_6_41, -1, 1, 0}, /* 42 */ { 4, s_6_42, -1, 1, 0}, /* 43 */ { 3, s_6_43, -1, 1, 0}, /* 44 */ { 6, s_6_44, -1, 6, 0}, /* 45 */ { 6, s_6_45, -1, 6, 0}, /* 46 */ { 3, s_6_46, -1, 9, 0}, /* 47 */ { 3, s_6_47, -1, 8, 0}, /* 48 */ { 4, s_6_48, -1, 1, 0}, /* 49 */ { 4, s_6_49, -1, 1, 0}, /* 50 */ { 4, s_6_50, -1, 1, 0} }; static const symbol s_7_0[4] = { 'i', 's', 'c', 'a' }; static const symbol s_7_1[4] = { 'e', 'n', 'd', 'a' }; static const symbol s_7_2[3] = { 'a', 't', 'a' }; static const symbol s_7_3[3] = { 'i', 't', 'a' }; static const symbol s_7_4[3] = { 'u', 't', 'a' }; static const symbol s_7_5[3] = { 'a', 'v', 'a' }; static const symbol s_7_6[3] = { 'e', 'v', 'a' }; static const symbol s_7_7[3] = { 'i', 'v', 'a' }; static const symbol s_7_8[6] = { 'e', 'r', 'e', 'b', 'b', 'e' }; static const symbol s_7_9[6] = { 'i', 'r', 'e', 'b', 'b', 'e' }; static const symbol s_7_10[4] = { 'i', 's', 'c', 'e' }; static const symbol s_7_11[4] = { 'e', 'n', 'd', 'e' }; static const symbol s_7_12[3] = { 'a', 'r', 'e' }; static const symbol s_7_13[3] = { 'e', 'r', 'e' }; static const symbol s_7_14[3] = { 'i', 'r', 'e' }; static const symbol s_7_15[4] = { 'a', 's', 's', 'e' }; static const symbol s_7_16[3] = { 'a', 't', 'e' }; static const symbol s_7_17[5] = { 'a', 'v', 'a', 't', 'e' }; static const symbol s_7_18[5] = { 'e', 'v', 'a', 't', 'e' }; static const symbol s_7_19[5] = { 'i', 'v', 'a', 't', 'e' }; static const symbol s_7_20[3] = { 'e', 't', 'e' }; static const symbol s_7_21[5] = { 'e', 'r', 'e', 't', 'e' }; static const symbol s_7_22[5] = { 'i', 'r', 'e', 't', 'e' }; static const symbol s_7_23[3] = { 'i', 't', 'e' }; static const symbol s_7_24[6] = { 'e', 'r', 'e', 's', 't', 'e' }; static const symbol s_7_25[6] = { 'i', 'r', 'e', 's', 't', 'e' }; static const symbol s_7_26[3] = { 'u', 't', 'e' }; static const symbol s_7_27[4] = { 'e', 'r', 'a', 'i' }; static const symbol s_7_28[4] = { 'i', 'r', 'a', 'i' }; static const symbol s_7_29[4] = 
{ 'i', 's', 'c', 'i' }; static const symbol s_7_30[4] = { 'e', 'n', 'd', 'i' }; static const symbol s_7_31[4] = { 'e', 'r', 'e', 'i' }; static const symbol s_7_32[4] = { 'i', 'r', 'e', 'i' }; static const symbol s_7_33[4] = { 'a', 's', 's', 'i' }; static const symbol s_7_34[3] = { 'a', 't', 'i' }; static const symbol s_7_35[3] = { 'i', 't', 'i' }; static const symbol s_7_36[6] = { 'e', 'r', 'e', 's', 't', 'i' }; static const symbol s_7_37[6] = { 'i', 'r', 'e', 's', 't', 'i' }; static const symbol s_7_38[3] = { 'u', 't', 'i' }; static const symbol s_7_39[3] = { 'a', 'v', 'i' }; static const symbol s_7_40[3] = { 'e', 'v', 'i' }; static const symbol s_7_41[3] = { 'i', 'v', 'i' }; static const symbol s_7_42[4] = { 'i', 's', 'c', 'o' }; static const symbol s_7_43[4] = { 'a', 'n', 'd', 'o' }; static const symbol s_7_44[4] = { 'e', 'n', 'd', 'o' }; static const symbol s_7_45[4] = { 'Y', 'a', 'm', 'o' }; static const symbol s_7_46[4] = { 'i', 'a', 'm', 'o' }; static const symbol s_7_47[5] = { 'a', 'v', 'a', 'm', 'o' }; static const symbol s_7_48[5] = { 'e', 'v', 'a', 'm', 'o' }; static const symbol s_7_49[5] = { 'i', 'v', 'a', 'm', 'o' }; static const symbol s_7_50[5] = { 'e', 'r', 'e', 'm', 'o' }; static const symbol s_7_51[5] = { 'i', 'r', 'e', 'm', 'o' }; static const symbol s_7_52[6] = { 'a', 's', 's', 'i', 'm', 'o' }; static const symbol s_7_53[4] = { 'a', 'm', 'm', 'o' }; static const symbol s_7_54[4] = { 'e', 'm', 'm', 'o' }; static const symbol s_7_55[6] = { 'e', 'r', 'e', 'm', 'm', 'o' }; static const symbol s_7_56[6] = { 'i', 'r', 'e', 'm', 'm', 'o' }; static const symbol s_7_57[4] = { 'i', 'm', 'm', 'o' }; static const symbol s_7_58[3] = { 'a', 'n', 'o' }; static const symbol s_7_59[6] = { 'i', 's', 'c', 'a', 'n', 'o' }; static const symbol s_7_60[5] = { 'a', 'v', 'a', 'n', 'o' }; static const symbol s_7_61[5] = { 'e', 'v', 'a', 'n', 'o' }; static const symbol s_7_62[5] = { 'i', 'v', 'a', 'n', 'o' }; static const symbol s_7_63[6] = { 'e', 'r', 'a', 'n', 'n', 'o' 
}; static const symbol s_7_64[6] = { 'i', 'r', 'a', 'n', 'n', 'o' }; static const symbol s_7_65[3] = { 'o', 'n', 'o' }; static const symbol s_7_66[6] = { 'i', 's', 'c', 'o', 'n', 'o' }; static const symbol s_7_67[5] = { 'a', 'r', 'o', 'n', 'o' }; static const symbol s_7_68[5] = { 'e', 'r', 'o', 'n', 'o' }; static const symbol s_7_69[5] = { 'i', 'r', 'o', 'n', 'o' }; static const symbol s_7_70[8] = { 'e', 'r', 'e', 'b', 'b', 'e', 'r', 'o' }; static const symbol s_7_71[8] = { 'i', 'r', 'e', 'b', 'b', 'e', 'r', 'o' }; static const symbol s_7_72[6] = { 'a', 's', 's', 'e', 'r', 'o' }; static const symbol s_7_73[6] = { 'e', 's', 's', 'e', 'r', 'o' }; static const symbol s_7_74[6] = { 'i', 's', 's', 'e', 'r', 'o' }; static const symbol s_7_75[3] = { 'a', 't', 'o' }; static const symbol s_7_76[3] = { 'i', 't', 'o' }; static const symbol s_7_77[3] = { 'u', 't', 'o' }; static const symbol s_7_78[3] = { 'a', 'v', 'o' }; static const symbol s_7_79[3] = { 'e', 'v', 'o' }; static const symbol s_7_80[3] = { 'i', 'v', 'o' }; static const symbol s_7_81[2] = { 'a', 'r' }; static const symbol s_7_82[2] = { 'i', 'r' }; static const symbol s_7_83[3] = { 'e', 'r', 0xE0 }; static const symbol s_7_84[3] = { 'i', 'r', 0xE0 }; static const symbol s_7_85[3] = { 'e', 'r', 0xF2 }; static const symbol s_7_86[3] = { 'i', 'r', 0xF2 }; static const struct among a_7[87] = { /* 0 */ { 4, s_7_0, -1, 1, 0}, /* 1 */ { 4, s_7_1, -1, 1, 0}, /* 2 */ { 3, s_7_2, -1, 1, 0}, /* 3 */ { 3, s_7_3, -1, 1, 0}, /* 4 */ { 3, s_7_4, -1, 1, 0}, /* 5 */ { 3, s_7_5, -1, 1, 0}, /* 6 */ { 3, s_7_6, -1, 1, 0}, /* 7 */ { 3, s_7_7, -1, 1, 0}, /* 8 */ { 6, s_7_8, -1, 1, 0}, /* 9 */ { 6, s_7_9, -1, 1, 0}, /* 10 */ { 4, s_7_10, -1, 1, 0}, /* 11 */ { 4, s_7_11, -1, 1, 0}, /* 12 */ { 3, s_7_12, -1, 1, 0}, /* 13 */ { 3, s_7_13, -1, 1, 0}, /* 14 */ { 3, s_7_14, -1, 1, 0}, /* 15 */ { 4, s_7_15, -1, 1, 0}, /* 16 */ { 3, s_7_16, -1, 1, 0}, /* 17 */ { 5, s_7_17, 16, 1, 0}, /* 18 */ { 5, s_7_18, 16, 1, 0}, /* 19 */ { 5, s_7_19, 16, 1, 
0}, /* 20 */ { 3, s_7_20, -1, 1, 0}, /* 21 */ { 5, s_7_21, 20, 1, 0}, /* 22 */ { 5, s_7_22, 20, 1, 0}, /* 23 */ { 3, s_7_23, -1, 1, 0}, /* 24 */ { 6, s_7_24, -1, 1, 0}, /* 25 */ { 6, s_7_25, -1, 1, 0}, /* 26 */ { 3, s_7_26, -1, 1, 0}, /* 27 */ { 4, s_7_27, -1, 1, 0}, /* 28 */ { 4, s_7_28, -1, 1, 0}, /* 29 */ { 4, s_7_29, -1, 1, 0}, /* 30 */ { 4, s_7_30, -1, 1, 0}, /* 31 */ { 4, s_7_31, -1, 1, 0}, /* 32 */ { 4, s_7_32, -1, 1, 0}, /* 33 */ { 4, s_7_33, -1, 1, 0}, /* 34 */ { 3, s_7_34, -1, 1, 0}, /* 35 */ { 3, s_7_35, -1, 1, 0}, /* 36 */ { 6, s_7_36, -1, 1, 0}, /* 37 */ { 6, s_7_37, -1, 1, 0}, /* 38 */ { 3, s_7_38, -1, 1, 0}, /* 39 */ { 3, s_7_39, -1, 1, 0}, /* 40 */ { 3, s_7_40, -1, 1, 0}, /* 41 */ { 3, s_7_41, -1, 1, 0}, /* 42 */ { 4, s_7_42, -1, 1, 0}, /* 43 */ { 4, s_7_43, -1, 1, 0}, /* 44 */ { 4, s_7_44, -1, 1, 0}, /* 45 */ { 4, s_7_45, -1, 1, 0}, /* 46 */ { 4, s_7_46, -1, 1, 0}, /* 47 */ { 5, s_7_47, -1, 1, 0}, /* 48 */ { 5, s_7_48, -1, 1, 0}, /* 49 */ { 5, s_7_49, -1, 1, 0}, /* 50 */ { 5, s_7_50, -1, 1, 0}, /* 51 */ { 5, s_7_51, -1, 1, 0}, /* 52 */ { 6, s_7_52, -1, 1, 0}, /* 53 */ { 4, s_7_53, -1, 1, 0}, /* 54 */ { 4, s_7_54, -1, 1, 0}, /* 55 */ { 6, s_7_55, 54, 1, 0}, /* 56 */ { 6, s_7_56, 54, 1, 0}, /* 57 */ { 4, s_7_57, -1, 1, 0}, /* 58 */ { 3, s_7_58, -1, 1, 0}, /* 59 */ { 6, s_7_59, 58, 1, 0}, /* 60 */ { 5, s_7_60, 58, 1, 0}, /* 61 */ { 5, s_7_61, 58, 1, 0}, /* 62 */ { 5, s_7_62, 58, 1, 0}, /* 63 */ { 6, s_7_63, -1, 1, 0}, /* 64 */ { 6, s_7_64, -1, 1, 0}, /* 65 */ { 3, s_7_65, -1, 1, 0}, /* 66 */ { 6, s_7_66, 65, 1, 0}, /* 67 */ { 5, s_7_67, 65, 1, 0}, /* 68 */ { 5, s_7_68, 65, 1, 0}, /* 69 */ { 5, s_7_69, 65, 1, 0}, /* 70 */ { 8, s_7_70, -1, 1, 0}, /* 71 */ { 8, s_7_71, -1, 1, 0}, /* 72 */ { 6, s_7_72, -1, 1, 0}, /* 73 */ { 6, s_7_73, -1, 1, 0}, /* 74 */ { 6, s_7_74, -1, 1, 0}, /* 75 */ { 3, s_7_75, -1, 1, 0}, /* 76 */ { 3, s_7_76, -1, 1, 0}, /* 77 */ { 3, s_7_77, -1, 1, 0}, /* 78 */ { 3, s_7_78, -1, 1, 0}, /* 79 */ { 3, s_7_79, -1, 1, 0}, /* 80 */ { 3, 
s_7_80, -1, 1, 0}, /* 81 */ { 2, s_7_81, -1, 1, 0}, /* 82 */ { 2, s_7_82, -1, 1, 0}, /* 83 */ { 3, s_7_83, -1, 1, 0}, /* 84 */ { 3, s_7_84, -1, 1, 0}, /* 85 */ { 3, s_7_85, -1, 1, 0}, /* 86 */ { 3, s_7_86, -1, 1, 0} }; static const unsigned char g_v[] = { 17, 65, 16, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 128, 128, 8, 2, 1 }; static const unsigned char g_AEIO[] = { 17, 65, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 128, 128, 8, 2 }; static const unsigned char g_CG[] = { 17 }; static const symbol s_0[] = { 0xE0 }; static const symbol s_1[] = { 0xE8 }; static const symbol s_2[] = { 0xEC }; static const symbol s_3[] = { 0xF2 }; static const symbol s_4[] = { 0xF9 }; static const symbol s_5[] = { 'q', 'U' }; static const symbol s_6[] = { 'u' }; static const symbol s_7[] = { 'U' }; static const symbol s_8[] = { 'i' }; static const symbol s_9[] = { 'I' }; static const symbol s_10[] = { 'i' }; static const symbol s_11[] = { 'u' }; static const symbol s_12[] = { 'e' }; static const symbol s_13[] = { 'i', 'c' }; static const symbol s_14[] = { 'l', 'o', 'g' }; static const symbol s_15[] = { 'u' }; static const symbol s_16[] = { 'e', 'n', 't', 'e' }; static const symbol s_17[] = { 'a', 't' }; static const symbol s_18[] = { 'a', 't' }; static const symbol s_19[] = { 'i', 'c' }; static const symbol s_20[] = { 'i' }; static const symbol s_21[] = { 'h' }; static int r_prelude(struct SN_env * z) { int among_var; { int c_test = z->c; /* test, line 35 */ while(1) { /* repeat, line 35 */ int c1 = z->c; z->bra = z->c; /* [, line 36 */ among_var = find_among(z, a_0, 7); /* substring, line 36 */ if (!(among_var)) goto lab0; z->ket = z->c; /* ], line 36 */ switch(among_var) { case 0: goto lab0; case 1: { int ret = slice_from_s(z, 1, s_0); /* <-, line 37 */ if (ret < 0) return ret; } break; case 2: { int ret = slice_from_s(z, 1, s_1); /* <-, line 38 */ if (ret < 0) return ret; } break; case 3: { int ret = slice_from_s(z, 1, s_2); /* <-, line 39 */ if (ret < 0) return ret; } break; case 4: { int 
ret = slice_from_s(z, 1, s_3); /* <-, line 40 */ if (ret < 0) return ret; } break; case 5: { int ret = slice_from_s(z, 1, s_4); /* <-, line 41 */ if (ret < 0) return ret; } break; case 6: { int ret = slice_from_s(z, 2, s_5); /* <-, line 42 */ if (ret < 0) return ret; } break; case 7: if (z->c >= z->l) goto lab0; z->c++; /* next, line 43 */ break; } continue; lab0: z->c = c1; break; } z->c = c_test; } while(1) { /* repeat, line 46 */ int c2 = z->c; while(1) { /* goto, line 46 */ int c3 = z->c; if (in_grouping(z, g_v, 97, 249, 0)) goto lab2; z->bra = z->c; /* [, line 47 */ { int c4 = z->c; /* or, line 47 */ if (!(eq_s(z, 1, s_6))) goto lab4; z->ket = z->c; /* ], line 47 */ if (in_grouping(z, g_v, 97, 249, 0)) goto lab4; { int ret = slice_from_s(z, 1, s_7); /* <-, line 47 */ if (ret < 0) return ret; } goto lab3; lab4: z->c = c4; if (!(eq_s(z, 1, s_8))) goto lab2; z->ket = z->c; /* ], line 48 */ if (in_grouping(z, g_v, 97, 249, 0)) goto lab2; { int ret = slice_from_s(z, 1, s_9); /* <-, line 48 */ if (ret < 0) return ret; } } lab3: z->c = c3; break; lab2: z->c = c3; if (z->c >= z->l) goto lab1; z->c++; /* goto, line 46 */ } continue; lab1: z->c = c2; break; } return 1; } static int r_mark_regions(struct SN_env * z) { z->I[0] = z->l; z->I[1] = z->l; z->I[2] = z->l; { int c1 = z->c; /* do, line 58 */ { int c2 = z->c; /* or, line 60 */ if (in_grouping(z, g_v, 97, 249, 0)) goto lab2; { int c3 = z->c; /* or, line 59 */ if (out_grouping(z, g_v, 97, 249, 0)) goto lab4; { /* gopast */ /* grouping v, line 59 */ int ret = out_grouping(z, g_v, 97, 249, 1); if (ret < 0) goto lab4; z->c += ret; } goto lab3; lab4: z->c = c3; if (in_grouping(z, g_v, 97, 249, 0)) goto lab2; { /* gopast */ /* non v, line 59 */ int ret = in_grouping(z, g_v, 97, 249, 1); if (ret < 0) goto lab2; z->c += ret; } } lab3: goto lab1; lab2: z->c = c2; if (out_grouping(z, g_v, 97, 249, 0)) goto lab0; { int c4 = z->c; /* or, line 61 */ if (out_grouping(z, g_v, 97, 249, 0)) goto lab6; { /* gopast */ /* grouping v, 
line 61 */ int ret = out_grouping(z, g_v, 97, 249, 1); if (ret < 0) goto lab6; z->c += ret; } goto lab5; lab6: z->c = c4; if (in_grouping(z, g_v, 97, 249, 0)) goto lab0; if (z->c >= z->l) goto lab0; z->c++; /* next, line 61 */ } lab5: ; } lab1: z->I[0] = z->c; /* setmark pV, line 62 */ lab0: z->c = c1; } { int c5 = z->c; /* do, line 64 */ { /* gopast */ /* grouping v, line 65 */ int ret = out_grouping(z, g_v, 97, 249, 1); if (ret < 0) goto lab7; z->c += ret; } { /* gopast */ /* non v, line 65 */ int ret = in_grouping(z, g_v, 97, 249, 1); if (ret < 0) goto lab7; z->c += ret; } z->I[1] = z->c; /* setmark p1, line 65 */ { /* gopast */ /* grouping v, line 66 */ int ret = out_grouping(z, g_v, 97, 249, 1); if (ret < 0) goto lab7; z->c += ret; } { /* gopast */ /* non v, line 66 */ int ret = in_grouping(z, g_v, 97, 249, 1); if (ret < 0) goto lab7; z->c += ret; } z->I[2] = z->c; /* setmark p2, line 66 */ lab7: z->c = c5; } return 1; } static int r_postlude(struct SN_env * z) { int among_var; while(1) { /* repeat, line 70 */ int c1 = z->c; z->bra = z->c; /* [, line 72 */ if (z->c >= z->l || (z->p[z->c + 0] != 73 && z->p[z->c + 0] != 85)) among_var = 3; else among_var = find_among(z, a_1, 3); /* substring, line 72 */ if (!(among_var)) goto lab0; z->ket = z->c; /* ], line 72 */ switch(among_var) { case 0: goto lab0; case 1: { int ret = slice_from_s(z, 1, s_10); /* <-, line 73 */ if (ret < 0) return ret; } break; case 2: { int ret = slice_from_s(z, 1, s_11); /* <-, line 74 */ if (ret < 0) return ret; } break; case 3: if (z->c >= z->l) goto lab0; z->c++; /* next, line 75 */ break; } continue; lab0: z->c = c1; break; } return 1; } static int r_RV(struct SN_env * z) { if (!(z->I[0] <= z->c)) return 0; return 1; } static int r_R1(struct SN_env * z) { if (!(z->I[1] <= z->c)) return 0; return 1; } static int r_R2(struct SN_env * z) { if (!(z->I[2] <= z->c)) return 0; return 1; } static int r_attached_pronoun(struct SN_env * z) { int among_var; z->ket = z->c; /* [, line 87 */ if (z->c 
- 1 <= z->lb || z->p[z->c - 1] >> 5 != 3 || !((33314 >> (z->p[z->c - 1] & 0x1f)) & 1)) return 0; if (!(find_among_b(z, a_2, 37))) return 0; /* substring, line 87 */ z->bra = z->c; /* ], line 87 */ if (z->c - 1 <= z->lb || (z->p[z->c - 1] != 111 && z->p[z->c - 1] != 114)) return 0; among_var = find_among_b(z, a_3, 5); /* among, line 97 */ if (!(among_var)) return 0; { int ret = r_RV(z); if (ret == 0) return 0; /* call RV, line 97 */ if (ret < 0) return ret; } switch(among_var) { case 0: return 0; case 1: { int ret = slice_del(z); /* delete, line 98 */ if (ret < 0) return ret; } break; case 2: { int ret = slice_from_s(z, 1, s_12); /* <-, line 99 */ if (ret < 0) return ret; } break; } return 1; } static int r_standard_suffix(struct SN_env * z) { int among_var; z->ket = z->c; /* [, line 104 */ among_var = find_among_b(z, a_6, 51); /* substring, line 104 */ if (!(among_var)) return 0; z->bra = z->c; /* ], line 104 */ switch(among_var) { case 0: return 0; case 1: { int ret = r_R2(z); if (ret == 0) return 0; /* call R2, line 111 */ if (ret < 0) return ret; } { int ret = slice_del(z); /* delete, line 111 */ if (ret < 0) return ret; } break; case 2: { int ret = r_R2(z); if (ret == 0) return 0; /* call R2, line 113 */ if (ret < 0) return ret; } { int ret = slice_del(z); /* delete, line 113 */ if (ret < 0) return ret; } { int m_keep = z->l - z->c;/* (void) m_keep;*/ /* try, line 114 */ z->ket = z->c; /* [, line 114 */ if (!(eq_s_b(z, 2, s_13))) { z->c = z->l - m_keep; goto lab0; } z->bra = z->c; /* ], line 114 */ { int ret = r_R2(z); if (ret == 0) { z->c = z->l - m_keep; goto lab0; } /* call R2, line 114 */ if (ret < 0) return ret; } { int ret = slice_del(z); /* delete, line 114 */ if (ret < 0) return ret; } lab0: ; } break; case 3: { int ret = r_R2(z); if (ret == 0) return 0; /* call R2, line 117 */ if (ret < 0) return ret; } { int ret = slice_from_s(z, 3, s_14); /* <-, line 117 */ if (ret < 0) return ret; } break; case 4: { int ret = r_R2(z); if (ret == 0) return 0; /* call 
R2, line 119 */ if (ret < 0) return ret; } { int ret = slice_from_s(z, 1, s_15); /* <-, line 119 */ if (ret < 0) return ret; } break; case 5: { int ret = r_R2(z); if (ret == 0) return 0; /* call R2, line 121 */ if (ret < 0) return ret; } { int ret = slice_from_s(z, 4, s_16); /* <-, line 121 */ if (ret < 0) return ret; } break; case 6: { int ret = r_RV(z); if (ret == 0) return 0; /* call RV, line 123 */ if (ret < 0) return ret; } { int ret = slice_del(z); /* delete, line 123 */ if (ret < 0) return ret; } break; case 7: { int ret = r_R1(z); if (ret == 0) return 0; /* call R1, line 125 */ if (ret < 0) return ret; } { int ret = slice_del(z); /* delete, line 125 */ if (ret < 0) return ret; } { int m_keep = z->l - z->c;/* (void) m_keep;*/ /* try, line 126 */ z->ket = z->c; /* [, line 127 */ if (z->c - 1 <= z->lb || z->p[z->c - 1] >> 5 != 3 || !((4722696 >> (z->p[z->c - 1] & 0x1f)) & 1)) { z->c = z->l - m_keep; goto lab1; } among_var = find_among_b(z, a_4, 4); /* substring, line 127 */ if (!(among_var)) { z->c = z->l - m_keep; goto lab1; } z->bra = z->c; /* ], line 127 */ { int ret = r_R2(z); if (ret == 0) { z->c = z->l - m_keep; goto lab1; } /* call R2, line 127 */ if (ret < 0) return ret; } { int ret = slice_del(z); /* delete, line 127 */ if (ret < 0) return ret; } switch(among_var) { case 0: { z->c = z->l - m_keep; goto lab1; } case 1: z->ket = z->c; /* [, line 128 */ if (!(eq_s_b(z, 2, s_17))) { z->c = z->l - m_keep; goto lab1; } z->bra = z->c; /* ], line 128 */ { int ret = r_R2(z); if (ret == 0) { z->c = z->l - m_keep; goto lab1; } /* call R2, line 128 */ if (ret < 0) return ret; } { int ret = slice_del(z); /* delete, line 128 */ if (ret < 0) return ret; } break; } lab1: ; } break; case 8: { int ret = r_R2(z); if (ret == 0) return 0; /* call R2, line 134 */ if (ret < 0) return ret; } { int ret = slice_del(z); /* delete, line 134 */ if (ret < 0) return ret; } { int m_keep = z->l - z->c;/* (void) m_keep;*/ /* try, line 135 */ z->ket = z->c; /* [, line 136 */ if (z->c - 
1 <= z->lb || z->p[z->c - 1] >> 5 != 3 || !((4198408 >> (z->p[z->c - 1] & 0x1f)) & 1)) { z->c = z->l - m_keep; goto lab2; } among_var = find_among_b(z, a_5, 3); /* substring, line 136 */ if (!(among_var)) { z->c = z->l - m_keep; goto lab2; } z->bra = z->c; /* ], line 136 */ switch(among_var) { case 0: { z->c = z->l - m_keep; goto lab2; } case 1: { int ret = r_R2(z); if (ret == 0) { z->c = z->l - m_keep; goto lab2; } /* call R2, line 137 */ if (ret < 0) return ret; } { int ret = slice_del(z); /* delete, line 137 */ if (ret < 0) return ret; } break; } lab2: ; } break; case 9: { int ret = r_R2(z); if (ret == 0) return 0; /* call R2, line 142 */ if (ret < 0) return ret; } { int ret = slice_del(z); /* delete, line 142 */ if (ret < 0) return ret; } { int m_keep = z->l - z->c;/* (void) m_keep;*/ /* try, line 143 */ z->ket = z->c; /* [, line 143 */ if (!(eq_s_b(z, 2, s_18))) { z->c = z->l - m_keep; goto lab3; } z->bra = z->c; /* ], line 143 */ { int ret = r_R2(z); if (ret == 0) { z->c = z->l - m_keep; goto lab3; } /* call R2, line 143 */ if (ret < 0) return ret; } { int ret = slice_del(z); /* delete, line 143 */ if (ret < 0) return ret; } z->ket = z->c; /* [, line 143 */ if (!(eq_s_b(z, 2, s_19))) { z->c = z->l - m_keep; goto lab3; } z->bra = z->c; /* ], line 143 */ { int ret = r_R2(z); if (ret == 0) { z->c = z->l - m_keep; goto lab3; } /* call R2, line 143 */ if (ret < 0) return ret; } { int ret = slice_del(z); /* delete, line 143 */ if (ret < 0) return ret; } lab3: ; } break; } return 1; } static int r_verb_suffix(struct SN_env * z) { int among_var; { int mlimit; /* setlimit, line 148 */ int m1 = z->l - z->c; (void)m1; if (z->c < z->I[0]) return 0; z->c = z->I[0]; /* tomark, line 148 */ mlimit = z->lb; z->lb = z->c; z->c = z->l - m1; z->ket = z->c; /* [, line 149 */ among_var = find_among_b(z, a_7, 87); /* substring, line 149 */ if (!(among_var)) { z->lb = mlimit; return 0; } z->bra = z->c; /* ], line 149 */ switch(among_var) { case 0: { z->lb = mlimit; return 0; } case 
1: { int ret = slice_del(z); /* delete, line 163 */ if (ret < 0) return ret; } break; } z->lb = mlimit; } return 1; } static int r_vowel_suffix(struct SN_env * z) { { int m_keep = z->l - z->c;/* (void) m_keep;*/ /* try, line 171 */ z->ket = z->c; /* [, line 172 */ if (in_grouping_b(z, g_AEIO, 97, 242, 0)) { z->c = z->l - m_keep; goto lab0; } z->bra = z->c; /* ], line 172 */ { int ret = r_RV(z); if (ret == 0) { z->c = z->l - m_keep; goto lab0; } /* call RV, line 172 */ if (ret < 0) return ret; } { int ret = slice_del(z); /* delete, line 172 */ if (ret < 0) return ret; } z->ket = z->c; /* [, line 173 */ if (!(eq_s_b(z, 1, s_20))) { z->c = z->l - m_keep; goto lab0; } z->bra = z->c; /* ], line 173 */ { int ret = r_RV(z); if (ret == 0) { z->c = z->l - m_keep; goto lab0; } /* call RV, line 173 */ if (ret < 0) return ret; } { int ret = slice_del(z); /* delete, line 173 */ if (ret < 0) return ret; } lab0: ; } { int m_keep = z->l - z->c;/* (void) m_keep;*/ /* try, line 175 */ z->ket = z->c; /* [, line 176 */ if (!(eq_s_b(z, 1, s_21))) { z->c = z->l - m_keep; goto lab1; } z->bra = z->c; /* ], line 176 */ if (in_grouping_b(z, g_CG, 99, 103, 0)) { z->c = z->l - m_keep; goto lab1; } { int ret = r_RV(z); if (ret == 0) { z->c = z->l - m_keep; goto lab1; } /* call RV, line 176 */ if (ret < 0) return ret; } { int ret = slice_del(z); /* delete, line 176 */ if (ret < 0) return ret; } lab1: ; } return 1; } extern int italian_ISO_8859_1_stem(struct SN_env * z) { { int c1 = z->c; /* do, line 182 */ { int ret = r_prelude(z); if (ret == 0) goto lab0; /* call prelude, line 182 */ if (ret < 0) return ret; } lab0: z->c = c1; } { int c2 = z->c; /* do, line 183 */ { int ret = r_mark_regions(z); if (ret == 0) goto lab1; /* call mark_regions, line 183 */ if (ret < 0) return ret; } lab1: z->c = c2; } z->lb = z->c; z->c = z->l; /* backwards, line 184 */ { int m3 = z->l - z->c; (void)m3; /* do, line 185 */ { int ret = r_attached_pronoun(z); if (ret == 0) goto lab2; /* call attached_pronoun, line 
185 */ if (ret < 0) return ret; } lab2: z->c = z->l - m3; } { int m4 = z->l - z->c; (void)m4; /* do, line 186 */ { int m5 = z->l - z->c; (void)m5; /* or, line 186 */ { int ret = r_standard_suffix(z); if (ret == 0) goto lab5; /* call standard_suffix, line 186 */ if (ret < 0) return ret; } goto lab4; lab5: z->c = z->l - m5; { int ret = r_verb_suffix(z); if (ret == 0) goto lab3; /* call verb_suffix, line 186 */ if (ret < 0) return ret; } } lab4: lab3: z->c = z->l - m4; } { int m6 = z->l - z->c; (void)m6; /* do, line 187 */ { int ret = r_vowel_suffix(z); if (ret == 0) goto lab6; /* call vowel_suffix, line 187 */ if (ret < 0) return ret; } lab6: z->c = z->l - m6; } z->c = z->lb; { int c7 = z->c; /* do, line 189 */ { int ret = r_postlude(z); if (ret == 0) goto lab7; /* call postlude, line 189 */ if (ret < 0) return ret; } lab7: z->c = c7; } return 1; } extern struct SN_env * italian_ISO_8859_1_create_env(void) { return SN_create_env(0, 3, 0); } extern void italian_ISO_8859_1_close_env(struct SN_env * z) { SN_close_env(z, 0); } swish-e-2.4.7/src/hash.h0000664000077100017500000000267411166010110011707 00000000000000/* ** ** $Id: hash.h 1736 2005-05-12 15:41:22Z karman $ ** This file is part of Swish-e. Swish-e is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. Swish-e is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. 
You should have received a copy of the GNU General Public License along with Swish-e; if not, write to the Free Software Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA See the COPYING file that accompanies the Swish-e distribution for details of the GNU GPL and the special exception available for linking against the Swish-e library. ** Mon May 9 18:19:34 CDT 2005 ** added GPL */ unsigned string_hash(char *, int); unsigned int_hash(int, int); unsigned hash (char *); unsigned numhash (int); unsigned bighash (char *); unsigned bignumhash (int); unsigned verybighash (char *); struct swline *add_word_to_hash_table( WORD_HASH_TABLE *table_ptr, char *word, int hash_size); struct swline * is_word_in_hash_table( WORD_HASH_TABLE table, char *word); void free_word_hash_table( WORD_HASH_TABLE *table_ptr); swish-e-2.4.7/src/error.c0000664000077100017500000002207011166010110012100 00000000000000/* $Id: error.c 1815 2006-08-27 20:22:54Z karman $ ** Copyright (C) 1995, 1996, 1997, 1998 Hewlett-Packard Company ** Originally by Kevin Hughes, kev@kevcom.com, 3/11/94 ** This file is part of Swish-e. Swish-e is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. Swish-e is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with Swish-e; if not, write to the Free Software Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA See the COPYING file that accompanies the Swish-e distribution for details of the GNU GPL and the special exception available for linking against the Swish-e library. 
** Mon May 9 15:51:39 CDT 2005 ** added GPL ** if this really is re-written in 2001, we should remove the HP copyright?? ** ** 2001-02-12 rasc rewritten (progerr uses vargs, now) ** */ #include "swish.h" #include <stdio.h> #include <stdlib.h> #include <string.h> #include <errno.h> #include "error.h" #include <stdarg.h> /* -- print program error message (like printf) -- exit (1) */ /* Allow overriding swish-e's old behavior of errors to stdout */ FILE *error_handle = NULL; void set_error_handle( FILE *where ) { error_handle = where; } void SwishErrorsToStderr( void ) { error_handle = stderr; } void progerr(char *msgfmt,...) { va_list args; if ( !error_handle ) error_handle = stdout; va_start (args,msgfmt); fprintf (error_handle, "err: "); vfprintf (error_handle, msgfmt, args); fprintf (error_handle, "\n.\n"); va_end (args); exit(1); } /* -- print program error message (like printf) -- includes text of errno at end of message -- exit (1) */ void progerrno(char *msgfmt,...) { va_list args; if ( !error_handle ) error_handle = stdout; va_start (args,msgfmt); fprintf (error_handle, "err: "); vfprintf (error_handle, msgfmt, args); fprintf (error_handle, "%s", strerror(errno)); fprintf (error_handle, "\n.\n"); va_end (args); exit(1); } /********** These are an attempt to prevent aborting in the library *********/ void set_progerr(int errornum, SWISH *sw, char *msgfmt,...) { va_list args; sw->lasterror = errornum; va_start (args,msgfmt); vsnprintf (sw->lasterrorstr, MAX_ERROR_STRING_LEN, msgfmt, args); va_end (args); } void reset_lasterror(SWISH *sw) { sw->lasterror = RC_OK; sw->lasterrorstr[0] = '\0'; } void set_progerrno(int errornum, SWISH *sw, char *msgfmt,...) 
{ va_list args; char *errstr = strerror(errno); sw->lasterror = errornum; va_start (args,msgfmt); vsnprintf (sw->lasterrorstr, MAX_ERROR_STRING_LEN - strlen( errstr ), msgfmt, args); strcat( sw->lasterrorstr, errstr ); va_end (args); } /* only print a warning (also to error_handle) and return */ /* might want to have an enum level WARN_INFO, WARN_ERROR, WARN_CRIT, WARN_DEBUG */ void progwarn(char *msgfmt,...) { va_list args; if ( !error_handle ) error_handle = stdout; va_start (args,msgfmt); fprintf (error_handle, "\nWarning: "); vfprintf (error_handle, msgfmt, args); fprintf (error_handle, "\n"); va_end (args); } /* only print a warning (also to error_handle) and return */ /* might want to have an enum level WARN_INFO, WARN_ERROR, WARN_CRIT, WARN_DEBUG */ /* includes text of errno at end of message */ void progwarnno(char *msgfmt,...) { va_list args; if ( !error_handle ) error_handle = stdout; va_start (args,msgfmt); fprintf (error_handle, "\nWarning: "); vfprintf (error_handle, msgfmt, args); fprintf (error_handle, "%s", strerror(errno)); fprintf (error_handle, "\n"); va_end (args); } typedef struct { int critical; /* If true the calling code needs to call SwishClose */ int error_num; char *message_string; } error_msg_map; /* See error.h for the corresponding numerical values */ static error_msg_map swishErrors[]={ { 0, RC_OK, "" }, { 0, NO_WORDS_IN_SEARCH, "No search words specified" }, { 0, WORDS_TOO_COMMON, "All search words too common to be useful" }, { 0, UNKNOWN_PROPERTY_NAME_IN_SEARCH_DISPLAY, "Unknown property name in display properties" }, { 0, UNKNOWN_PROPERTY_NAME_IN_SEARCH_SORT, "Unknown property name to sort by" }, { 0, INVALID_PROPERTY_TYPE, "Invalid property type" }, { 0, UNKNOWN_METANAME, "Unknown metaname" }, { 0, UNIQUE_WILDCARD_NOT_ALLOWED_IN_WORD, "Single wildcard not allowed as word" }, { 0, WILDCARD_NOT_ALLOWED_AT_WORD_START, "Wildcard may not start a word" }, { 0, WILDCARD_NOT_ALLOWED_WITHIN_WORD, "Wildcard not allowed within a word" }, { 0, 
WORD_NOT_FOUND, "Word not found" }, { 0, SEARCH_WORD_TOO_BIG, "Search word exceeded maxwordlimit setting" }, { 0, QUERY_SYNTAX_ERROR, "Syntax error in query (missing end quote or unbalanced parenthesis?)" }, { 0, PROP_LIMIT_ERROR, "Failed to setup limit by property"}, { 0, SWISH_LISTRESULTS_EOF, "No more results" }, { 0, HEADER_READ_ERROR, "Index Header Error", }, { 1, INDEX_FILE_NOT_FOUND, "Could not open index file" }, { 1, UNKNOWN_INDEX_FILE_FORMAT, "Unknown index file format" }, { 1, INDEX_FILE_IS_EMPTY, "Index file(s) is empty" }, { 1, INDEX_FILE_ERROR, "Index file error" }, { 1, INVALID_SWISH_HANDLE, "Invalid swish handle" }, { 1, INVALID_RESULTS_HANDLE, "Invalid results object" }, }; /***************************************************************** * SwishError * * Pass: * SWISH *sw * * Returns: * value of the last error number, or zero * ******************************************************************/ int SwishError(SWISH * sw) { if (!sw) return INVALID_SWISH_HANDLE; return (sw->lasterror); } /***************************************************************** * SwishErrorString * * Pass: * SWISH *sw * * Returns: * pointer to string of generic error message related to * the last error number * ******************************************************************/ char *SwishErrorString(SWISH *sw) { return getErrorString(sw ? 
sw->lasterror : INVALID_SWISH_HANDLE); } /***************************************************************** * SwishLastErrorMsg * * Pass: * SWISH *sw * * Returns: * pointer to the string comment of the last error message, if any * ******************************************************************/ char *SwishLastErrorMsg(SWISH *sw ) { return sw->lasterrorstr; } /***************************************************************** * getErrorString * * Pass: * error number * * Returns: * pointer to the generic message string for that error number, * or to a static "Invalid error number" message if the number * is not in the swishErrors table * ******************************************************************/ char *getErrorString(int number) { int i; static char message[50]; for (i = 0; i < (int)(sizeof(swishErrors) / sizeof(swishErrors[0])); i++) if ( number == swishErrors[i].error_num ) return swishErrors[i].message_string; sprintf( message, "Invalid error number '%d'", number ); return( message ); } /***************************************************************** * SwishCriticalError * * This returns true if the last error was critical and means that * the swish object should be destroyed * * Pass: * *sw * * Returns: * true if the current sw->lasterror is a critical error * or if the number is invalid or the sw is null * ******************************************************************/ int SwishCriticalError(SWISH *sw) { int i; if ( !sw ) return 1; for (i = 0; i < (int)(sizeof(swishErrors) / sizeof(swishErrors[0])); i++) if ( sw->lasterror == swishErrors[i].error_num ) return swishErrors[i].critical; return 1; } /***************************************************************** * SwishAbortLastError * * Aborts with the error message type, and the optional comment message * * Pass: * SWISH *sw * * Returns: * does not return (progerr() calls exit) * ******************************************************************/ void SwishAbortLastError(SWISH *sw) { if ( sw->lasterror < 0 ) { if ( *(SwishLastErrorMsg( sw )) ) progerr( "%s: %s", SwishErrorString( sw ), SwishLastErrorMsg( sw ) ); else progerr( "%s", 
SwishErrorString( sw ) ); } progerr("Swish aborted with non-negative lasterror"); } swish-e-2.4.7/src/db.h0000664000077100017500000002243611166010110011347 00000000000000/* $Id: db.h 1946 2007-10-22 14:56:35Z karpet $ This file is part of Swish-e. Swish-e is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. Swish-e is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with Swish-e; if not, write to the Free Software Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA See the COPYING file that accompanies the Swish-e distribution for details of the GNU GPL and the special exception available for linking against the Swish-e library. 
** Mon May 9 18:19:34 CDT 2005 ** added GPL ** ** 2001-01 jose initial coding ** */ #ifndef __HasSeenModule_DB #define __HasSeenModule_DB 1 /* Possible Open File modes */ typedef enum { DB_CREATE, DB_READ, DB_READWRITE } DB_OPEN_MODE; void initModule_DB (SWISH *); void freeModule_DB (SWISH *); void write_header(SWISH *ws, int merged_flag ); void update_header(SWISH *, void *, int, int ); void write_index(SWISH *, IndexFILE *); void write_word(SWISH *, ENTRY *, IndexFILE *); #ifdef USE_BTREE void update_wordID(SWISH *, ENTRY *, IndexFILE *); void delete_worddata(SWISH *, sw_off_t, IndexFILE *); #endif void build_worddata(SWISH *, ENTRY *); void write_worddata(SWISH *, ENTRY *, IndexFILE *); sw_off_t read_worddata(SWISH * sw, ENTRY * ep, IndexFILE * indexf, unsigned char **bufer, int *sz_buffer); void add_worddata(SWISH *sw, unsigned char *buffer, int sz_buffer); void write_pathlookuptable_to_header(SWISH *, int id, INDEXDATAHEADER *header, void *DB); void write_MetaNames (SWISH *, int id, INDEXDATAHEADER *header, void *DB); int write_integer_table_to_header(SWISH *, int id, int table[], int table_size, void *DB); void read_header(SWISH *, INDEXDATAHEADER *header, void *DB); void parse_MetaNames_from_buffer(INDEXDATAHEADER *header, char *buffer); void parse_pathlookuptable_from_buffer(INDEXDATAHEADER *header, char *buffer); void parse_integer_table_from_buffer(int table[], int table_size, char *buffer); char *getfilewords(SWISH *sw, int, IndexFILE *); void setTotalWordsPerFile(IndexFILE *,int ,int ); int getTotalWordsInFile(IndexFILE *indexf, int filenum); /* Common DB api */ void *DB_Create (SWISH *sw, char *dbname); void *DB_Open (SWISH *sw, char *dbname, int mode); void DB_Close(SWISH *sw, void *DB); void DB_Remove(SWISH *sw, void *DB); int DB_InitWriteHeader(SWISH *sw, void *DB); int DB_EndWriteHeader(SWISH *sw, void *DB); int DB_WriteHeaderData(SWISH *sw, int id, unsigned char *s, int len, void *DB); int DB_InitReadHeader(SWISH *sw, void *DB); int 
DB_ReadHeaderData(SWISH *sw, int *id, unsigned char **s, int *len, void *DB); int DB_EndReadHeader(SWISH *sw, void *DB); int DB_InitWriteWords(SWISH *sw, void *DB); sw_off_t DB_GetWordID(SWISH *sw, void *DB); int DB_WriteWord(SWISH *sw, char *word, sw_off_t wordID, void *DB); #ifdef USE_BTREE int DB_UpdateWordID(SWISH *sw, char *word, sw_off_t wordID, void *DB); int DB_DeleteWordData(SWISH *sw,sw_off_t wordID, void *DB); #endif int DB_WriteWordHash(SWISH *sw, char *word, sw_off_t wordID, void *DB); long DB_WriteWordData(SWISH *sw, sw_off_t wordID, unsigned char *worddata, int data_size, int saved_bytes, void *DB); int DB_EndWriteWords(SWISH *sw, void *DB); int DB_InitReadWords(SWISH *sw, void *DB); int DB_ReadWordHash(SWISH *sw, char *word, sw_off_t *wordID, void *DB); int DB_ReadFirstWordInvertedIndex(SWISH *sw, char *word, char **resultword, sw_off_t *wordID, void *DB); int DB_ReadNextWordInvertedIndex(SWISH *sw, char *word, char **resultword, sw_off_t *wordID, void *DB); long DB_ReadWordData(SWISH *sw, sw_off_t wordID, unsigned char **worddata, int *data_size, int *saved_bytes, void *DB); int DB_EndReadWords(SWISH *sw, void *DB); #ifdef USE_PRESORT_ARRAY int DB_InitWriteSortedIndex(SWISH *sw, void *DB, int n_props ); int DB_WriteSortedIndex(SWISH *sw, int propID, int *data, int sz_data,void *DB); #else int DB_InitWriteSortedIndex(SWISH *sw, void *DB ); int DB_WriteSortedIndex(SWISH *sw, int propID, unsigned char *data, int sz_data,void *DB); #endif int DB_EndWriteSortedIndex(SWISH *sw, void *DB); int DB_InitReadSortedIndex(SWISH *sw, void *DB); int DB_ReadSortedIndex(SWISH *sw, int propID, unsigned char **data, int *sz_data,void *DB); /* this is defined in db_native.h now int DB_ReadSortedData(SWISH *sw, int *data,int index, int *value, void *DB); */ #ifdef USE_PRESORT_ARRAY #define DB_ReadSortedData(data, index) (ARRAY_Get((ARRAY *)data,index)) #else #define DB_ReadSortedData(data, index) (data[index]) #endif int DB_EndReadSortedIndex(SWISH *sw, void *DB); int 
DB_WriteFileNum(SWISH *sw, int filenum, unsigned char *filedata,int sz_filedata, void *DB); int DB_ReadFileNum(SWISH *sw, unsigned char *filedata, void *DB); int DB_CheckFileNum(SWISH *sw, int filenum, void *DB); int DB_RemoveFileNum(SWISH *sw, int filenum, void *DB); int DB_InitWriteProperties(SWISH *sw, void *DB); void DB_WriteProperty( SWISH *sw, IndexFILE *indexf, FileRec *fi, int propID, char *buffer, int buf_len, int uncompressed_len, void *db); void DB_WritePropPositions(SWISH *sw, IndexFILE *indexf, FileRec *fi, void *db); void DB_ReadPropPositions(SWISH *sw, IndexFILE *indexf, FileRec *fi, void *db); char *DB_ReadProperty(SWISH *sw, IndexFILE *indexf, FileRec *fi, int propID, int *buf_len, int *uncompressed_len, void *db); void DB_Reopen_PropertiesForRead(SWISH *sw, void *DB); #ifdef USE_BTREE int DB_WriteTotalWordsPerFile(SWISH *sw, int idx, int wordcount, void *DB); int DB_ReadTotalWordsPerFile(SWISH *sw, int idx, int *wordcount, void *DB); #endif struct MOD_DB { char *DB_name; /* short name for data source */ void * (*DB_Create) (SWISH *sw, char *dbname); void * (*DB_Open) (SWISH *sw, char *dbname, int mode); void (*DB_Close) (void *DB); void (*DB_Remove) (void *DB); int (*DB_InitWriteHeader) (void *DB); int (*DB_WriteHeaderData) (int id, unsigned char *s, int len, void *DB); int (*DB_EndWriteHeader) (void *DB); int (*DB_InitReadHeader) (void *DB); int (*DB_ReadHeaderData) (int *id, unsigned char **s, int *len, void *DB); int (*DB_EndReadHeader) (void *DB); int (*DB_InitWriteWords) (void *DB); sw_off_t (*DB_GetWordID) (void *DB); int (*DB_WriteWord) (char *word, sw_off_t wordID, void *DB); #ifdef USE_BTREE int (*DB_UpdateWordID)(char *word, sw_off_t new_wordID, void *DB); int (*DB_DeleteWordData)(sw_off_t wordID, void *DB); #endif int (*DB_WriteWordHash) (char *word, sw_off_t wordID, void *DB); long (*DB_WriteWordData) (sw_off_t wordID, unsigned char *worddata, int data_size, int saved_bytes, void *DB); int (*DB_EndWriteWords) (void *DB); int 
(*DB_InitReadWords) (void *DB); int (*DB_ReadWordHash) (char *word, sw_off_t *wordID, void *DB); int (*DB_ReadFirstWordInvertedIndex) (char *word, char **resultword, sw_off_t *wordID, void *DB); int (*DB_ReadNextWordInvertedIndex) (char *word, char **resultword, sw_off_t *wordID, void *DB); long (*DB_ReadWordData) (sw_off_t wordID, unsigned char **worddata, int *data_size, int *saved_bytes, void *DB); int (*DB_EndReadWords) (void *DB); int (*DB_WriteFileNum) (int filenum, unsigned char *filedata,int sz_filedata, void *DB); int (*DB_ReadFileNum) ( unsigned char *filedata, void *DB); int (*DB_CheckFileNum) (int filenum, void *DB); int (*DB_RemoveFileNum) (int filenum, void *DB); #ifdef USE_PRESORT_ARRAY int (*DB_InitWriteSortedIndex) (void *DB, int n_props); int (*DB_WriteSortedIndex) (int propID, int *data, int sz_data,void *DB); #else int (*DB_InitWriteSortedIndex) (void *DB); int (*DB_WriteSortedIndex) (int propID, unsigned char *data, int sz_data,void *DB); #endif int (*DB_EndWriteSortedIndex) (void *DB); int (*DB_InitReadSortedIndex) (void *DB); int (*DB_ReadSortedIndex) (int propID, unsigned char **data, int *sz_data,void *DB); int (*DB_ReadSortedData) (int *data,int index, int *value, void *DB); int (*DB_EndReadSortedIndex) (void *DB); int (*DB_InitWriteProperties) (void *DB); void (*DB_WriteProperty)( IndexFILE *indexf, FileRec *fi, int propID, char *buffer, int buf_len, int uncompressed_len, void *db); void (*DB_WritePropPositions)(IndexFILE *indexf, FileRec *fi, void *db); void (*DB_ReadPropPositions)(IndexFILE *indexf, FileRec *fi, void *db); char *(*DB_ReadProperty)(IndexFILE *indexf, FileRec *fi, int propID, int *buf_len, int *uncompressed_len, void *db); void (*DB_Reopen_PropertiesForRead)(void *DB); #ifdef USE_BTREE int (*DB_WriteTotalWordsPerFile)(SWISH *sw, int idx, int wordcount, void *DB); int (*DB_ReadTotalWordsPerFile)(SWISH *sw, int idx, int *wordcount, void *DB); #endif }; #endif 
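The struct MOD_DB declared above is a table of function pointers, so the generic DB_* wrappers can forward to whichever storage backend filled in the table. The following is only a minimal sketch of that dispatch pattern, not Swish-e code: struct mini_db, mem_backend, mini_write_word and index_two_words are hypothetical names invented for illustration, and the toy in-memory backend merely counts the words it is given.

```c
/* Hypothetical, cut-down analogue of struct MOD_DB: a backend is a
 * named table of function pointers. None of these names exist in Swish-e. */
struct mini_db {
    const char *name;                                      /* short backend name */
    void *(*open)(const char *dbname);                     /* returns backend handle */
    int (*write_word)(const char *word, long word_id, void *db);
    void (*close)(void *db);
};

/* State kept by the toy in-memory backend. */
struct mem_state {
    int words_written;
    long last_id;
};

static struct mem_state g_state;

static void *mem_open(const char *dbname)
{
    (void)dbname;                    /* the toy backend ignores the name */
    g_state.words_written = 0;
    g_state.last_id = -1;
    return &g_state;
}

static int mem_write_word(const char *word, long word_id, void *db)
{
    struct mem_state *s = (struct mem_state *)db;
    (void)word;                      /* only the count and last ID are kept */
    s->words_written++;
    s->last_id = word_id;
    return 0;
}

static void mem_close(void *db)
{
    (void)db;                        /* nothing to release */
}

/* The backend registers itself by filling in the table. */
static const struct mini_db mem_backend = {
    "memory", mem_open, mem_write_word, mem_close
};

/* Generic wrapper: the caller never names a concrete backend function. */
static int mini_write_word(const struct mini_db *be, const char *word,
                           long word_id, void *db)
{
    return be->write_word(word, word_id, db);
}

/* Writes two words through the table and reports how many were stored. */
int index_two_words(void)
{
    void *db = mem_backend.open("example.index");
    int n;

    mini_write_word(&mem_backend, "swish", 1, db);
    mini_write_word(&mem_backend, "index", 2, db);
    n = ((struct mem_state *)db)->words_written;
    mem_backend.close(db);
    return n;
}
```

Swapping in a different backend under this sketch means supplying another filled-in struct mini_db; callers of mini_write_word stay unchanged, which is the property the function-pointer table in db.h appears designed to provide.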
swish-e-2.4.7/src/Makefile.am0000664000077100017500000000550711166010110012645 00000000000000SUBDIRS = expat replace snowball # Using AM_CPPFLAGS instead of per-target flags means object names # don't get renamed. If using per-target _CPPFLAGS then would need # to update configure.in to use the prefix names on all optional objects # passed in. (e.g. $BTREE_OBJS). AM_CPPFLAGS = -Dlibexecdir=\"${libexecdir}\" \ -DPATH_SEPARATOR=\"${PATH_SEPARATOR}\" \ $(Z_CFLAGS) $(PCRE_CFLAGS) $(LIBXML2_CFLAGS) -Ireplace # Until can figure out how to use AM_AUTOMAKE_INIT([-Wall]) AM_CFLAGS = -Wall @LARGEFILES_MACROS@ bin_PROGRAMS = swish-e swish_e_SOURCES = swish.c swish.h keychar_out.c keychar_out.h dump.c dump.h result_output.c result_output.h swish_e_LDADD = libswishindex.la libswish-e.la EXTRA_PROGRAMS = libtest libtest_SOURCES = libtest.c libtest_LDADD = libswish-e.la libtest_LDFLAGS = -static ## can also use -all-static for a stand-alone binary. ## The search library ## -- note that libreplace may have code specific for indexing only. ## -- will that be a problem on systems? 
lib_LTLIBRARIES = libswish-e.la libswish_e_la_LDFLAGS = -no-undefined -version-info 2:0:0 $(Z_LIBS) $(PCRE_LIBS) libswish_e_la_LIBADD = $(BTREE_OBJS) replace/libreplace.la snowball/libsnowball.la libswish_e_la_DEPENDENCIES = $(libswish_e_la_LIBADD) libswish_e_la_SOURCES = \ config.h \ search.c search.h \ swish2.c \ swish_words.c swish_words.h \ proplimit.c proplimit.h \ rank.c rank.h \ db_read.c db.h \ result_sort.c result_sort.h \ hash.c hash.h \ compress.c compress.h \ db_native.c db_native.h \ ramdisk.c ramdisk.h \ check.c check.h \ error.c error.h \ list.c list.h \ mem.c mem.h sys.h\ swstring.c swstring.h \ docprop.c docprop.h \ metanames.c metanames.h \ headers.c headers.h \ swish_qsort.c swish_qsort.h \ date_time.c date_time.h \ double_metaphone.c double_metaphone.h \ stemmer.c stemmer.h \ soundex.c soundex.h EXTRA_libswish_e_la_SOURCES = \ btree.c btree.h \ array.c array.h \ worddata.c worddata.h \ fhash.c fhash.h ## Convenience lib for indexing code noinst_LTLIBRARIES = libswishindex.la libswishindex_la_LIBADD = expat/libswexpat.la $(LIBXML2_OBJS) snowball/libsnowball.la libswishindex_la_LDFLAGS = $(LIBXML2_LIB) $(Z_LIBS) $(PCRE_LIBS) ## in case these change libswishindex_la_DEPENDENCIES = $(libswishindex_la_LIBADD) EXTRA_libswishindex_la_SOURCES = parser.c parser.h libswishindex_la_SOURCES = \ fs.c fs.h \ http.c http.h \ httpserver.c httpserver.h \ extprog.c extprog.h \ bash.c bash.h \ methods.c \ html.c html.h \ txt.c txt.h \ xml.c xml.h \ entities.c entities.h \ index.c index.h \ merge.c merge.h \ pre_sort.c \ file.c file.h \ filter.c filter.h \ parse_conffile.c parse_conffile.h \ swregex.c swregex.h \ db_write.c \ docprop_write.c \ getruntime.c getruntime.h include_HEADERS = swish-e.h libexec_SCRIPTS = swishspider EXTRA_DIST = swishspider swish-e-2.4.7/src/ramdisk.h0000664000077100017500000000257211166010110012413 00000000000000/* $Id: ramdisk.h 1946 2007-10-22 14:56:35Z karpet $ This file is part of Swish-e. 
Swish-e is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. Swish-e is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with Swish-e; if not, write to the Free Software Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA See the COPYING file that accompanies the Swish-e distribution for details of the GNU GPL and the special exception available for linking against the Swish-e library. ** Mon May 9 18:19:34 CDT 2005 ** added GPL */ struct ramdisk *ramdisk_create(char *, int); int ramdisk_close(FILE *); void add_buffer_ramdisk(struct ramdisk *); sw_off_t ramdisk_tell(FILE *); size_t ramdisk_write(const void *,size_t, size_t, FILE *); int ramdisk_seek(FILE *,sw_off_t, int ); size_t ramdisk_read(void *, size_t, size_t, FILE *); int ramdisk_getc(FILE *); int ramdisk_putc(int , FILE *); swish-e-2.4.7/src/parser.h0000775000077100017500000000237611166010110012262 00000000000000/* $Id: parser.h 1736 2005-05-12 15:41:22Z karman $ ** ** ** The prototypes This file is part of Swish-e. Swish-e is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. Swish-e is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. 
You should have received a copy of the GNU General Public License along with Swish-e; if not, write to the Free Software Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA See the COPYING file that accompanies the Swish-e distribution for details of the GNU GPL and the special exception available for linking against the Swish-e library. ** Mon May 9 18:19:34 CDT 2005 ** added GPL */ int parse_HTML(SWISH *sw, FileProp *fprop, FileRec *fi, char *buffer); int parse_XML(SWISH *sw, FileProp *fprop, FileRec *fi, char *buffer); int parse_TXT(SWISH * sw, FileProp * fprop, FileRec *fi, char *buffer); swish-e-2.4.7/src/docprop_write.c0000775000077100017500000001302011166010110013625 00000000000000/* ** $Id: docprop_write.c 1945 2007-10-22 14:54:07Z karpet $ This file is part of Swish-e. Swish-e is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. Swish-e is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with Swish-e; if not, write to the Free Software Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA See the COPYING file that accompanies the Swish-e distribution for details of the GNU GPL and the special exception available for linking against the Swish-e library. 
** Mon May 9 15:51:39 CDT 2005 ** added GPL ** */ #include "swish.h" #include "mem.h" #include "swstring.h" #include "error.h" #include "search.h" #include "metanames.h" #include "merge.h" #include "docprop.h" #include "db.h" #ifdef HAVE_ZLIB #include <zlib.h> #endif static unsigned char *compress_property( propEntry *prop, SWISH *sw, int *buf_len, int *uncompressed_len ); /******************************************************************* * Write Properties to disk, and save seek pointers * * DB_WriteProperty - should write filenum:propID as the key * DB_WritePropPositions - writes the stored positions * * * *********************************************************************/ void WritePropertiesToDisk( SWISH *sw , FileRec *fi ) { IndexFILE *indexf = sw->indexlist; INDEXDATAHEADER *header = &indexf->header; docProperties *docProperties = fi->docProperties; propEntry *prop; int uncompressed_len; unsigned char *buf; int buf_len; int count; int i; /* initialize the first time called */ if ( header->property_count == 0 ) { /* Get the current seek position in the index, since will now write the file info */ DB_InitWriteProperties(sw, indexf->DB); /* build a list of properties that are in use */ /* And create the prop index to propID (metaID) mapping arrays */ init_property_list(header); } if ( (count = header->property_count) <= 0) return; /* any props exist, unlikely, but need to save a space. */ if ( !docProperties ) { DB_WritePropPositions( sw, indexf, fi, indexf->DB); return; } for( i = 0; i < count; i++ ) { /* convert the count to a propID */ int propID = header->propIDX_to_metaID[i]; // here's the array created in init_property_list() /* Here's why I need to redo the properties so it's always header->property_count size in the fi rec */ /* The mapping is all a temporary kludge */ if ( propID >= docProperties->n ) // Does this file have this many properties? continue; if ( !(prop = docProperties->propEntry[propID])) // does this file have this prop? 
continue; buf = compress_property( prop, sw, &buf_len, &uncompressed_len ); DB_WriteProperty( sw, indexf, fi, propID, (char *)buf, buf_len, uncompressed_len, indexf->DB ); } /* Write the position data */ DB_WritePropPositions( sw, indexf, fi, indexf->DB); freeDocProperties( docProperties ); fi->docProperties = NULL; } /******************************************************************* * Compress a Property * * Call with: * propEntry - the in data and its length * SWISH - to get access to the common buffer * *uncompress_len - returns the length of the original buffer, or zero if not compressed * *buf_len - the length of the returned buffer * * Returns: * pointer the buffer of buf_len size * * *********************************************************************/ static unsigned char *compress_property( propEntry *prop, SWISH *sw, int *buf_len, int *uncompressed_len ) { #ifndef HAVE_ZLIB *buf_len = prop->propLen; *uncompressed_len = 0; return prop->propValue; #else unsigned char *PropBuf; /* For compressing and uncompressing */ uLongf dest_size; int zlib_status = 0; /* Don't bother compressing smaller items */ if ( prop->propLen < MIN_PROP_COMPRESS_SIZE ) { *buf_len = prop->propLen; *uncompressed_len = 0; return prop->propValue; } /* Buffer should be +1% + a few bytes. */ dest_size = (uLongf)(prop->propLen + ( prop->propLen / 100 ) + 1000); // way more than should be needed /* Get an output buffer */ PropBuf = allocatePropIOBuffer( sw, dest_size ); zlib_status = compress2( (Bytef *)PropBuf, &dest_size, prop->propValue, prop->propLen, sw->PropCompressionLevel); if ( zlib_status != Z_OK ) progerr("Property Compression Error. 
zlib compress2 returned: %d Prop len: %d compress buf size: %d compress level:%d", zlib_status, prop->propLen, (int)dest_size,sw->PropCompressionLevel); /* Make sure it's compressed enough */ if ( dest_size >= prop->propLen ) { *buf_len = prop->propLen; *uncompressed_len = 0; return prop->propValue; } *buf_len = (int)dest_size; *uncompressed_len = prop->propLen; return PropBuf; #endif } swish-e-2.4.7/src/result_output.h0000664000077100017500000000513711166010110013717 00000000000000/* $Id: result_output.h 1736 2005-05-12 15:41:22Z karman $ ** This file is part of Swish-e. Swish-e is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. Swish-e is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with Swish-e; if not, write to the Free Software Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA See the COPYING file that accompanies the Swish-e distribution for details of the GNU GPL and the special exception available for linking against the Swish-e library. ** Mon May 9 18:19:34 CDT 2005 ** added GPL ** ** 2001-01 R. Scherg (rasc) initial coding ** */ #ifndef __HasSeenModule_ResultOutput #define __HasSeenModule_ResultOutput 1 /* -- module data */ struct ResultExtFmtStrList { /* -x extended format by defined names */ char *name; char *fmtstr; struct ResultExtFmtStrList *next; struct ResultExtFmtStrList *nodep; }; /* -- global module data structure */ struct MOD_ResultOutput { /* public: */ /* private: don't use outside this module! 
*/ /* -x extended format by defined names */ char *extendedformat; /* -x "fmt", holds fmt or NULL */ char *stdResultFieldDelimiter; /* -d delimiter , (def: config.h) v1.x output style */ /* ResultExtendedFormat predefined List see: -x */ struct ResultExtFmtStrList *resultextfmtlist; int numPropertiesToDisplay; int currentMaxPropertiesToDisplay; char **propNameToDisplay; int **propIDToDisplay; }; void addSearchResultDisplayProperty (SWISH *, char* ); void initModule_ResultOutput (SWISH *sw); void freeModule_ResultOutput (SWISH *sw); int configModule_ResultOutput (SWISH *sw, StringList *sl); void initPrintExtResult (SWISH *sw, char *fmt); void printSortedResults(RESULTS_OBJECT *results, int begin, int maxhits); int initSearchResultProperties(SWISH *sw); char *hasResultExtFmtStr (SWISH *sw, char *name); int resultHeaderOut (SWISH *sw, int min_verbose, char *prtfmt, ...); void resultPrintHeader (SWISH *sw, int min_verbose, INDEXDATAHEADER *h, char *pathname, int merged); #endif swish-e-2.4.7/src/keychar_out.c0000664000077100017500000000455611166010110013275 00000000000000/* $Id: keychar_out.c 1945 2007-10-22 14:54:07Z karpet $ ** This file is part of Swish-e. Swish-e is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. Swish-e is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with Swish-e; if not, write to the Free Software Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA See the COPYING file that accompanies the Swish-e distribution for details of the GNU GPL and the special exception available for linking against the Swish-e library. 
** Mon May 9 15:23:45 CDT 2005 ** added GPL ** ** ** ** 2001-03-20 rasc own module for this routine (from swish.c) ** */ #include "swish.h" #include "swstring.h" #include "error.h" #include "mem.h" #include "search.h" #include "result_output.h" #include "db.h" #include "keychar_out.h" /* ** ---------------------------------------------- ** ** Module code starts here ** ** ---------------------------------------------- */ /* -- output all indexed words starting with character "keychar" -- keychar == '*' prints all words */ void OutputKeyChar (SWISH *sw, int keychar) { IndexFILE *tmpindexlist; int keychar2; char *keywords; if ( !SwishAttach(sw) ) SwishAbortLastError( sw ); resultHeaderOut(sw,1, "%s\n", INDEXHEADER); /* print out "original" search words */ for(tmpindexlist=sw->indexlist;tmpindexlist;tmpindexlist=tmpindexlist->next) { resultHeaderOut(sw,1, "%s:",tmpindexlist->line); if(keychar=='*') { for(keychar2=1;keychar2<256;keychar2++) { keywords=getfilewords(sw,(unsigned char )keychar2,tmpindexlist); for(;keywords && keywords[0];keywords+=strlen(keywords)+1) resultHeaderOut(sw,1, " %s",keywords); } } else { keywords=getfilewords(sw,keychar,tmpindexlist); for(;keywords && keywords[0];keywords+=strlen(keywords)+1) resultHeaderOut(sw,1, " %s",keywords); } resultHeaderOut(sw,1, "\n"); } return; } swish-e-2.4.7/src/swish_qsort.h0000664000077100017500000000404111166010110013337 00000000000000/* $Id: swish_qsort.h 1717 2005-05-09 23:36:20Z karman $ * * Copyright (c) 1992, 1993 * The Regents of the University of California. All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. 
Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * 4. Neither the name of the University nor the names of its contributors * may be used to endorse or promote products derived from this software * without specific prior written permission. * * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. * * $FreeBSD: src/lib/libc/stdlib/qsort.c,v 1.8 1999/08/28 00:01:35 peter Exp $ * * Renamed swish_qsort for use with swish. [wsm] June 2001 * * Mon May 9 18:02:15 CDT 2005 * #3 in BSD license removed per * ftp://ftp.cs.berkeley.edu/pub/4bsd/README.Impt.License.Change * this also makes it GPL friendly. */ typedef int cmp_t (const void *, const void *); void swish_qsort(void *a, size_t n, size_t es, cmp_t *cmp); swish-e-2.4.7/src/swish_words.h0000775000077100017500000000372211166010110013335 00000000000000/* $Id: swish_words.h 1811 2006-08-24 04:06:28Z karman $ This file is part of Swish-e. Swish-e is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. 
Swish-e is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with Swish-e; if not, write to the Free Software Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA See the COPYING file that accompanies the Swish-e distribution for details of the GNU GPL and the special exception available for linking against the Swish-e library. ** Mon May 9 18:15:43 CDT 2005 ** added GPL */ #ifndef SWISH_WORDS_H #define SWISH_WORDS_H 1 #ifdef __cplusplus extern "C" { #endif /* internal representation, may not be changed */ /* BTW, keep strlen(MAGIC_NOT_WORD) >= strlen(NOT_WORD) to avoid string reallocation in the code */ #define AND_WORD "" #define OR_WORD "" #define NOT_WORD "" #define MAGIC_NOT_WORD "<__not__>" #define PHRASE_WORD "" #define AND_NOT_WORD "" #define NEAR_WORD "" /* internal search rule numbers */ #define NO_RULE 0 #define AND_RULE 1 #define OR_RULE 2 #define NOT_RULE 3 #define PHRASE_RULE 4 #define AND_NOT_RULE 5 #define NEAR_RULE 6 struct swline *parse_swish_query( DB_RESULTS *db_results ); void initModule_Swish_Words (SWISH *sw); void freeModule_Swish_Words (SWISH *sw); void stripIgnoreFirstChars(INDEXDATAHEADER *, char *); void stripIgnoreLastChars(INDEXDATAHEADER *, char *); #ifdef __cplusplus } #endif /* __cplusplus */ #endif swish-e-2.4.7/src/stemmer.h0000664000077100017500000000753511166010110012441 00000000000000/* ** stemmer.h $Id: stemmer.h 1949 2007-10-24 03:02:08Z karpet $ This file is part of Swish-e. Swish-e is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. 
Swish-e is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with Swish-e; if not, write to the Free Software Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA See the COPYING file that accompanies the Swish-e distribution for details of the GNU GPL and the special exception available for linking against the Swish-e library. ** Mon May 9 18:19:34 CDT 2005 ** added GPL */ #ifndef STEMMER_H #define STEMMER_H 1 #include "snowball/api.h" /* For snowball's SN_env */ /* * Warning: * Don't change the order of these as it will break existing indexes. * The index stores this enum number. */ typedef enum { FUZZY_NONE = 0, FUZZY_STEMMING_EN, FUZZY_SOUNDEX, FUZZY_METAPHONE, FUZZY_DOUBLE_METAPHONE, FUZZY_STEMMING_ES, FUZZY_STEMMING_FR, FUZZY_STEMMING_IT, FUZZY_STEMMING_PT, FUZZY_STEMMING_DE, FUZZY_STEMMING_NL, FUZZY_STEMMING_EN1, FUZZY_STEMMING_EN2, FUZZY_STEMMING_NO, FUZZY_STEMMING_SE, FUZZY_STEMMING_DK, FUZZY_STEMMING_RU, FUZZY_STEMMING_FI, FUZZY_STEMMING_RO, FUZZY_STEMMING_HU } FuzzyIndexType; typedef enum { STEM_OK, STEM_NOT_ALPHA, /* not all alpha */ STEM_TOO_SMALL, /* word too small to be stemmed */ STEM_WORD_TOO_BIG, /* word is too large to stem, result would be too large */ STEM_TO_NOTHING /* word stemmed to the null string */ } STEM_RETURNS; /* * This structure manages the results from a stemming operation. 
*/ typedef struct { STEM_RETURNS error; /* return value from stemmer */ const char *orig_word; /* address of input string */ int list_size; /* number of entries in the string list */ char **word_list; /* pointer to list of stemmed words */ int free_strings; /* flag if true means free individual strings in string list */ char *string_list[1]; /* null terminated array of string pointers */ } FUZZY_WORD; /* FUZZY_OBJECT and FUZZY_OPTS reference each other, so pre-define */ typedef struct s_FUZZY_OBJECT FUZZY_OBJECT; typedef struct s_FUZZY_OPTS FUZZY_OPTS; /* * This structure defines the layout of fuzzy options table. * The table lists the available stemmers. */ struct s_FUZZY_OPTS { FuzzyIndexType fuzzy_mode; char *name; FUZZY_WORD *(*routine) ( FUZZY_OBJECT *fi, const char *inword ); struct SN_env *(*init) (void); void (*stemmer_free) (struct SN_env *); int (*lang_stem)(struct SN_env *); }; /* * This structure caches what stemmer to use. */ struct s_FUZZY_OBJECT { FUZZY_OPTS *stemmer; /* stemmer options to use */ struct SN_env *snowball_options; /* snowball's data */ }; FUZZY_WORD *fuzzy_convert( FUZZY_OBJECT *fi, const char *inword ); void fuzzy_free_word( FUZZY_WORD *fd ); FUZZY_WORD *create_fuzzy_word( const char *input_word, int word_count ); FuzzyIndexType fuzzy_mode_value( FUZZY_OBJECT *fi ); const char *fuzzy_string( FUZZY_OBJECT *fi ); FUZZY_OBJECT *set_fuzzy_mode( FUZZY_OBJECT *fi, char *param ); FUZZY_OBJECT *get_fuzzy_mode( FUZZY_OBJECT *fi, int fuzzy ); void free_fuzzy_mode( FUZZY_OBJECT *fi ); int stemmer_applied( FUZZY_OBJECT *fi ); void dump_fuzzy_list( void ); #endif swish-e-2.4.7/src/worddata.h0000664000077100017500000000450011166010110012557 00000000000000/* $Id: worddata.h 1946 2007-10-22 14:56:35Z karpet $ This file is part of Swish-e. 
Swish-e is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. Swish-e is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with Swish-e; if not, write to the Free Software Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA See the COPYING file that accompanies the Swish-e distribution for details of the GNU GPL and the special exception available for linking against the Swish-e library. ** Mon May 9 18:15:43 CDT 2005 ** added GPL */ #define WORDDATA_MAX_REUSABLE_PAGES 16 typedef struct WORDDATA_Reusable_Page { sw_off_t page_number; int page_size; } WORDDATA_Reusable_Page; typedef struct WORDDATA_Page { sw_off_t page_number; int used_blocks; int n; int modified; int in_use; struct WORDDATA_Page *next_cache; unsigned char data[0]; /* Page data */ } WORDDATA_Page; #define WORDDATA_CACHE_SIZE 97 typedef struct WORDDATA { WORDDATA_Page *last_put_page; /* last page after an insert (put) */ WORDDATA_Page *last_del_page; /* last page after a delete (del) */ WORDDATA_Page *last_get_page; /* last page after a read (get) */ struct WORDDATA_Page *cache[WORDDATA_CACHE_SIZE]; int page_counter; sw_off_t lastid; int num_Reusable_Pages; WORDDATA_Reusable_Page Reusable_Pages[WORDDATA_MAX_REUSABLE_PAGES]; FILE *fp; } WORDDATA; WORDDATA *WORDDATA_Open(FILE *fp); void WORDDATA_Close(WORDDATA *bt); sw_off_t WORDDATA_Put(WORDDATA *b, unsigned int len, unsigned char *data); unsigned char * WORDDATA_Get(WORDDATA *b, sw_off_t global_id, unsigned int *len); void WORDDATA_Del(WORDDATA *b, sw_off_t global_id, unsigned int *len); 
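/*
 * Illustration only, not part of Swish-e: WORDDATA_Page above ends in a
 * zero-length "data" array, which suggests the page header and the page
 * bytes share one allocation. The sketch below shows that layout with a
 * C99 flexible array member. The names demo_page, demo_page_new and
 * demo_page_store are hypothetical, and the use of malloc() is an
 * assumption; the real allocation code is not shown in this header.
 */

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical page: a small header followed directly by the page bytes. */
struct demo_page {
    long          page_number;
    int           modified;
    unsigned char data[];      /* C99 flexible array member */
};

/* Allocate header and page_size data bytes in a single block. */
static struct demo_page *demo_page_new(long page_number, size_t page_size)
{
    struct demo_page *pg = malloc(sizeof *pg + page_size);
    if (pg == NULL)
        return NULL;
    pg->page_number = page_number;
    pg->modified = 0;
    memset(pg->data, 0, page_size);
    return pg;
}

/* Copy len bytes into the page at offset and mark the page dirty.
   The caller must ensure offset + len fits inside the page. */
static void demo_page_store(struct demo_page *pg, size_t offset,
                            const void *src, size_t len)
{
    memcpy(pg->data + offset, src, len);
    pg->modified = 1;
}
```

/*
 * A single free() releases both the header and the data, which is the main
 * convenience of keeping them in one block.
 */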
swish-e-2.4.7/src/swish_words.c0000775000077100017500000011142111166010110013324 00000000000000/* $Id: swish_words.c 2145 2008-06-06 02:09:42Z karpet $ This file is part of Swish-e. Swish-e is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. Swish-e is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with Swish-e; if not, write to the Free Software Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA See the COPYING file that accompanies the Swish-e distribution for details of the GNU GPL and the special exception available for linking against the Swish-e library. Mon May 9 10:57:22 CDT 2005 -- added GPL notice ** ** 2001-05-23 moseley created - replaced parser in search.c ** ** 2001-12-11 moseley, updated to deal with swish operators inside of phrases ** Still broken with regard to double-quotes inside of phrases ** Very unlikely someone would want to search for a single double quote ** within a phrase. It currently works if the double-quotes doesn't have ** white space around. Really should tag the words as being operators, or ** or "swish words", or let the backslash stay in the query until searching. ** */ #include "swish.h" #include "mem.h" #include "swstring.h" #include "search.h" #include "index.h" #include "file.h" #include "list.h" #include "hash.h" #include "stemmer.h" #include "double_metaphone.h" #include "error.h" #include "metanames.h" #include "config.h" // for _AND_WORD... //#include "search_alt.h" // for AND_WORD... 
humm maybe needs better organization #include "swish_words.h" static struct swline *tokenize_query_string( SEARCH_OBJECT *srch, char *words, INDEXDATAHEADER *header ); static struct swline *ignore_words_in_query(DB_RESULTS *db_results, struct swline *searchwordlist); static struct swline *fixmetanames(struct swline *); static struct swline *fixnot1(struct swline *); static struct swline *fixnot2(struct swline *); static struct swline *expandphrase(struct swline *, char); static char *isBooleanOperatorWord( char * word ); static void print_swline( char *msg, struct swline *word_list ) { #ifdef SWISH_WORDS_DEBUG struct swline *sl = word_list; printf("%s: ", msg ); while ( sl ) { printf("%s ", sl->line ); sl = sl->next; } printf("\n"); #endif } struct MOD_Swish_Words { char *word; int lenword; }; /* -- init structures for this module */ void initModule_Swish_Words (SWISH *sw) { struct MOD_Swish_Words *self; self = (struct MOD_Swish_Words *) emalloc(sizeof(struct MOD_Swish_Words)); sw->SwishWords = self; /* initialize buffers used by indexstring */ self->word = (char *) emalloc((self->lenword = MAXWORDLEN) + 1); return; } void freeModule_Swish_Words (SWISH *sw) { struct MOD_Swish_Words *self = sw->SwishWords; efree( self->word ); efree ( self ); sw->SwishWords = NULL; return; } /* Returns true if the character is a search operator */ /* this could be a macro, but gcc is probably smart enough */ static int isSearchOperatorChar( int c, int phrase_delimiter, int inphrase ) { return inphrase ? ( '*' == c || '?' == c || c == phrase_delimiter ) : ( '(' == c || ')' == c || '=' == c || '*' == c || '?' == c || c == phrase_delimiter ); } /* This simply tokenizes by whitespace and by the special characters "()=" */ /* If within a phrase, then just splits by whitespace */ /* Funny how argv was joined into a string just to be split again... 
*/ // $$$ BUG in next_token() is that "search *" or "search*" becomes two tokens, and then // is patched back together at the end (so "search *" becomes "search*"). static int next_token( char **buf, char **word, int *lenword, int phrase_delimiter, int inphrase ) { int i; int backslash; int leading_space = 0; **word = '\0'; /* skip any leading whitespace */ while ( **buf && isspace( (unsigned char) **buf) ) { (*buf)++; leading_space = 1; /* for catching single wild cards */ /* current parsing will join "foo" and "*" into "foo*" */ } /* extract out word */ i = 0; backslash = 0; while ( **buf && !isspace( (unsigned char) **buf) ) { // This should be looking at swish words, not raw input //if ( i > max_size + 4 ) /* leave a little room for operators */ // progerr( "Search word exceeded maxwordlimit setting." ); /* reallocate buffer, if needed -- only if maxwordlimit was set larger than MAXWORDLEN (1000) */ if ( i == *lenword ) { *lenword *= 2; *word = erealloc(*word, *lenword + 1); } /* backslash says take next char as-is */ /* note that you cannot backslash whitespace */ if ( '\\' == **buf && ! backslash++ ) { (*buf)++; continue; } if ( backslash || !isSearchOperatorChar( (unsigned char) **buf, phrase_delimiter, inphrase ) ) { backslash = 0; (*word)[i++] = **buf; /* can't this be done in one line? */ (*buf)++; } else if ( **buf == '?' ) { /* ? is a search operator so fails first test above but we want to pass it through like a normal word char if it is not the first character in the word */ if(!i) return WILDCARD_NOT_ALLOWED_AT_WORD_START; backslash = 0; (*word)[i++] = **buf; /* can't this be done in one line? 
*/ (*buf)++; } else /* this is a search operator char */ { if ( **word ) /* break if characters already found - end of this token */ break; /* special hacks for wild cards */ if ( **buf == '*' ) { if ( leading_space ) return UNIQUE_WILDCARD_NOT_ALLOWED_IN_WORD; if ( (*buf)[1] && !inphrase && !isspace( (unsigned char) (*buf)[1])) if ( !isSearchOperatorChar( (unsigned char) (*buf)[1], phrase_delimiter, inphrase ) ) return WILDCARD_NOT_ALLOWED_WITHIN_WORD; } (*word)[i++] = **buf; /* save the search operator char as its own token, and end. */ (*buf)++; break; } } /* flag if we found a token */ if ( i ) { (*word)[i] = '\0'; return 1; } return 0; } static int next_swish_word(INDEXDATAHEADER *header, char **buf, char **word, int *lenword ) { int i; /* Also set flag for "?" (wildcard), in general set and at end of a term/word * At start there is never a wildcard allowed, because of sequential lookup in index * performance issue !! */ header->wordcharslookuptable[63] = 1; header->endcharslookuptable[63] = 1; /* skip non-wordchars */ while ( **buf && !header->wordcharslookuptable[tolower((unsigned char)(**buf))] ) (*buf)++; i = 0; while ( **buf && header->wordcharslookuptable[tolower((unsigned char)(**buf))] ) { /* reallocate buffer, if needed */ if ( i + 1 == *lenword ) { *lenword *= 2; *word = erealloc(*word, *lenword + 1); } (*word)[i++] = **buf; (*word)[i] = '\0'; (*buf)++; } if ( i ) { stripIgnoreLastChars( header, *word); stripIgnoreFirstChars(header, *word); return **word ? 
1 : 0; } return 0; } /* Convert a word into swish words */ static struct swline *parse_swish_words( SWISH *sw, INDEXDATAHEADER *header, char *word, int max_size ) { struct swline *swish_words = NULL; char *curpos; struct MOD_Swish_Words *self = sw->SwishWords; /* Some initial adjusting of the word */ TranslateChars(header->translatecharslookuptable, (unsigned char *)word); curpos = word; while( next_swish_word( header, &curpos, &self->word, &self->lenword ) ) { /* Check Begin & EndCharacters */ if (!header->begincharslookuptable[(int) ((unsigned char) self->word[0])]) continue; if (!header->endcharslookuptable[(int) ((unsigned char) self->word[strlen(self->word) - 1])]) continue; /* limit by stopwords, min/max length, max number of digits, ... */ /* ------- processed elsewhere for search --------- if (!isokword(sw, self->word, indexf)) continue; - stopwords are processed in search.c because removing them may have side effects - maxwordlen is checked when first tokenizing for security reasons - limit by vowels, consonants and digits is not needed since search will just fail ----------- */ if ( (int)strlen( self->word ) > max_size ) { sw->lasterror = SEARCH_WORD_TOO_BIG; return NULL; } if (!*self->word) continue; /* Now stem word, if set to stem */ { FUZZY_WORD *fw = fuzzy_convert( header->fuzzy_data, self->word ); if ( fw->list_size != 2 ) { swish_words = (struct swline *) addswline( swish_words, fw->string_list[0] ); } else { /* yuck! */ swish_words = (struct swline *) addswline( swish_words, "(" ); swish_words = (struct swline *) addswline( swish_words, fw->string_list[0] ); swish_words = (struct swline *) addswline( swish_words, OR_WORD ); swish_words = (struct swline *) addswline( swish_words, fw->string_list[1] ); swish_words = (struct swline *) addswline( swish_words, ")" ); } fuzzy_free_word( fw ); } } return swish_words; } /* This is really dumb. 
swline needs a ->prev entry, really search needs its own linked list */ /* Replaces a given node with another node (or nodes) */ static void replace_swline( struct swline **original, struct swline *entry, struct swline *new_words ) { struct swline *temp; temp = *original; /* check for case of first one */ if ( temp == entry ) { if ( new_words ) { new_words->other.nodep->next = temp->next; new_words->other.nodep = temp->other.nodep; *original = new_words; } else /* just delete first node */ { if ( entry->next ) entry->next->other.nodep = entry->other.nodep; /* point next one to last one */ *original = entry->next; } } else /* not first node */ { /* search for the preceding node */ for ( temp = *original; temp && temp->next != entry; temp = temp->next ); if ( !temp ) progerr("Fatal Error: Failed to find insert point in replace_swline"); if ( new_words ) { if(!entry->next) /* Adding at the end. So, fix the last one */ (*original)->other.nodep = new_words->other.nodep; /* set the previous record to point to the start of the new entry (or entries) */ temp->next = new_words; /* set the end of the new string to point to the next entry */ new_words->other.nodep->next = entry->next; } else /* delete the entry */ { temp->next = temp->next->next; if(!temp->next) /* Deleted the last node. So, fix the last one */ (*original)->other.nodep = temp; } } /* now free the removed item */ efree( entry ); } static int checkbuzzword(INDEXDATAHEADER *header, char *word ) { if ( !header->hashbuzzwordlist.count ) return 0; /* only strip when buzzwords are being used since stripped again as a "swish word" */ stripIgnoreLastChars( header, word ); stripIgnoreFirstChars( header, word ); if ( !*word ) /* stripped clean?
*/ return 0; return (int)is_word_in_hash_table( header->hashbuzzwordlist, word ); } /* I hope this doesn't live too long */ static void fudge_wildcard( struct swline **original, struct swline *entry ) { struct swline *wild_card, *new; wild_card = entry->next; /* New entry */ new = newswline_n(entry->line, strlen( entry->line ) + strlen(wild_card->line)); strcat( new->line, wild_card->line); /* Change entry by new */ new->other.nodep = new; // Group of 1 node (last is itself) replace_swline(original,entry,new); /* remove wild_card */ replace_swline(original,wild_card,(struct swline *)NULL); } /* Converts an operator word into an operator */ static char *isBooleanOperatorWord( char * word ) { /* don't need strcasecmp here, since word should already be lowercase -- need to check alt-search first */ if (!strcasecmp( word, _AND_WORD)) return AND_WORD; if (!strncasecmp( word, _NEAR_WORD, strlen(_NEAR_WORD))) return NEAR_WORD; if (!strcasecmp( word, _OR_WORD)) return OR_WORD; if (!strcasecmp( word, _NOT_WORD)) return NOT_WORD; return (char *)NULL; } /* This "fixes" the problem of showing operators in Parsed Words in their internal operator form */ /* Really, there's probably a much better way to display Parsed Words, but that's not the way it was first created */ /* Converts an operator into a string */ static char *isBooleanOperator( char * word ) { if (!strcasecmp( word, AND_WORD)) return _AND_WORD; if (!strncasecmp( word, NEAR_WORD, strlen(NEAR_WORD))) return NEAR_WORD; if (!strcasecmp( word, OR_WORD)) return _OR_WORD; if (!strcasecmp( word, NOT_WORD)) return _NOT_WORD; return (char *)NULL; } /* Simply replace each internal operator with its word form (e.g. "and") */ /* it's required that the replacement string is no longer than the initial string, since it is copied in place.
*/ static void switch_back_operators( struct swline *sl ) { char *operator; while ( sl ) { if ( (operator = isBooleanOperator( sl->line )) ) strcpy( sl->line, operator ); sl = sl->next; } } static struct swline *tokenize_query_string( SEARCH_OBJECT *srch, char *words, INDEXDATAHEADER *header ) { char *curpos; /* current position in the words string */ struct swline *tokens = NULL; struct swline *temp; struct swline *new; struct swline *swish_words; struct swline *next_node; SWISH *sw = srch->sw; struct MOD_Swish_Words *self = sw->SwishWords; unsigned char PhraseDelimiter; int max_size; int inphrase = 0; int rc; PhraseDelimiter = (unsigned char) srch->PhraseDelimiter; max_size = header->maxwordlimit; curpos = words; /* split into words by whitespace and by the swish operator characters */ while ( (rc = next_token( &curpos, &self->word, &self->lenword, PhraseDelimiter, inphrase )) ) { /* catch single wild card early */ if ( rc < 0 ) { sw->lasterror = rc; return NULL; } tokens = (struct swline *) addswline( tokens, self->word ); if ( self->word[0] == PhraseDelimiter && !self->word[1] ) inphrase = !inphrase; } /* no search words found */ if ( !tokens ) return NULL; inphrase = 0; temp = tokens; while ( temp ) { /* do look-ahead processing first -- metanames */ if ( !inphrase && isMetaNameOpNext(temp->next) ) { if( !getMetaNameByName( header, temp->line ) ) { set_progerr( UNKNOWN_METANAME, sw, "'%s'", temp->line ); freeswline( tokens ); return NULL; } /* this might be an option with XML */ strtolower( temp->line ); temp = temp->next; continue; } /* skip operators */ if ( strlen( temp->line ) == 1 && isSearchOperatorChar( (unsigned char) temp->line[0], PhraseDelimiter, inphrase ) ) { if ( temp->line[0] == PhraseDelimiter && !temp->line[1] ) inphrase = !inphrase; temp = temp->next; continue; } /* this might be an option if case sensitive searches are used */ strtolower( temp->line ); /* check Boolean operators -- and replace with the operator string */ if ( !inphrase ) { 
char *operator, *nextoperator; char nearop[100]; if ( (operator = isBooleanOperatorWord( temp->line )) ) { /* replace the common "and not" with simply "not" */ /* probably not the best place to do this level of processing */ /* since should also check for things like "and this" and "and and and not this" */ /* should probably be moved to end and recursively check for these (to catch "and and not") */ if ( temp->next && ( strcmp( operator, AND_WORD ) == 0) && ( (nextoperator = isBooleanOperatorWord( temp->next->line))) && ( strcmp( nextoperator, NOT_WORD ) == 0) ) { struct swline *andword = temp; /* save position of entry to remove */ temp = temp->next; /* now point to "not" word */ operator = nextoperator; /* Remove the "and" word */ replace_swline( &tokens, andword, (struct swline *)NULL ); /* cut it out */ } strcpy(nearop, operator); /* temp->line is query text of arbitrary length, so bound the append to the space left in nearop */ if (!strncasecmp( operator, NEAR_WORD, strlen(NEAR_WORD))) strncat(nearop, temp->line + strlen(_NEAR_WORD), sizeof(nearop) - strlen(nearop) - 1); /* Replace the string with the operator string */ new = newswline(nearop); new->other.nodep = new; // Group of 1 node (last is itself) replace_swline( &tokens, temp, new ); /* change it */ temp = new->next; continue; } } /* buzzwords */ if ( checkbuzzword( header, temp->line ) ) { temp = temp->next; continue; } /* query words left.
Turn into "swish_words" */ swish_words = NULL; swish_words = parse_swish_words( sw, header, temp->line, max_size); if ( sw->lasterror ) return NULL; next_node = temp->next; /* move into list.c at some point */ replace_swline( &tokens, temp, swish_words ); temp = next_node; } /* fudge wild cards back onto preceding word */ /* $$$ This is broken because a query of "foo *" ends up "foo*" */ /* Now almost fixed: "foo *" is an error, but */ /* Also doesn't check for an operator followed by "*" */ for ( temp = tokens ; temp; ) if ( temp->next && strcmp( temp->next->line, "*") == 0 ) { next_node = (temp->next)->next; fudge_wildcard( &tokens, temp ); temp = next_node; } else temp = temp->next; return tokens; } /********************************************************************************** * parse_swish_query -- convert a string in a SEARCH_OBJECT into a list of tokens * * Pass in: * db_results - container for the search results for a single index * * Returns: * NULL on error * sets sw->lasterror on fatal errors. * * but some errors are cleared and set in the search object to allow * processing to continue. For example, when searching multiple index * files one index may end up removing all the words in a query (if they * are all stopwords) where another index will not have the same stopword * list and produce a query. Room for improvement, of course. * * Notes: * calls tokenize_query_string() as first pass * ignore_words_in_query() to remove stop words (could be combined with tokenize, I think -- but removing stop words is a bit tricky) * expandphrase() prepares for phrase searching * fixmetanames() fixnot1() and fixnot2() make other adjustments * * Clearly, a better query parser is in order.
* * Sep 29, 2002 - moseley * ***********************************************************************************/ struct swline *parse_swish_query( DB_RESULTS *db_results ) { struct swline *searchwordlist; IndexFILE *indexf = db_results->indexf; SEARCH_OBJECT *srch = db_results->srch; /* tokenize the query into swish words based on this current index */ /* returns NULL if no words or error */ /* may set sw->lasterror on unknown metanames or word too big */ if (!(searchwordlist = tokenize_query_string(srch, srch->query, &indexf->header))) return NULL; print_swline("after tokenize", searchwordlist ); /* Remove stopwords from the query -- also sets db_results->removed_stopwords */ /* This can set QUERY_SYNTAX_ERROR which should abort */ /* WORDS_TOO_COMMON & NO_WORDS_IN_SEARCH should be used if no index files are searched */ /* This is a bit ugly */ searchwordlist = ignore_words_in_query(db_results, searchwordlist); if ( !searchwordlist || srch->sw->lasterror ) { if ( searchwordlist ) freeswline( searchwordlist ); return NULL; } db_results->parsed_words = dupswline(searchwordlist); /* see notes in this function why this is done */ switch_back_operators( db_results->parsed_words ); /* Now hack up the query for search processing */ /* $$$ please fix this!
Let's get a real parser */ /* Expand phrase search: "kim harlow" becomes (kim PHRASE_WORD harlow) */ searchwordlist = expandphrase(searchwordlist, (char)srch->PhraseDelimiter); searchwordlist = fixmetanames(searchwordlist); searchwordlist = fixnot1(searchwordlist); searchwordlist = fixnot2(searchwordlist); print_swline("Final", searchwordlist ); return searchwordlist; } static int isrule(char *word) { if (!strcmp(word, AND_WORD) || !strncmp(word, NEAR_WORD, strlen(NEAR_WORD)) || !strcmp(word, OR_WORD) || !strcmp(word, NOT_WORD)) return 1; else return 0; } static int isnotrule(char *word) { if (!strcmp(word, NOT_WORD)) return 1; else return 0; } /****************************************************************************** * Remove the stop words from the tokenized query * rewritten Nov 24, 2001 - moseley * Still horrible! Need a real parse tree. *******************************************************************************/ static struct swline *ignore_words_in_query(DB_RESULTS *db_results, struct swline *searchwordlist) { IndexFILE *indexf = db_results->indexf; SEARCH_OBJECT *srch = db_results->srch; SWISH *sw = srch->sw; struct swline *cur_token = searchwordlist; struct swline *prev_token = NULL; struct swline *prev_prev_token = NULL; // for removing two things int in_phrase = 0; int word_count = 0; /* number of search words found */ int paren_count = 0; int stop_word_removed = 0; unsigned char phrase_delimiter = (unsigned char)srch->PhraseDelimiter; while ( cur_token ) { int remove = 0; char first_char = cur_token->line[0]; if ( cur_token == searchwordlist ) { prev_token = prev_prev_token = NULL; word_count = 0; paren_count = 0; in_phrase = 0; } while ( 1 ) // so we can use break. 
{ /* Can't backslash here -- (because this code should really be included in swish_words.c) */ if ( first_char == phrase_delimiter ) { in_phrase = !in_phrase; if ( !in_phrase && prev_token && prev_token->line[0] == phrase_delimiter ) remove = 2; break; } /* inside a phrase only stop words are removed; everything else is left alone */ if ( in_phrase ) { if ( is_word_in_hash_table( indexf->header.hashstoplist, cur_token->line ) ) { db_results->removed_stopwords = addswline( db_results->removed_stopwords, cur_token->line ); stop_word_removed++; remove = 1; } else word_count++; break; } /* Allow operators */ if ( first_char == '=' ) break; if ( first_char == '(' ) { paren_count++; break; } if ( first_char == ')' ) { paren_count--; if ( prev_token && prev_token->line[0] == '(' ) remove = 2; break; } /* Allow all metanames */ if ( isMetaNameOpNext(cur_token->next) ) break; /* Look for AND OR NOT - remove AND OR at start, and remove second of doubles */ if ( isrule(cur_token->line) ) { if ( prev_token ) { /* remove double tokens */ if ( isrule(prev_token->line ) ) remove = 1; } /* allow NOT at the start */ else if ( !isnotrule(cur_token->line) ) remove = 1; break; } /* is the token of an ok length to consider? treat min/max length like stopwords */ if ( strlen(cur_token->line) < indexf->header.minwordlimit || strlen(cur_token->line) > indexf->header.maxwordlimit ) { db_results->removed_stopwords = addswline( db_results->removed_stopwords, cur_token->line ); stop_word_removed++; remove = 1; break; /* already recorded as removed; skip the stop word test so it is not recorded twice */ } /* Finally, is it a stop word?
*/ if ( is_word_in_hash_table( indexf->header.hashstoplist, cur_token->line ) ) { db_results->removed_stopwords = addswline( db_results->removed_stopwords, cur_token->line ); stop_word_removed++; remove = 1; } else word_count++; break; } /* Catch dangling metanames */ if ( !remove && !cur_token->next && isMetaNameOpNext( cur_token ) ) remove = 2; if ( remove ) { struct swline *tmp = cur_token; if ( cur_token == searchwordlist ) // we are removing first token searchwordlist = cur_token->next; else { prev_token->next = cur_token->next; // remove one in the middle cur_token = prev_token; // save if remove == 2 } efree( tmp ); if ( remove == 2 ) { tmp = cur_token; if ( cur_token == searchwordlist ) // we are removing first token searchwordlist = cur_token->next; else prev_prev_token->next = cur_token->next; // remove one in the middle efree( tmp ); } /* start at the beginning again */ cur_token = searchwordlist; continue; } if ( prev_token ) prev_prev_token = prev_token; prev_token = cur_token; cur_token = cur_token->next; } if ( in_phrase || paren_count ) sw->lasterror = QUERY_SYNTAX_ERROR; else if ( !word_count ) sw->lasterror = stop_word_removed ? WORDS_TOO_COMMON : NO_WORDS_IN_SEARCH; return searchwordlist; } /* 2001-09 jmruiz - Rewriting ** This puts parentheses in the right places around meta searches ** to avoid problems with them. Basically "metaname = bla" ** becomes "(metaname = bla)" */ static struct swline *fixmetanames(struct swline *sp) { int metapar; struct swline *tmpp, *newp; tmpp = sp; newp = NULL; /* Fix metanames with parentheses eg: metaname = bla => (metaname = bla) */ while (tmpp != NULL) { if (isMetaNameOpNext(tmpp->next)) { /* If it is a metaName add the name and = and skip to next */ newp = (struct swline *) addswline(newp, "("); newp = (struct swline *) addswline(newp, tmpp->line); newp = (struct swline *) addswline(newp, "="); tmpp = tmpp->next; tmpp = tmpp->next; if ( !tmpp ) return NULL; /* no more words!
*/ /* 06/00 Jose Ruiz ** Fix to consider parenthesys in the ** content of a MetaName */ if (tmpp->line[0] == '(') { metapar = 1; newp = (struct swline *) addswline(newp, tmpp->line); tmpp = tmpp->next; while (metapar && tmpp) { if (tmpp->line[0] == '(') metapar++; else if (tmpp->line[0] == ')') metapar--; newp = (struct swline *) addswline(newp, tmpp->line); if (metapar) tmpp = tmpp->next; } if (!tmpp) return (newp); } else newp = (struct swline *) addswline(newp, tmpp->line); newp = (struct swline *) addswline(newp, ")"); } else newp = (struct swline *) addswline(newp, tmpp->line); /* next one */ tmpp = tmpp->next; } freeswline(sp); return newp; } /* 2001 -09 jmruiz Rewritten ** This optimizes some NOT operator to be faster. ** ** "word1 not word" is changed by "word1 and_not word2" ** ** In the old way the previous query was... ** get results if word1 ** get results of word2 ** not results of word2 (If we have 100000 docs and word2 is in ** just 3 docs, this means read 99997 ** results) ** intersect both list of results ** ** The "new way" ** get results if word1 ** get results of word2 ** intersect (and_not_rule) both lists of results ** */ static struct swline *fixnot1(struct swline *sp) { struct swline *tmpp, *prev, *new; if (!sp) return NULL; /* 06/00 Jose Ruiz - Check if first word is NOT_RULE */ /* Change remaining NOT by AND_NOT_RULE */ for (tmpp = sp, prev = NULL; tmpp; prev = tmpp, tmpp = tmpp->next) { if (tmpp->line[0] == '(') continue; else if (isnotrule(tmpp->line)) { if(prev && prev->line[0]!='=' && prev->line[0]!='(') { new = newswline(AND_NOT_WORD); new->other.nodep = new; // group of 1 node replace_swline(&sp, tmpp, new); tmpp = new; } } } return sp; } /* 2001 -09 jmruiz - Totally new - Fix the meta=(not ahsg) bug ** Add parentheses to avoid the way operator NOT confuse complex queries */ static struct swline *fixnot2(struct swline *sp) { int openparen, found; struct swline *tmpp, *newp; char *magic = MAGIC_NOT_WORD; /* magic avoids parsing the ** 
"not" operator twice ** and put the code in an ** endless loop */ found = 1; while(found) { openparen = 0; found = 0; for (tmpp = sp , newp = NULL; tmpp ; tmpp = tmpp->next) { if (isnotrule(tmpp->line)) { found = 1; /* Add parentheses */ newp = (struct swline *) addswline(newp, "("); /* Change "NOT" by magic to avoid find it in next iteration */ newp = (struct swline *) addswline(newp, magic); for(tmpp = tmpp->next; tmpp; tmpp = tmpp->next) { if ((tmpp->line)[0] == '(') openparen++; else if(!openparen) { newp = (struct swline *) addswline(newp, tmpp->line); /* Add parentheses */ newp = (struct swline *) addswline(newp, ")"); break; } else if ((tmpp->line)[0] == ')') openparen--; newp = (struct swline *) addswline(newp, tmpp->line); } if(!tmpp) break; } else newp = (struct swline *) addswline(newp, tmpp->line); } freeswline(sp); sp = newp; } /* remove magic and put the "real" NOT in place */ for(tmpp = newp; tmpp ; tmpp = tmpp->next) { if(!strcmp(tmpp->line,magic)) { strcpy(tmpp->line, NOT_WORD); } } return newp; } /* expandstar removed - Jose Ruiz 04/00 */ /* Expands phrase search. Berkeley University becomes Berkeley PHRASE_WORD University */ /* It also fixes the and, not or problem when they appeared inside a phrase */ static struct swline *expandphrase(struct swline *sp, char delimiter) { struct swline *tmp, *newp; int inphrase; if (!sp) return NULL; inphrase = 0; newp = NULL; tmp = sp; while (tmp != NULL) { if ((tmp->line)[0] == delimiter) { if (inphrase) { inphrase = 0; newp = (struct swline *) addswline(newp, ")"); } else { inphrase++; newp = (struct swline *) addswline(newp, "("); } } else { if (inphrase) { if (inphrase > 1) newp = (struct swline *) addswline(newp, PHRASE_WORD); inphrase++; newp = (struct swline *) addswline(newp, tmp->line); } else newp = (struct swline *) addswline(newp, tmp->line); } tmp = tmp->next; } freeswline(sp); return newp; } /* These 2 routines fix the problem when a word ends with mutiple ** IGNORELASTCHAR's (eg, qwerty'. ). 
The old code correctly deleted ** the ".", but didn't check if the new last character ("'") is also ** an ignore character. */ void stripIgnoreLastChars(INDEXDATAHEADER *header, char *word) { int k,j,i = strlen(word); /* Get rid of specified last chars */ /* for (i=0; word[i] != '\0'; i++); */ /* Iteratively strip off the last character if it's an ignore character */ while ((i > 0) && (isIgnoreLastChar(header, word[--i]))) { word[i] = '\0'; /* We must take care of the escaped characters */ /* Things like hello\c hello\\c hello\\\c can appear */ for(j=0,k=i-1;k>=0 && word[k]=='\\';k--,j++); /* j contains the number of \ */ if(j%2) /* An odd count means the stripped char was escaped: remove the escape too */ { word[--i]='\0'; } } } void stripIgnoreFirstChars(INDEXDATAHEADER *header, char *word) { int j, k; int i = 0; /* Keep going until a char not to ignore is found */ /* We must take care of the escaped characters */ /* Things like \chello \\chello can appear */ while (word[i]) { if(word[i]=='\\') /* Jump escape */ k=i+1; else k=i; if(!word[k] || !isIgnoreFirstChar(header, word[k])) break; else i=k+1; } /* If all the chars are valid, just return */ if (0 == i) return; else { for (k = i, j = 0; word[k] != '\0'; j++, k++) { word[j] = word[k]; } /* Add the terminating NUL */ word[j] = '\0'; } } swish-e-2.4.7/src/swstring.h0000664000077100017500000000617511166010110012644 00000000000000/* ** Copyright (C) 1995, 1996, 1997, 1998 Hewlett-Packard Company ** Originally by Kevin Hughes, kev@kevcom.com, 3/11/94 ** This file is part of Swish-e. Swish-e is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. Swish-e is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with Swish-e; if not, write to the Free Software Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA See the COPYING file that accompanies the Swish-e distribution for details of the GNU GPL and the special exception available for linking against the Swish-e library. ** Mon May 9 18:15:43 CDT 2005 ** added GPL ** ** 2001-02-22 rasc fixed macros (unsigned char) */ #ifndef STRING_H #define STRING_H 1 #define CASE_SENSITIVE_ON 1 #define CASE_SENSITIVE_OFF 0 char *lstrstr (char *, char *); char *getconfvalue (char *, char *); int isoksuffix (char *filename, struct swline *rulelist); char *replace (char *, char *, char *); char *SafeStrCopy (char *,char *, int *); void sortstring (char *); char *mergestrings (char *,char *); void makelookuptable (char * ,int *); void makeallstringlookuptables (SWISH *); /* 06/00 Jose Ruiz ** Macros iswordchar, isvowel */ #define iswordchar(header,c) header.wordcharslookuptable[tolower((unsigned char)(c))] #define isvowel(sw,c) sw->isvowellookuptable[tolower((unsigned char)(c))] /* #define isindexchar(header,c) header.indexcharslookuptable[c] indexchars stuff removed */ /* Functions for comparing integers for qsort */ int icomp2 (const void *,const void *); /* 06/00 Jose Ruiz ** Function to parse a line into a StringList */ StringList *parse_line (char *); /* 06/00 ** Function to free memory used by a StringList */ void freeStringList (StringList *); int isnumstring (unsigned char*); void remove_newlines (char*); void remove_tags (char*); unsigned char *bin2string(unsigned char *,int); char *strtolower (char *str); #define makeItLow(a) strtolower ((a)) /* map old name to new $$$ */ char *str_skip_ws (char *s); void str_trim_ws(char *string); char charDecode_C_Escape (char *s, char **se); /* ISO-Routines */ unsigned char char_ISO_normalize (unsigned char c); char *str_ISO_normalize (char *s); unsigned char *StringListToString(StringList *sl,int 
n); int BuildTranslateChars (int trlookup[], unsigned char *from, unsigned char *to); unsigned char *TranslateChars (int trlookup[], unsigned char *s); char *str_basename (char *path); char *cstr_basename (char *path); char *cstr_dirname (char *path); char *estrdup (char *str); char *estrndup (char *str, size_t n); char *estrredup (char *s1, char *s2); const char *comma_long( unsigned long u ); /* Make life easy for now */ #include "swregex.h" #endif /* STRING_H */ swish-e-2.4.7/src/swish.c0000664000077100017500000014075611166010110012120 00000000000000/* $Id: swish.c 2291 2009-03-31 01:56:00Z karpet $ ** ** Swish Originally by Kevin Hughes, kev@kevcom.com, 3/11/94 ** ** Mon May 9 11:07:38 CDT 2005 ** like swish2.c -- how much of this is really Kevin's original work? ** This file is part of Swish-e. Swish-e is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. Swish-e is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with Swish-e; if not, write to the Free Software Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA See the COPYING file that accompanies the Swish-e distribution for details of the GNU GPL and the special exception available for linking against the Swish-e library. 
Mon May 9 10:57:22 CDT 2005 -- added GPL notice */ #include <limits.h> // for ULONG_MAX #include "swish.h" #include "swstring.h" #include "mem.h" #include "error.h" #include "list.h" #include "search.h" #include "index.h" #include "file.h" #include "http.h" #include "merge.h" #include "docprop.h" #include "hash.h" #include "entities.h" #include "filter.h" /* #include "search_alt.h" */ #include "result_output.h" #include "result_sort.h" #include "db.h" #include "fs.h" #include "swish_words.h" #include "extprog.h" #include "metanames.h" #include "proplimit.h" #include "parse_conffile.h" #include "date_time.h" #include "dump.h" #include "keychar_out.h" #ifdef HAVE_ZLIB #include <zlib.h> #endif #include "headers.h" #include "stemmer.h" /* ** This array has pointers to all the indexing data source ** structures */ extern struct _indexing_data_source_def *data_sources[]; typedef struct { char *name; unsigned int bit; char *description; } DEBUG_MAP; static DEBUG_MAP debug_map[] = { /* These dump data from the index file */ {"INDEX_HEADER", DEBUG_INDEX_HEADER, "Show the headers from the index"}, {"INDEX_WORDS", DEBUG_INDEX_WORDS, "List words stored in index"}, {"INDEX_WORDS_ONLY", DEBUG_INDEX_WORDS_ONLY, "List only words, one per line, stored in index"}, {"INDEX_WORDS_META", DEBUG_INDEX_WORDS_META, "List only words and associated metaID separated by a tab"}, {"INDEX_WORDS_FULL", DEBUG_INDEX_WORDS_FULL, "List words stored in index (more verbose)"}, {"INDEX_STOPWORDS", DEBUG_INDEX_STOPWORDS, "List stopwords stored in index"}, {"INDEX_FILES", DEBUG_INDEX_FILES, "List file data stored in index"}, {"INDEX_WORD_COUNT", DEBUG_INDEX_WORD_COUNT, "List number of words in all files"}, {"INDEX_METANAMES", DEBUG_INDEX_METANAMES, "List metaname table stored in index"}, {"INDEX_ALL", DEBUG_INDEX_ALL, "Dump data ALL above data from index file"}, {"LIST_FUZZY_MODES", DEBUG_LIST_FUZZY, "List fuzzy options available\n\n-- indexing --\n"}, /* These trace indexing */ {"INDEXED_WORDS", DEBUG_WORDS, "Display words
as they are indexed"}, {"PARSED_WORDS", DEBUG_PARSED_WORDS, "Display words as they are parsed from source"}, {"PROPERTIES", DEBUG_PROPERTIES, "Display properties associated with each file as they are indexed"}, {"REGEX", DEBUG_REGEX, "Debug regular expression processing"}, {"PARSED_TAGS", DEBUG_PARSED_TAGS, "Show meta tags as they are found"}, {"PARSED_TEXT", DEBUG_PARSED_TEXT, "Show text as it's parsed"}, }; /* Parameters read from the command line, that are not stored in *SWISH */ typedef struct { CMD_MODE run_mode; /* selected run mode. Default is MODE_SEARCH */ char keychar; /* for dumping words */ /* Search related params */ char *query; /* Query string */ int PhraseDelimiter; /* Phrase delimiter char */ int structure; /* Structure for limiting to HTML tags */ struct swline *sort_params; /* sort properties */ LIMIT_PARAMS *limit_params; /* for storing -L command line settings */ int query_len; /* length of buffer */ struct swline *disp_props; /* extra display props */ int beginhits; /* starting hit number */ int maxhits; /* total hits to display */ struct swline *conflist; /* Configuration file list */ int hasverbose; /* flag if -v was used */ int index_read_only; /* flag to not allow indexing or merging */ int swap_mode; char *merge_out_file; /* the output file for merge */ } CMDPARAMS; /************* TOC ***************************************/ static CMDPARAMS *new_swish_params(void); static void printTime(double time); static void get_command_line_params(SWISH *sw, char **argv, CMDPARAMS *params ); static void free_command_line_params( CMDPARAMS *params ); static unsigned int isDebugWord(char *word, CMDPARAMS *params ); static void printversion(); static void usage(); static int check_readonly_mode( char * ); static void cmd_dump( SWISH *sw, CMDPARAMS *params ); static void cmd_index( SWISH *sw, CMDPARAMS *params ); static void cmd_merge( SWISH *sw, CMDPARAMS *params ); static void cmd_search( SWISH *sw, CMDPARAMS *params ); static void cmd_keywords( SWISH
*sw, CMDPARAMS *params ); static void write_index_file( SWISH *sw, int process_stopwords, double elapsedStart, double cpuStart, int merge); static char **fetch_search_params(SWISH *sw, char **argv, CMDPARAMS *params, char switch_char ); static char **fetch_indexing_params(SWISH *sw, char **argv, CMDPARAMS *params, char switch_char ); static void display_result_headers( RESULTS_OBJECT *results ); static void swline_header_out( SWISH *sw, int v, char *desc, struct swline *sl ); static SWISH *swish_new(); static void swish_close(SWISH * sw); struct _indexing_data_source_def *IndexingDataSource; /************* TOC ***************************************/ int main(int argc, char **argv) { SWISH *sw; CMDPARAMS *params; setlocale(LC_ALL, ""); /* Start a session */ sw = swish_new(); /* Get swish handle */ /* By default we are set up to use the first data source in the list */ /* I don't like this. modules.c would fix this */ IndexingDataSource = data_sources[0]; params = new_swish_params(); get_command_line_params(sw, argv, params ); switch( params->run_mode ) { case MODE_DUMP: cmd_dump( sw, params ); /* first so will override */ break; case MODE_MERGE: cmd_merge( sw, params ); break; case MODE_INDEX: case MODE_UPDATE: case MODE_REMOVE: cmd_index( sw, params ); break; case MODE_SEARCH: cmd_search( sw, params ); break; case MODE_WORDS: cmd_keywords( sw ,params ); /* -k setting */ break; default: progerr("Invalid operation mode '%d'", (int)params->run_mode); } free_command_line_params( params ); swish_close(sw); Mem_Summary("At end of program", 1); exit(0); return 0; } /* Prints the running time (the time it took for indexing). */ static void printTime(double time) { int hh, mm, ss; int delta; delta = (int) (time + 0.5); ss = delta % 60; delta /= 60; hh = delta / 60; mm = delta % 60; printf("%02d:%02d:%02d", hh, mm, ss); } /* Prints the SWISH usage. */ static void usage() { const char *defaultIndexingSystem = ""; printf(" usage:\n"); printf(" swish [-e] [-i dir file ... 
] [-S system] [-c file] [-f file] [-l] [-v (num)]\n"); printf(" swish -w word1 word2 ... [-f file1 file2 ...] \\\n"); printf(" [-P phrase_delimiter] [-p prop1 ...] [-s sortprop1 [asc|desc] ...] \\\n"); printf(" [-m num] [-t str] [-d delim] [-H (num)] [-x output_format] \\\n"); printf(" [-R rank_scheme] [-L prop low high] [-a]\n"); printf(" swish -k (char|*) [-f file1 file2 ...]\n"); printf(" swish -M index1 index2 ... outputfile\n"); printf(" swish -N /path/to/compare/file\n"); printf(" swish -V\n"); putchar('\n'); printf("options: defaults are in brackets\n"); printf(" -a : return raw (unscaled) rank scores in swishrank PropertyName\n"); printf(" -b : begin results at this number\n"); printf(" -c : configuration file(s) to use for indexing\n"); printf(" -d : next param is delimiter.\n"); printf(" -E : Append errors to file specified, or stderr if file not specified.\n"); printf(" -e : \"Economic Mode\": The index process uses less RAM.\n"); printf(" -f : index file to create or file(s) to search from [%s]\n", INDEXFILE); printf(" -H : \"Result Header Output\": verbosity (0 to 9) [1].\n"); printf(" -i : create an index from the specified files\n"); #ifdef ALLOW_FILESYSTEM_INDEXING_DATA_SOURCE printf(" for \"-S fs\" - specify a list of files or directories\n"); #endif #ifdef ALLOW_HTTP_INDEXING_DATA_SOURCE printf(" for \"-S http\" - specify a list of URLs\n"); #endif #ifdef ALLOW_EXTERNAL_PROGRAM_DATA_SOURCE printf(" for \"-S prog\" - specify a list of programs or the string \"stdin\"\n"); #endif printf(" -k : Print words starting with a given char.\n"); printf(" -l : follow symbolic links when indexing\n"); printf(" -L : Limit results to a range of property values\n"); printf(" -M : merges index files\n"); printf(" -m : the maximum number of results to return [defaults to all results]\n"); printf(" -N : index only files with a modification date newer than path supplied\n"); printf(" -P : next param is Phrase delimiter.\n"); printf(" -p : include these document
properties in the output \"prop1 prop2 ...\"\n"); printf(" -R : next param is Rank Scheme number (0 to 2) [0].\n"); #ifdef USE_BTREE printf(" -r : remove: remove files from index\n"); #endif printf(" -S : specify which indexing system to use.\n"); printf(" Valid options are:\n"); #ifdef ALLOW_FILESYSTEM_INDEXING_DATA_SOURCE printf(" \"fs\" - index local files in your File System\n"); if (!*defaultIndexingSystem) defaultIndexingSystem = "fs"; #endif #ifdef ALLOW_HTTP_INDEXING_DATA_SOURCE printf(" \"http\" - index web site files using a web crawler\n"); if (!*defaultIndexingSystem) defaultIndexingSystem = "http"; #endif #ifdef ALLOW_EXTERNAL_PROGRAM_DATA_SOURCE printf(" \"prog\" - index files supplied by an external program\n"); if (!*defaultIndexingSystem) defaultIndexingSystem = "prog"; #endif printf(" The default value is: \"%s\"\n", defaultIndexingSystem); printf(" -s : sort by these document properties in the output \"prop1 prop2 ...\"\n"); printf(" -T : Trace options ('-T help' for info)\n"); printf(" -t : tags to search in - specify as a string\n"); printf(" \"HBthec\" - in Head|Body|title|header|emphasized|comments\n"); #ifdef USE_BTREE printf(" -u : update: adds files to existing index\n"); #endif printf(" -V : prints the current version\n"); printf(" -v : indexing verbosity level (0 to 3) [-v %d]\n", VERBOSE); printf(" -w : search for words \"word1 word2 ...\"\n"); printf(" -W : next param is ParserWarnLevel [-W 2]\n"); printf(" -x : \"Extended Output Format\": Specify the output format.\n"); printf("\n"); printf("version: %s\n docs: http://swish-e.org\n Scripts and Modules at: (libexecdir) = %s\n", VERSION, get_libexec()); exit(1); } static void printversion() { printf("SWISH-E %s\n", VERSION ); exit(0); } /* -- init swish structure */ /************************************************************************* * swish_new -- create a general purpose swish structure * * Note that initModule_* code is called even when it's not going to be used * (e.g. 
initModule_HTTP is called when searching). * * **************************************************************************/ static SWISH *swish_new() { SWISH *sw = SwishNew(); /* Additional modules needed for indexing (which we are not sure about yet...) */ initModule_ResultSort(sw); initModule_Filter(sw); initModule_Entities(sw); /* used only by the old HTML parser -- not long to live */ initModule_Index(sw); initModule_FS(sw); initModule_HTTP(sw); initModule_Prog(sw); return (sw); } /************************************************************************* * swish_close -- free up a general purpose swish structure * * NOTE: ANY CHANGES HERE SHOULD ALSO BE MADE IN swish2.c:SwishClose() * * SwishClose is search related only * **************************************************************************/ static void swish_close(SWISH * sw) { if (!sw) return; free_swish_memory(sw); /* Free specific data related to indexing */ freeModule_Filter(sw); freeModule_Entities(sw); freeModule_Index(sw); freeModule_ResultSort(sw); freeModule_FS(sw); freeModule_HTTP(sw); freeModule_Prog(sw); /* Free ReplaceRules regular expressions */ free_regex_list(&sw->replaceRegexps); /* Free ExtractPath list */ free_Extracted_Path(sw); /* FileRules?? 
$$$ */ /* meta name for ALT tags */ if ( sw->IndexAltTagMeta ) { efree( sw->IndexAltTagMeta ); sw->IndexAltTagMeta = NULL; } freeSwishConfigOptions( sw ); // should be freeConfigOptions( sw->config ) efree(sw); } /************************************************************************* * Deal with -T debug options * * **************************************************************************/ static unsigned int isDebugWord(char *word, CMDPARAMS *params) { int i, help; help = strcasecmp(word, "help") == 0; if (help) printf("\nAvailable debugging options for swish-e:\n"); for (i = 0; i < (int)(sizeof(debug_map) / sizeof(debug_map[0])); i++) if (help) printf(" %20s => %s\n", debug_map[i].name, debug_map[i].description); else if (strcasecmp(debug_map[i].name, word) == 0) { if (strncasecmp(word, "INDEX_", 6) == 0) params->run_mode = MODE_DUMP; return debug_map[i].bit; } if (help) exit(1); return 0; } /************************************************************************* * Initialize the swish command parameters * * Call with: * void * * Returns: * pointer to CMDPARAMS * * To Do: * The swish parameters probably should be groupped by switches and * by config file (and maybe someday also by directory or path or * content-type) and then merged. * **************************************************************************/ static CMDPARAMS *new_swish_params() { CMDPARAMS *params = (CMDPARAMS *)emalloc( sizeof( CMDPARAMS ) ); memset( params, 0, sizeof( CMDPARAMS ) ); params->run_mode = MODE_SEARCH; /* default run mode */ params->PhraseDelimiter = PHRASE_DELIMITER_CHAR; params->structure = IN_FILE; return params; } /************************************************************************* * Free the swish command parameters * * Call with: * *CMDPARAMS * * Returns: * void * * To Do: * The swish parameters probably should be groupped by switches and * by config file (and maybe someday also by directory or path or * content-type) and then merged. 
* **************************************************************************/ static void free_command_line_params( CMDPARAMS *params ) { if ( params->disp_props ) freeswline( params->disp_props ); if ( params->conflist ) freeswline( params->conflist ); if ( params->query ) efree( params->query ); if ( params->sort_params ) freeswline( params->sort_params ); if ( params->limit_params ) ClearLimitParams( params->limit_params ); efree( params ); } /************************************************************************* * Just checks if there is a next word * Three helper functions - to be replaced by better command parsing soon... **************************************************************************/ static char *is_another_param( char **argv ) { return ( *(argv + 1) && *(argv + 1)[0] != '-' ) ? *(argv + 1) : NULL; } static char *next_param( char ***argv ) { char *c; if ( ( c = is_another_param( *argv ) ) ) { (*argv)++; return c; } return NULL; } static int get_param_number(char ***argv, char c ) { char *badchar; long num; char *string = next_param( argv ); if ( !string ) progerr(" '-%c' requires a positive integer.", c ); num = strtol( string, &badchar, 10 ); // would base zero be more flexible? if ( num == LONG_MAX || num == LONG_MIN ) progerrno("Failed to convert '-%c %s' to a number: ", c, string ); if ( *badchar ) progerr("Invalid char '%c' found in argument to '-%c %s'", badchar[0], c, string); return (int) num; } /************************************************************************* * Gets the command line parameters, if any, and sets values in the CMDPARAMS structure * * * Returns: * void (changes *sw and *params) * * To Do: * This code is horrific. Get a structure to define the parameters, and messages! * Move this into its own module! * * Also, mixes two structures for parameters, SWISH and CMDPARAMS. Not a great setup. 
* * * I'd like to see a centeral routine for processing switches, and a way for * modules to "register" what config options to parse out by the central routine. * **************************************************************************/ static void get_command_line_params(SWISH *sw, char **argv, CMDPARAMS *params ) { char c; char *w; #if defined(_WIN32) || defined(__CYGWIN__) volatile unsigned int DEBUG_MASK_HACK; #endif params->index_read_only = check_readonly_mode( *argv ); if ( !*(argv + 1 ) ) progerr("Missing parameter. Use -h for options.", *argv); while ( *++argv ) { if ((*argv)[0] != '-') // every parameter starts with a dash progerr("Missing switch character at '%s'. Use -h for options.", *argv); if ( !(c = (*argv)[1] ) ) // get single switch char progerr("Missing switch character at '%s'. Use -h for options.", *argv); /* allow joined arguments */ if ( (*argv)[2] ) { *argv += 2; argv--; } switch (c) { /* Search related options */ case 'w': /* query string */ case 'L': /* Limit range */ case 'P': /* phrase char */ case 't': /* struture match */ case 's': /* sort */ case 'b': /* begin location */ case 'm': /* max hits */ case 'H': /* Header display control */ case 'x': /* extended format */ case 'p': /* old-style display properties */ case 'd': /* old-style custom delimiter */ case 'o': /* don't use pre-sorted indexes */ case 'R': /* Ranking Scheme -- default is 1 */ case 'a': /* return raw rank */ argv = fetch_search_params( sw, argv, params, c ); break; /* Indexing options */ case 'i': /* input files for indexing */ case 'S': /* data Source */ case 'c': /* config file */ case 'v': /* verbose indexing - not really limited to indexing */ case 'W': /* ParserWarnLevel - also configurable in conf file */ case 'N': /* limit by date */ case 'l': /* follow symbolic links */ case 'e': /* economy indexing mode (also for merge) */ argv = fetch_indexing_params( sw, argv, params, c ); break; /* Index file(s) selection */ case 'f': { if ( !is_another_param( argv ) ) 
progerr(" '-f' requires list of index files."); while ( (w = next_param( &argv )) ) addindexfile(sw, w); break; } /* words to dump from index */ case 'k': { if ( !(w = next_param( &argv )) ) progerr(" '-k' requires a character (or '*')."); if ( strlen( w ) != 1 ) progerr(" '-k' requires a character (or '*')."); params->run_mode = MODE_WORDS; params->keychar = w[0]; return; /* nothing else to look for */ } /* print the version number */ case 'V': printversion(); case 'h': case '?': usage(); /* Merge settings */ case 'M': { if ( !is_another_param( argv ) ) progerr(" '-M' requires an output file name."); params->run_mode = MODE_MERGE; while ( (w = next_param( &argv )) ) { /* Last one listed is the output file */ if ( is_another_param( argv ) ) addindexfile(sw, w); else params->merge_out_file = estrdup( w ); } break; } /* Debugging options */ case 'T': { while ( (w = next_param( &argv )) ) { unsigned int bit; if ((bit = isDebugWord( w, params) )) DEBUG_MASK |= bit; else progerr("Invalid debugging option '%s'. Use '-T help' for help.", w); } #if defined(_WIN32) || defined(__CYGWIN__) /* * DEBUG_MASK: can't sum constants imported from a DLL * (see: "man ld" under --enable-auto-import) * 2005-05-12 - David L Norris */ DEBUG_MASK_HACK = DEBUG_MASK; if ( DEBUG_MASK_HACK & DEBUG_LIST_FUZZY ) #else if ( DEBUG_MASK & DEBUG_LIST_FUZZY ) #endif { dump_fuzzy_list(); exit(1); } break; } /* Set where errors go */ case 'E': { if ( !is_another_param( argv ) ) set_error_handle( stderr ); // -E alone goes to stderr else { FILE *f; w = next_param( &argv ); f = fopen( w, "a" ); if ( !f ) progerrno("Failed to open Error file '%s' for appending: ", w ); set_error_handle( f ); } break; } case 'u': case 'r': #ifndef USE_BTREE progerr("Must compile swish-e with --enable-incremental to use -%c option",c); #else { int mode = ( 'u' == c ) ? 
MODE_UPDATE : MODE_REMOVE; /* Make sure not trying to mix modes at same time */ if ( MODE_UPDATE == params->run_mode || MODE_REMOVE == params->run_mode ) if ( mode != params->run_mode ) progerr("Cannot mix -u (update) and -r (remove) indexing modes."); params->run_mode = mode; if ( is_another_param( argv ) ) progerr("Option -%c does not take a parameter -- use -i to list paths", c ); break; } #endif default: progerr("Unknown switch '-%c'. Use -h for options.", c ); } } } /************************************************************************* * Set config options for the indexing switches * **************************************************************************/ static char **fetch_indexing_params(SWISH *sw, char **argv, CMDPARAMS *params, char switch_char ) { char *w; switch (switch_char) { /* files to index */ case 'i': { if ( !is_another_param( argv ) ) progerr(" '-i' requires a list of things to index."); /* Set run_mode to index, unless in update/remove mode */ if ( MODE_UPDATE != params->run_mode && MODE_REMOVE != params->run_mode ) params->run_mode = MODE_INDEX; while ( (w = next_param( &argv )) ) sw->dirlist = addswline(sw->dirlist, w ); break; } /* Data source */ case 'S': { struct _indexing_data_source_def **data_source; if ( !(w = next_param( &argv )) ) progerr(" '-S' requires a valid data source."); for (data_source = data_sources; *data_source != 0; data_source++) if (strcmp(w, (*data_source)->IndexingDataSourceId) == 0) break; if (!*data_source) progerr("Unknown -S option \"%s\"", w); else IndexingDataSource = *data_source; break; } /* config file list */ case 'c': { if ( !is_another_param( argv ) ) progerr(" '-c' requires one or more configuration files."); /* Set one of the indexing modes when specifying -c */ if ( MODE_UPDATE != params->run_mode && MODE_REMOVE != params->run_mode ) params->run_mode = MODE_INDEX; while ( (w = next_param( &argv )) ) params->conflist = addswline(params->conflist, w); break; } /* Follow symbolic links */ case 'l': 
sw->FS->followsymlinks = 1; break; /* Save the time for limiting indexing by a file date */ case 'N': { struct stat stat_buf; if ( !(w = next_param( &argv )) ) progerr("-N requires a path to a local file"); if (stat( w, &stat_buf)) progerrno("Bad path '%s' specified with -N: ", w ); sw->mtime_limit = stat_buf.st_mtime; break; } /* Econ mode */ case 'e': params->swap_mode = 1; /* "Economic mode": Uses less RAM */ break; /* verbose while indexing */ case 'v': { params->hasverbose = 1; sw->verbose = get_param_number( &argv, switch_char ); break; } /* ParserWarnLevel */ case 'W': sw->parser_warn_level = get_param_number( &argv, switch_char ); break; default: progerr("Invalid index switch option '%c'", switch_char ); } return argv; } /************************************************************************* * Set config options for the search switches * **************************************************************************/ static char **fetch_search_params(SWISH *sw, char **argv, CMDPARAMS *params, char switch_char ) { char *w; /*** Display Properties Setup ***/ if ( !sw->ResultOutput ) initModule_ResultOutput(sw); switch (switch_char) { /* search words */ case 'w': { if ( !is_another_param( argv ) ) progerr(" '-w' requires list of search words."); if ( !params->query ) { params->query_len = 200; params->query = (char *)emalloc( params->query_len + 1 ); params->query[0] = '\0'; } while ( (w = next_param( &argv )) ) { /* don't add blank words */ if (w[0] == '\0') continue; if ((int)( strlen(params->query) + strlen(" ") + strlen(w) ) >= params->query_len) { params->query_len = strlen(params->query) + strlen(" ") + strlen(w) + 200; params->query = (char *) erealloc(params->query, params->query_len + 1); } params->run_mode = MODE_SEARCH; /* Append in place: sprintf() with params->query as both source and destination is undefined behavior */ if ( params->query[0] != '\0' ) strcat( params->query, " " ); strcat( params->query, w ); } break; } 
/* Set limit values */ case 'L': { if ( !( is_another_param( argv ) && is_another_param( argv + 1 ) && is_another_param( argv + 2 )) ) progerr("-L requires three parameters "); params->limit_params = setlimit_params(sw, params->limit_params, argv[1], argv[2], argv[3]); if ( sw->lasterror ) SwishAbortLastError( sw ); argv += 3; break; } /* Custom Phrase Delimiter - Jose Ruiz 01/00 */ case 'P': { if ( !(w = next_param( &argv )) ) progerr("'-P' requires a phrase delimiter."); params->PhraseDelimiter = (int) w[0]; break; } /* limit by structure */ case 't': { char * c; if ( !(w = next_param( &argv )) ) progerr("Specify tag fields (HBtheca)."); params->structure = 0; /* reset to none */ for ( c = w; *c; c++ ) switch ( *c ) { case 'H': params->structure |= IN_HEAD; break; case 'B': params->structure |= IN_BODY; break; case 't': params->structure |= IN_TITLE; break; case 'h': params->structure |= IN_HEADER; break; case 'e': params->structure |= IN_EMPHASIZED; break; case 'c': params->structure |= IN_COMMENTS; break; case 'a': params->structure |= IN_ALL; break; default: progerr("-t must only include HBtheca. Found '%c'", *c ); } break; } /* sort properties */ case 's': { if ( !is_another_param( argv ) ) progerr(" '-s' requires list of sort properties."); while ( (w = next_param( &argv )) ) params->sort_params = addswline(params->sort_params, w); break; } /* Set begin hit location */ case 'b': params->beginhits = get_param_number( &argv, switch_char ); break; /* Set max hits */ case 'm': params->maxhits = get_param_number( &argv, switch_char ); break; /* $$$ These need better error reporting */ /* Extended format */ case 'x': { /* Jose Ruiz 09/00 */ /* Search proc will show more info */ /* rasc 2001-02 extended -x fmtstr */ if ( !(w = next_param( &argv )) ) progerr("'-x' requires an output format string."); { char *s; /* check if name is a predefined format - not implemented */ s = hasResultExtFmtStr(sw, w); sw->ResultOutput->extendedformat = (s) ? 
s : w; initPrintExtResult(sw, sw->ResultOutput->extendedformat); } break; } /* Search header output control */ case 'H': sw->headerOutVerbose = get_param_number( &argv, switch_char ); break; /* display properties */ case 'p': { if ( !is_another_param( argv ) ) progerr(" '-p' requires list of properties."); while ( (w = next_param( &argv )) ) params->disp_props = addswline( params->disp_props, w); break; } /* Set the output custom delimiter */ case 'd': { if ( !(w = next_param( &argv )) ) progerr("'-d' requires an output delimiter."); sw->ResultOutput->stdResultFieldDelimiter = estrredup(sw->ResultOutput->stdResultFieldDelimiter, w ); /* This really doesn't work as is probably expected since it's a delimiter and not quoting the fields */ if (strcmp(sw->ResultOutput->stdResultFieldDelimiter, "dq") == 0) strcpy( sw->ResultOutput->stdResultFieldDelimiter, "\"" ); else { int i,j; int backslash = 0; for ( j=0, i=0; i < (int)strlen( w ); i++ ) { if ( !backslash ) { if ( w[i] == '\\' ) { backslash++; continue; } else { sw->ResultOutput->stdResultFieldDelimiter[j++] = w[i]; continue; } } switch ( w[i] ) { case 'f': sw->ResultOutput->stdResultFieldDelimiter[j++] = '\f'; break; case 'n': sw->ResultOutput->stdResultFieldDelimiter[j++] = '\n'; break; case 'r': sw->ResultOutput->stdResultFieldDelimiter[j++] = '\r'; break; case 't': sw->ResultOutput->stdResultFieldDelimiter[j++] = '\t'; break; case '\\': sw->ResultOutput->stdResultFieldDelimiter[j++] = '\\'; sw->ResultOutput->stdResultFieldDelimiter[j++] = '\\'; break; default: progerr("Unknown escape sequence '\\%c'. 
Must be one of \\f \\n \\r \\t \\\\", w[i]); } backslash = 0; } sw->ResultOutput->stdResultFieldDelimiter[j] = '\0'; } break; } /* Ranking Scheme */ case 'R': sw->RankScheme = get_param_number( &argv, switch_char ); break; case 'a': sw->ReturnRawRank = 1; break; /* Ignore sorted indexes */ case 'o': sw->ResultSort->isPreSorted = 0; break; default: progerr("Invalid search switch option '%c'\n", switch_char ); break; } return argv; } /************************************************************************* * Returns true if we think the program is called swish-search * offers no real security * **************************************************************************/ static int check_readonly_mode( char *prog ) { char *tmp = prog + strlen(prog) - strlen("swish-search"); if ( tmp < prog ) return 0; /* We must ignore case for WIN 32 */ if (strcasecmp(tmp, "swish-search") == 0) return 1; return 0; } /************************************************************************* * Dumps the index file(s) * **************************************************************************/ static void cmd_dump( SWISH *sw, CMDPARAMS *params ) { /* Set the default index file */ if ( sw->indexlist == NULL ) addindexfile(sw, INDEXFILE); while ( sw->indexlist != NULL ) { DB_decompress(sw, sw->indexlist, params->beginhits, params->maxhits); putchar('\n'); sw->indexlist = sw->indexlist->next; } } /************************************************************************* * This run the indexing code * **************************************************************************/ static void cmd_index( SWISH *sw, CMDPARAMS *params ) { int hasdir = (sw->dirlist == NULL) ? 0 : 1; int hasindex = (sw->indexlist == NULL) ? 
0 : 1; double elapsedStart = TimeElapsed(); double cpuStart = TimeCPU(); struct swline *tmpswline; if ( params->index_read_only ) progerr("Sorry, this program is in readonly mode"); /* Read configuration files */ { struct swline *tmp = params->conflist; while ( tmp != NULL) { getdefaults(sw, tmp->line, &hasdir, &hasindex, params->hasverbose); tmp = tmp->next; } } /* Default index file */ if ( sw->indexlist == NULL ) addindexfile(sw, INDEXFILE); if (!hasdir) progerr("Specify directories or files to %s.", MODE_INDEX == params->run_mode ? "index" : MODE_UPDATE == params->run_mode ? "update" : "remove" ); if (sw->verbose < 0) sw->verbose = 0; /* Update Economic mode */ sw->Index->swap_locdata = params->swap_mode; /* Check for UPDATE_MODE jmruiz 2002/03 */ if ( MODE_UPDATE == params->run_mode || MODE_REMOVE == params->run_mode ) #ifndef USE_BTREE progerr("Invalid operation mode '%d': Update mode only supported with USE_BTREE feature", (int)params->run_mode); #else { /* Set update_mode */ sw->Index->update_mode = params->run_mode; if ( !open_single_index( sw, sw->indexlist, DB_READWRITE ) ) SwishAbortLastError( sw ); /* Adjust file number to start after the last file number in the index */ sw->Index->filenum = sw->indexlist->header.totalfiles; } #endif else { /* Create an empty File - before indexing to make sure can write to the index */ sw->indexlist->DB = (void *) DB_Create(sw, sw->indexlist->line); if ( sw->lasterror ) SwishAbortLastError( sw ); } /* This should be printed by the module that's reading the source */ if (sw->verbose >= 1) printf("Indexing Data Source: \"%s\"\n", IndexingDataSource->IndexingDataSourceName); tmpswline = sw->dirlist; while (tmpswline != NULL) { if (sw->verbose) { printf("Indexing \"%s\"\n", tmpswline->line); fflush(stdout); } indexpath(sw, tmpswline->line); tmpswline = tmpswline->next; } Mem_Summary("After indexing", 0); if (sw->verbose > 1) putchar('\n'); if (sw->verbose) printf("Removing very common words...\n"); fflush(stdout); 
write_index_file( sw, 1, elapsedStart, cpuStart, 0); } /************************************************************************* * MERGE: prepare index files for merging, and call merge.c * * Most of this should probably be in merge.c * **************************************************************************/ static void cmd_merge( SWISH *sw_input, CMDPARAMS *params ) { SWISH *sw_out; double elapsedStart = TimeElapsed(); double cpuStart = TimeCPU(); if ( params->index_read_only ) progerr("Sorry, this program is in readonly mode"); if (!sw_input->indexlist) progerr("Failed to list any input files for merging"); /* Open all the index files for reading */ if ( !SwishAttach(sw_input) ) SwishAbortLastError( sw_input ); /* Check output file */ if ( !params->merge_out_file ) progerr("Failed to provide merge output file"); if ( isfile(params->merge_out_file) ) progerr("Merge output file '%s' already exists. Won't overwrite.\n", params->merge_out_file); /* create output */ sw_out = swish_new(); sw_out->verbose = sw_input->verbose; sw_out->headerOutVerbose = sw_input->headerOutVerbose; addindexfile(sw_out, params->merge_out_file); /* Update Economic mode */ sw_out->Index->swap_locdata = params->swap_mode; /* Create an empty File - before indexing to make sure can write to the index */ sw_out->indexlist->DB = (void *) DB_Create(sw_out, params->merge_out_file); if ( sw_out->lasterror ) SwishAbortLastError( sw_out ); merge_indexes( sw_input, sw_out ); write_index_file( sw_out, 0, elapsedStart, cpuStart, 1); swish_close( sw_out ); efree( params->merge_out_file ); } /************************************************************************* * Displays all the words starting with params->keychar * **************************************************************************/ static void cmd_keywords( SWISH *sw, CMDPARAMS *params ) { if (!sw->indexlist) addindexfile(sw, INDEXFILE); OutputKeyChar(sw, (int) (unsigned char) params->keychar); } 
/************************************************************************* * Runs a swish query * **************************************************************************/ static void cmd_search( SWISH *sw, CMDPARAMS *params ) { double elapsedStart = TimeElapsed(); double elapsedSearchStart; double elapsedEnd; SEARCH_OBJECT *srch; RESULTS_OBJECT *results; /* Set default index file, if none specified */ if (!sw->indexlist) addindexfile(sw, INDEXFILE); /* Open index files */ if ( !SwishAttach(sw) ) SwishAbortLastError( sw ); srch = New_Search_Object( sw, params->query ); if ( sw->lasterror ) SwishAbortLastError( sw ); srch->PhraseDelimiter = params->PhraseDelimiter; srch->structure = params->structure; if ( params->sort_params ) { srch->sort_params = params->sort_params; params->sort_params = NULL; } if ( params->limit_params ) { srch->limit_params = params->limit_params; params->limit_params = NULL; } /* Set up for -p printing */ if ( params->disp_props ) { struct swline *tmp = params->disp_props; while ( tmp ) { addSearchResultDisplayProperty(sw, tmp->line); tmp = tmp->next; } initSearchResultProperties(sw); } /* Get starting time */ elapsedSearchStart = TimeElapsed(); /* Run the query */ results = SwishExecute( srch, NULL ); display_result_headers(results); if ( sw->lasterror ) SwishAbortLastError( sw ); if (results->total_results > 0) { resultHeaderOut(sw, 1, "# Number of hits: %d\n", results->total_results); elapsedEnd = TimeElapsed(); resultHeaderOut(sw, 1, "# Search time: %0.3f seconds\n", elapsedEnd - elapsedSearchStart); resultHeaderOut(sw, 1, "# Run time: %0.3f seconds\n", elapsedEnd - elapsedStart); /* Display the results */ printSortedResults(results, params->beginhits, params->maxhits); resultHeaderOut(sw, 1, ".\n"); } else resultHeaderOut(sw, 1, "err: no results\n.\n"); Free_Results_Object( results ); Free_Search_Object( srch ); freeModule_ResultOutput(sw); } /************************************************************************* * write_index_file 
-- used for both merge and for indexing * **************************************************************************/ static void write_index_file( SWISH *sw, int process_stopwords, double elapsedStart, double cpuStart, int merge) { int totalfiles = getfilecount(sw->indexlist) - sw->indexlist->header.removedfiles; /* just for display */ int stopwords = 0; struct swline *cur_line; /* Coalesce all remaining locations */ coalesce_all_word_locations(sw, sw->indexlist); if ( process_stopwords ) { /* Proccess IgnoreLimit option */ getPositionsFromIgnoreLimitWords(sw); stopwords = getNumberOfIgnoreLimitWords(sw); if (sw->verbose ) { if (stopwords) { /* 05/00 Jose Ruiz Adjust totalwords for IgnoreLimit ONLY */ /* 2002-07 jmruiz **This is already done in getPositionsFromIgnoreLimitWords ** sw->indexlist->header.totalwords -= stopwords; */ if (sw->indexlist->header.totalwords < 0) sw->indexlist->header.totalwords = 0; /* Same as "stopwords" */ printf("%d words removed by IgnoreLimit:\n", stopwords); for (cur_line = sw->Index->IgnoreLimitWords; cur_line; cur_line = cur_line->next ) printf("%s, ", cur_line->line); printf("\n"); freeswline( sw->Index->IgnoreLimitWords ); } else printf("no words removed.\n"); } } if (sw->verbose) printf("Writing main index...\n"); if ( !sw->indexlist->header.totalwords ) { /* Would be better to flag so db_native would know not to rename the (empty) index file */ // printf("No unique words indexed!\n"); progerr("No unique words indexed!"); } if (sw->verbose) printf("Sorting words ...\n"); sort_words(sw); if (sw->verbose) printf("Writing header ...\n"); fflush(stdout); write_header( sw, merge ); fflush(stdout); if (sw->verbose) printf("Writing index entries ...\n"); write_index(sw, sw->indexlist); if (sw->verbose) { int totalwords = sw->indexlist->header.totalwords; printf("%s unique word%s indexed.\n", comma_long( totalwords ), (totalwords == 1) ? 
"" : "s"); } /* Sort properties -> Better search performance */ /* First reopen the property file in read only mode for seek speed */ DB_Reopen_PropertiesForRead( sw, sw->indexlist->DB ); if ( sw->lasterror ) SwishAbortLastError( sw ); /* This sorts all the properties */ sortFileProperties(sw,sw->indexlist); if (sw->verbose) { if (totalfiles) { printf("%s file%s indexed. ", comma_long( totalfiles ), (totalfiles == 1) ? "" : "s"); /* common_long is not thread safe -- shares memory */ printf("%s total bytes. ", comma_long(sw->indexlist->total_bytes) ); printf("%s total words.\n", comma_long(sw->indexlist->total_word_positions_cur_run) ); } else printf("no files indexed.\n"); printf("Elapsed time: "); printTime(TimeElapsed() - elapsedStart); printf(" CPU time: "); printTime(TimeCPU() - cpuStart); printf("\n"); printf("Indexing done!\n"); } } /***************************************************************** * dispaly_result_headers -- prints the result header list * ******************************************************************/ static void display_result_headers( RESULTS_OBJECT *results ) { SWISH *sw = results->sw; DB_RESULTS *db_results = results->db_results; resultHeaderOut(sw, 1, "%s\n", INDEXHEADER); /* print # SWISH format: */ /* print out "original" search words */ resultHeaderOut(sw, 1, "# Search words: %s\n", results->query); while ( db_results ) { IndexFILE *indexf = db_results->indexf; resultHeaderOut(sw, 2, "#\n# Index File: %s\n", indexf->line); /* Print all the headers */ print_index_headers( indexf ); /* $$$ move to headers.c and fix as it's corrupting memory */ // translatecharHeaderOut(sw, 2, &indexf->header); /* Show parsed words */ resultHeaderOut(sw, 2, "# Search words: %s\n", results->query); swline_header_out( sw, 2, "# Parsed Words: ", db_results->parsed_words ); swline_header_out( sw, 1, "# Removed stopwords: ", db_results->removed_stopwords ); db_results = db_results->next; } resultHeaderOut(sw, 2, "#\n"); } static void swline_header_out( 
SWISH *sw, int v, char *desc, struct swline *sl ) { resultHeaderOut(sw, v, desc); while (sl) { resultHeaderOut(sw, v, "%s ", sl->line); sl = sl->next; } resultHeaderOut(sw, v, "\n"); } swish-e-2.4.7/src/rank.c0000664000077100017500000007604111166010110011711 00000000000000/* $Id: rank.c 2291 2009-03-31 01:56:00Z karpet $ This file is part of Swish-e. Swish-e is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. Swish-e is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with Swish-e; if not, write to the Free Software Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA See the COPYING file that accompanies the Swish-e distribution for details of the GNU GPL and the special exception available for linking against the Swish-e library. 
** Mon May 9 14:51:21 CDT 2005 ** added GPL (nothing previously) */ #include #include "swish.h" #include "db.h" #include "rank.h" /* 1000 precomputed 10000 * log(i) */ static int swish_log[] = {\ 0, 0, 6931, 10986, 13863, 16094, 17918, 19459, 20794, 21972,\ 23026, 23979, 24849, 25649, 26391, 27081, 27726, 28332, 28904, 29444,\ 29957, 30445, 30910, 31355, 31781, 32189, 32581, 32958, 33322, 33673,\ 34012, 34340, 34657, 34965, 35264, 35553, 35835, 36109, 36376, 36636,\ 36889, 37136, 37377, 37612, 37842, 38067, 38286, 38501, 38712, 38918,\ 39120, 39318, 39512, 39703, 39890, 40073, 40254, 40431, 40604, 40775,\ 40943, 41109, 41271, 41431, 41589, 41744, 41897, 42047, 42195, 42341,\ 42485, 42627, 42767, 42905, 43041, 43175, 43307, 43438, 43567, 43694,\ 43820, 43944, 44067, 44188, 44308, 44427, 44543, 44659, 44773, 44886,\ 44998, 45109, 45218, 45326, 45433, 45539, 45643, 45747, 45850, 45951,\ 46052, 46151, 46250, 46347, 46444, 46540, 46634, 46728, 46821, 46913,\ 47005, 47095, 47185, 47274, 47362, 47449, 47536, 47622, 47707, 47791,\ 47875, 47958, 48040, 48122, 48203, 48283, 48363, 48442, 48520, 48598,\ 48675, 48752, 48828, 48903, 48978, 49053, 49127, 49200, 49273, 49345,\ 49416, 49488, 49558, 49628, 49698, 49767, 49836, 49904, 49972, 50039,\ 50106, 50173, 50239, 50304, 50370, 50434, 50499, 50562, 50626, 50689,\ 50752, 50814, 50876, 50938, 50999, 51059, 51120, 51180, 51240, 51299,\ 51358, 51417, 51475, 51533, 51591, 51648, 51705, 51761, 51818, 51874,\ 51930, 51985, 52040, 52095, 52149, 52204, 52257, 52311, 52364, 52417,\ 52470, 52523, 52575, 52627, 52679, 52730, 52781, 52832, 52883, 52933,\ 52983, 53033, 53083, 53132, 53181, 53230, 53279, 53327, 53375, 53423,\ 53471, 53519, 53566, 53613, 53660, 53706, 53753, 53799, 53845, 53891,\ 53936, 53982, 54027, 54072, 54116, 54161, 54205, 54250, 54293, 54337,\ 54381, 54424, 54467, 54510, 54553, 54596, 54638, 54681, 54723, 54765,\ 54806, 54848, 54889, 54931, 54972, 55013, 55053, 55094, 55134, 55175,\ 55215, 55255, 55294, 55334, 55373, 
55413, 55452, 55491, 55530, 55568,\ 55607, 55645, 55683, 55722, 55759, 55797, 55835, 55872, 55910, 55947,\ 55984, 56021, 56058, 56095, 56131, 56168, 56204, 56240, 56276, 56312,\ 56348, 56384, 56419, 56454, 56490, 56525, 56560, 56595, 56630, 56664,\ 56699, 56733, 56768, 56802, 56836, 56870, 56904, 56937, 56971, 57004,\ 57038, 57071, 57104, 57137, 57170, 57203, 57236, 57268, 57301, 57333,\ 57366, 57398, 57430, 57462, 57494, 57526, 57557, 57589, 57621, 57652,\ 57683, 57714, 57746, 57777, 57807, 57838, 57869, 57900, 57930, 57961,\ 57991, 58021, 58051, 58081, 58111, 58141, 58171, 58201, 58230, 58260,\ 58289, 58319, 58348, 58377, 58406, 58435, 58464, 58493, 58522, 58551,\ 58579, 58608, 58636, 58665, 58693, 58721, 58749, 58777, 58805, 58833,\ 58861, 58889, 58916, 58944, 58972, 58999, 59026, 59054, 59081, 59108,\ 59135, 59162, 59189, 59216, 59243, 59269, 59296, 59322, 59349, 59375,\ 59402, 59428, 59454, 59480, 59506, 59532, 59558, 59584, 59610, 59636,\ 59661, 59687, 59713, 59738, 59764, 59789, 59814, 59839, 59865, 59890,\ 59915, 59940, 59965, 59989, 60014, 60039, 60064, 60088, 60113, 60137,\ 60162, 60186, 60210, 60234, 60259, 60283, 60307, 60331, 60355, 60379,\ 60403, 60426, 60450, 60474, 60497, 60521, 60544, 60568, 60591, 60615,\ 60638, 60661, 60684, 60707, 60730, 60753, 60776, 60799, 60822, 60845,\ 60868, 60890, 60913, 60936, 60958, 60981, 61003, 61026, 61048, 61070,\ 61092, 61115, 61137, 61159, 61181, 61203, 61225, 61247, 61269, 61291,\ 61312, 61334, 61356, 61377, 61399, 61420, 61442, 61463, 61485, 61506,\ 61527, 61549, 61570, 61591, 61612, 61633, 61654, 61675, 61696, 61717,\ 61738, 61759, 61779, 61800, 61821, 61841, 61862, 61883, 61903, 61924,\ 61944, 61964, 61985, 62005, 62025, 62046, 62066, 62086, 62106, 62126,\ 62146, 62166, 62186, 62206, 62226, 62246, 62265, 62285, 62305, 62324,\ 62344, 62364, 62383, 62403, 62422, 62442, 62461, 62480, 62500, 62519,\ 62538, 62558, 62577, 62596, 62615, 62634, 62653, 62672, 62691, 62710,\ 62729, 62748, 62766, 62785, 62804, 62823, 
62841, 62860, 62879, 62897,\ 62916, 62934, 62953, 62971, 62989, 63008, 63026, 63044, 63063, 63081,\ 63099, 63117, 63135, 63154, 63172, 63190, 63208, 63226, 63244, 63261,\ 63279, 63297, 63315, 63333, 63351, 63368, 63386, 63404, 63421, 63439,\ 63456, 63474, 63491, 63509, 63526, 63544, 63561, 63578, 63596, 63613,\ 63630, 63648, 63665, 63682, 63699, 63716, 63733, 63750, 63767, 63784,\ 63801, 63818, 63835, 63852, 63869, 63886, 63902, 63919, 63936, 63953,\ 63969, 63986, 64003, 64019, 64036, 64052, 64069, 64085, 64102, 64118,\ 64135, 64151, 64167, 64184, 64200, 64216, 64232, 64249, 64265, 64281,\ 64297, 64313, 64329, 64345, 64362, 64378, 64394, 64409, 64425, 64441,\ 64457, 64473, 64489, 64505, 64520, 64536, 64552, 64568, 64583, 64599,\ 64615, 64630, 64646, 64661, 64677, 64693, 64708, 64723, 64739, 64754,\ 64770, 64785, 64800, 64816, 64831, 64846, 64862, 64877, 64892, 64907,\ 64922, 64938, 64953, 64968, 64983, 64998, 65013, 65028, 65043, 65058,\ 65073, 65088, 65103, 65117, 65132, 65147, 65162, 65177, 65191, 65206,\ 65221, 65236, 65250, 65265, 65280, 65294, 65309, 65323, 65338, 65352,\ 65367, 65381, 65396, 65410, 65425, 65439, 65453, 65468, 65482, 65497,\ 65511, 65525, 65539, 65554, 65568, 65582, 65596, 65610, 65624, 65639,\ 65653, 65667, 65681, 65695, 65709, 65723, 65737, 65751, 65765, 65779,\ 65793, 65806, 65820, 65834, 65848, 65862, 65876, 65889, 65903, 65917,\ 65930, 65944, 65958, 65971, 65985, 65999, 66012, 66026, 66039, 66053,\ 66067, 66080, 66093, 66107, 66120, 66134, 66147, 66161, 66174, 66187,\ 66201, 66214, 66227, 66241, 66254, 66267, 66280, 66294, 66307, 66320,\ 66333, 66346, 66359, 66373, 66386, 66399, 66412, 66425, 66438, 66451,\ 66464, 66477, 66490, 66503, 66516, 66529, 66542, 66554, 66567, 66580,\ 66593, 66606, 66619, 66631, 66644, 66657, 66670, 66682, 66695, 66708,\ 66720, 66733, 66746, 66758, 66771, 66783, 66796, 66809, 66821, 66834,\ 66846, 66859, 66871, 66884, 66896, 66908, 66921, 66933, 66946, 66958,\ 66970, 66983, 66995, 67007, 67020, 67032, 67044, 
67056, 67069, 67081,\
67093, 67105, 67117, 67130, 67142, 67154, 67166, 67178, 67190, 67202,\
67214, 67226, 67238, 67250, 67262, 67274, 67286, 67298, 67310, 67322,\
67334, 67346, 67358, 67370, 67382, 67393, 67405, 67417, 67429, 67441,\
67452, 67464, 67476, 67488, 67499, 67511, 67523, 67534, 67546, 67558,\
67569, 67581, 67593, 67604, 67616, 67627, 67639, 67650, 67662, 67673,\
67685, 67696, 67708, 67719, 67731, 67742, 67754, 67765, 67776, 67788,\
67799, 67811, 67822, 67833, 67845, 67856, 67867, 67878, 67890, 67901,\
67912, 67923, 67935, 67946, 67957, 67968, 67979, 67991, 68002, 68013,\
68024, 68035, 68046, 68057, 68068, 68079, 68090, 68101, 68112, 68123,\
68134, 68145, 68156, 68167, 68178, 68189, 68200, 68211, 68222, 68233,\
68244, 68255, 68265, 68276, 68287, 68298, 68309, 68320, 68330, 68341,\
68352, 68363, 68373, 68384, 68395, 68405, 68416, 68427, 68437, 68448,\
68459, 68469, 68480, 68491, 68501, 68512, 68522, 68533, 68544, 68554,\
68565, 68575, 68586, 68596, 68607, 68617, 68628, 68638, 68648, 68659,\
68669, 68680, 68690, 68701, 68711, 68721, 68732, 68742, 68752, 68763,\
68773, 68783, 68794, 68804, 68814, 68824, 68835, 68845, 68855, 68865,\
68876, 68886, 68896, 68906, 68916, 68926, 68937, 68947, 68957, 68967,\
68977, 68987, 68997, 69007, 69017, 69027, 69037, 69048, 69058, 69068,\
69078,
};

/* precomputed 10000 * log10(i) for i = 0..1000 */

static int swish_log10[] = {\
0, 0, 3010, 4771, 6021, 6990, 7782, 8451, 9031, 9542,\
10000, 10414, 10792, 11139, 11461, 11761, 12041, 12304, 12553, 12788,\
13010, 13222, 13424, 13617, 13802, 13979, 14150, 14314, 14472, 14624,\
14771, 14914, 15051, 15185, 15315, 15441, 15563, 15682, 15798, 15911,\
16021, 16128, 16232, 16335, 16435, 16532, 16628, 16721, 16812, 16902,\
16990, 17076, 17160, 17243, 17324, 17404, 17482, 17559, 17634, 17709,\
17782, 17853, 17924, 17993, 18062, 18129, 18195, 18261, 18325, 18388,\
18451, 18513, 18573, 18633, 18692, 18751, 18808, 18865, 18921, 18976,\
19031, 19085, 19138, 19191, 19243, 19294, 19345, 19395, 19445,
19494,\ 19542, 19590, 19638, 19685, 19731, 19777, 19823, 19868, 19912, 19956,\ 20000, 20043, 20086, 20128, 20170, 20212, 20253, 20294, 20334, 20374,\ 20414, 20453, 20492, 20531, 20569, 20607, 20645, 20682, 20719, 20755,\ 20792, 20828, 20864, 20899, 20934, 20969, 21004, 21038, 21072, 21106,\ 21139, 21173, 21206, 21239, 21271, 21303, 21335, 21367, 21399, 21430,\ 21461, 21492, 21523, 21553, 21584, 21614, 21644, 21673, 21703, 21732,\ 21761, 21790, 21818, 21847, 21875, 21903, 21931, 21959, 21987, 22014,\ 22041, 22068, 22095, 22122, 22148, 22175, 22201, 22227, 22253, 22279,\ 22304, 22330, 22355, 22380, 22405, 22430, 22455, 22480, 22504, 22529,\ 22553, 22577, 22601, 22625, 22648, 22672, 22695, 22718, 22742, 22765,\ 22788, 22810, 22833, 22856, 22878, 22900, 22923, 22945, 22967, 22989,\ 23010, 23032, 23054, 23075, 23096, 23118, 23139, 23160, 23181, 23201,\ 23222, 23243, 23263, 23284, 23304, 23324, 23345, 23365, 23385, 23404,\ 23424, 23444, 23464, 23483, 23502, 23522, 23541, 23560, 23579, 23598,\ 23617, 23636, 23655, 23674, 23692, 23711, 23729, 23747, 23766, 23784,\ 23802, 23820, 23838, 23856, 23874, 23892, 23909, 23927, 23945, 23962,\ 23979, 23997, 24014, 24031, 24048, 24065, 24082, 24099, 24116, 24133,\ 24150, 24166, 24183, 24200, 24216, 24232, 24249, 24265, 24281, 24298,\ 24314, 24330, 24346, 24362, 24378, 24393, 24409, 24425, 24440, 24456,\ 24472, 24487, 24502, 24518, 24533, 24548, 24564, 24579, 24594, 24609,\ 24624, 24639, 24654, 24669, 24683, 24698, 24713, 24728, 24742, 24757,\ 24771, 24786, 24800, 24814, 24829, 24843, 24857, 24871, 24886, 24900,\ 24914, 24928, 24942, 24955, 24969, 24983, 24997, 25011, 25024, 25038,\ 25051, 25065, 25079, 25092, 25105, 25119, 25132, 25145, 25159, 25172,\ 25185, 25198, 25211, 25224, 25237, 25250, 25263, 25276, 25289, 25302,\ 25315, 25328, 25340, 25353, 25366, 25378, 25391, 25403, 25416, 25428,\ 25441, 25453, 25465, 25478, 25490, 25502, 25514, 25527, 25539, 25551,\ 25563, 25575, 25587, 25599, 25611, 25623, 25635, 25647, 25658, 25670,\ 
25682, 25694, 25705, 25717, 25729, 25740, 25752, 25763, 25775, 25786,\ 25798, 25809, 25821, 25832, 25843, 25855, 25866, 25877, 25888, 25899,\ 25911, 25922, 25933, 25944, 25955, 25966, 25977, 25988, 25999, 26010,\ 26021, 26031, 26042, 26053, 26064, 26075, 26085, 26096, 26107, 26117,\ 26128, 26138, 26149, 26160, 26170, 26180, 26191, 26201, 26212, 26222,\ 26232, 26243, 26253, 26263, 26274, 26284, 26294, 26304, 26314, 26325,\ 26335, 26345, 26355, 26365, 26375, 26385, 26395, 26405, 26415, 26425,\ 26435, 26444, 26454, 26464, 26474, 26484, 26493, 26503, 26513, 26522,\ 26532, 26542, 26551, 26561, 26571, 26580, 26590, 26599, 26609, 26618,\ 26628, 26637, 26646, 26656, 26665, 26675, 26684, 26693, 26702, 26712,\ 26721, 26730, 26739, 26749, 26758, 26767, 26776, 26785, 26794, 26803,\ 26812, 26821, 26830, 26839, 26848, 26857, 26866, 26875, 26884, 26893,\ 26902, 26911, 26920, 26928, 26937, 26946, 26955, 26964, 26972, 26981,\ 26990, 26998, 27007, 27016, 27024, 27033, 27042, 27050, 27059, 27067,\ 27076, 27084, 27093, 27101, 27110, 27118, 27126, 27135, 27143, 27152,\ 27160, 27168, 27177, 27185, 27193, 27202, 27210, 27218, 27226, 27235,\ 27243, 27251, 27259, 27267, 27275, 27284, 27292, 27300, 27308, 27316,\ 27324, 27332, 27340, 27348, 27356, 27364, 27372, 27380, 27388, 27396,\ 27404, 27412, 27419, 27427, 27435, 27443, 27451, 27459, 27466, 27474,\ 27482, 27490, 27497, 27505, 27513, 27520, 27528, 27536, 27543, 27551,\ 27559, 27566, 27574, 27582, 27589, 27597, 27604, 27612, 27619, 27627,\ 27634, 27642, 27649, 27657, 27664, 27672, 27679, 27686, 27694, 27701,\ 27709, 27716, 27723, 27731, 27738, 27745, 27752, 27760, 27767, 27774,\ 27782, 27789, 27796, 27803, 27810, 27818, 27825, 27832, 27839, 27846,\ 27853, 27860, 27868, 27875, 27882, 27889, 27896, 27903, 27910, 27917,\ 27924, 27931, 27938, 27945, 27952, 27959, 27966, 27973, 27980, 27987,\ 27993, 28000, 28007, 28014, 28021, 28028, 28035, 28041, 28048, 28055,\ 28062, 28069, 28075, 28082, 28089, 28096, 28102, 28109, 28116, 28122,\ 28129, 
28136, 28142, 28149, 28156, 28162, 28169, 28176, 28182, 28189,\ 28195, 28202, 28209, 28215, 28222, 28228, 28235, 28241, 28248, 28254,\ 28261, 28267, 28274, 28280, 28287, 28293, 28299, 28306, 28312, 28319,\ 28325, 28331, 28338, 28344, 28351, 28357, 28363, 28370, 28376, 28382,\ 28388, 28395, 28401, 28407, 28414, 28420, 28426, 28432, 28439, 28445,\ 28451, 28457, 28463, 28470, 28476, 28482, 28488, 28494, 28500, 28506,\ 28513, 28519, 28525, 28531, 28537, 28543, 28549, 28555, 28561, 28567,\ 28573, 28579, 28585, 28591, 28597, 28603, 28609, 28615, 28621, 28627,\ 28633, 28639, 28645, 28651, 28657, 28663, 28669, 28675, 28681, 28686,\ 28692, 28698, 28704, 28710, 28716, 28722, 28727, 28733, 28739, 28745,\ 28751, 28756, 28762, 28768, 28774, 28779, 28785, 28791, 28797, 28802,\ 28808, 28814, 28820, 28825, 28831, 28837, 28842, 28848, 28854, 28859,\ 28865, 28871, 28876, 28882, 28887, 28893, 28899, 28904, 28910, 28915,\ 28921, 28927, 28932, 28938, 28943, 28949, 28954, 28960, 28965, 28971,\ 28976, 28982, 28987, 28993, 28998, 29004, 29009, 29015, 29020, 29025,\ 29031, 29036, 29042, 29047, 29053, 29058, 29063, 29069, 29074, 29079,\ 29085, 29090, 29096, 29101, 29106, 29112, 29117, 29122, 29128, 29133,\ 29138, 29143, 29149, 29154, 29159, 29165, 29170, 29175, 29180, 29186,\ 29191, 29196, 29201, 29206, 29212, 29217, 29222, 29227, 29232, 29238,\ 29243, 29248, 29253, 29258, 29263, 29269, 29274, 29279, 29284, 29289,\ 29294, 29299, 29304, 29309, 29315, 29320, 29325, 29330, 29335, 29340,\ 29345, 29350, 29355, 29360, 29365, 29370, 29375, 29380, 29385, 29390,\ 29395, 29400, 29405, 29410, 29415, 29420, 29425, 29430, 29435, 29440,\ 29445, 29450, 29455, 29460, 29465, 29469, 29474, 29479, 29484, 29489,\ 29494, 29499, 29504, 29509, 29513, 29518, 29523, 29528, 29533, 29538,\ 29542, 29547, 29552, 29557, 29562, 29566, 29571, 29576, 29581, 29586,\ 29590, 29595, 29600, 29605, 29609, 29614, 29619, 29624, 29628, 29633,\ 29638, 29643, 29647, 29652, 29657, 29661, 29666, 29671, 29675, 29680,\ 29685, 29689, 
29694, 29699, 29703, 29708, 29713, 29717, 29722, 29727,\
29731, 29736, 29741, 29745, 29750, 29754, 29759, 29763, 29768, 29773,\
29777, 29782, 29786, 29791, 29795, 29800, 29805, 29809, 29814, 29818,\
29823, 29827, 29832, 29836, 29841, 29845, 29850, 29854, 29859, 29863,\
29868, 29872, 29877, 29881, 29886, 29890, 29894, 29899, 29903, 29908,\
29912, 29917, 29921, 29926, 29930, 29934, 29939, 29943, 29948, 29952,\
29956, 29961, 29965, 29969, 29974, 29978, 29983, 29987, 29991, 29996,\
30000,
};

typedef struct
{
    int mask;
    int rank;
}
RankFactor;

static RankFactor ranks[] = {
    {IN_TITLE, RANK_TITLE},
    {IN_HEADER, RANK_HEADER},
    {IN_META, RANK_META},
    {IN_COMMENTS, RANK_COMMENTS},
    {IN_EMPHASIZED, RANK_EMPHASIZED}
};

#define numRanks (sizeof(ranks)/sizeof(ranks[0]))

/******************************************************************************
* build_struct_map
*
* Builds an array to hold all possible structure values
* (where the value is determined by the bits set in the structure flag).
* This is just to provide a faster adjustment of rank based on structure.
*
* A word's rank value is one plus the sum of the structure bit values.
* The value of each structure bit is stored in the RankFactor array, and the
* defaults for each RANK_* are in config.h.
*
*******************************************************************************/

static void build_struct_map( SWISH *sw )
{
    int structure;
    int i;
    int array_size = sizeof( sw->structure_map ) / sizeof( sw->structure_map[0]);

    for ( structure = 0; structure < array_size; structure++ )
    {
        int factor = 1; /* All words are of value 1 */

        for (i = 0; i < (int)numRanks; i++)
            if (ranks[i].mask & structure)
                factor += ranks[i].rank;

        sw->structure_map[structure] = factor;
    }

    sw->structure_map_set = 1; /* flag */
}

int
getrank ( RESULT *r )
{
    SWISH *sw;
    IndexFILE *indexf;
    int scheme;

    indexf = r->db_results->indexf;
    sw = indexf->sw;
    scheme = sw->RankScheme;

    if( DEBUG_RANK )
    {
        fprintf( stderr, "-----------------------------------------------------------------\n");
        fprintf( stderr, "Ranking Scheme: %d \n", scheme );
    }

    switch ( scheme )
    {
        case 0:
        {
            return ( getrankDEF( r ) );
        }

        case 1:
        {
            if ( indexf->header.ignoreTotalWordCountWhenRanking )
            {
                fprintf(stderr, "IgnoreTotalWordCountWhenRanking must be 0 to use IDF ranking\n");
                exit(1);
            }

            return ( getrankIDF( r ) );
        }

        default:
        {
            return ( getrankDEF( r ) );
        }
    }
}

/* 2001-11 jmruiz With thousands of results (>1000000) this routine is a bottleneck
** (it is called thousands of times). To avoid
** this I have added some optimizations.
** To avoid the costly conversion in "return (int)rank"
** from double to int, which degrades search performance,
** I have switched to integer computations.
**
** To avoid the loss of precision I use rank *10000,
** reduction *10000, factor *10000, etc...
*/

/* renamed to getrankDEF to allow for multiple schemes
   with generic case caller getrank()
   karman Mon Aug 30 07:03:35 CDT 2004
*/

int
getrankDEF( RESULT *r )
{
    unsigned int *posdata;
    int meta_bias;
    IndexFILE *indexf;
    int rank;
    int reduction;
    int words; /* number of word positions (total words, not unique) in the given file -- probably should be per metaname */
    int i;
    SWISH *sw;
    int metaID;
    int freq;

    int struct_tally[256];

    if( DEBUG_RANK )
    {
        for ( i = 0; i <= 255; i++ )
            struct_tally[i] = 0;
    }

    /* has rank already been calculated? */
    if ( r->rank >= 0 )
        return r->rank;

    /* load data locally */
    indexf = r->db_results->indexf;
    sw = indexf->sw;
    posdata = r->posdata;

    /* Get bias for the current metaID - metaID is stored in the rank for ease here */
    /* Currently, the rankbias is a number from -10 to +10. It's an arbitrary range. */
    /* MetaBias is added to the word value for *each* word position, so it's multiplied */
    /* by the number of words in the document (r->frequency). */

    metaID = r->rank * -1;
    meta_bias = indexf->header.metaEntryArray[ metaID - 1 ]->rank_bias;

    /* pre-build the structure map array */
    /* this maps a word's structure to a rank value */
    if ( !sw->structure_map_set )
    {
        build_struct_map( sw );
    }

    /* Add up the raw word values for each word found in the current document. */
    /* Notice the meta_bias added to each position */
    /* Might also bias words with low position values, for example */
    /* Should really consider r->tfrequency, which is the number of files that have */
    /* this word. If the word is not found in many files then it should be ranked higher */

    rank = 1;
    freq = r->frequency;

    if ( freq > 100 )
        freq = 100;

    for(i = 0; i < freq; i++)
    {
        /* GET_STRUCTURE must return value in range!
        */
        rank += sw->structure_map[ GET_STRUCTURE(posdata[i]) ] + meta_bias;

        if( DEBUG_RANK > 1 )
        {
            fprintf(stderr, "Word entry %d at position %d has struct %d\n", i, GET_POSITION(posdata[i]), GET_STRUCTURE(posdata[i]) );
            struct_tally[ GET_STRUCTURE(posdata[i]) ]++;
        }
    }

    if( DEBUG_RANK )
    {
        fprintf( stderr, "File num: %d. Raw Rank: %d. Frequency: %d ", r->filenum, rank, r->frequency );
    }

    /* Ranks could end up less than zero -- but since the *final* rank is calculated here */
    /* we can't know the *lowest* value to use an offset. It might be better to track */
    /* the lowest value and delay actual final rank calculation/scaling to when the rank */
    /* is printed. Especially when AND or OR'ing results. */

    if ( rank < 1 )
        rank = 1;

    rank = scale_word_score( rank );

    if( DEBUG_RANK > 1 )
    {
        fprintf( stderr, "scaled rank: %d\n Structure tally:\n", rank );

        for ( i = 0; i <= 255; i++ )
        {
            if ( struct_tally[i] )
            {
                fprintf( stderr, " struct 0x%x = count of %2d (", i, struct_tally[i] );
                if ( i & IN_EMPHASIZED ) fprintf(stderr," EM");
                if ( i & IN_HEADER ) fprintf(stderr," HEADING");
                if ( i & IN_COMMENTS ) fprintf(stderr," COMMENT");
                if ( i & IN_META ) fprintf(stderr," META");
                if ( i & IN_BODY ) fprintf(stderr," BODY");
                if ( i & IN_HEAD ) fprintf(stderr," HEAD");
                if ( i & IN_TITLE ) fprintf(stderr," TITLE");
                if ( i & IN_FILE ) fprintf(stderr," FILE");
                fprintf(stderr," ) x rank map of %d = %d\n\n", sw->structure_map[i], sw->structure_map[i] * struct_tally[i]);
            }
        }
    }

    /* Return if IgnoreTotalWordCountWhenRanking is true (the default) */
    if ( indexf->header.ignoreTotalWordCountWhenRanking )
        return ( r->rank = rank / 100);

    /* bias by total words in the file -- this is off by default */
    /* if word count is significant, reduce rank by a number between 1.0 and 5.0 */

    words = getTotalWordsInFile( indexf, r->filenum );

    if (words <= 10)
        reduction = 10000; /* 10000 * log10(10) = 10000 */
    else if (words > 1000)
    {
        if(words >= 100000) /* log10(100000) is 5 */
            reduction = 50000; /* As it was in previous
version (5 * 10000) */ /* rare case - do not overrun the static arrays (they only have 1000 entries) */ else reduction = (int) (10000 * (floor(log10((double)words) + 0.5))); } else reduction = swish_log10[words]; r->rank = (rank * 100) / reduction; return r->rank; } /* multiple ranking schemes allow for more fine-tuning as users require. Use the -R command line option or RankScheme() API method. Default is to use getrankDEF() -- the same as -R 0 IDF ranking uses the total word frequency across all searched indexes and a normalizing formula to negate effect of docs with different sizes. The normalizing formula evaluates the word's density within a doc. Greater density = greater relevance (higher rank) NOTE that IgnoreTotalWordCountWhenRanking must be FALSE (0) in order to use IDF ranking. Do error check on that config when calling getrankIDF() TODO: The ranking functions could likely be split up into smaller functions since much code is shared. karman Sun Aug 29 21:01:28 CDT 2004 */ int getrankIDF( RESULT *r ) { unsigned int *posdata; int meta_bias; IndexFILE *indexf; int words; int i; SWISH *sw; int metaID; int freq; int total_files; int total_words; int average_words; int density; int idf; int total_word_freq; int word_weight; int word_score; /* int density_magic = 2; */ /* the value named 'rank' in getrank() is here named 'word_score'. it's largely semantic, but helps emphasize that *docs* are ranked, but *words* are scored. The doc rank is calculated based on the accrued word scores. 
However, the hash key name 'rank' is preserved in the r (RESULT) object for compatibility with getrank() */ /* this first part is identical to getrankDEF -- could be optimized as a single function */ int struct_tally[256]; if( DEBUG_RANK ) { for ( i = 0; i <= 255; i++ ) struct_tally[i] = 0; } if ( r->rank >= 0 ) return r->rank; indexf = r->db_results->indexf; sw = indexf->sw; posdata = r->posdata; metaID = r->rank * -1; meta_bias = indexf->header.metaEntryArray[ metaID - 1 ]->rank_bias; if ( !sw->structure_map_set ) { build_struct_map( sw ); } /* here we start to diverge */ word_score = 1; freq = r->frequency; if( DEBUG_RANK ) { fprintf( stderr, "File num: %d Word Score: %d Frequency: %d ", r->filenum, word_score, freq ); } /* don't do this here; let density calc do it if ( freq > 100 ) freq = 100; */ /* IDF is the Inverse Document Frequency, or, the weight of the word in relationship to the collection of documents as a whole. Multiply the weight against the rank to give greater weight to words that appear less often in the collection. The biggest impact should be seen when OR'ing words together instead of AND'ing them. */ total_files = sw->TotalFiles; total_word_freq = r->tfrequency; idf = (int) ( log( total_files / total_word_freq ) * 1000 ); /* *1000 helps create a wider spread between the most common words and the rest of the pack: "word frequencies in natural language obey a power-law distribution" -- Maciej Ceglowski */ if ( idf < 1 ) idf = 1; /* only ubiquitous words like 'the' get idfs < 1. these should probably be stopwords anyway... */ if( DEBUG_RANK ) { fprintf(stderr, "Total files: %d Total word freq: %d IDF: %d \n", total_files, total_word_freq, idf ); } /* calc word density. this normalizes document length so that longer docs don't rank higher out of sheer quantity. Hopefully this is a little more scientific than the out-of-the-air calc in getrankDEF() -- though effectiveness is likely the same... 
*/ words = getTotalWordsInFile( indexf, r->filenum ); total_words = sw->TotalWordPos; average_words = total_words / total_files; if( DEBUG_RANK ) { fprintf(stderr, "Total words: %d Average words: %d Indexed words in this doc: %d ", total_words, average_words, words ); } /* normalizing term density in a collection. Amati & Van Rijsbergen "normalization 2" from A Study of Parameter Tuning for Term Frequency Normalization Ben HE Department of Computing Science University of Glasgow Glasgow, UK ben@dcs.gla.ac.uk term_freq_density = term_freq * log( 1 + c * av_doc_leng/doc_leng) where c > 0 (optimized at 2 ... we think...) */ /* density = freq * log( 1 + ( density_magic * ( average_words / words ) ) ); */ /* doesn't work that well with int values. Use below (cruder) instead. NOTE that there is likely a sweet spot for density. A word like 'the' will always have a high density in normal language, but it is not very relevant. A word like 'foo' might have a very low density in doc A but slightly higher in doc B -- doc B is likely more relevant. So low density is not a good indicator and neither is high density. Instead, something like: | density scale ---> | 0 X 100 useless sweet useless */ /* if for some reason there are 0 words in this document, we'll get a divide by zero error in this next calculation. why might a doc have no words in it, and yet be in the set of matches we're evaluating? maybe because stopwords aren't counted? it's a mystery, just like mankind... set words=1 if <1 so that we at least avoid the core dump -- would it be better to somehow skip it altogether? throw an error? let's warn on stderr for now, just to alert the user that something is awry. 
    */

    if ( words < 1 )
    {
        fprintf(stderr, "Word count for document %d is zero\n", r->filenum );
        words = 1;
    }

    density = ( ( average_words * 1000 ) / words ) * freq;

    /* minimum density */
    if (density < 1)
        density = 1;

    /* scale word_weight back down by 100 or so, just to make it a little saner */
    word_weight = ( density * idf ) / 100;

    if( DEBUG_RANK )
    {
        fprintf(stderr, "Density: %d Word Weight: %d \n", density, word_weight );
    }

    for(i = 0; i < freq; i++)
    {
        /* GET_STRUCTURE must return value in range! */
        word_score += word_weight * ( sw->structure_map[ GET_STRUCTURE(posdata[i]) ] + meta_bias );

        if( DEBUG_RANK > 1 )
        {
            fprintf(stderr, "Word entry %d at position %d has struct %d\n", i, GET_POSITION(posdata[i]), GET_STRUCTURE(posdata[i]) );
            struct_tally[ GET_STRUCTURE(posdata[i]) ]++;
        }
    }

    /* see comment in getrankDEF() about why we make sure score is positive and non-zero */
    if ( word_score < 1 )
        word_score = 1;

    if( DEBUG_RANK )
    {
        fprintf(stderr, "Raw score after IDF weighting: %d \n", word_score );
    }

    word_score = scale_word_score( word_score );

    if( DEBUG_RANK > 1 )
    {
        fprintf( stderr, "scaled rank: %d\n Structure tally:\n", word_score );

        for ( i = 0; i <= 255; i++ )
        {
            if ( struct_tally[i] )
            {
                fprintf( stderr, " struct 0x%x = count of %2d (", i, struct_tally[i] );
                if ( i & IN_EMPHASIZED ) fprintf(stderr," EM");
                if ( i & IN_HEADER ) fprintf(stderr," HEADING");
                if ( i & IN_COMMENTS ) fprintf(stderr," COMMENT");
                if ( i & IN_META ) fprintf(stderr," META");
                if ( i & IN_BODY ) fprintf(stderr," BODY");
                if ( i & IN_HEAD ) fprintf(stderr," HEAD");
                if ( i & IN_TITLE ) fprintf(stderr," TITLE");
                if ( i & IN_FILE ) fprintf(stderr," FILE");
                fprintf(stderr," ) x rank map of %d = %d\n\n", sw->structure_map[i], sw->structure_map[i] * struct_tally[i]);
            }
        }
    }

    if ( DEBUG_RANK )
    {
        fprintf(stderr, "Scaled score: %d \n", word_score );
    }

    return ( r->rank = word_score );
}

int
scale_word_score( int score )
{
    /* Scale the rank - this was originally based on frequency */
    /* Uses lookup tables for values <=
1000, otherwise calculate */ return score > 1000 ? (int) floor( (log((double)score) * 10000 ) + 0.5) : swish_log[score]; } swish-e-2.4.7/src/stemmer.c0000664000077100017500000004211611166010110012426 00000000000000/* $Id: stemmer.c 1949 2007-10-24 03:02:08Z karpet $ This file is part of Swish-e. Swish-e is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. Swish-e is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with Swish-e; if not, write to the Free Software Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA See the COPYING file that accompanies the Swish-e distribution for details of the GNU GPL and the special exception available for linking against the Swish-e library. ** Tue May 10 08:19:25 CDT 2005 ** removed original Porter stemmer code, Stem() function, and all related copyrights ** everything else is GPL by wmoseley **************************************************************************************** ** NOTE: This implementation was originally part of the WAIS system ** The main function, Stem(), was incorporated into Swish-E 1.1 ** to provide a stemming function. ** 11/24/98 Mark Gaulin ** ** Stem returns original word if words stems to empty string ** Bill Moseley 10/11/99 ** ** Repeats stemming until word will stem no more ** Bill Moseley 10/17/99 ** ** function: EndsWithCVC patched a bug. see below. 
Moseley 10/19/99
**
** Added word length arg to ReplaceEnd and Stem to avoid strcat overflow
** 11/17/99 - SRE
**
** fixed int cast, missing return value, braces around initializations: problems pointed out by "gcc -Wall"
** SRE 2/22/00
**
** Jose Ruiz 18/10/00
** Remove static word from end var and make the code thread safe
**
** Bill Moseley 20/05/01
** Rewrote to simplify. No more need to repeat stem (expandstar gone from search.c)
** got rid of most of the reallocation of memory.
*
*
* Dec 11, 2003 - refactored ;)
*
* Now all stemmers are accessed the same way. get_fuzzy_mode/set_fuzzy_mode both
* create a FUZZY_OBJECT for the selected fuzzy mode. This action also calls the init
* function for the stemmer (only for snowball at this time).
* fuzzy_convert() is passed the object and the word to stem and a FUZZY_WORD is returned.
* The FUZZY_WORD is a structure and will always contain a word which may be the unchanged
* input word or a stemmed word or words. The FUZZY_WORD can be checked to see
* the status of the stemming (in some cases anyway). The idea is you can use the
* returned word list regardless of whether it stemmed or not.
* After stemming call fuzzy_free_word() to free the memory used in the conversion.
* When done with the stemmer call free_fuzzy_mode() to free up the stemmer.
* Look at index.c for an example.
*/ #include "swish.h" #include "error.h" #include "soundex.h" #include "swstring.h" #include #include #include #include "headers.h" #include "stemmer.h" #include "mem.h" #include "search.h" /* for stemming via a result */ #define FALSE 0 #define TRUE 1 #include "double_metaphone.h" /* Includes for using SNOWBALL stemmer */ #include "snowball/stem_es.h" #include "snowball/stem_fr.h" #include "snowball/stem_it.h" #include "snowball/stem_pt.h" #include "snowball/stem_de.h" #include "snowball/stem_nl.h" #include "snowball/stem_en1.h" #include "snowball/stem_en2.h" #include "snowball/stem_no.h" #include "snowball/stem_se.h" #include "snowball/stem_dk.h" #include "snowball/stem_ru.h" #include "snowball/stem_fi.h" #include "snowball/stem_ro.h" #include "snowball/stem_hu.h" #include "snowball/api.h" static FUZZY_WORD *no_stem( FUZZY_OBJECT *fi, const char *inword); static FUZZY_WORD *Stem_snowball( FUZZY_OBJECT *fi, const char *inword); static FUZZY_WORD *double_metaphone( FUZZY_OBJECT *fi, const char *inword); static FUZZY_OPTS fuzzy_opts[] = { /* fuzzy_mode *name *routine *init *free *lang_stem */ { FUZZY_NONE, "None", no_stem, NULL, NULL, NULL }, { FUZZY_SOUNDEX, "Soundex", soundex, NULL, NULL, NULL }, { FUZZY_METAPHONE, "Metaphone", double_metaphone, NULL, NULL, NULL }, { FUZZY_DOUBLE_METAPHONE, "DoubleMetaphone", double_metaphone, NULL, NULL, NULL }, { FUZZY_STEMMING_ES, "Stemming_es", Stem_snowball, spanish_ISO_8859_1_create_env, spanish_ISO_8859_1_close_env, spanish_ISO_8859_1_stem }, { FUZZY_STEMMING_FR, "Stemming_fr", Stem_snowball, french_ISO_8859_1_create_env, french_ISO_8859_1_close_env, french_ISO_8859_1_stem }, { FUZZY_STEMMING_IT, "Stemming_it", Stem_snowball, italian_ISO_8859_1_create_env, italian_ISO_8859_1_close_env, italian_ISO_8859_1_stem }, { FUZZY_STEMMING_PT, "Stemming_pt", Stem_snowball, portuguese_ISO_8859_1_create_env, portuguese_ISO_8859_1_close_env, portuguese_ISO_8859_1_stem }, { FUZZY_STEMMING_DE, "Stemming_de", Stem_snowball, 
german_ISO_8859_1_create_env, german_ISO_8859_1_close_env, german_ISO_8859_1_stem }, { FUZZY_STEMMING_NL, "Stemming_nl", Stem_snowball, dutch_ISO_8859_1_create_env, dutch_ISO_8859_1_close_env, dutch_ISO_8859_1_stem }, { FUZZY_STEMMING_EN1, "Stemming_en1", Stem_snowball, porter_ISO_8859_1_create_env, porter_ISO_8859_1_close_env, porter_ISO_8859_1_stem }, { FUZZY_STEMMING_EN2, "Stemming_en2", Stem_snowball, english_ISO_8859_1_create_env, english_ISO_8859_1_close_env, english_ISO_8859_1_stem }, { FUZZY_STEMMING_NO, "Stemming_no", Stem_snowball, norwegian_ISO_8859_1_create_env, norwegian_ISO_8859_1_close_env, norwegian_ISO_8859_1_stem }, { FUZZY_STEMMING_SE, "Stemming_se", Stem_snowball, swedish_ISO_8859_1_create_env, swedish_ISO_8859_1_close_env, swedish_ISO_8859_1_stem }, { FUZZY_STEMMING_DK, "Stemming_dk", Stem_snowball, danish_ISO_8859_1_create_env, danish_ISO_8859_1_close_env, danish_ISO_8859_1_stem }, { FUZZY_STEMMING_RU, "Stemming_ru", Stem_snowball, russian_KOI8_R_create_env, russian_KOI8_R_close_env, russian_KOI8_R_stem }, { FUZZY_STEMMING_FI, "Stemming_fi", Stem_snowball, finnish_ISO_8859_1_create_env, finnish_ISO_8859_1_close_env, finnish_ISO_8859_1_stem }, { FUZZY_STEMMING_RO, "Stemming_ro", Stem_snowball, romanian_ISO_8859_2_create_env, romanian_ISO_8859_2_close_env, romanian_ISO_8859_2_stem }, { FUZZY_STEMMING_HU, "Stemming_hu", Stem_snowball, hungarian_ISO_8859_1_create_env, hungarian_ISO_8859_1_close_env, hungarian_ISO_8859_1_stem }, /* these next two are deprecated and are identical to Stemming_en1 */ { FUZZY_STEMMING_EN1, "Stemming_en", Stem_snowball, porter_ISO_8859_1_create_env, porter_ISO_8859_1_close_env, porter_ISO_8859_1_stem }, { FUZZY_STEMMING_EN1, "Stem", Stem_snowball, porter_ISO_8859_1_create_env, porter_ISO_8859_1_close_env, porter_ISO_8859_1_stem } }; /* * This function calls the individual stemmer based on the stemmer selected in the fuzzy_index * Returns pointer to a FUZZY_WORD which contains a char** of stemmed words -- typically the * 
list is a single word followed by a null pointer (null terminated list). Double-metaphone * may return two strings. The string is initially set to the incoming word so can normally * just use the list without checking for errors (if you don't care if the word stems or not). * Otherwise, need to check fw->error to see if there was a problem in stemming. */ FUZZY_WORD *fuzzy_convert( FUZZY_OBJECT *fi, const char *inword ) { if ( !fi ) progerr("called fuzzy_convert with NULL FUZZY_OBJECT"); return fi->stemmer->routine( fi, inword ); /* call the specific stemmer */ } /* This is a dummy stemmer that returns nothing and just eats cpu */ static FUZZY_WORD *no_stem( FUZZY_OBJECT *fi, const char *inword) { return create_fuzzy_word( inword, 1 ); } /* Frees up a fuzzy data structure for a given word. Frees memory of strings, if needed */ void fuzzy_free_word( FUZZY_WORD *fw ) { if ( !fw ) progerr("called fuzzy_free_data with null value"); if ( fw->free_strings ) { char **word = fw->word_list; while ( *word ) { efree( *word ); word++; } } efree( fw ); } /* * creates a FUZZY_WORD structure for "word_count" words + a null at the end * called by the individual stemming routines. 
*/ FUZZY_WORD *create_fuzzy_word( const char *input_word, int word_count ) { size_t bytes; FUZZY_WORD *fw; if ( word_count < 1 ) word_count = 1; bytes = sizeof(FUZZY_WORD) + ( word_count * sizeof(char *) ); fw = (FUZZY_WORD *)emalloc( bytes ); memset( fw, 0, bytes ); fw->error = STEM_OK; /* default to OK */ fw->orig_word = input_word; /* original string */ fw->string_list[0] = (char *)input_word;/* so we have an output word */ fw->list_size = 1; /* count of words in list */ fw->word_list = &fw->string_list[0]; /* so we have a **char */ return fw; } void dump_fuzzy_list( void ) { int i; printf("Options available for FuzzyIndexingMode:\n"); for (i = 0; i < (int)(sizeof(fuzzy_opts) / sizeof(fuzzy_opts[0])); i++) printf(" %s\n", fuzzy_opts[i].name ); } /* * sets the fuzzy indexing mode once the option has been selected. */ FUZZY_OBJECT *create_fuzzy_struct( FUZZY_OBJECT *fi, FUZZY_OPTS *fuzzy_opts ) { FUZZY_OBJECT *f_new = (FUZZY_OBJECT *)emalloc( sizeof( FUZZY_OBJECT) ); free_fuzzy_mode( fi ); /* tidy up previous mode, if one */ f_new->stemmer = fuzzy_opts; /* save a reference to the stemmer data */ /* Call the init function if there is one */ if ( fuzzy_opts->init ) f_new->snowball_options = fuzzy_opts->init(); /* initialize the stemmer */ return f_new; } /* * Free a stemmer object -- calling the free() function if one is defined. */ void free_fuzzy_mode( FUZZY_OBJECT *fi ) { if ( !fi ) return; if ( fi->stemmer->stemmer_free ) fi->stemmer->stemmer_free( fi->snowball_options ); efree( fi ); } /* * Selects the fuzzy mode by passing in a string describing the fuzzy mode. * Returns a pointer to a structure, or null if can't find a valid stemmer. 
*/
FUZZY_OBJECT *set_fuzzy_mode(FUZZY_OBJECT *fi, char *param )
{
    int i;

    for (i = 0; i < (int)(sizeof(fuzzy_opts) / sizeof(fuzzy_opts[0])); i++)
        if ( 0 == strcasecmp(fuzzy_opts[i].name, param ) )
        {
            /* Compare the table names by content; comparing the pointers against string literals is not reliable */
            if ( 0 == strcmp( fuzzy_opts[i].name, "Stem" ) || 0 == strcmp( fuzzy_opts[i].name, "Stemming_en" ) )
            {
                fprintf(stderr, "*************\n");
                fprintf(stderr, " Old stemmer '%s' is no longer supported -- using Stemming_en1 instead.\n", fuzzy_opts[i].name);
                fprintf(stderr, " Please update your config file.\n*************\n");
            }
            return create_fuzzy_struct( fi, &fuzzy_opts[i] );
        }

    return NULL;
}

/* Sets FUZZY_OBJECT struct (fi) based on the integer mode passed in */
/* Used to set the fuzzy data structure based on the value stored in the index header */
/* This one will fail badly (abort) since it's getting the value from the index file */
FUZZY_OBJECT *get_fuzzy_mode( FUZZY_OBJECT *fi, int fuzzy )
{
    int i;

    for (i = 0; i < (int)(sizeof(fuzzy_opts) / sizeof(fuzzy_opts[0])); i++)
        if ( (FuzzyIndexType)fuzzy == fuzzy_opts[i].fuzzy_mode )
        {
            return create_fuzzy_struct( fi, &fuzzy_opts[i] );
        }

    progerr("Invalid FuzzyIndexingMode '%d' in index file", fuzzy);
    return NULL;
}

/*
 * Converts a fuzzy mode to a string by looking up the string in the table.
 */
const char *fuzzy_string( FUZZY_OBJECT *fi )
{
    if ( !fi )
        return "Unknown FuzzyIndexingMode";

    return fi->stemmer->name;
}

FuzzyIndexType fuzzy_mode_value( FUZZY_OBJECT *fi )
{
    if ( !fi )
        return FUZZY_NONE;

    return fi->stemmer->fuzzy_mode;
}

int stemmer_applied( FUZZY_OBJECT *fi )
{
    return (FUZZY_NONE != fi->stemmer->fuzzy_mode ) ?
1 : 0; } /* 06/2003 Jose Ruiz - Interface to snowball's spanish stemmer */ static FUZZY_WORD *Stem_snowball( FUZZY_OBJECT *fi, const char *inword) { char *out_word; struct SN_env *snowball = fi->snowball_options; FUZZY_WORD *fw = create_fuzzy_word( inword, 1 ); /* create place to store stemmed word */ SN_set_current(snowball,strlen(inword),(const symbol *)inword); /* Set Word to Stem */ fi->stemmer->lang_stem(snowball); /* Stem the word */ if ( 0 == snowball->l ) { fw->error = STEM_TO_NOTHING; return fw; } fw->free_strings = 1; /* flag that malloc is used */ out_word = emalloc(snowball->l + 1); memcpy(out_word, snowball->p, snowball->l); out_word[snowball->l] = '\0'; fw->string_list[0] = out_word; return fw; } static FUZZY_WORD *double_metaphone( FUZZY_OBJECT *fi, const char *inword) { FUZZY_WORD *fw = create_fuzzy_word( inword, 2 ); /* create place to store stemmed word */ char *codes[2]; DoubleMetaphone( inword, codes ); if ( !(*codes[0]) ) /* was there at least one conversion? */ { efree( codes[0] ); efree( codes[1] ); return fw; } fw->free_strings = 1; fw->string_list[0] = codes[0]; /* Is double metaphone enabled? */ if ( FUZZY_DOUBLE_METAPHONE != fi->stemmer->fuzzy_mode ) return fw; /* Is there a second metaphone that is different from the first? */ if ( *codes[1] && strcmp(codes[0], codes[1]) ) { fw->list_size++; fw->string_list[1] = codes[1]; } else { efree( codes[1] ); } return fw; } /************************************************************************* * * These routines are the API interface for stemming. * *************************************************************************/ /************************************************************************* * SwishStemWord -- utility function to stem a word * * This stores the stemmed word locally so it can be freed # *Depreciated* because this only calls the original stemmer. 
* **************************************************************************/ char *SwishStemWord( SWISH *sw, char *word ) { FUZZY_OBJECT *fo = NULL; FUZZY_WORD *fw = NULL; if ( sw->stemmed_word ) { efree( sw->stemmed_word ); sw->stemmed_word = NULL; } fo = set_fuzzy_mode( fo, "Stem" ); if ( !fo ) return sw->stemmed_word; fw = fuzzy_convert( fo, word ); sw->stemmed_word = estrdup( fw->string_list[0] ); fuzzy_free_word( fw ); free_fuzzy_mode( fo ); return sw->stemmed_word; } /************************************************************************ * SwishFuzzyWord -- utility function to stem a word based on the current result * (that is, the stemming mode of the current index. * * This is really a method call on a RESULT and returns a new object * Currently this requires calling SwishFreeFuzzyWord to free memory. * **************************************************************************/ FUZZY_WORD *SwishFuzzyWord( RESULT *r, char *word ) { if ( !r ) return NULL; return fuzzy_convert( r->db_results->indexf->header.fuzzy_data, word ); } const char *SwishFuzzyMode( RESULT *r ) { return fuzzy_string( r->db_results->indexf->header.fuzzy_data ); } /* * These are accessors for the FUZZY_WORD */ /* Returns the list of words */ const char **SwishFuzzyWordList( FUZZY_WORD *fw ) { if ( !fw ) return NULL; return (const char **)fw->word_list; } /* Returns the number of words in the list */ int SwishFuzzyWordCount( FUZZY_WORD *fw ) { if ( !fw ) return 0; return fw->list_size; } /* Returns the integer value of the error */ int SwishFuzzyWordError( FUZZY_WORD *fw ) { if ( !fw ) return -1; return (int)fw->error; } /* Frees the word */ void SwishFuzzyWordFree( FUZZY_WORD *fw ) { fuzzy_free_word( fw ); } /******************************************************************************* Stemmer access for SWISH object. idea is to be able to stem a word if an index is named specifically, rather than waiting to have a RESULT object. 
this is for API, to allow access to stemming functions without searching. karman - Wed Oct 27 10:51:03 CDT 2004 *********************************************************************************/ FUZZY_WORD *SwishFuzzify( SWISH *sw, const char *index_name, char *word ) { /* create FUZZY object like SwishFuzzyWord does, but with named index */ IndexFILE *indexf = indexf_by_name( sw, index_name ); if ( !sw ) progerr("SwishFuzzify requires a valid swish handle"); if ( !indexf ) { set_progerr( HEADER_READ_ERROR, sw, "Index file '%s' is not an active index file", index_name ); return( NULL ); } if ( !word ) return NULL; return fuzzy_convert( indexf->header.fuzzy_data, word ); } /* end SWISH stemmer function *************************************************/ swish-e-2.4.7/src/filter.c0000664000077100017500000003531711166010110012244 00000000000000/* $Id: filter.c 1859 2007-01-05 22:14:10Z whmoseley $ This file is part of Swish-e. Swish-e is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. Swish-e is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with Swish-e; if not, write to the Free Software Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA See the COPYING file that accompanies the Swish-e distribution for details of the GNU GPL and the special exception available for linking against the Swish-e library. ** Mon May 9 15:51:39 CDT 2005 ** added GPL ** ** 1998-07-04 rasc original filter code ** 1999-08-07 rasc ** 2001-02-28 rasc own module started for filters ** some functions rewritten and enhanced... 
** 2001-04-09 rasc options for filters (%f, etc) ** 2001-05-31 rasc fix for possible crashes (NULL checks) ** */ #include #include "swish.h" #include "file.h" #include "mem.h" #include "swstring.h" #include "error.h" #include "filter.h" #include #ifdef HAVE_WORKING_FORK #include #include #ifdef HAVE_SYS_WAIT_H #include #endif /* HAVE_SYS_WAIT_H */ #endif /* HAVE_WORKING_FORK */ /* private module prototypes */ static FilterList *addfilter(FilterList *rp, char *FilterSuffix, char *FilterProg, char *options, char *FilterDir, char **regex); static char *expand_options( FileProp * fprop, char * template ); static char *expand_percent( FileProp * fprop, char escape_code ); #ifdef HAVE_WORKING_FORK static void fork_program( FileProp * fprop, char **arg ); #else static char *join_string( char **string_list ); #endif /* -- init structures for filters */ void initModule_Filter(SWISH * sw) { struct MOD_Filter *md; md = (struct MOD_Filter *) emalloc(sizeof(struct MOD_Filter)); memset( md, 0, sizeof( struct MOD_Filter ) ); sw->Filter = md; return; } /* -- release structures for filters -- release all wired memory -- 2001-04-09 rasc */ void freeModule_Filter(SWISH * sw) { struct MOD_Filter *md = sw->Filter; if (md->filterdir) efree(md->filterdir); /* free FilterDir */ /* Free the FileFilterMatch selections */ if ( md->filterlist ) { FilterList *fm = md->filterlist; FilterList *fm2; while( fm ) { efree( fm->prog ); free_regex_list( &fm->regex ); if ( fm->options ) freeStringList( fm->options ); if ( fm->suffix ) efree( fm->suffix ); fm2 = fm; fm = fm->next; efree( fm2 ); } md->filterlist = NULL; } efree(sw->Filter); /* free modul data structure */ sw->Filter = NULL; return; } /* -- Config Directives -- Configuration directives for this Module -- return: 0/1 = none/config applied */ int configModule_Filter(SWISH * sw, StringList * sl) { struct MOD_Filter *md = sw->Filter; char *w0 = sl->word[0]; if (strcasecmp(w0, "FilterDir") == 0) { /* 1999-05-05 rasc */ if (sl->n == 2) { 
md->filterdir = estrredup(md->filterdir, sl->word[1]); normalize_path( md->filterdir ); if (!isdirectory(md->filterdir)) { progerr("%s: %s is not a directory", w0, md->filterdir); } } else progerr("%s: requires one value", w0); return 1; } if (strcasecmp(w0, "FileFilter") == 0) { /* 1999-05-05 rasc */ /* FileFilter fileextension filterprog [options] */ if (sl->n == 3 || sl->n == 4) md->filterlist = addfilter(md->filterlist, sl->word[1], sl->word[2], sl->word[3], md->filterdir, NULL); else progerr("%s: requires \"extension\" \"filter\" \"[options]\"", w0); return 1; } /* added March 16, 2002 - moseley */ if ( strcasecmp( w0, "FileFilterMatch") == 0 ) { if ( sl->n < 4 ) progerr("%s requires at least three parameters: 'filterprog' 'options' 'regexp' ['regexp'...]\n"); md->filterlist = addfilter(md->filterlist, NULL, sl->word[1], sl->word[2], md->filterdir, &(sl->word[3]) ); return 1; } return 0; /* not a filter directive */ } /* -- Add a filter to the filterlist (file ext -> filterprog [cmd-options]) -- (filterdir may be NULL) -- 1999-08-07 rasc -- 2001-02-28 rasc -- 2001-04-09 rasc options, maybe NULL */ static FilterList *addfilter(FilterList *rp, char *suffix, char *prog, char *options, char *filterdir, char **regex ) { FilterList *newnode; char *buf; normalize_path( prog ); /* Don't really see how this is right */ newnode = (FilterList *) emalloc(sizeof(FilterList)); memset( newnode, 0, sizeof( FilterList ) ); newnode->suffix = (char *) estrdup(suffix); /* Parse the filter options into tokens */ newnode->options = parse_line( options ? 
options : "%p %P" ); /* If this is a FileFilterMatch then add patterns */ if ( regex ) add_regex_patterns( prog, &newnode->regex, regex, 1 ); /* Append the directory on */ if ( filterdir && (*prog != '/' ) ) { buf = emalloc( strlen( filterdir ) + strlen( prog ) + 2 ); *buf = '\0'; strcat( buf, filterdir ); strcat( buf, "/" ); strcat( buf, prog ); newnode->prog = buf; } else newnode->prog = estrdup( prog ); newnode->next = NULL; if (rp == NULL) rp = newnode; else { /* add to end of list */ FilterList *f = rp; while ( f->next ) f = f->next; f->next = (struct FilterList *)newnode; } return rp; } /* -- Check, if a filter is needed to retrieve information from a file -- Returns NULL or path to filter prog according conf file. -- 1999-08-07 rasc -- 2001-02-28 rasc rewritten, now possible: search for ".pdf.gz", etc. -- 3002-05-04 rasc adapted to new module design (wow, and way into the future!) -- 2002-03-16 moseley added regexp check (other code not reviewed) */ FilterList *hasfilter(SWISH * sw, char *filename) { struct MOD_Filter *md = sw->Filter; FilterList *fl; char *s, *fe; fl = md->filterlist; if (!fl) return (FilterList *) NULL; fe = (filename + strlen(filename)); while (fl != NULL) { /* added regex check - moseley */ if ( fl->regex ) { if ( match_regex_list( filename, fl->regex, "Filter match" ) ) return fl; } else { s = fe - strlen(fl->suffix); if (s >= filename) { /* no negative overflow! 
*/ if (!strcasecmp(fl->suffix, s)) { return fl; } } } fl = fl->next; } return (FilterList *) NULL; } /* -- open filter (in: file, out: FILE *) -- params are in (FileProp *) - but should be adapted later -- Return: fprop->fp Sets: fprop->fp FILE* fprop->filter_pid pid of filter, if forked */ FILE *FilterOpen(FileProp * fprop) { FilterList *fi = fprop->hasfilter; char **options = fi->options->word; int num_options = fi->options->n; char **arg; /* where to store expanded arguments */ int arg_size; #ifndef HAVE_WORKING_FORK char *command; /* command string used with popen */ #endif int n; fprop->fp = NULL; /* Create second argument list that gets updated during each run */ arg_size = ( num_options + 2 ) * sizeof(char*); arg = emalloc( arg_size ); memset( arg, 0, arg_size ); arg[0] = estrdup( fi->prog ); /* arg[0] is the program name by convention */ #if defined(_WIN32) && !defined(__CYGWIN__) make_windows_path( arg[0] ); #endif for ( n = 0; n < num_options; n++ ) arg[n+1] = expand_options( fprop, options[n] ); #ifdef HAVE_WORKING_FORK fork_program( fprop, arg ); #else command = join_string( arg ); fprop->fp = popen(command, F_READ_TEXT); /* Open stream */ efree( command ); #endif /* HAVE_WORKING_FORK */ /* Free up memory used by args list */ options = arg; while ( *options ) { efree( *options ); options++; } efree( arg ); return fprop->fp; } /* * Expands the % escapes. 
* Pass in: * fprop - for file names * template - string with possibly unexpanded % escapes * Returns: * pointer to a newly allocated string * */ static char *expand_options( FileProp * fprop, char * template ) { int cur_size = strlen( template ); char *outstr = (char *) emalloc( cur_size + 1 ); char *cur_char = outstr; char *tmpstr; while( *template ) { switch (*template) { case '\\': /* convert encoded char to char */ *(cur_char++) = charDecode_C_Escape(template, &template); break; case '%': template++; tmpstr = expand_percent( fprop, *template ); template++; if ( tmpstr ) { *cur_char = '\0'; /* terminate */ #if defined(_WIN32) && !defined(__CYGWIN__) make_windows_path( tmpstr ); #endif /* expand string to hold new string */ cur_size += strlen( tmpstr ); outstr = erealloc( outstr, cur_size + 1 ); /* cat new string on to output string */ strcat( outstr, tmpstr ); efree( tmpstr ); /* Set current char pointer */ cur_char = outstr + strlen( outstr ); } break; default: *(cur_char++) = *(template++); break; } } *cur_char = '\0'; return outstr; } /* * expand_percent -- expands escape code into string * returns a string which must be freed. */ static char *expand_percent( FileProp * fprop, char escape_code ) { switch( escape_code ) { case 'P': return estrdup( fprop->real_path ? fprop->real_path : "" ); case 'p': return estrdup( fprop->work_path ? fprop->work_path : "" ); case 'F': return estrdup( fprop->real_filename ? fprop->real_filename : "" ); case 'f': return estrdup( fprop->work_path ? str_basename( fprop->work_path ) : "" ); /* cstr_dirname allocates memory */ case 'D': return fprop->real_path ? cstr_dirname( fprop->real_path ) : estrdup(""); case 'd': return fprop->work_path ? 
cstr_dirname( fprop->work_path ) : estrdup(""); case '%': return estrdup( "%" ); default: progerr("Failed to decode percent escape in FileFilter* directive [%%%c]", escape_code ); } return NULL; } /* -- Close filter stream -- return: errcode */ int FilterClose(FileProp *fprop) { #ifdef HAVE_WORKING_FORK FilterList *fl = fprop->hasfilter; char *prog = fl->prog; #ifdef HAVE_SYS_WAIT_H int status; pid_t pid; #ifdef HAVE_KILL pid = waitpid( fprop->filter_pid, &status, WNOHANG ); /* Is program still running? */ if ( 0 == pid ) { if ( -1 == kill( fprop->filter_pid, 9 ) ) progerrno("Failed to kill filter program with pid %d", fprop->filter_pid ); /* Now reap killed filter */ pid = waitpid( fprop->filter_pid, &status, 0 ); } #else pid = wait(&status); #endif /* HAVE_KILL */ if ( !WIFEXITED(status) ) progwarn("filter '%s' did not terminate normally", prog ); else if ( WEXITSTATUS(status) ) progwarn("filter '%s' exited with non-zero status: [%d]", prog, WEXITSTATUS(status)); else if ( WIFSIGNALED(status) ) progwarn("filter '%s' killed by signal: [%d]", prog, WTERMSIG(status) ); #endif /* HAVE_SYS_WAIT_H */ if ( fclose( fprop->fp ) != 0 ) progwarnno("Error closing filter '%s'", prog ); return 0; #else return pclose(fprop->fp); #endif /* HAVE_WORKING_FORK */ } /* This should be elsewhere, but at this time this is the only place * it is used. 
*/ #ifdef HAVE_WORKING_FORK static void fork_program( FileProp * fprop, char **arg ) { pid_t pid; int pipe_fd[2]; FILE *fi; if ( pipe( pipe_fd ) ) progerrno( "failed to create pipe for running [%s]: ", *arg ); if ( (pid = fork()) == -1 ) progerrno( "failed to fork for running [%s]: ", *arg ); if ( !pid ) { /* child process */ close( pipe_fd[0] ); /* close the reading end of the pipe */ /* Make child's stdout go to pipe */ if ( dup2( pipe_fd[1], 1 ) == -1 ) fprintf( stderr, "failed to dup stdout in child process [%s]: %s", arg[0], strerror(errno) ); /* Set non-buffered output */ /* setvbuf(stdout,(char*)NULL,_IONBF,0); */ /* Now exec */ execvp( arg[0], arg ); /* can't use progerr since it writes to stdout by default. */ fprintf( stderr, "Failed to exec program [%s]: %s ", arg[0], strerror(errno) ); exit(1); } /* in parent */ close( pipe_fd[1] ); /* close writing end of pipe */ fi = fdopen( pipe_fd[0], "r" ); if ( fi == NULL ) progerrno( "failed fdopen for filter program [%s]: ", arg[0] ); fprop->fp = fi; fprop->filter_pid = pid; } #else /* HAVE_WORKING_FORK */ static char *join_string( char **string_list ) { int len = 0; /* total size of strings */ char **cur_string = string_list; char *outstr; int first = 1; char *quote_char = "\""; while ( *cur_string ) { char *str = *cur_string; len += strlen( str ) + 3; /* plus two for the quotes and one for the space */ cur_string++; } outstr = (char *) emalloc( len + 1 ); outstr[0] = '\0'; while ( *string_list ) { if ( !first ){ strcat( outstr, " " ); strcat( outstr, quote_char ); strcat( outstr, *string_list ); strcat( outstr, quote_char ); } else { strcat( outstr, *string_list ); } first = 0; string_list++; } return outstr; } #endif /* HAVE_WORKING_FORK */ swish-e-2.4.7/src/swish-e.h0000775000077100017500000001625011166013126012353 00000000000000/* $Id: swish-e.h 2295 2009-04-05 02:23:49Z karpet $ This file is part of Swish-e. 
Swish-e is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. Swish-e is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with Swish-e; if not, write to the Free Software Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA See the COPYING file that accompanies the Swish-e distribution for details of the GNU GPL and the special exception available for linking against the Swish-e library. ** Mon May 9 18:19:34 CDT 2005 ** added GPL */ #ifndef SEARCHSWISH_H #define SEARCHSWISH_H 1 #include "time.h" /* for time_t, which isn't really needed */ #ifdef __cplusplus extern "C" { #endif typedef void * SW_HANDLE; typedef void * SW_SEARCH; typedef void * SW_RESULTS; typedef void * SW_RESULT; typedef void * SW_FUZZYWORD; /* access to the swish-e stemmers */ /* These must match headers.h */ typedef enum { SWISH_NUMBER, SWISH_STRING, SWISH_LIST, SWISH_BOOL, SWISH_WORD_HASH, SWISH_OTHER_DATA, SWISH_HEADER_ERROR /* must check error in this case */ } SWISH_HEADER_TYPE; typedef union { const char *string; const char **string_list; unsigned long number; int boolean; } SWISH_HEADER_VALUE; const char **SwishHeaderNames( SW_HANDLE ); /* fetch the list of available header names */ const char **SwishIndexNames( SW_HANDLE ); /* fetch list of index files names associated */ SWISH_HEADER_VALUE SwishHeaderValue( SW_HANDLE, const char *index_name, const char *cur_header, SWISH_HEADER_TYPE *type ); SWISH_HEADER_VALUE SwishResultIndexValue( SW_RESULT, const char *name, SWISH_HEADER_TYPE *type ); typedef const void * SW_META; typedef SW_META * SWISH_META_LIST; /* Meta and 
Property Values */ #define SW_META_TYPE_UNDEF 0 #define SW_META_TYPE_STRING 4 #define SW_META_TYPE_ULONG 8 #define SW_META_TYPE_DATE 16 SWISH_META_LIST SwishMetaList( SW_HANDLE, const char *index_name ); SWISH_META_LIST SwishPropertyList( SW_HANDLE, const char *index_name ); SWISH_META_LIST SwishResultMetaList( SW_RESULT ); SWISH_META_LIST SwishResultPropertyList( SW_RESULT ); const char *SwishMetaName( SW_META ); int SwishMetaType( SW_META ); int SwishMetaID( SW_META ); /* Limit searches by structure */ #define IN_FILE_BIT 0 #define IN_TITLE_BIT 1 #define IN_HEAD_BIT 2 #define IN_BODY_BIT 3 #define IN_COMMENTS_BIT 4 #define IN_HEADER_BIT 5 #define IN_EMPHASIZED_BIT 6 #define IN_META_BIT 7 #define IN_FILE (1< BUFFER_CHUNK_SIZE. ** ** The buffer is really only flushed when a real metaName or PropertyName is ** found, or when the strucutre changes -- anything that changes the ** properities of the text that might be in the buffer. ** ** An optional arrangement might be to flush the buffer after processing each ** READ_CHUNK_SIZE from the stream (flush to last word). This would limit the ** character buffer size. It might be nice to flush on any meta tag (not just ** tags listed as PropertyNames or MetaNames), but for large XML files one would ** expect some use of Meta/PropertyNames. HTML files should flush more often ** since the structure will change often. Exceptions to this are large
**  sections, but then the append_buffer() routine will force a flush when the buffer
**  exceeds BUFFER_CHUNK_SIZE.
**
**  The TXT buffer does flush after every chunk read.
**
**  I doubt messing with any of these would change much...
**
**
** TODO:
**
**  - FileRules title (and all abort_parsing calls - define some constants)
**
**  - There's a lot of mixing of xmlChar and char, which will generate warnings.
**
**  - Add a fprop->orig_path (before ReplaceRules) and a directive BaseURI to be used
**    to fix up relative urls (if no <BASE>).
**    This would save space in the property file, but probably not enough to worry about.
**
**
**  - UndefinedMetaTags ignore might throw things like structure off since
**    processing continues (unlike IgnoreMetaTags).  But everything should balance out.
**
**  - There are two buffers that are created for every file, but these could be created once
**    and only expanded when needed, if that would make any difference in indexing speed.
**
**  - Note that these parse_*() functions get passed a "buffer" which is not used
**    (to be compatible with the old swish-e buffer-based parsers)
**
**  - XML elements and attributes are all converted to lowercase.
**
*/
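
/*
** The accumulate-then-flush behavior described above can be sketched
** roughly as follows.  This is only an illustration, not the code used by
** this module: SketchBuffer, sketch_flush(), sketch_append() and
** SKETCH_CHUNK_SIZE are hypothetical names, the chunk size is tiny so the
** behavior is easy to follow, and "flushing" just counts and resets
** instead of indexing the buffered words.
*/

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_CHUNK_SIZE 16    /* hypothetical; stands in for BUFFER_CHUNK_SIZE */

typedef struct {
    char *buffer;    /* accumulated text, always null terminated */
    int   cur;       /* bytes currently in use */
    int   max;       /* bytes allocated */
    int   flushes;   /* number of times the sketch has flushed */
} SketchBuffer;

/* Stand-in for indexing the buffered text: count the flush and reset. */
static void sketch_flush( SketchBuffer *buf )
{
    assert( buf != NULL );

    if ( buf->cur == 0 )
        return;

    buf->flushes++;
    buf->cur = 0;
}

/* Append text; flush first if the buffer would grow past the chunk size. */
static void sketch_append( SketchBuffer *buf, const char *txt, int txtlen )
{
    assert( buf != NULL );

    if ( !txt || txtlen <= 0 )
        return;

    if ( buf->cur + txtlen > SKETCH_CHUNK_SIZE )
        sketch_flush( buf );

    /* Grow the allocation only when needed (one extra byte for the null) */
    if ( buf->cur + txtlen + 1 > buf->max )
    {
        int   new_max = buf->cur + txtlen + 1;
        char *p = realloc( buf->buffer, (size_t)new_max );

        if ( !p )
            abort();

        buf->buffer = p;
        buf->max    = new_max;
    }

    memcpy( buf->buffer + buf->cur, txt, (size_t)txtlen );
    buf->cur += txtlen;
    buf->buffer[buf->cur] = '\0';
}
```

/*
** In this sketch a single append larger than the chunk size simply grows
** the buffer; the only guarantee is that previously buffered text is
** flushed before the buffer would exceed the limit.
*/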

/* libxml2 */
#include 
#include 
#include 


#include <stdarg.h>  // for va_list
#ifdef HAVE_VARARGS_H
#include <varargs.h>  // va_list on Win32
#endif
#include "swish.h"
#include "fs.h"  // for the title check
#include "merge.h"
#include "mem.h"
#include "swstring.h"
#include "search.h"
#include "docprop.h"
#include "error.h"
#include "index.h"
#include "metanames.h"


/* Should be in config.h */

#define BUFFER_CHUNK_SIZE 10000 // This is the size of buffers used to accumulate text
#define READ_CHUNK_SIZE 2048    // The size of chunks read from the stream (4096 seems to cause problems)
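
/*
** A rough sketch of what feeding a push parser in fixed-size chunks looks
** like.  This is an assumption-laden illustration only: it walks a memory
** buffer rather than the input stream, and sketch_feed_chunks(),
** sketch_chunk_fn, sketch_count_bytes() and SKETCH_READ_CHUNK are
** hypothetical names that do not exist elsewhere in swish-e.
*/

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_READ_CHUNK 2048  /* hypothetical; mirrors READ_CHUNK_SIZE */

/* Callback that receives each chunk, much as a push parser would. */
typedef void (*sketch_chunk_fn)( void *user, const char *chunk, size_t len );

/* Hand "data" to "fn" in pieces of at most SKETCH_READ_CHUNK bytes.
 * Returns the number of chunks delivered. */
static int sketch_feed_chunks( const char *data, size_t size, sketch_chunk_fn fn, void *user )
{
    int    chunks = 0;
    size_t off    = 0;

    assert( fn != NULL );

    while ( off < size )
    {
        size_t len = size - off;

        if ( len > SKETCH_READ_CHUNK )
            len = SKETCH_READ_CHUNK;    /* the last chunk may be shorter */

        fn( user, data + off, len );
        off += len;
        chunks++;
    }

    return chunks;
}

/* Example consumer: adds up the bytes it has been given. */
static void sketch_count_bytes( void *user, const char *chunk, size_t len )
{
    (void)chunk;
    *(size_t *)user += len;
}
```

/*
** Because the consumer sees arbitrary chunk boundaries, a word may be split
** across two chunks, which is why text has to be buffered and flushed only
** up to the last complete word.
*/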

/* to buffer text until an end tag is found */

typedef struct {
    char   *buffer;     // text for buffer
    int     cur;        // length
    int     max;        // max size of buffer
    int     defaultID;  // default ID for no meta names.
} CHAR_BUFFER;



// I think that the property system can deal with StoreDescription in a cleaner way.
// This code shouldn't need to know about StoreDescription.

typedef struct {
    struct metaEntry    *meta;
    int                 save_size;   /* save max size */
    char                *tag;        /* summary tag */
    int                 active;      /* inside summary */
} SUMMARY_INFO;

#define STACK_SIZE 255  // stack size, but can grow.

typedef struct MetaStackElement {
    struct MetaStackElement *next;      // pointer to *siblings*, if any
    struct metaEntry        *meta;      // pointer to meta that's in use
    int                      ignore;    // flag that this meta turned on ignore
    char                     tag[1];    // tag to look for
} MetaStackElement, *MetaStackElementPtr;

typedef struct {
    int                 pointer;        // next empty slot in stack
    int                 maxsize;        // size of stack
    int                 ignore_flag;    // count of ignores
    MetaStackElementPtr *stack;         // pointer to an array of stack data
    int                 is_meta;        // is this a metaname or property stack?
} MetaStack;
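
/*
** A simplified sketch of a growable tag stack with a "pop only if the tag
** matches" operation, in the spirit of the MetaStack above.  It is not the
** implementation used here: it stores one tag pointer per slot (no sibling
** chain, no metaEntry, no ignore flag), and SketchTagStack, sketch_push()
** and sketch_pop_if_match() are hypothetical names.
*/

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    int          pointer;   /* next empty slot in the stack */
    int          maxsize;   /* number of allocated slots */
    const char **tags;      /* one tag name per open element (not copied) */
} SketchTagStack;

/* Push a tag, doubling the slot array when it is full. */
static void sketch_push( SketchTagStack *s, const char *tag )
{
    assert( s != NULL && tag != NULL );

    if ( s->pointer >= s->maxsize )
    {
        int          new_size = s->maxsize ? s->maxsize * 2 : 4;
        const char **p = realloc( s->tags, (size_t)new_size * sizeof *p );

        if ( !p )
            abort();

        s->tags    = p;
        s->maxsize = new_size;
    }

    s->tags[s->pointer++] = tag;
}

/* Pop the top element only if its tag matches.  Returns 1 if popped. */
static int sketch_pop_if_match( SketchTagStack *s, const char *tag )
{
    assert( s != NULL && tag != NULL );

    if ( s->pointer == 0 || strcmp( s->tags[s->pointer - 1], tag ) != 0 )
        return 0;

    s->pointer--;
    return 1;
}
```

/*
** Refusing to pop on a mismatched end tag keeps the stack balanced when
** the input has stray or misnested closing tags.
*/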






/* This struct is passed to all callback functions as user data */

typedef struct {
    CHAR_BUFFER         text_buffer;    // buffer for collecting text
 // CHAR_BUFFER         prop_buffer;    // someday, may want a separate property buffer if want to collect tags within props
    SUMMARY_INFO        summary;        // argh.
    MetaStack           meta_stack;     // stacks for tracking the nested metas
    MetaStack           prop_stack;
    int                 total_words;
    int                 word_pos;
    int                 filenum;
    INDEXDATAHEADER    *header;
    SWISH              *sw;
    FileProp           *fprop;
    FileRec            *thisFileEntry;
    int                 structure[STRUCTURE_END+1];
    int                 parsing_html;
    struct metaEntry   *titleProp;
    struct metaEntry   *titleMeta;
    struct metaEntry   *swishdefaultMeta;
    int                 flush_word;         // flag to flush buffer next time there's a white space.
    xmlSAXHandlerPtr    SAXHandler;         // for aborting, I guess.
    xmlParserCtxtPtr    ctxt;
    CHAR_BUFFER         ISO_Latin1;         // buffer to hold UTF-8 -> ISO Latin-1 converted text
    int                 abort;              // flag to stop parsing
    char               *baseURL;            // for fixing up relative links
    int                 swish_noindex;      // swishindex swishnoindex -- for hiding blocks with comments
} PARSE_DATA;


/* Prototypes */
static void start_hndl(void *data, const char *el, const char **attr);
static void end_hndl(void *data, const char *el);
static void char_hndl(void *data, const char *txt, int txtlen);
static void Whitespace(void *data, const xmlChar *txt, int txtlen);
static void append_buffer( CHAR_BUFFER *buf, const char *txt, int txtlen );
static void flush_buffer( PARSE_DATA  *parse_data, int clear );
static void comment_hndl(void *data, const char *txt);
static char *isIgnoreMetaName(SWISH * sw, char *tag);
static void error(void *data, const char *msg, ...);
static void warning(void *data, const char *msg, ...);
static void process_htmlmeta( PARSE_DATA *parse_data, const char ** attr );
static int check_html_tag( PARSE_DATA *parse_data, char * tag, int start );
static void start_metaTag( PARSE_DATA *parse_data, char * tag, char *endtag, int *meta_append, int *prop_append , int is_html_tag );
static void end_metaTag( PARSE_DATA *parse_data, char * tag, int is_html_tag );
static void init_sax_handler( xmlSAXHandlerPtr SAXHandler, SWISH * sw );
static void init_parse_data( PARSE_DATA *parse_data, SWISH * sw, FileProp * fprop, FileRec *fi, xmlSAXHandlerPtr SAXHandler  );
static void free_parse_data( PARSE_DATA *parse_data );
static void Convert_to_latin1( PARSE_DATA *parse_data, char *txt, int txtlen );
static int parse_chunks( PARSE_DATA *parse_data );

static void index_alt_tab( PARSE_DATA *parse_data, const char **attr );
static char *extract_html_links( PARSE_DATA *parse_data, const char **attr, struct metaEntry *meta_entry, char *tag );
static int read_next_chunk( FileProp *fprop, char *buf, int buf_size, int max_size );
static void abort_parsing( PARSE_DATA *parse_data, int abort_code );
static int get_structure( PARSE_DATA *parse_data );

static void push_stack( MetaStack *stack, char *tag, struct metaEntry *meta, int *append, int ignore );
static int pop_stack_ifMatch( PARSE_DATA *parse_data, MetaStack *stack, char *tag );
static int pop_stack( MetaStack *stack );

static void index_XML_attributes( PARSE_DATA *parse_data, char *tag, const char **attr );
static int  start_XML_ClassAttributes(  PARSE_DATA *parse_data, char *tag, const char **attr, int *meta_append, int *prop_append );
static char *isXMLClassAttribute(SWISH * sw, char *tag);

static void debug_show_tag( char *tag, PARSE_DATA *parse_data, int start, char *message );
static void debug_show_parsed_text( PARSE_DATA *parse_data, char *txt, int len );



/*********************************************************************
*   XML Push parser
*
*   Returns:
*       Count of words indexed
*
*
*********************************************************************/

int parse_XML(SWISH * sw, FileProp * fprop, FileRec *fi, char *buffer)

{
    xmlSAXHandler       SAXHandlerStruct;
    xmlSAXHandlerPtr    SAXHandler = &SAXHandlerStruct;
    PARSE_DATA          parse_data;


    init_sax_handler( SAXHandler, sw );
    init_parse_data( &parse_data, sw, fprop, fi, SAXHandler );


    /* Now parse the XML file */
    return parse_chunks( &parse_data );

}

/*********************************************************************
*   HTML Push parser
*
*   Returns:
*       Count of words indexed
*
*********************************************************************/

int parse_HTML(SWISH * sw, FileProp * fprop, FileRec *fi, char *buffer)
{
    htmlSAXHandler       SAXHandlerStruct;
    htmlSAXHandlerPtr    SAXHandler = &SAXHandlerStruct;
    PARSE_DATA           parse_data;

    init_sax_handler( (xmlSAXHandlerPtr)SAXHandler, sw );
    init_parse_data( &parse_data, sw, fprop, fi, (xmlSAXHandlerPtr)SAXHandler );


    parse_data.parsing_html = 1;
    parse_data.titleProp    = getPropNameByName( parse_data.header, AUTOPROPERTY_TITLE );
    parse_data.titleMeta    = getMetaNameByName( parse_data.header, AUTOPROPERTY_TITLE );
    parse_data.swishdefaultMeta = getMetaNameByName( parse_data.header, AUTOPROPERTY_DEFAULT );

    /* Now parse the HTML file */
    return parse_chunks( &parse_data );

}

/*********************************************************************
*   TXT "Push" parser
*
*   Returns:
*       Count of words indexed
*
*********************************************************************/

int parse_TXT(SWISH * sw, FileProp * fprop, FileRec *fi, char *buffer)
{
    PARSE_DATA          parse_data;
    int                 res;
    char       chars[READ_CHUNK_SIZE];



    /* This does stuff that's not needed for txt */
    init_parse_data( &parse_data, sw, fprop, fi, NULL );


    /* Document Summary */
    if ( parse_data.summary.meta && parse_data.summary.meta->max_len )
        parse_data.summary.active++;


    while ( (res = read_next_chunk( fprop, chars, READ_CHUNK_SIZE, sw->truncateDocSize )) )
    {
        append_buffer( &parse_data.text_buffer, chars, res );
        flush_buffer( &parse_data, 0 );  // flush up to whitespace


        /* turn off summary when we exceed size */
        if ( parse_data.summary.meta && parse_data.summary.meta->max_len && fprop->bytes_read > parse_data.summary.meta->max_len )
            parse_data.summary.active = 0;

    }

    flush_buffer( &parse_data, 1 );
    free_parse_data( &parse_data );
    return parse_data.total_words;
}


/*********************************************************************
*   Parse chunks (used for both XML and HTML parsing)
*   Creates the parsers, reads in chunks as one might expect
*
*
*********************************************************************/
static int parse_chunks( PARSE_DATA *parse_data )
{
    SWISH              *sw = parse_data->sw;
    FileProp           *fprop = parse_data->fprop;
    xmlSAXHandlerPtr    SAXHandler = parse_data->SAXHandler;
    int                 res;
    char       chars[READ_CHUNK_SIZE];
    xmlParserCtxtPtr    ctxt;


    /* Now start pulling into the libxml2 parser */

    res = read_next_chunk( fprop, chars, READ_CHUNK_SIZE, sw->truncateDocSize );
    if (res == 0)
        return 0;

    /* Create parser */
    if ( parse_data->parsing_html )
        ctxt = (xmlParserCtxtPtr)htmlCreatePushParserCtxt((htmlSAXHandlerPtr)SAXHandler, parse_data, chars, res, fprop->real_path,0);
    else
        ctxt = xmlCreatePushParserCtxt(SAXHandler, parse_data, chars, res, fprop->real_path);

    parse_data->ctxt = ctxt; // save



    while ( !parse_data->abort && (res = read_next_chunk( fprop, chars, READ_CHUNK_SIZE, sw->truncateDocSize )) )
    {
        if ( parse_data->parsing_html )
            htmlParseChunk((htmlParserCtxtPtr)ctxt, chars, res, 0);
        else
            xmlParseChunk(ctxt, chars, res, 0);

        /* Doesn't seem to make much difference to flush here */
        //flush_buffer( parse_data, 0 );  // flush up to whitespace
    }



    /* Tell the parser we are done, and free it */
    if ( parse_data->parsing_html )
    {
        if ( !parse_data->abort ) // bug in libxml 2.4.5
            htmlParseChunk( (htmlParserCtxtPtr)ctxt, chars, 0, 1 );
        htmlFreeParserCtxt( (htmlParserCtxtPtr)ctxt);
    }
    else
    {
        if ( !parse_data->abort ) // bug in libxml
            xmlParseChunk(ctxt, chars, 0, 1);
        xmlFreeParserCtxt(ctxt);
    }

    /* Daniel Veillard on Nov 21, 2001 says this should not be called for every doc. */
    // But, it probably should be called when done parsing.
    // xmlCleanupParser();


    /* Check for abort condition set while parsing (isoktitle, NoContents) */
    /* (But may not abort if the HTML parser never sees any data) */
    /* With NoContents, if no words were indexed, index the file's path instead */

    if ( fprop->index_no_content && !parse_data->total_words )
    {
        append_buffer( &parse_data->text_buffer, fprop->real_path, strlen(fprop->real_path) );

        parse_data->meta_stack.ignore_flag = 0;  /* make sure we can write */
        flush_buffer( parse_data, 3 );
    }


    /* Flush any text left in the buffer */

    if ( !parse_data->abort )
        flush_buffer( parse_data, 3 );



    free_parse_data( parse_data );


    // $$$ This doesn't work since the file (and maybe some words) already added
    // $$$ need a way to "remove" the file entry and words already added

    if ( parse_data->abort < 0 )
        return parse_data->abort;

    return parse_data->total_words;
}

/*********************************************************************
*   read_next_chunk - read another chunk from the stream
*
*   Call with:
*       fprop
*       buf         - where to save the data
*       buf_size    - max size of buffer
*       max_size    - limit of *total* bytes read from this stream (for truncate);
*                     zero means no limit
*
*   Returns:
*       number of bytes read (as returned from fread)
*
*
*********************************************************************/
static int read_next_chunk( FileProp *fprop, char *buf, int buf_size, int max_size )
{
    int size;
    int res;

    if ( fprop->done )
        return 0;

    /* For -S prog, only read in the right amount of data */
    if ( fprop->external_program && (fprop->bytes_read >= fprop->fsize ))
        return 0;


    /* fprop->external_program is set if -S prog and NOT reading from a filter */

    size = fprop->external_program && (( fprop->fsize - fprop->bytes_read ) < buf_size)
           ? fprop->fsize - fprop->bytes_read
           : buf_size;

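    /* Limit the very first read to four bytes.  Presumably this is so the
     * libxml2 push parser gets a small first chunk to detect the encoding. */
    if ( !fprop->bytes_read && size > 4 )
        size = 4;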



    /* Truncate -- safety feature from Rainer.  No attempt is made to back up to a whole word */
    if ( max_size && fprop->bytes_read + size > max_size )
    {
        fprop->done++;  // flag that we are done
        size = max_size - fprop->bytes_read;
    }


    res = fread(buf, 1, size, fprop->fp);

    fprop->bytes_read += res;

    return res;
}



/*********************************************************************
*   Init a sax handler structure
*   Must pass in the structure
*
*********************************************************************/
static void init_sax_handler( xmlSAXHandlerPtr SAXHandler, SWISH * sw )
{
    /* Set event handlers for libxml2 parser */
    memset( SAXHandler, 0, sizeof( xmlSAXHandler ) );

    SAXHandler->startElement   = (startElementSAXFunc)&start_hndl;
    SAXHandler->endElement     = (endElementSAXFunc)&end_hndl;
    SAXHandler->characters     = (charactersSAXFunc)&char_hndl;
    SAXHandler->cdataBlock     = (charactersSAXFunc)&char_hndl;
    SAXHandler->ignorableWhitespace = (ignorableWhitespaceSAXFunc)&Whitespace;

    SAXHandler->comment    = (commentSAXFunc)&comment_hndl;

    if ( sw->parser_warn_level >= 1 )
        SAXHandler->fatalError     = (fatalErrorSAXFunc)&error;

    if ( sw->parser_warn_level >= 2 )
        SAXHandler->error          = (errorSAXFunc)&error;

    if ( sw->parser_warn_level >= 3 )
        SAXHandler->warning        = (warningSAXFunc)&warning;

}


/*********************************************************************
*   Init the parser data structure
*   Must pass in the structure
*
*********************************************************************/
static void init_parse_data( PARSE_DATA *parse_data, SWISH * sw, FileProp * fprop, FileRec *fi, xmlSAXHandlerPtr SAXHandler  )
{
    IndexFILE          *indexf = sw->indexlist;
    struct StoreDescription *stordesc = fprop->stordesc;

    /* Set defaults  */
    memset( parse_data, 0, sizeof(PARSE_DATA));

    parse_data->header      = &indexf->header;
    parse_data->sw          = sw;
    parse_data->fprop       = fprop;
    parse_data->filenum     = fi->filenum;
    parse_data->word_pos    = 1;  /* compress doesn't like zero */
    parse_data->SAXHandler  = SAXHandler;
    parse_data->thisFileEntry = fi;


    /* Don't really like this, as mentioned above */
    if ( stordesc && (parse_data->summary.meta = getPropNameByName(parse_data->header, AUTOPROPERTY_SUMMARY)))
    {
        /* Set property limit size for this document type, and store previous size limit */
        parse_data->summary.save_size = parse_data->summary.meta->max_len;
        parse_data->summary.meta->max_len = stordesc->size;
        parse_data->summary.tag = stordesc->field;
        if ( parse_data->summary.tag )
            strtolower(parse_data->summary.tag);
    }


    /* Initialize the meta and property stacks */
    /* Not needed for TXT processing, of course */
    {
        MetaStack   *s;

        s = &parse_data->meta_stack;
        s->is_meta = 1;
        s->maxsize = STACK_SIZE;

        s->stack = (MetaStackElementPtr *)emalloc( sizeof( MetaStackElementPtr ) * s->maxsize );
        if ( fprop->index_no_content )
            s->ignore_flag++;

        s = &parse_data->prop_stack;
        s->is_meta = 0;
        s->maxsize = STACK_SIZE;
        s->stack = (MetaStackElementPtr *)emalloc( sizeof( MetaStackElementPtr ) * s->maxsize );
        if ( fprop->index_no_content )  /* only works for HTML */
            s->ignore_flag++;
    }

    addCommonProperties(sw, fprop, fi, NULL, NULL, 0);
}


/*********************************************************************
*   Free any data used by the parse_data struct
*
*********************************************************************/
static void free_parse_data( PARSE_DATA *parse_data )
{

    if ( parse_data->ISO_Latin1.buffer )
        efree( parse_data->ISO_Latin1.buffer );

    if ( parse_data->text_buffer.buffer )
        efree( parse_data->text_buffer.buffer );

    if ( parse_data->baseURL )
        efree( parse_data->baseURL );


    /* Pop the stacks */
    while( pop_stack( &parse_data->meta_stack ) );
    while( pop_stack( &parse_data->prop_stack ) );

    /* Free the stacks */
    if ( parse_data->meta_stack.stack )
        efree( parse_data->meta_stack.stack );

    if ( parse_data->prop_stack.stack )
        efree( parse_data->prop_stack.stack );



    /* Restore the size in the StoreDescription property */
    /* Test the meta pointer, not save_size, so a saved limit of zero is restored too */
    if ( parse_data->summary.meta )
        parse_data->summary.meta->max_len = parse_data->summary.save_size;

}

/*********************************************************************
*   Start Tag Event Handler
*
*   This is called by libxml2.  It normally just calls start_metaTag()
*   and that decides how to deal with that meta tag.
*   It also converts HTML <meta> tags and XML class attributes into meta
*   tags as swish would expect them (and then calls start_metaTag()).
*
*   To Do:
*       deal with attributes!
*
*********************************************************************/


static void start_hndl(void *data, const char *el, const char **attr)
{
    PARSE_DATA *parse_data = (PARSE_DATA *)data;
    char        tag[MAXSTRLEN + 1];
    int         is_html_tag = 0;   // to allow <foo>-style meta tags in HTML (tags that are not real HTML elements)
    int         meta_append = 0;   // used to allow siblings metanames
    int         prop_append = 0;


    /* disabled by a comment? */
    if ( parse_data->swish_noindex )
        return;

    if(strlen(el) >= MAXSTRLEN)  // easy way out
    {
        warning( (void *)data, "Warning: Tag found in %s is too long: '%s'\n", parse_data->fprop->real_path, el );
        return;
    }

    strcpy(tag,(char *)el);
    strtolower( tag );  // xml?


    if ( parse_data->parsing_html )
    {

        /* handle <meta> tags */
        if ( (strcmp( tag, "meta") == 0) && attr  )
        {
            process_htmlmeta( parse_data, attr );
            return;
        }


        /* Deal with structure */
        if ( (is_html_tag = check_html_tag( parse_data, tag, 1 )) )
        {
            /** Special handling for <a>, <img>, and <base> tags **/

            /* Extract out links - currently only keep <a href> links */
            if ( strcmp( tag, "a") == 0 )
                extract_html_links( parse_data, attr, parse_data->sw->links_meta, "href" );


            /* Extract out links from images */
            else if ( strcmp( tag, "img") == 0 )
            {
                /* Index the text of the ALT attribute */
                if (parse_data->sw->IndexAltTag)
                    index_alt_tab( parse_data, attr );

                extract_html_links( parse_data, attr, parse_data->sw->images_meta, "src" );
            }


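            /* Extract out the BASE URL for fixups */
            /* Skip if there is no href, and free any earlier <base> value */
            else if ( strcmp( tag, "base") == 0 )
            {
                char *href = extract_html_links( parse_data, attr, NULL, "href" );

                if ( href )
                {
                    if ( parse_data->baseURL )
                        efree( parse_data->baseURL );
                    parse_data->baseURL = estrdup( href );
                }
            }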
        }

    }


    /* Now check if we are in a meta tag */
    start_metaTag( parse_data, tag, tag, &meta_append, &prop_append, is_html_tag );



    /* Index the content of attributes */

    if ( !parse_data->parsing_html && attr )
    {
        int class_found = 0;

        /* Allow <foo class="bar"> to look like a "foo.bar" meta tag */

        if ( parse_data->sw->XMLClassAttributes )
            class_found = start_XML_ClassAttributes( parse_data, tag, attr, &meta_append, &prop_append );


        /* Index XML attributes */

        if ( !class_found && parse_data->sw->UndefinedXMLAttributes != UNDEF_META_DISABLE )
            index_XML_attributes( parse_data, tag, attr );
    }

}





/*********************************************************************
*   End Tag Event Handler
*
*   Called by libxml2.
*
*
*
*********************************************************************/


static void end_hndl(void *data, const char *el)
{
    PARSE_DATA *parse_data = (PARSE_DATA *)data;
    char        tag[MAXSTRLEN + 1];
    int         is_html_tag = 0;  // to allow <foo>-style metatags in html (tags that are not real HTML elements)


    /* disabled by a comment? */
    if ( parse_data->swish_noindex )
        return;

    if(strlen(el) >= MAXSTRLEN)  // same limit as start_hndl so start and end tags stay balanced
    {
        warning( (void *)data, "Warning: Tag found in %s is too long: '%s'\n", parse_data->fprop->real_path, el );
        return;
    }

    strcpy(tag,(char *)el);
    strtolower( tag );



    if ( parse_data->parsing_html )
    {

        /* <meta> tags are handled completely in start_hndl */

        if ( (strcmp( tag, "meta") == 0)   )
            return;  // already processed at the start tag



        /* Deal with structure */
        is_html_tag = check_html_tag( parse_data, tag, 0 );
    }


    end_metaTag( parse_data, tag, is_html_tag );
}



/*********************************************************************
*   Character Data Event Handler
*
*   This does the actual adding of text to the index and adding properties
*   if any tags have been found to index
*
*
*********************************************************************/

static void char_hndl(void *data, const char *txt, int txtlen)
{
    PARSE_DATA         *parse_data = (PARSE_DATA *)data;


    /* Have we been disabled? */
    if ( !parse_data->SAXHandler->characters )
        return;

    /* disabled by a comment? */
    if ( parse_data->swish_noindex )
        return;


    /* If currently in an ignore block, then return */
    if ( parse_data->meta_stack.ignore_flag && parse_data->prop_stack.ignore_flag )
        return;

    /* $$$ this was added to limit the buffer size */
    if ( parse_data->text_buffer.cur + txtlen >= BUFFER_CHUNK_SIZE )
        flush_buffer( parse_data, 0 );  // flush up to last word - somewhat expensive



    Convert_to_latin1( parse_data, (char *)txt, txtlen );


    if ( DEBUG_MASK & DEBUG_PARSED_TEXT )
        debug_show_parsed_text( parse_data, parse_data->ISO_Latin1.buffer, parse_data->ISO_Latin1.cur );




    /* Check if we are waiting for a word boundary, and there is white space in the text */
    /* If so, write the word, then reset the structure, then buffer the rest of the text. */

    if ( parse_data->flush_word  )
    {
        /* look for whitespace */
        char *c = parse_data->ISO_Latin1.buffer;
        int   i;
        for ( i=0; i < parse_data->ISO_Latin1.cur; i++ )
            if ( isspace( (unsigned char)c[i] ) )  /* cast avoids negative values for Latin-1 high bytes */
            {
                append_buffer( &parse_data->text_buffer, parse_data->ISO_Latin1.buffer, i );
                flush_buffer( parse_data, 1 );  // Flush the entire buffer

                parse_data->structure[parse_data->flush_word-1]--;  // now it's ok to turn off the structure bit
                parse_data->flush_word = 0;

                /* buffer the rest (starting at the whitespace) */
                append_buffer( &parse_data->text_buffer, &c[i], parse_data->ISO_Latin1.cur - i );

                return;
            }
    }



    /* Buffer the text */
    append_buffer( &parse_data->text_buffer, parse_data->ISO_Latin1.buffer, parse_data->ISO_Latin1.cur );

    /* Some day, might want to have a separate property buffer if need to collect more than plain text */
    // append_buffer( &parse_data->prop_buffer, txt, txtlen );



}

/*********************************************************************
*   ignorableWhitespace handler
*
*   Just adds a space to the buffer
*
*
*********************************************************************/

static void Whitespace(void *data, const xmlChar *txt, int txtlen)
{
    PARSE_DATA         *parse_data = (PARSE_DATA *)data;

    append_buffer( &parse_data->text_buffer, " ", 1 );  // could flush buffer, I suppose
}




/*********************************************************************
*   Convert UTF-8 to Latin-1
*
*   Buffer is extended/created if needed
*
*********************************************************************/

static void Convert_to_latin1( PARSE_DATA *parse_data, char *txt, int txtlen )
{
    CHAR_BUFFER     *buf = &parse_data->ISO_Latin1;
    int             inlen = txtlen;
    int             ret;
    char  *start_buf;
    char  *end_buf = txt + txtlen - 1;
    int             used;


    /* (re)allocate buf if needed */

    if ( txtlen >= buf->max )
    {
        buf->max = ( buf->max + BUFFER_CHUNK_SIZE+1 < txtlen )
                    ? buf->max + txtlen+1
                    : buf->max + BUFFER_CHUNK_SIZE+1;

        buf->buffer = erealloc( buf->buffer, buf->max );
    }

    buf->cur = 0;  /* start at the beginning of the buffer */

    while( 1 )
    {
        used = buf->max - buf->cur;             /* size available in buffer */
        start_buf = &buf->buffer[buf->cur];     /* offset into buffer */

        /* Returns 0 for OK */
        ret = UTF8Toisolat1( (unsigned char *)start_buf, &used, (const unsigned char *)txt, &inlen );

        if ( used > 0 )         // tally up total bytes written to the output buffer
            buf->cur += used;

        if ( ret >= 0 )         /* libxml2 seems confused about what this should return */
            return;             /* either 0 for ok, or number of chars converted */

        if ( ret == -2 )        // encoding failed
        {
            if ( parse_data->sw->parser_warn_level >= 3 )
                xmlParserWarning(parse_data->ctxt, "Failed to convert internal UTF-8 to Latin-1.\nReplacing non ISO-8859-1 char with char '%c'\n", ENCODE_ERROR_CHAR);


            buf->buffer[buf->cur++] = ENCODE_ERROR_CHAR;


            /* Skip one UTF-8 character -- returns null if not pointing to a UTF-8 char */

            /*
             * xmlUTF8Strpos() calls xmlUTFStrlen() which requires a null-terminated string.
             * so, jump over the utf-8 char here.  Mostly from libxml2's encoding.c.
             *
             *
            if (  !(txt = (char *)xmlUTF8Strpos( (const xmlChar *)(&txt[inlen]), 1) ))
                return;
            */


            {
                unsigned char ch;  /* unsigned so the shifts and masks below are well defined */
                txt += inlen;  /* point to the start of the utf-8 char (where conversion left off) */

                /* grab first utf-8 char and check that it's not null */
                if ( 0 == (ch = (unsigned char)*txt++) )
                    return;

                /* Make sure valid starting utf-8 char (must be 11xxxxxx) */
                if ( 0xc0 != ( ch & 0xc0 ) )
                    return;

                /* Now skip over the additional bytes based on the number of high-order bits */
                while ( (ch <<= 1 ) & 0x80 )
                {
                    /* Are we past the end of the buffer? */
                    if ( txt > end_buf )
                    {
                        if ( parse_data->sw->parser_warn_level >= 1 )
                            xmlParserError(parse_data->ctxt, "Incomplete UTF-8 character found\n" );

                        return;
                    }

                    /* all trailing bytes must be 10xx xxxx */
                    if ( 0x80 != (*txt++ & 0xc0) )
                    {
                        if ( parse_data->sw->parser_warn_level >= 1 )
                            xmlParserError(parse_data->ctxt, "Invalid UTF-8 sequence found. A secondary byte was: '0x%2x'\n", (unsigned char)txt[-1] );

                        return;
                    }
                }
            }

            /* Calculate the remaining length of the input string */
            /* (start_buf is recomputed at the top of the loop) */
            inlen = (int)(end_buf - txt) + 1;
            if ( inlen <= 0 )
                return;
        }
        else
        {
            xmlParserWarning(parse_data->ctxt, "Error '%d' converting internal UTF-8 to Latin-1.\n", ret );
            return;
        }
    }
}


/*********************************************************************
*   Start of a MetaTag
*   All XML tags are metatags, but for HTML there's special handling.
*
*   Call with:
*       parse_data
*       tag         = tag to look for as a metaname/property
*       endtag      = tag to look for as the ending tag (since might be different from start tag)
*       meta_append = if zero, tells push that this is a new meta
*       prop_append   otherwise, says it's a sibling of a previous call
*                     (Argh Jan 29, 2001 -- now I don't remember what that _append does!)
*                     (it's for working with xml attributes)
*       is_html_tag = prevents UndefinedMetaTags from being applied to html tags
*
*   <foo class="bar"> can start two meta tags "foo" and "foo.bar".  But the
*   single end tag (passed as endtag) will end both tags.
*
*
*********************************************************************/
static void start_metaTag( PARSE_DATA *parse_data, char * tag, char *endtag, int *meta_append, int *prop_append, int is_html_tag )
{
    SWISH              *sw = parse_data->sw;
    struct metaEntry   *m = NULL;


    /* Bump the word position on all meta names, unless overridden */
    if (!is_html_tag && !isDontBumpMetaName(sw->dontbumpstarttagslist, tag))
        parse_data->word_pos++;

    /* check for ignore tag (should probably remove char handler for speed) */
    // Should specific property names and meta names override this?

    if ( isIgnoreMetaName( sw, tag ) )
    {
        /* shouldn't need to flush buffer since it's just blocking out a section and should be balanced */
        /* but need to due to the weird way the char buffer is used (and shared with props) and how metatags are assigned to the buffer */
        /* basically, since flush_buffer looks at the ignore flag and always clears the buffer, need to do it now */
        /* flush_buffer really should not be in the business of checking the ignore flag, and rather we need to keep two buffers -- or maybe just always flush with any change */

        flush_buffer( parse_data, 1 );

        push_stack( &parse_data->meta_stack, endtag, NULL, meta_append, 1 );
        push_stack( &parse_data->prop_stack, endtag, NULL, prop_append, 1 );
        parse_data->structure[IN_META_BIT]++;  // so we are in balance with pop_stack
        return;
    }


    /* Check for metaNames */

    if ( !(m = getMetaNameByName( parse_data->header, tag)) )
    {

        if ( !is_html_tag )
        {
            if ( sw->UndefinedMetaTags == UNDEF_META_AUTO )
            {
                if (sw->verbose)
                    printf("**Adding automatic MetaName '%s' found in file '%s'\n", tag, parse_data->fprop->real_path);

                m = addMetaEntry( parse_data->header, tag, META_INDEX, 0);
            }


            else if ( sw->UndefinedMetaTags == UNDEF_META_IGNORE )  /* Ignore this block of text for metanames only (props ok) */
            {
                flush_buffer( parse_data, 66 );  // flush because we must still continue to process, and structures might change
                push_stack( &parse_data->meta_stack, endtag, NULL, meta_append, 1 );
                parse_data->structure[IN_META_BIT]++;  // so we are in balance with pop_stack
                /* must fall through to property check */
            }
        }
    }



    if ( m )     /* Is a meta name */
    {
        flush_buffer( parse_data, 6 );  /* new meta tag, so must flush */
        push_stack( &parse_data->meta_stack, endtag, m, meta_append, 0 );
        parse_data->structure[IN_META_BIT]++;
    }

    else if ( !is_html_tag )
    {
        /* If set to "error" on undefined meta tags, then error */
        if ( sw->UndefinedMetaTags == UNDEF_META_ERROR )
            progerr("Found meta name '%s' in file '%s', not listed as a MetaNames in config", tag, parse_data->fprop->real_path);

        else
        {
            /* In general a single word doesn't span tags */
            // append_buffer( &parse_data->text_buffer, " ", 1 );
            flush_buffer( parse_data, 1 );

            if ( DEBUG_MASK & DEBUG_PARSED_TAGS )
                debug_show_tag( tag, parse_data, 1, "(undefined meta name - no action)" );
        }
    }


    /* Check property names -- allows HTML tags as property names */


    if ( (m  = getPropNameByName( parse_data->header, tag)) )
    {
        if ( is_meta_internal( m ) )
        {
            warning( (void *)parse_data, "Found Swish-e reserved property name '%s'\n", tag );
        }
        else
        {
            flush_buffer( parse_data, 7 );  // flush since it's a new meta tag
            push_stack( &parse_data->prop_stack, endtag, m, prop_append, 0 );
        }
    }



    /* Look to enable StoreDescription - allow any tag */
    /* Don't need to flush since this has its own buffer */

    // This should really be a property, and use aliasing as needed
    {
        SUMMARY_INFO    *summary = &parse_data->summary;

        if ( summary->tag && (strcmp( tag, summary->tag ) == 0 ))
        {
            /* Flush data in buffer */
            if ( 0 == summary->active )
                flush_buffer( parse_data, 1 );

            summary->active++;
        }
    }

}


/*********************************************************************
*   End of a MetaTag
*   All XML tags are metatags, but for HTML there's special handling.
*
*********************************************************************/
static void end_metaTag( PARSE_DATA *parse_data, char * tag, int is_html_tag )
{

    if ( pop_stack_ifMatch( parse_data, &parse_data->meta_stack, tag ) )
        parse_data->structure[IN_META_BIT]--;


    /* Out of a property? */
    pop_stack_ifMatch( parse_data, &parse_data->prop_stack, tag );


    /* Don't allow matching across tag boundary */
    if (!is_html_tag && !isDontBumpMetaName(parse_data->sw->dontbumpendtagslist, tag))
        parse_data->word_pos++;

    /* Tags normally separate words */
    if (!is_html_tag)
        // append_buffer( &parse_data->text_buffer, " ", 1 );
        flush_buffer( parse_data, 1 );




    /* Look to disable StoreDescription */
    {
        SUMMARY_INFO    *summary = &parse_data->summary;
        if ( summary->tag && (strcasecmp( tag, summary->tag ) == 0 ))
        {
            /* Flush data in buffer */
            if ( 1 == summary->active )
                flush_buffer( parse_data, 1 );  // do first since flush buffer looks at summary->active

            summary->active--;
        }
    }

}


/*********************************************************************
*   Checks the HTML tag, and sets the "structure"
*   Also deals with FileRules title
*   In general, flushes the character buffer due to the change in structure.
*
*   returns false if not a valid HTML tag (which might be a "fake" metaname)
*
*********************************************************************/

static int check_html_tag( PARSE_DATA *parse_data, char * tag, int start )
{
    int     is_html_tag = 1;
    int     bump = start ? +1 : -1;

    /* Check for structure bits */


    /** HEAD **/

    if ( strcmp( tag, "head" ) == 0 )
    {
        flush_buffer( parse_data, 10 );
        parse_data->structure[IN_HEAD_BIT] += bump;

        /* Check for NoContents - can quit looking once out of the <head> block since only looking for the <title> */

        if ( !start && parse_data->fprop->index_no_content )
            abort_parsing( parse_data, 1 );

    }



    /** TITLE **/

    // Note: I think storing the title words by default should be optional.
    // Someone might not want to search title tags, but if they don't they are
    // screwed since title by default ranks higher than body words.


    else if ( strcmp( tag, "title" ) == 0 )
    {
        /* Can't flush buffer until we have looked at the title */

        if ( !start )
        {
            struct MOD_FS *fs = parse_data->sw->FS;

            /* Check isoktitle - before NoContents? */
            if ( match_regex_list( parse_data->text_buffer.buffer, fs->filerules.title, "FileRules title") )
            {
                abort_parsing( parse_data, -2 );
                return 1;
            }

            /* Check for NoContents - abort since all we need is the title text */
            if ( parse_data->fprop->index_no_content )
                abort_parsing( parse_data, 1 );


        }
        else
            /* In start tag, allow capture of text (NoContents sets ignore_flag at start) */
            if ( parse_data->fprop->index_no_content )
                parse_data->meta_stack.ignore_flag--;


        /* Now it's ok to flush */
        flush_buffer( parse_data, 11 );


        /* If title is a property, turn on the property flag */
        if ( parse_data->titleProp )
            parse_data->titleProp->in_tag += bump;


        /* If title is a metaname, turn on the indexing flag */
        if ( parse_data->titleMeta )
        {
            parse_data->titleMeta->in_tag += bump;
            if ( parse_data->swishdefaultMeta )  /* guard in case the default meta name is missing */
                parse_data->swishdefaultMeta->in_tag += bump;
        }



        parse_data->word_pos++;
        parse_data->structure[IN_TITLE_BIT] += bump;
    }



    /** BODY **/

    else if ( strcmp( tag, "body" ) == 0 )
    {
        flush_buffer( parse_data, 12 );
        parse_data->structure[IN_BODY_BIT] += bump;
        parse_data->word_pos++;
    }



    /** H1-H6 HEADINGS **/

    /* This should be split so the heading level is known for ranking */
    else if ( tag[0] == 'h' && isdigit((unsigned char) tag[1]))
    {
        flush_buffer( parse_data, 13 );
        parse_data->structure[IN_HEADER_BIT] += bump;
    }



    /** EMPHASIZED **/

    /* These should not be hard coded */

    else if ( !strcmp( tag, "em" ) || !strcmp( tag, "b" ) || !strcmp( tag, "strong" ) || !strcmp( tag, "i" ) )
    {
        /* This is hard.  The idea is to not break up words, but that messes up the structure,
         * e.g. "this is b<b>O</b>ld word": <b> would only flush "this is", and </b> would not
         * flush anything.  The PROBLEM is that the following words would then have the
         * IN_EMPHASIZED structure.  To "fix" this, set a flag to flush at the next word boundary.
        */
        flush_buffer( parse_data, 0 );  // flush up to current word (leaving any leading chars in buffer)

        if ( start )
            parse_data->structure[IN_EMPHASIZED_BIT]++;
        else
        {
            /* If there is something in the buffer then delay turning off the flag until whitespace is found */
            if ( parse_data->text_buffer.cur )
                /* Flag to flush at next word boundary */
                parse_data->flush_word = IN_EMPHASIZED_BIT + 1;  // + 1 because we might need to use zero some day
            else
                parse_data->structure[IN_EMPHASIZED_BIT]--;
        }


    }




    /* Now, look for reasons to add whitespace.
     * img is not a sure case, as someone might use an image to make up part of a word,
     * but commonly an image splits up text.
     * other tags: frame?
     */

    if ( !strcmp( tag, "br" ) || !strcmp( tag, "img" ) )
        append_buffer( &parse_data->text_buffer, " ", 1 );  // could flush buffer, I suppose
    else
    {
        const htmlElemDesc *element = htmlTagLookup( (const xmlChar *)tag );

        if ( !element )
            is_html_tag = 0;   // flag that this might be a meta name

        else if ( !element->isinline )
        {
            /* Used to just append a space to the buffer,
               but that didn't prevent phrase matches across tags
            */
            flush_buffer( parse_data, 1 );
            parse_data->word_pos++;
        }
    }




    return is_html_tag;
}

/*********************************************************************
*   Allow <foo class="bar"> to start "foo.bar" meta tag
*
*   Returns true if any found
*
*********************************************************************/
static int  start_XML_ClassAttributes(  PARSE_DATA *parse_data, char *tag, const char **attr, int *meta_append, int *prop_append )
{
    char tagbuf[MAXSTRLEN + 1];
    char *t;
    int   i;
    int  taglen = strlen( tag );
    SWISH *sw = parse_data->sw;
    int   found = 0;

    if(strlen(tag) >= MAXSTRLEN)  // easy way out
    {
        warning( (void *)parse_data, "Warning: Tag found in %s is too long: '%s'\n", parse_data->fprop->real_path, tag );
        return 0;
    }


    strcpy( tagbuf, tag );
    t = tagbuf + taglen;
    *t = '.';  /* hard coded! */
    t++;


    for ( i = 0; attr[i] && attr[i+1]; i+=2 )
    {
        if ( !isXMLClassAttribute( sw, (char *)attr[i]) )
            continue;


        /* Is the tag going to be too long? */
        if ( strlen( (char *)attr[i+1] ) + taglen + 2 > 256 )
        {
            warning( (void *)parse_data, "ClassAttribute on tag '%s' too long\n", tag );
            continue;
        }


        strcpy( t, (char *)attr[i+1] );         /* create tag.attribute metaname */

        /* All metanames are currently lowercase -- would be better to force this in metanames.c.
         * Lowercase only after the copy, so that tagbuf is NUL-terminated when it is scanned. */
        strtolower( tagbuf );
        start_metaTag( parse_data, tagbuf, tag, meta_append, prop_append, 0 );
        found++;

        /* Now, nest attributes */
        if ( sw->UndefinedXMLAttributes != UNDEF_META_DISABLE )
            index_XML_attributes( parse_data, tagbuf, attr );

    }

    return found;

}

/*********************************************************************
*   check if a tag is an XMLClassAttributes
*
*   Note: this returns a pointer to the config set tag, so don't free it!
*   Duplicate code!
*
*   This does a case-insensitive lookup
*
*
*********************************************************************/

static char *isXMLClassAttribute(SWISH * sw, char *tag)
{
    struct swline *tmplist = sw->XMLClassAttributes;

    if (!tmplist)
        return 0;

    while (tmplist)
    {
        if (strcasecmp(tag, tmplist->line) == 0)
            return tmplist->line;

        tmplist = tmplist->next;
    }

    return NULL;
}



/*********************************************************************
*   This extracts out the attributes and contents and indexes them
*
*********************************************************************/
static void index_XML_attributes( PARSE_DATA *parse_data, char *tag, const char **attr )
{
    char tagbuf[MAXSTRLEN+1];
    char *content;
    char *t;
    int   i;
    int  meta_append;
    int  prop_append;
    int  taglen = strlen( tag );
    SWISH *sw = parse_data->sw;
    UndefMetaFlag  tmp_undef = sw->UndefinedMetaTags;  // save

    if(strlen(tag) >= MAXSTRLEN)  // easy way out
    {
        warning( (void *)parse_data, "Warning: Tag found in %s is too long: '%s'\n", parse_data->fprop->real_path, tag );
        return;
    }

    /* Override only after the early return above, so the saved value is always restored */
    sw->UndefinedMetaTags = sw->UndefinedXMLAttributes;


    strcpy( tagbuf, tag );
    t = tagbuf + taglen;
    *t = '.';  /* hard coded! */
    t++;

    for ( i = 0; attr[i] && attr[i+1]; i+=2 )
    {
        meta_append = 0;
        prop_append = 0;

        /* Skip attributes that are XMLClassAttributes */
        if ( isXMLClassAttribute( sw, (char *)attr[i] ) )
            continue;


        if ( strlen( (char *)attr[i] ) + taglen + 2 > 256 )
        {
            warning( (void *)parse_data, "Attribute '%s' on tag '%s' too long to build metaname\n", (char *)attr[i], tag );
            continue;
        }

        strcpy( t, (char *)attr[i] );         /* create tag.attribute metaname */
        content = (char *)attr[i+1];

        if ( !*content )
            continue;

        strtolower( tagbuf );



        flush_buffer( parse_data, 1 ); // isn't needed, right?
        start_metaTag( parse_data, tagbuf, tagbuf, &meta_append, &prop_append, 0 );
        char_hndl( parse_data, content, strlen( content ) );
        end_metaTag( parse_data, tagbuf, 0 );
    }

    sw->UndefinedMetaTags = tmp_undef;
}



/*********************************************************************
*   Deal with html's <meta name="foo" content="bar">
*   Simply calls start and end meta, and passes content
*
*********************************************************************/

static void process_htmlmeta( PARSE_DATA *parse_data, const char **attr )
{
    char *metatag = NULL;
    char *content = NULL;
    int  meta_append = 0;
    int  prop_append = 0;

    int  i;

    /* Don't add any meta data while looking for just the title */
    if ( parse_data->fprop->index_no_content )
        return;

    for ( i = 0; attr[i] && attr[i+1]; i+=2 )
    {
        if ( (strcmp( attr[i], "name" ) == 0 ) && attr[i+1] )
            metatag = (char *)attr[i+1];

        else if ( (strcmp( attr[i], "content" ) == 0 ) && attr[i+1] )
            content = (char *)attr[i+1];
    }


    if ( metatag && content )
    {

        /* Robots exclusion: http://www.robotstxt.org/wc/exclusion.html#meta */
        if ( !strcasecmp( metatag, "ROBOTS") && lstrstr( content, "NOINDEX" ) )
        {
            if ( parse_data->sw->obeyRobotsNoIndex )
                abort_parsing( parse_data, -3 );

            return;
        }

        /* Process as a start -> end tag sequence */
        strtolower( metatag );

        flush_buffer( parse_data, 111 );
        start_metaTag( parse_data, metatag, metatag, &meta_append, &prop_append, 0 );
        char_hndl( parse_data, content, strlen( content ) );
        end_metaTag( parse_data, metatag, 0 );
        flush_buffer( parse_data, 112 );
    }

}


/*********************************************************************
*   Append character data to the end of the buffer
*
*   Buffer is extended/created if needed
*
*   ToDo: Flush buffer if it gets too large
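*
*   A minimal, standalone sketch of the grow-by-chunk policy, assuming a
*   16-byte chunk and plain realloc() (the real code uses BUFFER_CHUNK_SIZE
*   and erealloc()).  GrowBuf and growbuf_append() are hypothetical names
*   used only for this illustration:
*
```c
#include <stdlib.h>
#include <string.h>

#define CHUNK 16   // assumed chunk size for this sketch only

typedef struct {
    char  *data;   // NUL-terminated once non-NULL
    size_t cur;    // bytes in use
    size_t max;    // capacity, not counting the terminator
} GrowBuf;

// Append len bytes, growing by at least CHUNK bytes at a time.
// Returns 0 on success, -1 if realloc() fails (buffer left unchanged).
static int growbuf_append(GrowBuf *b, const char *txt, size_t len)
{
    if (len == 0)
        return 0;

    if (b->cur + len >= b->max) {
        size_t want = b->max + CHUNK;
        if (want < b->cur + len)
            want = b->cur + len;

        char *p = realloc(b->data, want + 1);   // +1 for the terminator
        if (!p)
            return -1;
        b->data = p;
        b->max  = want;
    }

    memcpy(b->data + b->cur, txt, len);
    b->cur += len;
    b->data[b->cur] = '\0';
    return 0;
}
```
*
*   Starting from an empty GrowBuf, appending "hello" allocates one chunk;
*   a following " world" fits without another realloc().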
*
*
*********************************************************************/

static void append_buffer( CHAR_BUFFER *buf, const char *txt, int txtlen )
{


    if ( !txtlen )  // shouldn't happen
        return;

    /* (re)allocate buf if needed */

    if ( buf->cur + txtlen >= buf->max )
    {
        buf->max = ( buf->max + BUFFER_CHUNK_SIZE+1 < buf->cur + txtlen )
                    ? buf->cur + txtlen+1
                    : buf->max + BUFFER_CHUNK_SIZE+1;

        buf->buffer = erealloc( buf->buffer, buf->max+1 );
    }

    memcpy( (void *) &(buf->buffer[buf->cur]), txt, txtlen );
    buf->cur += txtlen;
    buf->buffer[buf->cur] = '\0';  /* seems like a nice thing to do -- only used now in title check */
}




/*********************************************************************
*   Flush buffer - adds words to index, and properties
*
*   If the clear flag is set then the entire buffer is flushed.
*   Otherwise, everything up to the last *partial* word is flushed.
*   The last word is partial if there is no whitespace at the very end of the buffer.
*
*   This prevents some<b>long</b>word from being flushed as separate part words.
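*
*   A minimal, standalone sketch of the partial-word rule described above.
*   It is not the parser's code: split_at_last_word() is a hypothetical
*   helper that only reports how many leading bytes would be flushed when
*   "clear" is not set, and it ignores the large-buffer special case:
*
```c
#include <ctype.h>
#include <stddef.h>

// Hypothetical helper: number of leading bytes that are safe to flush,
// i.e. everything up to and including the last whitespace character.
// Returns 0 when the buffer holds only a single, possibly partial, word.
static size_t split_at_last_word(const char *buf, size_t len)
{
    while (len > 0 && !isspace((unsigned char)buf[len - 1]))
        len--;
    return len;
}
```
*
*   For "this is b" the helper returns 8, so "this is " would be flushed and
*   the trailing "b" kept until the rest of the word arrives.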
*
*********************************************************************/
static void flush_buffer( PARSE_DATA  *parse_data, int clear )
{
    CHAR_BUFFER *buf = &parse_data->text_buffer;
    SWISH       *sw = parse_data->sw;
    int         structure = get_structure( parse_data );
    int         orig_end  = buf->cur;
    char        save_char = '?';
    char        *c;

    /* anything to do? */
    if ( !buf->cur )
        return;

    /* look back for word boundary when "clear" is not set */

    if ( !clear && !isspace( (int)buf->buffer[buf->cur-1] ) )  // flush up to current word
    {
        while ( buf->cur > 0 && !isspace( (int)buf->buffer[buf->cur-1] ) )
            buf->cur--;

        if ( !buf->cur )  // then there's only a single word in the buffer
        {
            buf->cur = orig_end;
            if ( buf->cur < BUFFER_CHUNK_SIZE )  // should really look at indexf->header.maxwordlimit
                return;                          // but just trying to keep the buffer from growing too large
        }

        save_char =  buf->buffer[buf->cur];
    }


    /* Mark the end of the buffer - should switch over to using a length to avoid strlen */

    buf->buffer[buf->cur] = '\0';


    /* Make sure there are some non-whitespace chars to index */

    c = buf->buffer;
    while ( *c && isspace( (int)*c ) )
        c++;


    if ( *c )
    {
        /* Index the text */
        if ( !parse_data->meta_stack.ignore_flag )  // this really is wrong -- should not check ignore here.  Fix should be to use two buffers
            parse_data->total_words +=
                indexstring( sw, c, parse_data->filenum, structure, 0, NULL, &(parse_data->word_pos) );

        /* Add the properties */
        addDocProperties( parse_data->header, &(parse_data->thisFileEntry->docProperties), (unsigned char *)buf->buffer, buf->cur, parse_data->fprop->real_path );


        /* yuck - addDocProperties should do this.  Ok, add to summary, if active */
        {
            SUMMARY_INFO    *summary = &parse_data->summary;
            if ( summary->active )
                addDocProperty( &(parse_data->thisFileEntry->docProperties), summary->meta, (unsigned char *)buf->buffer, buf->cur, 0 );
        }
    }


    /* clear the buffer */

    if ( orig_end && orig_end > buf->cur )
    {
        buf->buffer[buf->cur] = save_char;  // put back the char where null was placed
        memmove( buf->buffer, &buf->buffer[buf->cur], orig_end - buf->cur );
        buf->cur = orig_end - buf->cur;
    }
    else
        buf->cur = 0;

}



/*********************************************************************
*   Comments
*
*   Should be able to call the char_hndl
*   Allows comments to enable/disable indexing a block by either:
*
*       <!-- noindex -->
*       <!-- index -->
*       <!-- SwishCommand noindex -->
*       <!-- SwishCommand index -->
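*
*   A minimal, standalone sketch of recognising the four forms listed above.
*   classify_comment() and enum comment_cmd are hypothetical names, and the
*   sketch uses POSIX strncasecmp() rather than swish-e's own string helpers:
*
```c
#include <ctype.h>
#include <string.h>
#include <strings.h>   // strncasecmp (POSIX)

enum comment_cmd { CMD_NONE, CMD_NOINDEX, CMD_INDEX };

// Classify the text found between the comment delimiters.
// An optional leading "SwishCommand" keyword is skipped; the remainder
// must be exactly "noindex" or "index" (case-insensitive), ignoring
// surrounding whitespace.
static enum comment_cmd classify_comment(const char *txt)
{
    static const char prefix[] = "SwishCommand";
    size_t len;

    while (isspace((unsigned char)*txt))
        txt++;

    if (strncasecmp(txt, prefix, sizeof prefix - 1) == 0) {
        txt += sizeof prefix - 1;
        while (isspace((unsigned char)*txt))
            txt++;
    }

    len = strlen(txt);
    while (len > 0 && isspace((unsigned char)txt[len - 1]))
        len--;

    if (len == 7 && strncasecmp(txt, "noindex", 7) == 0)
        return CMD_NOINDEX;
    if (len == 5 && strncasecmp(txt, "index", 5) == 0)
        return CMD_INDEX;
    return CMD_NONE;
}
```
*
*   In this sketch " noindex " and "SwishCommand index" map to CMD_NOINDEX
*   and CMD_INDEX; any other text is an ordinary comment (CMD_NONE).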
*
*
*
*   To Do:
*       Can't use DontBump with comments.  Might need a config variable for that.
*
*********************************************************************/
static void comment_hndl(void *data, const char *txt)
{
    PARSE_DATA  *parse_data = (PARSE_DATA *)data;
    SWISH       *sw = parse_data->sw;
    int         structure = get_structure( parse_data );
    char        *swishcmd;
    char        *comment_text = str_skip_ws( (char *)txt );
    int         found = 0;


    str_trim_ws( comment_text );
    if ( ! *comment_text )
        return;


    /* Strip off SwishCommand - might be for future use */
    if ( ( swishcmd = lstrstr( comment_text, "SwishCommand" )) && swishcmd == comment_text )
    {
        comment_text = str_skip_ws( comment_text + strlen( "SwishCommand" ) );
        found++;
    }

    if ( !strcasecmp( comment_text, "noindex" ) )
    {
        parse_data->swish_noindex++;
        return;
    }
    else if ( !strcasecmp( comment_text, "index" ) )
    {
        if ( parse_data->swish_noindex )
           parse_data->swish_noindex--;

        return;
    }


    if( found || !sw->indexComments )
        return;


    /* Bump position around comments - hard coded, always done to prevent phrase matching */
    parse_data->word_pos++;

    /* Index the text */
    parse_data->total_words +=
        indexstring( sw, comment_text, parse_data->filenum, structure | IN_COMMENTS, 0, NULL, &(parse_data->word_pos) );


    parse_data->word_pos++;

}



/*********************************************************************
*   check if a tag is an IgnoreTag
*
*   Note: this returns a pointer to the config set tag, so don't free it!
*
*
*********************************************************************/

static char *isIgnoreMetaName(SWISH * sw, char *tag)
{
    struct swline *tmplist = sw->ignoremetalist;

    if (!tmplist)
        return 0;

    while (tmplist)
    {
        if (strcmp(tag, tmplist->line) == 0)
            return tmplist->line;

        tmplist = tmplist->next;
    }

    return NULL;
}

/******************************************************************
*  Warning and Error Messages
*
******************************************************************/

static void error(void *data, const char *msg, ...)
{
    va_list args;
    PARSE_DATA *parse_data = (PARSE_DATA *)data;
    char str[1000];

    va_start(args, msg);
    vsnprintf(str, sizeof(str), msg, args );
    va_end(args);
    xmlParserError(parse_data->ctxt, "%s", str);  /* str may contain '%', so never use it as the format */
}

static void warning(void *data, const char *msg, ...)
{
    va_list args;
    PARSE_DATA *parse_data = (PARSE_DATA *)data;
    char str[1000];

    va_start(args, msg);
    vsnprintf(str, sizeof(str), msg, args );
    va_end(args);
    xmlParserWarning(parse_data->ctxt, "%s", str);  /* str may contain '%', so never use it as the format */
}


/*********************************************************************
*   Index ALT text (the "alt" attribute)
*
*
*********************************************************************/
static void index_alt_tab( PARSE_DATA *parse_data, const char **attr )
{
    int  meta_append = 0;
    int  prop_append = 0;
    char *tagbuf     = parse_data->sw->IndexAltTagMeta;
    char *alt_text   = extract_html_links( parse_data, attr, NULL, "alt");


    if ( !alt_text )
        return;

    /* Index as regular text? */
    if ( !parse_data->sw->IndexAltTagMeta )
    {
        char_hndl( parse_data, alt_text, strlen( alt_text ) );
        return;
    }

    flush_buffer( parse_data, 1 );
    start_metaTag( parse_data, tagbuf, tagbuf, &meta_append, &prop_append, 0 );
    char_hndl( parse_data, alt_text, strlen( alt_text ) );
    end_metaTag( parse_data, tagbuf, 0 );
}




/*********************************************************************
*   Extract out links for indexing
*
*   Pass in a metaname, and the name of the attribute that holds the link text
*
*********************************************************************/

static char *extract_html_links( PARSE_DATA *parse_data, const char **attr, struct metaEntry *meta_entry, char *tag )
{
    char *href = NULL;
    int  i;
    int         structure = get_structure( parse_data );
    char       *absoluteURL;
    SWISH      *sw = parse_data->sw;


    if ( !attr )
        return NULL;

    for ( i = 0; attr[i] && attr[i+1]; i+=2 )
        if ( (strcmp( attr[i], tag ) == 0 ) && attr[i+1] )
            href = (char *)attr[i+1];

    if ( !href )
        return NULL;

    if ( !meta_entry ) /* The case for <BASE> */
        return href;


    /* Now, fixup the URL, if possible */

    if ( sw->AbsoluteLinks ) // ?? || parse_data->baseURL??? always fix up if a <BASE> tag?
    {
        char *base = parse_data->baseURL
                     ? parse_data->baseURL
                     : parse_data->fprop->real_path;

        absoluteURL = (char *)xmlBuildURI( (xmlChar *)href, (xmlChar *)base );
    }
    else
        absoluteURL = NULL;



    /* Index the text */
    parse_data->total_words +=
        indexstring( sw, absoluteURL ? absoluteURL : href, parse_data->filenum, structure, 1, &meta_entry->metaID, &(parse_data->word_pos) );

    if ( absoluteURL )
        xmlFree( absoluteURL );

    return href;
}



/* This doesn't look like the best method */

static void abort_parsing( PARSE_DATA *parse_data, int abort_code )
{
    parse_data->abort = abort_code;  /* Flag that we are all done */
    /* Disable parser */
    parse_data->SAXHandler->startElement   = (startElementSAXFunc)NULL;
    parse_data->SAXHandler->endElement     = (endElementSAXFunc)NULL;
    parse_data->SAXHandler->characters     = (charactersSAXFunc)NULL;
}


/* This returns the current structure context as a bitmask (IN_FILE plus IN_HEAD, IN_BODY, etc) */

static int get_structure( PARSE_DATA *parse_data )
{
    int structure = IN_FILE;

    /* Set structure bits */
    if ( parse_data->parsing_html )
    {
        int i;
        for ( i = 0; i <= STRUCTURE_END; i++ )
            if ( parse_data->structure[i] )
                structure |= ( 1 << i );
    }
    return structure;
}

/*********************************************************************
*   Push a meta entry onto the stack
*
*   Call With:
*       stack   = which stack to use
*       tag     = Element (tag name) to be used to match end tag
*       meta    = metaEntry to save
*       append  = append to current if one (will be incremented)
*       ignore  = if true, then flag as an ignore block and bump ignore counter
*
*   Returns:
*       void
*
*   ToDo:
*       move to Mem_Zone?
*
*
*********************************************************************/

static void push_stack( MetaStack *stack, char *tag, struct metaEntry *meta, int *append, int ignore )
{
    MetaStackElementPtr    node;


    if ( DEBUG_MASK & DEBUG_PARSED_TAGS )
    {
        int i;
        for (i=0; i<stack->pointer; i++)
            printf("    ");

        printf("<%s> (%s [%s]%s)\n", tag, stack->is_meta ? "meta" : "property", !meta ? "no meta name defined" : meta->metaName, ignore ? " *Start Ignore*" : ""  );
    }


    /* Create a new node ( MetaStackElement already has one byte allocated for string ) */
    node = (MetaStackElementPtr) emalloc( sizeof( MetaStackElement ) + strlen( tag ) );
    node->next = NULL;

    /* Turn on the meta */
    if ( (node->meta = meta) )
        meta->in_tag++;

    if ( ( node->ignore = ignore ) )  /* entering a block to ignore */
        stack->ignore_flag++;


    strcpy( node->tag, tag );




    if ( !(*append)++ )
    {
        /* reallocate stack buffer if needed */
        if ( stack->pointer >= stack->maxsize )
        {
            progwarn("swish parser adding more stack space for tag %s. from %d to %d", tag, stack->maxsize, stack->maxsize+STACK_SIZE );

            stack->maxsize += STACK_SIZE;
            stack->stack = (MetaStackElementPtr *)erealloc( stack->stack, sizeof( MetaStackElementPtr ) * stack->maxsize );
        }

        stack->stack[stack->pointer++] = node;
    }
    else // prepend to the list
    {
        if ( !stack->pointer )
            progerr("Tried to append tag %s to stack, but stack is empty", tag );

        node->next = stack->stack[stack->pointer - 1];
        stack->stack[stack->pointer - 1] = node;
    }
}

/*********************************************************************
*   Pop the stack if the tag matches the last entry
*   Will turn off all metas associated with this tag level
*
*   Call With:
*       parse_data = to automatically flush
*       stack   = which stack to use
*       tag     = Element (tag name) to be used for removal
*
*   Returns:
*       true if tag matched
*
*********************************************************************/

static int pop_stack_ifMatch( PARSE_DATA *parse_data, MetaStack *stack, char *tag )
{

    /* return if stack is empty */
    if ( !stack->pointer )
        return 0;



    /* return if doesn't match the tag at the top of the stack */

    if ( strcmp( stack->stack[stack->pointer - 1]->tag, tag ) != 0 )
        return 0;


    flush_buffer( parse_data, 1 );
    pop_stack( stack );

    return 1;
}

/*********************************************************************
*   Pop the stack
*   Will turn off all metas associated with this tag level
*
*   Call With:
*       stack   = which stack to use
*
*   Returns:
*       the stack pointer
*
*********************************************************************/

static int pop_stack( MetaStack *stack )
{
    MetaStackElementPtr    node, this;


    /* return if stack is empty */
    if ( !stack->pointer )
        return 0;

    node =  stack->stack[--stack->pointer];

    /* Now pop the stack. */

    // Note that some end tags can pop more than one tag:
    // <foo class="bar"> can start two metanames, <foo> and <foo.bar>, and </foo> pops both.

    while ( node )
    {
        this = node;

        if ( node->meta )
            node->meta->in_tag--;

        if ( node->ignore )
            stack->ignore_flag--;


        if ( DEBUG_MASK & DEBUG_PARSED_TAGS )
        {
            int i;
            for (i=0; i<stack->pointer; i++)
                printf("    ");

            printf("</%s> (%s)%s\n", node->tag, stack->is_meta ? "meta" : "property", node->ignore ? " end ignore" : "" );
        }


        node = node->next;
        efree( this );
    }

    return stack->pointer;
}

static int debug_get_indent( INDEXDATAHEADER *header )
{
    int i;
    int indent = 0;

    for (i = 0; i < header->metaCounter; i++)
        if ( is_meta_index(header->metaEntryArray[i]) )
            indent += header->metaEntryArray[i]->in_tag;

    return indent;
}



static void debug_show_tag( char *tag, PARSE_DATA *parse_data, int start, char *message )
{
    int  indent = debug_get_indent( &parse_data->sw->indexlist->header);
    int  i;

    for (i=0; i<indent; i++)
        printf("    ");

    printf("<%s%s> %s\n", start ? "" : "/", tag, message );
}

static void debug_show_parsed_text( PARSE_DATA *parse_data, char *txt, int len )
{
    int indent = debug_get_indent( &parse_data->sw->indexlist->header);
    int i;
    char indent_buf[1000];
    int  last_newline = 0;
    int  col = 0;


    indent_buf[0] = '\0';

    for (i=0; i<indent && strlen(indent_buf)<900; i++)
        strcat( indent_buf, "    ");


    i = 0;
    while ( i < len )
    {
        printf("%s", indent_buf );
        col = 0;
        last_newline = 0;

        /* skip leading space */
        while ( i < len && isspace((int)txt[i] ) )
            i++;

        /* print text */
        while ( i < len )
        {
            col++;


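            if ( txt[i] == '\n' )
            {
                while ( i < len && isspace((int)txt[i] ))
                    i++;

                /* Nothing left after the trailing whitespace */
                if ( i >= len )
                    break;
            }

            if ( !isprint((int)txt[i] ))
            {
                i++;
                continue;
            }

            printf("%c", txt[i] );
            i++;

            /* Only peek at txt[i] while it is still inside the text */
            if ( (col + strlen( indent_buf ) > 60 && i < len && isspace((int)txt[i])) || col + strlen( indent_buf ) > 78 )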
            {
                printf("\n");
                last_newline=1;
                break;
            }
        }
    }


    if ( !last_newline )
        printf("\n");
}

swish-e-2.4.7/src/httpserver.h
/* httpserver.h
$Id: httpserver.h 1736 2005-05-12 15:41:22Z karman $


    This file is part of Swish-e.

    Swish-e is free software; you can redistribute it and/or modify
    it under the terms of the GNU General Public License as published by
    the Free Software Foundation; either version 2 of the License, or
    (at your option) any later version.

    Swish-e is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
    GNU General Public License for more details.

    You should have received a copy of the GNU General Public License
    along  with Swish-e; if not, write to the Free Software
    Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
    
    See the COPYING file that accompanies the Swish-e distribution for details
    of the GNU GPL and the special exception available for linking against
    the Swish-e library.
    
** Mon May  9 18:19:34 CDT 2005
** added GPL


**/

#ifndef __HTTPSERVER_H
#define __HTTPSERVER_H

typedef struct httpserverinfo {
    char *baseurl;
    
    time_t lastretrieval;
    
    char *useragent;
    struct robotrules *robotrules;
    
    struct httpserverinfo *next;
} httpserverinfo;

typedef struct robotrules {
    char *disallow;
    struct robotrules *next;
} robotrules;



httpserverinfo *getserverinfo (SWISH *sw, char *url);
int urldisallowed (SWISH *sw, char *url);
int equivalentserver (SWISH *sw, char *url, char *baseurl);


#endif

swish-e-2.4.7/src/docprop.c
/*
$Id: docprop.c 2291 2009-03-31 01:56:00Z karpet $
**

    This file is part of Swish-e.

    Swish-e is free software; you can redistribute it and/or modify
    it under the terms of the GNU General Public License as published by
    the Free Software Foundation; either version 2 of the License, or
    (at your option) any later version.

    Swish-e is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
    GNU General Public License for more details.

    You should have received a copy of the GNU General Public License
    along  with Swish-e; if not, write to the Free Software
    Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
    
    See the COPYING file that accompanies the Swish-e distribution for details
    of the GNU GPL and the special exception available for linking against
    the Swish-e library.
    
** Mon May  9 15:51:39 CDT 2005
** added GPL

*********************************************************************************


** Functions to manage the index's Document Properties
**
** File Created.
** Mark Gaulin 11/24/98
**
** change sprintf to snprintf to avoid corruption,
** and use MAXSTRLEN from swish.h instead of literal "200"
** SRE 11/17/99
**
** 04/00 Jose Ruiz
** storeDocProperties and readNextDocPropEntry modified to store
** the int numbers compressed. This makes integers "portable"
**
** 04/00 Jose Ruiz
** Added sorting results by property
**
** 07/00 and 08/00 - Jose Ruiz
** Many modifications to make all functions thread safe
**
** 08/00 - Added ascending and descending capabilities in results sorting
**
** 2001-01    rasc    getResultPropertyByName rewritten, datatypes for properties.
** 2001-02    rasc    isAutoProperty
**                    printSearchResultProperties changed
** 2001-03-15 rasc    Outputdelimiter var name changed
** 2001-06-08 wsm     Store propValue at end of docPropertyEntry to save memory
** 2001-06-14 moseley Most of the code rewritten, and propfile added
**
** 2001-09 jmruiz - ReadAllDocPropertiesFromDisk rewritten to be used
**                  by merge.c
**
*
* Doc Properties:
*
*   Do not expect the properties to be null terminated.  Each property has
*   an associated proplen.
*
*   When indexing:
*       parser.c:flush_buffer calls addDocProperties()
*
*       addDocProperties() scans all metanames looking for the flag that
*           indicates the text is inside that tag, and if so calls addDocProperty()
*
*       addDocProperty() then will either add or create a new property containing
*           the text passed in.  If it's new (which is normally the case) then
*           CreateProperty() is called.  That returns the actual property structure.
*           addDocProperty() will create or extend the list of properties connected
*           to the file, as needed.
*
*       After parsing is complete (all words indexed), index.c calls:
*
*       WritePropertiesToDisk() takes the list of properties assigned to the file (FileRec
*           struct), compresses each property with zlib, and then calls
*           DB_WriteProperty(), which writes the property to disk.
*
*       See db_native.c for how the data is written to disk.
*
*
*
*   Reading properties:
*       There are a few ways props are read.  First the APIs:
*
*       SWISH::API provides SwishProperty()
*           which calls a low-level function getResultPropValue() directly so
*           that integers and strings can be created.  There's also
*           SwishResultPropertyStr() which always returns a string.
*
*       The C library also provides two functions:
*           SwishResultPropertyStr() (same as above) turns all props to strings.
*           SwishResultPropertyULong() allows numeric access to, well, numeric
*           props.  You need to know ahead of time if the property is numeric/date
*           before calling the ULong() function.  The ULong() function uses
*           the getResultPropValue() (below) to get at the raw property.
*
*       SwishResultPropertyStr() calls  getDocProperty() which is a general interface
*       to the properties:
*
*       getDocProperty() will return "internal" properties like rank, or
*           read the disk for properties stored at indexing time.  For those
*           "internal" properties getDocProperty() calls CreateProperty()
*           directly.  CreateProperty() converts numbers or strings into a
*           "propEntry" structure.
*
*           For on-disk properties getDocProperty() calls
*           ReadSingleDocPropertiesFromDisk() which calls DB_ReadProperty() to
*           read the raw data from disk, uncompresses if needed and then calls
*           CreateProperty() to convert the data into a real property.
*
*       As noted above SWISH::API uses getResultPropValue() which calls
*       getDocProperty() but returns a slightly different structure "PropValue"
*       which is a union to hold the different types of properties in their raw form.
*       This just allows the same return value to be used for both strings and integers.
*
*
*   Properties created using CreateProperty() (and the functions that call it) need
*   to be freed by calling freeProperty().
*
*   SwishResultPropertyStr() is a little odd in that it caches strings in the
*   db_results structure, one slot per property.  This is done so the caller
*   doesn't have to worry about freeing the string.  Each call frees the string
*   previously cached for that property, so users of this function need to copy
*   the strings locally as soon as possible.  For example, this is not safe when
*   both results come from the same index (the same db_results):
*
*       char *propA = SwishResultPropertyStr( resultA, "foo" );
*       char *propB = SwishResultPropertyStr( resultB, "foo" );
*       printf("The two props are %s and %s\n", propA, propB);  // Won't work!
*
*   The second call frees the string cached for prop "foo", so propA is left
*   pointing at freed memory.
*
*   Properties are read by swish in a number of places:
*
*   pre_sort.c calls ReadSingleDocPropertiesFromDisk() to build a sort array.
*   It's only sorting on-disk properties.
*
*   result_sort.c calls getDocProperty() because it can be sorting both on-disk
*   and "internal" properties.
*
*   proplimit.c also calls ReadSingleDocPropertiesFromDisk(), if needed.
*
*   merge.c calls ReadAllDocPropertiesFromDisk(), which is a wrapper around the
*   ReadSingleDocPropertiesFromDisk() call.  Merge needs all the props loaded.
*
*
*
*/

#include <limits.h>     // for ULONG_MAX
#include "swish.h"
#include "swstring.h"
#include "file.h"
#include "hash.h"
#include "mem.h"
#include "merge.h"
#include "error.h"
#include "search.h"
#include "index.h"
#include "docprop.h"
#include "error.h"
#include "compress.h"
#include "metanames.h"
#include "result_output.h"
#include "result_sort.h"
#include "entities.h"
#include "db.h"
#include "rank.h"
#ifdef HAVE_ZLIB
#include <zlib.h>
#endif

/*******************************************************************
*   Convert a propValue to an unsigned long
*
********************************************************************/
union _conv_ {
      char            c[sizeof(unsigned long)];
      unsigned long   l;
};
unsigned long convPropValue2ULong(unsigned char *propValue)
{
      union _conv_ u;
      memcpy(u.c, propValue, sizeof(unsigned long));
      return u.l;
}
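
/*******************************************************************
*   Example (illustrative sketch only, not part of swish-e; kept under
*   "#if 0" so it is never compiled in).
*
*   convPropValue2ULong() copies the raw bytes with memcpy() instead of
*   casting the pointer, so an unaligned propValue is safe to read.  The
*   sketch below shows that idea next to a hypothetical byte-order-independent
*   pack/unpack pair.  The names pack_ulong_be(), unpack_ulong_be() and
*   read_raw_ulong() are made up for this example, and the real byte order
*   used by PACKLONG/UNPACKLONG is not shown in this file, so big-endian
*   here is an assumption.
********************************************************************/
#if 0
```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for PACKLONG: most significant byte first. */
static void pack_ulong_be(unsigned long value, unsigned char *out)
{
    size_t i;

    assert(out != NULL);

    for (i = 0; i < sizeof(unsigned long); i++)
        out[sizeof(unsigned long) - 1 - i] =
            (unsigned char)((value >> (8 * i)) & 0xFFu);
}

/* Hypothetical stand-in for UNPACKLONG: rebuild the value byte by byte. */
static unsigned long unpack_ulong_be(const unsigned char *in)
{
    unsigned long value = 0;
    size_t i;

    assert(in != NULL);

    for (i = 0; i < sizeof(unsigned long); i++)
        value = (value << 8) | in[i];

    return value;
}

/* Same approach as convPropValue2ULong(): copy the bytes, never cast
 * an arbitrary byte pointer to (unsigned long *). */
static unsigned long read_raw_ulong(const unsigned char *src)
{
    unsigned long value;

    assert(src != NULL);
    memcpy(&value, src, sizeof value);
    return value;
}
```
#endif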


/*******************************************************************
*   Free a property entry
*
********************************************************************/

void freeProperty( propEntry *prop )
{
    if ( prop )
        efree(prop);
}




/*******************************************************************
*   Free all properties in the docProperties structure
*
********************************************************************/


void freeDocProperties(docProperties *docProperties)
{
    int i;

    for( i = 0; i < docProperties->n; i++ )
    {
        freeProperty( docProperties->propEntry[i] );
        docProperties->propEntry[i] = NULL;
    }

    efree(docProperties);
    docProperties = NULL;

}


/*******************************************************************
*   Frees a FileRec (struct file), which just frees
*   the properties and property index
*   Doesn't free the FileRec itself.
*
*   moved here from swish.c since all FileRec really holds is property info
*
********************************************************************/

void freefileinfo(FileRec *fi)
{

    if ( fi->docProperties )
    {
        freeDocProperties( fi->docProperties );
        fi->docProperties = NULL;
    }

    if ( fi->prop_index )
    {
        efree( fi->prop_index );
        fi->prop_index = NULL;
    }
}

/*******************************************************************
*   Converts a property into a string, based on its type.
*   Numbers are formatted as plain decimal ("%lu"), not zero filled.
*
*   Call with:
*       *metaEntry
*       *propEntry
*
*   Returns:
*       emalloc's a new string.  Caller must call efree().
*       if passed in prop is null then returns empty string.
*
*
********************************************************************/

char *DecodeDocProperty( struct metaEntry *meta_entry, propEntry *prop )
{
    char *s;
    unsigned long i;

    if ( !meta_entry )
        progerr("DecodeDocProperty passed NULL meta_entry");


    if ( !prop )
        return estrdup("");


    if ( is_meta_string(meta_entry) )      /* check for ascii/string data */
        return (char *)bin2string(prop->propValue,prop->propLen);


    if ( is_meta_date(meta_entry) )
    {
        time_t t;

        s=emalloc(30);
        /* i = *(unsigned long *) prop->propValue; */  /* read binary */
        i = convPropValue2ULong(prop->propValue); /* read binary */
        i = UNPACKLONG(i);     /* Convert the portable number */
        t = (time_t)i;         /* convert by value: time_t need not be the same size as unsigned long */
        strftime(s,30, DATE_FORMAT_STRING, localtime(&t));
        return s;
    }



    if ( is_meta_number(meta_entry) )
    {
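        s=emalloc( sizeof(unsigned long) * 3 + 1 );  /* room for any unsigned long in decimal plus the null (14 is too small for 64-bit longs) */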
        /* i=*(unsigned long *)prop->propValue; */ /* read binary */
        i = convPropValue2ULong(prop->propValue); /* read binary */
        i = UNPACKLONG(i);     /* Convert the portable number */
        sprintf(s,"%lu",i);
        return s;
    }

    progwarn("Invalid property type for property '%s'\n", meta_entry->metaName );
    return estrdup("");
}
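
/*******************************************************************
*   Example (illustrative sketch only, not part of swish-e; kept under
*   "#if 0" so it is never compiled in).
*
*   DecodeDocProperty() turns an unpacked unsigned long into either a
*   decimal string or a formatted date.  The sketch below shows one way to
*   size the number buffer and to hand the value to the time functions.
*   format_ulong(), format_date() and ULONG_STR_MAX are hypothetical names,
*   and gmtime() is used instead of localtime() only so the output does not
*   depend on the time zone.
********************************************************************/
#if 0
```c
#include <assert.h>
#include <stdio.h>
#include <time.h>

/* Three decimal digits per byte is a safe upper bound, plus the null. */
#define ULONG_STR_MAX (sizeof(unsigned long) * 3 + 1)

/* Writes value as decimal text; returns the number of characters written. */
static int format_ulong(unsigned long value, char *buf, size_t buflen)
{
    assert(buf != NULL && buflen >= ULONG_STR_MAX);
    return sprintf(buf, "%lu", value);
}

/* Formats seconds-since-epoch with strftime(); returns the length written,
 * or 0 (and an empty string) if the time cannot be converted. */
static size_t format_date(unsigned long seconds, const char *fmt,
                          char *buf, size_t buflen)
{
    time_t t = (time_t)seconds;   /* convert by value, not by pointer cast */
    struct tm *tm;

    assert(fmt != NULL && buf != NULL && buflen > 0);

    tm = gmtime(&t);
    if (!tm) {
        buf[0] = '\0';
        return 0;
    }

    return strftime(buf, buflen, fmt, tm);
}
```
#endif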

/*******************************************************************
*   Returns a property (really the head of the list)
*   for the specified property
*
*   Call with:
*       *RESULT
*       *metaEntry - pointer to related meta entry
*       metaID - OR, if metaEntry is NULL uses this to lookup metaEntry
*       max_size - limit size of property loaded
*
*   Returns:
*       *propEntry
*
*   Warning:
*       Only returns first property in list (which is the last property added)
*
*   Notes:
*       with PROPFILE, caller is expected to destroy the property
*
*
********************************************************************/

propEntry *getDocProperty( RESULT *result, struct metaEntry **meta_entry, int metaID, int max_size )
{
    IndexFILE *indexf = result->db_results->indexf;
    int     error_flag;
    unsigned long num;


    /* Grab the meta structure for this ID, unless one was passed in */

    if ( *meta_entry )
        metaID = (*meta_entry)->metaID;

    else if ( !(*meta_entry = getPropNameByID(&indexf->header, metaID )) )
        return NULL;


    /* This is a memory leak if not using PROPFILE */


    /* Some properties are generated during a search */
    if ( is_meta_internal( *meta_entry ) )
    {
        if ( is_meta_entry( *meta_entry, AUTOPROPERTY_RESULT_RANK ) )
        {
           /* return raw rank if flag set */
            if (result->db_results->results->sw->ReturnRawRank) {
                num = PACKLONG( result->rank );
                return CreateProperty( *meta_entry, (unsigned char *)&num, sizeof( num ), 1, &error_flag );
            }

            int scale_factor = result->db_results->results->rank_scale_factor;
            unsigned long rank_num;

            /* scale_factor is zero while sorting, so use the raw rank */
            /* otherwise, scale for display */

            if ( scale_factor )
            {
                rank_num = (unsigned long) (result->rank * scale_factor)/10000;

                if ( rank_num >= 999)
                    rank_num = 1000;
                else if ( rank_num < 1)
                    rank_num = 1;
            }
            else
                rank_num = result->rank;

            num = PACKLONG( rank_num );
            return CreateProperty( *meta_entry, (unsigned char *)&num, sizeof( num ), 1, &error_flag );
        }

        if ( is_meta_entry( *meta_entry, AUTOPROPERTY_REC_COUNT ) )
        {
            num = PACKLONG( (unsigned long)result->db_results->results->cur_rec_number );
            return CreateProperty( *meta_entry, (unsigned char *)&num, sizeof( num ), 1, &error_flag );
        }

        if ( is_meta_entry( *meta_entry, AUTOPROPERTY_FILENUM ) )
        {
            num = PACKLONG( (unsigned long)result->filenum );
            return CreateProperty( *meta_entry, (unsigned char *)&num, sizeof( num ), 1, &error_flag );
        }


        if ( is_meta_entry( *meta_entry, AUTOPROPERTY_INDEXFILE ) )
            return CreateProperty( *meta_entry, (unsigned char *)result->db_results->indexf->line, strlen( result->db_results->indexf->line ), 0, &error_flag );
    }


    return ReadSingleDocPropertiesFromDisk(indexf, &result->fi, metaID, max_size );
}


/*******************************************************************
*   Returns a string for the property ID supplied
*   Numbers are formatted as plain decimal (not zero filled)
*
*   Call with:
*       *RESULT
*       metaID
*
*   Returns:
*       emalloc's a new string.  Caller must call efree().
*
*   Bugs:
*       Only returns first property in list (which is the last property)
*       This function is called by dump.c and by result_output.c to
*       display the old -p style property listings.
*
********************************************************************/

char *getResultPropAsString(RESULT *result, int ID)
{
    char *s = NULL;
    propEntry *prop;
    struct metaEntry *meta_entry = NULL;


    if( !result )
        return estrdup("");  // when would this happen?



    if ( !(prop = getDocProperty(result, &meta_entry, ID, 0 )) )
        return estrdup("");

    /* $$$ Ignores possible other properties that are linked to this one */
    s = DecodeDocProperty( meta_entry, prop );

    freeProperty( prop );

    return s;
}

/*******************************************************************
*   SwishResultPropertyStr - Returns a string for the property *name* supplied
*   Numbers are formatted as plain decimal (not zero filled)
*
*   ** Library interface call **
*
*   Call with:
*       *RESULT
*       char * property name
*
*   Returns:
*       A string.  The caller must not free it: the string is cached per
*       property and is freed by the next call for the same property.
*
*
********************************************************************/

char *SwishResultPropertyStr(RESULT *result, char *pname)
{
    char                *s = NULL;
    propEntry           *prop;
    struct metaEntry    *meta_entry = NULL;
    IndexFILE           *indexf;
    DB_RESULTS          *db_results;

    if( !result )
        progerr("SwishResultPropertyStr was called with a NULL result");


    db_results = result->db_results;
    indexf = result->db_results->indexf;


    /* Ok property name? */

    if ( !(meta_entry = getPropNameByName( &indexf->header, pname )) )
    {
        set_progerr(UNKNOWN_PROPERTY_NAME_IN_SEARCH_DISPLAY, indexf->sw, "Invalid property name '%s'", pname );
        return "(null)";
    }

    /* reset error level */
    result->db_results->indexf->sw->lasterror = 0;




    /* Does this result have this property? */

    if ( !(prop = getDocProperty(result, &meta_entry, 0, 0 )) )
        return "";

    s = DecodeDocProperty( meta_entry, prop );

    freeProperty( prop );

    if ( !*s )  /* blank string? */
    {
        efree( s );
        return "";
    }


    if ( ! db_results->prop_string_cache )
    {
        db_results->prop_string_cache = (char **)emalloc( indexf->header.metaCounter * sizeof( char *) );
        memset( db_results->prop_string_cache, 0, indexf->header.metaCounter * sizeof( char *) );
    }

    /* Free previous, if needed  -- note the metaIDs start at one */

    else if ( db_results->prop_string_cache[ meta_entry->metaID-1 ] )
        efree( db_results->prop_string_cache[ meta_entry->metaID-1 ] );

    db_results->prop_string_cache[ meta_entry->metaID-1 ] = s;
    return s;
}
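
/*******************************************************************
*   Example (illustrative sketch only, not part of swish-e; kept under
*   "#if 0" so it is never compiled in).
*
*   SwishResultPropertyStr() keeps one cached string per property and frees
*   the old one on the next call.  The sketch below imitates that with a
*   hypothetical one-slot cache, cached_value(), and shows the caller-side
*   habit that makes it safe: copy the string (copy_now(), also hypothetical)
*   before asking for the next one.
********************************************************************/
#if 0
```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* One cache slot, standing in for prop_string_cache[ metaID-1 ]. */
static char *cache_slot = NULL;

/* Returns a pointer owned by the cache.  The previous pointer returned by
 * this function becomes invalid as soon as it is called again. */
static const char *cached_value(const char *value)
{
    char *copy;

    assert(value != NULL);

    copy = malloc(strlen(value) + 1);
    if (!copy)
        return "";

    strcpy(copy, value);
    free(cache_slot);          /* drop whatever the last call handed out */
    cache_slot = copy;
    return cache_slot;
}

/* Caller-side helper: take a private copy right away.  Caller frees it. */
static char *copy_now(const char *s)
{
    char *dup;

    assert(s != NULL);

    dup = malloc(strlen(s) + 1);
    if (dup)
        strcpy(dup, s);
    return dup;
}
```
#endif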




/*******************************************************************
*   SwishResultPropertyULong - Returns an unsigned long for the property *name* supplied
*
*   ** Library interface call **
*
*   Call with:
*       *RESULT
*       char * property name
*
*   Returns:
*       unsigned long
*       ULONG_MAX on error
*
*
********************************************************************/

unsigned long SwishResultPropertyULong(RESULT *result, char *pname)
{
    PropValue           *pv;
    unsigned long       value = ULONG_MAX;


    /* Fetch the property */
    pv = getResultPropValue (result, pname, 0 );

    if ( !pv )
        return ULONG_MAX;  /* bad property name */

    /* Make sure it's of the correct type */
    if ( (PROP_ULONG != pv->datatype) && (PROP_DATE != pv->datatype) )
    {
        if ( PROP_UNDEFINED != pv->datatype )
            set_progerr(INVALID_PROPERTY_TYPE, result->db_results->indexf->sw,
                    "Property '%s' is not numeric", pname );
        value = ULONG_MAX;
    }
    else
        value = pv->value.v_ulong;

    freeResultPropValue( pv );

    return value;
}



/*******************************************************************
*   Returns a property as a *propValue, which is a union of different
*   data types, with a flag to indicate the type
*   Can be called with either a metaname, or a metaID.
*
*   Call with:
*       *RESULT
*       *metaName -- String name of meta entry
*       metaID    -- OR - meta ID number
*
*       Note that the ID is not really used anyplace, but
*       could be used to save the prop->id lookup.
*
*   Returns:
*       pointer to a PropValue structure if found -- caller MUST free it
*       with freeResultPropValue().
*       Returns NULL if propertyName doesn't exist.
*       Jan 14, 2004:
*       Returns a PropValue of type PROP_UNDEFINED if the result has no value
*       for the property.
*       If returning NULL (i.e. bad property name) sets a swish-e error.
*       Caller is responsible for checking.
*
*   Note:
*       Feb 13, 2002 - now defined properties that just don't exist
*       for the document return a blank *string* even for numeric
*       and date properties.  This is to prevent "(NULL)" from displaying.
*       They used to return NULL, but since currently only result_output.c
*       uses this function, it's not a problem.
*
*
********************************************************************/

PropValue *getResultPropValue (RESULT *r, char *pname, int ID )
{
    PropValue *pv;
    struct metaEntry *meta_entry = NULL;
    propEntry *prop;

    /* Die on null result */
    if( !r )
        progerr("Called getResultPropValue with NULL result");

    /* Lookup by property name, if supplied */
    if ( pname )
        if ( !(meta_entry = getPropNameByName( &r->db_results->indexf->header, pname )) )
        {
            set_progerr(UNKNOWN_PROPERTY_NAME_IN_SEARCH_DISPLAY, r->db_results->indexf->sw,
                    "Invalid property name '%s'", pname );
            return NULL;
        }

    /* reset error level */
    r->db_results->indexf->sw->lasterror = 0;


    /* create a propvalue to return to caller */
    pv = (PropValue *) emalloc (sizeof (PropValue));
    pv->datatype = PROP_UNDEFINED;
    pv->destroy = 0;



    /* This will return false if the result does not have a value for this property */
    prop = getDocProperty( r, &meta_entry, ID, 0 );

    if ( !prop )
        return pv;  /* returning PROP_UNDEFINED */


    if ( is_meta_string(meta_entry) )      /* check for ascii/string data */
    {
        pv->datatype = PROP_STRING;
        pv->destroy++;       // caller must free this
        pv->value.v_str = (char *)bin2string(prop->propValue,prop->propLen);
        freeProperty( prop );
        return pv;
    }


    if ( is_meta_number(meta_entry) )
    {
        unsigned long i;
        /* i = *(unsigned long *) prop->propValue;*/  /* read binary */
        i = convPropValue2ULong(prop->propValue); /* read binary */
        i = UNPACKLONG(i);     /* Convert the portable number */
        pv->datatype = PROP_ULONG;
        pv->value.v_ulong = i;
        freeProperty( prop );
        return pv;
    }


    if ( is_meta_date(meta_entry) )
    {
        unsigned long i;
        /* i = *(unsigned long *) prop->propValue; */ /* read binary */
        i = convPropValue2ULong(prop->propValue); /* read binary */
        i = UNPACKLONG(i);     /* Convert the portable number */
        pv->datatype = PROP_DATE;
        pv->value.v_date = (time_t)i;
        freeProperty( prop );
        return pv;
    }



    /* If here, then it's an unknown property type and abort! */
    progerr("Swish-e database error.  Unknown property type '%d'", meta_entry->metaType );
    return NULL;  /* make compiler happy */

}

/*******************************************************************
*   Destroys a "pv" returned from getResultPropValue
*
*
********************************************************************/
void    freeResultPropValue(PropValue *pv)
{
    if ( !pv ) return;

    if ( pv->datatype == PROP_STRING && pv->destroy )
        efree( pv->value.v_str );

    efree(pv);
}
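
/*******************************************************************
*   Example (illustrative sketch only, not part of swish-e; kept under
*   "#if 0" so it is never compiled in).
*
*   getResultPropValue() returns a tagged union: a type flag, a "destroy"
*   flag, and a union holding the raw value.  The sketch below shows the
*   same shape with hypothetical names (Val, ValType, val_new_ulong(),
*   val_new_string(), val_free()).  The real PropValue definition lives in
*   a header that is not shown here, so the field layout is an assumption.
********************************************************************/
#if 0
```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

typedef enum { VAL_UNDEFINED, VAL_STRING, VAL_ULONG, VAL_DATE } ValType;

typedef struct {
    ValType type;          /* which union member is valid */
    int     owns_string;   /* non-zero if v_str must be freed with the Val */
    union {
        char          *v_str;
        unsigned long  v_ulong;
        time_t         v_date;
    } value;
} Val;

static Val *val_new_ulong(unsigned long n)
{
    Val *v = malloc(sizeof *v);

    if (!v)
        return NULL;

    v->type = VAL_ULONG;
    v->owns_string = 0;
    v->value.v_ulong = n;
    return v;
}

static Val *val_new_string(const char *s)
{
    Val *v;

    assert(s != NULL);

    v = malloc(sizeof *v);
    if (!v)
        return NULL;

    v->value.v_str = malloc(strlen(s) + 1);
    if (!v->value.v_str) {
        free(v);
        return NULL;
    }

    strcpy(v->value.v_str, s);
    v->type = VAL_STRING;
    v->owns_string = 1;
    return v;
}

/* Mirrors freeResultPropValue(): free the string only if we own it. */
static void val_free(Val *v)
{
    if (!v)
        return;

    if (v->type == VAL_STRING && v->owns_string)
        free(v->value.v_str);

    free(v);
}
```
#endif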

/*******************************************************************
*   Converts a string into a string for saving as a property
*   Which means will either return a duplicated string,
*   or a packed unsigned long.
*
*   Call with:
*       *metaEntry
*       **encodedStr (destination)
*       *string
*       *error_flag - integer to indicate the difference between an error and a blank property
*
*   Returns:
*       emalloc's a new string, stored in **encodedStr.  Caller must call efree().
*       length of encoded string, or zero if an error
*       (zero length strings are not for encoding anyway, I guess)
*
*   QUESTION: ???
*       should this return a *docproperty instead?
*       numbers are unsigned longs.  What if someone
*       wanted to store signed numbers?
*
*   ToDo:
*       What about converting entities here?
*
********************************************************************/
static int EncodeProperty( struct metaEntry *meta_entry, char **encodedStr, char *propstring, int *error_flag )
{
    unsigned long int num;
    char     *newstr;
    char     *badchar;
    char     *tmpnum;
    char     *string;


    string = propstring;

    *error_flag = 0;

    /* skip leading white space (check for NULL before dereferencing) */
    if ( string )
        while ( isspace( (unsigned char)*string ) )
            string++;

    if ( !string || !*string )
    {
        // progwarn("Null string passed to EncodeProperty for meta '%s'", meta_entry->metaName);
#ifdef BLANK_PROP_VALUE
        string = BLANK_PROP_VALUE;  // gets dup'ed below
#else
        return 0;
#endif
    }


    /* make a working copy */
    string = estrdup( string );

    /* remove trailing white space  */
    {
        int i = strlen( string );

        while ( i  && isspace( (unsigned char)string[i-1]) )
            string[--i] = '\0';
    }


    if (is_meta_number( meta_entry ) || is_meta_date( meta_entry ))
    {
        int j;

        num = strtoul( string, &badchar, 10 ); // would base zero be more flexible?

        if ( num == ULONG_MAX )
        {
            progwarnno("EncodeProperty - Attempted to convert '%s' to a number", string );
            efree(string);
            (*error_flag)++;
            return 0;
        }

        if ( *badchar ) /* strtoul() stopped at a character that is not a digit */
        {
            progwarn("EncodeProperty - Invalid char '%c' found in string '%s'", badchar[0], string);
            efree(string);
            (*error_flag)++;
            return 0;
        }

        /* allocate the output only after validation so the error returns above don't leak it */
        newstr = emalloc( sizeof( num ) + 1 );
        /* I'll bet there's an easier way */
        num = PACKLONG(num);
        tmpnum = (char *)&num;

        for ( j=0; j <= (int)sizeof(num)-1; j++ )
            newstr[j] = (unsigned char)tmpnum[j];

        newstr[ sizeof(num) ] = '\0';

        *encodedStr = newstr;

        efree(string);

        return (int)sizeof(num);
    }


    if ( is_meta_string(meta_entry) )
    {
        /* replace all non-printing chars with a space -- this is questionable */
        // yep, sure is questionable -- isprint() kills 8859-1 chars.

        if ( !is_meta_nostrip(meta_entry) )
        {
            char *source, *dest;
            dest = string;
            for( source = string; *source; source++ )
            {
                /* Used to replace (<=' ') even spaces with a single space */
                if ( (int)((unsigned char)*source) < (int)' ' )
                {
                    if ( dest > string && *(dest - 1) != ' ' )
                    {
                        *dest = ' ';
                        dest++;
                    }
                    continue;
                }

                *dest = *source;
                dest++;
            }
            *dest = '\0';
        }

        *encodedStr = string;
        return (int)strlen( string );
    }


    progwarn("EncodeProperty called but doesn't know the property type :(");
    efree(string);  /* the working copy made above */
    return 0;
}
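
/*******************************************************************
*   Example (illustrative sketch only, not part of swish-e; kept under
*   "#if 0" so it is never compiled in).
*
*   EncodeProperty() validates numeric input with strtoul() and its end
*   pointer.  The sketch below is a stricter variant of that check, not the
*   routine swish-e uses: it relies on errno for overflow (a stored value of
*   ULONG_MAX is legitimate on its own) and rejects a leading sign, since
*   strtoul() accepts "-1" and wraps it.  parse_ulong() is a hypothetical
*   name.
********************************************************************/
#if 0
```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

/* Returns 1 and stores the value in *out on success, 0 on any error. */
static int parse_ulong(const char *text, unsigned long *out)
{
    char *end;
    unsigned long value;

    assert(text != NULL && out != NULL);

    /* skip leading white space; the cast avoids undefined behaviour for
     * bytes above 127 when char is signed */
    while (isspace((unsigned char)*text))
        text++;

    /* rejects the empty string and a leading '+' or '-' */
    if (!isdigit((unsigned char)*text))
        return 0;

    errno = 0;
    value = strtoul(text, &end, 10);

    if (errno == ERANGE)        /* too large for unsigned long */
        return 0;

    /* allow trailing white space, nothing else */
    while (isspace((unsigned char)*end))
        end++;

    if (*end != '\0')
        return 0;

    *out = value;
    return 1;
}
```
#endif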

/*******************************************************************
*   Creates a document property
*
*   Call with:
*       *metaEntry
*       *propValue  - string to add
*       propLen     - length of string to add, but can be limited by metaEntry->max_len
*       preEncoded  - flag saying the data is already encoded
*                     (that's for filesize, last modified, start position)
*       *error_flag - integer to indicate the difference between an error and a blank property
*
*   Returns:
*       pointer to a newly created document property
*       NULL indicates property could not be created
*
*
********************************************************************/

propEntry *CreateProperty(struct metaEntry *meta_entry, unsigned char *propValue, int propLen, int preEncoded, int *error_flag )
{
    propEntry *docProp;


    /* convert string to a document property, if not already encoded */
    if ( !preEncoded )
    {
        char *tmp;

        propLen = EncodeProperty( meta_entry, &tmp, (char *)propValue, error_flag );

        if ( !propLen )  /* Error detected in encode */
            return NULL;

        /* Limit length */
        if ( is_meta_string(meta_entry) && meta_entry->max_len && propLen > meta_entry->max_len )
            propLen = meta_entry->max_len;

        propValue = (unsigned char *)tmp;
    }

    /* Now create the property $$ could be -1 */
    /* This is creating more memory than needed (4 extra bytes) */
    docProp=(propEntry *) emalloc(sizeof(propEntry) + propLen);

    memcpy(docProp->propValue, propValue, propLen);
    docProp->propLen = propLen;

    /* sizeof(propEntry) already contains space for part of propValue, so throw a null in */
    /* This *hack* should allow comparing the properties using strcoll, which expects a null */
    /* most places expect propValue to be exactly propLen long.  The null is not written to disk */
    docProp->propValue[propLen] = '\0';


    /* EncodeProperty creates a new string */
    if ( !preEncoded )
        efree( propValue );


    return docProp;
}
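
/*******************************************************************
*   Example (illustrative sketch only, not part of swish-e; kept under
*   "#if 0" so it is never compiled in).
*
*   CreateProperty() allocates the header and the value bytes in a single
*   block and appends a null that is not counted in propLen.  The sketch
*   below shows the same layout with a C99 flexible array member.  Record
*   and record_new() are hypothetical; the real propEntry definition is not
*   shown in this file and appears to use a fixed-size trailing array
*   instead.
********************************************************************/
#if 0
```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    unsigned int  len;       /* number of value bytes, excluding the null */
    unsigned char value[];   /* value bytes follow the header directly */
} Record;

/* Copies up to max_len bytes of data (0 means no limit) into a new Record.
 * Returns NULL if memory runs out.  Caller frees with free(). */
static Record *record_new(const unsigned char *data, unsigned int len,
                          unsigned int max_len)
{
    Record *r;

    assert(data != NULL);

    if (max_len && len > max_len)
        len = max_len;

    r = malloc(sizeof(Record) + len + 1);   /* +1 for the trailing null */
    if (!r)
        return NULL;

    memcpy(r->value, data, len);
    r->len = len;
    r->value[len] = '\0';   /* lets string functions read the value safely */
    return r;
}
```
#endif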

/*******************************************************************
*   Appends a string onto a current property
*
*   Call with:
*       *propEntry
*       *string
*       length of string
*
*   Will limit property length, if needed.
*
*******************************************************************/
propEntry *append_property( struct metaEntry *meta_entry, propEntry *p, char *txt, int length )
{
    int     newlen;
    int     add_a_space = 0;
    char   *str = NULL;
    int     error_flag = 0;

    length = EncodeProperty( meta_entry, &str, txt, &error_flag );

    if ( !length )
        return p;

    /* When appending, we separate by a space -- could be a config setting */
    if ( !isspace( (unsigned char)*str ) && !isspace( (unsigned char)p->propValue[p->propLen-1] ) )
        add_a_space++;


    /* Any room to add the property? */
    if ( meta_entry->max_len &&  (int)p->propLen + add_a_space >=  meta_entry->max_len )
    {
        if ( str )
            efree( str );

        return p;
    }


    newlen = p->propLen + length + add_a_space;

    /* limit length */
    if ( meta_entry->max_len && newlen >= meta_entry->max_len )
    {
        newlen = meta_entry->max_len;
        length = meta_entry->max_len - p->propLen - add_a_space;
    }


    /* Now reallocate the property */
    p = (propEntry *) erealloc(p, sizeof(propEntry) + newlen);

    if ( add_a_space )
        p->propValue[p->propLen++] = ' ';

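    memcpy( (void *)&(p->propValue[p->propLen]), str, length );
    p->propLen = newlen;

    /* keep the trailing null that CreateProperty() adds; strcoll() comparisons rely on it */
    p->propValue[newlen] = '\0';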

    if (str)
        efree(str);

    return p;
}
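
/*******************************************************************
*   Example (illustrative sketch only, not part of swish-e; kept under
*   "#if 0" so it is never compiled in).
*
*   append_property() joins new text onto an existing value, adds a single
*   space at the joint when neither side has white space there, and clips
*   the result to max_len.  The sketch below shows the same arithmetic on a
*   plain heap string.  append_limited() is a hypothetical helper, and a
*   max_len of 0 is assumed to mean "no limit", as in the code above.
********************************************************************/
#if 0
```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Appends add to the heap string *dst.  Returns the resulting length.
 * On allocation failure, or if there is no room, *dst is left unchanged. */
static size_t append_limited(char **dst, const char *add, size_t max_len)
{
    size_t cur, add_len, sep, new_len;
    char *grown;

    assert(dst != NULL && *dst != NULL && add != NULL);

    cur = strlen(*dst);
    add_len = strlen(add);
    if (add_len == 0)
        return cur;

    /* one separating space unless either side already has white space */
    sep = (cur > 0
           && !isspace((unsigned char)(*dst)[cur - 1])
           && !isspace((unsigned char)add[0])) ? 1 : 0;

    /* no room for even one new byte after the separator */
    if (max_len && cur + sep >= max_len)
        return cur;

    /* clip the appended text so the total never exceeds max_len */
    if (max_len && cur + sep + add_len > max_len)
        add_len = max_len - cur - sep;

    new_len = cur + sep + add_len;

    grown = realloc(*dst, new_len + 1);
    if (!grown)
        return cur;

    if (sep)
        grown[cur++] = ' ';

    memcpy(grown + cur, add, add_len);
    grown[new_len] = '\0';

    *dst = grown;
    return new_len;
}
```
#endif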


/*******************************************************************
*   Scans the properties (metaEntry's), and adds a doc property to any that are flagged
*   Limits size, if needed (for StoreDescription)
*   Pass in text properties (not pre-encoded binary properties)
*
*   Call with:
*       *INDEXDATAHEADER (to get to the list of metanames)
*       **docProperties - pointer to list of properties
*       *propValue  - string to add
*       propLen     - length of string to add
*       *filename   - name of the file being indexed (used in the warning)
*
*   Returns:
*       void, but will warn on failed properties
*
*
********************************************************************/
void addDocProperties( INDEXDATAHEADER *header, docProperties **docProperties, unsigned char *propValue, int propLen, char *filename )
{
    struct metaEntry *m;
    int     i;

    for ( i = 0; i < header->metaCounter; i++)
    {
        m = header->metaEntryArray[i];

        if ( (m->metaType & META_PROP) && m->in_tag )
            if ( !addDocProperty( docProperties, m, propValue, propLen, 0 ) )
                progwarn("Failed to add property '%s' in file '%s'", m->metaName, filename );
    }
}




/*******************************************************************
*   Adds a document property to the list of properties.
*   Creates or extends the list, as necessary
*
*   Call with:
*       **docProperties - pointer to list of properties
*       *metaEntry
*       *propValue  - string to add
*       *propLen    - length of string to add
*       preEncoded  - flag saying the data is already encoded
*                     (that's for filesize, last modified, start position)
*
*   Returns:
*       true if added property
*       sets address of **docProperties, if list changes size
*
*
********************************************************************/

int addDocProperty( docProperties **docProperties, struct metaEntry *meta_entry, unsigned char *propValue, int propLen, int preEncoded )
{
    struct docProperties *dp = *docProperties;
    propEntry *docProp;
    int i;
    int error_flag;


    /* Allocate or extend the property array, if needed */

    if( !dp )
    {
        dp = (struct docProperties *) emalloc(sizeof(struct docProperties) + (meta_entry->metaID + 1) * sizeof(propEntry *));
        *docProperties = dp;

        dp->n = meta_entry->metaID + 1;

        for( i = 0; i < dp->n; i++ )
            dp->propEntry[i] = NULL;
    }

    else /* reallocate if needed */
    {
        if( dp->n <= meta_entry->metaID )
        {
            dp = (struct docProperties *) erealloc(dp,sizeof(struct docProperties) + (meta_entry->metaID + 1) * sizeof(propEntry *));

            *docProperties = dp;
            for( i = dp->n; i <= meta_entry->metaID; i++ )
                dp->propEntry[i] = NULL;

            dp->n = meta_entry->metaID + 1;
        }
    }

    /* Un-encoded STRINGS get appended to existing properties */
    /* Others generate a warning */
    if ( dp->propEntry[meta_entry->metaID] )
    {
        if ( is_meta_string(meta_entry) )
        {
            dp->propEntry[meta_entry->metaID] = append_property( meta_entry, dp->propEntry[meta_entry->metaID], (char *)propValue, propLen );
            return 1;
        }
        else // Will this come back and bite me?
        {
            progwarn("Warning: Attempt to add duplicate property." );
            return 0;
        }
    }


    /* create the document property */
    /* Ignore some errors */

    if ( !(docProp = CreateProperty( meta_entry, propValue, propLen, preEncoded, &error_flag )) )
        return error_flag ? 0 : 1;

    dp->propEntry[meta_entry->metaID] = docProp;

    return 1;
}
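
/*******************************************************************
*   Example (illustrative sketch only, not part of swish-e; kept under
*   "#if 0" so it is never compiled in).
*
*   addDocProperty() stores properties in an array indexed by metaID and
*   grows it on demand, setting every new slot to NULL.  The sketch below
*   shows that growth pattern on its own.  SlotTable and slot_set() are
*   hypothetical, and unlike addDocProperty() the sketch simply refuses a
*   second value for an occupied slot instead of appending strings.
********************************************************************/
#if 0
```c
#include <assert.h>
#include <stdlib.h>

typedef struct {
    int    n;      /* number of slots currently allocated */
    void **slot;   /* slot[id] is NULL until something is stored */
} SlotTable;

/* Stores item at index id, growing the table if needed.
 * Returns 1 on success, 0 if the slot is taken or memory runs out. */
static int slot_set(SlotTable *t, int id, void *item)
{
    assert(t != NULL && id >= 0);

    if (id >= t->n) {
        int i;
        void **grown = realloc(t->slot, (size_t)(id + 1) * sizeof *grown);

        if (!grown)
            return 0;

        /* only the new tail needs clearing; old slots keep their values */
        for (i = t->n; i <= id; i++)
            grown[i] = NULL;

        t->slot = grown;
        t->n = id + 1;
    }

    if (t->slot[id])
        return 0;   /* duplicate */

    t->slot[id] = item;
    return 1;
}
```
#endif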

/* #define DEBUGPROP 1 */

#ifdef DEBUGPROP
static int insidecompare = 0;
#endif

/*******************************************************************
*   Compares two properties for sorting
*
*   Call with:
*       *metaEntry
*       *docPropertyEntry1
*       *docPropertyEntry2
*
*   Returns:
*       0 - two properties are the same
*      -1 - docPropertyEntry1 < docPropertyEntry2
*      +1 - docPropertyEntry1 > docPropertyEntry2
*
*
********************************************************************/
int Compare_Properties( struct metaEntry *meta_entry, propEntry *p1, propEntry *p2 )
{


#ifdef DEBUGPROP
    if ( !insidecompare++ )
    {
        printf("comparing properties for meta %s: returning: %d\n", meta_entry->metaName, Compare_Properties( meta_entry, p1, p2) );
        dump_single_property( p1, meta_entry );
        dump_single_property( p2, meta_entry );
        insidecompare = 0;
    }
#endif


    if ( !p1 && p2 )
        return -1;


    if ( !p1 && !p2 )
        return 0;

    if ( p1 && !p2 )
        return +1;


    if (is_meta_number( meta_entry ) || is_meta_date( meta_entry ))
        return memcmp( (const void *)p1->propValue, (const void *)p2->propValue, p1->propLen );


    if ( is_meta_string(meta_entry) )
    {
        int rc;
        int len = Min( p1->propLen, p2->propLen );

#ifdef HAVE_STRCOLL
        /* note that the propValue should be null terminated.  See CreateProperty()  */
        /* could use a strncoll() and one for case ignore */
#ifdef DEBUGPROP
        if ( is_meta_use_strcoll(meta_entry) )
            printf("using strcoll %s <=> %s = %d\n",  (const char *)p1->propValue, (const char *)p2->propValue,strcoll( (const char *)p1->propValue, (const char *)p2->propValue ));
#endif
        if ( is_meta_use_strcoll(meta_entry) )
            return strcoll( (const char *)p1->propValue, (const char *)p2->propValue );
#endif

        rc = is_meta_ignore_case( meta_entry)
             ? strncasecmp( (char *)p1->propValue, (char *)p2->propValue, len )
             : strncmp( (char *)p1->propValue, (char *)p2->propValue, len );

#ifdef DEBUGPROP
        printf("using %s  %s <=> %s = %d\n", (is_meta_ignore_case( meta_entry) ? "strncasecmp" : "strncmp"),  (const char *)p1->propValue, (const char *)p2->propValue, rc );
#endif


        if ( rc != 0 )
            return rc;

        return p1->propLen - p2->propLen;
    }

    return 0;  /* This should be an error here */

}
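
/*******************************************************************
*   Example (illustrative sketch only, not part of swish-e; kept under
*   "#if 0" so it is never compiled in).
*
*   For strings, Compare_Properties() compares the common prefix and then
*   breaks ties on length, with a missing property sorting before a present
*   one.  The sketch below shows that ordering with hypothetical names
*   (Blob, compare_blobs()).  It always returns exactly -1, 0 or +1 and
*   compares the lengths directly rather than subtracting them, which is a
*   choice made for this example.
********************************************************************/
#if 0
```c
#include <assert.h>
#include <string.h>

typedef struct {
    size_t               len;
    const unsigned char *bytes;
} Blob;

/* Three-way compare: NULL sorts first, then bytes, then length. */
static int compare_blobs(const Blob *a, const Blob *b)
{
    size_t n;
    int rc;

    if (!a || !b)
        return (a != NULL) - (b != NULL);

    assert(a->len == 0 || a->bytes != NULL);
    assert(b->len == 0 || b->bytes != NULL);

    n = a->len < b->len ? a->len : b->len;
    rc = n ? memcmp(a->bytes, b->bytes, n) : 0;

    if (rc != 0)
        return rc < 0 ? -1 : 1;

    /* equal prefix: the shorter value sorts first */
    return (a->len > b->len) - (a->len < b->len);
}
```
#endif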

/*******************************************************************
*   Duplicate a property that's already in memory and return it.
*
*   Caller must destroy
*
*********************************************************************/

static propEntry *duplicate_in_mem_property( docProperties *props, int metaID, int max_size )
{
    propEntry      *docProp;
    struct          metaEntry meta_entry;
    int             propLen;
    int             error_flag;

    if ( metaID >= props->n )
        return NULL;

    if ( !(docProp = props->propEntry[ metaID ]) )
        return NULL;


    meta_entry.metaName = "(default)";  /* for error message, I think */
    meta_entry.metaID   = metaID;


    /* Duplicate the property */
    propLen = docProp->propLen;

    /* Limit size, if possible - should really check if it's a string */
    if ( max_size && (max_size < propLen ))
        propLen = max_size;

    /* Duplicate the property */
    return CreateProperty( &meta_entry, docProp->propValue, propLen, 1, &error_flag );
}


#ifdef HAVE_ZLIB

/*******************************************************************
*   Allocate or reallocate the property buffer
*
*   The buffer is kept around to avoid reallocating for every prop of every doc
*
*
*
*********************************************************************/

unsigned char *allocatePropIOBuffer(SWISH *sw, unsigned long buf_needed )
{
    unsigned long total_size;

    if ( !buf_needed )
        progerr("Asked for too small of a buffer size!");


    if ( !sw->Prop_IO_Buf ||  buf_needed > sw->PropIO_allocated )
    {
        /* don't reallocate because we don't need to memcpy */
        if ( sw->Prop_IO_Buf )
            efree( sw->Prop_IO_Buf );


        total_size = buf_needed > sw->PropIO_allocated + RD_BUFFER_SIZE
                    ? buf_needed
                    : sw->PropIO_allocated + RD_BUFFER_SIZE;

        sw->Prop_IO_Buf = emalloc( total_size );
        sw->PropIO_allocated = total_size;  /* keep track of structure size */
    }


    return sw->Prop_IO_Buf;
}

#endif
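
/*
 * Illustrative sketch only, not part of Swish-e.  The buffer policy used by
 * allocatePropIOBuffer() (keep one scratch buffer, grow it with some slack,
 * and free/allocate rather than reallocate because the old contents are not
 * needed) can be shown with plain malloc()/free().  The struct, the function
 * name and SCRATCH_SLACK are assumptions made for this example.

```c
#include <assert.h>
#include <stdlib.h>

#define SCRATCH_SLACK 1024  // hypothetical stand-in for RD_BUFFER_SIZE

struct scratch {
    unsigned char *buf;     // NULL until first use
    size_t         allocated;
};

// Return a buffer of at least `needed` bytes, reusing the old one when it
// is already large enough. Old contents are NOT preserved on growth.
static unsigned char *scratch_reserve(struct scratch *s, size_t needed)
{
    size_t grown;

    assert(s != NULL && needed > 0);

    if (s->buf != NULL && needed <= s->allocated)
        return s->buf;

    // free + malloc instead of realloc: nothing needs to be copied over
    free(s->buf);

    grown = s->allocated + SCRATCH_SLACK;
    if (needed > grown)
        grown = needed;

    s->buf = malloc(grown);
    s->allocated = s->buf ? grown : 0;
    return s->buf;
}
```

 * Start from a zeroed struct scratch, call scratch_reserve() before each
 * read, and free() the buffer once when the owner is destroyed.
 */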


/*******************************************************************
*   Uncompress a Property
*
*   Call with:
*       SWISH
*       *input_buf          - buffer address
*       buf_len             - size of buffer
*       *uncompressed_size  - size of original prop, or zero if not compressed.
*
*   Returns:
*       buffer address of uncompressed property
*       uncompressed_size is set to length of buffer
*
*
*********************************************************************/

static unsigned char *uncompress_property( SWISH *sw, unsigned char *input_buf, int buf_len, int *uncompressed_size )
{

#ifndef HAVE_ZLIB

    if ( *uncompressed_size )
        progerr("The index was created with zlib compression.\n This version of swish was not compiled with zlib");

    *uncompressed_size = buf_len;
    return input_buf;

#else
    unsigned char   *PropBuf;
    int             zlib_status = 0;
    uLongf          buf_size = (uLongf)*uncompressed_size;



    if ( *uncompressed_size == 0 ) /* wasn't compressed */
    {
        *uncompressed_size = buf_len;
        return input_buf;
    }



    /* make sure we have enough space */

    PropBuf = allocatePropIOBuffer( sw, *uncompressed_size );


    zlib_status = uncompress(PropBuf, &buf_size, input_buf, buf_len );
    if ( zlib_status != Z_OK )
    {
        // $$$ make sure this works ok if returning null $$$
        progwarn("Failed to uncompress Property. zlib uncompress returned: %d.  uncompressed size: %lu buf_len: %d\n",
            zlib_status, (unsigned long)buf_size, buf_len );
        return NULL;
    }


    *uncompressed_size = (int)buf_size;


    return PropBuf;


#endif

}




/*******************************************************************
*   Reads a single doc property - this is used for sorting
*
*   Caller needs to destroy returned property
*
*   Call with:
*       indexf  - which index to read from
*       FileRec - which contains filenum (key part 1)
*       metaID  - which prop (key part 2)
*       max_size- to limit size of property
*
*   Returns:
*       *propEntry - caller *must* destroy
*
*
*********************************************************************/
propEntry *ReadSingleDocPropertiesFromDisk( IndexFILE *indexf, FileRec *fi, int metaID, int max_size )
{
    SWISH           *sw = indexf->sw;
    int             propLen;
    int             error_flag;
    struct          metaEntry meta_entry;
    unsigned char  *buf;
    int             buf_len;            /* size on disk */
    int             uncompressed_len;   /* size uncompressed */
    propEntry      *docProp;
    unsigned char  *propbuf;
    INDEXDATAHEADER *header = &indexf->header;
    int             count;
    int             propIDX;


    /* initialize the first time called */
    if ( header->property_count == 0 )
        init_property_list(header);

    if ( (count = header->property_count) <= 0)
        return NULL;


    /* Map the propID to an index number */
    propIDX = header->metaID_to_PropIDX[metaID];

    if ( propIDX < 0 )
        progerr("Mapped propID %d to invalid property index", metaID );


    /* limit size if requested and is a string property - hope this isn't too slow */

    if ( max_size )
    {
        struct metaEntry *m = getPropNameByID(header, metaID); /* might be better if the caller passes the metaEntry */

        /* Reset to zero if the property is not a string */
        if ( !is_meta_string( m ) )
            max_size = 0;
    }





    /* already loaded? -- if so, duplicate the property for the given length */
    /* This should only happen if ReadAllDocPropertiesFromDisk() was called, and only with db_native.c */

    if ( fi->docProperties )
        return duplicate_in_mem_property( fi->docProperties, metaID, max_size );


    /* Otherwise, read from disk */

    if ( !(buf = (unsigned char*)DB_ReadProperty( sw, indexf, fi, metaID, &buf_len, &uncompressed_len, indexf->DB )))
        return NULL;

    if ( !(propbuf = uncompress_property( sw, buf, buf_len, &uncompressed_len )) )
    {
        efree( buf );  /* don't leak the on-disk buffer when uncompress fails */
        return NULL;
    }

    propLen = uncompressed_len; /* just to be clear ;) */

    /* Limit size, if possible */
    if ( max_size && (max_size < propLen ))
        propLen = max_size;


    meta_entry.metaName = "(default)";  /* for error message, I think */
    meta_entry.metaID   = metaID;

    docProp = CreateProperty( &meta_entry, propbuf, propLen, 1, &error_flag );

    efree( buf );
    return docProp;
}



/*******************************************************************
*   Reads the doc properties from disk
*
*   Maybe should return void, and just set?
*   Or maybe should take a filenum, and instead take a position?
*
*   The original idea (and the way it was written) was to use the seek
*   position of the first property, and the total length of all properties
*   then read all the properties in one fread call.
*   The plan was to call it in result_output.c, so all the props would get loaded
*   in one shot.
*   That design probably has little effect on performance.  Now we just call
*   ReadSingleDocPropertiesFromDisk for each prop.
*
*   Now, this is really just a way to populate the fi->docProperties structure.
*
*   2001-09 jmruiz Modified to be used by merge.c
*********************************************************************/

docProperties *ReadAllDocPropertiesFromDisk( IndexFILE *indexf, int filenum )
{
    FileRec         fi;
    propEntry      *new_prop;
    int             count;
    struct          metaEntry meta_entry;
    docProperties   *docProperties=NULL;
    INDEXDATAHEADER *header = &indexf->header;
    int             propIDX;



    /* Get a place to cache the pointers */
    memset(&fi,0, sizeof( FileRec ));
    fi.filenum = filenum;


    meta_entry.metaName = "(default)";  /* for error message, I think */


    /* initialize the first time called */
    if ( header->property_count == 0 )
        init_property_list(header);

    if ( (count = header->property_count) <= 0)
        return NULL;


    for ( propIDX = 0; propIDX < count; propIDX++ )
    {
        meta_entry.metaID = header->propIDX_to_metaID[propIDX];

        new_prop = ReadSingleDocPropertiesFromDisk( indexf, &fi, meta_entry.metaID, 0);

        if ( !new_prop )
            continue;

        // would be better if we didn't need to create a new property just to free one
        // this routine is currently only used by merge and dump.c

        addDocProperty(&docProperties, &meta_entry, new_prop->propValue, new_prop->propLen, 1 );

        efree( new_prop );
    }

    /* Free the prop seek location cache */
    if ( fi.prop_index )
        efree( fi.prop_index );

    return docProperties;
}





swish-e-2.4.7/src/check.c
/*
** Copyright (C) 1995, 1996, 1997, 1998 Hewlett-Packard Company
** Originally by Kevin Hughes, kev@kevcom.com, 3/11/94
**
*

    This file is part of Swish-e.

    Swish-e is free software; you can redistribute it and/or modify
    it under the terms of the GNU General Public License as published by
    the Free Software Foundation; either version 2 of the License, or
    (at your option) any later version.

    Swish-e is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
    GNU General Public License for more details.

    You should have received a copy of the GNU General Public License
    along  with Swish-e; if not, write to the Free Software
    Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
    
    See the COPYING file that accompanies the Swish-e distribution for details
    of the GNU GPL and the special exception available for linking against
    the Swish-e library.
    
** Mon May  9 15:51:39 CDT 2005
** added GPL
** if it was re-written by rasc, is it still copyright HP?


**
** fixed non-int subscripting pointed out by "gcc -Wall"
** SRE 2/22/00
**
** 2001-03-08 rasc   rewritten and enhanced suffix routines
**
*/

#include "swish.h"
#include "check.h"
#include "hash.h"
#include "swstring.h"
#include "mem.h"

/* Check if a file with a particular suffix should be indexed
** according to the settings in the configuration file.
*/

/* Should a word be indexed? Consults the stopword hash list
** and checks if the word is of a reasonable length...
** If you have any good rules that can work with most languages,
** please let me know...
*/

int     isokword(SWISH *sw, char *word, IndexFILE *indexf)
{
    int     i,
            same,
            hasnumber,
            hasvowel,
            hascons,
            numberrow,
            vowelrow,
            consrow,
            wordlen;
    char    lastchar;

    if (word[0] == '\0')
        return 0;

    if ( is_word_in_hash_table( indexf->header.hashstoplist, word ) )
        return 0;

    wordlen = strlen(word);
    if ((wordlen < indexf->header.minwordlimit) || (wordlen > indexf->header.maxwordlimit))
        return 0;

    lastchar = '\0';
    same = 0;
    hasnumber = hasvowel = hascons = 0;
    numberrow = vowelrow = consrow = 0;

    for (i = 0; word[i] != '\0'; i++)
    {
        /* Max number of times a char can repeat in a word */
        if (word[i] == lastchar)
        {
            same++;
            if (same > IGNORESAME)
                return 0;
        }
        else
            same = 0;

        /* Max number of consecutive digits */
        if (isdigit((int) ( (unsigned char) word[i])))
        {
            hasnumber = 1;
            numberrow++;
            if (numberrow > IGNOREROWN)
                return 0;
            vowelrow = 0;
            consrow = 0;
        }

        /* maximum number of consecutive vowels a word can have */
        else if (isvowel(sw, word[i]))
        {
            hasvowel = 1;
            vowelrow++;
            if (vowelrow > IGNOREROWV)
                return 0;
            numberrow = 0;
            consrow = 0;
        }

        /* maximum number of consecutive consonants a word can have */
        else if (!ispunct((int) ( (unsigned char) word[i])))
        {
            hascons = 1;
            consrow++;
            if (consrow > IGNOREROWC)
                return 0;
            numberrow = 0;
            vowelrow = 0;
        }
        lastchar = word[i];
    }

    /* If IGNOREALLV is 1, words consisting only of vowels won't be indexed. */
    if (IGNOREALLV)
        if (hasvowel && !hascons)
            return 0;

    /* If IGNOREALLC is 1, words consisting only of consonants won't be indexed. */
    if (IGNOREALLC)
        if (hascons && !hasvowel)
            return 0;

    /* If IGNOREALLN is 1, words consisting only of digits won't be indexed. */
    if (IGNOREALLN)
        if (hasnumber && !hasvowel && !hascons)
            return 0;

    return 1;
}
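
/*
 * Illustrative sketch only, not part of Swish-e.  The run-length checks in
 * isokword() can be reduced to three counters, each reset whenever a
 * character of another class is seen.  The limits, the ASCII-only vowel test
 * and the function names below are assumptions for this example; the real
 * code uses IGNOREROWN/IGNOREROWV/IGNOREROWC and isvowel().

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

#define MAX_DIGIT_RUN 3   // hypothetical limits, not Swish-e's defaults
#define MAX_VOWEL_RUN 4
#define MAX_CONS_RUN  5

static int is_ascii_vowel(int c)
{
    return c != '\0' && strchr("aeiouAEIOU", c) != NULL;
}

// Return 1 when no run of digits, vowels or consonants in `word`
// exceeds its limit, 0 otherwise.
static int runs_look_reasonable(const char *word)
{
    int digits = 0, vowels = 0, cons = 0;
    const unsigned char *p;

    assert(word != NULL);

    for (p = (const unsigned char *)word; *p != '\0'; p++) {
        if (isdigit(*p)) {
            if (++digits > MAX_DIGIT_RUN) return 0;
            vowels = cons = 0;
        } else if (is_ascii_vowel(*p)) {
            if (++vowels > MAX_VOWEL_RUN) return 0;
            digits = cons = 0;
        } else if (!ispunct(*p)) {
            if (++cons > MAX_CONS_RUN) return 0;
            digits = vowels = 0;
        }
        // punctuation leaves every counter untouched, as in isokword()
    }
    return 1;
}
```

 * With these example limits "search" and "strengths" pass, while "12345"
 * is rejected for its run of five digits.
 */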


/*
  -- Determine document type by checking the file extension
  -- of the filename
  -- Return: doctype
  -- 2001-03-08 rasc   rewritten (optimize and match also
  --                   e.g. ".htm", ".htm.de" or ".html.gz")
*/

int     getdoctype(char *filename, struct IndexContents *indexcontents)
{
    struct swline *swl;
    char   *s,
           *fe;


    if (!indexcontents)
        return NODOCTYPE;

    /* basically do a right to left compare */
    fe = (filename + strlen(filename));
    while (indexcontents)
    {
        swl = indexcontents->patt;

        while (swl)
        {
            s = fe - strlen(swl->line);
            if (s >= filename)
            {                   /* no negative overflow! */
                if (!strcasecmp(swl->line, s))
                {
                    return indexcontents->DocType;
                }
            }
            swl = swl->next;
        }

        indexcontents = indexcontents->next;
    }

    return NODOCTYPE;
}
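
/*
 * Illustrative sketch only, not part of Swish-e.  getdoctype() matches a
 * pattern against the tail of the filename.  The same right-to-left test can
 * be written by comparing the two lengths first, so that no pointer before
 * the start of the filename is ever formed.  The function name is
 * hypothetical, and the case folding here is ASCII-only via tolower().

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

// Return 1 when `name` ends with `suffix`, ignoring ASCII case.
static int ends_with_nocase(const char *name, const char *suffix)
{
    size_t name_len, suffix_len, i;
    const char *tail;

    assert(name != NULL && suffix != NULL);

    name_len = strlen(name);
    suffix_len = strlen(suffix);

    // Compare lengths first so the tail pointer never precedes `name`.
    if (suffix_len > name_len)
        return 0;

    tail = name + (name_len - suffix_len);
    for (i = 0; i < suffix_len; i++) {
        if (tolower((unsigned char)tail[i]) != tolower((unsigned char)suffix[i]))
            return 0;
    }
    return 1;
}
```

 * Multi-part suffixes work the same way: "page.html.gz" matches ".html.gz"
 * because only the tail of the name is examined.
 */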





struct StoreDescription *hasdescription(int doctype, struct StoreDescription *sd)
{
    while (sd)
    {
        if (sd->DocType == doctype)
            return sd;
        sd = sd->next;
    }
    return NULL;
}
swish-e-2.4.7/src/extprog.h
/* 
extprog.h


    This file is part of Swish-e.

    Swish-e is free software; you can redistribute it and/or modify
    it under the terms of the GNU General Public License as published by
    the Free Software Foundation; either version 2 of the License, or
    (at your option) any later version.

    Swish-e is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
    GNU General Public License for more details.

    You should have received a copy of the GNU General Public License
    along  with Swish-e; if not, write to the Free Software
    Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
    
    See the COPYING file that accompanies the Swish-e distribution for details
    of the GNU GPL and the special exception available for linking against
    the Swish-e library.
    
** Mon May  9 18:19:34 CDT 2005
** added GPL


*/
void initModule_Prog (SWISH  *sw);
void freeModule_Prog (SWISH *sw);
int configModule_Prog (SWISH *sw, StringList *sl);



swish-e-2.4.7/src/double_metaphone.c
/*
$Id: double_metaphone.c 1736 2005-05-12 15:41:22Z karman $
**

    This file is part of Swish-e.

    Swish-e is free software; you can redistribute it and/or modify
    it under the terms of the GNU General Public License as published by
    the Free Software Foundation; either version 2 of the License, or
    (at your option) any later version.

    Swish-e is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
    GNU General Public License for more details.

    You should have received a copy of the GNU General Public License
    along  with Swish-e; if not, write to the Free Software
    Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
    
    See the COPYING file that accompanies the Swish-e distribution for details
    of the GNU GPL and the special exception available for linking against
    the Swish-e library.
    
** Mon May  9 15:51:39 CDT 2005
** added GPL


**
** August 20, 2002 moseley - first added to swish-e
**
** this is a very slightly modified version of the double_metaphone.c code
** from the Perl module Text::DoubleMetaphone by Maurice Aubrey, and based
** on the work of Lawrence Philips.
** See http://aspell.sourceforge.net/metaphone
**
** From the Text::DoubleMetaphone README file:

DESCRIPTION

  This module implements a "sounds like" algorithm developed
  by Lawrence Philips which he published in the June, 2000 issue
  of C/C++ Users Journal.  Double Metaphone is an improved
  version of Philips' original Metaphone algorithm.  

COPYRIGHT

  Copyright 2000, Maurice Aubrey <maurice@hevanet.com>. 
  All rights reserved.

  This code is based heavily on the C++ implementation by
  Lawrence Philips and incorporates several bug fixes courtesy
  of Kevin Atkinson <kevina@users.sourceforge.net>.

  This module is free software; you may redistribute it and/or
  modify it under the same terms as Perl itself.


** Mon May  9 16:19:47 CDT 2005
** "the same terms as Perl itself" means GPL if we want it. We do.
**

**
**
*/



#include <stdio.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>
#include <stdarg.h>
#include <assert.h>
#include "swish.h"  // $$$ yikes, sure brings in a lot
#include "double_metaphone.h"
#include "mem.h"

 
#define META_MALLOC(v,n,t) (v = (t*)emalloc(((n)*sizeof(t))))

#define META_REALLOC(v,n,t) (v = (t*)erealloc((v),((n)*sizeof(t))))

#define META_FREE(x) efree((x))
	 

metastring *
NewMetaString(char *init_str)
{
    metastring *s;
    char empty_string[] = "";

    META_MALLOC(s, 1, metastring);
    assert( s != NULL );

    if (init_str == NULL)
	init_str = empty_string;
    s->length  = strlen(init_str);
    /* preallocate a bit more for potential growth */
    s->bufsize = s->length + 7;

    META_MALLOC(s->str, s->bufsize, char);
    assert( s->str != NULL );
    
    strncpy(s->str, init_str, s->length + 1);
    s->free_string_on_destroy = 1;

    return s;
}


void
DestroyMetaString(metastring * s)
{
    if (s == NULL)
	return;

    if (s->free_string_on_destroy && (s->str != NULL))
	META_FREE(s->str);

    META_FREE(s);
}


void
IncreaseBuffer(metastring * s, int chars_needed)
{
    META_REALLOC(s->str, (s->bufsize + chars_needed + 10), char);
    assert( s->str != NULL );
    s->bufsize = s->bufsize + chars_needed + 10;
}


void
MakeUpper(metastring * s)
{
    char *i;

    for (i = s->str; *i; i++)
      {
	  *i = toupper((unsigned char) *i);
      }
}


int
IsVowel(metastring * s, int pos)
{
    char c;

    if ((pos < 0) || (pos >= s->length))
	return 0;

    c = *(s->str + pos);
    if ((c == 'A') || (c == 'E') || (c == 'I') || (c =='O') || 
        (c =='U')  || (c == 'Y'))
	return 1;

    return 0;
}


int
SlavoGermanic(metastring * s)
{
    if ((char *) strstr(s->str, "W"))
	return 1;
    else if ((char *) strstr(s->str, "K"))
	return 1;
    else if ((char *) strstr(s->str, "CZ"))
	return 1;
    else if ((char *) strstr(s->str, "WITZ"))
	return 1;
    else
	return 0;
}


int
GetLength(metastring * s)
{
    return s->length;
}


char
GetAt(metastring * s, int pos)
{
    if ((pos < 0) || (pos >= s->length))
	return '\0';

    return ((char) *(s->str + pos));
}


void
SetAt(metastring * s, int pos, char c)
{
    if ((pos < 0) || (pos >= s->length))
	return;

    *(s->str + pos) = c;
}


/* 
   Caveats: the START value is 0 based
*/
int
StringAt(metastring * s, int start, int length, ...)
{
    char *test;
    char *pos;
    va_list ap;

    if ((start < 0) || (start >= s->length))
        return 0;

    pos = (s->str + start);
    va_start(ap, length);

    do
      {
	  test = va_arg(ap, char *);
	  if (*test && (strncmp(pos, test, length) == 0))
	    {
	      va_end(ap);	/* release the va_list on the early exit too */
	      return 1;
	    }
      }
    while (strcmp(test, ""));

    va_end(ap);

    return 0;
}
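
/*
 * Illustrative sketch only, not part of the Double Metaphone code.
 * StringAt() takes a variable-length list of candidates terminated by an
 * empty string.  The stand-alone function below shows that sentinel
 * convention with a single exit path, so va_end() is reached however the
 * loop stops.  The function name is hypothetical, and unlike StringAt() it
 * compares each candidate at its own length instead of a fixed length.

```c
#include <assert.h>
#include <stdarg.h>
#include <string.h>

// Return 1 when `text` starts with any of the candidate strings.
// The candidate list MUST end with an empty string "".
static int starts_with_any(const char *text, ...)
{
    va_list ap;
    const char *candidate;
    int found = 0;

    assert(text != NULL);

    va_start(ap, text);
    for (;;) {
        candidate = va_arg(ap, const char *);
        if (*candidate == '\0')
            break;                      // sentinel reached
        if (strncmp(text, candidate, strlen(candidate)) == 0) {
            found = 1;
            break;
        }
    }
    va_end(ap);                         // runs on every exit path
    return found;
}
```

 * A call such as starts_with_any(word, "GN", "KN", "PN", "") mirrors the
 * way StringAt() is called in DoubleMetaphone().  Forgetting the trailing ""
 * makes va_arg() read past the supplied arguments.
 */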


void
MetaphAdd(metastring * s, char *new_str)
{
    int add_length;

    if (new_str == NULL)
	return;

    add_length = strlen(new_str);
    if ((s->length + add_length) > (s->bufsize - 1))
      {
	  IncreaseBuffer(s, add_length);
      }

    strcat(s->str, new_str);
    s->length += add_length;
}


void
DoubleMetaphone(const char *str, char **codes)
{
    int        length;
    metastring *original;
    metastring *primary;
    metastring *secondary;
    int        current;
    int        last;

    current = 0;
    /* we need the real length and last prior to padding */
    length  = strlen(str); 
    last    = length - 1; 
    original = NewMetaString((char *)str);
    /* Pad original so we can index beyond end */
    MetaphAdd(original, "     ");

    primary = NewMetaString("");
    secondary = NewMetaString("");
    primary->free_string_on_destroy = 0;
    secondary->free_string_on_destroy = 0;

    MakeUpper(original);

    /* skip these when at start of word */
    if (StringAt(original, 0, 2, "GN", "KN", "PN", "WR", "PS", ""))
	current += 1;

    /* Initial 'X' is pronounced 'Z' e.g. 'Xavier' */
    if (GetAt(original, 0) == 'X')
      {
	  MetaphAdd(primary, "S");	/* 'Z' maps to 'S' */
	  MetaphAdd(secondary, "S");
	  current += 1;
      }

    /* main loop */
    while ((primary->length < 4) || (secondary->length < 4))  
      {
	  if (current >= length)
	      break;

	  switch (GetAt(original, current))
	    {
	    case 'A':
	    case 'E':
	    case 'I':
	    case 'O':
	    case 'U':
	    case 'Y':
		if (current == 0)
                  {
		    /* all init vowels now map to 'A' */
		    MetaphAdd(primary, "A");
		    MetaphAdd(secondary, "A");
                  }
		current += 1;
		break;

	    case 'B':

		/* "-mb", e.g., "dumb", already skipped over... */
		MetaphAdd(primary, "P");
		MetaphAdd(secondary, "P");

		if (GetAt(original, current + 1) == 'B')
		    current += 2;
		else
		    current += 1;
		break;

	    case 'Ç':
		MetaphAdd(primary, "S");
		MetaphAdd(secondary, "S");
		current += 1;
		break;

	    case 'C':
		/* various germanic */
		if ((current > 1)
		    && !IsVowel(original, current - 2)
		    && StringAt(original, (current - 1), 3, "ACH", "")
		    && ((GetAt(original, current + 2) != 'I')
			&& ((GetAt(original, current + 2) != 'E')
			    || StringAt(original, (current - 2), 6, "BACHER",
					"MACHER", ""))))
		  {
		      MetaphAdd(primary, "K");
		      MetaphAdd(secondary, "K");
		      current += 2;
		      break;
		  }

		/* special case 'caesar' */
		if ((current == 0)
		    && StringAt(original, current, 6, "CAESAR", ""))
		  {
		      MetaphAdd(primary, "S");
		      MetaphAdd(secondary, "S");
		      current += 2;
		      break;
		  }

		/* italian 'chianti' */
		if (StringAt(original, current, 4, "CHIA", ""))
		  {
		      MetaphAdd(primary, "K");
		      MetaphAdd(secondary, "K");
		      current += 2;
		      break;
		  }

		if (StringAt(original, current, 2, "CH", ""))
		  {
		      /* find 'michael' */
		      if ((current > 0)
			  && StringAt(original, current, 4, "CHAE", ""))
			{
			    MetaphAdd(primary, "K");
			    MetaphAdd(secondary, "X");
			    current += 2;
			    break;
			}

		      /* greek roots e.g. 'chemistry', 'chorus' */
		      if ((current == 0)
			  && (StringAt(original, (current + 1), 5, "HARAC", "HARIS", "")
			   || StringAt(original, (current + 1), 3, "HOR",
				       "HYM", "HIA", "HEM", ""))
			  && !StringAt(original, 0, 5, "CHORE", ""))
			{
			    MetaphAdd(primary, "K");
			    MetaphAdd(secondary, "K");
			    current += 2;
			    break;
			}

		      /* germanic, greek, or otherwise 'ch' for 'kh' sound */
		      if (
			  (StringAt(original, 0, 4, "VAN ", "VON ", "")
			   || StringAt(original, 0, 3, "SCH", ""))
			  /* 'architect' but not 'arch'; 'orchestra', 'orchid' */
			  || StringAt(original, (current - 2), 6, "ORCHES",
				      "ARCHIT", "ORCHID", "")
			  || StringAt(original, (current + 2), 1, "T", "S",
				      "")
			  || ((StringAt(original, (current - 1), 1, "A", "O", "U", "E", "") 
                          || (current == 0))
			   /* e.g., 'wachtler', 'wechsler', but not 'tichner' */
			  && StringAt(original, (current + 2), 1, "L", "R",
		                      "N", "M", "B", "H", "F", "V", "W", " ", "")))
			{
			    MetaphAdd(primary, "K");
			    MetaphAdd(secondary, "K");
			}
		      else
			{
			    if (current > 0)
			      {
				  if (StringAt(original, 0, 2, "MC", ""))
				    {
					/* e.g., "McHugh" */
					MetaphAdd(primary, "K");
					MetaphAdd(secondary, "K");
				    }
				  else
				    {
					MetaphAdd(primary, "X");
					MetaphAdd(secondary, "K");
				    }
			      }
			    else
			      {
				  MetaphAdd(primary, "X");
				  MetaphAdd(secondary, "X");
			      }
			}
		      current += 2;
		      break;
		  }
		/* e.g., 'czerny' */
		if (StringAt(original, current, 2, "CZ", "")
		    && !StringAt(original, (current - 2), 4, "WICZ", ""))
		  {
		      MetaphAdd(primary, "S");
		      MetaphAdd(secondary, "X");
		      current += 2;
		      break;
		  }

		/* e.g., 'focaccia' */
		if (StringAt(original, (current + 1), 3, "CIA", ""))
		  {
		      MetaphAdd(primary, "X");
		      MetaphAdd(secondary, "X");
		      current += 3;
		      break;
		  }

		/* double 'C', but not if e.g. 'McClellan' */
		if (StringAt(original, current, 2, "CC", "")
		    && !((current == 1) && (GetAt(original, 0) == 'M')))
		{
		    /* 'bellocchio' but not 'bacchus' */
		    if (StringAt(original, (current + 2), 1, "I", "E", "H", "")
			&& !StringAt(original, (current + 2), 2, "HU", ""))
		      {
			  /* 'accident', 'accede' 'succeed' */
			  if (
			      ((current == 1)
			       && (GetAt(original, current - 1) == 'A'))
			      || StringAt(original, (current - 1), 5, "UCCEE",
					  "UCCES", ""))
			    {
				MetaphAdd(primary, "KS");
				MetaphAdd(secondary, "KS");
			    }
			  else
			    {
				/* 'bacci', 'bertucci', other italian */
				MetaphAdd(primary, "X");
				MetaphAdd(secondary, "X");
			    }
			  current += 3;
			  break;
		      }
		    else
		      {	  /* Pierce's rule */
			  MetaphAdd(primary, "K");
			  MetaphAdd(secondary, "K");
			  current += 2;
			  break;
		      }
		}

		if (StringAt(original, current, 2, "CK", "CG", "CQ", ""))
		  {
		      MetaphAdd(primary, "K");
		      MetaphAdd(secondary, "K");
		      current += 2;
		      break;
		  }

		if (StringAt(original, current, 2, "CI", "CE", "CY", ""))
		  {
		      /* italian vs. english */
		      if (StringAt
			  (original, current, 3, "CIO", "CIE", "CIA", ""))
			{
			    MetaphAdd(primary, "S");
			    MetaphAdd(secondary, "X");
			}
		      else
			{
			    MetaphAdd(primary, "S");
			    MetaphAdd(secondary, "S");
			}
		      current += 2;
		      break;
		  }

		/* else */
		MetaphAdd(primary, "K");
		MetaphAdd(secondary, "K");

		/* name sent in 'mac caffrey', 'mac gregor' */
		if (StringAt(original, (current + 1), 2, " C", " Q", " G", ""))
		    current += 3;
		else
		    if (StringAt(original, (current + 1), 1, "C", "K", "Q", "")
			&& !StringAt(original, (current + 1), 2, "CE", "CI", ""))
		    current += 2;
		else
		    current += 1;
		break;

	    case 'D':
		if (StringAt(original, current, 2, "DG", ""))
                  {
		      if (StringAt(original, (current + 2), 1, "I", "E", "Y", ""))
		        {
			    /* e.g. 'edge' */
			    MetaphAdd(primary, "J");
			    MetaphAdd(secondary, "J");
			    current += 3;
			    break;
		        }
		      else
		        {
			    /* e.g. 'edgar' */
			    MetaphAdd(primary, "TK");
			    MetaphAdd(secondary, "TK");
			    current += 2;
			    break;
		        }
                  }

		if (StringAt(original, current, 2, "DT", "DD", ""))
		  {
		      MetaphAdd(primary, "T");
		      MetaphAdd(secondary, "T");
		      current += 2;
		      break;
		  }

		/* else */
		MetaphAdd(primary, "T");
		MetaphAdd(secondary, "T");
		current += 1;
		break;

	    case 'F':
		if (GetAt(original, current + 1) == 'F')
		    current += 2;
		else
		    current += 1;
		MetaphAdd(primary, "F");
		MetaphAdd(secondary, "F");
		break;

	    case 'G':
		if (GetAt(original, current + 1) == 'H')
		  {
		      if ((current > 0) && !IsVowel(original, current - 1))
			{
			    MetaphAdd(primary, "K");
			    MetaphAdd(secondary, "K");
			    current += 2;
			    break;
			}

		      if (current < 3)
			{
			    /* 'ghislane', 'ghiradelli' */
			    if (current == 0)
			      {
				  if (GetAt(original, current + 2) == 'I')
				    {
					MetaphAdd(primary, "J");
					MetaphAdd(secondary, "J");
				    }
				  else
				    {
					MetaphAdd(primary, "K");
					MetaphAdd(secondary, "K");
				    }
				  current += 2;
				  break;
			      }
			}
		      /* Parker's rule (with some further refinements) - e.g., 'hugh' */
		      if (
			  ((current > 1)
			   && StringAt(original, (current - 2), 1, "B", "H", "D", ""))
			  /* e.g., 'bough' */
			  || ((current > 2)
			      && StringAt(original, (current - 3), 1, "B", "H", "D", ""))
			  /* e.g., 'broughton' */
			  || ((current > 3)
			      && StringAt(original, (current - 4), 1, "B", "H", "")))
			{
			    current += 2;
			    break;
			}
		      else
			{
			    /* e.g., 'laugh', 'McLaughlin', 'cough', 'gough', 'rough', 'tough' */
			    if ((current > 2)
				&& (GetAt(original, current - 1) == 'U')
				&& StringAt(original, (current - 3), 1, "C",
					    "G", "L", "R", "T", ""))
			      {
				  MetaphAdd(primary, "F");
				  MetaphAdd(secondary, "F");
			      }
			    else if ((current > 0)
				     && GetAt(original, current - 1) != 'I')
			      {


				  MetaphAdd(primary, "K");
				  MetaphAdd(secondary, "K");
			      }

			    current += 2;
			    break;
			}
		  }

		if (GetAt(original, current + 1) == 'N')
		  {
		      if ((current == 1) && IsVowel(original, 0)
			  && !SlavoGermanic(original))
			{
			    MetaphAdd(primary, "KN");
			    MetaphAdd(secondary, "N");
			}
		      else
			  /* not e.g. 'cagney' */
			  if (!StringAt(original, (current + 2), 2, "EY", "")
			      && (GetAt(original, current + 1) != 'Y')
			      && !SlavoGermanic(original))
			{
			    MetaphAdd(primary, "N");
			    MetaphAdd(secondary, "KN");
			}
		      else
                        {
			    MetaphAdd(primary, "KN");
		            MetaphAdd(secondary, "KN");
                        }
		      current += 2;
		      break;
		  }

		/* 'tagliaro' */
		if (StringAt(original, (current + 1), 2, "LI", "")
		    && !SlavoGermanic(original))
		  {
		      MetaphAdd(primary, "KL");
		      MetaphAdd(secondary, "L");
		      current += 2;
		      break;
		  }

		/* -ges-,-gep-,-gel-, -gie- at beginning */
		if ((current == 0)
		    && ((GetAt(original, current + 1) == 'Y')
			|| StringAt(original, (current + 1), 2, "ES", "EP",
				    "EB", "EL", "EY", "IB", "IL", "IN", "IE",
				    "EI", "ER", "")))
		  {
		      MetaphAdd(primary, "K");
		      MetaphAdd(secondary, "J");
		      current += 2;
		      break;
		  }

		/*  -ger-,  -gy- */
		if (
		    (StringAt(original, (current + 1), 2, "ER", "")
		     || (GetAt(original, current + 1) == 'Y'))
		    && !StringAt(original, 0, 6, "DANGER", "RANGER", "MANGER", "")
		    && !StringAt(original, (current - 1), 1, "E", "I", "")
		    && !StringAt(original, (current - 1), 3, "RGY", "OGY",
				 ""))
		  {
		      MetaphAdd(primary, "K");
		      MetaphAdd(secondary, "J");
		      current += 2;
		      break;
		  }

		/*  italian, e.g. 'biaggi' */
		if (StringAt(original, (current + 1), 1, "E", "I", "Y", "")
		    || StringAt(original, (current - 1), 4, "AGGI", "OGGI", ""))
		  {
		      /* obvious germanic */
		      if (
			  (StringAt(original, 0, 4, "VAN ", "VON ", "")
			   || StringAt(original, 0, 3, "SCH", ""))
			  || StringAt(original, (current + 1), 2, "ET", ""))
			{
			    MetaphAdd(primary, "K");
			    MetaphAdd(secondary, "K");
			}
		      else
			{
			    /* always soft if french ending */
			    if (StringAt
				(original, (current + 1), 4, "IER ", ""))
			      {
				  MetaphAdd(primary, "J");
				  MetaphAdd(secondary, "J");
			      }
			    else
			      {
				  MetaphAdd(primary, "J");
				  MetaphAdd(secondary, "K");
			      }
			}
		      current += 2;
		      break;
		  }

		if (GetAt(original, current + 1) == 'G')
		    current += 2;
		else
		    current += 1;
		MetaphAdd(primary, "K");
		MetaphAdd(secondary, "K");
		break;

	    case 'H':
		/* only keep if first & before a vowel, or between two vowels */
		if (((current == 0) || IsVowel(original, current - 1))
		    && IsVowel(original, current + 1))
		  {
		      MetaphAdd(primary, "H");
		      MetaphAdd(secondary, "H");
		      current += 2;
		  }
		else		/* also takes care of 'HH' */
		    current += 1;
		break;

	    case 'J':
		/* obvious spanish, 'jose', 'san jacinto' */
		if (StringAt(original, current, 4, "JOSE", "")
		    || StringAt(original, 0, 4, "SAN ", ""))
		  {
		      if (((current == 0)
			   && (GetAt(original, current + 4) == ' '))
			  || StringAt(original, 0, 4, "SAN ", ""))
			{
			    MetaphAdd(primary, "H");
			    MetaphAdd(secondary, "H");
			}
		      else
			{
			    MetaphAdd(primary, "J");
			    MetaphAdd(secondary, "H");
			}
		      current += 1;
		      break;
		  }

		if ((current == 0)
		    && !StringAt(original, current, 4, "JOSE", ""))
		  {
		      MetaphAdd(primary, "J");	/* Yankelovich/Jankelowicz */
		      MetaphAdd(secondary, "A");
		  }
		else
		  {
		      /* spanish pron. of e.g. 'bajador' */
		      if (IsVowel(original, current - 1)
			  && !SlavoGermanic(original)
			  && ((GetAt(original, current + 1) == 'A')
			      || (GetAt(original, current + 1) == 'O')))
			{
			    MetaphAdd(primary, "J");
			    MetaphAdd(secondary, "H");
			}
		      else
			{
			    if (current == last)
			      {
				  MetaphAdd(primary, "J");
				  MetaphAdd(secondary, "");
			      }
			    else
			      {
				  if (!StringAt(original, (current + 1), 1, "L", "T",
				                "K", "S", "N", "M", "B", "Z", "")
				      && !StringAt(original, (current - 1), 1,
						   "S", "K", "L", "")) 
                                    {
				      MetaphAdd(primary, "J");
				      MetaphAdd(secondary, "J");
                                    }
			      }
			}
		  }

		if (GetAt(original, current + 1) == 'J')	/* it could happen! */
		    current += 2;
		else
		    current += 1;
		break;

	    case 'K':
		if (GetAt(original, current + 1) == 'K')
		    current += 2;
		else
		    current += 1;
		MetaphAdd(primary, "K");
		MetaphAdd(secondary, "K");
		break;

	    case 'L':
		if (GetAt(original, current + 1) == 'L')
		  {
		      /* spanish e.g. 'cabrillo', 'gallegos' */
		      if (((current == (length - 3))
			   && StringAt(original, (current - 1), 4, "ILLO",
				       "ILLA", "ALLE", ""))
			  || ((StringAt(original, (last - 1), 2, "AS", "OS", "")
			    || StringAt(original, last, 1, "A", "O", ""))
			   && StringAt(original, (current - 1), 4, "ALLE", "")))
			{
			    MetaphAdd(primary, "L");
			    MetaphAdd(secondary, "");
			    current += 2;
			    break;
			}
		      current += 2;
		  }
		else
		    current += 1;
		MetaphAdd(primary, "L");
		MetaphAdd(secondary, "L");
		break;

	    case 'M':
		if ((StringAt(original, (current - 1), 3, "UMB", "")
		     && (((current + 1) == last)
			 || StringAt(original, (current + 2), 2, "ER", "")))
		    /* 'dumb','thumb' */
		    || (GetAt(original, current + 1) == 'M'))
		    current += 2;
		else
		    current += 1;
		MetaphAdd(primary, "M");
		MetaphAdd(secondary, "M");
		break;

	    case 'N':
		if (GetAt(original, current + 1) == 'N')
		    current += 2;
		else
		    current += 1;
		MetaphAdd(primary, "N");
		MetaphAdd(secondary, "N");
		break;

	    case '\xD1':	/* N with tilde, byte 0xD1 in ISO-8859-1 */
		current += 1;
		MetaphAdd(primary, "N");
		MetaphAdd(secondary, "N");
		break;

	    case 'P':
		if (GetAt(original, current + 1) == 'H')
		  {
		      MetaphAdd(primary, "F");
		      MetaphAdd(secondary, "F");
		      current += 2;
		      break;
		  }

		/* also account for "campbell", "raspberry" */
		if (StringAt(original, (current + 1), 1, "P", "B", ""))
		    current += 2;
		else
		    current += 1;
		MetaphAdd(primary, "P");
		MetaphAdd(secondary, "P");
		break;

	    case 'Q':
		if (GetAt(original, current + 1) == 'Q')
		    current += 2;
		else
		    current += 1;
		MetaphAdd(primary, "K");
		MetaphAdd(secondary, "K");
		break;

	    case 'R':
		/* french e.g. 'rogier', but exclude 'hochmeier' */
		if ((current == last)
		    && !SlavoGermanic(original)
		    && StringAt(original, (current - 2), 2, "IE", "")
		    && !StringAt(original, (current - 4), 2, "ME", "MA", ""))
		  {
		      MetaphAdd(primary, "");
		      MetaphAdd(secondary, "R");
		  }
		else
		  {
		      MetaphAdd(primary, "R");
		      MetaphAdd(secondary, "R");
		  }

		if (GetAt(original, current + 1) == 'R')
		    current += 2;
		else
		    current += 1;
		break;

	    case 'S':
		/* special cases 'island', 'isle', 'carlisle', 'carlysle' */
		if (StringAt(original, (current - 1), 3, "ISL", "YSL", ""))
		  {
		      current += 1;
		      break;
		  }

		/* special case 'sugar-' */
		if ((current == 0)
		    && StringAt(original, current, 5, "SUGAR", ""))
		  {
		      MetaphAdd(primary, "X");
		      MetaphAdd(secondary, "S");
		      current += 1;
		      break;
		  }

		if (StringAt(original, current, 2, "SH", ""))
		  {
		      /* germanic */
		      if (StringAt
			  (original, (current + 1), 4, "HEIM", "HOEK", "HOLM",
			   "HOLZ", ""))
			{
			    MetaphAdd(primary, "S");
			    MetaphAdd(secondary, "S");
			}
		      else
			{
			    MetaphAdd(primary, "X");
			    MetaphAdd(secondary, "X");
			}
		      current += 2;
		      break;
		  }

		/* italian & armenian */
		if (StringAt(original, current, 3, "SIO", "SIA", "")
		    || StringAt(original, current, 4, "SIAN", ""))
		  {
		      if (!SlavoGermanic(original))
			{
			    MetaphAdd(primary, "S");
			    MetaphAdd(secondary, "X");
			}
		      else
			{
			    MetaphAdd(primary, "S");
			    MetaphAdd(secondary, "S");
			}
		      current += 3;
		      break;
		  }

		/* german & anglicisations, e.g. 'smith' match 'schmidt', 'snider' match 'schneider';
		   also -sz- in slavic languages, although in hungarian it is pronounced 's' */
		if (((current == 0)
		     && StringAt(original, (current + 1), 1, "M", "N", "L", "W", ""))
		    || StringAt(original, (current + 1), 1, "Z", ""))
		  {
		      MetaphAdd(primary, "S");
		      MetaphAdd(secondary, "X");
		      if (StringAt(original, (current + 1), 1, "Z", ""))
			  current += 2;
		      else
			  current += 1;
		      break;
		  }

		if (StringAt(original, current, 2, "SC", ""))
		  {
		      /* Schlesinger's rule */
		      if (GetAt(original, current + 2) == 'H')
		      {
			  /* dutch origin, e.g. 'school', 'schooner' */
			  if (StringAt(original, (current + 3), 2, "OO", "ER", "EN",
			               "UY", "ED", "EM", ""))
			    {
				/* 'schermerhorn', 'schenker' */
				if (StringAt(original, (current + 3), 2, "ER", "EN", ""))
				  {
				      MetaphAdd(primary, "X");
				      MetaphAdd(secondary, "SK");
				  }
				else
                                  {
				      MetaphAdd(primary, "SK");
				      MetaphAdd(secondary, "SK");
                                  }
				current += 3;
				break;
			    }
			  else
			    {
				if ((current == 0) && !IsVowel(original, 3)
				    && (GetAt(original, 3) != 'W'))
				  {
				      MetaphAdd(primary, "X");
				      MetaphAdd(secondary, "S");
				  }
				else
				  {
				      MetaphAdd(primary, "X");
				      MetaphAdd(secondary, "X");
				  }
				current += 3;
				break;
			    }
		      }

		      /* 'SC' not followed by 'H': soft before I, E or Y */
		      if (StringAt(original, (current + 2), 1, "I", "E", "Y", ""))
			{
			    MetaphAdd(primary, "S");
			    MetaphAdd(secondary, "S");
			    current += 3;
			    break;
			}
		      /* else */
		      MetaphAdd(primary, "SK");
		      MetaphAdd(secondary, "SK");
		      current += 3;
		      break;
		  }

		/* french e.g. 'resnais', 'artois' */
		if ((current == last)
		    && StringAt(original, (current - 2), 2, "AI", "OI", ""))
		  {
		      MetaphAdd(primary, "");
		      MetaphAdd(secondary, "S");
		  }
		else
		  {
		      MetaphAdd(primary, "S");
		      MetaphAdd(secondary, "S");
		  }

		if (StringAt(original, (current + 1), 1, "S", "Z", ""))
		    current += 2;
		else
		    current += 1;
		break;

	    case 'T':
		if (StringAt(original, current, 4, "TION", ""))
		  {
		      MetaphAdd(primary, "X");
		      MetaphAdd(secondary, "X");
		      current += 3;
		      break;
		  }

		if (StringAt(original, current, 3, "TIA", "TCH", ""))
		  {
		      MetaphAdd(primary, "X");
		      MetaphAdd(secondary, "X");
		      current += 3;
		      break;
		  }

		if (StringAt(original, current, 2, "TH", "")
		    || StringAt(original, current, 3, "TTH", ""))
		  {
		      /* special case 'thomas', 'thames' or germanic */
		      if (StringAt(original, (current + 2), 2, "OM", "AM", "")
			  || StringAt(original, 0, 4, "VAN ", "VON ", "")
			  || StringAt(original, 0, 3, "SCH", ""))
			{
			    MetaphAdd(primary, "T");
			    MetaphAdd(secondary, "T");
			}
		      else
			{
			    MetaphAdd(primary, "0");
			    MetaphAdd(secondary, "T");
			}
		      current += 2;
		      break;
		  }

		if (StringAt(original, (current + 1), 1, "T", "D", ""))
		    current += 2;
		else
		    current += 1;
		MetaphAdd(primary, "T");
		MetaphAdd(secondary, "T");
		break;

	    case 'V':
		if (GetAt(original, current + 1) == 'V')
		    current += 2;
		else
		    current += 1;
		MetaphAdd(primary, "F");
		MetaphAdd(secondary, "F");
		break;

	    case 'W':
		/* can also be in middle of word */
		if (StringAt(original, current, 2, "WR", ""))
		  {
		      MetaphAdd(primary, "R");
		      MetaphAdd(secondary, "R");
		      current += 2;
		      break;
		  }

		if ((current == 0)
		    && (IsVowel(original, current + 1)
			|| StringAt(original, current, 2, "WH", "")))
		  {
		      /* Wasserman should match Vasserman */
		      if (IsVowel(original, current + 1))
			{
			    MetaphAdd(primary, "A");
			    MetaphAdd(secondary, "F");
			}
		      else
			{
			    /* need Uomo to match Womo */
			    MetaphAdd(primary, "A");
			    MetaphAdd(secondary, "A");
			}
		  }

		/* Arnow should match Arnoff */
		if (((current == last) && IsVowel(original, current - 1))
		    || StringAt(original, (current - 1), 5, "EWSKI", "EWSKY",
				"OWSKI", "OWSKY", "")
		    || StringAt(original, 0, 3, "SCH", ""))
		  {
		      MetaphAdd(primary, "");
		      MetaphAdd(secondary, "F");
		      current += 1;
		      break;
		  }

		/* polish e.g. 'filipowicz' */
		if (StringAt(original, current, 4, "WICZ", "WITZ", ""))
		  {
		      MetaphAdd(primary, "TS");
		      MetaphAdd(secondary, "FX");
		      current += 4;
		      break;
		  }

		/* else skip it */
		current += 1;
		break;

	    case 'X':
		/* french e.g. breaux */
		if (!((current == last)
		      && (StringAt(original, (current - 3), 3, "IAU", "EAU", "")
		       || StringAt(original, (current - 2), 2, "AU", "OU", ""))))
                  {
		      MetaphAdd(primary, "KS");
		      MetaphAdd(secondary, "KS");
                  }
                  

		if (StringAt(original, (current + 1), 1, "C", "X", ""))
		    current += 2;
		else
		    current += 1;
		break;

	    case 'Z':
		/* chinese pinyin e.g. 'zhao' */
		if (GetAt(original, current + 1) == 'H')
		  {
		      MetaphAdd(primary, "J");
		      MetaphAdd(secondary, "J");
		      current += 2;
		      break;
		  }
		else if (StringAt(original, (current + 1), 2, "ZO", "ZI", "ZA", "")
			|| (SlavoGermanic(original)
			    && ((current > 0)
				&& GetAt(original, current - 1) != 'T')))
		  {
		      MetaphAdd(primary, "S");
		      MetaphAdd(secondary, "TS");
		  }
		else
                  {
		    MetaphAdd(primary, "S");
		    MetaphAdd(secondary, "S");
                  }

		if (GetAt(original, current + 1) == 'Z')
		    current += 2;
		else
		    current += 1;
		break;

	    default:
		current += 1;
	    }
        /* printf("PRIMARY: %s\n", primary->str);
        printf("SECONDARY: %s\n", secondary->str);  */
      }


    if (primary->length > 4)
	SetAt(primary, 4, '\0');

    if (secondary->length > 4)
	SetAt(secondary, 4, '\0');

    *codes = primary->str;
    *++codes = secondary->str;

    DestroyMetaString(original);
    DestroyMetaString(primary);
    DestroyMetaString(secondary);
}
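
/*
 * Illustration only, not part of the build. Nearly every rule above is
 * phrased as StringAt(string, start, length, candidate..., ""), for example
 * StringAt(original, (current - 1), 3, "ISL", "YSL", ""). The helper itself
 * is defined elsewhere, so the sketch below is a hypothetical stand-in
 * (string_at, operating on a plain C string) that assumes the calling
 * convention seen at the call sites: a variable list of candidates closed by
 * an empty string, and a failed match for any start outside the string.
 *
```c
#include <assert.h>
#include <stdarg.h>
#include <string.h>

// Hypothetical stand-in for StringAt(): returns 1 if the `length` characters
// of `s` beginning at `start` equal one of the candidate strings that follow.
// The candidate list must be terminated by an empty string "".
static int string_at(const char *s, int start, int length, ...)
{
    va_list ap;
    const char *candidate;
    int found = 0;

    assert(s != NULL);

    // An out-of-range start never matches, so a rule may test (current - 1)
    // even when current is 0.
    if (start < 0 || (size_t) start >= strlen(s))
        return 0;

    va_start(ap, length);
    while ((candidate = va_arg(ap, const char *)) != NULL
           && candidate[0] != '\0')
    {
        // strncmp stops at the terminating NUL of `s`, so a window that
        // runs past the end of the string compares unequal.
        if ((int) strlen(candidate) == length
            && strncmp(s + start, candidate, (size_t) length) == 0)
        {
            found = 1;
            break;
        }
    }
    va_end(ap);
    return found;
}
```
 *
 * With this stand-in, string_at("ISLAND", 0, 3, "ISL", "YSL", "") and
 * string_at("NATION", 2, 4, "TION", "") both match, while a call with a
 * negative start simply reports no match.
 */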

/* http.h

$Id: http.h 1736 2005-05-12 15:41:22Z karman $


    This file is part of Swish-e.

    Swish-e is free software; you can redistribute it and/or modify
    it under the terms of the GNU General Public License as published by
    the Free Software Foundation; either version 2 of the License, or
    (at your option) any later version.

    Swish-e is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
    GNU General Public License for more details.

    You should have received a copy of the GNU General Public License
    along  with Swish-e; if not, write to the Free Software
    Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
    
    See the COPYING file that accompanies the Swish-e distribution for details
    of the GNU GPL and the special exception available for linking against
    the Swish-e library.
    
** Mon May  9 18:19:34 CDT 2005
** added GPL


**/

#ifndef __HasSeenModule_HTTP
#define __HasSeenModule_HTTP       1

#define MAXPIDLEN 32 /* 32 is for the pid identifier and the trailing null */

/*
   -- module data
*/

struct MOD_HTTP
{
        /* spider directory for index (HTTP method) */
    int     lenspiderdirectory;
    char   *spiderdirectory;

        /* http system specific configuration parameters */
    int     maxdepth;
    int     delay;
    struct multiswline *equivalentservers;

    struct url_info *url_hash[BIGHASHSIZE];
};

void initModule_HTTP (SWISH *);
void freeModule_HTTP (SWISH *);
int  configModule_HTTP (SWISH *, StringList *);


char *url_method ( char *url, int *plen );
char *url_serverport (char *url, int *plen);
char *url_uri (char *url, int *plen);
int get(SWISH * sw, char *contenttype_or_redirect, time_t *last_modified, time_t * plastretrieval, char *file_prefix, char *url);
int cmdf (int (*cmd)(const char *), char *fmt, char *,pid_t pid);
char *readline (FILE *fp);
pid_t lgetpid ();


#endif

/*
$Id: entities.c 1838 2006-10-18 02:58:02Z karman $
**
** (c) Rainer.Scherg
**
**

    This file is part of Swish-e.

    Swish-e is free software; you can redistribute it and/or modify
    it under the terms of the GNU General Public License as published by
    the Free Software Foundation; either version 2 of the License, or
    (at your option) any later version.

    Swish-e is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
    GNU General Public License for more details.

    You should have received a copy of the GNU General Public License
    along  with Swish-e; if not, write to the Free Software
    Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
    
    See the COPYING file that accompanies the Swish-e distribution for details
    of the GNU GPL and the special exception available for linking against
    the Swish-e library.
    
** Mon May  9 15:51:39 CDT 2005
** added GPL

**************************************************************************************




** HTML entity routines (encoding, etc.):
**
** internally we are working with int/wchar_t to support unicode-16 for future
** enhancements of swish (rasc - Rainer Scherg).
**
** 2001-05-05  rasc   
**
*/




#include <stdlib.h>
#include "swish.h"
#include "mem.h"
#include "swstring.h"
#include "parse_conffile.h"
#include "config.h"
#include "entities.h"


/*
** ----------------------------------------------
** 
**  Private Module Data
**
** ----------------------------------------------
*/

/* Prototypes */

static int is_EOE(int c);       /* is_EndOfEntity */




#define  MAX_ENTITY_LEN  16     /* max number of chars within which we must see the EOE */

/*
  -- Entity encoding/decoding structure 
*/

/* #define IS_EOE(a)   ((a)==';')    -- be W3C compliant */
#define IS_EOE(a)   (is_EOE((int)(a))) /* tolerant routine */


typedef struct
{
    char   *name;
    int     code;
}
CEntity;


/*
  -- CEntity Quick Hash structure
  -- works as follows: array of ASCII-7 start "positions" (1st char of entity name)
  -- each entry can have a chain of pointers
  -- e.g.  &quot; --> ['q']->ce(.name .code)
  --                        ->next (chains all &q...;)
  -- lots of slots in the array will be empty because only [A-Z] and [a-z]
  -- are needed. But this costs hardly any memory, and is convenient...  (rasc)
  -- The hash sequence list will be re-sequenced during(!) usage (dynamic re-chaining).
  -- This brings down compares to almost 1 strcmp on entity checks. 
  --
  -- Warning: don't change this (ce_hasharray,etc) unless you know how this really works!
  --
  --   2001-05-14  Rainer.Scherg@rexroth.de (rasc)
  --
 */

struct CEHE
{                               /* CharEntityHashEntry */
    CEntity *ce;
    struct CEHE *next;
};

static struct CEHE *ce_hasharray[128];
static int ce_hasharray_initialized = 0;
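
/*
 * Illustration only, not part of the build. The comment above describes a
 * table of chains indexed by the first character of the entity name, with
 * chains re-sequenced while lookups run. The sketch below shows one way such
 * a scheme can work. All names (Entity, Node, buckets, lookup) are
 * hypothetical, the table is a small excerpt, and move-to-front is an
 * assumed re-chaining policy; the policy actually used by this module may
 * differ.
 *
```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

// Illustrative types only; these are not the module's own definitions.
typedef struct { const char *name; int code; } Entity;

typedef struct Node {
    const Entity *entity;
    struct Node  *next;
} Node;

static const Entity table[] = {
    { "quot", 0x0022 }, { "amp", 0x0026 }, { "apos", 0x0027 },
    { "lt", 0x003C }, { "gt", 0x003E }, { "aacute", 0x00E1 },
    { "acirc", 0x00E2 }, { "auml", 0x00E4 }
};

#define TABLE_SIZE (sizeof(table) / sizeof(table[0]))

static Node  nodes[TABLE_SIZE];   // one chain node per table entry
static Node *buckets[128];        // indexed by the first ASCII character
static int   initialized = 0;

// Push every table entry onto the chain selected by its first character.
static void init_buckets(void)
{
    size_t i;

    for (i = 0; i < TABLE_SIZE; i++) {
        unsigned char first = (unsigned char) table[i].name[0];

        assert(first < 128);      // entity names are plain ASCII
        nodes[i].entity = &table[i];
        nodes[i].next = buckets[first];
        buckets[first] = &nodes[i];
    }
    initialized = 1;
}

// Returns the code point for `name`, or -1 if it is unknown. A hit that is
// not already first in its chain is moved to the front, so frequently used
// entities are found after very few strcmp calls.
static int lookup(const char *name)
{
    Node **head, *prev = NULL, *n;

    assert(name != NULL);
    if (!initialized)
        init_buckets();
    if (name[0] == '\0' || (unsigned char) name[0] > 127)
        return -1;

    head = &buckets[(unsigned char) name[0]];
    for (n = *head; n != NULL; prev = n, n = n->next) {
        if (strcmp(n->entity->name, name) == 0) {
            if (prev != NULL) {   // unlink the node and re-insert it first
                prev->next = n->next;
                n->next = *head;
                *head = n;
            }
            return n->entity->code;
        }
    }
    return -1;
}
```
 *
 * After lookup("aacute") the 'a' chain starts with the "aacute" node, so a
 * repeated lookup of the same entity succeeds on its first comparison.
 */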


/*
 -- the following table was derived from the HTML4.x / SGML definitions
 -- of the W3C  (generated automatically 2001-05-05).
 --   http://www.w3.org/TR/html40/
 --   http://www.w3.org/TR/1999/REC-html401-19991224/sgml/entities.html
 -- 
 -- 2001-05-07 Rainer.Scherg
*/


static CEntity entity_table[] = {
    {"quot", 0x0022},           /* quotation mark = APL quote, U+0022 ISOnum */
    {"amp", 0x0026},            /* ampersand, U+0026 ISOnum */
    {"apos", 0x0027},           /* single quote */
    {"lt", 0x003C},             /* less-than sign, U+003C ISOnum */
    {"gt", 0x003E},             /* greater-than sign, U+003E ISOnum */

    /*
     * A bunch still in the 128-255 range.
     * Replacing them really depends on the charset used.
     */
    {"nbsp", 0x00A0},           /* no-break space = non-breaking space, U+00A0 ISOnum */
    {"iexcl", 0x00A1},          /* inverted exclamation mark, U+00A1 ISOnum */
    {"cent", 0x00A2},           /* cent sign, U+00A2 ISOnum */
    {"pound", 0x00A3},          /* pound sign, U+00A3 ISOnum */
    {"curren", 0x00A4},         /* currency sign, U+00A4 ISOnum */
    {"yen", 0x00A5},            /* yen sign = yuan sign, U+00A5 ISOnum */
    {"brvbar", 0x00A6},         /* broken bar = broken vertical bar, U+00A6 ISOnum */
    {"sect", 0x00A7},           /* section sign, U+00A7 ISOnum */
    {"uml", 0x00A8},            /* diaeresis = spacing diaeresis, U+00A8 ISOdia */
    {"copy", 0x00A9},           /* copyright sign, U+00A9 ISOnum */
    {"ordf", 0x00AA},           /* feminine ordinal indicator, U+00AA ISOnum */
    {"laquo", 0x00AB},          /* left-pointing double angle quotation mark = left pointing guillemet, U+00AB ISOnum */
    {"not", 0x00AC},            /* not sign, U+00AC ISOnum */
    {"shy", 0x00AD},            /* soft hyphen = discretionary hyphen, U+00AD ISOnum */
    {"reg", 0x00AE},            /* registered sign = registered trade mark sign, U+00AE ISOnum */
    {"macr", 0x00AF},           /* macron = spacing macron = overline = APL overbar, U+00AF ISOdia */
    {"deg", 0x00B0},            /* degree sign, U+00B0 ISOnum */
    {"plusmn", 0x00B1},         /* plus-minus sign = plus-or-minus sign, U+00B1 ISOnum */
    {"sup2", 0x00B2},           /* superscript two = superscript digit two = squared, U+00B2 ISOnum */
    {"sup3", 0x00B3},           /* superscript three = superscript digit three = cubed, U+00B3 ISOnum */
    {"acute", 0x00B4},          /* acute accent = spacing acute, U+00B4 ISOdia */
    {"micro", 0x00B5},          /* micro sign, U+00B5 ISOnum */
    {"para", 0x00B6},           /* pilcrow sign = paragraph sign, U+00B6 ISOnum */
    {"middot", 0x00B7},         /* middle dot = Georgian comma = Greek middle dot, U+00B7 ISOnum */
    {"cedil", 0x00B8},          /* cedilla = spacing cedilla, U+00B8 ISOdia */
    {"sup1", 0x00B9},           /* superscript one = superscript digit one, U+00B9 ISOnum */
    {"ordm", 0x00BA},           /* masculine ordinal indicator, U+00BA ISOnum */
    {"raquo", 0x00BB},          /* right-pointing double angle quotation mark = right pointing guillemet, U+00BB ISOnum */
    {"frac14", 0x00BC},         /* vulgar fraction one quarter = fraction one quarter, U+00BC ISOnum */
    {"frac12", 0x00BD},         /* vulgar fraction one half = fraction one half, U+00BD ISOnum */
    {"frac34", 0x00BE},         /* vulgar fraction three quarters = fraction three quarters, U+00BE ISOnum */
    {"iquest", 0x00BF},         /* inverted question mark = turned question mark, U+00BF ISOnum */
    {"Agrave", 0x00C0},         /* latin capital letter A with grave = latin capital letter A grave, U+00C0 ISOlat1 */
    {"Aacute", 0x00C1},         /* latin capital letter A with acute, U+00C1 ISOlat1 */
    {"Acirc", 0x00C2},          /* latin capital letter A with circumflex, U+00C2 ISOlat1 */
    {"Atilde", 0x00C3},         /* latin capital letter A with tilde, U+00C3 ISOlat1 */
    {"Auml", 0x00C4},           /* latin capital letter A with diaeresis, U+00C4 ISOlat1 */
    {"Aring", 0x00C5},          /* latin capital letter A with ring above = latin capital letter A ring, U+00C5 ISOlat1 */
    {"AElig", 0x00C6},          /* latin capital letter AE = latin capital ligature AE, U+00C6 ISOlat1 */
    {"Ccedil", 0x00C7},         /* latin capital letter C with cedilla, U+00C7 ISOlat1 */
    {"Egrave", 0x00C8},         /* latin capital letter E with grave, U+00C8 ISOlat1 */
    {"Eacute", 0x00C9},         /* latin capital letter E with acute, U+00C9 ISOlat1 */
    {"Ecirc", 0x00CA},          /* latin capital letter E with circumflex, U+00CA ISOlat1 */
    {"Euml", 0x00CB},           /* latin capital letter E with diaeresis, U+00CB ISOlat1 */
    {"Igrave", 0x00CC},         /* latin capital letter I with grave, U+00CC ISOlat1 */
    {"Iacute", 0x00CD},         /* latin capital letter I with acute, U+00CD ISOlat1 */
    {"Icirc", 0x00CE},          /* latin capital letter I with circumflex, U+00CE ISOlat1 */
    {"Iuml", 0x00CF},           /* latin capital letter I with diaeresis, U+00CF ISOlat1 */
    {"ETH", 0x00D0},            /* latin capital letter ETH, U+00D0 ISOlat1 */
    {"Ntilde", 0x00D1},         /* latin capital letter N with tilde, U+00D1 ISOlat1 */
    {"Ograve", 0x00D2},         /* latin capital letter O with grave, U+00D2 ISOlat1 */
    {"Oacute", 0x00D3},         /* latin capital letter O with acute, U+00D3 ISOlat1 */
    {"Ocirc", 0x00D4},          /* latin capital letter O with circumflex, U+00D4 ISOlat1 */
    {"Otilde", 0x00D5},         /* latin capital letter O with tilde, U+00D5 ISOlat1 */
    {"Ouml", 0x00D6},           /* latin capital letter O with diaeresis, U+00D6 ISOlat1 */
    {"times", 0x00D7},          /* multiplication sign, U+00D7 ISOnum */
    {"Oslash", 0x00D8},         /* latin capital letter O with stroke = latin capital letter O slash, U+00D8 ISOlat1 */
    {"Ugrave", 0x00D9},         /* latin capital letter U with grave, U+00D9 ISOlat1 */
    {"Uacute", 0x00DA},         /* latin capital letter U with acute, U+00DA ISOlat1 */
    {"Ucirc", 0x00DB},          /* latin capital letter U with circumflex, U+00DB ISOlat1 */
    {"Uuml", 0x00DC},           /* latin capital letter U with diaeresis, U+00DC ISOlat1 */
    {"Yacute", 0x00DD},         /* latin capital letter Y with acute, U+00DD ISOlat1 */
    {"THORN", 0x00DE},          /* latin capital letter THORN, U+00DE ISOlat1 */
    {"szlig", 0x00DF},          /* latin small letter sharp s = ess-zed, U+00DF ISOlat1 */
    {"agrave", 0x00E0},         /* latin small letter a with grave = latin small letter a grave, U+00E0 ISOlat1 */
    {"aacute", 0x00E1},         /* latin small letter a with acute, U+00E1 ISOlat1 */
    {"acirc", 0x00E2},          /* latin small letter a with circumflex, U+00E2 ISOlat1 */
    {"atilde", 0x00E3},         /* latin small letter a with tilde, U+00E3 ISOlat1 */
    {"auml", 0x00E4},           /* latin small letter a with diaeresis, U+00E4 ISOlat1 */
    {"aring", 0x00E5},          /* latin small letter a with ring above = latin small letter a ring, U+00E5 ISOlat1 */
    {"aelig", 0x00E6},          /* latin small letter ae = latin small ligature ae, U+00E6 ISOlat1 */
    {"ccedil", 0x00E7},         /* latin small letter c with cedilla, U+00E7 ISOlat1 */
    {"egrave", 0x00E8},         /* latin small letter e with grave, U+00E8 ISOlat1 */
    {"eacute", 0x00E9},         /* latin small letter e with acute, U+00E9 ISOlat1 */
    {"ecirc", 0x00EA},          /* latin small letter e with circumflex, U+00EA ISOlat1 */
    {"euml", 0x00EB},           /* latin small letter e with diaeresis, U+00EB ISOlat1 */
    {"igrave", 0x00EC},         /* latin small letter i with grave, U+00EC ISOlat1 */
    {"iacute", 0x00ED},         /* latin small letter i with acute, U+00ED ISOlat1 */
    {"icirc", 0x00EE},          /* latin small letter i with circumflex, U+00EE ISOlat1 */
    {"iuml", 0x00EF},           /* latin small letter i with diaeresis, U+00EF ISOlat1 */
    {"eth", 0x00F0},            /* latin small letter eth, U+00F0 ISOlat1 */
    {"ntilde", 0x00F1},         /* latin small letter n with tilde, U+00F1 ISOlat1 */
    {"ograve", 0x00F2},         /* latin small letter o with grave, U+00F2 ISOlat1 */
    {"oacute", 0x00F3},         /* latin small letter o with acute, U+00F3 ISOlat1 */
    {"ocirc", 0x00F4},          /* latin small letter o with circumflex, U+00F4 ISOlat1 */
    {"otilde", 0x00F5},         /* latin small letter o with tilde, U+00F5 ISOlat1 */
    {"ouml", 0x00F6},           /* latin small letter o with diaeresis, U+00F6 ISOlat1 */
    {"divide", 0x00F7},         /* division sign, U+00F7 ISOnum */
    {"oslash", 0x00F8},         /* latin small letter o with stroke = latin small letter o slash, U+00F8 ISOlat1 */
    {"ugrave", 0x00F9},         /* latin small letter u with grave, U+00F9 ISOlat1 */
    {"uacute", 0x00FA},         /* latin small letter u with acute, U+00FA ISOlat1 */
    {"ucirc", 0x00FB},          /* latin small letter u with circumflex, U+00FB ISOlat1 */
    {"uuml", 0x00FC},           /* latin small letter u with diaeresis, U+00FC ISOlat1 */
    {"yacute", 0x00FD},         /* latin small letter y with acute, U+00FD ISOlat1 */
    {"thorn", 0x00FE},          /* latin small letter thorn, U+00FE ISOlat1 */
    {"yuml", 0x00FF},           /* latin small letter y with diaeresis, U+00FF ISOlat1 */

    {"OElig", 0x0152},          /* latin capital ligature OE, U+0152 ISOlat2 */
    {"oelig", 0x0153},          /* latin small ligature oe, U+0153 ISOlat2 */
    {"Scaron", 0x0160},         /* latin capital letter S with caron, U+0160 ISOlat2 */
    {"scaron", 0x0161},         /* latin small letter s with caron, U+0161 ISOlat2 */
    {"Yuml", 0x0178},           /* latin capital letter Y with diaeresis, U+0178 ISOlat2 */

    /*
     * Anything below should really be kept as entity references
     */

    /*
       -- Latin Extended-B
     */
    {"fnof", 0x0192},           /* latin small f with hook = function = florin, U+0192 ISOtech */

    {"circ", 0x02C6},           /* modifier letter circumflex accent, U+02C6 ISOpub */
    {"tilde", 0x02DC},          /* small tilde, U+02DC ISOdia */

    /*
       -- Greek symbols
     */
    {"Alpha", 0x0391},          /* greek capital letter alpha, U+0391 */
    {"Beta", 0x0392},           /* greek capital letter beta, U+0392 */
    {"Gamma", 0x0393},          /* greek capital letter gamma, U+0393 ISOgrk3 */
    {"Delta", 0x0394},          /* greek capital letter delta, U+0394 ISOgrk3 */
    {"Epsilon", 0x0395},        /* greek capital letter epsilon, U+0395 */
    {"Zeta", 0x0396},           /* greek capital letter zeta, U+0396 */
    {"Eta", 0x0397},            /* greek capital letter eta, U+0397 */
    {"Theta", 0x0398},          /* greek capital letter theta, U+0398 ISOgrk3 */
    {"Iota", 0x0399},           /* greek capital letter iota, U+0399 */
    {"Kappa", 0x039A},          /* greek capital letter kappa, U+039A */
    {"Lambda", 0x039B},         /* greek capital letter lambda, U+039B ISOgrk3 */
    {"Mu", 0x039C},             /* greek capital letter mu, U+039C */
    {"Nu", 0x039D},             /* greek capital letter nu, U+039D */
    {"Xi", 0x039E},             /* greek capital letter xi, U+039E ISOgrk3 */
    {"Omicron", 0x039F},        /* greek capital letter omicron, U+039F */
    {"Pi", 0x03A0},             /* greek capital letter pi, U+03A0 ISOgrk3 */
    {"Rho", 0x03A1},            /* greek capital letter rho, U+03A1 */
    /* -- there is no Sigmaf, and no U+03A2 character either */
    {"Sigma", 0x03A3},          /* greek capital letter sigma, U+03A3 ISOgrk3 */
    {"Tau", 0x03A4},            /* greek capital letter tau, U+03A4 */
    {"Upsilon", 0x03A5},        /* greek capital letter upsilon, U+03A5 ISOgrk3 */
    {"Phi", 0x03A6},            /* greek capital letter phi, U+03A6 ISOgrk3 */
    {"Chi", 0x03A7},            /* greek capital letter chi, U+03A7 */
    {"Psi", 0x03A8},            /* greek capital letter psi, U+03A8 ISOgrk3 */
    {"Omega", 0x03A9},          /* greek capital letter omega, U+03A9 ISOgrk3 */

    {"alpha", 0x03B1},          /* greek small letter alpha, U+03B1 ISOgrk3 */
    {"beta", 0x03B2},           /* greek small letter beta, U+03B2 ISOgrk3 */
    {"gamma", 0x03B3},          /* greek small letter gamma, U+03B3 ISOgrk3 */
    {"delta", 0x03B4},          /* greek small letter delta, U+03B4 ISOgrk3 */
    {"epsilon", 0x03B5},        /* greek small letter epsilon, U+03B5 ISOgrk3 */
    {"zeta", 0x03B6},           /* greek small letter zeta, U+03B6 ISOgrk3 */
    {"eta", 0x03B7},            /* greek small letter eta, U+03B7 ISOgrk3 */
    {"theta", 0x03B8},          /* greek small letter theta, U+03B8 ISOgrk3 */
    {"iota", 0x03B9},           /* greek small letter iota, U+03B9 ISOgrk3 */
    {"kappa", 0x03BA},          /* greek small letter kappa, U+03BA ISOgrk3 */
    {"lambda", 0x03BB},         /* greek small letter lambda, U+03BB ISOgrk3 */
    {"mu", 0x03BC},             /* greek small letter mu, U+03BC ISOgrk3 */
    {"nu", 0x03BD},             /* greek small letter nu, U+03BD ISOgrk3 */
    {"xi", 0x03BE},             /* greek small letter xi, U+03BE ISOgrk3 */
    {"omicron", 0x03BF},        /* greek small letter omicron, U+03BF NEW */
    {"pi", 0x03C0},             /* greek small letter pi, U+03C0 ISOgrk3 */
    {"rho", 0x03C1},            /* greek small letter rho, U+03C1 ISOgrk3 */
    {"sigmaf", 0x03C2},         /* greek small letter final sigma, U+03C2 ISOgrk3 */
    {"sigma", 0x03C3},          /* greek small letter sigma, U+03C3 ISOgrk3 */
    {"tau", 0x03C4},            /* greek small letter tau, U+03C4 ISOgrk3 */
    {"upsilon", 0x03C5},        /* greek small letter upsilon, U+03C5 ISOgrk3 */
    {"phi", 0x03C6},            /* greek small letter phi, U+03C6 ISOgrk3 */
    {"chi", 0x03C7},            /* greek small letter chi, U+03C7 ISOgrk3 */
    {"psi", 0x03C8},            /* greek small letter psi, U+03C8 ISOgrk3 */
    {"omega", 0x03C9},          /* greek small letter omega, U+03C9 ISOgrk3 */
    {"thetasym", 0x03D1},       /* greek small letter theta symbol, U+03D1 NEW */
    {"upsih", 0x03D2},          /* greek upsilon with hook symbol, U+03D2 NEW */
    {"piv", 0x03D6},            /* greek pi symbol, U+03D6 ISOgrk3 */

    {"ensp", 0x2002},           /* en space, U+2002 ISOpub */
    {"emsp", 0x2003},           /* em space, U+2003 ISOpub */
    {"thinsp", 0x2009},         /* thin space, U+2009 ISOpub */
    {"zwnj", 0x200C},           /* zero width non-joiner, U+200C NEW RFC 2070 */
    {"zwj", 0x200D},            /* zero width joiner, U+200D NEW RFC 2070 */
    {"lrm", 0x200E},            /* left-to-right mark, U+200E NEW RFC 2070 */
    {"rlm", 0x200F},            /* right-to-left mark, U+200F NEW RFC 2070 */
    {"ndash", 0x2013},          /* en dash, U+2013 ISOpub */
    {"mdash", 0x2014},          /* em dash, U+2014 ISOpub */
    {"lsquo", 0x2018},          /* left single quotation mark, U+2018 ISOnum */
    {"rsquo", 0x2019},          /* right single quotation mark, U+2019 ISOnum */
    {"sbquo", 0x201A},          /* single low-9 quotation mark, U+201A NEW */
    {"ldquo", 0x201C},          /* left double quotation mark, U+201C ISOnum */
    {"rdquo", 0x201D},          /* right double quotation mark, U+201D ISOnum */
    {"bdquo", 0x201E},          /* double low-9 quotation mark, U+201E NEW */
    {"dagger", 0x2020},         /* dagger, U+2020 ISOpub */
    {"Dagger", 0x2021},         /* double dagger, U+2021 ISOpub */

    {"bull", 0x2022},           /* bullet = black small circle, U+2022 ISOpub */
    {"hellip", 0x2026},         /* horizontal ellipsis = three dot leader, U+2026 ISOpub */

    {"permil", 0x2030},         /* per mille sign, U+2030 ISOtech */

    {"prime", 0x2032},          /* prime = minutes = feet, U+2032 ISOtech */
    {"Prime", 0x2033},          /* double prime = seconds = inches, U+2033 ISOtech */

    {"lsaquo", 0x2039},         /* single left-pointing angle quotation mark, U+2039 ISO proposed */
    {"rsaquo", 0x203A},         /* single right-pointing angle quotation mark, U+203A ISO proposed */

    {"oline", 0x203E},          /* overline = spacing overscore, U+203E NEW */
    {"frasl", 0x2044},          /* fraction slash, U+2044 NEW */

    {"euro", 0x20AC},           /* euro sign, U+20AC NEW */

    /* -- Letterlike Symbols  */
    {"image", 0x2111},          /* blackletter capital I = imaginary part, U+2111 ISOamso */
    {"weierp", 0x2118},         /* script capital P = power set = Weierstrass p, U+2118 ISOamso */
    {"real", 0x211C},           /* blackletter capital R = real part symbol, U+211C ISOamso */
    {"trade", 0x2122},          /* trade mark sign, U+2122 ISOnum */

    /* -- alef symbol is NOT the same as hebrew letter alef, U+05D0 */
    {"alefsym", 0x2135},        /* alef symbol = first transfinite cardinal, U+2135 NEW */

    /* -- Arrow Symbols  */
    {"larr", 0x2190},           /* leftwards arrow, U+2190 ISOnum */
    {"uarr", 0x2191},           /* upwards arrow, U+2191 ISOnum */
    {"rarr", 0x2192},           /* rightwards arrow, U+2192 ISOnum */
    {"darr", 0x2193},           /* downwards arrow, U+2193 ISOnum */
    {"harr", 0x2194},           /* left right arrow, U+2194 ISOamsa */
    {"crarr", 0x21B5},          /* downwards arrow with corner leftwards = carriage return, U+21B5 NEW */
    {"lArr", 0x21D0},           /* leftwards double arrow, U+21D0 ISOtech */
    {"uArr", 0x21D1},           /* upwards double arrow, U+21D1 ISOamsa */
    {"rArr", 0x21D2},           /* rightwards double arrow, U+21D2 ISOtech */
    {"dArr", 0x21D3},           /* downwards double arrow, U+21D3 ISOamsa */
    {"hArr", 0x21D4},           /* left right double arrow, U+21D4 ISOamsa */

    /* -- Mathematical Operators */
    {"forall", 0x2200},         /* for all, U+2200 ISOtech */
    {"part", 0x2202},           /* partial differential, U+2202 ISOtech */
    {"exist", 0x2203},          /* there exists, U+2203 ISOtech */
    {"empty", 0x2205},          /* empty set = null set = diameter, U+2205 ISOamso */
    {"nabla", 0x2207},          /* nabla = backward difference, U+2207 ISOtech */
    {"isin", 0x2208},           /* element of, U+2208 ISOtech */
    {"notin", 0x2209},          /* not an element of, U+2209 ISOtech */
    {"ni", 0x220B},             /* contains as member, U+220B ISOtech */
    {"prod", 0x220F},           /* n-ary product = product sign, U+220F ISOamsb */
    {"sum", 0x2211},            /* n-ary summation, U+2211 ISOamsb */
    {"minus", 0x2212},          /* minus sign, U+2212 ISOtech */
    {"lowast", 0x2217},         /* asterisk operator, U+2217 ISOtech */
    {"radic", 0x221A},          /* square root = radical sign, U+221A ISOtech */
    {"prop", 0x221D},           /* proportional to, U+221D ISOtech */
    {"infin", 0x221E},          /* infinity, U+221E ISOtech */
    {"ang", 0x2220},            /* angle, U+2220 ISOamso */
    {"and", 0x2227},            /* logical and = wedge, U+2227 ISOtech */
    {"or", 0x2228},             /* logical or = vee, U+2228 ISOtech */
    {"cap", 0x2229},            /* intersection = cap, U+2229 ISOtech */
    {"cup", 0x222A},            /* union = cup, U+222A ISOtech */
    {"int", 0x222B},            /* integral, U+222B ISOtech */
    {"there4", 0x2234},         /* therefore, U+2234 ISOtech */
    {"sim", 0x223C},            /* tilde operator = varies with = similar to, U+223C ISOtech */
    {"cong", 0x2245},           /* approximately equal to, U+2245 ISOtech */
    {"asymp", 0x2248},          /* almost equal to = asymptotic to, U+2248 ISOamsr */
    {"ne", 0x2260},             /* not equal to, U+2260 ISOtech */
    {"equiv", 0x2261},          /* identical to, U+2261 ISOtech */
    {"le", 0x2264},             /* less-than or equal to, U+2264 ISOtech */
    {"ge", 0x2265},             /* greater-than or equal to, U+2265 ISOtech */
    {"sub", 0x2282},            /* subset of, U+2282 ISOtech */
    {"sup", 0x2283},            /* superset of, U+2283 ISOtech */
    {"nsub", 0x2284},           /* not a subset of, U+2284 ISOamsn */
    {"sube", 0x2286},           /* subset of or equal to, U+2286 ISOtech */
    {"supe", 0x2287},           /* superset of or equal to, U+2287 ISOtech */
    {"oplus", 0x2295},          /* circled plus = direct sum, U+2295 ISOamsb */
    {"otimes", 0x2297},         /* circled times = vector product, U+2297 ISOamsb */
    {"perp", 0x22A5},           /* up tack = orthogonal to = perpendicular, U+22A5 ISOtech */
    {"sdot", 0x22C5},           /* dot operator, U+22C5 ISOamsb */
    {"lceil", 0x2308},          /* left ceiling = apl upstile, U+2308 ISOamsc */
    {"rceil", 0x2309},          /* right ceiling, U+2309 ISOamsc */
    {"lfloor", 0x230A},         /* left floor = apl downstile, U+230A ISOamsc */
    {"rfloor", 0x230B},         /* right floor, U+230B ISOamsc */
    {"lang", 0x2329},           /* left-pointing angle bracket = bra, U+2329 ISOtech */
    {"rang", 0x232A},           /* right-pointing angle bracket = ket, U+232A ISOtech */
    {"loz", 0x25CA},            /* lozenge, U+25CA ISOpub */

    /* -- Miscellaneous Symbols */
    {"spades", 0x2660},         /* black spade suit, U+2660 ISOpub */
    {"clubs", 0x2663},          /* black club suit = shamrock, U+2663 ISOpub */
    {"hearts", 0x2665},         /* black heart suit = valentine, U+2665 ISOpub */
    {"diams", 0x2666},          /* black diamond suit, U+2666 ISOpub */

};


/*
** ----------------------------------------------
** 
**  Module management code starts here
**
** ----------------------------------------------
*/

/* 
  -- init structures for Entities
*/

void    initModule_Entities(SWISH * sw)
{
    struct MOD_Entities *md;


    md = (struct MOD_Entities *) emalloc(sizeof(struct MOD_Entities));

    sw->Entities = md;

    md->convertEntities = CONVERTHTMLENTITIES;

    /* 
       -- init entity hash 
       -- this is module local only
     */

    if ( !ce_hasharray_initialized++ )
    {
        int     i,
                tab_len;
        CEntity *ce_p;
        struct CEHE **hash_pp,
               *tmp_p;

        /* empty positions */
        for (i = 0; i < sizeof(ce_hasharray) / sizeof(ce_hasharray[0]); i++)
            ce_hasharray[i] = (struct CEHE *) NULL;


        /* 
           -- Fill the entity table into the hash.
           -- Walk entity_table from end to start: every entry is inserted at
           -- the head of its chain, so the most used entities (the ISO ones
           -- at the beginning of the table) end up first in each chain.
           -- The improvement is minimal, because the hash table re-chains
           -- during usage.
         */

        tab_len = sizeof(entity_table) / sizeof(entity_table[0]);

        for (i = tab_len - 1; i >= 0; i--)
        {
            ce_p = &entity_table[i];
            hash_pp = &ce_hasharray[(int) *(ce_p->name) & 0x7F];
            /* insert entity-ptr at start of ptr sequence in hash */
            tmp_p = *hash_pp;
            *hash_pp = (struct CEHE *) emalloc(sizeof(struct CEHE));
            (*hash_pp)->ce = ce_p;
            (*hash_pp)->next = tmp_p;
        }

    }                           /* end init hash block */

}
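
/*
 * Illustration only, not part of Swish-e: a minimal sketch of the bucket
 * scheme described above, assuming a tiny three-entry table.  Entries are
 * bucketed on the first character of the name (masked with 0x7F) and
 * inserted at the head of the chain while walking the table backwards.
 * All demo_* names are hypothetical.
 */

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* All demo_* names are hypothetical and exist only in this sketch. */
struct demo_entity { const char *name; int code; };
struct demo_node { const struct demo_entity *ent; struct demo_node *next; };

const struct demo_entity demo_table[] = {
    {"amp", 0x0026}, {"alpha", 0x03B1}, {"beta", 0x03B2}
};

struct demo_node *demo_buckets[128];    /* zero-initialized: all chains empty */

/* Build the chains; returns 0 on success, -1 if an allocation fails. */
int demo_build(void)
{
    int i;
    int n = (int) (sizeof(demo_table) / sizeof(demo_table[0]));

    /* Walk backwards so that earlier table entries end up first in a chain. */
    for (i = n - 1; i >= 0; i--)
    {
        struct demo_node **slot;
        struct demo_node *node = malloc(sizeof *node);

        if (!node)
            return -1;
        slot = &demo_buckets[(unsigned char) demo_table[i].name[0] & 0x7F];
        node->ent = &demo_table[i];
        node->next = *slot;             /* insert at head of chain */
        *slot = node;
    }
    return 0;
}

/* Return the code for name, or -1 if the name is not in the table. */
int demo_lookup(const char *name)
{
    const struct demo_node *n;

    assert(name != NULL);
    for (n = demo_buckets[(unsigned char) name[0] & 0x7F]; n; n = n->next)
        if (strcmp(n->ent->name, name) == 0)
            return n->ent->code;
    return -1;
}

/* Release every chain and leave all buckets empty. */
void demo_free(void)
{
    size_t i;

    for (i = 0; i < sizeof(demo_buckets) / sizeof(demo_buckets[0]); i++)
        while (demo_buckets[i])
        {
            struct demo_node *next = demo_buckets[i]->next;
            free(demo_buckets[i]);
            demo_buckets[i] = next;
        }
}
```

/*
 * Usage: call demo_build() once, then demo_lookup("beta"); call demo_free()
 * when done.  Because "amp" precedes "alpha" in the table and the table is
 * walked backwards, "amp" ends up at the head of the 'a' chain.
 */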



/* 
  -- release structures for Entities
  -- release all wired memory
*/

void    freeModule_Entities(SWISH * sw)
{

    /* free module data structure */

    efree(sw->Entities);
    sw->Entities = NULL;


    /* 
       -- free local entity hash table 
     */
    {
        int     i;
        struct CEHE *hash_p,
               *tmp_p;
        /* free ptr "chains" in array */
        for (i = 0; i < sizeof(ce_hasharray) / sizeof(ce_hasharray[0]); i++)
        {
            hash_p = ce_hasharray[i];
            while (hash_p)
            {
                tmp_p = hash_p->next;
                efree(hash_p);
                hash_p = tmp_p;
            }
            ce_hasharray[i] = (struct CEHE *) NULL;
        }

    }                           /* end free hash block */

}


/*
** ----------------------------------------------
** 
**  Module config code starts here
**
** ----------------------------------------------
*/


/*
 -- Config Directives
 -- Configuration directives for this Module
 -- return: 0/1 = none/config applied
*/

int     configModule_Entities(SWISH * sw, StringList * sl)
{
    struct MOD_Entities *md = sw->Entities;
    char   *w0;
    int     retval;


    w0 = sl->word[0];
    retval = 1;


    if (strcasecmp(w0, "ConvertHTMLEntities") == 0)
    {
        md->convertEntities = getYesNoOrAbort(sl, 1, 1);
    }
    else
    {
        retval = 0;             /* not an Entities directive */
    }


    return retval;
}






/*
** ----------------------------------------------
** 
**  Module code starts here
**
** ----------------------------------------------
*/


/*
  -- convert a string containing HTML/XML entities
  -- conversion is done in place, on the string itself.
  -- conversion is only done if the ConvertHTMLEntities directive is set to "yes"
  -- return ptr to converted string.
*/

unsigned char *sw_ConvHTMLEntities2ISO(SWISH * sw, unsigned char *s)
{
    return (sw->Entities->convertEntities) ? strConvHTMLEntities2ISO(s) : s;
}



/*
  -- convert a string containing HTML/XML entities
  -- conversion is done in place, on the string itself.
  -- return ptr to converted string.
*/

unsigned char *strConvHTMLEntities2ISO(unsigned char *buf)
{
    unsigned char *s,
           *t;
    unsigned char *d;
    int     code;


    s = d = buf;

    while (*s)
    {

        /* if not entity start, next */
        if (*s != '&')
        {
            *d++ = *s++;
        }
        else
        {
            /* entity found, identify and decode */
            /* drop entities that decode to zero or to a code above 255 (outside Latin-1) */
            code = charEntityDecode(s, &t);
            if (code && (code < 256))
                *d++ = (unsigned char) code;
            s = t;
        }
    }
    *d = '\0';

    return buf;
}
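
/*
 * Illustration only, not part of Swish-e: a minimal sketch of in-place
 * decoding with a read pointer and a write pointer, restricted to numeric
 * entities (&#dec; and &#xhex;).  It works in place because the decoded
 * text is never longer than the input.  demo_decode_numeric is a
 * hypothetical helper, and unlike the code above it copies a malformed
 * "&#" sequence through unchanged.
 */

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: decode numeric entities in buf, in place. */
unsigned char *demo_decode_numeric(unsigned char *buf)
{
    unsigned char *src, *dst;

    assert(buf != NULL);
    src = dst = buf;

    while (*src)
    {
        if (src[0] == '&' && src[1] == '#')
        {
            const char *digits = (const char *) src + 2;
            char   *end;
            int     base = 10;
            unsigned long code;

            if (*digits == 'x' || *digits == 'X')
            {
                digits++;               /* hexadecimal form */
                base = 16;
            }

            code = strtoul(digits, &end, base);
            if (end != digits)          /* at least one digit was parsed */
            {
                /* keep only non-zero codes that fit in one Latin-1 byte */
                if (code > 0 && code < 256)
                    *dst++ = (unsigned char) code;
                if (*end == ';')
                    end++;              /* skip the terminator */
                src = (unsigned char *) end;
                continue;
            }
        }
        *dst++ = *src++;                /* ordinary character */
    }
    *dst = '\0';

    return buf;
}
```

/*
 * Usage: given a writable buffer holding "caf&#233; &#x41;&#8364;!", the
 * call leaves "caf", the byte 0xE9, a space, "A" and "!" in the buffer;
 * the euro sign (8364) is dropped because it does not fit in one byte.
 */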



/* 
 -- decode entity string to character code:
 --  &#dec;   &#xhex;   &#Xhex;   &named;
 -- Named lookup is hash optimized, with dynamic re-chaining
 -- (found entries move to the front of their chain) for performance.
 -- return: entity character (decoded)
 --    "end" (if != NULL) is set to the position past the entity,
 --    or past the returned char.
 --    on illegal entities, just return the char at s
*/

int     charEntityDecode(unsigned char *s, unsigned char **end)
{
    unsigned char *s1, *t, *e_end;
    unsigned char s_cmp[MAX_ENTITY_LEN + 1];
    int     len;
    int     code;


    /*
       -- no entity ctrl start char?, err: return char 
     */
    if (*s != '&')
    {
        if (end)
            *end = s + 1;
        return (int) *s;
    }



    /* ok, seems valid entity starting char */
    code = 0;
    e_end = NULL;

    if (*(s + 1) == '#')
    {                           /* numeric entity  "&#" */

        s += 2;                 /* after "&#" */
        switch (*s)
        {
        case 'x':
        case 'X':
            ++s;                /* skip x */
            code = (int) strtoul((char *)s, (char **) &e_end, (int) 16);
            break;
        default:
            code = (int) strtoul((char *)s, (char **) &e_end, (int) 10);
            break;
        }

    }
    else
    {

        /* 
           -- ok, seems to be a named entity, find terminating char 
           -- t = NULL if  not found...
           -- if no char found: return '&' (illegal entity) 
         */

        len = 0;
        t = NULL;
        s1 = s;
        while (len < MAX_ENTITY_LEN)
        {
            s_cmp[len] = *(++s1);
            if (IS_EOE(*s1))
            {
                t = s1;         /* End of named entity */
                break;
            }
            if (!*s1)
                break;          /* maybe this is also checked by is_EOE! */
            len++;
        }
        s_cmp[len] = '\0';

        /*
           -- hash search block
           -- case-sensitive search  (hash value = first char of entity name)
           -- (& 0x7F keeps the index inside ce_hasharray for any input char)
           -- improve performance by re-chaining found elements
         */

        if (t)
        {
            struct CEHE *hash_p;
            struct CEHE **hash_pp,
                   *last_p;

            hash_pp = &ce_hasharray[*(s + 1) & 0x7F];
            last_p = NULL;
            hash_p = *hash_pp;
            while (hash_p)
            {
                if (!strcmp( (char *)hash_p->ce->name, (char *)s_cmp))
                {
                    code = hash_p->ce->code;
                    if (last_p)
                    {           /* rechain hash sequence list (last found = first) */
                        last_p->next = hash_p->next; /* take elem out of seq */
                        hash_p->next = *hash_pp; /* old 1. = 2.          */
                        *hash_pp = hash_p; /* found = 1st          */
                    }
                    e_end = t;  /* found -> set end     */
                    break;
                }
                last_p = hash_p;
                hash_p = hash_p->next;
            }

        }
    }                           /* end if */


    if (!e_end)
    {
        code = *s;
        e_end = s + 1;
    }
    else
    {
        if (*e_end == ';')
            e_end++;            /* W3C  EndOfEntity */
    }


    if (end)
        *end = e_end;
    return code;
}
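
/*
 * Illustration only, not part of Swish-e: a minimal sketch of the
 * re-chaining step used above, often called move-to-front.  A found node
 * that is not already first is unlinked and re-inserted at the head of its
 * chain, so repeated lookups of the same key get cheaper.  struct demo_item
 * and demo_find_mtf are hypothetical names.
 */

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct demo_item
{
    const char *key;
    int     value;
    struct demo_item *next;
};

/* Find key in the chain at *head; a hit is moved to the front.
   Returns the item, or NULL if the key is not present. */
struct demo_item *demo_find_mtf(struct demo_item **head, const char *key)
{
    struct demo_item *prev = NULL;
    struct demo_item *cur;

    assert(head != NULL && key != NULL);

    for (cur = *head; cur; prev = cur, cur = cur->next)
    {
        if (strcmp(cur->key, key) == 0)
        {
            if (prev)
            {
                prev->next = cur->next; /* take the item out of the chain */
                cur->next = *head;      /* old first becomes second */
                *head = cur;            /* found item becomes first */
            }
            return cur;
        }
    }
    return NULL;
}
```

/*
 * Usage: with a chain amp -> lt -> gt, demo_find_mtf(&head, "gt") returns
 * the "gt" item and leaves the chain as gt -> amp -> lt.  A miss returns
 * NULL and leaves the chain untouched.
 */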


/*
  -- check if a char is the end of a html entity.
  -- behavior can be W3C pedantic or tolerant.
  -- mapped via macro to avoid function calls on strict ==';' behavior
  -- return: cmp value
*/

static int is_EOE(int c)
{
/* be tolerant ! */
    return ((!isprint(c)) || ispunct(c) || isspace(c)) ? 1 : 0;
}
swish-e-2.4.7/src/hash.c
/*
** Copyright (C) 1995, 1996, 1997, 1998 Hewlett-Packard Company
** Originally by Kevin Hughes, kev@kevcom.com, 3/11/94
**
**

    This file is part of Swish-e.

    Swish-e is free software; you can redistribute it and/or modify
    it under the terms of the GNU General Public License as published by
    the Free Software Foundation; either version 2 of the License, or
    (at your option) any later version.

    Swish-e is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
    GNU General Public License for more details.

    You should have received a copy of the GNU General Public License
    along  with Swish-e; if not, write to the Free Software
    Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
    
    See the COPYING file that accompanies the Swish-e distribution for details
    of the GNU GPL and the special exception available for linking against
    the Swish-e library.
    
** Mon May  9 15:51:39 CDT 2005
** added GPL


**---------------------------------------------------------
** Added addStopList to support printing of common words
** G. Hill 4/7/97  ghill@library.berkeley.edu
**
** change sprintf to snprintf to avoid corruption
** SRE 11/17/99
**
** 04/00 - Jose Ruiz
** change hash for bighash in mergeresultlists for better performance
** on big searches (a* or b* or c*)
**
*/

#include "swish.h"
#include "swstring.h"
#include "hash.h"
#include "mem.h"
#include "search.h"

/* Hashes a string. Common routine
*/

unsigned string_hash(char *s, int hash_size)
{
    unsigned hashval;

    for (hashval = 0; *s != '\0'; s++)
        hashval = (int) ((unsigned char) *s) + 31 * hashval;
    return hashval % hash_size;
}

/* Hashes a string.
*/
unsigned hash(char *s)
{
    return string_hash(s,HASHSIZE);
}

/* Hashes a string for a larger hash table.
*/
unsigned bighash(char *s)
{
    return string_hash(s,BIGHASHSIZE);
}

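/* Hashes an int. Common routine.
** The key is converted to unsigned first, so that a negative key
** cannot produce an out-of-range bucket number.
*/
unsigned int_hash(int i, int hash_size)
{
    return (unsigned) i % (unsigned) hash_size;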
}
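
/*
 * Illustration only, not part of Swish-e: why the sign of the key matters
 * when an int is reduced to a bucket number.  In C99 and later, -7 % 5 is
 * -2, and returning that as unsigned gives a huge value far outside the
 * table.  Converting the key to unsigned before the modulo keeps the result
 * in range.  demo_int_bucket is a hypothetical name.
 */

```c
#include <assert.h>

/* Map any int key, negative keys included, to a bucket in [0, table_size). */
unsigned demo_int_bucket(int key, unsigned table_size)
{
    assert(table_size > 0);
    return (unsigned) key % table_size;
}
```

/*
 * Non-negative keys map to the same bucket as key % table_size; negative
 * keys map to some valid bucket, which is all a hash table needs.
 */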

/* Hashes an int.
*/
unsigned numhash(int i)
{
    return int_hash(i, HASHSIZE);
}

/* Hashes an int for a larger hash table.
*/
unsigned bignumhash(int i)
{
    return int_hash(i, BIGHASHSIZE);
}

/* Hashes a string for a larger hash table (for search).
*/
unsigned verybighash(char *s)
{
    return string_hash(s, VERYBIGHASHSIZE);
}


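/******************************************************************
* add_word_to_hash_table -  Adds a word to a hash table.
*
*   Call with:
*       table_ptr - address of the WORD_HASH_TABLE (created on first use)
*       word      - word to add
*       hash_size - number of buckets; 0 selects HASHSIZE when the
*                   table is created
*
*   Returns:
*       swline that was added, or the existing swline if the word
*       was already in the table
*******************************************************************/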

struct swline *add_word_to_hash_table( WORD_HASH_TABLE *table_ptr, char *word, int hash_size)
{
    struct swline **hash_array = table_ptr->hash_array;
    unsigned hashval;
    struct swline *sp;
    int len;

    /* Create the array if it doesn't exist */
    if ( !hash_array )
    {
        int ttl_bytes = sizeof(struct swline *) * (hash_size = (hash_size ? hash_size : HASHSIZE));
       
        table_ptr->mem_zone = (void *) Mem_ZoneCreate("Word Hash Zone", 0, 0); 
        //hash_array = (struct swline  **)emalloc( ttl_bytes );
        hash_array = (struct swline  **) Mem_ZoneAlloc( (MEM_ZONE *)table_ptr->mem_zone, ttl_bytes );
        memset( hash_array, 0, ttl_bytes );
        table_ptr->hash_array = hash_array;
        table_ptr->hash_size = hash_size;
        table_ptr->count = 0;
    }
    else
        if ( (sp = is_word_in_hash_table( *table_ptr, word )) )
            return sp;

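    /* Use the size stored in the table, so the bucket always matches the one
       is_word_in_hash_table() searches, even if the caller passes 0 or a
       different hash_size for an existing table. */
    hashval = string_hash(word, table_ptr->hash_size);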

    /* Create a new entry */            
    len = strlen(word);
    sp = (struct swline *) Mem_ZoneAlloc((MEM_ZONE *)table_ptr->mem_zone, sizeof(struct swline) + len);

    memcpy(sp->line,word,len + 1);

    /* Add word to head of list */
    
    sp->next = hash_array[hashval];
    hash_array[hashval] = sp;

    table_ptr->count++;

    return sp;
}
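
/*
 * Illustration only, not part of Swish-e: the entry above is allocated as
 * sizeof(struct swline) + len so that the word is stored inline, directly
 * after the list pointer.  A minimal sketch of the same idea, written with
 * a C99 flexible array member and plain malloc() instead of a memory zone.
 * struct demo_word and demo_word_new are hypothetical names.
 */

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct demo_word
{
    struct demo_word *next;
    char    text[];             /* C99 flexible array member */
};

/* Allocate a node holding a copy of word, linked in front of next.
   Returns NULL if the allocation fails. */
struct demo_word *demo_word_new(const char *word, struct demo_word *next)
{
    size_t  len;
    struct demo_word *w;

    assert(word != NULL);

    len = strlen(word);
    w = malloc(sizeof *w + len + 1);    /* header + characters + '\0' */
    if (!w)
        return NULL;

    memcpy(w->text, word, len + 1);     /* copy including the terminator */
    w->next = next;
    return w;
}
```

/*
 * One allocation per word keeps the node and its text together; the caller
 * releases each node with a single free().
 */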

/******************************************************************
* is_word_in_hash_table - Looks up a word in a hash table.
*
*   Call with:
*       table - the WORD_HASH_TABLE (passed by value)
*       word  - word to look up
*
*   Returns:
*       the word's swline if found, NULL if not found
*
*******************************************************************/

struct swline * is_word_in_hash_table( WORD_HASH_TABLE table, char *word)
{
    unsigned hashval;
    struct swline *sp;

    if ( !table.hash_array )
        return NULL;

    hashval = string_hash(word, table.hash_size);
    sp = table.hash_array[hashval];

    while (sp != NULL)
    {
        if (!strcmp(sp->line, word))
            return sp;
        sp = sp->next;
    }
    return NULL;
}

/******************************************************************
* free_word_hash_table - Frees all memory used by a word hash table.
*
*   Call with:
*       table_ptr - address of the WORD_HASH_TABLE
*
*   Returns:
*       nothing; the table is reset to its empty state
*
*******************************************************************/

void free_word_hash_table( WORD_HASH_TABLE *table_ptr)
{
    struct swline **hash_array = table_ptr->hash_array;

    if ( !hash_array )
        return;

    Mem_ZoneFree((MEM_ZONE **)&table_ptr->mem_zone);
    table_ptr->hash_array = NULL;
    table_ptr->hash_size = 0;
    table_ptr->count = 0;
}

swish-e-2.4.7/src/double_metaphone.h
/*
$Id: double_metaphone.h 1736 2005-05-12 15:41:22Z karman $


    This file is part of Swish-e.

    Swish-e is free software; you can redistribute it and/or modify
    it under the terms of the GNU General Public License as published by
    the Free Software Foundation; either version 2 of the License, or
    (at your option) any later version.

    Swish-e is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
    GNU General Public License for more details.

    You should have received a copy of the GNU General Public License
    along  with Swish-e; if not, write to the Free Software
    Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
    
    See the COPYING file that accompanies the Swish-e distribution for details
    of the GNU GPL and the special exception available for linking against
    the Swish-e library.
    
** Mon May  9 18:19:34 CDT 2005
** added GPL


*/

#ifndef DOUBLE_METAPHONE__H
#define DOUBLE_METAPHONE__H

#ifdef __cplusplus
extern "C" {
#endif



typedef struct
{
    char *str;
    int length;
    int bufsize;
    int free_string_on_destroy;
}
metastring;      


void
DoubleMetaphone(const char *str,
                char **codes);

#ifdef __cplusplus
}
#endif /* __cplusplus */



#endif /* DOUBLE_METAPHONE__H */
swish-e-2.4.7/src/getruntime.c
/* Return time used so far, in microseconds.
   Copyright (C) 1994, 1999 Free Software Foundation, Inc.

This file is part of the libiberty library.
Libiberty is free software; you can redistribute it and/or
modify it under the terms of the GNU Library General Public
License as published by the Free Software Foundation; either
version 2 of the License, or (at your option) any later version.

Libiberty is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
Library General Public License for more details.

You should have received a copy of the GNU Library General Public
License along with libiberty; see the file COPYING.LIB.  If
not, write to the Free Software Foundation, Inc., 59 Temple Place - Suite 330,
Boston, MA 02111-1307, USA.  */


/*
Mon May  9 15:57:53 CDT 2005

did not update license info here because
 (a) the original is still ok per GPL
 (b) the original is lifted from another package and it's better to preserve
 (c) this file is only used in swish-e binary, not libswish-e
 
*/


#include "acconfig.h"

/* For testing */
// #undef HAVE_GETRUSAGE
// #undef HAVE_SYS_RESOURCE_H
// #undef HAVE_TIMES

/* There are several ways to get elapsed execution time; unfortunately no
   single way is available for all host systems, nor are there reliable
   ways to find out which way is correct for a given host. */

#include "getruntime.h"
#include <time.h>

#if defined (HAVE_GETRUSAGE) && defined (HAVE_SYS_RESOURCE_H)
#include <sys/time.h>
#include <sys/resource.h>
#endif

#ifdef HAVE_TIMES
#ifdef HAVE_SYS_PARAM_H
#include <sys/param.h>
#endif
#include <sys/times.h>
#endif

#ifdef HAVE_UNISTD_H
#include <unistd.h>
#endif

/* This is a fallback; if wrong, it will likely produce obviously wrong
   results. */

#ifndef CLOCKS_PER_SEC
#define CLOCKS_PER_SEC 1
#endif

#ifdef _SC_CLK_TCK
#define GNU_HZ  sysconf(_SC_CLK_TCK)
#else
#ifdef HZ
#define GNU_HZ  HZ
#else
#ifdef CLOCKS_PER_SEC
#define GNU_HZ  CLOCKS_PER_SEC
#endif
#endif
#endif

cpu_seconds 
get_cpu_secs ()
{
#if defined (HAVE_GETRUSAGE) && defined (HAVE_SYS_RESOURCE_H)
  struct rusage rusage;
  cpu_seconds secs;

  getrusage (RUSAGE_SELF, &rusage);
  secs = (cpu_seconds)( rusage.ru_utime.tv_sec + rusage.ru_stime.tv_sec );

  if (  rusage.ru_utime.tv_usec > 500000 )
     secs++;
  if (  rusage.ru_stime.tv_usec > 500000 )
     secs++;

  return secs;


#else /* ! HAVE_GETRUSAGE */
#ifdef HAVE_TIMES

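  /* times() itself returns the number of clock "ticks" since an arbitrary point: */
  /* on Linux since boot, on BSD since 1/1/1970.  Only the CPU times in struct tms are used here. */
  /* Again, these are clock_t, which may overflow, but under Linux a tick is 1/100 second, */
  /* so with a 32-bit clock_t that takes about 6000 hours. */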

  struct tms tms;

  times (&tms);

  return  (cpu_seconds)( (tms.tms_utime + tms.tms_stime) / GNU_HZ);


#else /* ! HAVE_TIMES */
  /* Fall back on clock and hope it's correctly implemented. */
  /* clock() returns clock_t, which is typically a long.  On Linux CLOCKS_PER_SEC is 10^6, */
  /* so with a 32-bit clock_t expect an overflow at about 35 minutes. */

  clock_t t = clock();
  if ( t < 0 )
      t = 0;

  return (cpu_seconds) (t / CLOCKS_PER_SEC );

#endif  /* HAVE_TIMES */
#endif  /* HAVE_GETRUSAGE */
}
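
/*
 * Illustration only, not part of Swish-e: a minimal sketch of the last
 * fallback above, using only standard C.  clock() reports processor time
 * in ticks and returns (clock_t)-1 when the time is unavailable; dividing
 * by CLOCKS_PER_SEC gives whole seconds.  demo_ticks_to_seconds and
 * demo_cpu_seconds are hypothetical names, and long stands in for the
 * cpu_seconds type, whose definition is not shown here.
 */

```c
#include <assert.h>
#include <time.h>

/* Convert a tick count to whole seconds, truncating the fraction. */
long demo_ticks_to_seconds(long ticks, long ticks_per_sec)
{
    assert(ticks_per_sec > 0);
    if (ticks < 0)
        return 0;               /* treat an error or wrapped value as zero */
    return ticks / ticks_per_sec;
}

/* CPU seconds used so far by this process, or 0 if clock() fails. */
long demo_cpu_seconds(void)
{
    clock_t t = clock();

    if (t == (clock_t) -1)
        return 0;
    return demo_ticks_to_seconds((long) t, (long) CLOCKS_PER_SEC);
}
```

/*
 * Usage: demo_ticks_to_seconds(250, 100) gives 2, since 2.5 seconds is
 * truncated.  demo_cpu_seconds() is subject to the same overflow limits
 * noted in the comments above.
 */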



swish-e-2.4.7/src/config.h
/*
** Copyright (C) 1995, 1996, 1997, 1998 Hewlett-Packard Company
** Originally by Kevin Hughes, kev@kevcom.com, 3/11/94
**

$Id: config.h 1945 2007-10-22 14:54:07Z karpet $

    This file is part of Swish-e.

    Swish-e is free software; you can redistribute it and/or modify
    it under the terms of the GNU General Public License as published by
    the Free Software Foundation; either version 2 of the License, or
    (at your option) any later version.

    Swish-e is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
    GNU General Public License for more details.

    You should have received a copy of the GNU General Public License
    along  with Swish-e; if not, write to the Free Software
    Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
    
    See the COPYING file that accompanies the Swish-e distribution for details
    of the GNU GPL and the special exception available for linking against
    the Swish-e library.
    
** Mon May  9 18:19:34 CDT 2005
** added GPL


**--------------------------------------------------------------------------
** Config file edited by Roy Tennant 2/20/96
** Config file edited by Giulia Hill 2/27/97 to increase length of
**        words that are indexed
** Added IGNORELASTCHAR
**        G. Hill 3/12/97 ghill@library.berkeley.edu
**
** Added OKNOMETA to allow no failing in case the META name is
** not listed in the config.h
**        G. Hill 4/15/97 ghill@library.berkeley.edu
**
** Added IGNOREFIRSTCHAR
**        G.Hill 10/16/97 ghill@library.berkeley.edu
**-----------------------------------------------------------------------
** The following are user-definable options that you can change
** to fine-tune SWISH's default options.
**
** 2001-03-13 rasc   moved search boolean words from swish.h
**
** 2001-05-23 wsm    added ranking weights
**
*/


#ifdef __VMS
#define PROPFILE_EXTENSION "_prop"
#define WORDDATA_EXTENSION "_wdata"
#define PRESORTED_EXTENSION "_psort"
#define BTREE_EXTENSION "_btree"
#define ARRAY_EXTENSION "_array"
#define HASHFILE_EXTENSION "_file"
#else
#define PROPFILE_EXTENSION ".prop"
#define WORDDATA_EXTENSION ".wdata"
#define PRESORTED_EXTENSION ".psort"
#define BTREE_EXTENSION ".btree"
#define ARRAY_EXTENSION ".array"
#define HASHFILE_EXTENSION ".file"
#endif

/* MIN_PROP_COMPRESS_SIZE sets the minimum size at which properties are
 * compressed.  Compression requires swish-e to be compiled with zlib.
 *
 * The NEAR word feature might benefit from setting these next two values higher.
 */
#define MIN_PROP_COMPRESS_SIZE 100
/* Same for worddata */
#define MIN_WORDDATA_COMPRESS_SIZE 100

/* This is the character used to replace UTF-8 characters that cannot be
 * converted to an ISO 8859-1 (Latin-1) character.
 */
#define ENCODE_ERROR_CHAR ' '

/* The file extension used for the property file is defined above
*  (PROPFILE_EXTENSION).
*/

#define MAX_SORT_STRING_LEN 100

/* MAX_SORT_STRING_LEN defines the max string length to use
*  for sorting properties.  Should be long enough to sort ALL
*  file paths or URLs.  Useful if using StoreDescription to store
*  a large amount of text.
*/

#define USE_DOCPATH_AS_TITLE 1

/* If USE_DOCPATH_AS_TITLE is defined then documents that do not have
*  a title defined (xml and txt, and HTML documents without a title)
*  will display the document path as the title in results.
*  Documents without a title will sort as a blank title, and not
*  by the document path regardless of this setting.  This is a change
*  from versions previous to 2.2.
*/

#ifdef __VMS
#define USE_TEMPFILE_EXTENSION "_temp"
#else
#define USE_TEMPFILE_EXTENSION ".temp"
#endif

/* If USE_TEMPFILE_EXTENSION is defined then swish will append the supplied
*  extension onto the index files during indexing, and when indexing is
*  complete will remove the extension by renaming the files.
*  This has two important uses when an index file already exists (and is in use):
*    1) the old index can be used while indexing is running
*    2) a failure during indexing will not destroy the existing index
*
*   Note: This is used instead of a normal temporary file because of possible
*   limitations in renaming across file systems.  Therefore, the temporary index
*   files are stored in the same directory as the final index files.
*/
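
/* Example, following DB_Create_Native() in db_native.c (non-VMS extensions):
*  for an index named "index.swish-e" the files written during indexing are
*      index.swish-e.temp          (main index)
*      index.swish-e.prop.temp     (properties)
*  The index name is only an illustration.  Per the description above, the
*  ".temp" suffix is removed by renaming once indexing completes.
*/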

#define TEMP_FILE_PREFIX "swtmp"

/* TEMP_FILE_PREFIX is prepended to all temporary files.  Makes them
*  easier to find.
*/


#define ALLOW_HTTP_INDEXING_DATA_SOURCE		1
#define ALLOW_FILESYSTEM_INDEXING_DATA_SOURCE	1
#define ALLOW_EXTERNAL_PROGRAM_DATA_SOURCE	1

/* These symbols allow compile-time elimination of indexing
** data sources. Any Data Source that is allowed by these
** symbols can be selected for indexing from the command line.
** Comment out any options you do not want to support, but
** be sure to leave at least one option.
*/

#define DEFAULT_HTTP_DELAY 5

/* DEFAULT_HTTP_DELAY is the default delay when using swishspider -S http */


#define DATE_FORMAT_STRING "%Y-%m-%d %H:%M:%S %Z"
/* default format string for dates */


#define INDEXPERMS 0644

/* After SWISH generates an index file, it changes the permissions
** of the file to this mode. Change to the mode you like
** (note that it must be an octal number). If you don't want
** permissions to be changed for you, comment out this line.
*/

#define NO_PLIMIT 101

#define PLIMIT NO_PLIMIT
#define FLIMIT 10000

/* SWISH uses these parameters to automatically mark words as
** being too common while indexing. For instance, if PLIMIT is defined
** as 80 and FLIMIT as 256, SWISH would define a common word as
** a word that occurs in over 80% of all indexed files and over
** 256 files. Making these numbers lower will most likely make your
** index files smaller. Making PLIMIT and FLIMIT small will also
** limit the CPU resources that searching consumes.
*/
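
/* Illustrative sketch only, not part of Swish-e: one way to read the rule
** described above.  The helper name is_common_word() and its parameters are
** hypothetical; the real test lives in the indexing code and may differ.
*/
#if 0
static int is_common_word(long files_with_word, long total_files)
{
    /* "over PLIMIT percent of all indexed files", without division rounding */
    int over_percent = files_with_word * 100L > (long) PLIMIT * total_files;

    /* "and over FLIMIT files" */
    int over_count = files_with_word > FLIMIT;

    return over_percent && over_count;
}
/* With PLIMIT 80 and FLIMIT 256: a word in 900 of 1000 files is common;
** a word in 200 of 220 files is not, because 200 is not over 256.
** With PLIMIT left at NO_PLIMIT (101) the percentage test can never
** succeed, so under this reading no word is marked as common.
*/
#endif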

#define VERBOSE 1

/* You can define VERBOSE to be a number from 0 to 4. 0 is totally
** silent operation.  The default before swish 2.2 was 3
*/

#define _NEAR_WORD  "near"
#define _AND_WORD   "and"
#define _OR_WORD    "or"
#define _NOT_WORD   "not"

/* 
 ** these are the default boolean operator words used by swish search
*/

#define DEFAULT_RULE AND_RULE

/* If a list of search words is specified without booleans,
** SWISH will assume they are connected by a default rule.
** This can be AND_RULE or OR_RULE.
*/

#define TITLETOPLINES 12

/* This is how many lines deep SWISH will look into an HTML file to
** attempt to find a <TITLE> tag.  This has no effect when using the libxml2 parser.
*/


#define MINWORDLIMIT 1

/* This is the minimum length of a word. Anything shorter will not
** be indexed.
** Do not change it here. Use MinWordLimit in config file
*/

#define MAXWORDLIMIT 40

/* This is the maximum length of a word. Anything longer will not
** be indexed.
** Do not change it here. Use MaxWordLimit in config file
*/

#define CONVERTHTMLENTITIES 1

/* If defined as 1, all entities in indexed
** words will be converted to an ASCII equivalent. For instance,
** with this feature you can index the word "resum&eacute;" or
** "resum&#233;" and it will be indexed as the word "resume".
** 2001-01 Do not change it here. Use ConvertHTMLEntities Yes/No in
** the config file.
*/

#define IGNOREALLV 0
#define IGNOREALLC 0
#define IGNOREALLN 0

/* If IGNOREALLV is 1, words consisting entirely of vowels won't be indexed.
** If IGNOREALLC is 1, words consisting entirely of consonants won't be indexed.
** If IGNOREALLN is 1, words consisting entirely of digits won't be indexed.
** Define as 0 to allow such words.
** Vowels are defined as "aeiou", digits are "0123456789".
*/

#define IGNOREROWV 60
#define IGNOREROWC 60
#define IGNOREROWN 60

/* IGNOREROWV is the maximum number of consecutive vowels a word can have.
** IGNOREROWC is the maximum number of consecutive consonants a word can have.
** IGNOREROWN is the maximum number of consecutive digits a word can have.
** Vowels are defined as "aeiou", digits are "0123456789".
*/

#define IGNORESAME 100

/* IGNORESAME is the maximum times a character can repeat in a word.
*/
/* Dec 6, 2001 - Grabbed "letters" from /usr/local/share/aspell/iso8859-1.dat (http://aspell.sf.net) - moseley */
#define WORDCHARS "0123456789abcdefghijklmnopqrstuvwxyzªµºÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõöøùúûüýþÿ"

/*
#define WORDCHARS "abcdefghijklmnopqrstuvwxyzÁÂÃÈýÊËÌÐÝÞÍðÎÏÒÓÔÕØÙÛîèãõšœ€ßƒŠŒŽøŸ£ÜžíÀ0123456789"
*/

/* WORDCHARS is a string of characters which SWISH permits to
** be in words.  Words are defined by these characters.
**
** Also note that if you specify the backslash character (\) or
** double quote (") you need to type a backslash before them to
** make the compiler understand them.
*/

#define BEGINCHARS "0123456789abcdefghijklmnopqrstuvwxyzªµºÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõöøùúûüýþÿ"

/* Of the characters that you decide can go into words, this is
** a list of characters that words can begin with. It should be
** a subset of (or equal to) WORDCHARS.
*/

#define ENDCHARS "0123456789abcdefghijklmnopqrstuvwxyzªµºÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõöøùúûüýþÿ"

/* This is the same as BEGINCHARS, except you're testing for
** valid characters at the ends of words.
*/

#define IGNORELASTCHAR ""

/* String of characters that are valid in the middle of a word but must be
** disregarded when they appear at the end of a word. It is important to also
** list these characters in ENDCHARS, otherwise the word will not be indexed
** because it is considered invalid.
** If there are none, leave the empty string "". Do not delete the line.
*/

#define IGNOREFIRSTCHAR ""
 
/* String of characters that are valid in the middle of a word but must be
** disregarded when they appear at the beginning of a word. It is important to
** also list these characters in BEGINCHARS, otherwise the word will not be
** indexed because it is considered invalid.
** If there are none, leave the empty string "". Do not delete the line.
*/

#define IGNORE_STOPWORDS_IN_QUERY 1
 
/* Added JM 1/10/98.  Setting this to 0 causes a stopword in
** an AND_RULE search to create an empty result.  Setting it to 1 (the value
** defined here) simply ignores the stopwords and does a search on the
** remaining words.
*/

#define INDEXTAGS 0

/* Normally, all data in tags in HTML files (except for words in
** comments or meta tags) is ignored. If you want to index HTML files with the
** text within tags and all, define this to be 1 and not 0.
** NOTE: if you set it to 1 you will not be able to do context nor
** metaNames searches, as tags are just plain text with no specific
** meaning.
*/

// #define BLANK_PROP_VALUE " *BLANK*"

/* This affects how blank properties are stored.
** Normally, blank properties are treated as if they were not even contained in
** the document.  That is:
**           <meta name="author" content="">
** is ignored, and no "author" property is stored for that document.
** If BLANK_PROP_VALUE is set, then blank properties will be stored
** but using the string provided as the property value.
** If you use a leading space, then these properties will sort
** before other properties (since leading whitespace is removed from
** properties), and after documents that do not include the property.
*/

#define RANK_TITLE		7
#define RANK_HEADER		5
#define RANK_META		3
#define RANK_COMMENTS	1
#define RANK_EMPHASIZED 0

/* These symbols affect the weights applied during ranking. Note that they are added
** together and added to a base rank of 1.0 -- thus defining a rank with a value of
** 2.0 really means it is ranked (1.0 + 2.0) times greater than normal.
** A value of 0.0 applies no additional ranking boost. Note that RANK_COMMENTS only
** applies if you are indexing comments. Be sure you understand how these interact
** in getrank; don't just go changing these values!
*/
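
/* Worked example, going only by the description above (how getrank actually
** combines the weights should be checked in the ranking source):
**   word found in a title and in a header:
**       1.0 + RANK_TITLE + RANK_HEADER = 1.0 + 7 + 5 = 13.0
**   word found only in a meta tag:
**       1.0 + RANK_META = 1.0 + 3 = 4.0
**   word found only in emphasized text:
**       1.0 + RANK_EMPHASIZED = 1.0 + 0 = 1.0  (no boost)
*/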

#define SWAP_LOC_DEFAULT 0

/* 2001/08 jmruiz -- Default chunk size - Index will work with blocks of files. This number specifies when to coalesce locations to save memory */
#define INDEX_DEFAULT_CHUNK_SIZE 10000

/* 2001/08 jmruiz -- Default optimal zone size for temporary storage of locations */
/* 1<<23  is 8 MB */
#define INDEX_DEFAULT_OPTIMAL_CHUNK_ZONE_SIZE_FOR_LOCATIONS (1<<23)

/* 2002/06 Number of swap loc files (-e) */
#define MAX_LOC_SWAP_FILES 377

/* 2001/08 jmruiz -- To avoid emalloc/erealloc in some routines some stack arrays have been added. This is their default size */
#define MAX_STACK_POSITIONS 1024

/* 2001/08 jmruiz -- Do not change this (it must be an unsigned number) */
/* This is the maximum size of a block of coalesced locations */
#define COALESCE_BUFFER_MAX_SIZE (1<<18)  /* (256 KB) */

/* 2003/06 jmruiz -- Snowball's Stemmers activation */
#define SNOWBALL 1

/* 2003/08 jmruiz -- Use cache for stemming */
#define STEMCACHE 1

/* 2001/08 jmruiz -- File System sort flag - 0 means that filenames
** will not be indexed - 1 means that filenames will be indexed */
#define SORT_FILENAMES 0

/* 2001/10 jmruiz -- Added BTREE schema to store words */

//#define USE_BTREE  /* use --enable-incremental at configure time */

/* If USE_BTREE then enable the ARRAY code for the pre-sorted indexes */

#define sw_fopen fopen
#define sw_fclose fclose
#define sw_fwrite fwrite
#define sw_fread fread
#define sw_fputc fputc
#define sw_fgetc fgetc

/* 64 bit LFS support */
#ifdef _LARGEFILE_SOURCE
#define sw_off_t off_t
#define sw_fseek fseeko
#define sw_ftell ftello
#else
#define sw_off_t long
#define sw_fseek fseek
#define sw_ftell ftell
#endif

/* 09/00 Jose Ruiz. When set to 1 part of the info is swapped to disk
** to save memory in the index process.
** Do not change it. You can activate this option through the command
** line (option -e)
*/

/* Set this to 1 if you are compiling under Win32
   define _WIN32 1
 */

/* --- BEGIN PORTING-RELATED SYMBOLS --- */



#ifdef _WIN32
#define NO_SYMBOLIC_FILE_LINKS          /* Win32 has no symbolic links */
#endif

#ifdef __VMS
#define NO_SYMBOLIC_FILE_LINKS          /* VMS has no symbolic links */
#endif

			

/* Default Delimiter of phrase search */
#define PHRASE_DELIMITER_CHAR '"'


/*
 * Binary files must be opened with the "b" option under Win32, so all
 * fopen() calls to index files have to go through these routines to
 * keep the code portable.
 * Note: text files should be opened normally, without the "b" option,
 * otherwise end-of-line processing is not done correctly (on Win32).
 */
#define F_READ_BINARY           "rb"
#define F_WRITE_BINARY          "wb"
#define F_READWRITE_BINARY      "rb+"

#define F_READ_TEXT             "r"
#define F_WRITE_TEXT            "w"
#define F_READWRITE_TEXT        "r+"



/* #define NEXTSTEP */

/* You may need to define this if compiling on a NeXTstep machine.
*/

/* --- END PORTING-RELATED SYMBOLS --- */

swish-e-2.4.7/src/parse_conffile.h
/*
$Id: parse_conffile.h 1736 2005-05-12 15:41:22Z karman $
**
**


    This file is part of Swish-e.

    Swish-e is free software; you can redistribute it and/or modify
    it under the terms of the GNU General Public License as published by
    the Free Software Foundation; either version 2 of the License, or
    (at your option) any later version.

    Swish-e is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
    GNU General Public License for more details.

    You should have received a copy of the GNU General Public License
    along  with Swish-e; if not, write to the Free Software
    Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
    
    See the COPYING file that accompanies the Swish-e distribution for details
    of the GNU GPL and the special exception available for linking against
    the Swish-e library.
    
** Mon May  9 18:19:34 CDT 2005
** added GPL



*/

void grabCmdOptions(StringList *sl, int start, struct swline **listOfWords);
void getdefaults(SWISH *sw, char *conffile, int *hasdir, int *hasindex, int hasverbose);
int getYesNoOrAbort (StringList *sl, int n, int islast);
int	strtoDocType( char * s );

void free_Extracted_Path( SWISH *sw );
void free_regex_list( regex_list **reg_list );
void freeSwishConfigOptions( SWISH *sw );




swish-e-2.4.7/src/db_native.c
/*

$Id: db_native.c 1945 2007-10-22 14:54:07Z karpet $

    This file is part of Swish-e.

    Swish-e is free software; you can redistribute it and/or modify
    it under the terms of the GNU General Public License as published by
    the Free Software Foundation; either version 2 of the License, or
    (at your option) any later version.

    Swish-e is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
    GNU General Public License for more details.

    You should have received a copy of the GNU General Public License
    along  with Swish-e; if not, write to the Free Software
    Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
    
    See the COPYING file that accompanies the Swish-e distribution for details
    of the GNU GPL and the special exception available for linking against
    the Swish-e library.
    
** Mon May  9 15:51:39 CDT 2005
** added GPL


**
** 2001-05-07 jmruiz init coding
**
*/

#include <time.h>
#include "swish.h"
#include "mem.h"
#include "file.h"
#include "error.h"
#include "swstring.h"
#include "compress.h"
#include "hash.h"
#include "db.h"
#include "swish_qsort.h"
#include "ramdisk.h"
#include "db_native.h"

#ifdef USE_BTREE
#define WRITE_WORDS_RAMDISK 0
#else
#define WRITE_WORDS_RAMDISK 1
#endif

/* MAX_PATH used by Herman's NEAR feature but it seems to be a Windoze thing 
 * so karman just made this value up so it will compile on *nix
 */
#if !defined(_WIN32)
#define MAX_PATH    255
#endif

// #define DEBUG_PROP 1

/*
  -- init structures for this module
*/

void    initModule_DBNative(SWISH * sw)
{
    struct MOD_DB *Db;

    Db = (struct MOD_DB *) emalloc(sizeof(struct MOD_DB));

    Db->DB_name = (char *) estrdup("native");

    Db->DB_Create = DB_Create_Native;
    Db->DB_Open = DB_Open_Native;
    Db->DB_Close = DB_Close_Native;
    Db->DB_Remove = DB_Remove_Native;

    Db->DB_InitWriteHeader = DB_InitWriteHeader_Native;
    Db->DB_WriteHeaderData = DB_WriteHeaderData_Native;
    Db->DB_EndWriteHeader = DB_EndWriteHeader_Native;

    Db->DB_InitReadHeader = DB_InitReadHeader_Native;
    Db->DB_ReadHeaderData = DB_ReadHeaderData_Native;
    Db->DB_EndReadHeader = DB_EndReadHeader_Native;

    Db->DB_InitWriteWords = DB_InitWriteWords_Native;
    Db->DB_GetWordID = DB_GetWordID_Native;
    Db->DB_WriteWord = DB_WriteWord_Native;

#ifndef USE_BTREE
    Db->DB_WriteWordHash = DB_WriteWordHash_Native;
#else
    Db->DB_UpdateWordID = DB_UpdateWordID_Native;
    Db->DB_DeleteWordData = DB_DeleteWordData_Native;
#endif

    Db->DB_WriteWordData = DB_WriteWordData_Native;
    Db->DB_EndWriteWords = DB_EndWriteWords_Native;

    Db->DB_InitReadWords = DB_InitReadWords_Native;
    Db->DB_ReadWordHash = DB_ReadWordHash_Native;
    Db->DB_ReadFirstWordInvertedIndex = DB_ReadFirstWordInvertedIndex_Native;
    Db->DB_ReadNextWordInvertedIndex = DB_ReadNextWordInvertedIndex_Native;
    Db->DB_ReadWordData = DB_ReadWordData_Native;
    Db->DB_EndReadWords = DB_EndReadWords_Native;

    Db->DB_WriteFileNum = DB_WriteFileNum_Native;
    Db->DB_ReadFileNum = DB_ReadFileNum_Native;
    Db->DB_CheckFileNum = DB_CheckFileNum_Native;
    Db->DB_RemoveFileNum = DB_RemoveFileNum_Native;

    Db->DB_InitWriteSortedIndex = DB_InitWriteSortedIndex_Native;
    Db->DB_WriteSortedIndex = DB_WriteSortedIndex_Native;
    Db->DB_EndWriteSortedIndex = DB_EndWriteSortedIndex_Native;

    Db->DB_InitReadSortedIndex = DB_InitReadSortedIndex_Native;
    Db->DB_ReadSortedIndex = DB_ReadSortedIndex_Native;
    Db->DB_ReadSortedData = DB_ReadSortedData_Native;
    Db->DB_EndReadSortedIndex = DB_EndReadSortedIndex_Native;

    Db->DB_InitWriteProperties = DB_InitWriteProperties_Native;
    Db->DB_WriteProperty = DB_WriteProperty_Native;
    Db->DB_WritePropPositions = DB_WritePropPositions_Native;
    Db->DB_ReadProperty = DB_ReadProperty_Native;
    Db->DB_ReadPropPositions = DB_ReadPropPositions_Native;
    Db->DB_Reopen_PropertiesForRead = DB_Reopen_PropertiesForRead_Native;

#ifdef USE_BTREE
    Db->DB_WriteTotalWordsPerFile = DB_WriteTotalWordsPerFile_Native;
    Db->DB_ReadTotalWordsPerFile = DB_ReadTotalWordsPerFile_Native;
#endif

    sw->Db = Db;

    return;
}


/*
  -- release all wired memory for this module
*/

void    freeModule_DBNative(SWISH * sw)
{
    efree(sw->Db->DB_name);
    efree(sw->Db);
    sw->Db = NULL;
    return;
}



/* ---------------------------------------------- */





/* Does an index file have a readable format?
*/

static void DB_CheckHeader(struct Handle_DBNative *DB)
{
    long    swish_magic;

    sw_fseek(DB->fp, (sw_off_t)0, SEEK_SET);
    swish_magic = readlong(DB->fp, sw_fread);

    if (swish_magic != SWISH_MAGIC)
    {
        set_progerr(INDEX_FILE_ERROR, DB->sw, "File \"%s\" has an unknown format.", DB->cur_index_file);
        return;
    }



    {
#ifdef USE_BTREE
        long btree, worddata, hashfile, array, presorted;
#endif
        long prop;

        DB->unique_ID = readlong(DB->fp, sw_fread);
        prop = readlong(DB->prop, sw_fread);

        if (DB->unique_ID != prop)
        {
            set_progerr(INDEX_FILE_ERROR, DB->sw, "Index file '%s' and property file '%s' are not related.", DB->cur_index_file, DB->cur_prop_file);
            return;
        }

#ifdef USE_BTREE
        btree = readlong(DB->fp_btree, sw_fread);
        if (DB->unique_ID != btree)
        {
            set_progerr(INDEX_FILE_ERROR, DB->sw, "Index file '%s' and btree file '%s' are not related.", DB->cur_index_file, DB->cur_btree_file);
            return;
        }

        worddata = readlong(DB->fp_worddata, sw_fread);
        if (DB->unique_ID != worddata)
        {
            set_progerr(INDEX_FILE_ERROR, DB->sw, "Index file '%s' and worddata file '%s' are not related.", DB->cur_index_file, DB->cur_worddata_file);
            return;
        }

        hashfile = readlong(DB->fp_hashfile, sw_fread);
        if (DB->unique_ID != hashfile)
        {
            set_progerr(INDEX_FILE_ERROR, DB->sw, "Index file '%s' and hashfile file '%s' are not related.", DB->cur_index_file, DB->cur_hashfile_file);
            return;
        }

        array = readlong(DB->fp_array, sw_fread);
        if (DB->unique_ID != array)
        {
            set_progerr(INDEX_FILE_ERROR, DB->sw, "Index file '%s' and array file '%s' are not related.", DB->cur_index_file, DB->cur_array_file);
            return;
        }

        presorted = readlong(DB->fp_presorted, sw_fread);

        if (DB->unique_ID != presorted)
        {
            set_progerr(INDEX_FILE_ERROR, DB->sw, "Index file '%s' and presorted index file '%s' are not related.", DB->cur_index_file, DB->cur_presorted_file);
            return;
        }
#endif
    }

}

static struct Handle_DBNative *newNativeDBHandle(SWISH *sw, char *dbname)
{
    struct Handle_DBNative *DB;

    /* Allocate structure */
    DB = (struct Handle_DBNative *) emalloc(sizeof(struct Handle_DBNative));
    memset( DB, 0, sizeof( struct Handle_DBNative ));

    DB->sw = sw;  /* for error messages */

    if (WRITE_WORDS_RAMDISK)
    {
        DB->w_tell = ramdisk_tell;
        DB->w_write = ramdisk_write;
        DB->w_seek = ramdisk_seek;
        DB->w_read = ramdisk_read;
        DB->w_close = ramdisk_close;
        DB->w_putc = ramdisk_putc;
        DB->w_getc = ramdisk_getc;
    }
    else
    {
        DB->w_tell = sw_ftell;
        DB->w_write = sw_fwrite;
        DB->w_seek = sw_fseek;
        DB->w_read = sw_fread;
        DB->w_close = sw_fclose;
        DB->w_putc = sw_fputc;
        DB->w_getc = sw_fgetc;
    }

    DB->dbname = estrdup(dbname);

    return DB;
}


/* Open files */


static FILE   *openIndexFILEForRead(char *filename)
{
    return sw_fopen(filename, F_READ_BINARY);
}

static FILE   *openIndexFILEForReadAndWrite(char *filename)
{
    return sw_fopen(filename, F_READWRITE_BINARY);
}


static FILE   *openIndexFILEForWrite(char *filename)
{
    return sw_fopen(filename, F_WRITE_BINARY);
}

static void    CreateEmptyFile(char *filename)
{
    FILE   *fp;

    if (!(fp = openIndexFILEForWrite(filename)))
    {
        progerrno("Couldn't write the file \"%s\": ", filename);
    }
    sw_fclose(fp);
}

static int  is_directory(char *path)
{
    struct stat stbuf;

    if (stat(path, &stbuf))
        return 0;
    return ((stbuf.st_mode & S_IFMT) == S_IFDIR) ? 1 : 0;
}


/**********************/



void   *DB_Create_Native(SWISH *sw, char *dbname)
{
    int     i;
    long    swish_magic;
    char   *filename;
#ifdef USE_BTREE
    FILE   *fp_tmp;
#endif
    struct Handle_DBNative *DB;

    if ( is_directory( dbname ) )
        progerr( "Index file '%s' is a directory", dbname );


    swish_magic = SWISH_MAGIC;
    /* Allocate structure */
    DB = (struct Handle_DBNative *) newNativeDBHandle(sw, dbname);
    DB->mode = DB_CREATE;
    DB->unique_ID = (long) time(NULL); /* Ok, so if more than one index is created the second... */

#ifdef USE_TEMPFILE_EXTENSION
    filename = emalloc(strlen(dbname) + strlen(USE_TEMPFILE_EXTENSION) + strlen(PROPFILE_EXTENSION) + strlen(BTREE_EXTENSION) + strlen(WORDDATA_EXTENSION) + strlen(ARRAY_EXTENSION) + strlen(PRESORTED_EXTENSION) + strlen(HASHFILE_EXTENSION) + 1);
    strcpy(filename, dbname);
    strcat(filename, USE_TEMPFILE_EXTENSION);
    DB->tmp_index = 1;
#else
    filename = emalloc(strlen(dbname) + strlen(PROPFILE_EXTENSION) + strlen(BTREE_EXTENSION) + strlen(WORDDATA_EXTENSION) + strlen(ARRAY_EXTENSION) + strlen(PRESORTED_EXTENSION) + strlen(HASHFILE_EXTENSION) + 1);
    strcpy(filename, dbname);
#endif


    /* Create index File */

    CreateEmptyFile(filename);
    if (!(DB->fp = openIndexFILEForReadAndWrite(filename)))
        progerrno("Couldn't create the index file \"%s\": ", filename);

    DB->cur_index_file = estrdup(filename);
    printlong(DB->fp, swish_magic, sw_fwrite);
    printlong(DB->fp, DB->unique_ID, sw_fwrite);


    /* Create property File */
    strcpy(filename, dbname);
    strcat(filename, PROPFILE_EXTENSION);

#ifdef USE_TEMPFILE_EXTENSION
    strcat(filename, USE_TEMPFILE_EXTENSION);
    DB->tmp_prop = 1;
#endif

    CreateEmptyFile(filename);
    if (!(DB->prop = openIndexFILEForWrite(filename)))
        progerrno("Couldn't create the property file \"%s\": ", filename);

    DB->cur_prop_file = estrdup(filename);
    printlong(DB->prop, DB->unique_ID, sw_fwrite);


#ifdef USE_BTREE
    /* Create Btree File */
    strcpy(filename, dbname);
    strcat(filename, BTREE_EXTENSION);
#ifdef USE_TEMPFILE_EXTENSION
    strcat(filename, USE_TEMPFILE_EXTENSION);
    DB->tmp_btree = 1;
#endif
    CreateEmptyFile(filename);
    if (!(fp_tmp = openIndexFILEForReadAndWrite(filename)))
        progerrno("Couldn't create the btree file \"%s\": ", filename);
    DB->cur_btree_file = estrdup(filename);
    printlong(fp_tmp, DB->unique_ID, sw_fwrite);
    DB->fp_btree = fp_tmp;
    DB->bt=BTREE_Create(DB->fp_btree,4096);


    /* Create WordData File */
    strcpy(filename, dbname);
    strcat(filename, WORDDATA_EXTENSION);
#ifdef USE_TEMPFILE_EXTENSION
    strcat(filename, USE_TEMPFILE_EXTENSION);
    DB->tmp_worddata = 1;
#endif
    CreateEmptyFile(filename);
    if (!(fp_tmp = openIndexFILEForReadAndWrite(filename)))
        progerrno("Couldn't create the worddata file \"%s\": ", filename);
    printlong(fp_tmp, DB->unique_ID, sw_fwrite);
    DB->fp_worddata = fp_tmp;
    DB->cur_worddata_file = estrdup(filename);
    DB->worddata=WORDDATA_Open(DB->fp_worddata);

    /* Create Array File */
    strcpy(filename, dbname);
    strcat(filename, ARRAY_EXTENSION);
#ifdef USE_TEMPFILE_EXTENSION
    strcat(filename, USE_TEMPFILE_EXTENSION);
    DB->tmp_array = 1;
#endif
    CreateEmptyFile(filename);
    if (!(fp_tmp = openIndexFILEForReadAndWrite(filename)))
        progerrno("Couldn't create the array file \"%s\": ", filename);
    printlong(fp_tmp, DB->unique_ID, sw_fwrite);
    DB->cur_array_file = estrdup(filename);
    DB->fp_array = fp_tmp;
    DB->totwords_array = ARRAY_Create(DB->fp_array);
    DB->props_array = ARRAY_Create(DB->fp_array);

    /* Create PreSorted Index File */
    strcpy(filename, dbname);
    strcat(filename, PRESORTED_EXTENSION);

#ifdef USE_TEMPFILE_EXTENSION
    strcat(filename, USE_TEMPFILE_EXTENSION);
    DB->tmp_presorted = 1;
#endif

    CreateEmptyFile(filename);
    if (!(DB->fp_presorted = openIndexFILEForWrite(filename)))
        progerrno("Couldn't create the presorted index file \"%s\": ", filename);

    DB->cur_presorted_file = estrdup(filename);
    printlong(DB->fp_presorted, DB->unique_ID, sw_fwrite);

    /* Create HashFileIndex File */
    strcpy(filename, dbname);
    strcat(filename, HASHFILE_EXTENSION);

#ifdef USE_TEMPFILE_EXTENSION
    strcat(filename, USE_TEMPFILE_EXTENSION);
    DB->tmp_hashfile = 1;
#endif

    CreateEmptyFile(filename);
    if (!(DB->fp_hashfile = openIndexFILEForWrite(filename)))
        progerrno("Couldn't create the hash-file index file \"%s\": ", filename);

    DB->cur_hashfile_file = estrdup(filename);
    printlong(DB->fp_hashfile, DB->unique_ID, sw_fwrite);
    DB->hashfile=FHASH_Create(DB->fp_hashfile);


#endif

    efree(filename);


    for (i = 0; i < MAXCHARS; i++)
        DB->offsets[i] = (sw_off_t)0;

#ifndef USE_BTREE
    for (i = 0; i < VERYBIGHASHSIZE; i++)
        DB->hashoffsets[i] = (sw_off_t)0;
    for (i = 0; i < VERYBIGHASHSIZE; i++)
        DB->lasthashval[i] = (sw_off_t)0;
#endif




    /* Reserve space for offset pointers */
    DB->offsetstart = sw_ftell(DB->fp);
    for (i = 0; i < MAXCHARS; i++)
        printfileoffset(DB->fp, (sw_off_t) 0, sw_fwrite);

#ifndef USE_BTREE
    DB->hashstart = sw_ftell(DB->fp);
    for (i = 0; i < VERYBIGHASHSIZE; i++)
        printfileoffset(DB->fp, (sw_off_t) 0, sw_fwrite);
#endif

    return (void *) DB;
}
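
/* Layout sketch of the start of the main index file, derived from the
** writes in DB_Create_Native() above rather than from separate format
** documentation:
**
**   long      SWISH_MAGIC
**   long      unique_ID      (the property file starts with the same value,
**                             which DB_CheckHeader() compares on open)
**   sw_off_t  offsets[MAXCHARS]             zero placeholders, starting at
**                                           DB->offsetstart
**   sw_off_t  hashoffsets[VERYBIGHASHSIZE]  zero placeholders, starting at
**                                           DB->hashstart (only when
**                                           USE_BTREE is not defined)
**
** The number of bytes each entry takes on disk depends on how printlong()
** and printfileoffset() encode their values.
*/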


/*******************************************************************
*   DB_Open_Native
*
*******************************************************************/

void   *DB_Open_Native(SWISH *sw, char *dbname,int mode)
{
    struct Handle_DBNative *DB;
    int     i;
    FILE   *(*openRoutine)(char *) = NULL;
    char   *s;
#ifdef USE_BTREE
    FILE *fp_tmp;
#endif

    switch(mode)
    {
    case DB_READ:
        openRoutine = openIndexFILEForRead;
        break;
    case DB_READWRITE:
        openRoutine = openIndexFILEForReadAndWrite;
        break;
    default:
        openRoutine = openIndexFILEForRead;
    }

    DB = (struct Handle_DBNative *) newNativeDBHandle(sw, dbname);
    DB->mode = mode;

    /* Open index File */
    if (!(DB->fp = openRoutine(dbname)))
    {
        set_progerrno(INDEX_FILE_ERROR, DB->sw, "Could not open the index file '%s': ", dbname);
        return (void *) DB;
    }

    DB->cur_index_file = estrdup(dbname);

    s = emalloc(strlen(dbname) + strlen(PROPFILE_EXTENSION) + 1);

    strcpy(s, dbname);
    strcat(s, PROPFILE_EXTENSION);

    if (!(DB->prop = openRoutine(s)))
    {
        set_progerrno(INDEX_FILE_ERROR, DB->sw, "Couldn't open the property file \"%s\": ", s);
        efree(s);
        return (void *) DB;
    }

    DB->cur_prop_file = s;

#ifdef USE_BTREE

    s = emalloc(strlen(dbname) + strlen(BTREE_EXTENSION) + 1);

    strcpy(s, dbname);
    strcat(s, BTREE_EXTENSION);

    if (!(fp_tmp = openRoutine(s)))
    {
        set_progerrno(INDEX_FILE_ERROR, DB->sw, "Couldn't open the btree file \"%s\": ", s);
        efree(s);
        return (void *) DB;
    }



    DB->fp_btree = fp_tmp;
    DB->cur_btree_file = s;

    s = emalloc(strlen(dbname) + strlen(PRESORTED_EXTENSION) + 1);

    strcpy(s, dbname);
    strcat(s, PRESORTED_EXTENSION);

    if (!(DB->fp_presorted = openRoutine(s)))
    {
        set_progerrno(INDEX_FILE_ERROR, DB->sw, "Couldn't open the presorted index file \"%s\": ", s);
        efree(s);
        return (void *) DB;
    }


    DB->cur_presorted_file = s;



    s = emalloc(strlen(dbname) + strlen(WORDDATA_EXTENSION) + 1);

    strcpy(s, dbname);
    strcat(s, WORDDATA_EXTENSION);

    if (!(fp_tmp = openRoutine(s)))
    {
        set_progerrno(INDEX_FILE_ERROR, DB->sw, "Couldn't open the worddata file \"%s\": ", s);
        efree(s);
        return (void *) DB;
    }


    DB->fp_worddata = fp_tmp;
    DB->cur_worddata_file = s;


    s = emalloc(strlen(dbname) + strlen(HASHFILE_EXTENSION) + 1);

    strcpy(s, dbname);
    strcat(s, HASHFILE_EXTENSION);

    if (!(fp_tmp = openRoutine(s)))
    {
        set_progerrno(INDEX_FILE_ERROR, DB->sw, "Couldn't open the hashfile file \"%s\": ", s);
        efree(s);
        return (void *) DB;
    }


    DB->fp_hashfile = fp_tmp;
    DB->cur_hashfile_file = s;


    s = emalloc(strlen(dbname) + strlen(ARRAY_EXTENSION) + 1);

    strcpy(s, dbname);
    strcat(s, ARRAY_EXTENSION);

    if (!(fp_tmp = openRoutine(s)))
    {
        set_progerrno(INDEX_FILE_ERROR, DB->sw, "Couldn't open the array file \"%s\": ", s);
        efree(s);
        return (void *) DB;
    }

    DB->fp_array = fp_tmp;
    DB->cur_array_file = s;

#endif

    /* Validate index files */
    DB_CheckHeader(DB);
    if ( DB->sw->lasterror )
        return (void *) DB;

    /* Read offsets lookuptable */
    DB->offsetstart = sw_ftell(DB->fp);
    for (i = 0; i < MAXCHARS; i++)
        DB->offsets[i] = readfileoffset(DB->fp, sw_fread);

#ifndef USE_BTREE
    /* Read hashoffsets lookuptable */
    DB->hashstart = sw_ftell(DB->fp);
    for (i = 0; i < VERYBIGHASHSIZE; i++)
        DB->hashoffsets[i] = readfileoffset(DB->fp, sw_fread);
#else
    DB->bt = BTREE_Open(DB->fp_btree,4096,DB->offsets[WORDPOS]);
    DB->worddata = WORDDATA_Open(DB->fp_worddata);
    DB->hashfile = FHASH_Open(DB->fp_hashfile,DB->offsets[FILEHASHPOS]);
    DB->totwords_array = ARRAY_Open(DB->fp_array,DB->offsets[TOTALWORDSPERFILEPOS]);
    DB->props_array = ARRAY_Open(DB->fp_array,DB->offsets[FILEOFFSETPOS]);

    /* Position the props file pointer at the end of the file.
    ** This matters in update mode: new properties must be appended
    ** so that existing ones are not overwritten.
    */
    sw_fseek(DB->prop,(sw_off_t)0,SEEK_END);
#endif

    return (void *) DB;
}

/****************************************************************
* This closes a file, and will rename if flagged as such
*  Frees the associated current file name
*
*****************************************************************/

static void DB_Close_File_Native(FILE ** fp, char **filename, int *tempflag)
{
#if defined(_WIN32) && !defined(__CYGWIN__)
        struct stat stbuf;
#endif
    if (!*fp)
        return;

    if (sw_fclose(*fp))
        progerrno("Failed to close file '%s': ", *filename);

    *fp = NULL;

#ifdef USE_TEMPFILE_EXTENSION
    if (*tempflag)
    {
        char   *newname = estrdup(*filename);

        newname[strlen(newname) - strlen(USE_TEMPFILE_EXTENSION)] = '\0';

#if defined(_WIN32) && !defined(__CYGWIN__)
        if(!stat(newname, &stbuf) && ((stbuf.st_mode & S_IFMT) == S_IFREG))
        /* if(isfile(newname)) FIXME: file.c shouldn't rely on indexing structures */
            if (remove(newname))
                progerrno("Failed to unlink '%s' before renaming. : ", newname);
#endif

        if (rename(*filename, newname))
            progerrno("Failed to rename '%s' to '%s' : ", *filename, newname);


#ifdef INDEXPERMS
        chmod(newname, INDEXPERMS);
#endif

        *tempflag = 0;          /* no longer opened as a temporary file */
        efree(newname);
    }

#else

#ifdef INDEXPERMS
    chmod(*filename, INDEXPERMS);
#endif

#endif /* USE_TEMPFILE_EXTENSION */

    efree(*filename);
    *filename = NULL;
}
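
/*
 * Illustrative sketch only, not part of swish-e: the rename step in
 * DB_Close_File_Native() derives the final name by cutting the temporary
 * extension off the end of the file name. The helper below shows the same
 * idea with an explicit suffix check, so a name that does not end in the
 * extension is rejected instead of being truncated blindly. The name
 * strip_suffix and the ".temp" value in the usage note are assumptions
 * made for this example, and plain malloc stands in for emalloc. The
 * block sits inside "#if 0" so it is never compiled into the library.
 */
#if 0
```c
#include <stdlib.h>
#include <string.h>

/* Return a malloc'ed copy of tmpname with the trailing ext removed,
 * or NULL if tmpname does not end in ext or the allocation fails. */
static char *strip_suffix(const char *tmpname, const char *ext)
{
    size_t nlen = strlen(tmpname);
    size_t elen = strlen(ext);
    char   *out;

    /* Reject names that are too short or end in something else */
    if (elen > nlen || strcmp(tmpname + nlen - elen, ext) != 0)
        return NULL;

    out = malloc(nlen - elen + 1);
    if (!out)
        return NULL;

    memcpy(out, tmpname, nlen - elen);
    out[nlen - elen] = '\0';
    return out;
}
```
/* Usage: strip_suffix("index.swish-e.temp", ".temp") returns a copy of
 * "index.swish-e" that the caller must free. */
#endif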




void    DB_Close_Native(void *db)
{
    int     i;
    struct Handle_DBNative *DB = (struct Handle_DBNative *) db;
    FILE   *fp = DB->fp;

    /* Close (and rename) property file, if it's open */
    DB_Close_File_Native(&DB->prop, &DB->cur_prop_file, &DB->tmp_prop);

#ifdef USE_BTREE
    /* Close (and rename) array file, if it's open */
    if(DB->fp_array)
    {
        if(DB->totwords_array)
        {
            DB->offsets[TOTALWORDSPERFILEPOS] = ARRAY_Close(DB->totwords_array);
            DB->totwords_array = NULL;
        }
        if(DB->props_array)
        {
            DB->offsets[FILEOFFSETPOS] = ARRAY_Close(DB->props_array);
            DB->props_array = NULL;
        }
        DB_Close_File_Native(&DB->fp_array, &DB->cur_array_file, &DB->tmp_array);
    }
    /* Close (and rename) worddata file, if it's open */
    if(DB->worddata)
    {
        WORDDATA_Close(DB->worddata);
        DB_Close_File_Native(&DB->fp_worddata, &DB->cur_worddata_file, &DB->tmp_worddata);
        DB->worddata = NULL;
    }
    /* Close (and rename) btree file, if it's open */
    if(DB->bt)
    {
        DB->offsets[WORDPOS] = BTREE_Close(DB->bt);
        DB_Close_File_Native(&DB->fp_btree, &DB->cur_btree_file, &DB->tmp_btree);
        DB->bt = NULL;
    }

    /* Close (and rename) presorted index file, if it's open */
    if(DB->fp_presorted)
    {
        DB_Close_File_Native(&DB->fp_presorted, &DB->cur_presorted_file, &DB->tmp_presorted);
    }
    if(DB->presorted_array)
    {
        for(i = 0; i < DB->n_presorted_array; i++)
        {
            if(DB->presorted_array[i])
                ARRAY_Close(DB->presorted_array[i]);
            DB->presorted_array[i] = NULL;
        }
        efree(DB->presorted_array);
    }
    if(DB->presorted_root_node)
        efree(DB->presorted_root_node);
    if(DB->presorted_propid)
        efree(DB->presorted_propid);

    /* Close (and rename) hash-file index file, if it's open */
    if(DB->fp_hashfile)
    {
        DB->offsets[FILEHASHPOS] = FHASH_Close(DB->hashfile);
        DB->hashfile = NULL;
        DB_Close_File_Native(&DB->fp_hashfile, &DB->cur_hashfile_file, &DB->tmp_hashfile);
    }
#endif

    if (DB->mode == DB_CREATE || DB->mode == DB_READWRITE)     /* If we are indexing, update offsets to words and files */
    {
        /* Update internal pointers */

        sw_fseek(fp, DB->offsetstart, SEEK_SET);
        for (i = 0; i < MAXCHARS; i++)
            printfileoffset(fp, DB->offsets[i], sw_fwrite);

#ifndef USE_BTREE
        sw_fseek(fp, DB->hashstart, SEEK_SET);
        for (i = 0; i < VERYBIGHASHSIZE; i++)
            printfileoffset(fp, DB->hashoffsets[i], sw_fwrite);
#endif
    }

    /* Close (and rename) the index file */
    DB_Close_File_Native(&DB->fp, &DB->cur_index_file, &DB->tmp_index);


    if (DB->dbname)
        efree(DB->dbname);
    efree(DB);
}

void    DB_Remove_Native(void *db)
{
    struct Handle_DBNative *DB = (struct Handle_DBNative *) db;


    /* this is currently not used */
    /* $$$ remove the prop file too */
    sw_fclose(DB->fp);
    remove(DB->dbname);
    efree(DB->dbname);
    efree(DB);
}


/*--------------------------------------------*/
/*--------------------------------------------*/
/*              Header stuff                  */
/*--------------------------------------------*/
/*--------------------------------------------*/

int     DB_InitWriteHeader_Native(void *db)
{
    struct Handle_DBNative *DB = (struct Handle_DBNative *) db;

    if(DB->offsets[HEADERPOS])
    {
        /* If DB->offsets[HEADERPOS] is not 0 we are in update mode.
        ** Move the file pointer to the header start position so the
        ** header is overwritten in place.
        */
        sw_fseek(DB->fp,DB->offsets[HEADERPOS],SEEK_SET);
    }
    else
    {
        /* The index file is being created, so the header goes at the
        ** current file position (which coincides with the end of the file).
        */
        DB->offsets[HEADERPOS] = sw_ftell(DB->fp);
    }
    return 0;
}


int     DB_EndWriteHeader_Native(void *db)
{
    struct Handle_DBNative *DB = (struct Handle_DBNative *) db;
    FILE   *fp = DB->fp;

    /* End of header delimiter */
    if ( putc(0, fp) == EOF )
        progerrno("putc() failed: ");

    return 0;
}

int     DB_WriteHeaderData_Native(int id, unsigned char *s, int len, void *db)
{
    struct Handle_DBNative *DB = (struct Handle_DBNative *) db;

    FILE   *fp = DB->fp;

    compress1(id, fp, sw_fputc);
    compress1(len, fp, sw_fputc);
    if ( sw_fwrite(s, len, sizeof(char), fp) != sizeof( char ) ) /* size = len, count = 1: a complete write returns 1 */
        progerrno("Error writing to device while trying to write %d bytes: ", len );


    return 0;
}


int     DB_InitReadHeader_Native(void *db)
{
    struct Handle_DBNative *DB = (struct Handle_DBNative *) db;

    sw_fseek(DB->fp, DB->offsets[HEADERPOS], SEEK_SET);
    return 0;
}

int     DB_ReadHeaderData_Native(int *id, unsigned char **s, int *len, void *db)
{
    int     tmp;
    struct Handle_DBNative *DB = (struct Handle_DBNative *) db;
    FILE   *fp = DB->fp;

    tmp = uncompress1(fp, sw_fgetc);
    *id = tmp;
    if (tmp)
    {
        tmp = uncompress1(fp, sw_fgetc);
        *s = (unsigned char *) emalloc(tmp + 1);
        *len = tmp;
        sw_fread(*s, *len, sizeof(char), fp);
    }
    else
    {
        *len = 0;
        *s = NULL;
    }
    return 0;
}

int     DB_EndReadHeader_Native(void *db)
{
    return 0;
}

/*--------------------------------------------*/
/*--------------------------------------------*/
/*                 Word Stuff                 */
/*--------------------------------------------*/
/*--------------------------------------------*/

int     DB_InitWriteWords_Native(void *db)
{

#ifndef USE_BTREE
    struct Handle_DBNative *DB = (struct Handle_DBNative *) db;
    DB->offsets[WORDPOS] = sw_ftell(DB->fp);
#endif

    return 0;
}

int     cmp_wordhashdata(const void *s1, const void *s2)
{
    sw_off_t    *i = (sw_off_t *) s1;
    sw_off_t    *j = (sw_off_t *) s2;
    sw_off_t     d = (*i - *j);

    if(d == (sw_off_t)0) return 0;
    else if(d > (sw_off_t)0) return 1;
    else return -1;
}
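
/*
 * Illustrative sketch only, not part of swish-e: DB_EndWriteWords_Native()
 * sorts DB->wordhashdata as records of three offsets, keyed on the first
 * one (the wordID), by passing 3 * sizeof(sw_off_t) as the element size.
 * The example below shows that pattern with the standard qsort() in place
 * of swish_qsort(). The names off_sketch_t, cmp_triple and sort_triples
 * are assumptions made for this example; off_sketch_t stands in for
 * sw_off_t. The comparison uses (x > y) - (x < y) rather than a
 * subtraction, which sidesteps overflow for widely separated values.
 * The block sits inside "#if 0" so it is never compiled into the library.
 */
#if 0
```c
#include <stdlib.h>

typedef long long off_sketch_t;     /* hypothetical stand-in for sw_off_t */

/* Compare two triples by their first element only */
static int cmp_triple(const void *a, const void *b)
{
    const off_sketch_t *x = (const off_sketch_t *) a;
    const off_sketch_t *y = (const off_sketch_t *) b;

    return (*x > *y) - (*x < *y);
}

/* Sort n triples in place; each triple is
 * { wordID, hash chain offset, word data offset } */
static void sort_triples(off_sketch_t *triples, size_t n)
{
    qsort(triples, n, 3 * sizeof(off_sketch_t), cmp_triple);
}
```
/* Sorting by wordID (a file position) lets the later patch-up loop
 * visit the words in ascending file order. */
#endif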

int     DB_EndWriteWords_Native(void *db)
{
    struct Handle_DBNative *DB = (struct Handle_DBNative *) db;
#ifndef USE_BTREE
    FILE   *fp = (FILE *) DB->fp;
    int     i,
            wordlen;
    sw_off_t wordID, word_pos;
    sw_off_t f_hash_offset, f_offset;
#else
    FILE   *fp_tmp;
#endif

#ifdef USE_BTREE

    /* If we close the BTREE here we can save some memory bytes */
    /* Close (and rename) worddata file, if it's open */

    fp_tmp =DB->worddata->fp;
    WORDDATA_Close(DB->worddata);
    DB->worddata=NULL;
    DB_Close_File_Native(&fp_tmp, &DB->cur_worddata_file, &DB->tmp_worddata);

    fp_tmp = DB->bt->fp;
    DB->offsets[WORDPOS] = BTREE_Close(DB->bt);
    DB->bt = NULL;
    DB_Close_File_Native(&fp_tmp, &DB->cur_btree_file, &DB->tmp_btree);

    /* Restore file pointer at the end of file */
    sw_fseek(DB->fp, 0, SEEK_END);
#else

    /* Free hash zone */
    Mem_ZoneFree(&DB->hashzone);

    /* Now write each word's data offset into the list of words. */
    /* Sanity check: the word, wordhash and worddata counts must match. */

    if (! DB->num_words)
        progerr("No unique words indexed");

    if (DB->num_words != DB->wordhash_counter)
        progerrno("Internal DB_native error - DB->num_words != DB->wordhash_counter: ");

    if (DB->num_words != DB->worddata_counter)
        progerrno("Internal DB_native error - DB->num_words != DB->worddata_counter: ");

    /* Sort wordhashdata to be written to allow sequential writes */
    swish_qsort(DB->wordhashdata, DB->num_words, 3 * sizeof(sw_off_t), cmp_wordhashdata);

    if (WRITE_WORDS_RAMDISK)
    {
        fp = (FILE *) DB->rd;
    }
    for (i = 0; i < DB->num_words; i++)
    {
        wordID = DB->wordhashdata[3 * i];
        f_hash_offset = DB->wordhashdata[3 * i + 1];
        f_offset = DB->wordhashdata[3 * i + 2];

        word_pos = wordID;
        if (WRITE_WORDS_RAMDISK)
        {
            word_pos -= DB->offsets[WORDPOS];
        }
        /* Position file pointer in word */
        DB->w_seek(fp, word_pos, SEEK_SET);
        /* Jump over word length and word */
        wordlen = uncompress1(fp, DB->w_getc); /* Get Word length */
        DB->w_seek(fp, (sw_off_t) wordlen, SEEK_CUR); /* Jump Word */
        /* Write offset to next chain */
        printfileoffset(fp, f_hash_offset, DB->w_write);
        /* Write offset to word data */
        printfileoffset(fp, f_offset, DB->w_write);
    }

    efree(DB->wordhashdata);
    DB->wordhashdata = NULL;
    DB->worddata_counter = 0;
    DB->wordhash_counter = 0;

    if (WRITE_WORDS_RAMDISK)
    {
        unsigned char buffer[4096];
        sw_off_t    ramdisk_size;
        long    read = 0;

        ramdisk_seek((FILE *) DB->rd, (sw_off_t)0, SEEK_END);
        ramdisk_size = ramdisk_tell((FILE *) DB->rd);
        /* Write ramdisk to fp and free it */
        sw_fseek((FILE *) DB->fp, DB->offsets[WORDPOS], SEEK_SET);
        ramdisk_seek((FILE *) DB->rd, (sw_off_t)0, SEEK_SET);
        while (ramdisk_size)
        {
            read = ramdisk_read(buffer, 4096, 1, (FILE *) DB->rd);
            if ( sw_fwrite(buffer, read, 1, DB->fp) != 1 )
                progerrno("Error while flushing ramdisk to disk:");

            ramdisk_size -= (sw_off_t)read;
        }
        ramdisk_close((FILE *) DB->rd);
    }
    /* Record the file offset just past the last word. Readers use
    ** DB->offsets[ENDWORDPOS] to detect the end of the word list:
    ** no words are stored at or beyond this offset.
    */
    DB->offsets[ENDWORDPOS] = sw_ftell(DB->fp);

    /* Restore file pointer at the end of file */
    sw_fseek(DB->fp, (sw_off_t)0, SEEK_END);
    if ( sw_fputc(0, DB->fp) == EOF )           /* End of words mark */
        progerrno("sw_fputc() failed writing null: ");

#endif

    return 0;
}

#ifndef USE_BTREE
sw_off_t    DB_GetWordID_Native(void *db)
{
    struct Handle_DBNative *DB = (struct Handle_DBNative *) db;
    FILE   *fp = DB->fp;
    sw_off_t    pos = (sw_off_t)0;

    if (WRITE_WORDS_RAMDISK)
    {
        if (!DB->rd)
        {
            /* ramdisk size as suggested by Bill Meier */
            DB->rd = ramdisk_create("RAM Disk: write words", 32 * 4096);
        }
        pos = DB->offsets[WORDPOS];
        fp = (FILE *) DB->rd;
    }
    pos += DB->w_tell(fp);

    return pos;                 /* Native database uses position as a Word ID */
}

int     DB_WriteWord_Native(char *word, sw_off_t wordID, void *db)
{
    int     i,
            wordlen;
    struct Handle_DBNative *DB = (struct Handle_DBNative *) db;

    FILE   *fp = DB->fp;

    i = (int) ((unsigned char) word[0]);

    if (!DB->offsets[i])
        DB->offsets[i] = wordID;


    /* Write word length, word, and two zero placeholder offsets */
    wordlen = strlen(word);

    if (WRITE_WORDS_RAMDISK)
    {
        fp = (FILE *) DB->rd;
    }
    compress1(wordlen, fp, DB->w_putc);
    DB->w_write(word, wordlen, sizeof(char), fp);

    printfileoffset(fp, (sw_off_t) 0, DB->w_write); /* hash chain */
    printfileoffset(fp, (sw_off_t) 0, DB->w_write); /* word's data pointer */

    DB->num_words++;

    return 0;
}

int offsethash(sw_off_t offset)
{
    return (int)(offset % (sw_off_t) BIGHASHSIZE);
}

long    DB_WriteWordData_Native(sw_off_t wordID, unsigned char *worddata, int data_size, int saved_bytes, void *db)
{
    struct Handle_DBNative *DB = (struct Handle_DBNative *) db;
    FILE   *fp = DB->fp;
    struct numhash *numhash;
    int     numhashval;

    /* We must be at the end of the file */

    if (!DB->worddata_counter)
    {
        /* We are starting writing worddata */
        /* If inside a ramdisk we must preserve its space */
        if (WRITE_WORDS_RAMDISK)
        {
            sw_off_t    ramdisk_size;

            ramdisk_seek((FILE *) DB->rd, (sw_off_t)0, SEEK_END);
            ramdisk_size = ramdisk_tell((FILE *) DB->rd);
            /* Preserve ramdisk size in DB file  */
            /* it will be written later */
            sw_fseek((FILE *) DB->fp, ramdisk_size, SEEK_END);
        }
    }
    /* Search for word's ID */
    numhashval = offsethash(wordID);
    for (numhash = DB->hash[numhashval]; numhash; numhash = numhash->next)
        if (DB->wordhashdata[3 * numhash->index] == wordID)
            break;
    if (!numhash)
        progerrno("Internal db_native.c error in DB_WriteWordData_Native: ");
    DB->wordhashdata[3 * numhash->index + 2] = sw_ftell(fp);

    DB->worddata_counter++;

    /* Write the worddata to disk */
    /* Write in the form:  <data_size><saved_bytes><worddata> */
    /* If there is not any compression then saved_bytes is 0 */
    compress1(data_size, fp, sw_fputc);
    compress1(saved_bytes, fp, sw_fputc);
    if ( sw_fwrite(worddata, data_size, 1, fp) != 1 )
        progerrno("Error writing to device while trying to write %d bytes: ", data_size );


    /* A NULL byte to indicate end of word data */
    if ( sw_fputc(0, fp) == EOF )
        progerrno( "sw_fputc() returned error writing null: ");



    return 0;
}

#else

sw_off_t    DB_GetWordID_Native(void *db)
{
    struct Handle_DBNative *DB = (struct Handle_DBNative *) db;

    return DB->worddata->lastid;
}

int     DB_WriteWord_Native(char *word, sw_off_t wordID, void *db)
{
    struct Handle_DBNative *DB = (struct Handle_DBNative *) db;

    BTREE_Insert(DB->bt, (unsigned char *)word, strlen(word), (sw_off_t) wordID);

    DB->num_words++;

    return 0;
}

int     DB_UpdateWordID_Native(char *word, sw_off_t new_wordID, void *db)
{
    struct Handle_DBNative *DB = (struct Handle_DBNative *) db;

    BTREE_Update(DB->bt, (unsigned char *)word, strlen(word), (sw_off_t) new_wordID);

    return 0;
}

int     DB_DeleteWordData_Native(sw_off_t wordID, void *db)
{
    struct Handle_DBNative *DB = (struct Handle_DBNative *) db;
    int dummy;

    WORDDATA_Del(DB->worddata, wordID, &dummy);

    return 0;
}

long    DB_WriteWordData_Native(sw_off_t wordID, unsigned char *worddata, int data_size, int saved_bytes, void *db)
{
    unsigned char stack_buffer[8192]; /* just to avoid emalloc,efree overhead */
    unsigned char *buf, *p;
    int buf_size;

    struct Handle_DBNative *DB = (struct Handle_DBNative *) db;

    /* WORDDATA_Put requires only 2 values (size and data). So
    ** we need to pack the saved bytes into data
    */

    /* Get total size */
    buf_size = data_size + sizeofcompint(saved_bytes);

    if(buf_size > sizeof(stack_buffer))
        buf = (unsigned char *) emalloc(buf_size);
    else
        buf = stack_buffer;

    /* Put saved_bytes in buf */
    p = compress3(saved_bytes, buf);
    /* Copy worddata into buf, right after saved_bytes */
    memcpy(p,worddata,data_size);

    DB->worddata_counter++;

    /* Write the worddata to disk */
    WORDDATA_Put(DB->worddata,buf_size,buf);

    if(buf != stack_buffer)
        efree(buf);
    return 0;
}
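
/*
 * Illustrative sketch only, not part of swish-e: the function above packs
 * a compressed integer in front of the word data and uses a stack buffer
 * unless the record is too large for it. The example below shows that
 * pattern in isolation. The names put_varint and pack_prefixed are
 * assumptions made for this example, and the varint layout (7 bits per
 * byte, low bits first) is NOT claimed to match what compress3() writes.
 * Plain malloc stands in for emalloc. The block sits inside "#if 0" so it
 * is never compiled into the library.
 */
#if 0
```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical varint writer: 7 bits per byte, low bits first, high bit
 * set on every byte except the last. Returns the number of bytes written. */
static size_t put_varint(unsigned value, unsigned char *out)
{
    size_t n = 0;

    while (value >= 0x80)
    {
        out[n++] = (unsigned char) (value | 0x80);
        value >>= 7;
    }
    out[n++] = (unsigned char) value;
    return n;
}

/* Build <varint prefix><data> in stack_buf if it fits, else in a malloc'ed
 * buffer. Returns the buffer used (NULL on allocation failure) and stores
 * the total length in *out_len. The caller frees the result only when it
 * differs from stack_buf. */
static unsigned char *pack_prefixed(unsigned prefix,
                                    const unsigned char *data, size_t len,
                                    unsigned char *stack_buf, size_t stack_len,
                                    size_t *out_len)
{
    unsigned char tmp[5];       /* enough for a 32-bit value */
    size_t  plen = put_varint(prefix, tmp);
    size_t  total = plen + len;
    unsigned char *buf = (total > stack_len) ? malloc(total) : stack_buf;

    if (!buf)
        return NULL;

    memcpy(buf, tmp, plen);
    memcpy(buf + plen, data, len);
    *out_len = total;
    return buf;
}
```
/* The "free only if not the stack buffer" rule mirrors the
 * buf != stack_buffer check in DB_WriteWordData_Native(). */
#endif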

#endif

#ifndef USE_BTREE
int     DB_WriteWordHash_Native(char *word, sw_off_t wordID, void *db)
{
    int     i,
            hashval,
            numhashval;
    struct Handle_DBNative *DB = (struct Handle_DBNative *) db;
    struct numhash *numhash;

    if (!DB->wordhash_counter)
    {
        /* Init hash array */
        for (i = 0; i < BIGHASHSIZE; i++)
            DB->hash[i] = NULL;
        DB->hashzone = Mem_ZoneCreate("WriteWordHash", DB->num_words * sizeof(struct numhash), 0);

        /* If we are here we have finished WriteWord_Native */
        /* If using ramdisk - Reserve space up to the size of the ramdisk */
        if (WRITE_WORDS_RAMDISK)
        {
            sw_off_t    ram_size = (sw_off_t) (DB->w_seek((FILE *) DB->rd, 0, SEEK_END));

            sw_fseek(DB->fp, ram_size, SEEK_SET);
        }

        DB->wordhashdata = emalloc(3 * DB->num_words * sizeof(sw_off_t));
    }

    hashval = verybighash(word);

    if (!DB->hashoffsets[hashval])
    {
        DB->hashoffsets[hashval] = wordID;
    }

    DB->wordhashdata[3 * DB->wordhash_counter] = wordID;
    DB->wordhashdata[3 * DB->wordhash_counter + 1] = (sw_off_t) 0;


    /* Add to the hash */
    numhash = (struct numhash *) Mem_ZoneAlloc(DB->hashzone, sizeof(struct numhash));

    numhashval = offsethash(wordID);
    numhash->index = DB->wordhash_counter;
    numhash->next = DB->hash[numhashval];
    DB->hash[numhashval] = numhash;

    DB->wordhash_counter++;

    /* Update previous word in hashlist */
    if (DB->lasthashval[hashval])
    {
        /* Search for DB->lasthashval[hashval] */
        numhashval = offsethash(DB->lasthashval[hashval]);
        for (numhash = DB->hash[numhashval]; numhash; numhash = numhash->next)
            if (DB->wordhashdata[3 * numhash->index] == DB->lasthashval[hashval])
                break;
        if (!numhash)
            progerrno("Internal db_native.c error in DB_WriteWordHash_Native: ");
        DB->wordhashdata[3 * numhash->index + 1] = wordID;
    }
    DB->lasthashval[hashval] = wordID;

    return 0;
}
#endif

int     DB_InitReadWords_Native(void *db)
{
    return 0;
}

int     DB_EndReadWords_Native(void *db)
{
    return 0;
}

#ifndef USE_BTREE
int     DB_ReadWordHash_Native(char *word, sw_off_t *wordID, void *db)
{
    int     wordlen,
            res,
            hashval;
    sw_off_t    offset, dataoffset;
    char   *fileword = NULL;
    struct Handle_DBNative *DB = (struct Handle_DBNative *) db;
    FILE   *fp = DB->fp;


    /* If there is not a star use the hash approach ... */
    res = 1;

    /* Get hash file offset */
    hashval = verybighash(word);
    if (!(offset = DB->hashoffsets[hashval]))
    {
        *wordID = (sw_off_t)0;
        return 0;
    }
    /* Search for word */
    while (res)
    {
        /* Position in file */
        sw_fseek(fp, offset, SEEK_SET);
        /* Get word */
        wordlen = uncompress1(fp, sw_fgetc);
        fileword = emalloc(wordlen + 1);
        sw_fread(fileword, 1, wordlen, fp);
        fileword[wordlen] = '\0';
        offset = readfileoffset(fp, sw_fread); /* Next hash */
        dataoffset = readfileoffset(fp, sw_fread); /* Offset to Word data */

        res = strcmp(word, fileword);
        efree(fileword);

        if (!res)
            break;              /* Found !! */
        else if (!offset)
        {
            dataoffset = (sw_off_t)0;
            break;
        }
    }
    *wordID = (sw_off_t)dataoffset;
    return 0;
}

int     DB_ReadFirstWordInvertedIndex_Native(char *word, char **resultword, sw_off_t *wordID, void *db)
{
    int     wordlen,
            i,
            res,
            len,
            found;
    sw_off_t    dataoffset = 0;
    char   *fileword = NULL;
    struct Handle_DBNative *DB = (struct Handle_DBNative *) db;
    FILE   *fp = DB->fp;


    len = strlen(word);

    i = (int) ((unsigned char) word[0]);

    if (!DB->offsets[i])
    {
        *resultword = NULL;
        *wordID = (sw_off_t)0;
        return 0;
    }
    found = 1;
    sw_fseek(fp, DB->offsets[i], SEEK_SET);

    /* Look for first occurrence */
    wordlen = uncompress1(fp, sw_fgetc);
    fileword = (char *) emalloc(wordlen + 1);

    while (wordlen)
    {
        int bytes_read = (int)sw_fread(fileword, 1, wordlen, fp);
        if ( bytes_read != wordlen )
            progerr("Read %d bytes, expected %d in DB_ReadFirstWordInvertedIndex_Native", bytes_read, wordlen);

        fileword[wordlen] = '\0';
        readfileoffset(fp, sw_fread);    /* jump hash offset */
        dataoffset = readfileoffset(fp, sw_fread); /* Get offset to word's data */

        if (!(res = strncmp(word, fileword, len))) /*Found!! */
        {
            DB->nextwordoffset = sw_ftell(fp); /* preserve next word pos */
            break;
        }

        /* check if past current word or at end */
        if (res < 0 || sw_ftell(fp) ==  DB->offsets[ENDWORDPOS] )
        {
            dataoffset = 0;
            break;
        }

        /* Go to next value */
        wordlen = uncompress1(fp, sw_fgetc); /* Next word */
        if (!wordlen)
        {
            dataoffset = 0;
            break;
        }
        efree(fileword);
        fileword = (char *) emalloc(wordlen + 1);
    }

    if (!dataoffset)
    {
        efree(fileword);
        *resultword = NULL;
    }
    else
        *resultword = fileword;

    *wordID = dataoffset;

    return 0;
}

int     DB_ReadNextWordInvertedIndex_Native(char *word, char **resultword, sw_off_t *wordID, void *db)
{
    int     len,
            wordlen;
    sw_off_t    dataoffset;
    char   *fileword;
    struct Handle_DBNative *DB = (struct Handle_DBNative *) db;
    FILE   *fp = DB->fp;

    /* Check for end of words */
    if (!DB->nextwordoffset || DB->nextwordoffset == DB->offsets[ENDWORDPOS])
    {
        *resultword = NULL;
        *wordID = (sw_off_t)0;
        return 0;
    }

    len = strlen(word);


    sw_fseek(fp, DB->nextwordoffset, SEEK_SET);

    wordlen = uncompress1(fp, sw_fgetc);
    fileword = (char *) emalloc(wordlen + 1);

    sw_fread(fileword, 1, wordlen, fp);
    fileword[wordlen] = '\0';
    if (strncmp(word, fileword, len))
    {
        efree(fileword);
        fileword = NULL;
        dataoffset = (sw_off_t)0;         /* No more data */
        DB->nextwordoffset = (sw_off_t)0;
    }
    else
    {
        readfileoffset(fp, sw_fread);    /* jump hash offset */
        dataoffset = readfileoffset(fp, sw_fread); /* Get data offset */
        DB->nextwordoffset = sw_ftell(fp);
    }
    *resultword = fileword;
    *wordID = dataoffset;

    return 0;

}


long    DB_ReadWordData_Native(sw_off_t wordID, unsigned char **worddata, int *data_size, int *saved_bytes, void *db)
{
    unsigned char *buffer;
    struct Handle_DBNative *DB = (struct Handle_DBNative *) db;
    FILE   *fp = DB->fp;

    sw_fseek(fp, wordID, SEEK_SET);
    *data_size = uncompress1(fp, sw_fgetc);
    *saved_bytes = uncompress1(fp, sw_fgetc);
    buffer = emalloc(*data_size);
    sw_fread(buffer, *data_size, 1, fp);

    *worddata = buffer;

    return 0;
}


#else
int     DB_ReadWordHash_Native(char *word, sw_off_t *wordID, void *db)
{
    struct Handle_DBNative *DB = (struct Handle_DBNative *) db;
    unsigned char *dummy;
    int dummy2;

    if((*wordID = (sw_off_t)BTREE_Search(DB->bt,word,strlen(word),&dummy,&dummy2,1)) < 0)
        *wordID = (sw_off_t)0;
    else
        efree(dummy);
    return 0;
}

int     DB_ReadFirstWordInvertedIndex_Native(char *word, char **resultword, sw_off_t *wordID, void *db)
{
    struct Handle_DBNative *DB = (struct Handle_DBNative *) db;
    unsigned char *found;
    int found_len;


    if((*wordID = (sw_off_t)BTREE_Search(DB->bt,word,strlen(word), &found, &found_len, 0)) < 0)
    {
        *resultword = NULL;
        *wordID = (sw_off_t)0;
    }
    else
    {

        *resultword = emalloc(found_len + 1);
        memcpy(*resultword,found,found_len);
        (*resultword)[found_len]='\0';
        efree(found);
        if (strncmp(word, *resultword, strlen(word))>0)
        {
            efree(*resultword);
            return DB_ReadNextWordInvertedIndex_Native(word, resultword, wordID, db);
        }
    }

    return 0;
}

int     DB_ReadNextWordInvertedIndex_Native(char *word, char **resultword, sw_off_t *wordID, void *db)
{
    struct Handle_DBNative *DB = (struct Handle_DBNative *) db;
    unsigned char *found;
    int found_len;

    if((*wordID = (sw_off_t)BTREE_Next(DB->bt, &found, &found_len)) < 0)
    {
        *resultword = NULL;
        *wordID = (sw_off_t)0;
    }
    else
    {
        *resultword = emalloc(found_len + 1);
        memcpy(*resultword,found,found_len);
        (*resultword)[found_len]='\0';
        efree(found);
        if (strncmp(word, *resultword, strlen(word)))
        {
            efree(*resultword);
            *resultword = NULL;
            *wordID = (sw_off_t)0;         /* No more data */
        }
    }
    return 0;
}

long    DB_ReadWordData_Native(sw_off_t wordID, unsigned char **worddata, int *data_size, int *saved_bytes, void *db)
{
    unsigned char *buf;
    struct Handle_DBNative *DB = (struct Handle_DBNative *) db;

    *worddata = buf = WORDDATA_Get(DB->worddata,wordID,data_size);
    /* Get saved_bytes and adjust data_size */
    *saved_bytes = uncompress2(&buf);
    *data_size -= (buf - (*worddata));
    /* Remove saved_bytes from buffer
    ** We need to use memmove because data overlaps */
    memmove(*worddata,buf, *data_size);

    return 0;
}

#endif


/*--------------------------------------------
** 2002/12 Jose Ruiz
**     FilePath,FileNum  pairs
**     Auxiliary hash index
*/

/* Routine to write path,filenum */
int     DB_WriteFileNum_Native(int filenum, unsigned char *filedata, int sz_filedata, void *db)
{
#ifdef USE_BTREE
    unsigned long tmp = (unsigned long)filenum;
    struct Handle_DBNative *DB = (struct Handle_DBNative *) db;

    /* Pack tmp */
    tmp = PACKLONG(tmp);

    /* Write it to the hash index */
    FHASH_Insert(DB->hashfile, filedata, sz_filedata, (unsigned char *)&tmp, sizeof(long));
#endif

    return 0;
}

/* Routine to get filenum from path */
int     DB_ReadFileNum_Native(unsigned char *filedata, void *db)
{
#ifdef USE_BTREE
    unsigned long tmp;
    struct Handle_DBNative *DB = (struct Handle_DBNative *) db;

    /* Look it up in the hash index */
    FHASH_Search(DB->hashfile, filedata, strlen((char *) filedata), (unsigned char *)&tmp, sizeof(unsigned long));

    /* UnPack tmp */
    tmp = PACKLONG(tmp);

    return (int)tmp;
#endif
    return 0;
}

/* Routine to test if filenum was deleted */
int     DB_CheckFileNum_Native(int filenum, void *db)
{
#ifdef USE_BTREE
    struct Handle_DBNative *DB = (struct Handle_DBNative *) db;

    return ARRAY_Get(DB->totwords_array,filenum - 1);
#endif
    return 1;
}


/* Routine to remove a filenum */
/* At this moment, to remove a filenum I am just putting 0 in the number of
** words */
int     DB_RemoveFileNum_Native(int filenum, void *db)
{
#ifdef USE_BTREE
    struct Handle_DBNative *DB = (struct Handle_DBNative *) db;

    ARRAY_Put(DB->totwords_array,filenum - 1,0);
#endif

    return 0;
}

/*--------------------------------------------*/
/*--------------------------------------------*/
/*              Sorted data Stuff             */
/*--------------------------------------------*/
/*--------------------------------------------*/

#ifdef USE_PRESORT_ARRAY

/*******************************************************************************
*  Sorted data tables (presorted indexes)
*
*  DB->offsets[SORTEDINDEX]     Points to start of presorted index tables
*  DB->n_presorted_array        Number of props in index
*  DB->presorted_array          Points to array[ # props ] of pointers to ARRAY
*                               ARRAY is defined in array.h
*  DB->presorted_root_node      Points to array[ # props ] of unsigned longs
*                               holds the seek pointer to the root node for
*                               each of the sorted props.
*  DB->next_sortedindex         Counter - index into structure below.
*                               incremented after each write of an array.
*
*  Index layout:
*            +-------------------------------+
*            |  Num props                    |  <<-- DB->offsets[SORTEDINDEX]
*            +-------------------------------+
*            |  PropIDX[0]                   |  <- propID for this entry
*            +-------------------------------+
*            |  Array Pointer[0]             |  <- seek point of root node for this prop
*            +-------------------------------+
*            |  PropIDX[1]                   |
*            +-------------------------------+
*            |  Array Pointer[1]             |
*            +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~+
*            |  PropIDX[n_props-1]           |  up to number of props defined in
*            +-------------------------------+  the index.  Not all will be filled
*            |  Array Pointer[n_props-1]     |  since not all props are pre-sorted.
*            +-------------------------------+  They are filled sequentially.
*            |  Data for all arrays follows  |
*            |  here.  (true?)               |
*            +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~+
*
*  I (moseley) don't really follow why  DB->presorted_array and
*   DB->presorted_root_node are needed during write operations.
*
*  Also, since the above is total number of props long, why not just
*  have it indexed by propIDX (not propID)?  Then don't need to search
*  for a given propID -- just do
*
*           sorted_props_index = read_sorted_props_table_from_disk();
*           root_array_seek_pointer = sorted_props_index[ propIDX ];
*
*  Since the table above is fixed length, don't really need a loop to write
*  or read from the disk.
*
*  Maybe the DB->presorted_(propid|root_node|array) could all be part of
*  one structure (i.e. DB->presorted[n].array ).
*
*  These functions return an int, but the return value never seems to be used.
*
*  Another problem with using this code is that the format of the array is
*  different before and after writing to disk.  That is, with the old code
*  pre_sort.c: CreatePropertySortArray() created an integer array total_files long.
*  And when reading the array back from disk you get that same array.
*  This problem shows up in merge.c: load_filename_sort().  See notes there.
*
*
*
*******************************************************************************/
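
/*
 * Illustration only (not part of swish-e): a minimal sketch of where the
 * k-th table entry sits, assuming the layout drawn above and the seek
 * arithmetic used in DB_WriteSortedIndex_Native() below, i.e. one leading
 * "Num props" word followed by two words per entry.  The function name and
 * its parameters are hypothetical.

```c
#include <stddef.h>

// Byte offset of the PropID slot of table entry k.
// table_start: file offset of the "Num props" word.
// word_size:   bytes per stored word (sizeof(unsigned long) in the real seek).
long sorted_entry_offset(long table_start, int k, size_t word_size)
{
    // Skip the count word, then two words (PropID, Array Pointer) per entry.
    return table_start + (long)((1 + 2 * (size_t)k) * word_size);
}
```

 * With the table at offset 100 and 8-byte words, sorted_entry_offset(100, 0, 8)
 * lands just past the count word.
 */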


int     DB_InitWriteSortedIndex_Native(void *db, int n_props)
{
   struct Handle_DBNative *DB = (struct Handle_DBNative *) db;
   FILE *fp = DB->fp_presorted;
   int i;

   DB->offsets[SORTEDINDEX] = sw_ftell(fp);

   /* Write number of properties possible.  May not write all*/
   printlong(fp,(unsigned long) n_props, sw_fwrite);

   DB->n_presorted_array = n_props;
   DB->presorted_array = (ARRAY **)emalloc(n_props * sizeof(ARRAY *));
   DB->presorted_root_node = (unsigned long *)emalloc(n_props * sizeof(unsigned long));


   for(i = 0; i < n_props ; i++)
   {
       DB->presorted_array[i] = NULL;
       DB->presorted_root_node[i] = 0;

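       /* Reserve space for propIDX and Array Pointer.
        * Note: the slots are reserved with printfileoffset (sizeof(sw_off_t) each),
        * but DB_WriteSortedIndex_Native later fills them with printlong
        * (sizeof(long) each), so more space may be reserved than is used. */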
       printfileoffset(fp,(sw_off_t) 0, sw_fwrite);
       printfileoffset(fp,(sw_off_t) 0, sw_fwrite);
   }
   DB->next_sortedindex = 0;
   return 0;
}


/***********************************************************************************
* DB_WriteSortedIndex -
*
* Jose's ARRAY storage - a way to avoid reading the entire table into memory
*
* Input:
*   propID      id of property to write
*   *data       pointer to array of integers (total_files long)
*   num_elements  number of elements in the array (i.e. total_files)
*   *db         database
*
* Why pass *char and then cast to an *int?  *int is the
* original data type in pre_sort.c.  Why not "int *data"?
*
*
* Questions:
*   Why does the DB->presorted_root_node[] array need to be populated during the write
*   operation?
*
*
*
************************************************************************************/


int     DB_WriteSortedIndex_Native(int propID, int *data, int num_elements, void *db)
{
   struct Handle_DBNative *DB = (struct Handle_DBNative *) db;
   FILE *fp = DB->fp_presorted;
   ARRAY *arr;
   int i;

   arr = ARRAY_Create(fp);          /* Initialize the ARRAY storage */


   /* Add all elements of the input array to the ARRAY storage */
   for(i = 0 ; i < num_elements ; i++)
       ARRAY_Put(arr,i,data[i]);



   /* ARRAY_Close() returns the offset of the root node for this array */
   DB->presorted_root_node[DB->next_sortedindex] = ARRAY_Close(arr);


   /* Seek to start of this record in the index */
   sw_fseek(fp,DB->offsets[SORTEDINDEX] + (1 + 2 * DB->next_sortedindex) * sizeof(unsigned long),SEEK_SET);

   /* Write out the propID and the seek position of this array's root node */
   printlong(fp,(unsigned long) propID, sw_fwrite);
   printlong(fp,(unsigned long) DB->presorted_root_node[DB->next_sortedindex], sw_fwrite);

   DB->next_sortedindex++;

   return 0;
}


int     DB_EndWriteSortedIndex_Native(void *db)
{
   return 0;
}

/********************************************************************************
*  BTREE DB_InitReadSortedIndex
*
*  Reads the index table into memory:
*       DB->presorted_array[]       all set to null
*       DB->presorted_propid[]      list of propIDs
*       DB->presorted_root_node[]   seek points for the array for each propID
*
*********************************************************************************/


int     DB_InitReadSortedIndex_Native(void *db)
{
   struct Handle_DBNative *DB = (struct Handle_DBNative *) db;
   FILE *fp = DB->fp_presorted;
   int i;

   sw_fseek(fp,DB->offsets[SORTEDINDEX],SEEK_SET);

   /* Read number of properties */
   DB->n_presorted_array = readlong(fp,sw_fread);

   /* init the arrays */

   DB->presorted_array = (ARRAY **)emalloc(DB->n_presorted_array * sizeof(ARRAY *));
   DB->presorted_root_node = (unsigned long *)emalloc(DB->n_presorted_array * sizeof(unsigned long));
   DB->presorted_propid = (unsigned long *)emalloc(DB->n_presorted_array * sizeof(unsigned long));


   /* read the table into memory */

   for(i = 0; i < DB->n_presorted_array ; i++)
   {
       DB->presorted_array[i] = NULL;
       DB->presorted_propid[i] = readlong(fp,sw_fread);
       DB->presorted_root_node[i] = readlong(fp,sw_fread);
   }
   return 0;

}



int     DB_ReadSortedIndex_Native(int propID, unsigned char **data, int *sz_data,void *db)
{
   struct Handle_DBNative *DB = (struct Handle_DBNative *) db;
   FILE *fp = DB->fp_presorted;
   int i;

   /* Open the array for the propID specified, if not already open */


   /* $$$ This looked like a bug -- if cur_presorted_array is set but the propID doesn't
    * match, the loop below searches for the matching propID.  If none is found,
    * DB->cur_presorted_array would still be set, so the wrong array would be returned.
    * Hence DB->cur_presorted_array is reset to NULL before entering the loop. */

   if(!DB->cur_presorted_array || DB->cur_presorted_propid != (unsigned long)propID)
   {
       DB->cur_presorted_array = NULL;  /* Added to fix bug mentioned above - moseley */

       for(i = 0; i < DB->n_presorted_array ; i++)
       {
           if((unsigned long)propID == DB->presorted_propid[i])
           {
               DB->cur_presorted_propid = propID;
               DB->cur_presorted_array = DB->presorted_array[i] = ARRAY_Open(fp,DB->presorted_root_node[i]);
               break;
           }
       }
   }

   /* If found, then return */
   if(DB->cur_presorted_array)
   {
       *data = (unsigned char *)DB->cur_presorted_array;
       *sz_data = sizeof(DB->cur_presorted_array);  /* is this just a non-zero value to flag that something was returned? */
   }
   else
   {
       *data = NULL;
       *sz_data = 0;
   }

   return 0;
}

int     DB_ReadSortedData_Native(int *data,int index, int *value, void *db)
{
    *value = ARRAY_Get((ARRAY *)data,index);
    return 0;
}

int     DB_EndReadSortedIndex_Native(void *db)
{
   return 0;
}

#else /* NOT USE_BTREE */

/********************************************************************************
*
*   DB->offsets[SORTEDINDEX] => first record in table.  There's a record for each
*   property that was sorted (not all are sorted).  The data passed in has
*   already been compressed.
*
*                 +------------------------------+
*                 | Pointer to next table entry  | <<-- DB->last_sortedindex
*                 |   (initially zero)           |
*                 +------------------------------+
*                 | PropID                       |
*                 +------------------------------+
*                 | Data Length in bytes         |
*                 +------------------------------+
*                 | Data [....]                  |
*                 +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~+
*                 |                              | <<-- DB->next_sortedindex
*                 +------------------------------+
*
*   Again, DB->offsets[SORTEDINDEX] points to the very first record in the
*   index.  While writing, DB->next_sortedindex points to the next place to start
*   writing.  DB->last_sortedindex points to the previous record's (initially
*   zero) "pointer to next table entry" field.  So, when it is not null,
*   last_sortedindex is used to update the pointer in the previous record to
*   point to the new record.  A zero entry indicates that there are no more records.
*
* DB_InitWriteSortedIndex_Native should probably write a null for the first
* record's "next table entry" and set next_ and last_ pointers.  Then
* DB_EndWriteSortedIndex_Native call would not be needed.
*
* Notes/Questions:
*   Seems like result_sort.c is the only place that loads this -- in LoadSortedProps.
*   LoadSortedProps is used by merge, proplimit, and result_sort.  LoadSortedProps
*   uncompresses the array (and pre_sort.c compresses).  Why isn't that compression
*   and uncompression done here?
*
*
********************************************************************************/
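
/*
 * Illustration only (not part of swish-e): a minimal sketch of following the
 * "pointer to next table entry" chain described above.  It uses a
 * hypothetical in-memory array in place of file seeks; the struct and the
 * function name are assumptions made for this example.

```c
#include <stddef.h>

// Hypothetical in-memory stand-in for one on-disk record.
struct sorted_rec {
    size_t next;     // index of the next record; 0 means "no more records"
    int    prop_id;  // property this record belongs to
};

// Follow the "next" links from record 'first' until prop_id matches.
// Returns the index of the matching record, or -1 if the chain ends first.
long find_sorted_rec(const struct sorted_rec *recs, size_t first, int prop_id)
{
    size_t cur = first;

    for (;;) {
        if (recs[cur].prop_id == prop_id)
            return (long)cur;
        if (recs[cur].next == 0)
            return -1;
        cur = recs[cur].next;
    }
}
```

 * DB_ReadSortedIndex_Native() below walks the chain the same way, except that
 * it seeks in the index file and reads the record data on a match.
 */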


int     DB_InitWriteSortedIndex_Native(void *db)
{
   struct Handle_DBNative *DB = (struct Handle_DBNative *) db;

   DB->offsets[SORTEDINDEX] = sw_ftell(DB->fp);
   DB->next_sortedindex = DB->offsets[SORTEDINDEX];
   return 0;
}

/********************************************************************************
* DB_WriteSortedIndex
*
* Input:
*   propID      property id of this table
*   *data       pointer to char array compressed table
*   sz_data     size of the char array in bytes
*   db          where to write
*
*********************************************************************************/

int     DB_WriteSortedIndex_Native(int propID, unsigned char *data, int sz_data,void *db)
{
   sw_off_t tmp1,tmp2;
   struct Handle_DBNative *DB = (struct Handle_DBNative *) db;
   FILE *fp = DB->fp;


   sw_fseek(fp, DB->next_sortedindex, SEEK_SET);

    /* save start of this record so can update previous record's "next record" pointer */
   tmp1 = sw_ftell(fp);

   printfileoffset(fp,(sw_off_t)0,sw_fwrite);  /* Pointer to next table if any */

    /* Write ID */
   compress1(propID,fp,sw_fputc);

   /* Write len of data (use the same putc wrapper as for the ID above) */
   compress1(sz_data,fp,sw_fputc);

   /* Write data */
   if ( sw_fwrite(data,sz_data,1,fp) != 1 )
        progerrno("Error writing to device while trying to write %d bytes: ", sz_data );


   DB->next_sortedindex = tmp2 = sw_ftell(fp);


   if(DB->last_sortedindex)
   {
       sw_fseek(fp,DB->last_sortedindex,SEEK_SET);
       printfileoffset(fp,tmp1,sw_fwrite);
       sw_fseek(fp,tmp2,SEEK_SET);
   }
   DB->last_sortedindex = tmp1;
   return 0;
}

int     DB_EndWriteSortedIndex_Native(void *db)
{
   struct Handle_DBNative *DB = (struct Handle_DBNative *) db;
   FILE *fp = DB->fp;

   /* "No next table" mark - useful if there are no presorted indexes */
   printfileoffset(fp,(sw_off_t)0,sw_fwrite);

   /* NULL meta id - only useful if there are no presorted indexes */
   if ( putc(0, fp) == EOF )
       progerrno("putc() failed writing null: ");

   return 0;
}


/* Non Btree read functions */


int     DB_InitReadSortedIndex_Native(void *db)
{
   return 0;
}

/***********************************************************************************
*  DB_ReadSortedIndex_Native -
*
*  Searches through the sorted indexes looking for one that matches the propID
*  passed in.  If found then malloc's a table and reads it in.
*  The data is in compressed format and is not uncompressed here
*
***********************************************************************************/

int     DB_ReadSortedIndex_Native(int propID, unsigned char **data, int *sz_data,void *db)
{
   sw_off_t next;
   long id, tmp;
   struct Handle_DBNative *DB = (struct Handle_DBNative *) db;
   FILE *fp = DB->fp;


   /* seek to the first record */
   sw_fseek(fp,DB->offsets[SORTEDINDEX],SEEK_SET);


   /* get seek position of the next record, if needed */
   next = readfileoffset(fp,sw_fread);

   /* read propID for this record */
   id = uncompress1(fp,sw_fgetc);


   while(1)
   {
       if(id == propID)  /* this is the property we are looking for */
       {
           tmp = uncompress1(fp,sw_fgetc);
           *sz_data = tmp;

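           *data = emalloc(*sz_data);

           /* Don't silently return a partially filled buffer */
           if ( *sz_data > 0 && sw_fread(*data,*sz_data,1,fp) != 1 )
               progerrno("Failed to read %d bytes of sorted index for property %d: ", *sz_data, propID);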
           return 0;
       }
       if(next)
       {
           sw_fseek(fp,next,SEEK_SET);
           next = readfileoffset(fp,sw_fread);
           id = uncompress1(fp,sw_fgetc);
       }
       else
       {
           *sz_data = 0;
           *data = NULL;
           return 0;
       }
   }
   return 0;
}

int     DB_ReadSortedData_Native(int *data,int index, int *value, void *db)
{
    *value = data[index];
    return 0;
}

int     DB_EndReadSortedIndex_Native(void *db)
{
   return 0;
}


#endif




/*
** Jose Ruiz 04/00
** Store a long in a portable format
*/
void    printlong(FILE * fp, unsigned long num, size_t(*f_write) (const void *, size_t, size_t, FILE *))
{
    size_t written;

    num = PACKLONG(num);        /* Make the number portable */
    if ( (written = f_write(&num, sizeof(long), 1, fp)) != 1 )
        progerrno("Error writing %d bytes (wrote %d of 1 items): ", (int)sizeof(long), (int)written );
}

/*
** Jose Ruiz 04/00
** Read a long from a portable format
*/
unsigned long readlong(FILE * fp, size_t(*f_read) (void *, size_t, size_t, FILE *))
{
    unsigned long num;

    f_read(&num, sizeof(long), 1, fp);
    return UNPACKLONG(num);     /* Make the number readable */
}

/*
** 2003/10 Jose Ruiz
** Store a file offset in a portable format
*/
void    printfileoffset(FILE * fp, sw_off_t num, size_t(*f_write) (const void *, size_t, size_t, FILE *))
{
    size_t written;

    num = PACKFILEOFFSET(num);        /* Make the number portable */
    if ( (written = f_write(&num, sizeof(num), 1, fp)) != 1 )
        progerrno("Error writing %d bytes (wrote %d of 1 items): ", (int)sizeof(num), (int)written );
}

/*
** 2003/10 Jose Ruiz
** Read a file offset from a portable format
*/
sw_off_t readfileoffset(FILE * fp, size_t(*f_read) (void *, size_t, size_t, FILE *))
{
    sw_off_t num;

    f_read(&num, sizeof(num), 1, fp);
    return UNPACKFILEOFFSET(num);     /* Make the number readable */
}



/****************************************************************************
*   Writing Properties  (not for USE_BTREE)
*
*   Properties are written sequentially to the .prop file.
*   Fixed length records of the seek position into the
*   property file are written sequentially to the main index (which is why
*   there's a separate .prop file).
*
*   DB_InitWriteProperties is called the first time a property is written
*   to save the offset of the property index table in the main index.
*   It's simply an ftell() of the current position in the index and that
*   seek position is stored in the main index "offsets" table.
*
*   DB_WriteProperty writes a property.
*
*   DB_WritePropPositions writes the seek pointers to the main index and
*   *must* be called after processing each file.
*   This is all done in WritePropertiesToDisk().
*
*   The index tables are all based on the count of properties in the index.
*   So, to read you find the start of the prop pointers table by the value
*   stored in the offsets table.  Since we have a fixed number of properties
*   we know the size of an entry in the prop pointers table, one record per filenum.
*   Index into the prop seek positions table and grab the pointers to the properties.
*
*
*****************************************************************************/
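
/*
 * Illustration only (not part of swish-e): a minimal sketch of the lookup
 * described above, assuming one fixed-width record per file and one
 * seek-pointer entry per property, as in DB_ReadPropPositions_Native()
 * below.  The function name and its parameters are hypothetical.

```c
#include <stddef.h>

// Byte offset of the seek-pointer record belonging to file 'filenum'.
// table_start: start of the prop pointers table (the FILELISTPOS offset).
// filenum:     1-based file number.
// prop_count:  number of properties defined in the index.
// entry_size:  bytes per stored seek pointer.
long long prop_record_offset(long long table_start, int filenum,
                             int prop_count, size_t entry_size)
{
    // Each earlier file contributes prop_count entries of entry_size bytes.
    return table_start
         + (long long)(filenum - 1) * (long long)prop_count * (long long)entry_size;
}
```

 * The first file's record starts at table_start itself; each later record
 * follows prop_count * entry_size bytes after the previous one.
 */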


int     DB_InitWriteProperties_Native(void *db)
{
#ifndef USE_BTREE
    struct Handle_DBNative *DB = (struct Handle_DBNative *) db;

    DB->offsets[FILELISTPOS] = sw_ftell(DB->fp);

#ifdef DEBUG_PROP
    printf("InitWriteProperties: Start of property table in main index at offset: %ld\n", DB->offsets[FILELISTPOS] );
#endif

#endif

    return 0;
}


/****************************************************************************
*   Writes a property to the property file
*
*   Creates a PROP_INDEX structure in the file entry that caches all
*   the seek pointers into the .prop file, if it doesn't already exist.
*
*   Stores in the fi->prop_index structure the seek address of this property
*
*   Writes to the prop file:
*       <compressed length><saved_bytes><property (possibly compressed)>
*
*   compressed_length: length of the compressed property. If no compression was
*                      made this value is the real length
*   saved_bytes: number of bytes saved by the compression. If no compression
*                was made this value is 0.
*   property: buffer array containing the property (possibly compressed)
*
*   Important note: On entry to this routine, if uncompressed_len is zero, then
*                   no compression was made. So, saved_bytes is adjusted to zero
*
*****************************************************************************/
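
/*
 * Illustration only (not part of swish-e): a minimal sketch of how the
 * saved_bytes field described above round-trips, assuming the rules used by
 * DB_WriteProperty_Native() and DB_ReadProperty_Native().  Both helper
 * names are hypothetical.

```c
// Writer side: bytes saved by compression.
// An uncompressed_len of 0 means "stored uncompressed", so nothing was saved.
int saved_bytes_for(int buf_len, int uncompressed_len)
{
    return uncompressed_len ? uncompressed_len - buf_len : 0;
}

// Reader side: recover uncompressed_len from the two stored numbers.
// A saved_bytes of 0 maps back to 0, meaning "no compression".
int uncompressed_len_from(int buf_len, int saved_bytes)
{
    return saved_bytes ? buf_len + saved_bytes : 0;
}
```

 * A 100-byte property compressed to 40 bytes is stored with saved_bytes 60
 * and read back with an uncompressed length of 100.
 */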

void    DB_WriteProperty_Native( IndexFILE *indexf, FileRec *fi, int propID, char *buffer, int buf_len, int uncompressed_len, void *db)
{
    struct Handle_DBNative *DB = (struct Handle_DBNative *) db;
    size_t             written_bytes;
    PROP_INDEX      *pindex = fi->prop_index;
    PROP_LOCATION   *prop_loc;
    INDEXDATAHEADER *header = &indexf->header;
    int             count = header->property_count;
    int             index_size;
    int             propIDX = header->metaID_to_PropIDX[propID];
#ifdef DEBUG_PROP
    sw_off_t            prop_start_pos;
#endif
    int             saved_bytes;

    if ( count <= 0 )
        return;


    if (!DB->prop)
        progerr("Property database file not opened\n");


    /* Create place to store seek positions on first call for this file */
    if ( !pindex )
    {
        index_size = sizeof( PROP_INDEX ) + sizeof( PROP_LOCATION ) * (count - 1);
        pindex = fi->prop_index = emalloc( index_size );
        memset( pindex, 0, index_size );
    }


    /* make an alias */
    prop_loc = &pindex->prop_position[ propIDX ];


    /* Just to be sure */
    if ( !buf_len )
    {
        prop_loc->seek = (sw_off_t)0;
        return;
    }

    /* write the property to disk */

    if ((prop_loc->seek = sw_ftell(DB->prop)) == (sw_off_t) -1)
        progerrno("O/S failed to tell me where I am - file number %d metaID %d : ", fi->filenum, propID);


    /* First write the compressed size */
    /* should be smaller than uncompressed_len if any */
    /* NOTE: uncompressed_len is 0 if no compression was made */
    compress1( buf_len, DB->prop, putc);

    /* Second write the number of saved bytes */
    if( !uncompressed_len )   /* No compression */
        saved_bytes = 0;
    else
        saved_bytes = uncompressed_len - buf_len;
    /* Write them */
    compress1( saved_bytes, DB->prop, putc);

#ifdef DEBUG_PROP
    prop_start_pos = sw_ftell(DB->prop);
#endif



    if ( (int)(written_bytes = sw_fwrite(buffer, 1, buf_len, DB->prop)) != buf_len) /* Write data */
        progerrno("Failed to write file number %d metaID %d to property file.  Tried to write %d, wrote %lu : ", fi->filenum, propID, buf_len,
                  (unsigned long)written_bytes);


#ifdef DEBUG_PROP
    printf("Write Prop: file %d  PropIDX %d  (meta %d) seek: %ld ",
                fi->filenum, propIDX, propID, prop_loc->seek );

    printf("data=[uncompressed_len: %d (%ld bytes), prop_data: (%ld bytes)]\n",
            uncompressed_len, prop_start_pos - prop_loc->seek, (long)written_bytes);
#endif
}


/****************************************************************************
*   Writes out the seek positions for the properties
*
*   This writes out a fixed-size record for the current file.  Each
*   record is a list of <seek pos> entries, one for
*   each property defined.  seek_pos is null if this file doesn't have that
*   property.
*
*   The advantage of the fixed width records is that they can be written
*   to disk after each file, saving RAM, and more importantly, all the
*   files don't need to be read when searching.  Can just seek to the
*   file of interest, read the table, then read the property file.
*
*   This comes at a cost of disk space (and maybe disk access speed),
*   since much of the data in the table written to disk could be compressed.
*
*****************************************************************************/
void DB_WritePropPositions_Native(IndexFILE *indexf, FileRec *fi, void *db)
{
    struct Handle_DBNative *DB = (struct Handle_DBNative *) db;
    PROP_INDEX      *pindex = fi->prop_index;
    INDEXDATAHEADER *header = &indexf->header;
    int             count = header->property_count;
    int             index_size;
    int             i;
#ifdef DEBUG_PROP
    sw_off_t            start_seek;
#endif
#ifdef USE_BTREE
    sw_off_t            seek_pos;
#endif



    /* Just in case there were no properties for this file */
    if ( !pindex )
    {
        index_size = sizeof( PROP_INDEX ) + sizeof( PROP_LOCATION ) * (count - 1);
        pindex = fi->prop_index = emalloc( index_size );
        memset( pindex, 0, index_size );
    }

#ifdef USE_BTREE
    /* now calculate index */
    seek_pos = (sw_off_t)((sw_off_t)(fi->filenum - 1) * (sw_off_t)count);
#endif

#ifdef DEBUG_PROP
    printf("Writing seek positions to index for file %d\n", fi->filenum );
#endif


    /* Write out the prop index */
    for ( i = 0; i < count; i++ )
    {
        /* make an alias */
        PROP_LOCATION *prop_loc = &pindex->prop_position[ i ];

#ifndef USE_BTREE

#ifdef DEBUG_PROP
        start_seek = sw_ftell( DB->fp );
#endif

        /* Write in portable format */
        printfileoffset( DB->fp, prop_loc->seek, sw_fwrite );

#ifdef DEBUG_PROP
        printf("  PropIDX: %d  data=[seek: %ld]  main index location: %ld for %ld bytes (one print long)\n",
                 i,  prop_loc->seek, start_seek, sw_ftell( DB->fp ) - start_seek );
#endif


#else
        ARRAY_Put( DB->props_array,seek_pos++, prop_loc->seek);
#endif
    }

    efree( pindex );
    fi->prop_index = NULL;
}

/****************************************************************************
*   Reads in the seek positions for the properties
*
*
*****************************************************************************/
void DB_ReadPropPositions_Native(IndexFILE *indexf, FileRec *fi, void *db)
{
    struct Handle_DBNative *DB = (struct Handle_DBNative *) db;
    PROP_INDEX      *pindex = fi->prop_index;
    INDEXDATAHEADER *header = &indexf->header;
    int             count = header->property_count;
    int             index_size;
    sw_off_t        seek_pos;
    int             i;

    if ( count <= 0 )
        return;


    /* create a place to store them */

    index_size = sizeof( PROP_INDEX ) + sizeof( PROP_LOCATION ) * (count - 1);

    pindex = fi->prop_index = emalloc( index_size );
    memset( pindex, 0, index_size );


#ifndef USE_BTREE
    /* now calculate seek_pos */
    /* printfileoffset always writes sizeof(sw_off_t) bytes per entry
    ** (commonly 4 or 8 bytes, depending on the platform), so each
    ** file's record is count * sizeof(sw_off_t) bytes wide
    */
    seek_pos = (sw_off_t)(((sw_off_t)(fi->filenum - 1)) * ((sw_off_t)sizeof(sw_off_t)) * ((sw_off_t)(count))  + (sw_off_t)DB->offsets[FILELISTPOS]);


    /* and seek to table */
    if (sw_fseek(DB->fp, seek_pos, SEEK_SET) == -1)
        progerrno("Failed to seek to property index located at %ld for file number %d : ", (long)seek_pos, fi->filenum);


#ifdef DEBUG_PROP
        printf("\nFetching seek positions for file %d\n", fi->filenum );
        printf(" property index table at %ld, this file at %ld\n", DB->offsets[FILELISTPOS], seek_pos );
#endif


    /* Read in the prop indexes */
    for ( i=0; i < count; i++ )
    {
#ifdef DEBUG_PROP
        sw_off_t    seek_start = sw_ftell( DB->fp );
#endif

        /* make an alias */
        PROP_LOCATION *prop_loc = &pindex->prop_position[ i ];

        prop_loc->seek = readfileoffset( DB->fp, sw_fread );

#ifdef DEBUG_PROP
        printf("   PropIDX: %d  data[Seek: %ld] at seek %ld read %ld bytes (one readlong)\n", i, prop_loc->seek, seek_start, sw_ftell( DB->fp ) - seek_start  );
#endif


    }
#else

    /* now calculate index */
    seek_pos = (sw_off_t)(((sw_off_t)(fi->filenum - 1)) * ((sw_off_t)count));

    /* Read in the prop indexes */
    for ( i=0; i < count; i++ )
    {
        /* make an alias */
        PROP_LOCATION *prop_loc = &pindex->prop_position[ i ];
        prop_loc->seek = ARRAY_Get(DB->props_array, seek_pos++);
    }
#endif
}



/****************************************************************************
*   Reads a property from the property file
*
*   Returns:
*       *char (buffer -- must be destroyed by caller)
*
*   Important note: On returning, *buf_len contains the compressed length,
*                   *uncompressed_len contains the real length or 0 if
*                   no compression was made
*
*****************************************************************************/
char   *DB_ReadProperty_Native(IndexFILE *indexf, FileRec *fi, int propID, int *buf_len, int *uncompressed_len, void *db)
{
    struct Handle_DBNative *DB = (struct Handle_DBNative *) db;
    PROP_INDEX      *pindex = fi->prop_index;
    INDEXDATAHEADER *header = &indexf->header;
    int             count = header->property_count;
    sw_off_t        seek_pos, prev_seek_pos;
    int             propIDX;
    PROP_LOCATION   *prop_loc;
    char            *buffer;
    int             saved_bytes;


    propIDX = header->metaID_to_PropIDX[propID];

    if ( count <= 0 )
        return NULL;


    /* read in the index pointers if not already loaded */
    if ( !pindex )
    {
        DB_ReadPropPositions_Native( indexf, fi, db);
        pindex = fi->prop_index;
    }


    if ( !pindex )
        progerr("Failed to call DB_ReadProperty_Native with seek positions");

    prop_loc = &pindex->prop_position[ propIDX ];

    seek_pos = pindex->prop_position[propIDX].seek;

    /* Any for this metaID? */
    if (!seek_pos )
    {
        *buf_len = 0;
        *uncompressed_len = 0;
        return NULL;
    }

    /* Preserve seek_pos */
    prev_seek_pos = sw_ftell(DB->prop);

    if (sw_fseek(DB->prop, seek_pos, SEEK_SET) == -1)
        progerrno("Failed to seek to properties located at %ld for file number %d : ", (long)seek_pos, fi->filenum);

#ifdef DEBUG_PROP
    printf("Fetching filenum: %d propIDX: %d at seek: %ld\n", fi->filenum, propIDX, seek_pos);
#endif


    /* read compressed size (for use in zlib uncompression) */
    *buf_len = uncompress1( DB->prop, sw_fgetc );

    /* Read the number of bytes saved by compression */
    saved_bytes = uncompress1( DB->prop, sw_fgetc );

    /* If saved_bytes is 0 there was not any compression */
    if( !saved_bytes )             /* adjust *uncompressed_len */
        *uncompressed_len = 0;    /* This value means no compression */
    else
        *uncompressed_len = *buf_len + saved_bytes;


#ifdef DEBUG_PROP
    printf(" Fetched uncompressed length of %d (%ld bytes storage), now fetching %d prop bytes from %ld\n",
             *uncompressed_len, (long)(sw_ftell( DB->prop ) - seek_pos), *buf_len, (long)sw_ftell( DB->prop ) );
#endif


    /* allocate a read buffer */
    buffer = emalloc(*buf_len);


    if ( (int)sw_fread(buffer, 1, *buf_len, DB->prop) != *buf_len)
        progerrno("Failed to read properties located at %ld for file number %d : ", (long)seek_pos, fi->filenum);

    /* Restore previous seek_pos */
    sw_fseek(DB->prop, prev_seek_pos,SEEK_SET);

    return buffer;
}


/****************************************************************
*  This routine closes the property file and reopens it as
*  readonly to improve seek times.
*  Note: It does not rename the property file.
*****************************************************************/

void    DB_Reopen_PropertiesForRead_Native(void *db)
{
    struct Handle_DBNative *DB = (struct Handle_DBNative *) db;
    int     no_rename = 0;
    char   *s = estrdup(DB->cur_prop_file);

    /* Close property file */
    DB_Close_File_Native(&DB->prop, &DB->cur_prop_file, &no_rename);


    if (!(DB->prop = openIndexFILEForRead(s)))
        progerrno("Couldn't open the property file \"%s\": ", s);

    DB->cur_prop_file = s;
}



#ifdef USE_BTREE


int    DB_WriteTotalWordsPerFile_Native(SWISH *sw, int idx, int wordcount, void *db)
{
   struct Handle_DBNative *DB = (struct Handle_DBNative *) db;

   ARRAY_Put(DB->totwords_array,idx,wordcount);

   return 0;
}


int     DB_ReadTotalWordsPerFile_Native(SWISH *sw, int index, int *value, void *db)
{
   struct Handle_DBNative *DB = (struct Handle_DBNative *) db;

   *value = ARRAY_Get((ARRAY *)DB->totwords_array,index);
   return 0;
}


#endif
swish-e-2.4.7/src/http.c0000664000077100017500000005115311166010110011732 00000000000000
/*
** Copyright (C) 1995, 1996, 1997, 1998 Hewlett-Packard Company
** Originally by Kevin Hughes, kev@kevcom.com, 3/11/94
**

    This file is part of Swish-e.

    Swish-e is free software; you can redistribute it and/or modify
    it under the terms of the GNU General Public License as published by
    the Free Software Foundation; either version 2 of the License, or
    (at your option) any later version.

    Swish-e is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
    GNU General Public License for more details.

    You should have received a copy of the GNU General Public License
    along  with Swish-e; if not, write to the Free Software
    Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
    
    See the COPYING file that accompanies the Swish-e distribution for details
    of the GNU GPL and the special exception available for linking against
    the Swish-e library.
    
** Mon May  9 15:51:39 CDT 2005
** added GPL

**--------------------------------------------------------------------
** All the code in this file added by Ron Klachko ron@ckm.ucsf.edu 9/98
** 
** change sprintf to snprintf to avoid corruption,
** test length of spiderdirectory before strcat to avoid corruption,
** added safestrcpy() macro to avoid corruption from strcpy overflow,
** define MAXPIDLEN instead of literal "32" - assumed return length from lgetpid()
** SRE 11/17/99
**
** added buffer size arg to grabStringValue - core dumping from overrun
** SRE 2/22/00
**
** 2000-11   jruiz,rasc  some redesign
*/

/*
** http.c
*/

#ifdef HAVE_CONFIG_H
#include "acconfig.h"
#endif

#ifdef HAVE_UNISTD_H
#include <unistd.h>
#endif

#ifdef HAVE_STDLIB_H
#include <stdlib.h>
#endif

#ifdef HAVE_PROCESS_H
#include <process.h>
#endif

#include <time.h>
#include <stdarg.h>

/* for wait */
#include <sys/types.h>
#ifdef HAVE_SYS_WAIT_H
#include <sys/wait.h>
#endif

#include "swish.h"
#include "mem.h"
#include "swstring.h"
#include "index.h"
#include "hash.h"
#include "file.h"
#include "check.h"
#include "error.h"
#include "list.h"

#include "http.h"
#include "httpserver.h"

#include "xml.h"
#include "txt.h"
#include "html.h"

#include "filter.h"
/*
  -- init structures for this module
*/

void    initModule_HTTP(SWISH * sw)
{
    struct MOD_HTTP *http;
    int     i;

    char *execdir = get_libexec();

    http = (struct MOD_HTTP *) emalloc(sizeof(struct MOD_HTTP));

    sw->HTTP = http;

    http->lenspiderdirectory = strlen(execdir); 
    http->spiderdirectory = (char *) emalloc(http->lenspiderdirectory + 1);
    strcpy( http->spiderdirectory, execdir );
    efree( execdir );

    for (i = 0; i < BIGHASHSIZE; i++)
        http->url_hash[i] = NULL;

    http->equivalentservers = NULL;

    /* http default system parameters */
    http->maxdepth = 0;
    http->delay = DEFAULT_HTTP_DELAY;
}

void    freeModule_HTTP(SWISH * sw)
{
    struct MOD_HTTP *http = sw->HTTP;

    if (http->spiderdirectory)
        efree(http->spiderdirectory);
    efree(http);
    sw->HTTP = NULL;
}

int     configModule_HTTP(SWISH * sw, StringList * sl)
{
    struct MOD_HTTP *http = sw->HTTP;
    char   *w0 = sl->word[0];
    int     retval = 1;

    int     i;
    struct multiswline *list;
    struct swline *slist;

    if (strcasecmp(w0, "maxdepth") == 0)
    {
        if (sl->n == 2)
        {
            retval = 1;
            http->maxdepth = atoi(sl->word[1]);
        }
        else
            progerr("MaxDepth requires one value");
    }
    else if (strcasecmp(w0, "delay") == 0)
    {
        if (sl->n == 2)
        {
            retval = 1;
            http->delay = atoi(sl->word[1]);
        }
        else
            progerr("Delay requires one value");
    }
    else if (strcasecmp(w0, "spiderdirectory") == 0)
    {
        if (sl->n == 2)
        {
            retval = 1;
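            http->spiderdirectory = erealloc( http->spiderdirectory, strlen(sl->word[1])+2);
            strcpy( http->spiderdirectory, sl->word[1] );
            normalize_path( http->spiderdirectory );
            /* keep the cached length in sync with the new directory */
            http->lenspiderdirectory = strlen( http->spiderdirectory );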
            

            if (!isdirectory(http->spiderdirectory))
            {
                progerr("SpiderDirectory. %s is not a directory", http->spiderdirectory);
            }
        }
        else
            progerr("SpiderDirectory requires one value");
    }
    else if (strcasecmp(w0, "equivalentserver") == 0)
    {
        if (sl->n > 1)
        {
            retval = 1;
            /* Add a new list of equivalent servers */
            list = (struct multiswline *) emalloc(sizeof(struct multiswline));

            list->next = http->equivalentservers;
            list->list = 0;
            http->equivalentservers = list;

            for (i = 1; i < sl->n; i++)
            {
                /* Add a new entry to this list */
                slist = newswline(sl->word[i]);
                slist->next = list->list;
                list->list = slist;
            }

        }
        else
            progerr("EquivalentServer requires at least one value");
    }
    else
    {
        retval = 0;
    }

    return retval;
}
typedef struct urldepth
{
    char   *url;
    int     depth;
    struct urldepth *next;
}
urldepth;


int     http_already_indexed(SWISH * sw, char *url);
urldepth *add_url(SWISH * sw, urldepth * list, char *url, int depth, char *baseurl);


urldepth *add_url(SWISH * sw, urldepth * list, char *url, int depth, char *baseurl)
{
    urldepth *item;
    struct MOD_HTTP *http = sw->HTTP;


    if (!equivalentserver(sw, url, baseurl))
    {
        if (sw->verbose >= 3)
            printf("Skipping %s:  %s\n", url, "Wrong method or server.");


    }
    else if (http->maxdepth && (depth >= http->maxdepth))
    {
        if (sw->verbose >= 3)
            printf("Skipping %s:  %s\n", url, "Too deep.");
    }
    else if (sw->nocontentslist && isoksuffix(url, sw->nocontentslist))
    {
        if (sw->verbose >= 3)
            printf("Skipping %s: %s\n", url, "Wrong suffix.");

    }
    else if (urldisallowed(sw, url))
    {
        if (sw->verbose >= 3)
            printf("Skipping %s:  %s\n", url, "URL disallowed by robots.txt.");
    }
    else if (!http_already_indexed(sw, url))
    {
        item = (urldepth *) emalloc(sizeof(urldepth));
        item->url = estrdup(url);
        item->depth = depth;
#if 0
        /* Depth first searching
           * */
        item->next = list;
        list = item;
#else
        /* Breadth first searching
           * */
        item->next = 0;
        if (!list)
        {
            list = item;
        }
        else
        {
            urldepth *walk;

            for (walk = list; walk->next; walk = walk->next)
            {
            }
            walk->next = item;
        }
#endif
    }

    return list;
}


/* Have we already seen this URL?
** This function is used to avoid multiple index entries
** or endless looping when pages link to each other.
** Note the side effect: a URL that is not found is added to the hash.
*/

int     http_already_indexed(SWISH * sw, char *url)
{
    struct url_info *p;

    int     len;
    unsigned hashval;
    struct MOD_HTTP *http = sw->HTTP;

    /* Hash via the uri alone.  Depending on the equivalent
       ** servers, we may compare either the entire
       ** url or just the uri.
     */
    hashval = bighash(url_uri(url, &len)); /* Search hash for this file. */
    for (p = http->url_hash[hashval]; p != NULL; p = p->next)
        if ((strcmp(url, p->url) == 0) || (equivalentserver(sw, url, p->url) && (strcmp(url_uri(url, &len), url_uri(p->url, &len)) == 0)))
        {                       /* We found it. */
            if (sw->verbose >= 3)
                printf("Skipping %s:  %s\n", url, "Link already processed.");
            return 1;
        }

    /* Not found, make new entry. */
    p = (struct url_info *) emalloc(sizeof(struct url_info));

    p->url = estrdup(url);
    p->next = http->url_hash[hashval];
    http->url_hash[hashval] = p;

    return 0;
}


char   *url_method(char *url, int *plen)
{
    char   *end;

    if ((end = strstr(url, "://")) == NULL)
    {
        return NULL;
    }
    *plen = end - url;
    return url;
}


char   *url_serverport(char *url, int *plen)
{
    int     methodlen;
    char   *serverstart;
    char   *serverend;

    if (url_method(url, &methodlen) == NULL)
    {
        return NULL;
    }

    /* +3 to skip the "://" that follows the method */
    serverstart = url + methodlen + 3;
    if ((serverend = strchr(serverstart, '/')) == NULL)
    {
        *plen = strlen(serverstart);
    }
    else
    {
        *plen = serverend - serverstart;
    }

    return serverstart;
}


char   *url_uri(char *url, int *plen)
{
    if ((url = url_serverport(url, plen)) == 0)
    {
        return 0;
    }
    url += *plen;
    *plen = strlen(url);
    return url;
}
/************************************************************
*
* Fork and exec a program, and wait for child to exit.
* Returns only if the child exited normally (whatever its
* exit status); otherwise aborts via progerr().
*
*************************************************************/
#ifdef HAVE_WORKING_FORK
static void run_program(char* prog, char** args)
{
    pid_t pid = fork();
    int   status;

    if ( pid == -1 )
        progerrno("Failed to fork '%s'. Error: ", prog );

    /* In parent, wait for child */
#ifdef HAVE_SYS_WAIT_H
    if ( pid )
    {
        wait( &status );
        if ( WIFEXITED( status ) ) // WIFEXITED is non-zero if the child exited normally
            return;

        progerr("%s did not exit normally (wait status %d)", prog, status );
    }
#endif /* HAVE_SYS_WAIT_H */

    /* In child: replace this process with the program */
    execvp (prog, args);
    progerrno("Failed to exec '%s'. Error: ", prog );
}
#endif

/************************************************************
*
* Fetch a URL
* Side effect: appends ".response" to the caller's "file_prefix" buffer
*  -- lazy programmer hoping that -S http will go away...
*
*  Without a working fork() (e.g. under Windows) system() is used to call "perl"
*  Otherwise, exec is called on the swishspider program
*
*************************************************************/

int get(SWISH * sw, char *contenttype_or_redirect, time_t *last_modified, time_t * plastretrieval, char *file_prefix, char *url)
{
    int     code = 500;
    FILE   *fp;
    struct MOD_HTTP *http = sw->HTTP;

    /* Build path to swishspider program */
    char   *spider_prog = emalloc( strlen(http->spiderdirectory) + strlen("swishspider+fill") );
    sprintf(spider_prog, "%s/swishspider", http->spiderdirectory ); // note that spiderdir MUST be set.  

    /* Sleep a little so we don't overwhelm the server */
    if (  *plastretrieval && (time(0) - *plastretrieval) < http->delay)
    {
        int     num_sec = http->delay - (time(0) - *plastretrieval);
        if ( sw->verbose >= 3 )
            printf("sleeping %d seconds before fetching %s\n", num_sec, url);
#ifdef _WIN32
        _sleep(num_sec * 1000);  /* _sleep() takes milliseconds */
#else
        sleep(num_sec);
#endif
    }

    *plastretrieval = time(0);

    if ( sw->verbose >= 3 )
        printf("Now fetching [%s]...", url );

    
#ifndef HAVE_WORKING_FORK
    /* Should be in autoconf or obsoleted by extprog. - DLN 2001-11-05  */
    {
        int     retval;
        char    commandline[] = "perl \"%s\" \"%s\" \"%s\"";
        char   *command = emalloc( strlen(commandline) + strlen(spider_prog) + strlen(file_prefix) + strlen(url) + 1 );

        sprintf(command, commandline, spider_prog, file_prefix, url);

        retval = system( command );
        efree( command );
        efree( spider_prog );

        if ( retval )
            return 500;
    }
#else
    {
        char *args[4];

        args[0] = spider_prog;
        args[1] = file_prefix;
        args[2] = url;
        args[3] = NULL;
        run_program( spider_prog, args );
        efree( spider_prog );
    }
#endif

    /* Probably better to have Delay be time between requests since some docs may take more than Delay seconds to fetch */
    *plastretrieval = time(0);
    

    /* NAUGHTY SIDE EFFECT */
    strcat( file_prefix, ".response" );
    
    if ( !(fp = fopen(file_prefix, F_READ_TEXT)) )
    {
        progerrno("Failed to open file '%s': ", file_prefix );
    }
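    else
    {
        char buffer[500];

        /* don't report a stale value left over from a previous fetch */
        *contenttype_or_redirect = '\0';

        /* first line: HTTP status code (code stays 500 if it can't be read) */
        if ( fgets(buffer, sizeof(buffer), fp) )
            code = atoi(buffer);

        if ((code == 200) || ((code / 100) == 3))
        {
            /* second line: content-type (200) or redirect location (3xx) */
            if ( fgets(contenttype_or_redirect, MAXSTRLEN, fp) )
                contenttype_or_redirect[strcspn(contenttype_or_redirect, "\n")] = '\0';
            else
                *contenttype_or_redirect = '\0';
        }


        if (code == 200)
        {
            /* third line: last-mod time, in seconds since the epoch */
            if ( fgets(buffer, sizeof(buffer), fp) )
                *last_modified = (time_t)strtol(buffer, NULL, 10);  // go away http.c -- no range checking
        }


        fclose(fp);
    }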

    if ( sw->verbose >= 3 ) 
        printf("Status: %d. %s\n", code, contenttype_or_redirect );

    return code;
}

int     cmdf(int (*cmd) (const char *), char *fmt, char *string, pid_t pid)
{
    int     rc;
    char   *buffer;

    buffer = emalloc(strlen(fmt) + strlen(string) + sizeof(pid_t) * 8 + 1);

    /* callers' formats use %ld for the pid, so pass it as a long */
    sprintf(buffer, fmt, string, (long) pid);

    rc = cmd(buffer);
    efree(buffer);
    return rc;
}

char   *readline(FILE * fp)
{
    static char *buffer = 0;
    static int buffersize = 512;

    if (buffer == 0)
    {
        buffer = (char *) emalloc(buffersize);
    }
    /*
       *Try to read in the line
     */

    if (fgets(buffer, buffersize, fp) == NULL)
    {
        return NULL;
    }

    /*
       * Make sure we read the entire line.  If not, double the buffer
       * size and try to read the rest
     */
    while (buffer[strlen(buffer) - 1] != '\n')
    {
        buffer = (char *) erealloc(buffer, buffersize * 2);

        /*
           * The easiest way to verify that this is correct is to consider
           * a buffer that is 2 bytes long.  Since fgets() always writes the
           * trailing NUL, it will have read only 1 byte of data.  We double
           * the buffer to four bytes, so the next read can overwrite the byte
           * that currently holds the NUL in addition to the newly added bytes,
           * which lets us read into buffersize + 1 bytes.
         */
        if (fgets(buffer + buffersize - 1, buffersize + 1, fp) == 0)
        {
            break;
        }
        buffersize *= 2;
    }

    return buffer;
}


/* A local version of getpid() so that we don't have to suffer
** a system call each time we need it.
*/
pid_t   lgetpid()
{
    static pid_t pid = -1;

    if (pid == -1)
    {
        pid = getpid();
    }
    return pid;
}

#if 0

/* Testing the robot rules parsing code...
**/
void    http_indexpath(char *url)
{
    httpserverinfo *server = getserverinfo(url);
    robotrules *robotrule;

    printf("User-agent: %s\n", server->useragent ? server->useragent : "(none)");
    for (robotrule = server->robotrules; robotrule; robotrule = robotrule->next)
    {
        printf("Disallow: %s\n", robotrule->disallow);
    }
}

#else

/********************************************************/
/*					"Public" functions					*/
/********************************************************/

/* The main entry point for the module.  Starting from the given url,
** fetches and indexes each document and follows the links that
** swishspider extracts from HTML pages (breadth first, up to MaxDepth).
*/
void    http_indexpath(SWISH * sw, char *url)
{
    urldepth *urllist = 0;
    urldepth *item;
    static int lentitle = 0;
    static char *title = NULL;
    char   *tmptitle;
    static int lencontenttype = 0;
    static char *contenttype = NULL;
    int     code;
    time_t  last_modified = 0;

    httpserverinfo *server;
    char   *link;
    char   *p;
    FileProp *fprop;
    FILE   *fp;
    struct MOD_Index *idx = sw->Index;

    char   *file_prefix;  // prefix for use with files written by swishspider -- should just be on the stack!
    char   *file_suffix;  // where to copy the suffix


    /* Initialize buffers */


    file_prefix = emalloc( strlen(idx->tmpdir) + MAXPIDLEN + strlen("/swishspider@.contents+fill") );
    sprintf(file_prefix, "%s/swishspider@%ld", idx->tmpdir, (long) lgetpid());
    file_suffix = file_prefix + strlen( file_prefix );
    

    if (!lentitle) {
        title = emalloc((lentitle = MAXSTRLEN) + 1);
        *title = '\0';
    }

    if (!lencontenttype) {
        contenttype = emalloc((lencontenttype = MAXSTRLEN) + 1);
        *contenttype = '\0';
    }


    /* prime the pump with the first url */
    urllist = add_url(sw, urllist, url, 0, url);



    /* retrieve each url and add urls to a certain depth */

    while (urllist)
    {
        item = urllist;
        urllist = urllist->next;

        if (sw->verbose >= 2)
        {
            printf("retrieving %s (%d)...\n", item->url, item->depth);
            fflush(stdout);
        }

        /* We don't check if this url is legal here, because we do that before adding to the list. */
        server = getserverinfo(sw, item->url);

        strcpy( file_suffix, "" );  // reset to just the prefix

        if ((code = get(sw, contenttype, &last_modified, &server->lastretrieval, file_prefix, item->url)) == 200)
        {
            FilterList *filter_list = hasfilter(sw, item->url);  /* check to see if there's a filter */
            
            /* Set the file_prefix to be the path to "contents" */
            strcpy( file_suffix, ".contents" );

            
            /* Patch from Steve van der Burg */
            /* change from strcmp to strncmp */


            /* Fetch title from doc if it's HTML */

            if (strncmp(contenttype, "text/html", 9) == 0)
                title = SafeStrCopy(title, (char *) (tmptitle = parseHTMLtitle(sw , file_prefix)), &lentitle);
            else
                if ((p = strrchr(item->url, '/')))
                    title = SafeStrCopy(title, p + 1, &lentitle);
                else
                    title = SafeStrCopy(title, item->url, &lentitle);


            /* Now index the file */

            /* What to do with non-text files? */
            /* This never worked correctly.  Used to set fprop->index_no_content if it wasn't a text type of file. */
            /* That forced indexing of only the path name for, say, a PDF file.  But although that also allowed files */
            /* to be processed by FileFilter filters, the index_no_content still forced indexing of only file names, */
            /* thus making the filters worthless.  But without the index_no_content it would index all files, including binary files. */
            /* Two solutions: 1: set a flag that only should index the file if a filter is set up for it, or */
            /*                2: do filtering in swishspider.  That's a better option. */

            /* Nov 14, 2002 - well, do both */

            if ( filter_list || strncmp(contenttype, "text/", 5) == 0 )            {
                if (sw->verbose >= 4)
                    printf("Indexing %s:  Content type: %s. %s\n",
                        item->url, contenttype, filter_list ? "(filtered)" : "");
 
                
                fprop = file_properties(item->url, file_prefix, sw);
                fprop->mtime = last_modified;

                

                /* only index contents of text docs */
                // this would just index the path name
                // but also tossed away output from filters.
                // fprop->index_no_content = strncmp(contenttype, "text/", 5);

                do_index_file(sw, fprop);
                free_file_properties(fprop);
            }
            else if (sw->verbose >= 3)
                printf("Skipping %s:  Wrong content type: %s.\n", item->url, contenttype);
            



            /* add new links as extracted by the spider */

            if (strncmp(contenttype, "text/html", 9) == 0)
            {
                strcpy( file_suffix, ".links" );
            
                if ((fp = fopen(file_prefix, F_READ_TEXT)) != NULL)
                {
                    /* URLs can get quite large so don't depend on a fixed size buffer */
                
                    while ((link = readline(fp)) != NULL)
                    {
                        link[strcspn(link, "\n")] = '\0';  /* strip the newline, if there is one */
                        urllist = add_url(sw, urllist, link, item->depth + 1, url);
                    }
                    fclose(fp);
                }
            }

        }
        else if ((code / 100) == 3)
        {
            if ( *contenttype )
                urllist = add_url(sw, urllist, contenttype, item->depth, url);
            else
                if (sw->verbose >= 3)
                    printf("URL '%s' returned redirect code %d without a Location.\n", item->url, code);
        }



        /* Clean up the files left by swishspider */
        cmdf(unlink, "%s/swishspider@%ld.response", idx->tmpdir, lgetpid());
        cmdf(unlink, "%s/swishspider@%ld.contents", idx->tmpdir, lgetpid());
        cmdf(unlink, "%s/swishspider@%ld.links", idx->tmpdir, lgetpid());
    }
    efree(file_prefix);
}

#endif




struct _indexing_data_source_def HTTPIndexingDataSource = {
    "HTTP-Crawler",
    "http",
    http_indexpath,
    configModule_HTTP
};
/* swish-e-2.4.7/src/metanames.c */
/*
   -- This module does metaname handling for swish-e
   --
   
   
    This file is part of Swish-e.

    Swish-e is free software; you can redistribute it and/or modify
    it under the terms of the GNU General Public License as published by
    the Free Software Foundation; either version 2 of the License, or
    (at your option) any later version.

    Swish-e is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
    GNU General Public License for more details.

    You should have received a copy of the GNU General Public License
    along  with Swish-e; if not, write to the Free Software
    Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
    
    See the COPYING file that accompanies the Swish-e distribution for details
    of the GNU GPL and the special exception available for linking against
    the Swish-e library.

** Mon May  9 14:59:32 CDT 2005
** added GPL


   -- 2001-02-12 rasc    minor changes, concerning the tolower problem
                 (unsigned char) problem!!!

*/




#include "swish.h"
#include "mem.h"
#include "merge.h"
#include "swstring.h"
#include "search.h"
#include "docprop.h"
#include "metanames.h"
#include "headers.h"
#include "dump.h"
#include "error.h"


typedef struct
{
    char   *metaName;
    int     metaType;           /* see metanames.h for values. All values must be "ored" */
}
defaultMetaNames;


/**************************************************************************
*   List of *internal* meta names
*
*   Note:
*       Removing any of these will prevent access for result output
*       Removing any of the "real" meta names will also prevent storing
*       of the data in the property file.
*       That is, they may be commented out and then selected in the
*       configuration file as needed.
*       Hard to imagine not wanting the doc path!
*
***************************************************************************/


static defaultMetaNames SwishDefaultMetaNames[] = {

    /* This is the default meta ID ( number 1 ) that plain text is stored as */
    { AUTOPROPERTY_DEFAULT,      META_INDEX },  /* REQUIRED */


    /* These are the "internal" meta names generated at search time; they are all required */
    { AUTOPROPERTY_REC_COUNT,    META_PROP | META_INTERNAL | META_NUMBER },
    { AUTOPROPERTY_RESULT_RANK,  META_PROP | META_INTERNAL | META_NUMBER },
    { AUTOPROPERTY_FILENUM,      META_PROP | META_INTERNAL | META_NUMBER },
    { AUTOPROPERTY_INDEXFILE,    META_PROP | META_INTERNAL | META_STRING },

    /* These are the "real" meta names that are available by default */
    /* These can be commented out (e.g. to save disk space) and added back in with PropertyNames */
    { AUTOPROPERTY_DOCPATH,      META_PROP | META_STRING },
    { AUTOPROPERTY_TITLE,        META_PROP | META_STRING | META_IGNORE_CASE },
    { AUTOPROPERTY_DOCSIZE,      META_PROP | META_NUMBER},
    { AUTOPROPERTY_LASTMODIFIED, META_PROP | META_DATE},
 // { AUTOPROPERTY_SUMMARY,      META_PROP | META_STRING},
 // { AUTOPROPERTY_STARTPOS,     META_PROP | META_NUMBER},  // should be added only if LST is selected
};

/* Add the Internal swish metanames to the index file structure */
void    add_default_metanames(IndexFILE * indexf)
{
    int     i;

    for (i = 0; i < (int)(sizeof(SwishDefaultMetaNames) / sizeof(SwishDefaultMetaNames[0])); i++)
        addMetaEntry(&indexf->header, SwishDefaultMetaNames[i].metaName, SwishDefaultMetaNames[i].metaType, 0);
}



/**************************************************************************
*   These next routines add a new property/metaname to the list
*
*
***************************************************************************/



/* Add an entry to the metaEntryArray if one doesn't already exist */
/* Nov 2004 -- looks like this was supposed to be a way to either create or
 * update an existing meta.  Probably a good idea once parse_conffile.c is rewritten
 */


struct metaEntry *addMetaEntry(INDEXDATAHEADER *header, char *metaname, int metaType, int metaID)
{
    struct metaEntry *tmpEntry = NULL;
    char *metaWord;

    if (metaname == NULL || metaname[0] == '\0')
        progerr("internal error - called addMetaEntry without a name");


    metaWord = estrdup( metaname );
    strtolower(metaWord);


    /* See if there is a previous metaname with the same name */
//    tmpEntry = metaType & META_PROP
//               ? getPropNameByName(header, metaWord)
//               : getMetaNameByName(header, metaWord);


    if (!tmpEntry)              /* metaName not found - Create a new one */
        tmpEntry = addNewMetaEntry( header, metaWord, metaType, metaID);

    else
        /* This allows adding Numeric or Date onto an existing property. */
        /* Probably not needed */
        tmpEntry->metaType |= metaType;


    efree( metaWord );

    return tmpEntry;

}

/*********************************************************************************
* create_meta_entry -- malloc's a new metaEntry and initializes it
*
*********************************************************************************/

static struct metaEntry *create_meta_entry( char *name )
{
    struct metaEntry *newEntry = (struct metaEntry *) emalloc(sizeof(struct metaEntry));

    memset(newEntry, 0, sizeof(struct metaEntry));
    newEntry->metaName = (char *) estrdup( name );
    newEntry->sort_len = MAX_SORT_STRING_LEN;  /* default for sorting strings */
    return newEntry;
}

/************************************************************************************
* addNewMetaEntry -- Adds a new meta entry to the index header.
* Will create the header's meta array if it doesn't exist, otherwise grow it.
* Always adds; does not protect against duplicates.
*
* Returns: *metaEntry (never fails)
*
************************************************************************************/

struct metaEntry *addNewMetaEntry(INDEXDATAHEADER *header, char *metaWord, int metaType, int metaID)
{
    int    metaCounter = header->metaCounter;
    struct metaEntry *newEntry;
    struct metaEntry **metaEntryArray = header->metaEntryArray;
    newEntry = create_meta_entry( metaWord );

    newEntry->metaType = metaType;

    /* If metaID is 0 assign a value using metaCounter */
    /* Loaded stored metanames from index specifically sets the metaID */

    newEntry->metaID = metaID ? metaID : metaCounter + 1;


    /* Create or enlarge the array, as needed */
    if(! metaEntryArray)
    {
        metaEntryArray = (struct metaEntry **) emalloc(sizeof(struct metaEntry *));
        metaCounter = 0;
    }
    else
        metaEntryArray = (struct metaEntry **) erealloc(metaEntryArray,(metaCounter + 1) * sizeof(struct metaEntry *));


    /* And save it in the array */
    metaEntryArray[metaCounter++] = newEntry;

    /* Now update the header */
    header->metaCounter = metaCounter;
    header->metaEntryArray = metaEntryArray;

    return newEntry;
}

/***************************************************************************
* cloneMetaEntry -- creates a meta entry from another meta, making
* sure all important attributes are copied.  Alias is not copied
*
* Only really useful for merge where metas are copied from one index to 
* another.
*
* Doesn't fail
*
****************************************************************************/

struct metaEntry *cloneMetaEntry(INDEXDATAHEADER *header, struct metaEntry *meta )
{
    struct metaEntry *new_meta = addNewMetaEntry( header, meta->metaName, meta->metaType, 0 );
    if ( !new_meta )
        return NULL;

    /* Copy important attributes */
    new_meta->rank_bias = meta->rank_bias;
    new_meta->sort_len  = meta->sort_len;

    new_meta->max_len   = meta->max_len;  /* not needed when merging */

    /* 
     * other attributes, extractpath_default, sorted_data, sorted_loaded, in_tag
     * are not needed when merging and/or cannot be copied
     */

    return new_meta;

}

/**************************************************************************
*   Clear in_tag flags on all metanames
*   The flags are used for indexing
*
***************************************************************************/



void ClearInMetaFlags(INDEXDATAHEADER * header)
{
    int     i;

    for (i = 0; i < header->metaCounter; i++)
        header->metaEntryArray[i]->in_tag = 0;
}




/**************************************************************************
*   Initialize the property mapping array
*   Used to get the property seek pointers from the index file
*
*   THIS IS TEMPORARY until I break up the metanames and properties
*
*   This just creates two arrays to map between metaIDs and property index numbers.
*
***************************************************************************/

void init_property_list(INDEXDATAHEADER *header)
{
    int i;

    /* only needs to be called one time */
    if ( header->property_count )
        return;

    if ( header->propIDX_to_metaID )
        progerr("Called init_property_list with non-null header->propIDX_to_metaID");

    if ( !header->metaCounter )
    {
        header->property_count = -1;
        return;
    }


    header->propIDX_to_metaID = emalloc( (1 + header->metaCounter) * sizeof( int ) );
    header->metaID_to_PropIDX = emalloc( (1 + header->metaCounter) * sizeof( int ) );

    for (i = 0; i < header->metaCounter; i++)
    {
        if (is_meta_property(header->metaEntryArray[i]) && !header->metaEntryArray[i]->alias && !is_meta_internal(header->metaEntryArray[i]) )
        {
            header->metaID_to_PropIDX[header->metaEntryArray[i]->metaID] = header->property_count;
            header->propIDX_to_metaID[header->property_count++] = header->metaEntryArray[i]->metaID;
        }
        else
            header->metaID_to_PropIDX[header->metaEntryArray[i]->metaID] = -1;
    }

    if ( !header->property_count )
        header->property_count = -1;
}




/**************************************************************************
*   These routines lookup either a property or a metaname
*   by its ID or name
*
*   Each routine looks at either properties or metanames, never both
*
*   Note: probably could save a bit by just saying that if not META_PROP then
*   it's a meta index entry.  In other words, the type flag of zero could mean
*   META_INDEX, otherwise it's a PROPERTY.  $$$ todo...
*
*
***************************************************************************/


/** Lookup META_INDEX -- these only return meta names, not properties **/

struct metaEntry *getMetaNameByNameNoAlias(INDEXDATAHEADER * header, char *word)
{
    int     i;

    for (i = 0; i < header->metaCounter; i++)
        if (is_meta_index(header->metaEntryArray[i]) && !strcasecmp(header->metaEntryArray[i]->metaName, word))
            return header->metaEntryArray[i];

    return NULL;
}


/* Returns the structure associated with the metaName if it exists
*  Requests for aliased names return the base meta entry, not the alias meta entry.
*  Note that on an alias it checks the *alias*'s type, so it must match.
*/

struct metaEntry *getMetaNameByName(INDEXDATAHEADER * header, char *word)
{
    int     i;

    for (i = 0; i < header->metaCounter; i++)
        if (is_meta_index(header->metaEntryArray[i]) && !strcasecmp(header->metaEntryArray[i]->metaName, word))
            return header->metaEntryArray[i]->alias
                   ? getMetaNameByID( header, header->metaEntryArray[i]->alias )
                   : header->metaEntryArray[i];

    return NULL;
}


/* Returns the structure associated with the metaName ID if it exists
*/

struct metaEntry *getMetaNameByID(INDEXDATAHEADER *header, int number)
{
    int     i;

    for (i = 0; i < header->metaCounter; i++)
    {
        if (is_meta_index(header->metaEntryArray[i]) && number == header->metaEntryArray[i]->metaID)
            return header->metaEntryArray[i];
    }
    return NULL;
}



/** Lookup META_PROP -- these only return properties **/

struct metaEntry *getPropNameByNameNoAlias(INDEXDATAHEADER * header, char *word)
{
    int     i;

    for (i = 0; i < header->metaCounter; i++)
        if (is_meta_property(header->metaEntryArray[i]) && !strcasecmp(header->metaEntryArray[i]->metaName, word))
            return header->metaEntryArray[i];

    return NULL;
}


/* Returns the structure associated with the metaName if it exists
*  Requests for aliased names return the base meta entry, not the alias meta entry.
*  Note that on an alias it checks the *alias*'s type, so it must match.
*/

struct metaEntry *getPropNameByName(INDEXDATAHEADER * header, char *word)
{
    int     i;


    for (i = 0; i < header->metaCounter; i++)
        if (is_meta_property(header->metaEntryArray[i]) && !strcasecmp(header->metaEntryArray[i]->metaName, word))
            return header->metaEntryArray[i]->alias
                   ? getPropNameByID( header, header->metaEntryArray[i]->alias )
                   : header->metaEntryArray[i];

    return NULL;
}


/* Returns the structure associated with the metaName ID if it exists
*/

struct metaEntry *getPropNameByID(INDEXDATAHEADER *header, int number)
{
    int     i;

    for (i = 0; i < header->metaCounter; i++)
    {
        if (is_meta_property(header->metaEntryArray[i]) && number == header->metaEntryArray[i]->metaID)
            return header->metaEntryArray[i];
    }

    return NULL;
}





/* Case-insensitive name check; mainly used to see which internal metaname is being requested */

int is_meta_entry( struct metaEntry *meta_entry, char *name )
{
    return strcasecmp( meta_entry->metaName, name ) == 0;
}


/**************************************************************************
*   Free list of MetaEntry's
*
***************************************************************************/



/* Free meta entries for an index file */

void   freeMetaEntries( INDEXDATAHEADER *header )
{
    int i;

    /* Make sure there are meta names assigned */
    if ( !header->metaCounter )
        return;


    /* should the elements be set to NULL? */
    for( i = 0; i < header->metaCounter; i++ )
    {
        struct metaEntry *meta = header->metaEntryArray[i];

        efree( meta->metaName );

#ifndef USE_PRESORT_ARRAY
        if ( meta->sorted_data)
            efree( meta->sorted_data );
#endif

        if ( meta->extractpath_default )
            efree( meta->extractpath_default );


        efree( meta );
    }

    /* And free the pointer to the list */
    efree( header->metaEntryArray);
    header->metaEntryArray = NULL;
    header->metaCounter = 0;
}


/**************************************************************************
*   Check if should bump word position on this meta name
*
***************************************************************************/


int isDontBumpMetaName( struct swline *tmplist, char *tag)
{
char *tmptag;

    if (!tmplist) return 0;
    if (strcmp(tmplist->line,"*")==0) return 1;

    tmptag=estrdup(tag);
    tmptag=strtolower(tmptag);
    while(tmplist)
    {

        if( strcasecmp(tmptag,tmplist->line)==0 )
        {
            efree(tmptag);
            return 1;
        }
        tmplist=tmplist->next;
    }
    efree(tmptag);
    return 0;

}

/*************************************************
* int properties_compatible -
*
*  checks to see if two properties can be compared
*
**************************************************/
int properties_compatible( struct metaEntry *m1, struct metaEntry *m2 )
{
    int mask = META_STRING | META_NUMBER | META_DATE | META_IGNORE_CASE;
    return (m1->metaType & mask ) == ( m2->metaType & mask);
}
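
/*
 * Illustration only, not part of Swish-e: a minimal sketch of the masked
 * comparison used by properties_compatible(). The EX_* flag names and values
 * and ex_compatible() are hypothetical; the real META_* constants are defined
 * in the Swish-e headers. Only the bits in the mask take part in the
 * comparison, so unrelated flags do not make two properties incompatible.
 *
```c
#include <assert.h>

// Hypothetical flag values, chosen only for this example.
enum {
    EX_STRING      = 1 << 0,
    EX_NUMBER      = 1 << 1,
    EX_DATE        = 1 << 2,
    EX_IGNORE_CASE = 1 << 3,
    EX_OTHER       = 1 << 4   // a flag that is outside the comparison mask
};

// Two type words are compatible when they agree on every masked bit.
static int ex_compatible(int type_a, int type_b)
{
    const int mask = EX_STRING | EX_NUMBER | EX_DATE | EX_IGNORE_CASE;

    return (type_a & mask) == (type_b & mask);
}
```
 *
 * Under these assumed flags, ex_compatible(EX_STRING, EX_STRING | EX_OTHER)
 * is true, while ex_compatible(EX_STRING, EX_NUMBER) is false.
 */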



static struct metaEntry **meta_entries_for_index( IndexFILE *indexf, int want_props )
{
    INDEXDATAHEADER *header = NULL;
    int     i, n;
    struct metaEntry **entries;

    header = &indexf->header;

    if ( !header->metaCounter )
      progerr("no meta names in index");

    entries = (struct metaEntry **)emalloc( sizeof(struct metaEntry *) * ( 1 + header->metaCounter ) );
    for (i = 0, n = 0; i < header->metaCounter; i++)
    {
      int is_prop = (is_meta_property(header->metaEntryArray[i]) && !header->metaEntryArray[i]->alias);
      if (is_prop == want_props)
      {
        entries[n] = header->metaEntryArray[i];
        n++;
      }
    }
    entries[n] = 0;
    return( entries );
}

/*********************************************************************
* SwishMetaList -- Return a list of the meta entries for a given index.
*
**********************************************************************/

struct metaEntry **SwishMetaList(SWISH *sw, const char *index_name)
{
    IndexFILE *indexf;

    /* Validate the handle before it is used for the lookup */
    if ( !sw )
      progerr("SwishMetaList requires a valid swish handle");

    indexf = indexf_by_name( sw, index_name );

    if ( !indexf ) {
      set_progerr( HEADER_READ_ERROR, sw, "Index file '%s' is not an active index file", index_name );
      return( NULL );
    }

    if ( indexf->meta_list )
      return indexf->meta_list;

    indexf->meta_list = meta_entries_for_index( indexf, 0 );

    return( indexf->meta_list );
}

/*********************************************************************
* SwishResultMetaList -- this is the same as SwishMetaList but requires
* only a result instead of an index name.
*
**********************************************************************/

struct metaEntry **SwishResultMetaList(RESULT *result)
{
  IndexFILE *indexf = result->db_results->indexf;

  if ( indexf->meta_list )
    return indexf->meta_list;

  indexf->meta_list = meta_entries_for_index( indexf, 0 );

  return( indexf->meta_list );
}

/*********************************************************************
* SwishPropertyList -- return a list of all the properties only.
*
**********************************************************************/

struct metaEntry **SwishPropertyList(SWISH *sw, const char *index_name)
{
    IndexFILE *indexf;

    /* Validate the handle before it is used for the lookup */
    if ( !sw )
      progerr("SwishPropertyList requires a valid swish handle");

    indexf = indexf_by_name( sw, index_name );

    if ( !indexf ) {
      set_progerr( HEADER_READ_ERROR, sw, "Index file '%s' is not an active index file", index_name );
      return( NULL );
    }

    if ( indexf->prop_list )
      return indexf->prop_list;

    indexf->prop_list = meta_entries_for_index( indexf, 1 );

    return( indexf->prop_list );
}

/*********************************************************************
* SwishResultPropertyList -- this is the same as SwishPropertyList
* but uses a search result instead of an index name.
*
**********************************************************************/

struct metaEntry **SwishResultPropertyList(RESULT *result)
{
  IndexFILE *indexf = result->db_results->indexf;

  if ( indexf->prop_list )
    return indexf->prop_list;

  indexf->prop_list = meta_entries_for_index( indexf, 1 );

  return( indexf->prop_list );
}

/*********************************************************************
* SwishMetaName -- returns the name for the given meta entry.
*
**********************************************************************/

const char *SwishMetaName( struct  metaEntry *meta )
{
  return( meta->metaName );
}

/*********************************************************************
* SwishMetaType -- returns the type for the given meta entry.
* (see swish-e.h for possible types)
*
**********************************************************************/

int SwishMetaType( struct  metaEntry *meta )
{
  return( meta->metaType & (META_STRING | META_NUMBER | META_DATE) );
}

/*********************************************************************
* SwishMetaID -- returns the ID for the given meta entry.
*
**********************************************************************/

int SwishMetaID( struct  metaEntry *meta )
{
  return( meta->metaID );
}

/* File: swish-e-2.4.7/src/ramdisk.c */
/*

$Id: ramdisk.c 1946 2007-10-22 14:56:35Z karpet $


    This file is part of Swish-e.

    Swish-e is free software; you can redistribute it and/or modify
    it under the terms of the GNU General Public License as published by
    the Free Software Foundation; either version 2 of the License, or
    (at your option) any later version.

    Swish-e is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
    GNU General Public License for more details.

    You should have received a copy of the GNU General Public License
    along  with Swish-e; if not, write to the Free Software
    Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
    
    See the COPYING file that accompanies the Swish-e distribution for details
    of the GNU GPL and the special exception available for linking against
    the Swish-e library.
    
** Mon May  9 14:52:21 CDT 2005
** added GPL


**
**
** 2001-06-21 jmruiz ramdisk module or "poor man ramdisk" - Initial coding
**                   to test with LOCATIONS but it can be used with other
**                   things.
**                   Writes: Only sequential write is allowed
**                   Reads: Direct and sequential reads are allowed
**                   Routines and its equivalents:
**                   
**                         ramdisk_create                         fopen
**                         ramdisk_tell                           ftell
**                         ramdisk_write                          fwrite
**                         ramdisk_read                           fread
**                         ramdisk_seek                           fseek
**                         ramdisk_close                          fclose
**
**
*/
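
/*
 * Illustration only, not part of Swish-e: a minimal sketch of the idea
 * described above, a byte stream kept in a table of fixed-size chunks so that
 * no large contiguous allocation is needed. All ex_* names are hypothetical,
 * and plain malloc/realloc stands in for the MEM_ZONE allocator used by the
 * real module. A stream position maps to (chunk index, offset in chunk) by
 * division and remainder, and a copy is split wherever it crosses a chunk
 * boundary.
 *
```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct ex_chunkbuf {
    size_t cur_pos;          // next byte to read or write
    size_t end_pos;          // one past the last byte written
    size_t n_chunks;         // chunks allocated so far
    size_t chunk_size;       // must be greater than zero
    unsigned char **chunk;   // table of fixed-size chunks
};

static struct ex_chunkbuf *ex_create(size_t chunk_size)
{
    struct ex_chunkbuf *cb = malloc(sizeof *cb);

    if (!cb)
        return NULL;
    cb->cur_pos = cb->end_pos = 0;
    cb->n_chunks = 0;
    cb->chunk_size = chunk_size;
    cb->chunk = NULL;
    return cb;
}

// Grow the chunk table until it covers byte offset pos. Returns 0 on success.
static int ex_reserve(struct ex_chunkbuf *cb, size_t pos)
{
    while (pos >= cb->n_chunks * cb->chunk_size) {
        unsigned char **table = realloc(cb->chunk, (cb->n_chunks + 1) * sizeof *table);

        if (!table)
            return -1;
        cb->chunk = table;
        cb->chunk[cb->n_chunks] = malloc(cb->chunk_size);
        if (!cb->chunk[cb->n_chunks])
            return -1;
        cb->n_chunks++;
    }
    return 0;
}

// Write len bytes at the current position. Returns the number of bytes written.
static size_t ex_write(struct ex_chunkbuf *cb, const void *src, size_t len)
{
    const unsigned char *p = src;
    size_t done = 0;

    while (done < len) {
        size_t idx = cb->cur_pos / cb->chunk_size;
        size_t off = cb->cur_pos % cb->chunk_size;
        size_t n = cb->chunk_size - off;      // room left in this chunk

        if (n > len - done)
            n = len - done;
        if (ex_reserve(cb, cb->cur_pos) != 0)
            break;
        memcpy(cb->chunk[idx] + off, p + done, n);
        cb->cur_pos += n;
        done += n;
    }
    if (cb->cur_pos > cb->end_pos)
        cb->end_pos = cb->cur_pos;
    return done;
}

// Read up to len bytes from the current position. Returns the number of bytes read.
static size_t ex_read(struct ex_chunkbuf *cb, void *dst, size_t len)
{
    unsigned char *p = dst;
    size_t done = 0;

    if (cb->cur_pos >= cb->end_pos)
        return 0;
    if (len > cb->end_pos - cb->cur_pos)
        len = cb->end_pos - cb->cur_pos;
    while (done < len) {
        size_t idx = cb->cur_pos / cb->chunk_size;
        size_t off = cb->cur_pos % cb->chunk_size;
        size_t n = cb->chunk_size - off;

        if (n > len - done)
            n = len - done;
        memcpy(p + done, cb->chunk[idx] + off, n);
        cb->cur_pos += n;
        done += n;
    }
    return done;
}

static void ex_free(struct ex_chunkbuf *cb)
{
    size_t i;

    for (i = 0; i < cb->n_chunks; i++)
        free(cb->chunk[i]);
    free(cb->chunk);
    free(cb);
}
```
 *
 * Unlike ramdisk_write, this sketch allocates a chunk only when a byte is
 * about to be stored in it, and it counts bytes rather than elements.
 */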


#include "swish.h"
#include "mem.h"
#include "ramdisk.h"

/* 06/2001 Jose Ruiz
** Routines to store/extract location data in memory in non-contiguous positions to save
** memory.
** The data has been compressed previously in memory.
** Returns the pointer to the memory.
*/

struct ramdisk
{
   sw_off_t cur_pos;
   sw_off_t end_pos;
   unsigned int n_buffers;
   unsigned int buf_size;
   unsigned char **buffer;
   MEM_ZONE *zone;
};


/* Create a new ramdisk - The size of the internal buffers is buf_size */
struct ramdisk *ramdisk_create(char *name, int buf_size)
{
struct ramdisk *rd;

        rd = (struct ramdisk *) emalloc(sizeof(struct ramdisk));
        rd->zone = Mem_ZoneCreate(name, buf_size, 0);
        rd->cur_pos = (sw_off_t)0;
        rd->end_pos = (sw_off_t)0;
        rd->n_buffers = 1;
        rd->buf_size = buf_size;
        rd->buffer = (unsigned char **)emalloc(sizeof(unsigned char *));
        rd->buffer[0] = (unsigned char *)Mem_ZoneAlloc(rd->zone, buf_size);
        return rd;
}

/* Closes / frees the memory used by the ramdisk */
int ramdisk_close(FILE *fp)
{
struct ramdisk *rd = (struct ramdisk *)fp;

        Mem_ZoneFree(&rd->zone);
        efree(rd->buffer);
        efree(rd);
    return 0;
}

void add_buffer_ramdisk(struct ramdisk *rd)
{
        rd->buffer = (unsigned char **)erealloc(rd->buffer,(rd->n_buffers + 1) * sizeof(unsigned char *));
        rd->buffer[rd->n_buffers++] = (unsigned char *)Mem_ZoneAlloc(rd->zone, rd->buf_size);
}

/* Equivalent to ftell to get the position while writing to the ramdisk */
sw_off_t ramdisk_tell(FILE *fp)
{
struct ramdisk *rd = (struct ramdisk *)fp;

    return rd->cur_pos;
}


/* Writes to the ramdisk - The parameters are identical to those in fwrite */
size_t ramdisk_write(const void *buffer,size_t sz1, size_t sz2, FILE *fp)
{
unsigned int lenbuf=(unsigned int)(sz1 *sz2);
struct ramdisk *rd = (struct ramdisk *)fp;
unsigned char *buf = (unsigned char *)buffer;
unsigned int num_buffer,start_pos;
unsigned int avail;

    num_buffer = (unsigned int)(rd->cur_pos / (sw_off_t)rd->buf_size);
    start_pos = (unsigned int)(rd->cur_pos % (sw_off_t)rd->buf_size);

    avail = rd->buf_size - start_pos;
    while(avail<=(unsigned int)lenbuf)
    {
        if(avail)
            memcpy(rd->buffer[num_buffer]+start_pos,buf,avail);
        lenbuf -= avail;
        rd->cur_pos += (sw_off_t)avail;
        buf += avail;
        add_buffer_ramdisk(rd);
        avail = rd->buf_size;
        start_pos = 0;
        num_buffer++;
    }
    if(lenbuf)
    {
        memcpy(rd->buffer[num_buffer]+start_pos,buf,lenbuf);
        rd->cur_pos += (sw_off_t)lenbuf;
    }
    if(rd->cur_pos > rd->end_pos)
        rd->end_pos = rd->cur_pos;

    /* needs to return number of elements, not number of bytes */
    return sz2;
}

/* Equivalent to fseek */
int ramdisk_seek(FILE *fp,sw_off_t pos, int set)
{
struct ramdisk *rd = (struct ramdisk *)fp;

    switch(set)
    {
    case SEEK_CUR:
        pos += rd->cur_pos;
        break;
    case SEEK_END:
        pos += rd->end_pos;
        break;
    }
    if( pos > rd->end_pos )
    {
        while(rd->end_pos < pos)
        {
            ramdisk_putc(0, (FILE *)rd);
        }
    } 
    else
    {
        rd->cur_pos = pos;
    }
    return 0;
}


/* Reads from the ramdisk - The parameters are identical to those in fread */
size_t ramdisk_read(void *buf, size_t sz1, size_t sz2, FILE *fp)
{
struct ramdisk *rd = (struct ramdisk *)fp;
unsigned long len = (unsigned long) (sz1 * sz2);
unsigned char *buffer = (unsigned char *)buf;
unsigned int avail, num_buffer, start_pos, buffer_offset;

    if(rd->cur_pos >= rd->end_pos)
        return 0;
    if((rd->cur_pos + (sw_off_t)len) > rd->end_pos)
    {
        len = (unsigned long)(rd->end_pos - rd->cur_pos);
    }
    num_buffer = (unsigned int)(rd->cur_pos / (sw_off_t)rd->buf_size);
    start_pos = (unsigned int)(rd->cur_pos % (sw_off_t)rd->buf_size);

    buffer_offset = 0;

    avail = rd->buf_size - start_pos;
    while(avail < (unsigned int)len)
    {
        memcpy(buffer+buffer_offset,rd->buffer[num_buffer]+start_pos,avail);
        buffer_offset += avail;
        rd->cur_pos += (sw_off_t)avail;
        len -= avail;
        num_buffer++;
        start_pos=0;
        avail = rd->buf_size;
        if(num_buffer == rd->n_buffers)
            return buffer_offset;
    }
    memcpy(buffer+buffer_offset,rd->buffer[num_buffer]+start_pos,len);
    rd->cur_pos += (sw_off_t)len;
    buffer_offset += len;
    return buffer_offset;
}

int ramdisk_getc(FILE *fp)
{
unsigned char c;

    /* Report end of data instead of returning an uninitialized byte */
    if (ramdisk_read((void *)&c, 1, 1, fp) != 1)
        return EOF;
    return (int) ((unsigned char)c);
}

int ramdisk_putc(int c, FILE *fp)
{
unsigned char tmp = (unsigned char)c;

    ramdisk_write((const void *)&tmp,1, 1, fp);
    return 1;
}
/* File: swish-e-2.4.7/src/db_native.h */
/*
**
$Id: db_native.h 1945 2007-10-22 14:54:07Z karpet $

    This file is part of Swish-e.

    Swish-e is free software; you can redistribute it and/or modify
    it under the terms of the GNU General Public License as published by
    the Free Software Foundation; either version 2 of the License, or
    (at your option) any later version.

    Swish-e is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
    GNU General Public License for more details.

    You should have received a copy of the GNU General Public License
    along  with Swish-e; if not, write to the Free Software
    Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
    
    See the COPYING file that accompanies the Swish-e distribution for details
    of the GNU GPL and the special exception available for linking against
    the Swish-e library.
    
** Mon May  9 18:19:34 CDT 2005
** added GPL


**
**
**
** 2001-01  jose   initial coding
**
*/



#ifndef __HasSeenModule_DBNative
#define __HasSeenModule_DBNative    1


#ifdef USE_BTREE
#include "btree.h"
#include "array.h"
#include "worddata.h"
#include "fhash.h"
#define MAXCHARS 8            /* Only 8 are needed when BTREE is used */

#else

#define MAXCHARS 266            /* 256 for chars plus ten more for other data */

#endif

#define FILELISTPOS (MAXCHARS - 1)
#define FILEOFFSETPOS (MAXCHARS - 2)
#define HEADERPOS (MAXCHARS - 3)
#define WORDPOS (MAXCHARS - 4)
#define SORTEDINDEX (MAXCHARS - 5)
#define ENDWORDPOS (MAXCHARS - 6)

#ifdef USE_BTREE
#define TOTALWORDSPERFILEPOS (MAXCHARS - 7)
#define FILEHASHPOS (MAXCHARS - 8)
#endif



struct Handle_DBNative
{
       /* values used by the index process */
       /* points to the start of offsets to words in the file */
   sw_off_t offsetstart;

   SWISH *sw;  /* for reporting errors back */

#ifndef USE_BTREE
       /* points to the start of hashoffsets to words in the file */
   sw_off_t hashstart;
#endif
       /* File Offsets to words */
   sw_off_t offsets[MAXCHARS];

#ifndef USE_BTREE
   sw_off_t hashoffsets[VERYBIGHASHSIZE];

   sw_off_t lasthashval[VERYBIGHASHSIZE];
   int wordhash_counter;
#endif

   sw_off_t nextwordoffset;
   sw_off_t last_sortedindex;
   sw_off_t next_sortedindex;
   
   int worddata_counter;

#ifndef USE_BTREE
   sw_off_t *wordhashdata;

      /* Hash array to improve wordhashdata performance */
   struct numhash
   {
      int index;
      struct numhash *next;
   } *hash[BIGHASHSIZE];
   MEM_ZONE *hashzone;
#endif

   int num_words;

   DB_OPEN_MODE mode; 

   char *dbname;

#ifndef USE_BTREE
       /* ramdisk to store words */
   struct ramdisk *rd;
#endif

       /* Index FILE handle as returned from fopen */

       /* Pointers to words write/read functions */ 
   sw_off_t    (*w_tell)(FILE *);
   size_t  (*w_write)(const void *, size_t, size_t, FILE *);
   int     (*w_seek)(FILE *, sw_off_t, int);
   size_t  (*w_read)(void *, size_t, size_t, FILE *);
   int     (*w_close)(FILE *);
   int     (*w_putc)(int , FILE *);
   int     (*w_getc)(FILE *);

   FILE *fp;
   FILE *prop;

   int      tmp_index;      /* These indicate that the files are opened as temporary files */
   int      tmp_prop;
   char    *cur_index_file;
   char    *cur_prop_file;

   long     unique_ID;          /* just because it's called that doesn't mean it is! */

#ifdef USE_BTREE
   BTREE   *bt;
   FILE    *fp_btree;
   int      tmp_btree;
   char    *cur_btree_file;

   WORDDATA    *worddata;
   FILE    *fp_worddata;
   int      tmp_worddata;
   char    *cur_worddata_file;

   FHASH   *hashfile;
   FILE    *fp_hashfile;
   int      tmp_hashfile;
   char    *cur_hashfile_file;

   FILE    *fp_array;
   int      tmp_array;
   char    *cur_array_file;

   int      n_presorted_array;
   unsigned long *presorted_root_node;
   unsigned long *presorted_propid;
   ARRAY  **presorted_array;
   FILE    *fp_presorted;
   int      tmp_presorted;
   char    *cur_presorted_file;

   unsigned long cur_presorted_propid;
   ARRAY   *cur_presorted_array;

   ARRAY   *totwords_array;

   ARRAY   *props_array;
#endif
};

void initModule_DBNative (SWISH *);
void freeModule_DBNative (SWISH *);

void   *DB_Create_Native (SWISH *sw, char *dbname);
void   *DB_Open_Native (SWISH *sw, char *dbname, int mode);
void    DB_Close_Native(void *db);
void    DB_Remove_Native(void *db);



int     DB_InitWriteHeader_Native(void *db);
int     DB_EndWriteHeader_Native(void *db);
int     DB_WriteHeaderData_Native(int id, unsigned char *s, int len, void *db);

int     DB_InitReadHeader_Native(void *db);
int     DB_ReadHeaderData_Native(int *id, unsigned char **s, int *len, void *db);
int     DB_EndReadHeader_Native(void *db);



int     DB_InitWriteWords_Native(void *db);
sw_off_t    DB_GetWordID_Native(void *db);
int     DB_WriteWord_Native(char *word, sw_off_t wordID, void *db);

#ifdef USE_BTREE
int     DB_UpdateWordID_Native(char *word, sw_off_t new_wordID, void *db);
int     DB_DeleteWordData_Native(sw_off_t wordID, void *db);
#endif

int     DB_WriteWordHash_Native(char *word, sw_off_t wordID, void *db);
long    DB_WriteWordData_Native(sw_off_t wordID, unsigned char *worddata, int data_size, int saved_bytes, void *db);
int     DB_EndWriteWords_Native(void *db);

int     DB_InitReadWords_Native(void *db);
int     DB_ReadWordHash_Native(char *word, sw_off_t *wordID, void *db);
int     DB_ReadFirstWordInvertedIndex_Native(char *word, char **resultword, sw_off_t *wordID, void *db);
int     DB_ReadNextWordInvertedIndex_Native(char *word, char **resultword, sw_off_t *wordID, void *db);
long    DB_ReadWordData_Native(sw_off_t wordID, unsigned char **worddata, int *data_size, int *saved_bytes, void *db);
int     DB_EndReadWords_Native(void *db);



int     DB_WriteFileNum_Native(int filenum, unsigned char *filedata,int sz_filedata, void *db);
int     DB_ReadFileNum_Native( unsigned char *filedata, void *db);
int     DB_CheckFileNum_Native(int filenum, void *db);
int     DB_RemoveFileNum_Native(int filenum, void *db);

/** Pre-sorted array access **/

#ifdef USE_PRESORT_ARRAY
int     DB_InitWriteSortedIndex_Native(void *db , int n_props);
int     DB_WriteSortedIndex_Native(int propID, int *data, int num_elements,void *db);
#else
int     DB_InitWriteSortedIndex_Native(void *db );
int     DB_WriteSortedIndex_Native(int propID, unsigned char *data, int sz_data,void *db);
#endif
int     DB_EndWriteSortedIndex_Native(void *db);

int     DB_InitReadSortedIndex_Native(void *db);
int     DB_ReadSortedIndex_Native(int propID, unsigned char **data, int *sz_data,void *db);
/* Note that a macro is not provided for reading the data */
int     DB_ReadSortedData_Native(int *data,int index, int *value, void *db);
int     DB_EndReadSortedIndex_Native(void *db);




int     DB_InitWriteProperties_Native(void *db);
void    DB_WriteProperty_Native( IndexFILE *indexf, FileRec *fi, int propID, char *buffer, int buf_len, int uncompressed_len, void *db);
void    DB_WritePropPositions_Native(IndexFILE *indexf, FileRec *fi, void *db);
void    DB_ReadPropPositions_Native(IndexFILE *indexf, FileRec *fi, void *db);
char   *DB_ReadProperty_Native(IndexFILE *indexf, FileRec *fi, int propID, int *buf_len, int *uncompressed_len, void *db);
void    DB_Reopen_PropertiesForRead_Native(void *db);

#ifdef USE_BTREE
int        DB_InitWriteTotalWordsPerFile_Native(SWISH *sw, void *DB);
int    DB_WriteTotalWordsPerFile_Native(SWISH *sw, int idx, int wordcount, void *DB);
int    DB_EndWriteTotalWordsPerFile_Native(SWISH *sw, void *DB);
int        DB_InitReadTotalWordsPerFile_Native(SWISH *sw, void *DB);
int    DB_ReadTotalWordsPerFile_Native(SWISH *sw, int idx, int *wordcount, void *DB);
int    DB_EndReadTotalWordsPerFile_Native(SWISH *sw, void *DB);
#endif





/* 04/00 Jose Ruiz
** Functions to read/write longs from a file
*/
void    printlong(FILE * fp, unsigned long num, size_t (*f_write)(const void *, size_t, size_t, FILE *));
unsigned long    readlong(FILE * fp, size_t (*f_read)(void *, size_t, size_t, FILE *));

/* 2003/10 Jose Ruiz
** Functions to read/write file offsets
*/
void    printfileoffset(FILE * fp, sw_off_t num, size_t (*f_write)(const void *, size_t, size_t, FILE *));
sw_off_t    readfileoffset(FILE * fp, size_t (*f_read)(void *, size_t, size_t, FILE *));


#endif
/* File: swish-e-2.4.7/src/array.c */
/*


$Id: array.c 1946 2007-10-22 14:56:35Z karpet $

    This file is part of Swish-e.

    Swish-e is free software; you can redistribute it and/or modify
    it under the terms of the GNU General Public License as published by
    the Free Software Foundation; either version 2 of the License, or
    (at your option) any later version.

    Swish-e is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
    GNU General Public License for more details.

    You should have received a copy of the GNU General Public License
    along  with Swish-e; if not, write to the Free Software
    Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
    
    See the COPYING file that accompanies the Swish-e distribution for details
    of the GNU GPL and the special exception available for linking against
    the Swish-e library.
    
** Mon May  9 17:58:02 CDT 2005
** added GPL

**-----------------------------------------------------------------
**
**  Virtual Array Code. 
**  11/2001 jmruiz - The intention of these routines is to store and read
**                   elements of arrays of long numbers without allocating
**                   the whole array in memory. In other words, if we need
**                   to read only 10 elements of the array, we try to
**                   perform the minimum number of memory and disk I/O
**                   operations.
**
**                   To do that, the data is stored in aligned pages on disk.
**                   Also, a simple cache system is used to speed up file
**                   I/O operations.
**
**                   The virtual array is extensible. In other words, you can
**                   add elements whenever you want.
**
**    Main routines:
**
**  ARRAY *ARRAY_Create(FILE *fp)   
**    Creates a virtual array. Returns the handle of the array
**
**  ARRAY *ARRAY_Open(FILE *fp, sw_off_t root_page) 
**    Opens an existing virtual array. root_page is the value returned by
**    ARRAY_Close. Returns the handle of the array.
**
**  sw_off_t ARRAY_Close(ARRAY *arr)
**    Closes and frees memory. arr is the value returned by ARRAY_Create or
**    ARRAY_Open. Returns the root page of the array. This value must be
**    passed to ARRAY_Open to reopen the array.
**  int ARRAY_Put(ARRAY *arr, int index, unsigned long value)
**    Writes the array element arr[index]=value to the virtual array
**
**  unsigned long ARRAY_Get(ARRAY *arr, int index)
**    Reads the array element index. Returns the value arr[index]
**
*/
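
/*
 * Illustration only, not part of Swish-e: a minimal sketch of how ARRAY_Put
 * and ARRAY_Get turn an element index into a position in the paged layout.
 * The ex_* names are hypothetical, and elems_per_page stands for
 * (ARRAY_PageSize - ARRAY_PageHeaderSize) / SizeOfElement. The root page holds
 * one chain head per slot; the logical data page number selects a chain and
 * how many next-page links to follow along it.
 *
```c
#include <assert.h>

struct ex_location {
    int chain;   // slot in the root page that heads the page chain
    int depth;   // number of next-page links to follow from the chain head
    int slot;    // element position inside the data page
};

static struct ex_location ex_locate(int index, int elems_per_page)
{
    struct ex_location loc;
    int page = index / elems_per_page;   // logical data page number

    loc.chain = page % elems_per_page;
    loc.depth = page / elems_per_page;
    loc.slot  = index % elems_per_page;
    return loc;
}
```
 *
 * With a toy value of 4 elements per page, index 17 falls in logical page 4,
 * which is chain 0, one link past the chain head, slot 1.
 */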

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#include "swish.h"
#include "mem.h"
#include "compress.h"
#include "array.h"
#include "error.h"


/* An ARRAY page size */
#define ARRAY_PageSize 4096

#define SizeOfElement sizeof(long)

/* Round to ARRAY_PageSize */
#define ARRAY_RoundPageSize(n) (((sw_off_t)(n) + (sw_off_t)(ARRAY_PageSize - 1)) & (~(sw_off_t)(ARRAY_PageSize - 1)))

#define ARRAY_PageHeaderSize (1 * sizeof(sw_off_t)) 

#define ARRAY_PageData(pg) ((pg)->data + ARRAY_PageHeaderSize)
#define ARRAY_Data(pg,i) (ARRAY_PageData((pg)) + (i) * SizeOfElement)

#define ARRAY_SetNextPage(pg,num) (sw_off_t)( *(sw_off_t *)((pg)->data + 0 * sizeof(sw_off_t)) = PACKFILEOFFSET(num))

#define ARRAY_GetNextPage(pg,num) ( (num) = UNPACKFILEOFFSET(*(sw_off_t *)((pg)->data + 0 * sizeof(sw_off_t))))


int ARRAY_WritePageToDisk(FILE *fp, ARRAY_Page *pg)
{
    ARRAY_SetNextPage(pg,pg->next);
    sw_fseek(fp,(sw_off_t)((sw_off_t)pg->page_number * (sw_off_t)ARRAY_PageSize),SEEK_SET);
    return sw_fwrite(pg->data,ARRAY_PageSize,1,fp);
}

int ARRAY_WritePage(ARRAY *b, ARRAY_Page *pg)
{
int hash = pg->page_number % ARRAY_CACHE_SIZE;
ARRAY_Page *tmp;
    pg->modified =1;
    if((tmp = b->cache[hash]))
    {
        while(tmp)
        {
            if(tmp->page_number == pg->page_number)
            {
                return 0;
            }
            tmp = tmp->next_cache;
        }
    }
    pg->next_cache = b->cache[hash];
    b->cache[hash] = pg;
    return 0;
}

int ARRAY_FlushCache(ARRAY *b)
{
int i;
ARRAY_Page *tmp, *next;
    for(i = 0; i < ARRAY_CACHE_SIZE; i++)
    {
        if((tmp = b->cache[i]))
        {
            while(tmp)
            {
#ifdef DEBUG
                if(tmp->in_use)
                {
                    printf("DEBUG Error in FlushCache: Page in use\n");
                    exit(0);
                }
#endif
                next = tmp->next_cache;
                if(tmp->modified)
                {
                    ARRAY_WritePageToDisk(b->fp, tmp);
                    tmp->modified = 0;
                }
                if(tmp != b->cache[i])
                    efree(tmp);

                tmp = next;
            }
            b->cache[i]->next_cache = NULL;
        }
    }
    return 0;
}

int ARRAY_CleanCache(ARRAY *b)
{
int i;
ARRAY_Page *tmp,*next;
    for(i = 0; i < ARRAY_CACHE_SIZE; i++)
    {
        if((tmp = b->cache[i]))
        {
            while(tmp)
            {
                next = tmp->next_cache;
                efree(tmp);
                tmp = next;
            }
            b->cache[i] = NULL;
        }
    }
    return 0;
}

ARRAY_Page *ARRAY_ReadPageFromDisk(FILE *fp, sw_off_t page_number)
{
ARRAY_Page *pg = (ARRAY_Page *)emalloc(sizeof(ARRAY_Page) + ARRAY_PageSize);

    sw_fseek(fp,(sw_off_t)(page_number * (sw_off_t)ARRAY_PageSize),SEEK_SET);
    sw_fread(pg->data,ARRAY_PageSize, 1, fp);

    ARRAY_GetNextPage(pg,pg->next);

    pg->page_number = page_number;
    pg->modified = 0;
    return pg;
}

ARRAY_Page *ARRAY_ReadPage(ARRAY *b, sw_off_t page_number)
{
int hash = (int)(page_number % (sw_off_t)ARRAY_CACHE_SIZE);
ARRAY_Page *tmp;
    if((tmp = b->cache[hash]))
    {
        while(tmp)
        {
            if(tmp->page_number == page_number)
            {
                return tmp;
            }
            tmp = tmp->next_cache;
        }
    }

    tmp = ARRAY_ReadPageFromDisk(b->fp, page_number);
    tmp->modified = 0;
    tmp->in_use = 1;
    tmp->next_cache = b->cache[hash];
    b->cache[hash] = tmp;
    return tmp;
}

ARRAY_Page *ARRAY_NewPage(ARRAY *b)
{
ARRAY_Page *pg;
sw_off_t offset;
FILE *fp = b->fp;
int hash;
int size = ARRAY_PageSize;

    /* Get file pointer */
    if(sw_fseek(fp,(sw_off_t)0,SEEK_END) !=0)
        progerrno("Failed to seek to eof: ");

    offset = sw_ftell(fp);
    /* Round up file pointer */
    offset = ARRAY_RoundPageSize(offset);

    /* Set new file pointer - data will be aligned */
    if(sw_fseek(fp,offset, SEEK_SET)!=0 || offset != sw_ftell(fp))
        progerrno("Failed during seek: ");


    pg = (ARRAY_Page *)emalloc(sizeof(ARRAY_Page) + size);
    memset(pg,0,sizeof(ARRAY_Page) + size);
    /* Reserve space in file */
    if(sw_fwrite(pg->data,1,size,fp)!=size || ((sw_off_t)size + offset) != sw_ftell(fp))
        progerrno("Failed to write ARRAY_page: ");

    pg->next = 0;

    pg->page_number = offset / (sw_off_t)ARRAY_PageSize;

    /* add to cache */
    pg->modified = 1;
    pg->in_use = 1;
    hash = pg->page_number % ARRAY_CACHE_SIZE;
    pg->next_cache = b->cache[hash];
    b->cache[hash] = pg;
    return pg;
}

void ARRAY_FreePage(ARRAY *b, ARRAY_Page *pg)
{
int hash = pg->page_number % ARRAY_CACHE_SIZE;
ARRAY_Page *tmp;

    tmp = b->cache[hash];

#ifdef DEBUG
    if(!(tmp = b->cache[hash]))
    {
        /* This should never happen!!!! */
        printf("Error in FreePage\n");
        exit(0);
    }
#endif

    while(tmp)
    {
        if (tmp->page_number != pg->page_number)
            tmp = tmp->next_cache;
        else
        {
            tmp->in_use = 0;
            break;
        }
    }
}

ARRAY *ARRAY_New(FILE *fp, unsigned int size)
{
ARRAY *b;
int i;
    b = (ARRAY *) emalloc(sizeof(ARRAY));
    b->page_size = size;
    b->fp = fp;
    for(i = 0; i < ARRAY_CACHE_SIZE; i++)
        b->cache[i] = NULL;

    return b;
}

ARRAY *ARRAY_Create(FILE *fp)
{
ARRAY *b;
ARRAY_Page *root;
int size = ARRAY_PageSize;

    b = ARRAY_New(fp , size);
    root = ARRAY_NewPage(b);

    b->root_page = root->page_number;

    ARRAY_WritePage(b, root);
    ARRAY_FreePage(b, root);

    return b;
}


ARRAY *ARRAY_Open(FILE *fp, sw_off_t root_page)
{
ARRAY *b;
int size = ARRAY_PageSize;

    b = ARRAY_New(fp , size);

    b->root_page = root_page;

    return b;
}

sw_off_t ARRAY_Close(ARRAY *bt)
{
sw_off_t root_page = bt->root_page;
    ARRAY_FlushCache(bt);
    ARRAY_CleanCache(bt);
    efree(bt);
    return root_page;
}


int ARRAY_Put(ARRAY *b, int index, unsigned long value)
{
sw_off_t next_page; 
ARRAY_Page *root_page, *tmp = NULL, *prev; 
int i, hash, page_reads, page_index;

    page_reads = index / ((ARRAY_PageSize - ARRAY_PageHeaderSize) / SizeOfElement);
    hash = page_reads % ((ARRAY_PageSize - ARRAY_PageHeaderSize) / SizeOfElement);
    page_reads /= ((ARRAY_PageSize - ARRAY_PageHeaderSize) / SizeOfElement);
    page_index = index % ((ARRAY_PageSize - ARRAY_PageHeaderSize) / SizeOfElement);

    root_page = ARRAY_ReadPage(b, b->root_page);
    next_page = UNPACKFILEOFFSET(*(sw_off_t *)ARRAY_Data(root_page, hash));

    prev = NULL;
    for(i = 0; i <= page_reads; i++)
    {
        if(!next_page)
        {
            tmp = ARRAY_NewPage(b);
            ARRAY_WritePage(b,tmp);
            if(!i)
            {
                *(sw_off_t *)ARRAY_Data(root_page,hash) = PACKFILEOFFSET(tmp->page_number);
                 ARRAY_WritePage(b,root_page);
            }
            else
            {
                prev->next = tmp->page_number;
                ARRAY_WritePage(b,prev);
            }
        } 
        else
        {
            tmp = ARRAY_ReadPage(b, next_page);
        }
        if(prev)
            ARRAY_FreePage(b,prev);
        prev = tmp;
        next_page = tmp->next;
    }
    *(unsigned long *)ARRAY_Data(tmp,page_index) = PACKLONG(value);
    ARRAY_WritePage(b,tmp);
    ARRAY_FreePage(b,tmp);
    ARRAY_FreePage(b,root_page);

    return 0;
}
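
/*
ARRAY_Put() and ARRAY_Get() both split an element index into three numbers:
a slot in the root page, the number of "next" links to follow from the page
that slot points to, and a slot inside the final data page. The sketch below
reproduces only that arithmetic. The DEMO_ sizes are assumptions chosen for
illustration; the real ARRAY_PageSize, ARRAY_PageHeaderSize and SizeOfElement
are defined elsewhere in this file.

```c
#include <assert.h>

// Hypothetical sizes, not the values used by swish-e.
#define DEMO_PAGE_SIZE    4096
#define DEMO_HEADER_SIZE  8
#define DEMO_ELEM_SIZE    8
#define DEMO_SLOTS ((DEMO_PAGE_SIZE - DEMO_HEADER_SIZE) / DEMO_ELEM_SIZE)

struct demo_slot_addr {
    int root_slot;     // slot in the root page ("hash" in ARRAY_Put)
    int chain_depth;   // links to follow after the first page ("page_reads")
    int page_slot;     // slot in the data page ("page_index")
};

static struct demo_slot_addr demo_locate(int index)
{
    struct demo_slot_addr a;
    int data_page;

    assert(index >= 0);
    data_page = index / DEMO_SLOTS;          // which data page overall
    a.root_slot = data_page % DEMO_SLOTS;    // chain that page lives on
    a.chain_depth = data_page / DEMO_SLOTS;  // position along that chain
    a.page_slot = index % DEMO_SLOTS;        // element within the page
    return a;
}
```

With these assumed sizes each page holds 511 elements, so demo_locate(1000)
gives root slot 1, chain depth 0 and page slot 489. Only after all 511 root
slots have a first page does any chain grow a second one.
*/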


unsigned long ARRAY_Get(ARRAY *b, int index)
{
sw_off_t next_page;
unsigned long value; 
ARRAY_Page *root_page, *tmp;
int i, hash, page_reads, page_index;

    page_reads = index / ((ARRAY_PageSize - ARRAY_PageHeaderSize) / SizeOfElement);
    hash = page_reads % ((ARRAY_PageSize - ARRAY_PageHeaderSize) / SizeOfElement);
    page_reads /= ((ARRAY_PageSize - ARRAY_PageHeaderSize) / SizeOfElement);
    page_index = index % ((ARRAY_PageSize - ARRAY_PageHeaderSize) / SizeOfElement);

    root_page = ARRAY_ReadPage(b, b->root_page);
    /* Read the slot the same way ARRAY_Put() writes it: as a packed file offset */
    next_page = UNPACKFILEOFFSET(*(sw_off_t *)ARRAY_Data(root_page, hash));

    tmp = NULL;
    for(i = 0; i <= page_reads; i++)
    {
        if(tmp)
            ARRAY_FreePage(b, tmp);
        if(!next_page)
        {
            ARRAY_FreePage(b,root_page);
            return 0L;
        } 
        else
        {
            tmp = ARRAY_ReadPage(b, next_page);
        }
        next_page = tmp->next;
    }
    value = UNPACKLONG(*(unsigned long *)ARRAY_Data(tmp,page_index));
    ARRAY_FreePage(b,tmp);
    ARRAY_FreePage(b,root_page);

    return value;
}



#ifdef DEBUG

#include <time.h>

#define N_TEST 50000000

#define F_READ_BINARY           "rb"
#define F_WRITE_BINARY          "wb"
#define F_READWRITE_BINARY      "rb+"

#define F_READ_TEXT             "r"
#define F_WRITE_TEXT            "w"
#define F_READWRITE_TEXT        "r+"

int main()
{
FILE *fp;
ARRAY *bt;
int i;
static unsigned long nums[N_TEST];
sw_off_t root_page;    /* matches ARRAY_Close() return type and ARRAY_Open() parameter */
    srand(time(NULL));



    fp = sw_fopen("kkkkk",F_WRITE_BINARY);
    sw_fclose(fp);
    fp = sw_fopen("kkkkk",F_READWRITE_BINARY);

    sw_fwrite("aaa",1,3,fp);

printf("\n\nIndexing\n\n");

    bt = ARRAY_Create(fp);
    for(i=0;i<N_TEST;i++)
    {
        nums[i] = rand();
//        nums[i]=i;
        ARRAY_Put(bt,i,nums[i]);
        if(nums[i]!= ARRAY_Get(bt,i))
            printf("\n\nmal %d\n\n",i);
        if(!(i%1000))
        {
            ARRAY_FlushCache(bt);
            printf("%d            \r",i);
        }
    }

    root_page = ARRAY_Close(bt);
    sw_fclose(fp);

printf("\n\nUnfreed %d\n\n",num);
printf("\n\nSearching\n\n");

    fp = sw_fopen("kkkkk",F_READ_BINARY);
    bt = ARRAY_Open(fp, root_page);

    for(i=0;i<N_TEST;i++)
    {
        if(nums[i] != ARRAY_Get(bt,i))
            printf("\n\nmal %d\n\n",i);
        if(!(i%1000))
            printf("%d            \r",i);
    }

    root_page = ARRAY_Close(bt);

    sw_fclose(fp);
printf("\n\nUnfreed %d\n\n",num);

}

#endif
/* ---- swish-e-2.4.7/src/txt.c ---- */
/*

    This file is part of Swish-e.

    Swish-e is free software; you can redistribute it and/or modify
    it under the terms of the GNU General Public License as published by
    the Free Software Foundation; either version 2 of the License, or
    (at your option) any later version.

    Swish-e is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
    GNU General Public License for more details.

    You should have received a copy of the GNU General Public License
    along  with Swish-e; if not, write to the Free Software
    Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
    
    See the COPYING file that accompanies the Swish-e distribution for details
    of the GNU GPL and the special exception available for linking against
    the Swish-e library.
    
Mon May  9 10:57:22 CDT 2005 -- added GPL notice


*/


/*
$Id: txt.c 1736 2005-05-12 15:41:22Z karman $
**
**
** 2001-03-17  rasc  save real_filename as title (instead full real_path)
**                   was: compatibility issue to v 1.x.x
*/



#include "swish.h"
#include "txt.h"
#include "mem.h"
#include "swstring.h"
#include "check.h"
#include "merge.h"
#include "search.h"
#include "docprop.h"
#include "error.h"
#include "compress.h"
#include "file.h"
#include "index.h"


/* Indexes all the words in a TXT file and adds the resulting information
** to the index structures. This is the simplest of the countwords
** functions: it is just a call to indexstring().
*/
int countwords_TXT (SWISH *sw, FileProp *fprop, FileRec *fi, char *buffer)

{
    int     metaID;
    int     positionMeta;    /* Position of word in file */
    char   *summary=NULL;
    char   *title = "";


	if(fprop->stordesc && fprop->stordesc->size)    /* Let us take the summary */
	{
		/* No fields in a TXT doc. So it is easy to get summary */
		summary=estrndup(buffer,fprop->stordesc->size);
		remove_newlines(summary);			/* 2001-03-13 rasc */
	}

    addCommonProperties( sw, fprop, fi, title, summary, 0 );

	if(summary) efree(summary);


	metaID=1; positionMeta=1; /* No metanames in TXT */

	return indexstring(sw, buffer, fi->filenum, IN_FILE, 1, &metaID, &positionMeta);
}
/* ---- swish-e-2.4.7/src/search.c ---- */
/*
$Id: search.c 2291 2009-03-31 01:56:00Z karpet $
**
** Copyright (C) 1995, 1996, 1997, 1998 Hewlett-Packard Company
** Originally by Kevin Hughes, kev@kevcom.com, 3/11/94
**


    This file is part of Swish-e.

    Swish-e is free software; you can redistribute it and/or modify
    it under the terms of the GNU General Public License as published by
    the Free Software Foundation; either version 2 of the License, or
    (at your option) any later version.

    Swish-e is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
    GNU General Public License for more details.

    You should have received a copy of the GNU General Public License
    along  with Swish-e; if not, write to the Free Software
    Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
    
    See the COPYING file that accompanies the Swish-e distribution for details
    of the GNU GPL and the special exception available for linking against
    the Swish-e library.
    
** Mon May  9 12:50:43 CDT 2005 - added GPL statement, removed LGPL
** karman: how much of this is still original HP stuff?


** Changes in expandstar and parseterm to fix the wildcard * problem.
** G. Hill, ghill@library.berkeley.edu  3/11/97
**
** Changes in notresultlist, parseterm, and fixnot to fix the NOT problem
** G. Hill, ghill@library.berkeley.edu 3/13/97
**
** Changes in search, parseterm, fixnot, operate, getfileinfo
** to support METADATA
** G. Hill 3/18/97 ghill@library.berkeley.edu
**
** Change in search to allow for search with a list including
** also some empty indexes.
** G. Hill after a suggestion by J. Winstead 12/18/97
**
** Created countResults for number of hits in search
** G. Hill 12/18/97
**
**
** Change in search to allow maxhits to return N number
** of results for each index specified
** D. Norris after suggestion by D. Chrisment 08/29/99
**
** Created resultmaxhits as a global, renewable maxhits
** D. Norris 08/29/99
**
** added word length arg to Stem() call for strcat overflow checking in stemmer.c
** added safestrcpy() macro to avoid corruption from strcpy overflow
** SRE 11/17/99
**
** 10/10/99 & 11/23/99 - Bill Moseley (merged by SRE)
**   - Changed to stem words *before* expanding with expandstar
**     so can find words in the index
**   - Moved META tag check before expandstar so META names don't get
**     expanded!
**
** fixed cast to int problems pointed out by "gcc -Wall"
** SRE 2/22/00
**
** fixed search() for case where stopword is followed by rule:
**   stopword was removed, rule was left, no matches ever found
** added "# Stopwords removed:" to output header so caller can
**   trap actions of IGNORE_STOPWORDS_IN_QUERY
** SRE 2/25/00
**
** 04/00 - Jose Ruiz
** Added code for phrase search
**     - New function phraseresultlists
**     - New function expandphrase
**
** 04/00 - Jose Ruiz
** Added freeresult function for freeing results memory
** Also added changes to orresultlists andresultlists notresultlist
**  for freeing memory
**
** 04/00 - Jose Ruiz
** Now use bighash instead of hash for better performance in
** orresultlist (a* or b*). Also changed hash.c
**
** 04/00 - Jose Ruiz
** Function getfileinfo rewrite
**     - Now use a hash approach for faster searching
**     - Solves the long timed searches (a* or b* or c*)
**
** 04/00 - Jose Ruiz
** Ordering of result rewrite
** Now builtin C function qsort is used for faster ordering of results
** This is useful when lots of results are found
** For example: not (axf) -> This gives you all the documents!!
**
** 06/00 - Jose Ruiz
** Rewrite of andresultlists and phraseresultlists for better performance
** New function notresultlists for better performance
**
** 07/00 and 08/00 - Jose Ruiz
** Many modifications to make all search functions thread safe
**
** 08/00 - Added ascending and descending capabilities in results sorting
**
** 2001-02-xx rasc  search call changed, tolower changed...
** 2001-03-03 rasc  altavista search, translatechar in headers
** 2001-03-13 rasc  definable logical operators via  sw->SearchAlt->srch_op.[and|or|nor]
**                  bugfix in parse_search_string handling...
** 2001-03-14 rasc  resultHeaderOutput  -H <n>
**
** 2001-05-23 moseley - replace parse_search_string with new parser
**
** 2002-09-27 moseley - major rewrite to change to localized data in a SEARCH_OBJECT, thin out casts.
**
*/

#include "swish.h"
#include "mem.h"
#include "swstring.h"
#include "metanames.h"
#include "search.h"
#include "index.h"
#include "file.h"
#include "list.h"
#include "merge.h"
#include "hash.h"
#include "docprop.h"
#include "error.h"
#include "compress.h"
#include "result_sort.h"
#include "db.h"
#include "swish_words.h"
#include "swish_qsort.h"

#include "proplimit.h"

#include "rank.h"

/* ------ static functions ----------- */
static int init_sort_propIDs( DB_RESULTS *db_results, struct swline *sort_word, DB_RESULTS *last );
static void query_index( DB_RESULTS *db_results );
static int isbooleanrule(char *);
static int isunaryrule(char *);
static int getrulenum(char *);
static RESULT_LIST *sortresultsbyfilenum(RESULT_LIST *r);

static RESULT_LIST *parseterm(DB_RESULTS *db_results, int parseone, int metaID, IndexFILE * indexf, struct swline **searchwordlist);
static RESULT_LIST *operate(DB_RESULTS *db_results, RESULT_LIST * l_rp, int rulenum, char *wordin, int metaID, int andLevel, IndexFILE * indexf, int distance);
static RESULT_LIST *getfileinfo(DB_RESULTS *db_results, char *word, int metaID);
static RESULT_LIST *andresultlists(DB_RESULTS *db_results, RESULT_LIST *, RESULT_LIST *, int);
static RESULT_LIST *nearresultlists(DB_RESULTS *db_results, RESULT_LIST * l_r1, RESULT_LIST * l_r2, int andLevel, int distance);
static RESULT_LIST *orresultlists(DB_RESULTS *db_results, RESULT_LIST *, RESULT_LIST *);
static RESULT_LIST *notresultlist(DB_RESULTS *db_results, RESULT_LIST *, IndexFILE *);
static RESULT_LIST *notresultlists(DB_RESULTS *db_results, RESULT_LIST *, RESULT_LIST *);
static RESULT_LIST *phraseresultlists(DB_RESULTS *db_results, RESULT_LIST *, RESULT_LIST *, int);
static RESULT_LIST *mergeresulthashlist(DB_RESULTS *db_results, RESULT_LIST *r);
static void addtoresultlist(RESULT_LIST * l_rp, int filenum, int rank, int tfrequency, int frequency, DB_RESULTS * db_results);
static void freeresultlist(DB_RESULTS *db_results);
static void freeresult(RESULT *);
static void make_db_res_and_free(RESULT_LIST *l_res);


/**********************************************************************
* SwishRankScheme -- set the ranking scheme to use when sorting
* karman - Wed Sep  1 11:55:46 CDT 2004
*
***********************************************************************/

void SwishRankScheme(SWISH *sw, int scheme)
{
    sw->RankScheme = scheme;
}


/*********************************************************************
* SwishReturnRawRank -- return unscaled swishrank values
* karman - Mon Mar 30 20:51:02 CDT 2009
*
*********************************************************************/

void SwishReturnRawRank(SWISH *sw, int flag)
{
    sw->ReturnRawRank = flag;
}



/****************************************************************
*  New_Search_Object - Create a new search object
*
*   Pass in:
*       SWISH *sw - swish handle (database handle, if you like)
*       query - query string
*
*   Returns:
*       SEARCH_OBJECT
*
*   Notes:
*       This does not run the search, but just creates an object
*       that can generate a set of results.
*
*****************************************************************/

SEARCH_OBJECT *New_Search_Object( SWISH *sw, char *query )
{
    int         index_count;
    IndexFILE  *indexf = sw->indexlist;

    
    SEARCH_OBJECT *srch = (SEARCH_OBJECT *)emalloc( sizeof(SEARCH_OBJECT) );
    memset( srch, 0, sizeof(SEARCH_OBJECT) );

    reset_lasterror( sw );
    

    srch->sw = sw;  /* parent object */
    srch->PhraseDelimiter = PHRASE_DELIMITER_CHAR;
    srch->structure = IN_FILE;

    if ( query )
        SwishSetQuery( srch, query );


    index_count = 0;
    /* Allocate place to hold the limit arrays */
    while( indexf )
    {
        index_count++;
        indexf = indexf->next;
    }


    /* Create a table indexed by indexf number; each entry points to one index's limits array */
    srch->prop_limits = (PROP_LIMITS **)emalloc( sizeof(PROP_LIMITS *) * index_count );

    indexf = sw->indexlist;
    index_count = 0;

    while( indexf )
    {
        int     table_size = sizeof(PROP_LIMITS) * (indexf->header.metaCounter + 1); /* metaID start at one */
        PROP_LIMITS *index_limits;  /* an array of limits data */

        /* Create a table indexed by meta ID */
        index_limits = (PROP_LIMITS *)emalloc( table_size );
        memset( index_limits, 0, table_size );
        
        srch->prop_limits[index_count++] = index_limits;

        indexf = indexf->next;
    }


    return srch;
}
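
/*
New_Search_Object() builds a two-level table: one row per index file, and in
each row one entry per meta ID, with an extra entry because meta IDs start at
one. The sketch below shows the same layout using the standard allocator.
It is an illustration only: struct demo_limit and the demo_ functions are
hypothetical stand-ins, and the real code uses emalloc() and PROP_LIMITS.

```c
#include <assert.h>
#include <stdlib.h>

struct demo_limit { int lo; int hi; };   // hypothetical stand-in for PROP_LIMITS

// counts[i] is the number of meta IDs in index i. IDs start at 1,
// so each row gets counts[i] + 1 zeroed entries.
static struct demo_limit **demo_alloc_limits(const int *counts, int n_indexes)
{
    struct demo_limit **table;
    int i;

    assert(counts && n_indexes > 0);
    table = malloc(sizeof *table * (size_t)n_indexes);
    if (!table)
        return NULL;

    for (i = 0; i < n_indexes; i++) {
        table[i] = calloc((size_t)counts[i] + 1, sizeof **table);
        if (!table[i]) {
            while (i-- > 0)          // undo the rows already allocated
                free(table[i]);
            free(table);
            return NULL;
        }
    }
    return table;
}

// Mirror of the loop in Free_Search_Object(): rows first, then the table.
static void demo_free_limits(struct demo_limit **table, int n_indexes)
{
    int i;

    if (!table)
        return;
    for (i = 0; i < n_indexes; i++)
        free(table[i]);
    free(table);
}
```

A lookup is then table[index_num][metaID], which is how limit_result_list()
picks the row for the index it is filtering.
*/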

/****************************************************************
* These three functions return the swish object's ref_count_ptr
****************************************************************/
void *SwishSearch_parent ( SEARCH_OBJECT *srch )
{
    return srch ? srch->sw->ref_count_ptr : NULL;
}

void *SwishResults_parent ( RESULTS_OBJECT *results )
{
    return results ? results->sw->ref_count_ptr : NULL;
}


/*** and this is same for the results ***/

void ResultsSetRefPtr( RESULTS_OBJECT *results, void *address )
{
    if ( !address )
        progerr("ResultsSetRefPtr - passed null address");

    results->ref_count_ptr = address;
}

void *SwishResult_parent ( RESULT *result )
{
    return result ? result->db_results->results->ref_count_ptr : NULL;
}


/********** Search object methods *************************/

void SwishSetStructure( SEARCH_OBJECT *srch, int structure )
{
    if ( srch )
        srch->structure = structure;
}

int SwishGetStructure( SEARCH_OBJECT *srch )
{
    return srch ? srch->structure : 0;
}

void SwishPhraseDelimiter( SEARCH_OBJECT *srch, char delimiter )
{
    if ( srch && delimiter && !isspace( (int)delimiter ) )
        srch->PhraseDelimiter = (int)delimiter;
}


char SwishGetPhraseDelimiter(SEARCH_OBJECT *srch)
{
    return srch ? srch->PhraseDelimiter : 0;
}


void SwishSetSort( SEARCH_OBJECT *srch, char *sort )
{
    StringList  *slsort = NULL;
    int          i;

    if ( !srch || !sort || !*sort )
        return;

    if ( srch->sort_params )
    {
        freeswline( srch->sort_params );
        srch->sort_params = NULL;
    }


    if ( !(slsort = parse_line(sort)) )
        return;

    for (i = 0; i < slsort->n; i++)
        srch->sort_params = addswline( srch->sort_params, slsort->word[i] );

    freeStringList(slsort);

}

/****************************************************************
*  Free_Search_Object - Frees a search object
*
*   Pass in:
*       SEARCH_OBJECT
*
*   Frees up memory associated with a search (not results)
*
*****************************************************************/

void Free_Search_Object( SEARCH_OBJECT *srch )
{
    int         index_count;
    IndexFILE   *indexf;
    
    if ( !srch )
        return;

    /* Free up any query parameters */

    if ( srch->query )
        efree( srch->query );


    if ( srch->sort_params )
        freeswline( srch->sort_params );



    SwishResetSearchLimit( srch );  /* clears data associated with the parameters and the processed data */

    /* Free up the limit tables */

    indexf = srch->sw->indexlist;
    index_count = 0;
    while ( indexf )
    {
        PROP_LIMITS *index_limits = srch->prop_limits[index_count++];
        efree( index_limits );
        indexf = indexf->next;
    }

    efree ( srch->prop_limits );

    efree (srch);
}



void SwishSetQuery(SEARCH_OBJECT *srch, char *words )
{
    if ( srch->query )
        efree( srch->query );

    srch->query = words ? estrdup( words ) : NULL;
}



/****************************************************************
*  New_Results_Object - Creates a structure for holding a set of results
*
*   Pass in:
*       SEARCH_OBJECT - parameters for running the search
*
*   Returns:
*       Initialized RESULTS_OBJECT
*
*   Notes:
*       This does not run the query.
*
*****************************************************************/

static RESULTS_OBJECT *New_Results_Object( SEARCH_OBJECT *srch )
{
    RESULTS_OBJECT  *results;
    IndexFILE       *indexf;
    DB_RESULTS      *last = NULL;
    int             indexf_count;

    reset_lasterror( srch->sw );
    
    
    results = (RESULTS_OBJECT *)emalloc( sizeof(RESULTS_OBJECT) );
    memset( results, 0, sizeof(RESULTS_OBJECT) );

    results->sw = srch->sw;

    /* Create place to store results */
    results->resultSearchZone = Mem_ZoneCreate("resultSearch Zone", 0, 0);

    /* Create place to store sort keys */
    results->resultSortZone = Mem_ZoneCreate("resultSort Zone", 0, 0);



    /* Add in a DB_RESULTS for each index file - place to store results for a single index file */
    indexf_count = 0;
    
    for ( indexf = srch->sw->indexlist; indexf; indexf = indexf->next )
    {
        DB_RESULTS * db_results = (DB_RESULTS *) emalloc(sizeof(DB_RESULTS));
        memset( db_results, 0, sizeof(DB_RESULTS));

        db_results->results     = results;      /* parent object */
        db_results->indexf      = indexf;
        db_results->index_num   = indexf_count++;
        db_results->srch        = srch;         /* only valid during the search */


        if ( !last )
            results->db_results = db_results;  /* first one */
        else
            last->next = db_results;

        /* memory allocations after linking the db_results into the list */
        
        if ( !init_sort_propIDs( db_results, srch->sort_params, last ) )
            return results;

        last = db_results;
    }



    /* Make sure we have a query string passed in */
    if (!srch->query || !*srch->query)
        srch->sw->lasterror = NO_WORDS_IN_SEARCH;
    else
        results->query = estrdup( srch->query );
    

    return results;
}

/**************************************************************************
*  init_sort_propIDs -- load the prop ids for a single index file
*
*  Creates an array of metaEntry's that are the sort keys
*  also creates an associated array that indicates sort direction
*
***************************************************************************/

static int init_sort_propIDs( DB_RESULTS *db_results, struct swline *sort_word, DB_RESULTS *last )
{
    int cur_length = 0;    /* array size */
    struct metaEntry *m;
    struct metaEntry *rank_meta;


    /* rank sorting is the default, and is also handled specially when ranking */

    rank_meta = getPropNameByName(&db_results->indexf->header, AUTOPROPERTY_RESULT_RANK);




    reset_lasterror( db_results->indexf->sw );


    /* If no sort is specified, default to sorting by rank */

    if ( !sort_word )  /* set the default */
    {
        db_results->num_sort_props = 1;

        /* create the array */
        db_results->sort_data = (SortData *)emalloc( sizeof( SortData ) );
        memset( db_results->sort_data, 0, sizeof( SortData ) );

        if ( !rank_meta )
            progerr("Rank is not defined as an auto property - must specify sort parameters");

        db_results->sort_data[0].property = rank_meta;
        db_results->sort_data[0].direction = 1;
        db_results->sort_data[0].is_rank_sort = 1; /* flag as special -- see result_sort.c */

        return 1;
    }
        

    while ( sort_word )
    {
        char *field = sort_word->line;
        int  sortmode = -1;  /* default */

        db_results->num_sort_props++;

        /* see if there's an "asc" or "desc" modifier following */
        
        if (sort_word->next)
        {
            if (!strcasecmp(sort_word->next->line, "asc"))
            {
                sortmode = -1; /* asc sort */
                sort_word = sort_word->next;
            }
            else if (!strcasecmp(sort_word->next->line, "desc"))
            {
                sortmode = 1; /* desc sort */
                sort_word = sort_word->next;
            }
        }

        /* array big enough? */
        if ( db_results->num_sort_props > cur_length )
        {
            cur_length += 20;

            db_results->sort_data = (SortData *)erealloc( db_results->sort_data, cur_length * sizeof( SortData ) );
            /* zero only the newly added slots so earlier sort keys survive a regrow */
            memset( db_results->sort_data + (cur_length - 20), 0, 20 * sizeof( SortData ) );
        }


        m = getPropNameByName(&db_results->indexf->header, field);
        if ( !m )
        {
            set_progerr(UNKNOWN_PROPERTY_NAME_IN_SEARCH_SORT, db_results->results->sw, 
                "Property '%s' is not defined in index '%s'", field, db_results->indexf->line);
            return 0;
        }

        /* make sure the properties are compatible for sorting */
        if ( last )
           if ( !properties_compatible( last->sort_data[db_results->num_sort_props-1].property, m ) )
           {
                set_progerr(INVALID_PROPERTY_TYPE, db_results->results->sw, 
                    "Property '%s' in index '%s' is not compatible with index '%s'", field, db_results->indexf->line, last->indexf->line);
                return 0;
            }


        db_results->sort_data[db_results->num_sort_props-1].property = m;
        db_results->sort_data[db_results->num_sort_props-1].direction = sortmode;

        /* flag special case of sorting by rank */
        if ( m == rank_meta )
            db_results->sort_data[db_results->num_sort_props-1].is_rank_sort = 1;


        sort_word = sort_word->next;
    }
    return 1;
}
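
/*
The loop in init_sort_propIDs() reads the sort words as a flat list in which
each property name may be followed by an optional "asc" or "desc". The sketch
below shows that parsing step by itself, without the property lookup. The
demo_ names are hypothetical. The direction values follow the code above:
-1 for ascending (the default) and 1 for descending. strcasecmp() is the
POSIX function from strings.h.

```c
#include <assert.h>
#include <string.h>
#include <strings.h>   // strcasecmp (POSIX)

struct demo_sort_key {
    const char *field;
    int direction;      // -1 ascending (default), 1 descending
};

// Returns the number of keys written to out (at most max_keys).
static int demo_parse_sort(const char **words, int n_words,
                           struct demo_sort_key *out, int max_keys)
{
    int i = 0;
    int n_keys = 0;

    assert(words && out);

    while (i < n_words && n_keys < max_keys) {
        out[n_keys].field = words[i++];
        out[n_keys].direction = -1;

        // An optional modifier consumes one extra word.
        if (i < n_words) {
            if (strcasecmp(words[i], "asc") == 0) {
                i++;
            } else if (strcasecmp(words[i], "desc") == 0) {
                out[n_keys].direction = 1;
                i++;
            }
        }
        n_keys++;
    }
    return n_keys;
}
```

Because the modifier is only checked after a field has been taken, the words
"title desc path rank ASC" yield three keys: title descending, then path and
rank ascending.
*/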


/****************** Utility Methods **********************************/


int SwishHits( RESULTS_OBJECT *results )
{
    if ( !results )
        return 0;       /* probably should be an error */


    return results->total_results;
}


        
SWISH *SW_ResultToSW_HANDLE( RESULT *r )
{
    return r->db_results->indexf->sw;
}

SWISH *SW_ResultsToSW_HANDLE( RESULTS_OBJECT *results )
{
    return results->sw;
}

/****************************************************************
*  Free_Results_Object - Frees all memory associated with search results
*
*   Pass in:
*       RESULTS_OBJECT
*
*****************************************************************/


void Free_Results_Object( RESULTS_OBJECT *results )
{
    DB_RESULTS *next;
    DB_RESULTS *cur;
    int         i;

    if ( !results )
        return;

    cur = results->db_results;

    while ( cur )
    {
        next = cur->next;
        freeresultlist( cur );

        freeswline( cur->parsed_words );
        freeswline( cur->removed_stopwords );

        if ( cur->sort_data )
        {
            /* free the property pointer arrays -- */
            for ( i = 0; i < cur->num_sort_props; i++ )
                if ( cur->sort_data[i].key )
                {
                    int j;
                    for ( j = 0; j < cur->result_count; j++ )
                        if ( cur->sort_data[i].key[j] &&  cur->sort_data[i].key[j] != (propEntry *)-1 )
                           efree( cur->sort_data[i].key[j] ); /** double loop! -- memzone please */

                    efree( cur->sort_data[i].key );
                }

            efree(cur->sort_data);
         }


        /* free the property string cache, if used */
        if ( cur->prop_string_cache )
        {
            int i;
            for ( i=0; i< cur->indexf->header.metaCounter; i++ )
                if ( cur->prop_string_cache[i] )
                    efree( cur->prop_string_cache[i] );

            efree( cur->prop_string_cache );
        }
        


        efree(cur);
        cur = next;
    }

    if ( results->query )
        efree( results->query );
        

    /* Free up any results */
    Mem_ZoneFree( &results->resultSearchZone );

    /* Free up sort keys */
    Mem_ZoneFree( &results->resultSortZone );

    efree( results );
}





// #define DUMP_RESULTS 1


#ifdef DUMP_RESULTS

static void dump_result_lists( RESULTS_OBJECT *results, char *message )    
{
    DB_RESULTS *db_results = results->db_results;
    int cnt = 0;
    struct swline *query;

    printf("\nDump Results: (%s)\n", message );

    while ( db_results )
    {
        RESULT *result;
        printf("\nIndex: %s\nQuery: ", db_results->indexf->line);
        for ( query = db_results->parsed_words; query; query = query->next )
            printf("%s ", query->line );

        printf("\n");
        
        
        if ( !db_results->resultlist )
        {
            printf("  no resultlist\n");
            db_results = db_results->next;
            continue;
        }

        result = db_results->resultlist->head;

        if ( !result )
        {
            printf("  resultlist, but head is null\n");
            db_results = db_results->next;
            continue;
        }

        while ( result )
        {
            printf("  Result (%2d): filenum '%d' from index file '%s'\n", ++cnt, result->filenum, db_results->indexf->line );
            result = result->next;
        }

        printf(" end of results for index\n");

        db_results = db_results->next;
    }
    printf("end of all results.\n\n");
}
#endif





/********************************************************************************
*  SwishQuery - run a simple query
*
*   Pass in:
*       SWISH * - swish handle
*       char *  - query string
*
*   Returns:
*       RESULTS_OBJECT;
*
*
********************************************************************************/

RESULTS_OBJECT *SwishQuery(SWISH *sw, char *words )
{
    SEARCH_OBJECT *srch;
    RESULTS_OBJECT *results;

    reset_lasterror( sw );

    srch = New_Search_Object( sw, words );
    if ( sw->lasterror )
        return NULL;
        
    results = SwishExecute( srch, NULL );
    Free_Search_Object( srch );
    return results;
}




/********************************************************************************
*  SwishExecute - run a query on an existing SEARCH_OBJECT
*
*   Pass in:
*       SEARCH_OBJECT * - existing search object
*       char *  - optional query string
*
*   Returns:
*       RESULTS_OBJECT -- regardless of errors CALLER MUST DESTROY
*
*   Errors:
*       Sets sw->lasterror
*
*   ToDo:
*       localize the errorstr
*
********************************************************************************/



RESULTS_OBJECT *SwishExecute(SEARCH_OBJECT *srch, char *words)
{
    RESULTS_OBJECT *results;
    DB_RESULTS     *db_results;
    SWISH          *sw;

    if ( !srch )
        progerr("Passed in NULL search object to SwishExecute");

    sw = srch->sw;

    reset_lasterror( sw );


    /* Allow words to be passed in */
    if ( words )
        SwishSetQuery( srch, words );



    /* Create the results object based on the search input object */
    results = New_Results_Object( srch );
    if ( sw->lasterror )
        return results;


    /* This returns false when no files found within the limit */
    /* or on errors such as bad property name */

    /* $$$ make sure this is not repeated once set  */

    if ( !Prepare_PropLookup( srch ) )
        return results;



    /* Fetch results for each index file */
    db_results = results->db_results;

    while ( db_results )
    {

        /* Parse the query and run the search */
        query_index( db_results );
        

        /* Any big errors? */
        /* This is ugly, but allows processing all indexes before reporting an error */
        /* one could argue if this is the correct approach or not */
       
        
        if ( sw->lasterror )
        {
            if ( sw->lasterror == QUERY_SYNTAX_ERROR )
                return results;

            if ( sw->lasterror < results->lasterror )
                results->lasterror = sw->lasterror;

            sw->lasterror = RC_OK;
        }

        db_results = db_results->next;
    }


    /* Check for errors */
    
    if ( !results->total_files )
        sw->lasterror = INDEX_FILE_IS_EMPTY;

    else if ( !results->search_words_found )
        sw->lasterror = results->lasterror ? results->lasterror : NO_WORDS_IN_SEARCH;


    if ( sw->lasterror )
        return results;

   

    /*  Sort results by rank or by properties */

    results->total_results = sortresults( results );



    /* If no results then return the last error, or any error found while processing index files */
    if (!results->total_results )
        sw->lasterror = sw->lasterror ? sw->lasterror : results->lasterror;



#ifdef DUMP_RESULTS
    dump_result_lists( results , "After sorting" );
#endif

    return results;
}



/**************************************************************************
*  limit_result_list -- removes results that are not within the limit
*
*   Notes:
*
*   If all properties were pre-sorted would be better to limit results in
*   something like getfileinfo() before doing all the work of adding the file
*   and then removing it.  Another advantage would be that the individual
*   flag arrays (one set for each index, and one for each property name)
*   could be ANDed into a single small array (bytes or vectored)
*   for each index file.
*
*
*
***************************************************************************/

static void limit_result_list( DB_RESULTS *db_results )
{
    RESULT *result;
    RESULT *next;
    RESULT *prev;
    

    /* get first result in list */
    
    if( !(result = db_results->resultlist->head) )
        return;

    prev = NULL;

    while (result)
    {
        PROP_LIMITS *prop_limits = db_results->srch->prop_limits[db_results->index_num];
        
        if ( !LimitByProperty( db_results->indexf, prop_limits, result->filenum ) )
        {
            prev = result;
            result = result->next;
            continue;
        }

        next = result->next;

        if ( !next ) /* removing last one so set the tail to the previous one */
            db_results->resultlist->tail = prev;
            

        freeresult( result );

        if ( !prev )  /* if first in list change the head pointer */
            db_results->resultlist->head = next;
        else
            prev->next = next;

        result = next;   /* result was just freed, so do not read result->next */
    }

}
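
/*
 * Illustration only, not part of swish-e: a minimal sketch of the
 * unlink-while-walking pattern used by limit_result_list() above.
 * The struct node and struct list types and the list_append(),
 * list_filter() and is_odd() helpers are hypothetical stand-ins for
 * RESULT, RESULT_LIST and LimitByProperty(). The point it shows is that
 * the successor is read before the node is released, and that both the
 * head and the tail pointers are repaired.
 *
```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

// Hypothetical stand-ins for RESULT and RESULT_LIST.
struct node { int filenum; struct node *next; };
struct list { struct node *head; struct node *tail; };

// Append a node holding filenum; returns 0 on allocation failure.
static int list_append(struct list *l, int filenum)
{
    struct node *n = malloc(sizeof *n);
    if (!n)
        return 0;
    n->filenum = filenum;
    n->next = NULL;
    if (l->tail)
        l->tail->next = n;
    else
        l->head = n;
    l->tail = n;
    return 1;
}

// Remove every node for which reject(filenum) is non-zero.
// Returns the number of nodes removed.
static int list_filter(struct list *l, int (*reject)(int))
{
    struct node *prev = NULL;
    struct node *cur;
    int removed = 0;

    assert(l != NULL && reject != NULL);

    cur = l->head;
    while (cur) {
        struct node *next = cur->next;   // read before cur is freed

        if (!reject(cur->filenum)) {
            prev = cur;
            cur = next;
            continue;
        }
        if (prev)
            prev->next = next;
        else
            l->head = next;              // removed the first node
        if (!next)
            l->tail = prev;              // removed the last node
        free(cur);
        removed++;
        cur = next;
    }
    return removed;
}

// Example predicate: reject odd file numbers.
static int is_odd(int n) { return n % 2 != 0; }
```
 *
 * With file numbers 1..5 and is_odd as the predicate, the list is left
 * holding 2 and 4, with head and tail pointing at those two nodes.
 */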


/********************************************************************************
*  query_index -- search a single index file
*
*   Call with:
*       db_results - per-index results structure (gives access to the
*                    search object and the current index file)
*
*   Returns:
*       nothing; db_results->resultlist is set to the list of results
*                    results have been limited with -L
*                    Note: result list may be empty
*
*   Error:
*       sets sw->lasterror on error that should abort processing
*
*   Notes:
*       $$$ Probably should always return a DB_RESULTS so can report headers
*       for all index files, and can show all the stop words removed.  FIXME!
*
*
*********************************************************************************/

static void query_index( DB_RESULTS *db_results )
{
    struct swline   *searchwordlist, *tmpswl;
    RESULTS_OBJECT  *results = db_results->results;

    

    /* This is used to detect if all the index files were empty for error reporting */
    /* $$$ Can this ever happen? */
    results->total_files += db_results->indexf->header.totalfiles;


    /* convert the search into a parsed list */
    /* also sets db_results->(removed_stopwords|parsed_words) */

    if ( !(searchwordlist = parse_swish_query( db_results )) )
        return;


    results->search_words_found++;  /* flag that some words were found for search so can tell difference between all stop words removed vs. no words in query */        


    /* Now do the search */

    tmpswl = searchwordlist;

    db_results->resultlist = parseterm(db_results, 0, 1, db_results->indexf, &searchwordlist);

    freeswline( tmpswl );


    /* Limit result list by -L parameter */
    if ( db_results->srch->limit_params && db_results->resultlist )
        limit_result_list( db_results );
}

/***************************************************************************
* SwishSeekResult -- seeks to the result number specified
*
*   Returns the position or a negative number on error
*
*   Position is zero based
*
*   
*
****************************************************************************/


int     SwishSeekResult(RESULTS_OBJECT *results, int pos)
{
    int    i;
    RESULT *cur_result = NULL;

    if (!results)
        return INVALID_RESULTS_HANDLE;   /* no handle, so there is no sw to record the error in */

    reset_lasterror( results->sw );

    if ( pos < 0 )
        pos = 0;  /* really should warn.. */

    if ( !results->db_results )
    {
        set_progerr(SWISH_LISTRESULTS_EOF, results->sw, "Attempted to SwishSeekResult before searching");
        return SWISH_LISTRESULTS_EOF;
    }



    /* Check if only one index file -> Faster SwishSeek */

    if (!results->db_results->next)
    {
        for (i = 0, cur_result = results->db_results->sortresultlist; cur_result && i < pos; i++)
            cur_result = cur_result->next;

        results->db_results->currentresult = cur_result;
        


    } else {
        /* Well, we finally have more than one index file */
        /* In this case we have no choice - We need to read the data from disk */
        /* The easy way: Let SwishNextResult do the job */

        /* Must reset the currentresult pointers first */
        /* $$$ could keep the current result seek pos number in results, and then just offset from there if greater */
        DB_RESULTS *db_results;

        for ( db_results = results->db_results; db_results; db_results = db_results->next )
            db_results->currentresult = db_results->sortresultlist;

        /* If want first one then we are done */
        if ( 0 == pos )
            return pos;
        

        for (i = 0; i < pos; i++)
            if (!(cur_result = SwishNextResult(results)))
                break;
    }

    if (!cur_result)
        return ((results->sw->lasterror = SWISH_LISTRESULTS_EOF));

    return ( results->cur_rec_number = pos );
}





RESULT *SwishNextResult(RESULTS_OBJECT *results)
{
    RESULT *res = NULL;
    RESULT *res2 = NULL;
    int     rc;
    DB_RESULTS *db_results = NULL;
    DB_RESULTS *db_results_winner = NULL;
    SWISH   *sw = results->sw;
    
    reset_lasterror( results->sw );

    /* Seems like we should error here if there are no results */
    if ( !results->db_results )
    {
        set_progerr(SWISH_LISTRESULTS_EOF, sw, "Attempted to read results before searching");
        return NULL;
    }

    

    /* Check for a unique index file */
    if (!results->db_results->next)
    {
        if ((res = results->db_results->currentresult))
        {
            /* Increase Pointer */
            results->db_results->currentresult = res->next;
        }
    }


    else    /* tape merge to find the next one from all the index files */
    
    {
        /* We have more than one index file - can't use pre-sorted index */
        /* Get the lower value */
        db_results_winner = results->db_results;  /* get the first index */
        res = db_results_winner->currentresult;   /* and current result from first index */

        /* now loop through indexes looking for the lowest one */

        for (db_results = results->db_results->next; db_results; db_results = db_results->next)
        {
            /* Any more results for this index? If not skip and move to next index */
            if (!(res2 = db_results->currentresult))
                continue;


            if (!res)  /* first one doesn't exist, so second wins */
            {
                res = res2;
                db_results_winner = db_results;
                continue;
            }

            /* Finally, compare the properties */

            rc = compare_results(&res, &res2);

            /* If first is more than second then take second */
            if (rc < 0)
            {
                res = res2;
                db_results_winner = db_results;
            }
        }

        
        /* Move current pointer to next for this index */
        if ((res = db_results_winner->currentresult))
            db_results_winner->currentresult = res->next;
    }



    if (res)
    {
        results->cur_rec_number++;
    }
    else
    {
        // it's expected to just return null on end of list.
        // sw->lasterror = SWISH_LISTRESULTS_EOF;  
    }

        
    return res;

}
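
/*
 * Illustration only, not part of swish-e: a minimal sketch of the
 * "tape merge" step that SwishNextResult() performs when more than one
 * index is searched. Each index contributes an already sorted stream,
 * and every call hands out the best head among the streams that still
 * have data. struct stream and merge_next() are hypothetical names, and
 * plain integer ranks sorted high to low are an assumption that stands
 * in for compare_results().
 *
```c
#include <assert.h>
#include <stddef.h>

// Hypothetical stand-in for DB_RESULTS: one sorted stream per index.
struct stream {
    const int *ranks;   // sorted from highest to lowest
    size_t     len;     // number of entries in ranks
    size_t     cur;     // index of the next unread entry
};

// Store the next highest rank across all streams in *out.
// Returns 1 on success, 0 when every stream is exhausted.
static int merge_next(struct stream *s, size_t n, int *out)
{
    struct stream *winner = NULL;
    size_t i;

    assert(s != NULL && out != NULL);

    for (i = 0; i < n; i++) {
        if (s[i].cur >= s[i].len)
            continue;                    // nothing left in this stream
        if (!winner || s[i].ranks[s[i].cur] > winner->ranks[winner->cur])
            winner = &s[i];
    }
    if (!winner)
        return 0;

    // Only the winning stream advances.
    *out = winner->ranks[winner->cur++];
    return 1;
}
```
 *
 * Merging the streams {900, 500, 100} and {800, 700} this way yields
 * 900, 800, 700, 500, 100 and then reports end of list.
 */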






/* The recursive parsing function.
** This was a headache to make but ended up being surprisingly easy. :)
** parseone tells the function to only operate on one word or term.
** parseone is needed so that with metaA=foo bar "bar" is searched
** with the default metaname.
*/

static RESULT_LIST *parseterm(DB_RESULTS *db_results, int parseone, int metaID, IndexFILE * indexf, struct swline **searchwordlist)
{
    int     rulenum;
    char   *word;
    int     lenword;
    RESULT_LIST *l_rp,
           *new_l_rp;
    int     distance = 0;

    /*
     * The andLevel is used to help keep the ranking function honest
     * when it ANDs the results of the latest search term with
     * the results so far (rp).  The idea is that if you AND three
     * words together you ultimately want the resulting rank to
     * be the average of all three individual ranks. By keeping
     * a running total of the number of terms already ANDed, the
     * next AND operation can properly scale the average-rank-so-far
     * and recompute the new average properly (see andresultlists()).
     * This implementation is a little weak in that it will not average
     * across terms that are in parenthesis. (It treats an () expression
     * as one term, and weights it as "one".)
     */
    int     andLevel = 0;       /* number of terms ANDed so far */

    word = NULL;
    lenword = 0;

    l_rp = NULL;

    rulenum = OR_RULE;
    while (*searchwordlist)
    {
        
        word = SafeStrCopy(word, (*searchwordlist)->line, &lenword);

        if (rulenum == NO_RULE)
            rulenum = DEFAULT_RULE;


        if (isunaryrule(word))  /* is it a NOT? */
        {
            *searchwordlist = (*searchwordlist)->next;
            l_rp = parseterm(db_results, 1, metaID, indexf, searchwordlist);
            l_rp = notresultlist(db_results, l_rp, indexf);

            /* Wild goose chase */
            rulenum = NO_RULE;
            continue;
        }


        /* If it's an operator, set the current rulenum, and continue */
        else if (isbooleanrule(word))
        {
            rulenum = getrulenum(word);
            /* NEAR feature */
            if (rulenum == NEAR_RULE)
            {
                distance = atol(word + strlen(NEAR_WORD));
            }
            /* end NEAR */

            *searchwordlist = (*searchwordlist)->next;
            continue;
        }


        /* Bump up the count of AND terms for this level */
        
        if ((rulenum != AND_RULE) && (rulenum != NEAR_RULE))
            andLevel = 0;       /* reset */
        else
            andLevel++;



        /* Is this the start of a sub-query? */
        /* Look for a lone "(" */

        if (word[0] == '(' && '\0' == word[1])
        {

            
            /* Recurse */
            *searchwordlist = (*searchwordlist)->next;
            new_l_rp = parseterm(db_results, 0, metaID, indexf, searchwordlist);


            if (rulenum == AND_RULE)
                l_rp = andresultlists(db_results, l_rp, new_l_rp, andLevel);

            else if (rulenum == NEAR_RULE)
                l_rp = nearresultlists(db_results, l_rp, new_l_rp, andLevel, distance);

            else if (rulenum == OR_RULE)
                l_rp = orresultlists(db_results, l_rp, new_l_rp);

            else if (rulenum == PHRASE_RULE)
                l_rp = phraseresultlists(db_results, l_rp, new_l_rp, 1);

            else if (rulenum == AND_NOT_RULE)
                l_rp = notresultlists(db_results, l_rp, new_l_rp);

            if (!*searchwordlist)
                break;

            rulenum = NO_RULE;
            continue;

        }

        /* Is this the end of a sub-query? Lone ')' */

        else if (word[0] == ')' && '\0' == word[1] )
        {
            *searchwordlist = (*searchwordlist)->next;
            break;
        }


        /* Now down to checking for metanames and actual search words */


        /* Check if the next word is '=' */
        if (isMetaNameOpNext((*searchwordlist)->next))
        {
            struct metaEntry *m = getMetaNameByName(&indexf->header, word);

            /* shouldn't happen since already checked */
            if ( !m )
                progerr("Unknown metaname '%s' -- swish_words failed to find.", word );

            metaID = m->metaID;
                
            
            /* Skip both the metaName and the '=' */
            *searchwordlist = (*searchwordlist)->next->next;

            
            if ((*searchwordlist) && ((*searchwordlist)->line[0] == '('))
            {
                *searchwordlist = (*searchwordlist)->next;
                parseone = 0;
            }
            else
                parseone = 1;

            /* Now recursively process the next terms */
            
            new_l_rp = parseterm(db_results, parseone, metaID, indexf, searchwordlist);
            if (rulenum == AND_RULE)
                l_rp = andresultlists(db_results, l_rp, new_l_rp, andLevel);

            else if (rulenum == NEAR_RULE)
                l_rp = nearresultlists(db_results, l_rp, new_l_rp, andLevel, distance);

            else if (rulenum == OR_RULE)
                l_rp = orresultlists(db_results, l_rp, new_l_rp);

            else if (rulenum == PHRASE_RULE)
                l_rp = phraseresultlists(db_results, l_rp, new_l_rp, 1);

            else if (rulenum == AND_NOT_RULE)
                l_rp = notresultlists(db_results, l_rp, new_l_rp);

            if (!*searchwordlist)
                break;

            rulenum = NO_RULE;
            metaID = 1;
            continue;
        }


        /* Finally, look up a word, and merge with previous results. */

        l_rp = operate(db_results, l_rp, rulenum, word, metaID, andLevel, indexf, distance);

        if (parseone)
        {
            *searchwordlist = (*searchwordlist)->next;
            break;
        }
        rulenum = NO_RULE;

        *searchwordlist = (*searchwordlist)->next;
    }

    if (lenword)
        efree(word);

    return l_rp;
}
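
/*
 * Illustration only, not part of swish-e: a minimal sketch of the
 * running average described in the andLevel comment inside parseterm().
 * running_average() is a hypothetical helper; the actual arithmetic
 * lives in andresultlists() and may differ in detail (rounding,
 * scaling). The sketch only shows why a count of already ANDed terms is
 * enough to keep every term equally weighted.
 *
```c
#include <assert.h>

// avg_so_far : average rank of the terms ANDed so far
// and_level  : how many terms that average already covers (>= 1)
// new_rank   : rank of the term being ANDed in
static int running_average(int avg_so_far, int and_level, int new_rank)
{
    assert(and_level >= 1);

    // Scale the old average back up to a sum, add the new rank,
    // then divide by the new number of terms.
    return (avg_so_far * and_level + new_rank) / (and_level + 1);
}
```
 *
 * For ranks 300, 600 and 900 the two steps give 450 and then 600, which
 * is the same as averaging all three ranks at once.
 */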

/* Looks up a word in the index file -
** it calls getfileinfo(), which does the real searching.
*/

static RESULT_LIST *operate(DB_RESULTS *db_results, RESULT_LIST * l_rp, int rulenum, char *wordin, int metaID, int andLevel, IndexFILE * indexf, int distance)
{
    RESULT_LIST     *new_l_rp;
    RESULT_LIST     *return_l_rp;
    char            *word;
    int             lenword;


    /* $$$ why dup the input string?? */
    word = estrdup(wordin);
    lenword = strlen(word);

    new_l_rp = return_l_rp = NULL;


    /* Lookup the word in the index */
    new_l_rp = getfileinfo(db_results, word, metaID);

    switch (rulenum)
    {
        case AND_RULE:
            return_l_rp = andresultlists(db_results, l_rp, new_l_rp, andLevel);
            break;
        
        case NEAR_RULE:
            return_l_rp = nearresultlists(db_results, l_rp, new_l_rp, andLevel, distance);
            break;

        case OR_RULE:
            return_l_rp = orresultlists(db_results, l_rp, new_l_rp);
            break;

        case NOT_RULE:
            return_l_rp = notresultlist(db_results, new_l_rp, indexf);
            break;

        case PHRASE_RULE:
            return_l_rp = phraseresultlists(db_results, l_rp, new_l_rp, 1);
            break;

        case AND_NOT_RULE:
            return_l_rp = notresultlists(db_results, l_rp, new_l_rp);
            break;
    }


    efree(word);
    return return_l_rp;
}


static RESULT_LIST *newResultsList(DB_RESULTS *db_results)
{
    RESULTS_OBJECT *results = db_results->results;
    
    RESULT_LIST *result_list = (RESULT_LIST *)Mem_ZoneAlloc(results->resultSearchZone, sizeof(RESULT_LIST));
    memset( result_list, 0, sizeof( RESULT_LIST ) );

    result_list->results = results;
    return result_list;
}

static void addResultToList(RESULT_LIST *l_r, RESULT *r)
{
    r->next = NULL;

    if(!l_r->head)
        l_r->head = r;
    if(l_r->tail)
        l_r->tail->next = r;
    l_r->tail = r;

}


/* Routine to test structure in a result */
/* Also removes posdata that do not fit with structure field */
static int test_structure(int structure, int frequency, unsigned int *posdata)
{
    int i, j;    /* i -> counter up to frequency, j -> new frequency */
    int *p,*q;   /* Use pointers to ints instead of arrays for
                 ** faster processing */
    
    for(i = j = 0, p = q = (int*)posdata; i < frequency; i++, p++)
    {
        if(GET_STRUCTURE(*p) & structure)
        {
            if(p - q)
            {
                *q = *p;
            }
            j++;
            q++;
        }
    }
    return j;  /* return new frequency */
}
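
/*
 * Illustration only, not part of swish-e: a minimal sketch of the
 * in-place compaction that test_structure() performs. Entries whose
 * structure bits do not intersect the requested mask are dropped and
 * the survivors are packed to the front of the same array.
 * DEMO_STRUCTURE and compact_by_mask() are hypothetical, and keeping
 * the structure flags in the low 8 bits is an assumption made for the
 * example; the real layout is defined by GET_STRUCTURE.
 *
```c
#include <assert.h>
#include <stddef.h>

// Assumed layout for the example only: low 8 bits are structure flags.
#define DEMO_STRUCTURE(x) ((x) & 0xffu)

// Keep only the entries whose structure flags intersect mask.
// Returns the new number of entries, which are packed at data[0..].
static int compact_by_mask(unsigned int *data, int n, unsigned int mask)
{
    int kept = 0;
    int i;

    assert(data != NULL || n == 0);

    for (i = 0; i < n; i++) {
        if (DEMO_STRUCTURE(data[i]) & mask)
            data[kept++] = data[i];   // kept <= i, so this never overwrites unread data
    }
    return kept;
}
```
 *
 * The returned count plays the role of the adjusted frequency that
 * getfileinfo() uses after limiting by structure.
 */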



/* Finds a word and returns its corresponding file and rank information list.
** If not found, NULL is returned.
*/
/* Jose Ruiz
** New implementation based on hashing for direct access. Faster!!
** Also handles stars (wildcards). Faster!! It can even find "and", "or"
** when looking for "an*" or "o*" if they are not stop words
*/

#define MAX_POSDATA_STACK 256

static RESULT_LIST *getfileinfo(DB_RESULTS *db_results, char *word, int metaID)
{
    int     j,
            x,
            filenum,
            frequency,
            len,
            curmetaID,
            index_structure,
            index_structfreq,
            tmpval;
    char remains[100];   // hard-coded !!!?
    char myWord[100];
    int           rLen;
    int           tLen;
    unsigned char   *q;
    RESULT_LIST *l_rp, *l_rp2;
    sw_off_t    wordID;
    int     metadata_length;
    char   *p;
    int     tfrequency = 0;
    unsigned char   *s, *buffer; 
    int     sz_buffer;
    unsigned char flag;
    unsigned int     stack_posdata[MAX_POSDATA_STACK];  /* stack buffer for posdata */
    unsigned int    *posdata;
    IndexFILE  *indexf = db_results->indexf;
    SWISH  *sw = indexf->sw;
    int     structure = db_results->srch->structure;
    unsigned char *start;
    int saved_bytes = 0;

    x = j = filenum = frequency = len = curmetaID = index_structure = index_structfreq = 0;
    metadata_length = 0;


    l_rp = l_rp2 = NULL;

    /* how would we ever get a word here with faulty wildcards, 
       since swish_words parses for them already?
     */
    
    if (*word == '*')
    {
        sw->lasterror = WILDCARD_NOT_ALLOWED_AT_WORD_START;
        return NULL;
    }


    if (*word == '?')   /* ? may not start a word, just like * may not */
    {
        sw->lasterror = WILDCARD_NOT_ALLOWED_AT_WORD_START;
        return NULL;
    }
  
 

    /* First: Look for star at the end of the word */
    if ((p = strrchr(word, '*')))
    {
        if (p != word && *(p - 1) == '\\') /* Check for an escaped * */
        {
            p = NULL;           /* If escaped it is not a wildcard */
        }
        else
        {
            /* Check if it is at the end of the word */
            if (p == (word + strlen(word) - 1))
            {
                word[strlen(word) - 1] = '\0';
                /* Remove the wildcard - p remains not NULL */
            }
            else
            {
                p = NULL;       /* Not at the end - Ignore */
            }
        }
    }

    /* Second: Look for question mark somewhere in the word */
    strcpy(remains, (char*)"");
    rLen = 0;
    tLen = strlen(word);
    // Check for first "?" in current word (not reverse)
    if ((q = (unsigned char*)strchr(word, '?')))
    {
        if (q != (unsigned char*)word && *(q - 1) == '\\') /* Check for an escaped ? */
        {
            q = NULL;           /* If escaped it is not a wildcard */
        }
        else
        {
            /* Check if it is at the end of the word */
            if (q == ((unsigned char*)word + strlen(word) - 1))
            {
                strcpy(remains, (char*)q);   // including the last "?"
                rLen = strlen(remains);
                word[strlen(word) - 1] = '\0';
            }
            else
            {
                strcpy(remains, (char*)q);   // including the first "?"
                rLen = strlen(remains);
                *q = '\0';
            }
        }
    }


    DB_InitReadWords(sw, indexf->DB);
    if ((!p) && (!q))    /* No wildcard -> Direct hash search */
    {
        DB_ReadWordHash(sw, word, &wordID, indexf->DB);

        if(!wordID)
        {    
            DB_EndReadWords(sw, indexf->DB);
            // sw->lasterror = WORD_NOT_FOUND;
            return NULL;
        }
    }        

    else  /* There is a wildcard. So use the sequential approach */
    {       
        unsigned char   *resultword;

        if (*word == '*')
        {
            sw->lasterror = UNIQUE_WILDCARD_NOT_ALLOWED_IN_WORD;
            return NULL;
        }

        
        DB_ReadFirstWordInvertedIndex(sw, word, (char**)&resultword, &wordID, indexf->DB);

        if (!wordID)
        {
            DB_EndReadWords(sw, indexf->DB);
            // sw->lasterror = WORD_NOT_FOUND;
            return NULL;
        }
        else
            strcpy(myWord, (char*)resultword);   // Remember the word
            
        efree(resultword);   /* Do not need it */
    }


    /* If code is here we have found the word !! */

    do
    {
    
       // Check if this could be a match (only if "?" is present)
       if (rLen)
       {
          char *pw, *ps;
          int found = 0;
          pw = &remains[0];
          ps = &myWord[strlen(word)];

          for (; *pw && *ps; pw++, ps++)
          {
            if (*pw == '?')
            {
              if (!p)
              {
                // no wildcard "*" at end, so length should exactly match
                if ((pw == &remains[strlen(remains) - 1]) && (*(ps + 1) == '\0'))
                  found = 1;
                else
                  continue;
              }
              else
              {
                // wildcard at end, so ignore length
                if (pw == &remains[strlen(remains) - 1])
                  found = 1;
                else
                  continue;

              }
            }

            if (*pw != *ps)
              break;

            if (!p)
            {
              if ((pw == &remains[strlen(remains) - 1]) && (*(ps + 1) == '\0'))
                found = 1;
            }
            else
            {
              if (pw == &remains[strlen(remains) - 1])
                found = 1;
            }

          }

          if (!found)
          {
            unsigned char   *resultword;

            /* Jump to next word */
            /* The current word does not match the "?" pattern, and we
               are in a sequential search because of a wildcard */
            /* So, go for next word */
            DB_ReadNextWordInvertedIndex(sw, word, (char**)&resultword, &wordID, indexf->DB);
            if (! wordID)
                break;          /* no more data */
            else
              strcpy(myWord, (char*)resultword);

            efree(resultword);  /* Do not need it (although might be useful for highlighting some day) */

            continue;
          }
       }


        DB_ReadWordData(sw, wordID, &buffer, &sz_buffer, &saved_bytes , indexf->DB);
        uncompress_worddata(&buffer,&sz_buffer,saved_bytes);

        s = buffer;

        // buffer structure = <tfreq><metaID><delta to next meta>

        /* Get the data of the word */
        tfrequency = uncompress2(&s); /* tfrequency - number of files with this word */

        /* Now look for a correct Metaname */
        curmetaID = uncompress2(&s);

        while (curmetaID)
        {
            metadata_length = uncompress2(&s);
            
            if (curmetaID >= metaID)
                break;

            /* If this is not the searched metaID jump onto next one */
            s += metadata_length;

            /* Check if no more meta data */
            if(s == (buffer + sz_buffer))
                break; /* exit if no more meta data */

            curmetaID = uncompress2(&s);
        }

        if (curmetaID == metaID) /* found a matching meta value */
        {
            int meta_rank = metaID * -1;  /*  store metaID in rank value until computed by getrank() */
            filenum = 0;
            start = s;   /* points to the start of the data */
            do
            {
                /* Read on all items */
                uncompress_location_values(&s,&flag,&tmpval,&frequency);
                filenum += tmpval;  

                /* stack_posdata is just to avoid calling emalloc */
                /* it should be enough for most cases */
                if(frequency > MAX_POSDATA_STACK)
                    posdata = (unsigned int *)emalloc(frequency * sizeof(int));
                else
                    posdata = stack_posdata;

                /* read positions */
                uncompress_location_positions(&s,flag,frequency,posdata);

                /* test (limit by) structure and adjust frequency */
                frequency = test_structure(structure, frequency, posdata);

                /* Store metaID * -1 in rank - In this way, we can delay its computation */

                /* Store result */
                /* 2003-01 jmruiz. Check also if file is deleted */
                if(frequency && ((!indexf->header.removedfiles) || DB_CheckFileNum(sw,filenum,indexf->DB)))
                {
                    /* This is very useful if we sorted by other property */
                    if(!l_rp)
                       l_rp = newResultsList(db_results);

                    /*
                       tfrequency = number of files with this word
                       frequency = number of times this word is in this document for this metaID
                       meta_rank is the negative of the metaID - for use in getrank()
                    */

                    addtoresultlist(l_rp, filenum, meta_rank, tfrequency, frequency, db_results);

                    /* Copy positions */
                    memcpy((unsigned char *)l_rp->tail->posdata,(unsigned char *)posdata,frequency * sizeof(int));

                    /* Calculate rank now -- can't delay as an optimization */
                    getrank( l_rp->tail );
                }
                if(posdata != stack_posdata)
                    efree(posdata);
                    

            } while ((s - start) != metadata_length);


        }

        efree(buffer);


        if ((!p) && (!q))
            break;              /* direct access (no wild card) -> break */

        else
        {
            unsigned char   *resultword;

            /* Jump to next word */
            /* No more data for this word but we
               are in sequential search because of
               the star (p is not null) */
            /* So, go for next word */
            DB_ReadNextWordInvertedIndex(sw, word, (char**)&resultword, &wordID, indexf->DB);
            if (! wordID)
                break;          /* no more data */

            else
                strcpy(myWord, (char*)resultword); // remember the word

            efree(resultword);  /* Do not need it (although might be useful for highlighting some day) */
        }

    } while(1);   /* continue on in loop for wildcard search */



    if ((p) || (q))
    {
        /* Finally, if we are in a sequential search merge all results */
        l_rp = mergeresulthashlist(db_results, l_rp);
    }

    DB_EndReadWords(sw, indexf->DB);
    return l_rp;
}


/*
  -- Rules checking
  -- u_is...  = user rules (and, or, ...)
  -- is...    = internal rules checking
 */




/* Is a word a boolean rule?
*/

static int     isbooleanrule(char *word)
{
    if (!strcmp(word, AND_WORD) || !strncmp(word, NEAR_WORD, strlen(NEAR_WORD)) || !strcmp(word, OR_WORD) || !strcmp(word, PHRASE_WORD) || !strcmp(word, AND_NOT_WORD))
        return 1;
    else
        return 0;
}

/* Is a word a unary rule?
*/

static int     isunaryrule(char *word)
{
    if (!strcmp(word, NOT_WORD))
        return 1;
    else
        return 0;
}

/* Return the number for a rule.
*/

static int     getrulenum(char *word)
{
    if (!strcmp(word, AND_WORD))
        return AND_RULE;
    else if (!strncmp(word, NEAR_WORD, strlen(NEAR_WORD)))
        return NEAR_RULE;
    else if (!strcmp(word, OR_WORD))
        return OR_RULE;
    else if (!strcmp(word, NOT_WORD))
        return NOT_RULE;
    else if (!strcmp(word, PHRASE_WORD))
        return PHRASE_RULE;
    else if (!strcmp(word, AND_NOT_WORD))
        return AND_NOT_RULE;
    return NO_RULE;
}
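
/*
 * Illustration only, not part of swish-e: a minimal sketch of how a
 * NEAR token carries its distance as a numeric suffix, as parseterm()
 * assumes when it calls atol(word + strlen(NEAR_WORD)).
 * DEMO_NEAR_PREFIX and near_distance() are hypothetical; the literal
 * "near" is an assumption for the example and the real prefix is
 * whatever NEAR_WORD is defined as.
 *
```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

// Assumed prefix for the example only; the real one is NEAR_WORD.
#define DEMO_NEAR_PREFIX "near"

// Returns the distance encoded in tokens such as "near50",
// 0 when no number follows the prefix,
// and -1 when the token does not start with the prefix at all.
static long near_distance(const char *token)
{
    size_t plen = strlen(DEMO_NEAR_PREFIX);

    assert(token != NULL);

    if (strncmp(token, DEMO_NEAR_PREFIX, plen) != 0)
        return -1;

    // strtol yields 0 when no digits follow the prefix.
    return strtol(token + plen, NULL, 10);
}
```
 *
 * A distance of 0 matters to the caller: nearresultlists() treats it as
 * "no valid distance" and falls back to a plain AND.
 */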

// Check if new position is still valid for ALL other
// position sequences; at least one position within each
// sequence should be valid
// Definition of sequence: one or more positions, where each
//                         sequence is separated from another
//                         by means of a "0" (zero)
static int KeepPos(RESULT *r, int pos, int dist)
{
  int i;
  int pos1;
  int first;
  int found;
  int detect;

  // no earlier "nearx" for this document; so the position
  // to be checked is always a valid one, otherwise it wouldn't
  // arrive here
  if (r->bArea == 0)
    return(1);

  found = 0;
  detect = 0;
  first = 1;
  for (i = 0; i < r->frequency; i++)
  {
    pos1 = GET_POSITION(r->posdata[i]);
    if (pos1 == 0)
    {
      if (first)
      {
        found = detect;
        first = 0;
      }
      else
      {
        found = found & detect;
      }
      detect = 0;
      continue;
    }
    else
    {
      if (abs(pos1 - pos) <= dist)
        detect = 1;
    }
  }

  // Also for positions after last 0
  found = found & detect;

  if (found)
    return (1);

  return(0);
}
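
/*
 * Illustration only, not part of swish-e: a minimal sketch of the check
 * KeepPos() describes. The position array holds several sequences
 * separated by 0, and a candidate position is kept only if every
 * sequence has at least one position within the given distance.
 * near_all_sequences() is a hypothetical helper working on plain ints.
 * Unlike KeepPos() it starts from "all sequences satisfied", so it needs
 * no separate first-sequence flag and also handles an array without any
 * separator.
 *
```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

// pos    : positions, with 0 acting as the separator between sequences
// n      : number of entries in pos
// target : the candidate position being tested
// dist   : maximum allowed distance
// Returns 1 if every sequence has a position within dist of target.
static int near_all_sequences(const int *pos, int n, int target, int dist)
{
    int all_ok = 1;   // every completed sequence matched so far
    int seen = 0;     // current sequence has a close enough position
    int i;

    assert(pos != NULL || n == 0);

    for (i = 0; i < n; i++) {
        if (pos[i] == 0) {            // separator: close the current sequence
            all_ok = all_ok && seen;
            seen = 0;
            continue;
        }
        if (abs(pos[i] - target) <= dist)
            seen = 1;
    }

    // The positions after the last separator form a sequence too.
    return all_ok && seen;
}
```
 *
 * With positions {10, 12, 0, 50, 60}, target 11 and distance 5 the
 * second sequence has nothing close enough, so the candidate is
 * rejected; replacing 50 with 14 makes it acceptable.
 */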

/* NEAR WORD feature -- proximity hits */
/* this and other NEAR WORD code contributed by Herman Knoops hk.sw@knoman.com */
//#define DUMP_NEAR_VALUES  1

// This is a special case of proximity. A sequence of single words/terms connected
// via nearX, will be checked for proximity on a certain area, e.g. wordA near50
// wordB near50 wordC will give a hit if all three words are in an area of 50 words.

// A generic nearX can be easily derived from this.

/* Takes two lists of results from searches and ANDs them together.
** On input, both result lists r1 and r2 must be sorted by filenum
** On output, the new result list remains sorted
*/
static RESULT_LIST *nearresultlists(DB_RESULTS *db_results, RESULT_LIST * l_r1, RESULT_LIST * l_r2, int andLevel, int distance)
{
    RESULT_LIST *new_results_list = NULL;
    RESULT *r1;
    RESULT *r2;
    int res = 0;
    int i, j, pos1, pos2;
    int found, found1, found2, detect1, detect2;
    int iZero = 0;
    int first1, first2;
    int *posd1 = NULL;
    int *posd2 = NULL;
    int cnt1, cnt2;
#ifdef DUMP_NEAR_VALUES
    FILE *ofd;
    int maxneed = 0;
#endif

    // Check if a valid distance was specified
    if (distance == 0)
      return(andresultlists(db_results, l_r1, l_r2, andLevel));

    /* patch provided by Mukund Srinivasan */
    if (l_r1 == NULL || l_r2 == NULL)
    {
        make_db_res_and_free(l_r1);
        make_db_res_and_free(l_r2);
        return NULL;
    }

#ifdef DUMP_NEAR_VALUES
    // Do not open earlier, otherwise file could be left open !!!
    ofd = fopen("kmlog.txt", "ab");
#endif

    if (andLevel < 1)
        andLevel = 1;

    for (r1 = l_r1->head, r2 = l_r2->head; r1 && r2;)
    {
        res = r1->filenum - r2->filenum;
        if (!res)
        {
            /*
             * Computing the new rank is interesting because
             * we want to weight each of the words that was
             * previously ANDed equally along with the new word.
             * We compute a running average using andLevel and
             * simply scale up the old average (in r1->rank)
             * and recompute a new, equally weighted average.
             */
            int     newRank = 0;
#ifdef DUMP_NEAR_VALUES
            int     maxpos;

            // Determine max combinations
            maxpos = (r1->frequency * r2->frequency);

            if (maxneed > 0)
            {
              fprintf(ofd,"  maxneed: %d\n", maxneed);
              maxneed = 0; // reset for next to come
            }

#endif
            cnt1 = 0;
            cnt2 = 0;

#ifdef DUMP_NEAR_VALUES
            // make sure to skip the found entry if not within given proximity
            fprintf(ofd, "file %d (andLevel %d)\n", r1->filenum, andLevel);

            // Detects if there was already a "nearx" executed before, which
            // means there must be one or more "0" present in positions
            if (r1->bArea > 0)
              fprintf(ofd,"  bArea1: %d\n", (int) r1->bArea);
            // This can never happen, as long as no parentheses/brackets
            // are supported; the complete query is parsed left to right
            // TODO: modify if priority brackets are going to be supported
            if (r2->bArea > 0)
              fprintf(ofd,"  bArea2: %d\n", (int) r2->bArea);

            fprintf(ofd,"  maxpos: %d\n", maxpos);
            fprintf(ofd,"  %s: ", "term1");
            for (j = 0; j < r1->frequency; j++)
              fprintf(ofd, "  %d:", (int) GET_POSITION(r1->posdata[j]));
            fprintf(ofd,"\n");

            fprintf(ofd,"  %s: ", "term2");
            for (j = 0; j < r2->frequency; j++)
              fprintf(ofd, "  %d:", (int) GET_POSITION(r2->posdata[j]));
            fprintf(ofd,"\n");
#endif

            found1 = found2 = 0;
            detect1 = detect2 = 0;
            first1 = first2 = 1;

            for (i = 0; i < r1->frequency; i++)
            {
              pos1 = GET_POSITION(r1->posdata[i]);

              // Check if "0" is present in a posdata array. If so, there has been one or more
              // AND combinations already (each separated with "0").
              if (pos1 == 0)
              {
                if (first1 == 1)
                {
                  found1 = detect1;
                  first1 = 0;
                }
                else
                {
                  found1 = found1 & detect1;
                }
                // Reset to start checking the next series after a "0"
                detect1 = 0;
                cnt1++;

                // BUGFIX: also copy the 0 in between, otherwise more than
                // two "and" operators will go wrong for the 3rd, 4th, etc.
                if (posd1)
                  posd1 = (int *)erealloc(posd1, cnt1 * sizeof(int));
                else
                  posd1 = (int *)emalloc(cnt1 * sizeof(int));
                posd1[cnt1-1] = r1->posdata[i];

                continue;
              }

              for (j = 0; j < r2->frequency; j++)
              {
                pos2 = GET_POSITION(r2->posdata[j]);

                // Check if "0" is present in a posdata array. If so, there has been one or more
                // AND combinations already (each separated with "0").
                if (pos2 == 0)
                {
                  if (first2 == 1)
                  {
                    found2 = detect2;
                    first2 = 0;
                  }
                  else
                  {
                    found2 = found2 & detect2;
                  }
                  // Reset to start checking the next series after a "0"
                  detect2 = 0;
                  cnt2++;

                  // BUGFIX: also copy the 0 in between, otherwise more than
                  // two "and" operators will go wrong for the 3rd, 4th, etc.
                  if (posd2)
                    posd2 = (int *)erealloc(posd2, cnt2 * sizeof(int));
                  else
                    posd2 = (int *)emalloc(cnt2 * sizeof(int));
                  posd2[cnt2-1] = r2->posdata[j];

                  continue;     // skip 0
                }
// enable: maybe if parenthesis support for near is added ???
// enable ??               if ((abs(pos1 - pos2) <= distance) && (pos1 != pos2))
                if ((abs(pos1 - pos2) <= distance) && KeepPos(r1, pos2, distance))
                {
                  detect1 = detect2 = 1;

#ifdef DUMP_NEAR_VALUES
                  maxneed++;
                  fprintf(ofd, "  hit %d: (%d - %d): %d\n", i, pos1, pos2, abs(pos1 - pos2));
#endif
                  cnt1++;
                  cnt2++;
                  if (posd1)
                    posd1 = (int *)erealloc(posd1, cnt1 * sizeof(int));
                  else
                    posd1 = (int *)emalloc(cnt1 * sizeof(int));
                  if (posd2)
                    posd2 = (int *)erealloc(posd2, cnt2 * sizeof(int));
                  else
                    posd2 = (int *)emalloc(cnt2 * sizeof(int));
                  
                  posd1[cnt1-1] = r1->posdata[i];
                  posd2[cnt2-1] = r2->posdata[j];
                }
              } // for r2
            } // for r1

            // Check if this was the first series of two or more "0"-connected patterns.
            // For single patterns (no "0"), the variable "found" is already filled correctly.
            if (first1 == 1)
              found1 = detect1;
            else
              found1 = found1 & detect1;

            if (first2 == 1)
              found2 = detect2;
            else
              found2 = found2 & detect2;

            // overall result
            found = found1 & found2;


            // if there was a proximity hit then process it
            if (found)
            {
              newRank = ((r1->rank * andLevel) + r2->rank) / (andLevel + 1);

              if(!new_results_list)
                  new_results_list = newResultsList(db_results);

              addtoresultlist(new_results_list, r1->filenum, newRank, 0, cnt1 + cnt2 + 1, db_results);

              new_results_list->tail->bArea = r1->bArea + r2->bArea + 1;

              /* Storing all positions could be useful in the future  */
              /* BEWARE: an extra zero is inserted to make sure ALL words/terms of a previous near-operation */
              /*         also have a proximity to this new word/term */
              /*         Could give side-effects for people, who use the positions for highlighting !!! */ 

              CopyPositions(new_results_list->tail->posdata, 0, posd1, 0, cnt1);
              CopyPositions(new_results_list->tail->posdata, cnt1, &iZero, 0, 1);
              CopyPositions(new_results_list->tail->posdata, cnt1 + 1, posd2, 0, cnt2);
            }

            // Free if allocation has been performed
            if (posd1 != NULL)
            {
              efree(posd1);
              posd1 = NULL;
            }
            if (posd2 != NULL)
            {
              efree(posd2);
              posd2 = NULL;
            }

            r1 = r1->next;
            r2 = r2->next;
        }

        else if (res > 0)
        {
            r2 = r2->next;
        }
        else
        {
            r1 = r1->next;
        }
    } // for


#ifdef DUMP_NEAR_VALUES
    if (maxneed > 0)
      fprintf(ofd,"  maxneed: %d\n", maxneed);

    fclose(ofd);
#endif

    return new_results_list;
}
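
/*
** Sketch, not part of swish-e: the core of the proximity test above, shown
** in isolation. It is deliberately simplified: it works on plain position
** arrays, ignores the "0" separators left by earlier NEAR operations and the
** KeepPos() filter, and the name has_near_hit is hypothetical.

```c
#include <stdlib.h>

// Returns 1 if any position in pos1 lies within `distance` words of
// any position in pos2, otherwise 0.
int has_near_hit(const int *pos1, int n1, const int *pos2, int n2, int distance)
{
    int i, j;

    for (i = 0; i < n1; i++)
        for (j = 0; j < n2; j++)
            if (abs(pos1[i] - pos2[j]) <= distance)
                return 1;
    return 0;
}
```

** Like the loops above, this compares every pair of positions, so the cost
** grows with the product of the two term frequencies.
*/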



/* Takes two lists of results from searches and ANDs them together.
** On input, both result lists r1 and r2 must be sorted by filenum
** On output, the new result list remains sorted
*/

static RESULT_LIST *andresultlists(DB_RESULTS *db_results, RESULT_LIST * l_r1, RESULT_LIST * l_r2, int andLevel)
{
    RESULT_LIST *new_results_list = NULL;
    RESULT *r1;
    RESULT *r2;
    int     res = 0;


    /* patch provided by Mukund Srinivasan */
    if (l_r1 == NULL || l_r2 == NULL)
    {
        make_db_res_and_free(l_r1);
        make_db_res_and_free(l_r2);
        return NULL;
    }

    if (andLevel < 1)
        andLevel = 1;

    for (r1 = l_r1->head, r2 = l_r2->head; r1 && r2;)
    {
        res = r1->filenum - r2->filenum;
        if (!res)
        {
            /*
             * Computing the new rank is interesting because
             * we want to weight each of the words that was
             * previously ANDed equally along with the new word.
             * We compute a running average using andLevel and
             * simply scale up the old average (in r1->rank)
             * and recompute a new, equally weighted average.
             */
            int     newRank = 0;

            newRank = ((r1->rank * andLevel) + r2->rank) / (andLevel + 1);


            if ( DEBUG_RANK )
            {
                fprintf( stderr, "File num: %d  1st score: %d  2nd score: %d  andLevel: %d  newRank:  %d\n----\n", 
                    r1->filenum, r1->rank, r2->rank, andLevel, newRank );
            }
            

            if(!new_results_list)
                new_results_list = newResultsList(db_results);

            addtoresultlist(new_results_list, r1->filenum, newRank, 0, r1->frequency + r2->frequency, db_results);


            /* Storing all positions could be useful in the future  */

            CopyPositions(new_results_list->tail->posdata, 0, r1->posdata, 0, r1->frequency);
            CopyPositions(new_results_list->tail->posdata, r1->frequency, r2->posdata, 0, r2->frequency);


            r1 = r1->next;
            r2 = r2->next;
        }

        else if (res > 0)
        {
            r2 = r2->next;
        }
        else
        {
            r1 = r1->next;
        }
    }

    return new_results_list;
}
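
/*
** Sketch, not part of swish-e: the running-average rank formula used above,
** pulled out into a helper so the arithmetic is easier to follow. The name
** combine_and_rank is hypothetical; the formula itself is the one in
** andresultlists().

```c
// Folds one more term's rank into a running average that already
// covers and_level terms, so every term ends up weighted equally.
int combine_and_rank(int old_avg, int and_level, int new_rank)
{
    if (and_level < 1)
        and_level = 1;
    return (old_avg * and_level + new_rank) / (and_level + 1);
}
```

** For ranks 1000, 500 and 300 the first step gives (1000 + 500) / 2 = 750 and
** the second gives (750 * 2 + 300) / 3 = 600, which is the plain average of
** all three. Integer division truncates, so results can drift by a point.
*/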

/* Takes two lists of results from searches and ORs them together.
2001-11 jmruiz Completely rewritten. Older one was really
               slow when the lists are very long
               On input, both result lists r1 and r2 must be sorted by filenum
               On output, the new result list remains sorted

               rank is combined for matching files.  That is,
               "foo OR bar" will rank files with both higher.

*/


static RESULT_LIST *orresultlists(DB_RESULTS *db_results, RESULT_LIST * l_r1, RESULT_LIST * l_r2)
{
    int     rc;
    RESULT *r1;
    RESULT *r2;
    RESULT *rp,
           *tmp;
    RESULT_LIST *new_results_list = NULL;
    RESULTS_OBJECT *results = db_results->results;
    /* TODO use to detect rank size overflow 
    unsigned int max_rank_size = 256 ^ sizeof(int);
    */

    /* If either list is empty, just return the other */
    if (l_r1 == NULL)
        return l_r2;

    else if (l_r2 == NULL)
        return l_r1;

    /* Look for files that have both words, and add up the ranks */

    r1 = l_r1->head;
    r2 = l_r2->head;

    while(r1 && r2)
    {
        rc = r1->filenum - r2->filenum;
        if(rc < 0)
        {
            rp = r1;
            r1 = r1->next;
        }
        else if(rc > 0)
        {
            rp = r2;
            r2 = r2->next;
        }

        else /* Matching file number */
        {
            int result_size, r1rank, r2rank;
            
            /* Create a new RESULT - Should be a function to create this, I'd think */

            result_size = sizeof(RESULT) + ( (r1->frequency + r2->frequency - 1) * sizeof(int) );
            rp = (RESULT *) Mem_ZoneAlloc(results->resultSearchZone, result_size );
            memset( rp, 0, result_size );

            rp->fi.filenum = rp->filenum = r1->filenum;

            /* TODO *2 breaks sort if rank > 2^32 */
            
            r1rank = r1->rank;
            r2rank = r2->rank;
                        
            rp->rank = ( r1rank + r2rank );  /* bump up the or terms */
	    
            if (DEBUG_RANK)
            {
                fprintf( stderr, "----\nFile num: %d  1st score: %d  2nd score: %d  newRank:  %d\n", 
                    r1->filenum, r1->rank, r2->rank, rp->rank );
            }

            rp->tfrequency = 0;
            rp->frequency = r1->frequency + r2->frequency;
            rp->db_results = r1->db_results;

            
            /* save the combined position data in the new result.  (Would freq ever be zero?) */
            if (r1->frequency)
                CopyPositions(rp->posdata, 0, r1->posdata, 0, r1->frequency);

            if (r2->frequency)
                CopyPositions(rp->posdata, r1->frequency, r2->posdata, 0, r2->frequency);

            r1 = r1->next;
            r2 = r2->next;
        }


        /* Now add the result to the output list */
        
        if(!new_results_list)
            new_results_list = newResultsList(db_results);

        addResultToList(new_results_list,rp);
    }


    /* Add the remaining results */

    tmp = r1 ? r1 : r2;

    while(tmp)
    {
        rp = tmp;
        tmp = tmp->next;
        if(!new_results_list)
            new_results_list = newResultsList(db_results);

        addResultToList(new_results_list,rp);
    }

    return new_results_list;
}
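
/*
** Sketch, not part of swish-e: the two-pointer merge that orresultlists()
** performs, reduced to arrays. The struct hit type and the or_merge name are
** hypothetical stand-ins; position data is left out entirely.

```c
#include <stddef.h>

struct hit { int filenum; int rank; };  // hypothetical stand-in for RESULT

// Merges two filenum-sorted arrays into out (capacity na + nb).
// Files present in both inputs appear once with their ranks summed.
// Returns the number of entries written.
size_t or_merge(const struct hit *a, size_t na,
                const struct hit *b, size_t nb, struct hit *out)
{
    size_t i = 0, j = 0, n = 0;

    while (i < na && j < nb)
    {
        if (a[i].filenum < b[j].filenum)
            out[n++] = a[i++];
        else if (a[i].filenum > b[j].filenum)
            out[n++] = b[j++];
        else
        {
            out[n].filenum = a[i].filenum;
            out[n].rank = a[i].rank + b[j].rank;
            n++;
            i++;
            j++;
        }
    }
    while (i < na)
        out[n++] = a[i++];
    while (j < nb)
        out[n++] = b[j++];
    return n;
}
```

** Because both inputs are sorted, each element is visited once, which is the
** reason the rewritten routine copes with long lists.
*/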


/* 2001-10 jmruiz - This code was originally at merge.c
**                  Also made it thread safe 
*/
/* These three routines are only used by notresultlist */

struct markentry
{
    struct markentry *next;
    int     num;
};

/* This marks a number as having been printed.
*/

static void    marknum(RESULTS_OBJECT *results, struct markentry **markentrylist, int num)
{
    unsigned hashval;
    struct markentry *mp;

    mp = (struct markentry *) Mem_ZoneAlloc( results->resultSearchZone, sizeof(struct markentry));

    mp->num = num;

    hashval = bignumhash(num);
    mp->next = markentrylist[hashval];
    markentrylist[hashval] = mp;
}


/* Has a number been printed?
*/

static int     ismarked(struct markentry **markentrylist, int num)
{
    unsigned hashval;
    struct markentry *mp;

    hashval = bignumhash(num);
    mp = markentrylist[hashval];

    while (mp != NULL)
    {
        if (mp->num == num)
            return 1;
        mp = mp->next;
    }
    return 0;
}

/* Initialize the marking list.
*/

static void    initmarkentrylist(struct markentry **markentrylist)
{
    int     i;

    for (i = 0; i < BIGHASHSIZE; i++)
        markentrylist[i] = NULL;
}

static void    freemarkentrylist(struct markentry **markentrylist)
{
    int     i;

    for (i = 0; i < BIGHASHSIZE; i++)
    {
        markentrylist[i] = NULL;
    }
}

/* This performs the NOT unary operation on a result list.
** NOTed files are marked with a default rank of 1000.
**
** Basically it returns all the files that have not been
** marked (GH)
*/

static RESULT_LIST *notresultlist(DB_RESULTS *db_results, RESULT_LIST * l_rp, IndexFILE * indexf)
{
    int     i,
            filenums;
    RESULT *rp;
    RESULT_LIST *new_results_list = NULL;
    struct markentry *markentrylist[BIGHASHSIZE];
    RESULTS_OBJECT *results = db_results->results;

    if(!l_rp)
        rp = NULL;
    else
        rp = l_rp->head;

    initmarkentrylist(markentrylist);
    while (rp != NULL)
    {
        marknum(results, markentrylist, rp->filenum);
        rp = rp->next;
    }

    filenums = indexf->header.totalfiles;

    for (i = 1; i <= filenums; i++)
    {
        if (!ismarked(markentrylist, i) && DB_CheckFileNum( indexf->sw, i, indexf->DB ) )
        {
            if(!new_results_list)
                new_results_list = newResultsList(db_results);

            addtoresultlist(new_results_list, i, 1000, 0, 0, db_results);
        }
    }

    freemarkentrylist(markentrylist);

    new_results_list = sortresultsbyfilenum(new_results_list);

    return new_results_list;
}
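
/*
** Sketch, not part of swish-e: the NOT operation as a complement over file
** numbers 1..total. For brevity it swaps the hash table of marks for a flag
** array and omits the DB_CheckFileNum() test; the name not_filenums is
** hypothetical.

```c
#include <stdlib.h>

// Writes every file number in 1..total_files that is absent from
// matched[0..n_matched-1] into out (capacity total_files).
// Returns the count written, or -1 if memory runs out.
int not_filenums(const int *matched, int n_matched, int total_files, int *out)
{
    unsigned char *seen = calloc((size_t) total_files + 1, 1);
    int i, n = 0;

    if (!seen)
        return -1;

    for (i = 0; i < n_matched; i++)
        if (matched[i] >= 1 && matched[i] <= total_files)
            seen[matched[i]] = 1;

    for (i = 1; i <= total_files; i++)
        if (!seen[i])
            out[n++] = i;

    free(seen);
    return n;
}
```

** The output comes out in ascending order, which matches the sorted-by-filenum
** contract the other list routines rely on.
*/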

/* Phrase result routine - see distance parameter. For phrase search this
** value must be 1 (consecutive words)
**
** On input, both result lists r1 and r2 must be sorted by filenum
** On output, the new result list remains sorted
*/
static RESULT_LIST *phraseresultlists(DB_RESULTS *db_results, RESULT_LIST * l_r1, RESULT_LIST * l_r2, int distance)
{
    int     i,
            j,
            found,
            newRank,
           *allpositions;
    int     res = 0;
    RESULT_LIST *new_results_list = NULL;
    RESULT *r1, *r2;
                


    if (l_r1 == NULL || l_r2 == NULL)
    {
        make_db_res_and_free(l_r1);
        make_db_res_and_free(l_r2);
        return NULL;
    }

    for (r1 = l_r1->head, r2 = l_r2->head; r1 && r2;)
    {
        res = r1->filenum - r2->filenum;
        if (!res)
        {
            found = 0;
            allpositions = NULL;
            for (i = 0; i < r1->frequency; i++)
            {
                for (j = 0; j < r2->frequency; j++)
                {
                    if ((GET_POSITION(r1->posdata[i]) + distance) == GET_POSITION(r2->posdata[j]))
                    {
                        found++;
                        if (allpositions)
                            allpositions = (int *) erealloc(allpositions, found * sizeof(int));

                        else
                            allpositions = (int *) emalloc(found * sizeof(int));

                        allpositions[found - 1] = r2->posdata[j];
                        break;
                    }
                }
            }
            if (found)
            {
                newRank = (r1->rank + r2->rank) / 2;

                /*
                   * Storing positions is necessary for further
                   * operations 
                 */
                if(!new_results_list)
                    new_results_list = newResultsList(db_results);
                
                addtoresultlist(new_results_list, r1->filenum, newRank, 0, found, db_results);

                CopyPositions(new_results_list->tail->posdata, 0, allpositions, 0, found);
                efree(allpositions);
            }
            r1 = r1->next;
            r2 = r2->next;
        }
        else if (res > 0)
        {
            r2 = r2->next;
        }
        else
        {
            r1 = r1->next;
        }

    }

    return new_results_list;
}
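
/*
** Sketch, not part of swish-e: the phrase test above on plain position
** arrays. The name count_phrase_hits is hypothetical, and the positions are
** assumed to be already decoded (no GET_POSITION step).

```c
// Counts the entries of pos1 that are followed, exactly `distance`
// words later, by an entry of pos2 (distance 1 means adjacent words).
int count_phrase_hits(const int *pos1, int n1, const int *pos2, int n2, int distance)
{
    int i, j, found = 0;

    for (i = 0; i < n1; i++)
        for (j = 0; j < n2; j++)
            if (pos1[i] + distance == pos2[j])
            {
                found++;
                break;      // at most one match per pos1 entry
            }
    return found;
}
```

** Unlike the NEAR test, the comparison is directional: the second term has to
** come after the first, so "foo bar" does not match a document with "bar foo".
*/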



/* Adds a file number and rank to a list of results.
*/


static void addtoresultlist(RESULT_LIST * l_rp, int filenum, int rank, int tfrequency, int frequency, DB_RESULTS *db_results)
{
    RESULT *newnode;
    int     result_size;
    RESULTS_OBJECT *results = db_results->results;

    result_size = sizeof(RESULT) + ((frequency - 1) * sizeof(int));
    newnode = (RESULT *) Mem_ZoneAlloc( results->resultSearchZone, result_size );
    memset( newnode, 0, result_size );
    newnode->fi.filenum = newnode->filenum = filenum;

    newnode->rank = rank;
    newnode->tfrequency = tfrequency;
    newnode->frequency = frequency;
    
    newnode->bArea = 0;
    newnode->pArea = NULL;
    
    newnode->db_results = db_results;

    addResultToList(l_rp, newnode);
}
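
/*
** Sketch, not part of swish-e: the size computation above suggests that
** RESULT ends in a one-element posdata array that is over-allocated to hold
** `frequency` positions (an assumption; the struct definition is not shown
** here). The same idea written with a C99 flexible array member, using
** hypothetical names and plain malloc instead of the zone allocator:

```c
#include <stdlib.h>
#include <string.h>

struct node                 // hypothetical stand-in for RESULT
{
    int filenum;
    int frequency;
    int posdata[];          // C99 flexible array member
};

// Allocates a zeroed node with room for `frequency` positions
// (frequency must be >= 0). The caller releases it with free().
struct node *new_node(int filenum, int frequency)
{
    size_t size = sizeof(struct node) + (size_t) frequency * sizeof(int);
    struct node *n = malloc(size);

    if (!n)
        return NULL;
    memset(n, 0, size);
    n->filenum = filenum;
    n->frequency = frequency;
    return n;
}
```

** With a flexible array member the "- 1" correction disappears, and a
** frequency of 0 no longer produces a size smaller than the struct itself.
*/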



/* Checks if the next word is "="
*/

int     isMetaNameOpNext(struct swline *searchWord)
{
    if (searchWord == NULL)
        return 0;

    if (!strcmp(searchWord->line, "="))
        return 1;

    return 0;
}

/* Free up a list of results that has not been assigned to a DB_RESULTS struct yet */
/* July 03 - This is called by nearresultlists(), andresultlists() and phraseresultlists(). */
/*           It isn't really needed because at this point in the search there's nothing  */
/*           attached to the result that needs to be freed. */

static void  make_db_res_and_free(RESULT_LIST *l_res)
{
    DB_RESULTS tmp;
    memset (&tmp,0,sizeof(DB_RESULTS));
    tmp.resultlist = l_res;
    freeresultlist(&tmp);
}



/* function to free all memory of a list of results */
static void    freeresultlist(DB_RESULTS *dbres)
{
    RESULT *rp;
    RESULT *tmp;

    if(dbres->resultlist)
        rp = dbres->resultlist->head;
    else
        rp = NULL;

    while (rp)
    {
        tmp = rp->next;
        freeresult(rp);
        rp = tmp;
    }
    dbres->resultlist = NULL;
    dbres->currentresult = NULL;
    dbres->sortresultlist = NULL;
}

/* function to free the memory of one result */
static void    freeresult(RESULT * rp)
{
    DB_RESULTS *db_results;

    if (!rp)
        return;


    freefileinfo( &rp->fi );  // may have already been freed

    
    db_results = rp->db_results;

}



/* 01/2001 Jose Ruiz */
/* Compare RESULTS using file number */
/* This routine is used by qsort */
static int     compResultsByFileNum(const void *s1, const void *s2)
{
    return ((*(RESULT * const *) s1)->filenum - (*(RESULT * const *) s2)->filenum);
}



/* 
06/00 Jose Ruiz - Sort results by filenum
Uses an array and qsort for better performance
Used for faster "and" and "phrase" of results
*/
static RESULT_LIST *sortresultsbyfilenum(RESULT_LIST * l_rp)
{
    int     i,
            j;
    RESULT **ptmp;
    RESULT *rp;

    /* Very trivial case */
    if (!l_rp)
        return NULL;


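    /* Count results */
    for (i = 0, rp = l_rp->head; rp; rp = rp->next, i++);
    /* Another very trivial case: nothing to sort (an empty list would
       otherwise dereference a NULL rp when the list is rebuilt below) */
    if (i <= 1)
        return l_rp;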
    /* Compute array size */
    ptmp = (void *) emalloc(i * sizeof(RESULT *));
    /* Build an array with the elements to compare
       and pointers to data */
    for (j = 0, rp = l_rp->head; rp; rp = rp->next)
        ptmp[j++] = rp;
    /* Sort them */
    swish_qsort(ptmp, i, sizeof(RESULT *), &compResultsByFileNum);
    /* Build the list */
    for (j = 0, rp = NULL; j < i; j++)
    {
        if (!rp)
            l_rp->head = ptmp[j];
        else
            rp->next = ptmp[j];
        rp = ptmp[j];
    }
    rp->next = NULL;
    l_rp->tail = rp;

    /* Free the memory of the array */
    efree(ptmp);

    return l_rp;
}
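
/*
** Sketch, not part of swish-e: the same list -> array -> sort -> relink
** pattern using the standard qsort(). The struct item type and the function
** names are hypothetical. The comparator avoids subtracting the two keys,
** which can overflow for extreme int values.

```c
#include <stdlib.h>

struct item                 // hypothetical stand-in for RESULT
{
    struct item *next;
    int filenum;
};

static int cmp_filenum(const void *a, const void *b)
{
    int x = (*(struct item * const *) a)->filenum;
    int y = (*(struct item * const *) b)->filenum;

    return (x > y) - (x < y);
}

// Sorts a singly linked list by filenum and returns the new head.
// Returns the list unchanged if it is short or memory runs out.
struct item *sort_by_filenum(struct item *head)
{
    size_t n = 0, i;
    struct item *p, **arr;

    for (p = head; p; p = p->next)
        n++;
    if (n < 2)
        return head;

    arr = malloc(n * sizeof *arr);
    if (!arr)
        return head;
    for (i = 0, p = head; p; p = p->next)
        arr[i++] = p;

    qsort(arr, n, sizeof *arr, cmp_filenum);

    for (i = 0; i + 1 < n; i++)
        arr[i]->next = arr[i + 1];
    arr[n - 1]->next = NULL;
    head = arr[0];
    free(arr);
    return head;
}
```

** Sorting an array of pointers and relinking afterwards keeps the nodes where
** they are in memory; only the next pointers change.
*/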


/* 06/00 Jose Ruiz
** returns all results in r1 that are not in r2
**
** On input, both result lists r1 and r2 must be sorted by filenum
** On output, the new result list remains sorted
*/
static RESULT_LIST *notresultlists(DB_RESULTS *db_results, RESULT_LIST * l_r1, RESULT_LIST * l_r2)
{
    RESULT *rp, *r1, *r2;
    RESULT_LIST *new_results_list = NULL;
    int     res = 0;

    if (!l_r1)
        return NULL;
    if (l_r1 && !l_r2)
        return l_r1;

    for (r1 = l_r1->head, r2 = l_r2->head; r1 && r2;)
    {
        res = r1->filenum - r2->filenum;
        if (res < 0)
        {
            /*
               * Storing all positions could be useful
               * in the future
             */

            rp = r1;
            r1 = r1->next;
            if(!new_results_list)
                new_results_list = newResultsList(db_results);
            addResultToList(new_results_list,rp);
        }
        else if (res > 0)
        {
            r2 = r2->next;
        }
        else
        {
            r1 = r1->next;
            r2 = r2->next;
        }
    }
    /* Add remaining results */
    while (r1)
    {
        rp = r1;
        r1 = r1->next;
        if(!new_results_list)
            new_results_list = newResultsList(db_results);
        addResultToList(new_results_list,rp);
    }

    return new_results_list;
}
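
/*
** Sketch, not part of swish-e: the "r1 NOT r2" walk above is a set difference
** over two sorted sequences. Reduced to int arrays, with the hypothetical
** name sorted_difference:

```c
#include <stddef.h>

// Copies into out (capacity na) every value of sorted array a that does
// not occur in sorted array b. Returns the number of values written.
size_t sorted_difference(const int *a, size_t na, const int *b, size_t nb, int *out)
{
    size_t i = 0, j = 0, n = 0;

    while (i < na && j < nb)
    {
        if (a[i] < b[j])
            out[n++] = a[i++];
        else if (a[i] > b[j])
            j++;
        else
        {
            i++;
            j++;
        }
    }
    while (i < na)
        out[n++] = a[i++];
    return n;
}
```

** For a = {1, 2, 5, 8} and b = {2, 3, 8} the result is {1, 5}.
*/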



/* Compare two positions as stored in posdata */
/* This routine is used by qsort */
static int     icomp_posdata(const void *s1, const void *s2)
{
    return (GET_POSITION(*(unsigned int *) s1) - GET_POSITION(*(unsigned int *) s2));
}




/* Adds a file number to a hash table of results.
** If the entry's already there, add the ranks,
** else make a new entry.
**
** Jose Ruiz 04/00
** For better performance in large "or"
** keep the lists sorted by filenum
**
** Jose Ruiz 2001/11 Rewritten to get better performance
*/
static RESULT_LIST *mergeresulthashlist(DB_RESULTS *db_results, RESULT_LIST *l_r)
{
    unsigned hashval;
    RESULT *r,
           *rp,
           *tmp,
           *next,
           *start,
           *newnode = NULL;
    RESULT_LIST *new_results_list = NULL;
    int    i,
           tot_frequency,
           pos_off,
           filenum;
    RESULTS_OBJECT *results = db_results->results;

    if(!l_r)
        return NULL;

    if(!l_r->head)
        return NULL;

    /* Init hash table */
    for (i = 0; i < BIGHASHSIZE; i++)
        results->resulthashlist[i] = NULL;

    for(r = l_r->head, next = NULL; r; r =next)
    {
        next = r->next;

        tmp = NULL;
        hashval = bignumhash(r->filenum);

        rp = results->resulthashlist[hashval];

        for(tmp = NULL; rp; )
        {
            if (r->filenum <= rp->filenum)
            {
                break;
            }
            tmp = rp;
            rp = rp->next;
        }
        if (tmp)
        {
            tmp->next = r;
        }
        else
        {
            results->resulthashlist[hashval] = r;
        }
        r->next = rp;
    }

    /* Now coalesce repeated filenums */
    for (i = 0; i < BIGHASHSIZE; i++)
    {
        rp = results->resulthashlist[i];
        for (filenum = 0, start = NULL; ; )
        {
            if(rp)
                next = rp->next;
            if(!rp || rp->filenum != filenum)
            {
                /* Start of new block, coalesce previous results */
                if(filenum)
                {
                    int result_size;
                    
                    for(tmp = start, tot_frequency = 0; tmp!=rp; tmp = tmp->next)
                    {
                        tot_frequency += tmp->frequency;                        
                    }

                    result_size = sizeof(RESULT) + ((tot_frequency - 1) * sizeof(int));
                    newnode = (RESULT *) Mem_ZoneAlloc(results->resultSearchZone, result_size );
                    memset( newnode, 0, result_size );
                    
                    newnode->fi.filenum = newnode->filenum = filenum;
                    newnode->rank = 0;
                    newnode->tfrequency = 0;
                    newnode->frequency = tot_frequency;
                    newnode->db_results = start->db_results;

                    for(tmp = start, pos_off = 0; tmp!=rp; tmp = tmp->next)
                    {
                        newnode->rank += tmp->rank;

                        if (tmp->frequency)
                        {
                            CopyPositions(newnode->posdata, pos_off, tmp->posdata, 0, tmp->frequency);
                            pos_off += tmp->frequency;
                        }

                    }
                    /* Add at the end of new_results_list */
                    if(!new_results_list)
                    {
                        new_results_list = newResultsList(db_results);
                    }
                    addResultToList(new_results_list,newnode);
                    /* Sort positions */
                    swish_qsort(newnode->posdata,newnode->frequency,sizeof(int),&icomp_posdata);
                }
                if(rp)
                    filenum = rp->filenum;
                start = rp;
            }
            if(!rp)
                break;
            rp = next;
        }
    }

    /* Sort results by filenum  and return */
    return sortresultsbyfilenum(new_results_list);
}
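
/*
** Sketch, not part of swish-e: the coalescing step above, reduced to its
** essence. Instead of hash buckets holding sorted chains, it assumes one
** array already sorted by filenum; struct entry and coalesce_by_filenum are
** hypothetical names, and position data is left out.

```c
#include <stddef.h>

struct entry { int filenum; int rank; };  // hypothetical stand-in for RESULT

// Collapses runs of equal filenum in a filenum-sorted array, summing
// their ranks in place. Returns the new length.
size_t coalesce_by_filenum(struct entry *e, size_t n)
{
    size_t in, out = 0;

    for (in = 0; in < n; in++)
    {
        if (out > 0 && e[out - 1].filenum == e[in].filenum)
            e[out - 1].rank += e[in].rank;
        else
            e[out++] = e[in];
    }
    return out;
}
```

** The routine above additionally concatenates the position data of each run
** and sorts it, which this sketch does not attempt.
*/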

swish-e-2.4.7/src/swishspider
#!/usr/bin/perl -w
use strict;

# print STDERR "spider $$ [@ARGV]\n";

#
# SWISH-E http method Spider
# $Id: swishspider 1337 2003-08-30 05:02:55Z whmoseley $ 
#

# Should SWISH::Filter be used for filtering?  This can be left at 1 all the time, but
# it adds a little processing time because the filter module has to be loaded.

# Note: Just because USE_FILTERS is true doesn't mean SWISH::Filter is in @INC.
# To get the path to use (for "use lib") run swish-filter-test -path.  Or another way:
#
#  PERL5LIB=`swish-filter-test -path` swish-e -S http -i http://localhost/index.html
#
# The @INC path is not set by default in swishspider because loading the SWISH::Filter
# modules for every URL might be slow.

use constant USE_FILTERS  => 1;  # 1 = yes use SWISH::Filter for filtering, 0 = no. (faster processing if not set)
use constant FILTER_TEXT  => 0;  # set to one to filter text/* content, 0 will save processing time
use constant DEBUG_FILTER => 0;  # set to one to report errors on loading SWISH::Filter module.

use LWP::UserAgent;
use HTTP::Status;
use HTML::Parser 3.00;
use HTML::LinkExtor;

    if (scalar(@ARGV) != 2) {
        print STDERR "Usage: $0 localpath url\n";
        exit(1);
    }

    my $ua = new LWP::UserAgent;
    $ua->agent( "SwishSpider http://swish-e.org" );


    my $localpath = shift;
    my $url = shift;

    my $request = new HTTP::Request( "GET", $url );
    my $response = $ua->simple_request( $request );

    # Save the HTTP code, the content type (or a redirection header), and a last-modified date, if there is one.

    open( RESP, ">$localpath.response" ) || die( "Could not open response file $localpath.response: $!" );
    print RESP $response->code() . "\n";



    # If failed to fetch doc then write out the code and location and exit
    
    if( $response->code != RC_OK ) {
        print RESP ($response->header( "location" ) ||'') . "\n";
        exit;
    }


    # Filter the document, if possible.

    my ( $content_ref, $content_type ) = filter_doc( $response );


    # Write out the (perhaps new) content type and the last modified date.

    print RESP "$content_type\n",
               ($response->last_modified || 0), "\n";

    close RESP;



    # Now write the content -- really only need to do this on text/* types since that's all swish processes
    # No, that's not true.  Can use FileFilter inside of swish-e on binary data.

    open( CONTENTS, ">$localpath.contents" ) || die( "Could not open contents file $localpath.contents: $!\n" );

    # Enable binmode if the content is not text/*
    binmode CONTENTS unless $content_type =~ m[^text/]i;

    print CONTENTS $$content_ref;
    close( CONTENTS );


    # Finally, extract out links

    exit unless $content_type =~ m!text/html!;

    open( LINKS, ">$localpath.links" ) || die( "Could not open links file $localpath.links: $!\n" );
    my $p = HTML::LinkExtor->new( \&linkcb, $url );

    # Don't allow links above the base
    $URI::ABS_REMOTE_LEADING_DOTS = 1;

    $p->parse( $$content_ref );
    close( LINKS );

    exit;


sub linkcb {
    my($tag, %links) = @_;

    return unless $tag eq 'a' && $links{href};

    my $link = $links{href};

    # Remove fragments
    $link =~ s/(.*)#.*/$1/;

    print LINKS "$link\n";
}


# This will optionally attempt to filter the document

sub filter_doc {
    my $response = shift;

    my ( $content, $content_type ) = ( $response->content, $response->header( "content-type" ) );

    my $content_ref = \$content;

    unless ( $content_type ) {
        warn 'URL: ', $response->base, " did not return a content-type\n";
        return ( $content_ref, 'text/plain' );
    }


    return ( $content_ref, $content_type ) unless USE_FILTERS;  # filters enabled?


    # This can avoid loading the filter module if it is known that type text/* will never be filtered.
    
    return ( $content_ref, $content_type )
        if $content_type =~ m!^text/! && !FILTER_TEXT;
    

    eval { require SWISH::Filter };
    if ( $@ ) {
        warn $@ if DEBUG_FILTER;
        return ( $content_ref, $content_type );
    }

    my $filter = SWISH::Filter->new;

    my $doc = $filter->convert(
        document => $content_ref,
        name     => $response->base,
        content_type => $content_type,
    );

    return $doc && $doc->was_filtered
        ? ( $doc->fetch_doc, $doc->content_type )
        : ( $content_ref, $content_type );
}


swish-e-2.4.7/src/swish2.c
/*
** $Id: swish2.c 2291 2009-03-31 01:56:00Z karpet $
**
** Originally by Kevin Hughes, kev@kevcom.com, 3/11/94
**

** Mon May  9 11:06:18 CDT 2005
karman : NOTE that none of this looks like Kevin's code. Is it all Bill's
and simply mis-labeled?
**

    This file is part of Swish-e.

    Swish-e is free software; you can redistribute it and/or modify
    it under the terms of the GNU General Public License as published by
    the Free Software Foundation; either version 2 of the License, or
    (at your option) any later version.

    Swish-e is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
    GNU General Public License for more details.

    You should have received a copy of the GNU General Public License
    along  with Swish-e; if not, write to the Free Software
    Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
    
    See the COPYING file that accompanies the Swish-e distribution for details
    of the GNU GPL and the special exception available for linking against
    the Swish-e library.
    
Mon May  9 10:57:22 CDT 2005 -- added GPL notice


*/


#include "swish.h"
#include "swstring.h"
#include "mem.h"
#include "error.h"
#include "list.h"
#include "search.h"
#include "file.h"
#include "merge.h"
#include "docprop.h"
#include "hash.h"
/* #include "search_alt.h" */
#include "db.h"
#include "swish_words.h"
#include "metanames.h"
#include "proplimit.h"
#include "stemmer.h"
#ifdef HAVE_ZLIB
#include <zlib.h>
#endif

static IndexFILE *free_index( IndexFILE *indexf );



/* Moved here so it's in the library */
unsigned int DEBUG_MASK = 0;

/*************************************************************************
* SwishNew -- create a search swish structure
*
*
**************************************************************************/


/*
  -- init swish structure
*/

SWISH  *SwishNew()
{
    SWISH  *sw;

    sw = emalloc(sizeof(SWISH));
    memset(sw, 0, sizeof(SWISH));

    initModule_DB(sw);
    initModule_Swish_Words(sw);  /* allocate a buffer */


    sw->lasterror = RC_OK;
    sw->lasterrorstr[0] = '\0';
    sw->verbose = VERBOSE;
    /* karman added -W opt which will override warn_level default at cmd line */
    sw->parser_warn_level = 2; /* report if libxml2 aborts processing a document. */
    sw->headerOutVerbose = 1;
    sw->DefaultDocType = NODOCTYPE;
    sw->ReturnRawRank = 0;

#ifdef HAVE_ZLIB
    sw->PropCompressionLevel = Z_DEFAULT_COMPRESSION;
#endif



    /* Make rest of lookup tables */
    makeallstringlookuptables(sw);  /* isvowel */
    return (sw);
}



static IndexFILE *free_index( IndexFILE *indexf )
{
    IndexFILE  *next = indexf->next;
    SWISH      *sw = indexf->sw;
    int         i;
    
    /* Close any pending DB */
    if ( indexf->DB )
        DB_Close(sw, indexf->DB);


    /* free the metaEntry array */
    if ( indexf->header.metaCounter)
        freeMetaEntries(&indexf->header);

    /* free the in-use cached meta list */
    if ( indexf->meta_list )
      efree(indexf->meta_list);

    /* free the in-use cached property list */
    if ( indexf->prop_list )
      efree(indexf->prop_list);

    /* free data loaded into header */
    free_header(&indexf->header);


    /* free array of words for each letter (-k) $$$ eight bit */
    for (i = 0; i < 256; i++)
        if ( indexf->keywords[i])
            efree(indexf->keywords[i]);


    /* free the name of the index file */
    efree( indexf->line );

    /* free the stem cache if any */
    free_word_hash_table( &indexf->hashstemcache);

    /* finally free up the index itself */
    efree( indexf );

    return next;
}
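
/*
 * Illustration only, not part of Swish-e. free_index() above returns the
 * next list entry so the caller can walk and release the index list in one
 * loop. A minimal standalone sketch of that pattern follows; demo_node,
 * demo_push, demo_free_node, demo_free_list and demo_nodes_freed are
 * hypothetical names, and plain malloc/free stand in for emalloc/efree.
 */

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical list node: a stand-in for IndexFILE with one owned string. */
struct demo_node {
    struct demo_node *next;
    char             *name;
};

/* Counter kept only so the sketch can be checked. */
int demo_nodes_freed = 0;

/* Prepend a node holding a private copy of name; returns the new head,
   or the old head unchanged if allocation fails. */
struct demo_node *demo_push(struct demo_node *head, const char *name)
{
    struct demo_node *n = malloc(sizeof *n);

    if (!n)
        return head;

    n->name = malloc(strlen(name) + 1);
    if (!n->name) {
        free(n);
        return head;
    }
    strcpy(n->name, name);
    n->next = head;
    return n;
}

/* Free one node and hand back its successor. The next pointer is read
   before the node is released, so it is never read from freed memory. */
struct demo_node *demo_free_node(struct demo_node *n)
{
    struct demo_node *next = n->next;

    free(n->name);
    free(n);
    demo_nodes_freed++;
    return next;
}

/* Same loop shape as free_swish_memory(): cur = free(cur) until NULL. */
void demo_free_list(struct demo_node *head)
{
    while (head)
        head = demo_free_node(head);
}
```

/*
 * Returning the successor keeps the walk and the release in one place, so
 * the caller never has to save the next pointer itself.
 */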

void free_swish_memory( SWISH *sw )
{
    IndexFILE *cur_indexf;


    /* Free up associated index file */
    cur_indexf = sw->indexlist;

    while (cur_indexf)
        cur_indexf = free_index( cur_indexf );


    /* Common to searching and indexing */
    freeModule_Swish_Words(sw);
    freeModule_DB(sw);


    /* Free temporary buffers -- mostly used for the library API to pass data to users */

    if (sw->Prop_IO_Buf) {
        efree(sw->Prop_IO_Buf);
        sw->Prop_IO_Buf = NULL;
    }

    if ( sw->header_names )
        efree( sw->header_names );

    if ( sw->index_names )
        efree( sw->index_names );

    if ( sw->temp_string_buffer )
        efree( sw->temp_string_buffer );


    if ( sw->stemmed_word )
        efree( sw->stemmed_word );

}

/*************************************************************************
* SwishClose -- frees up the swish handle
*
*
**************************************************************************/



void    SwishClose(SWISH * sw)
{

    if ( !sw )
        return;


    free_swish_memory( sw );

    efree(sw);
}




/*************************************************************************
* SwishInit -- create a swish handle for the index or indexes passed in
*
*
**************************************************************************/


SWISH  *SwishInit(char *indexfiles)
{
    StringList *sl = NULL;
    SWISH  *sw;
    int     i;

    sw = SwishNew();
    if (!indexfiles || !*indexfiles)
    {
        set_progerr(INDEX_FILE_ERROR, sw, "No index file supplied" );
        return sw;
    }


    /* Parse out index files, and append to indexlist */
    sl = parse_line(indexfiles);

    if ( 0 == sl->n )
    {
        /* nothing parsed: release the list before the early return */
        freeStringList(sl);
        set_progerr(INDEX_FILE_ERROR, sw, "No index file supplied" );
        return sw;
    }



    for (i = 0; i < sl->n; i++)
        addindexfile(sw, sl->word[i]);

    freeStringList(sl);

    if ( !sw->lasterror )
        SwishAttach(sw);

    return sw;
}




/**************************************************
* SwishAttach - Connect to the database
*  This just opens the index files
*
*  Maybe this could be passed a variable length of arguments
*  so that swish.c:cmd_index() could call SwishAttach( sw, DB_READWRITE )
*  to have all similar code in one place.
*
* Returns false on Failure
**************************************************/

int     SwishAttach(SWISH * sw)
{
    IndexFILE *indexlist = sw->indexlist;  /* head of list of indexes */
    IndexFILE *tmplist;


    /* First of all, read header default values from all index files */
    /* With this, we read wordchars, stripchars, ... */
    for (tmplist = indexlist; tmplist;)
        if ( !open_single_index( sw, tmplist, DB_READ ) )
            return 0;
        else
            tmplist = tmplist->next;


    return ( sw->lasterror == 0 );
}

/****************************************************************************
* open_single_index -- opens the index and reads in its header data
*
* Pass:
*   sw
*   indexf
*   db_mode  - open in read or read/write
*
* Returns
*   true if ok.
*
****************************************************************************/

int open_single_index( SWISH *sw, IndexFILE *indexf, int db_mode )
{
    INDEXDATAHEADER *header = &indexf->header;


    indexf->DB = (void *)DB_Open(sw, indexf->line, db_mode);

    if ( sw->lasterror )
        return 0;

    read_header(sw, header, indexf->DB);

    /* These values are used in ranking */

    sw->TotalFiles   += header->totalfiles - header->removedfiles;
    sw->TotalWordPos += header->total_word_positions - header->removed_word_positions;

    return 1;
}


/********************************************************************************
* SwishSetRefPtr - for use by SWISH::API to save the SV* of the swish handle
*
********************************************************************************/

void SwishSetRefPtr( SWISH *sw, void *address )
{
    if ( !address )
        progerr("SwishSetRefPtr - passed null address");

    sw->ref_count_ptr = address;
}

/********************************************************************************
* SwishGetRefPtr - for use by SWISH::API to get the SV* of the swish handle
*
********************************************************************************/

void *SwishGetRefPtr( SWISH *sw )
{
    return sw->ref_count_ptr;
}


/*********************************************************************************
* SwishWordsByLetter -- returns all the words that begin with the specified character
*
*
**********************************************************************************/

const char *SwishWordsByLetter(SWISH * sw, char *filename, char c) 
{ 
    IndexFILE *indexf;

    indexf = sw->indexlist;
    while (indexf) {
        if (!strcasecmp(indexf->line, filename)) {
            return getfilewords(sw, c, indexf);
        }
        indexf = indexf->next;
    }
    /* Not really a "WORD_NOT_FOUND" error */
    set_progerr(WORD_NOT_FOUND, sw, "Invalid index file '%s' passed to SwishWordsByLetter", filename );
    return NULL;
}




/* swish-e-2.4.7/src/file.h */

/*
$Id: file.h 1799 2006-06-11 02:28:19Z augur $
**


    This file is part of Swish-e.

    Swish-e is free software; you can redistribute it and/or modify
    it under the terms of the GNU General Public License as published by
    the Free Software Foundation; either version 2 of the License, or
    (at your option) any later version.

    Swish-e is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
    GNU General Public License for more details.

    You should have received a copy of the GNU General Public License
    along  with Swish-e; if not, write to the Free Software
    Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
    
    See the COPYING file that accompanies the Swish-e distribution for details
    of the GNU GPL and the special exception available for linking against
    the Swish-e library.
    
** Mon May  9 18:19:34 CDT 2005
** added GPL


**
**
** added buffer size arg to grabStringValue prototype - core dumping from overrun
** SRE 2/22/00
*/


#if defined(_WIN32) && !defined(__CYGWIN__)
void make_windows_path( char *path );
#endif
char *get_libexec(void);

void normalize_path(char *path);

int isdirectory(char *);
int isfile(char *);
int islink(char *);
int getsize(char *);

void indexpath(SWISH *, char *);

char *read_stream(SWISH *, FileProp *fprop, int is_text);
void flush_stream( FileProp *fprop );


/* Get/eval properties for file  (2000-11 rasc) */
FileProp *file_properties (char *real_path, char *work_path, SWISH *sw);
FileProp *init_file_properties (void);
void init_file_prop_settings( SWISH *sw, FileProp *fprop );
void     free_file_properties (FileProp *fprop);

 
/*
 * Some handy routines for parsing the Configuration File
 */

int grabCmdOptionsIndexFILE(char* line, char* commandTag, IndexFILE **listOfWords, int* gotAny, int dontToIt);

FILE *create_tempfile(SWISH *sw, const char *mode, char *prefix, char **file_name_buffer, int remove_file_name );

/* swish-e-2.4.7/src/pre_sort.c */

/*
$Id: pre_sort.c 1945 2007-10-22 14:54:07Z karpet $
**
    This file is part of Swish-e.

    Swish-e is free software; you can redistribute it and/or modify
    it under the terms of the GNU General Public License as published by
    the Free Software Foundation; either version 2 of the License, or
    (at your option) any later version.

    Swish-e is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
    GNU General Public License for more details.

    You should have received a copy of the GNU General Public License
    along  with Swish-e; if not, write to the Free Software
    Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
    
    See the COPYING file that accompanies the Swish-e distribution for details
    of the GNU GPL and the special exception available for linking against
    the Swish-e library.
    
** Mon May  9 14:53:51 CDT 2005
** added GPL
**
** jmruiz - 02/2001 - Sorting results module
**
** 2001-05-04 jmruiz added new string comparison routines for proper sorting
**                   sw_strcasecmp and sw_strcmp
**                   also added the skeleton to initModule_ResultSort
**                   and freeModule_ResultSort
**
** 2001-05-05 rasc   just rearranged functions, to make modules look similar
**                   (makes code better to read and understand)
**
** 09/2002 - separated from result sorting code
**
*/

#include "swish.h"
#include "swstring.h"
#include "mem.h"
#include "merge.h"
#include "list.h"
#include "search.h"
#include "docprop.h"
#include "metanames.h"
#include "compress.h"
#include "error.h"
#include "db.h"
#include "parse_conffile.h"
#include "swish_qsort.h"
#include "result_sort.h"

// #define DEBUGSORT 1

/******************************************************
* Here's some static data to make the sort arrays smaller
* I don't think we need to worry about multi-threaded
* indexing at this time!
*******************************************************/

typedef struct
{
    PROP_INDEX  *prop_index;  /* cache of index pointers for this file */
    propEntry   *SortProp;    /* current property for this file */

#ifdef DEBUGSORT
    char *file_name;
#endif
} PROP_LOOKUP;

static struct metaEntry *CurrentPreSortMetaEntry;
static PROP_LOOKUP *PropLookup = NULL;




/*
** ----------------------------------------------
**
**  Module management code starts here
**
** ----------------------------------------------
*/





/*
  -- init structures for this module
*/

void    initModule_ResultSort(SWISH * sw)
{
    struct MOD_ResultSort *md;

    /* Allocate structure */
    md = (struct MOD_ResultSort *) emalloc(sizeof(struct MOD_ResultSort));

    sw->ResultSort = md;

    /* Init translation sortorder tables */
    initStrCmpTranslationTable(md->iSortTranslationTable);
    initStrCaseCmpTranslationTable(md->iSortCaseTranslationTable);

    /* Init data for -s command option */
    md->isPreSorted = 1;        /* Use presorted Index by default */
    md->presortedindexlist = NULL;

}


/*
  -- release all wired memory for this module
*/


/* Frees memory of vars used by ResultSort properties configuration */
void    freeModule_ResultSort(SWISH * sw)
{
    struct MOD_ResultSort *md = sw->ResultSort;

    if (md->presortedindexlist)
        freeswline(md->presortedindexlist);

    efree(md);
    sw->ResultSort = NULL;
}


/*
** ----------------------------------------------
**
**  Module config code starts here
**
** ----------------------------------------------
*/


/*
 -- Config Directives
 -- Configuration directives for this Module
 -- return: 0/1 = none/config applied
*/

int     configModule_ResultSort(SWISH * sw, StringList * sl)
{
    struct MOD_ResultSort *md = sw->ResultSort;
    char   *w0 = sl->word[0];
    unsigned char *w1,
           *w2,
           *w3;
    int     retval = 1;
    int     incr = 0;
    int     i,
            j;
    struct swline *tmplist = NULL;
    struct metaEntry *m = NULL;


    if (strcasecmp(w0, "PreSortedIndex") == 0)
    {
        md->isPreSorted = sl->n - 1; /* If n is 1 (No properties specified) - Do not create presorted indexes */
        if (sl->n > 1)
        {
            grabCmdOptions(sl, 1, &md->presortedindexlist);
            /* Go lowercase  and check with properties */
            for (tmplist = md->presortedindexlist; tmplist; tmplist = tmplist->next)
            {
                (void)strtolower(tmplist->line);

                /* Check if it is in metanames list */
                if (!(m = getPropNameByName(&sw->indexlist->header, tmplist->line)))
                    progerr("%s: parameter is not a property", tmplist->line);
            }
        }
    }
    else if (strcasecmp(w0, "ResultSortOrder") == 0)
    {
        if (sl->n == 4)
        {
            w1 = (unsigned char *) sl->word[1];
            w2 = (unsigned char *) sl->word[2];
            w3 = (unsigned char *) sl->word[3];

            if (strlen( (char *)w1) != 1)
            {
                progerr("%s: parameter 1 must be one char length", w0);
            }
            if (strlen( (char *)w2 ) != 1)
            {
                progerr("%s: parameter 2 must be one char length", w0);
            }
            switch (w1[0])
            {
            case '=':
                incr = 0;
                break;
            case '>':
                incr = 1;
                break;
            default:
                progerr("%s: parameter 1 must be = or >", w0);
                break;
            }
            for (i = 0; w3[i]; i++)
            {
                j = (int) w2[0];
                md->iSortTranslationTable[(int) w3[i]] = md->iSortTranslationTable[j] + incr * (i + 1);

                md->iSortCaseTranslationTable[(int) w3[i]] = md->iSortCaseTranslationTable[j] + incr * (i + 1);
            }
        }
        else
            progerr("%s: requires 3 parameters (Eg: [=|>] a áàä)", w0);
    }
    else
    {
        retval = 0;             /* not a module directive */
    }
    return retval;
}
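
/*
 * Illustration only, not part of Swish-e. The ResultSortOrder branch above
 * places each listed char relative to a base char: with '=' every char gets
 * the base value, with '>' the i-th char gets base + (i + 1). A standalone
 * sketch of that rule follows; demo_apply_sort_order is a hypothetical name
 * and the table is assumed to be pre-filled with value * 256, as the
 * init routines in this file do.
 */

```c
/* Apply one "[=|>] base chars" rule to a 256-entry translation table.
   Returns 0 on success, -1 if op is neither '=' nor '>'. */
int demo_apply_sort_order(int *table, char op, unsigned char base,
                          const unsigned char *chars)
{
    int incr;
    int i;

    if (op == '=')
        incr = 0;               /* sort equal to the base char */
    else if (op == '>')
        incr = 1;               /* sort just after the base char, in list order */
    else
        return -1;

    for (i = 0; chars[i]; i++)
        table[chars[i]] = table[base] + incr * (i + 1);

    return 0;
}
```

/*
 * Because neighbouring base values are 256 apart, up to 255 chars can be
 * slotted after one base char before they would collide with the next one.
 */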


/*
** ----------------------------------------------
**
**  Module code starts here
**
** ----------------------------------------------
*/




/* 01/2001 Jose Ruiz */
/* Function for comparing data in order to get sorted results with qsort
   (including combinations of ascending and descending fields) */
/***********************************************************************
* qsort compare function used for presorting the properties
*
************************************************************************/


static int     compFileProps(const void *s1, const void *s2)
{
    int         a = *(int *)s1;
    int         b = *(int *)s2;

#ifdef DEBUGSORT
    int  ret = Compare_Properties(CurrentPreSortMetaEntry, PropLookup[a].SortProp, PropLookup[b].SortProp );
    printf(" results: file %d [%s] (len %d) vs. %d [%s] (len %d).  Lower file = %s\n",
            a, PropLookup[a].file_name, PropLookup[a].SortProp ? PropLookup[a].SortProp->propLen : -1,
            b, PropLookup[b].file_name, PropLookup[b].SortProp ? PropLookup[b].SortProp->propLen : -1,

            !ret ? "*same*" : ret < 0 ? PropLookup[a].file_name : PropLookup[b].file_name );

    return ret;

#else
    return Compare_Properties(CurrentPreSortMetaEntry, PropLookup[a].SortProp, PropLookup[b].SortProp );
#endif
}


/***********************************************************************
* Checks if the property is set to be presorted
*
************************************************************************/

int     is_presorted_prop(SWISH * sw, char *name)
{
    struct MOD_ResultSort *md = sw->ResultSort;
    struct swline *tmplist = NULL;

    if (!md->isPreSorted)
        return 0;               /* Do not sort any property */
    else
    {
        if (!md->presortedindexlist)
            return 1;           /* No list given: all properties are presorted */
        else
        {
            for (tmplist = md->presortedindexlist; tmplist; tmplist = tmplist->next)
                if (strcmp(name, tmplist->line) == 0)
                    return 1;
            return 0;
        }
    }
    return 0;
}


/***********************************************************************
* Pre sort a single property
*
************************************************************************/
int *CreatePropSortArray(IndexFILE *indexf, struct metaEntry *m, FileRec *fi, int free_cache )
{
    int             *sort_array = NULL;     /* array of file indexes that gets sorted */
    int             *out_array = NULL;      /* resulting sort rank for each file */
    int             total_files = indexf->header.totalfiles;
    int             i,
                    k;


    sort_array = emalloc( total_files * sizeof( long ) );
    out_array  = emalloc( total_files * sizeof( long ) );

    /* First time called, create place to cache property positions */
    if ( !PropLookup )
    {
        PropLookup = emalloc( total_files * sizeof( PROP_LOOKUP ));
        memset( PropLookup, 0, total_files * sizeof( PROP_LOOKUP ) );
    }


    /* This is needed to know how to compare the properties */
    CurrentPreSortMetaEntry = m;


#ifdef DEBUGSORT
    {
        propEntry *d;
        FileRec fi;
        struct metaEntry *me = getPropNameByName( &indexf->header, "swishdocpath" );
        char *s;

        for (i = 0; i < total_files; i++)
        {
            memset(&fi, 0, sizeof( FileRec ));
            fi.filenum = i+1;

            d = ReadSingleDocPropertiesFromDisk(indexf, &fi, me->metaID, 0 );

            s = emalloc( d->propLen + 1 );
            memcpy( s, d->propValue, d->propLen );
            s[d->propLen] = '\0';

            PropLookup[i].file_name = s;
        }
    }
#endif



    /* Populate the arrays */

    for (i = 0; i < total_files; i++)
    {
        /* Here's a FileRec where the property index will get loaded */
        fi->filenum = i + 1;

        /* Use cached seek pointers for this file, if not the first time */
        if ( PropLookup[i].prop_index )
            fi->prop_index = PropLookup[i].prop_index;
        else
            fi->prop_index = NULL;


        PropLookup[i].SortProp = ReadSingleDocPropertiesFromDisk(indexf, fi, m->metaID, m->sort_len);
        PropLookup[i].prop_index = fi->prop_index;  // save it for next time
        sort_array[i] = i;
    }


    /* Sort them using qsort. The main work is done by compFileProps */
    swish_qsort( sort_array, total_files, sizeof( int ), &compFileProps);


    /* Build the sorted table */

    for (i = 0, k = 1; i < total_files; i++)
    {
        /* 02/2001 We can have duplicated values, so all of them may have the same number assigned - qsort just sorts */
        if (i)
        {
            /* If consecutive elements are different increase the number */
            if ((compFileProps( &sort_array[i - 1], &sort_array[i])))
                k++;
        }

        out_array[ sort_array[i] ] = k;
    }

    efree( sort_array );


    if ( free_cache )
    {
        for (i = 0; i < total_files; i++)
            if ( PropLookup[i].prop_index )
                efree( PropLookup[i].prop_index );
        efree( PropLookup );
        PropLookup = NULL;
    }


    return out_array;
}
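
/*
 * Illustration only, not part of Swish-e. CreatePropSortArray() above sorts
 * an array of file indexes and then turns the sorted order into a rank per
 * file, giving equal values the same rank. A standalone sketch of that
 * technique on plain ints follows; demo_make_rank_array, demo_cmp_by_value
 * and demo_values are hypothetical names, and the standard qsort stands in
 * for swish_qsort.
 */

```c
#include <stdlib.h>

/* Values being ranked; file-scope because the qsort callback takes no
   context argument (the same reason the code above uses statics). */
static const int *demo_values;

/* Compare two indexes by the values they refer to. */
static int demo_cmp_by_value(const void *p1, const void *p2)
{
    int a = *(const int *)p1;
    int b = *(const int *)p2;

    return (demo_values[a] > demo_values[b]) - (demo_values[a] < demo_values[b]);
}

/* Returns a malloc'd array where ranks[i] is the 1-based rank of values[i];
   equal values share a rank. Returns NULL if n <= 0 or on allocation
   failure. The caller frees the result. */
int *demo_make_rank_array(const int *values, int n)
{
    int *order;
    int *ranks;
    int  i, k;

    if (n <= 0)
        return NULL;

    order = malloc(n * sizeof *order);
    ranks = malloc(n * sizeof *ranks);
    if (!order || !ranks) {
        free(order);
        free(ranks);
        return NULL;
    }

    for (i = 0; i < n; i++)
        order[i] = i;

    demo_values = values;
    qsort(order, n, sizeof *order, demo_cmp_by_value);

    /* Walk the sorted order; bump the rank only when the value changes. */
    for (i = 0, k = 1; i < n; i++) {
        if (i && demo_cmp_by_value(&order[i - 1], &order[i]) != 0)
            k++;
        ranks[order[i]] = k;
    }

    free(order);
    return ranks;
}
```

/*
 * Storing the rank at ranks[order[i]] keeps the output addressable by the
 * original file position, so a later sort only needs to compare ranks.
 */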


/***********************************************************************
* Pre sort all the properties
*
*
*
************************************************************************/


void    sortFileProperties(SWISH * sw, IndexFILE * indexf)
{
    int             i;
    int             *out_array = NULL;     /* sort rank for each file, from CreatePropSortArray */
#ifndef USE_PRESORT_ARRAY
    unsigned char   *out_buffer  = NULL;
    unsigned char   *cur;
#endif
    struct metaEntry *m;
    int             props_sorted = 0;
    int             total_files = indexf->header.totalfiles;
    FileRec         fi;
    INDEXDATAHEADER *header = &indexf->header;
    int             propIDX;

    memset( &fi, 0, sizeof( FileRec ) );

#ifdef USE_PRESORT_ARRAY
    DB_InitWriteSortedIndex(sw, indexf->DB ,header->property_count);
#else
    DB_InitWriteSortedIndex(sw, indexf->DB );
#endif

    /* Any properties to check? */
    if ( header->property_count <= 0 )
    {
        DB_EndWriteSortedIndex(sw, indexf->DB);
        return;
    }


    /* Execute for each property */
    for (propIDX = 0; propIDX < header->property_count; propIDX++)
    {
        /* convert the count to a propID (metaID) */
        int metaID = header->propIDX_to_metaID[propIDX];

        if ( !(m = getPropNameByID(&indexf->header, metaID )))
            progerr("Failed to lookup propIDX %d (metaID %d)", propIDX, metaID );


        /* Check if this property must be in a presorted index */
        if (!is_presorted_prop(sw, m->metaName))
            continue;


        /* "internal" properties are sorted at runtime */
        if (is_meta_internal(m))
            continue;



        if (sw->verbose)
        {
#ifdef DEBUGSORT
            printf("\n-------------------\nSorting property: %s\n", m->metaName);
#else
            printf("Sorting property: %-40.40s\r", m->metaName);
#endif
            fflush(stdout);
        }

        out_array = CreatePropSortArray( indexf, m, &fi, 0 );


#ifdef USE_PRESORT_ARRAY
        DB_WriteSortedIndex(sw, metaID, out_array, total_files, indexf->DB);

        for (i = 0; i < total_files; i++)
            if ( PropLookup[i].SortProp )
                freeProperty( PropLookup[i].SortProp );
#else
        out_buffer = emalloc( total_files * MAXINTCOMPSIZE );


        /* Now compress */
        /* $$$ this should be in db_native.c */
        cur = out_buffer;

        for (i = 0; i < total_files; i++)
        {
            cur = compress3( out_array[i], cur );

            /* Free the property */
            if ( PropLookup[i].SortProp )
                freeProperty( PropLookup[i].SortProp );
        }


        DB_WriteSortedIndex(sw, metaID, out_buffer, cur - out_buffer, indexf->DB);

        efree( out_buffer );

#endif
        efree( out_array );

        props_sorted++;
    }

    DB_EndWriteSortedIndex(sw, indexf->DB);



    if ( props_sorted )
    {
        for (i = 0; i < total_files; i++)
            if ( PropLookup[i].prop_index )
                efree( PropLookup[i].prop_index );
        efree( PropLookup );
        PropLookup = NULL;
    }

    if (sw->verbose)
    {
        if ( !props_sorted )
            printf("No properties sorted.      %-40s\n", " ");
        else if ( props_sorted == 1 )
            printf("One property sorted.       %-40s\n", " ");
        else
            printf("%d properties sorted.      %-40s\n", props_sorted, " ");
    }
}


/* Routines to get the proper sortorder of chars to be called when sorting */
/* sw_strcasecmp sw_strcmp */

/*** $$$!!!$$$ I do not think any of these routines are used any more
 *  Better to use locale settings, IMO. - moseley 2/2005
 */


/* Exceptions to the standard translation table for sorting strings */
/* See initStrCaseCmpTranslationTable to see how it works */
/* The table shows the equivalences in the following way: */
/*     val(from) = val(order) + offset */
/* where val is asciivalue * 256 */

/* Some comments about äöü ...
** In French and Spanish these chars are equivalent to
** ä -> a   (French)
** ö -> o   (French)
** ü -> u   (French + Spanish)
** On the other hand, in German:
** ä -> a + 1  (German)
** ö -> o + 1  (German)
** ü -> u + 1  (German)
** The German order is the default here. I think that in Spanish we can live with that.
** If you cannot, modify them (change the offset from 1 to 0).
** Any comments about other languages are always welcome
*/
struct
{
    unsigned char from;
    unsigned char order;
    int     offset;
}
iTranslationTableExceptions[] =
{
    {'Ä', 'A', 1},                           /* >>> german sort order of umlauts */
    {'Ö', 'O', 1},                           /*     2001-05-04 rasc */
    {'Ü', 'U', 1},
    {'ä', 'a', 1},
    {'ö', 'o', 1},
    {'ü', 'u', 1},
    {'ß', 's', 1},                           /* <<< german */

    {'á', 'a', 0},                           /* >>> spanish sort order exceptions */
    {'Á', 'A', 0},                           /*     2001-05-04 jmruiz */
    {'é', 'e', 0},
    {'É', 'E', 0},
    {'í', 'i', 0},
    {'Í', 'I', 0},
    {'ó', 'o', 0},
    {'Ó', 'O', 0},
    {'ú', 'u', 0},
    {'Ú', 'U', 0},
    {'ñ', 'n', 1},
    {'Ñ', 'N', 1},                           /* <<< spanish */

    {'â', 'a', 0},                           /* >>> french sort order exceptions */
    {'Â', 'A', 0},                           /*     2001-05-04 jmruiz */
    {'à', 'a', 0},                           /*     Taken from the list - Please check */
    {'À', 'A', 0},                           /*     áéíóúÁÉÍÓÚ added in the spanish part */
    {'ç', 'c', 0},                           /*     äöüÄÖÜ added in the german part */
    {'Ç', 'C', 0},
    {'è', 'e', 0},
    {'È', 'E', 0},
    {'ê', 'e', 0},
    {'Ê', 'E', 0},
    {'î', 'i', 0},
    {'Î', 'I', 0},
    {'ï', 'i', 0},
    {'Ï', 'I', 0},
    {'ô', 'o', 0},
    {'Ô', 'O', 0},
    {'ù', 'u', 0},
    {'Ù', 'U', 0},                           /* <<< french */
    {0, 0, 0}
};

/* Initialization routine for the comparison table (ignoring case) */
/* This routine should be called once at the start of the module */
void    initStrCaseCmpTranslationTable(int *iCaseTranslationTable)
{
    int     i;

    /* Build default table using tolower(asciival) * 256 */
    /* Multiplying by 256 leaves gaps between values so that other chars
       can be placed in between, eg: ñ is between n and o */
    for (i = 0; i < 256; i++)
        iCaseTranslationTable[i] = tolower(i) * 256;

    /* Exceptions */
    for (i = 0; iTranslationTableExceptions[i].from; i++)
        iCaseTranslationTable[iTranslationTableExceptions[i].from] =
            tolower(iTranslationTableExceptions[i].order) * 256 + iTranslationTableExceptions[i].offset;
}

/* Initialization routine for the comparison table (case sensitive) */
/* This routine should be called once at the start of the module */
void    initStrCmpTranslationTable(int *iCaseTranslationTable)
{
    int     i;

    /* Build default table using asciival * 256 */
    /* Multiplying by 256 leaves gaps between values so that other chars
       can be placed in between, eg: ñ is between n and o */
    for (i = 0; i < 256; i++)
        iCaseTranslationTable[i] = i * 256;

    /* Exceptions */
    for (i = 0; iTranslationTableExceptions[i].from; i++)
        iCaseTranslationTable[iTranslationTableExceptions[i].from] =
            iTranslationTableExceptions[i].order * 256 + iTranslationTableExceptions[i].offset;
}

/* String comparison routine.
** Similar to strcasecmp but using our own translation table
** (pass the table built by initStrCaseCmpTranslationTable)
*/
int     sw_strcasecmp(unsigned char *s1, unsigned char *s2, int *iTranslationTable)
{
    while (iTranslationTable[*s1] == iTranslationTable[*s2])
        if (!*s1++)
            return 0;
        else
            s2++;
    return iTranslationTable[*s1] - iTranslationTable[*s2];
}

/* String comparison routine.
** Similar to strcmp but using our own translation table
** (pass the table built by initStrCmpTranslationTable)
*/
int     sw_strcmp(unsigned char *s1, unsigned char *s2, int *iTranslationTable)
{
    while (iTranslationTable[*s1] == iTranslationTable[*s2])
        if (!*s1++)
            return 0;
        else
            s2++;
    return iTranslationTable[*s1] - iTranslationTable[*s2];
}
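
/*
 * Illustrative sketch, not part of the original Swish-e source.  It shows
 * why the table values are multiplied by 256: the gap after each base
 * character leaves room to slot an exception in between two neighbours.
 * All demo_* names are hypothetical, and the single exception entry
 * (Latin-1 0xF1, n-tilde, placed one step after 'n') is an assumed example;
 * the real iTranslationTableExceptions table is defined elsewhere.
 */
#if 0   /* example only, excluded from the build */
```c
#include <ctype.h>

/* Hypothetical exception entry; field names mirror those used above. */
struct demo_exception {
    int from;    /* character code being remapped */
    int order;   /* base character it should sort next to */
    int offset;  /* position inside the 256-wide gap after `order` */
};

static const struct demo_exception demo_exceptions[] = {
    { 0xF1, 'n', 1 },  /* assumed: Latin-1 n-tilde sorts after 'n', before 'o' */
    { 0, 0, 0 }        /* terminator: from == 0 ends the list */
};

/* Build a case-insensitive table the same way as the routine above. */
void demo_init_table(int table[256])
{
    int i;

    for (i = 0; i < 256; i++)
        table[i] = tolower(i) * 256;

    for (i = 0; demo_exceptions[i].from; i++)
        table[demo_exceptions[i].from] =
            tolower(demo_exceptions[i].order) * 256 + demo_exceptions[i].offset;
}

/* Same loop shape as sw_strcasecmp, with the result reduced to -1, 0 or 1. */
int demo_compare(const unsigned char *s1, const unsigned char *s2, const int *table)
{
    while (table[*s1] == table[*s2]) {
        if (!*s1)
            return 0;
        s1++;
        s2++;
    }
    return table[*s1] < table[*s2] ? -1 : 1;
}
```
#endif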
/*
** $Id: headers.c 1736 2005-05-12 15:41:22Z karman $
**

    This file is part of Swish-e.

    Swish-e is free software; you can redistribute it and/or modify
    it under the terms of the GNU General Public License as published by
    the Free Software Foundation; either version 2 of the License, or
    (at your option) any later version.

    Swish-e is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
    GNU General Public License for more details.

    You should have received a copy of the GNU General Public License
    along  with Swish-e; if not, write to the Free Software
    Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
    
    See the COPYING file that accompanies the Swish-e distribution for details
    of the GNU GPL and the special exception available for linking against
    the Swish-e library.
    
** Mon May  9 15:51:39 CDT 2005
** added GPL

*/

#include "swish.h"
#include "swstring.h"
#include "mem.h"
#include "error.h"
#include "list.h"
#include "search.h"
#include "headers.h"
#include "stemmer.h"

#include <stddef.h>  /* for offsetof macro */
#ifndef offsetof
#define offsetof(TYPE, MEMBER) ((size_t) &((TYPE *)0)->MEMBER)
#endif




/***************************************************************************
*  Routines for accessing index header data by index name and header name
*
****************************************************************************/




typedef struct
{
    const char         *description;
    SWISH_HEADER_TYPE   data_type;
    int                 min_verbose_level;
    size_t              offset;
} HEADER_MAP;



static HEADER_MAP header_map[] = {
    {  "Name",              SWISH_STRING, 2,  offsetof( INDEXDATAHEADER, indexn ) },
    {  "Saved as",          SWISH_STRING, 2,  offsetof( INDEXDATAHEADER, savedasheader ) },
    /* $$$ "Total Words" counts unique words and is not corrected for removed words */
    {  "Total Words",       SWISH_NUMBER, 2,  offsetof( INDEXDATAHEADER, totalwords ) },
    {  "Total Files",       SWISH_NUMBER, 2,  offsetof( INDEXDATAHEADER, totalfiles ) },
    {  "Removed Files",     SWISH_NUMBER, 2,  offsetof( INDEXDATAHEADER, removedfiles ) },
    {  "Total Word Pos",    SWISH_NUMBER, 2,  offsetof( INDEXDATAHEADER, total_word_positions ) },
    {  "Removed Word Pos",  SWISH_NUMBER, 2,  offsetof( INDEXDATAHEADER, removed_word_positions ) },
    {  "Indexed on",        SWISH_STRING, 2,  offsetof( INDEXDATAHEADER, indexedon ) },
    {  "Description",       SWISH_STRING, 2,  offsetof( INDEXDATAHEADER, indexd ) },
    {  "Pointer",           SWISH_STRING, 2,  offsetof( INDEXDATAHEADER, indexp ) },
    {  "Maintained by",     SWISH_STRING, 2,  offsetof( INDEXDATAHEADER, indexa ) },
    {  "MinWordLimit",      SWISH_NUMBER, 2,  offsetof( INDEXDATAHEADER, minwordlimit ) },
    {  "MaxWordLimit",      SWISH_NUMBER, 2,  offsetof( INDEXDATAHEADER, maxwordlimit ) },
    {  "WordCharacters",    SWISH_STRING, 2,  offsetof( INDEXDATAHEADER, wordchars ) },
    {  "BeginCharacters",   SWISH_STRING, 2,  offsetof( INDEXDATAHEADER, beginchars ) },
    {  "EndCharacters",     SWISH_STRING, 2,  offsetof( INDEXDATAHEADER, endchars ) },
    {  "IgnoreFirstChar",   SWISH_STRING, 2,  offsetof( INDEXDATAHEADER, ignorefirstchar ) },
    {  "IgnoreLastChar",    SWISH_STRING, 2,  offsetof( INDEXDATAHEADER, ignorelastchar ) },
    {  "StopWords",         SWISH_WORD_HASH, 3,  offsetof( INDEXDATAHEADER, hashstoplist ) },
    {  "BuzzWords",         SWISH_WORD_HASH, 3,  offsetof( INDEXDATAHEADER, hashbuzzwordlist ) },
    {  "Stemming Applied",  SWISH_OTHER_DATA, 2, offsetof( INDEXDATAHEADER, fuzzy_data ) },
    {  "Soundex Applied",   SWISH_OTHER_DATA, 2, offsetof( INDEXDATAHEADER, fuzzy_data ) },
    {  "Fuzzy Mode",        SWISH_OTHER_DATA, 2, offsetof( INDEXDATAHEADER, fuzzy_data ) },
    {  "IgnoreTotalWordCountWhenRanking", SWISH_BOOL, 2, offsetof( INDEXDATAHEADER, ignoreTotalWordCountWhenRanking ) }
};




static SWISH_HEADER_VALUE fetch_header( IndexFILE *indexf, const char *name,  SWISH_HEADER_TYPE *data_type  );
static const char **create_string_list( SWISH *sw, struct swline *swline );
static const char **string_list_from_hash( SWISH *sw, WORD_HASH_TABLE table );

static DB_RESULTS *db_results_by_name( RESULTS_OBJECT *results, const char *index_name );

static SWISH_HEADER_VALUE fetch_single_header( IndexFILE *indexf, HEADER_MAP *header_map, SWISH_HEADER_TYPE *data_type );
static void print_header_value( SWISH *sw, const char *name, SWISH_HEADER_VALUE head_value, SWISH_HEADER_TYPE head_type );




/*********** PUBLIC *************************/


/***********************************************************************
*  print_index_headers - prints all the headers for a given indexf
*
*  Called by swish-e binary (swish.c)
*
************************************************************************/

void print_index_headers( IndexFILE *indexf )
{
    int i;
    int array_size = sizeof(header_map) / sizeof(header_map[0]);
    SWISH_HEADER_VALUE value;
    SWISH_HEADER_TYPE data_type;
    int verbose_level = indexf->sw->headerOutVerbose;

   
    for (i = 0; i < array_size; i++)
    {
        if ( header_map[i].min_verbose_level > verbose_level )
            continue;

        value = fetch_single_header( indexf, &header_map[i], &data_type );
        print_header_value( indexf->sw, header_map[i].description, value, data_type );
    }
}




/*********************************************************************
* SwishHeaderNames -- return a list of possible header names
*
**********************************************************************/

const char **SwishHeaderNames(SWISH *sw)
{
    int array_size = sizeof(header_map) / sizeof(header_map[0]);
    int i;

    if ( !sw )
        progerr("SwishHeaderNames requires a valid swish handle");
        

    if ( sw->header_names )
        return sw->header_names;

    sw->header_names = (const char **)emalloc( sizeof(char *) * ( 1 + array_size ) );        

   
    for (i = 0; i < array_size; i++)
        sw->header_names[i] = (const char *)header_map[i].description;

    sw->header_names[i] = NULL;

    return sw->header_names;
}
        
        
/*********************************************************************
* SwishIndexNames -- return a list of associated index file names
*
**********************************************************************/

    
const char **SwishIndexNames(SWISH *sw)
{
    IndexFILE          *indexf;
    int                 index_count;

    if ( !sw )
        progerr("SwishIndexNames requires a valid swish handle");
    

    if ( sw->index_names )
        return sw->index_names;


    for ( index_count = 0, indexf = sw->indexlist; indexf; indexf = indexf->next )
        index_count++;
        
    if ( !index_count ) /* should not happen */
        progerr("Swish Handle does not have any associated index files!?!?");

    sw->index_names = (const char **)emalloc( sizeof(char *) * (1+index_count) );

    for ( index_count = 0, indexf = sw->indexlist; indexf; indexf = indexf->next )
        sw->index_names[index_count++] = (const char *)indexf->line;

    sw->index_names[index_count] = NULL;
    return sw->index_names;
}


/*********************************************************************
* SwishResultIndexValue -- lookup a header via a result structure
*
**********************************************************************/

SWISH_HEADER_VALUE SwishResultIndexValue( RESULT *result, const char *name, SWISH_HEADER_TYPE *data_type )
{
    return fetch_header( result->db_results->indexf, name, data_type );
}




/********************************************************************************
* SwishHeaderValue -- look up a header via an index file name and a header name
*
*********************************************************************************/

SWISH_HEADER_VALUE SwishHeaderValue( SWISH *sw, const char *index_name, const  char *cur_header, SWISH_HEADER_TYPE *data_type )
{
    IndexFILE          *indexf;
    SWISH_HEADER_VALUE  value;

    value.string = NULL;

    if ( !sw )
        progerr("SwishHeaderValue requires a valid swish handle");

    indexf = indexf_by_name( sw, index_name );

    if ( indexf )
        return fetch_header( indexf, cur_header, data_type );

    *data_type = SWISH_HEADER_ERROR;
    set_progerr( HEADER_READ_ERROR, sw, "Index file '%s' is not an active index file", index_name );
    return value;
}






/************* Local support function **********************************/

                  



/****************************************************************************
* fetch_single_header -- returns a SWISH_HEADER_VALUE
*
*   Pass:
*       indexf, HEADER_MAP element, and a *data_type
*
*   Return:
*       SWISH_HEADER_VALUE
*   
*
*****************************************************************************/

static SWISH_HEADER_VALUE fetch_single_header( IndexFILE *indexf, HEADER_MAP *header_map, SWISH_HEADER_TYPE *data_type )
{
    SWISH_HEADER_VALUE  value;
    INDEXDATAHEADER     *header = &indexf->header;
    char                *data_pointer = (char *)header + header_map->offset; /* should that be void* as a generic address? */

    value.string = NULL;

    *data_type = header_map->data_type;

    switch ( header_map->data_type )
    {
        case SWISH_STRING:
            value.string = *(const char **) data_pointer;
            return value;

        case SWISH_NUMBER:
        case SWISH_BOOL:
            value.number = *(unsigned long *) data_pointer;

            /* $$$ Ugly hack alert! */
            /* correct for removed files */
            if ( (void *)data_pointer == &header->totalfiles )
                value.number -= header->removedfiles;

            if ( (void *)data_pointer == &header->total_word_positions )
                value.number -= header->removed_word_positions;

            return value;

        case SWISH_LIST:
        {
            struct swline *first_item = *(struct swline **) data_pointer;
            value.string_list = create_string_list( indexf->sw, first_item );
            return value;
        }

        case SWISH_WORD_HASH:
        {
            WORD_HASH_TABLE table = *(WORD_HASH_TABLE *) data_pointer;
            *data_type = SWISH_LIST;
            value.string_list = string_list_from_hash( indexf->sw, table ); 
            return value;
        }




        case SWISH_OTHER_DATA:
            if ( strcasecmp( "Fuzzy Mode", header_map->description ) == 0 )
            {
                value.string = fuzzy_string( header->fuzzy_data );
                *data_type = SWISH_STRING;
                return value;

            }

            else if ( strcasecmp( "Stemming Applied", header_map->description ) == 0 )
            {
                value.number = stemmer_applied( header->fuzzy_data );

                *data_type = SWISH_BOOL;
                return value;
            }

            else if ( strcasecmp( "Soundex Applied", header_map->description ) == 0 )
            {
                value.number = FUZZY_SOUNDEX == fuzzy_mode_value( header->fuzzy_data ) ? 1 : 0;
                *data_type = SWISH_BOOL;
                return value;
            }
            else
                progerr("Invalid OTHER header '%s'", header_map->description );


        default:
            progerr("Invalid HEADER type '%d'", header_map->data_type );
    }

    return value;  /* make MS compiler happy */
}

/************************************************************
* fetch_header - fetches a header by name
*
*************************************************************/
    

static SWISH_HEADER_VALUE fetch_header( IndexFILE *indexf, const char *name,  SWISH_HEADER_TYPE *data_type  )
{
    int i;
    int array_size = sizeof(header_map) / sizeof(header_map[0]);
    SWISH_HEADER_VALUE value;

    value.string = NULL;
   
    for (i = 0; i < array_size; i++)
    {
        if ( strcasecmp(header_map[i].description, name) != 0 )
            continue;  /* nope */

        return fetch_single_header( indexf, &header_map[i], data_type );

    }

    *data_type = SWISH_HEADER_ERROR;
    set_progerr( HEADER_READ_ERROR, indexf->sw, "Index file '%s' does not have header '%s'", indexf->line, name );
    return value;
}
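
/*
 * Illustrative sketch, not part of the original Swish-e source.  It reduces
 * the header_map technique above to two fields: a static table pairs a
 * printable name with a type tag and an offsetof() value, and a lookup walks
 * the table and reads the field through the stored offset.  The demo_* types
 * and names are hypothetical stand-ins for INDEXDATAHEADER, HEADER_MAP and
 * SWISH_HEADER_VALUE.  The sketch compares names with strcmp to stay within
 * standard C, whereas the code above uses strcasecmp.
 */
#if 0   /* example only, excluded from the build */
```c
#include <stddef.h>   /* offsetof, size_t */
#include <string.h>   /* strcmp */

/* Hypothetical record standing in for INDEXDATAHEADER. */
typedef struct {
    const char   *name;
    unsigned long total_files;
} demo_header;

typedef enum { DEMO_STRING, DEMO_NUMBER, DEMO_ERROR } demo_type;

typedef union {
    const char   *string;
    unsigned long number;
} demo_value;

typedef struct {
    const char *description;  /* name used for lookups */
    demo_type   type;         /* how to interpret the field */
    size_t      offset;       /* byte offset of the field in demo_header */
} demo_map_entry;

static const demo_map_entry demo_map[] = {
    { "Name",        DEMO_STRING, offsetof(demo_header, name) },
    { "Total Files", DEMO_NUMBER, offsetof(demo_header, total_files) }
};

/* Look a field up by name; *type is DEMO_ERROR if the name is unknown. */
demo_value demo_fetch(const demo_header *h, const char *name, demo_type *type)
{
    size_t     i;
    demo_value value;

    value.string = NULL;

    for (i = 0; i < sizeof(demo_map) / sizeof(demo_map[0]); i++) {
        const char *field;

        if (strcmp(demo_map[i].description, name) != 0)
            continue;

        /* byte arithmetic on the record, then reinterpret by type tag */
        field = (const char *)h + demo_map[i].offset;
        *type = demo_map[i].type;

        if (demo_map[i].type == DEMO_STRING)
            value.string = *(const char *const *)field;
        else
            value.number = *(const unsigned long *)field;
        return value;
    }

    *type = DEMO_ERROR;
    return value;
}
```
#endif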







SWISH_HEADER_VALUE SwishParsedWords( RESULTS_OBJECT *results, const char *index_name )
{
    SWISH_HEADER_VALUE value;
    DB_RESULTS *db_results;

    if ( !results )
        progerr("Must pass a results object to SwishParsedWords");

    value.string_list = NULL;        

    db_results = db_results_by_name( results, index_name );

    if ( db_results )
        value.string_list = create_string_list( results->sw, db_results->parsed_words );

    return value;
}

SWISH_HEADER_VALUE SwishRemovedStopwords( RESULTS_OBJECT *results, const char *index_name )
{
    SWISH_HEADER_VALUE value;
    DB_RESULTS *db_results;

    if ( !results )
        progerr("Must pass a results object to SwishRemovedStopwords");

    value.string_list = NULL;        

    db_results = db_results_by_name( results, index_name );

    if ( db_results )
        value.string_list = create_string_list( results->sw, db_results->removed_stopwords );
    return value;
}




/**********************************************************************
* create_string_list - creates a list of strings from a swline
* $$$ should this return NULL if there's none to return, or an empty list?
*
***********************************************************************/

static const char **create_string_list( SWISH *sw, struct swline *swline )
{
    int i;
    struct swline *cur_item;

    /* first count up how many items there are */

    i = 1;  /* always need one */
    for ( cur_item = swline; cur_item; cur_item = cur_item->next )
        i++;

    if ( i > sw->temp_string_buffer_len )
    {
        sw->temp_string_buffer_len = i;
        sw->temp_string_buffer = (const char **)erealloc( sw->temp_string_buffer, sizeof(char *) * i );
    }

    i = 0;
    for ( cur_item = swline; cur_item; cur_item = cur_item->next )
        sw->temp_string_buffer[i++] = (const char*)cur_item->line;

    sw->temp_string_buffer[i] = NULL;  /* end of list */

    return sw->temp_string_buffer;
}
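
/*
 * Illustrative sketch, not part of the original Swish-e source.  It shows
 * the pattern used by create_string_list: flatten a linked list into a
 * NULL-terminated array held in a buffer that only ever grows and is reused
 * on every call.  The consequence, which also applies to the code above, is
 * that a returned list is overwritten by the next call using the same
 * buffer.  The demo_* names are hypothetical, and plain realloc with a NULL
 * check stands in for erealloc.
 */
#if 0   /* example only, excluded from the build */
```c
#include <stdlib.h>

/* Hypothetical stand-in for struct swline. */
struct demo_line {
    struct demo_line *next;
    const char       *line;
};

/* Hypothetical stand-in for the temp_string_buffer fields of SWISH. */
struct demo_buffer {
    const char **items;
    int          capacity;   /* number of slots allocated */
};

/* Returns the shared buffer, or NULL if it could not be grown. */
const char **demo_string_list(struct demo_buffer *buf, const struct demo_line *head)
{
    const struct demo_line *cur;
    int needed = 1;   /* one slot for the terminating NULL */
    int i = 0;

    for (cur = head; cur; cur = cur->next)
        needed++;

    if (needed > buf->capacity) {
        const char **grown = realloc(buf->items, sizeof(*grown) * (size_t)needed);

        if (!grown)
            return NULL;
        buf->items = grown;
        buf->capacity = needed;
    }

    for (cur = head; cur; cur = cur->next)
        buf->items[i++] = cur->line;

    buf->items[i] = NULL;   /* end of list */
    return buf->items;
}
```
#endif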

/**********************************************************************
* string_list_from_hash - creates a list of strings from a word hash table
* $$$ should this return NULL if there's none to return, or an empty list?
*
***********************************************************************/


static const char **string_list_from_hash( SWISH *sw, WORD_HASH_TABLE table )
{
    int i;
    struct swline *sp, *next;
    int count;

    i = table.count + 1;  /* one extra slot for the terminating NULL */

    if ( i > sw->temp_string_buffer_len )
    {
        sw->temp_string_buffer_len = i;
        sw->temp_string_buffer = (const char **)erealloc( sw->temp_string_buffer, sizeof(char *) * i );
    }
     
    /* copy the word pointers out of each hash bucket */
    count = 0;

    if ( table.count )
    {
        for (i = 0; i < HASHSIZE; i++)
        {
            if ( !table.hash_array[i])
                continue;
            
            sp = table.hash_array[i];
            while (sp)
            {
                next = sp->next;
                sw->temp_string_buffer[count++] = sp->line;
                sp = next;
            }
        }
    }

    sw->temp_string_buffer[count] = NULL; /* end of list */

    return sw->temp_string_buffer;

}
    

    
    
/* no longer static since we want to use this in metanames.c and stemmer.c
there's probably a better way to organize this... karman Mon Nov  8 21:37:44 CST 2004
*/

IndexFILE *indexf_by_name( SWISH *sw, const char *index_name )
{
    IndexFILE *indexf = sw->indexlist;

    while ( indexf )
    {
        if (strcmp( index_name, indexf->line ) == 0 )
            return indexf;

        indexf = indexf->next;
    }
    return NULL;
}

static DB_RESULTS *db_results_by_name( RESULTS_OBJECT *results, const char *index_name )
{
    DB_RESULTS *db_results = results->db_results;

    while ( db_results )
    {
        if (strcmp( index_name, db_results->indexf->line ) == 0)
            return db_results;

        db_results = db_results->next;
    }
    return NULL;
}


/**************************************************************************
* print_header_value - prints a single header value according to its type
*
*   Note:
*       This is not used in the library code.  Perhaps move elsewhere
*
***************************************************************************/


static void print_header_value( SWISH *sw, const char *name, SWISH_HEADER_VALUE head_value, SWISH_HEADER_TYPE head_type )
{
    const char **string_list;
    
    printf("# %s:", name );

    switch ( head_type )
    {
        case SWISH_STRING:
            printf(" %s\n", head_value.string ? head_value.string : "" );
            return;

        case SWISH_NUMBER:
            printf(" %lu\n", head_value.number );
            return;

        case SWISH_BOOL:
            printf(" %s\n", head_value.boolean ? "1" : "0" );
            /* printf(" %s\n", head_value.boolean ? "Yes" : "No" ); */
            return;

        case SWISH_LIST:
            string_list = head_value.string_list;
            
            while ( *string_list )
            {
                printf(" %s", *string_list );
                string_list++;
            }
            printf("\n");
            return;

        case SWISH_HEADER_ERROR:
            SwishAbortLastError( sw );

        default:
            printf(" Unknown header type '%d'\n", (int)head_type );
            return;
    }
}


/* src/acconfig.h.in.  Generated from configure.in by autoheader.  */

/* Define to one of `_getb67', `GETB67', `getb67' for Cray-2 and Cray-YMP
   systems. This function is required for `alloca.c' support on those systems.
   */
#undef CRAY_STACKSEG_END

/* Define to 1 if using `alloca.c'. */
#undef C_ALLOCA

/* Define to the type of elements in the array set by `getgroups'. Usually
   this is either `int' or `gid_t'. */
#undef GETGROUPS_T

/* Define to 1 if you have the `access' function. */
#undef HAVE_ACCESS

/* Define to 1 if you have `alloca', as a function or macro. */
#undef HAVE_ALLOCA

/* Define to 1 if you have <alloca.h> and it should be used (not on Ultrix).
   */
#undef HAVE_ALLOCA_H

/* Get time of day */
#undef HAVE_BSDGETTIMEOFDAY

/* Define to 1 if you have the `clock' function. */
#undef HAVE_CLOCK

/* Define to 1 if you have the <dirent.h> header file, and it defines `DIR'.
   */
#undef HAVE_DIRENT_H

/* Define to 1 if you have the <dlfcn.h> header file. */
#undef HAVE_DLFCN_H

/* Define to 1 if you don't have `vprintf' but do have `_doprnt.' */
#undef HAVE_DOPRNT

/* Define to 1 if you have the `fork' function. */
#undef HAVE_FORK

/* Define to 1 if your system has a working `getgroups' function. */
#undef HAVE_GETGROUPS

/* Define to 1 if you have the `getrusage' function. */
#undef HAVE_GETRUSAGE

/* Define to 1 if you have the <inttypes.h> header file. */
#undef HAVE_INTTYPES_H

/* Define to 1 if you have the `kill' function. */
#undef HAVE_KILL

/* Define to 1 if you have the `m' library (-lm). */
#undef HAVE_LIBM

/* Define to 1 if you have the `snprintf' library (-lsnprintf). */
#undef HAVE_LIBSNPRINTF

/* Libxml2 support included */
#undef HAVE_LIBXML2

/* Define to 1 if you have the `lstat' function. */
#undef HAVE_LSTAT

/* Define to 1 if you have the `memcpy' function. */
#undef HAVE_MEMCPY

/* Define to 1 if you have the <memory.h> header file. */
#undef HAVE_MEMORY_H

/* Define to 1 if you have the `mkstemp' function. */
#undef HAVE_MKSTEMP

/* Define to 1 if you have the <ndir.h> header file, and it defines `DIR'. */
#undef HAVE_NDIR_H

/* Perl REGEX library */
#undef HAVE_PCRE

/* Define to 1 if you have the `regcomp' function. */
#undef HAVE_REGCOMP

/* Define to 1 if you have the `re_comp' function. */
#undef HAVE_RE_COMP

/* Define to 1 if you have the <stdint.h> header file. */
#undef HAVE_STDINT_H

/* Define to 1 if you have the <stdlib.h> header file. */
#undef HAVE_STDLIB_H

/* Define to 1 if you have the `strchr' function. */
#undef HAVE_STRCHR

/* Define to 1 if you have the `strcoll' function and it is properly defined.
   */
#undef HAVE_STRCOLL

/* Define to 1 if you have the `strdup' function. */
#undef HAVE_STRDUP

/* Define to 1 if you have the `strftime' function. */
#undef HAVE_STRFTIME

/* Define to 1 if you have the <strings.h> header file. */
#undef HAVE_STRINGS_H

/* Define to 1 if you have the <string.h> header file. */
#undef HAVE_STRING_H

/* Define to 1 if you have the `strstr' function. */
#undef HAVE_STRSTR

/* Define to 1 if you have the <sys/dir.h> header file, and it defines `DIR'.
   */
#undef HAVE_SYS_DIR_H

/* Define to 1 if you have the <sys/ndir.h> header file, and it defines `DIR'.
   */
#undef HAVE_SYS_NDIR_H

/* Define to 1 if you have the <sys/param.h> header file. */
#undef HAVE_SYS_PARAM_H

/* Define to 1 if you have the <sys/resource.h> header file. */
#undef HAVE_SYS_RESOURCE_H

/* Define to 1 if you have the <sys/stat.h> header file. */
#undef HAVE_SYS_STAT_H

/* Define to 1 if you have the <sys/timeb.h> header file. */
#undef HAVE_SYS_TIMEB_H

/* Define to 1 if you have the <sys/types.h> header file. */
#undef HAVE_SYS_TYPES_H

/* Define to 1 if you have <sys/wait.h> that is POSIX.1 compatible. */
#undef HAVE_SYS_WAIT_H

/* Define to 1 if you have the `times' function. */
#undef HAVE_TIMES

/* Define to 1 if you have the <unistd.h> header file. */
#undef HAVE_UNISTD_H

/* Define to 1 if you have the `vfork' function. */
#undef HAVE_VFORK

/* Define to 1 if you have the <vfork.h> header file. */
#undef HAVE_VFORK_H

/* Define to 1 if you have the `vprintf' function. */
#undef HAVE_VPRINTF

/* Define to 1 if you have the `vsnprintf' function. */
#undef HAVE_VSNPRINTF

/* Define to 1 if you have the `waitpid' function. */
#undef HAVE_WAITPID

/* Define to 1 if you have the <windows.h> header file. */
#undef HAVE_WINDOWS_H

/* Define to 1 if `fork' works. */
#undef HAVE_WORKING_FORK

/* Define to 1 if `vfork' works. */
#undef HAVE_WORKING_VFORK

/* Do we have zlib */
#undef HAVE_ZLIB

/* Define to 1 if you have the <zlib.h> header file. */
#undef HAVE_ZLIB_H

/* (developers only) checks for memory consistency on alloc/free using guards
   */
#undef MEM_DEBUG

/* (developers only) gives memory statistics (bytes allocated, calls, etc) */
#undef MEM_STATISTICS

/* (developers only) checks for unfreed memory, and where it is allocated */
#undef MEM_TRACE

/* Get time of day */
#undef NO_GETTOD

/* Name of package */
#undef PACKAGE

/* Define to the address where bug reports for this package should be sent. */
#undef PACKAGE_BUGREPORT

/* Define to the full name of this package. */
#undef PACKAGE_NAME

/* Define to the full name and version of this package. */
#undef PACKAGE_STRING

/* Define to the one symbol short name of this package. */
#undef PACKAGE_TARNAME

/* Define to the version of this package. */
#undef PACKAGE_VERSION

/* If using the C implementation of alloca, define if you know the
   direction of stack growth for your system; otherwise it will be
   automatically deduced at run-time.
	STACK_DIRECTION > 0 => grows toward higher addresses
	STACK_DIRECTION < 0 => grows toward lower addresses
	STACK_DIRECTION = 0 => direction of growth unknown */
#undef STACK_DIRECTION

/* Define to 1 if the `S_IS*' macros in <sys/stat.h> do not work properly. */
#undef STAT_MACROS_BROKEN

/* Define to 1 if you have the ANSI C header files. */
#undef STDC_HEADERS

/* Define to 1 if your <sys/time.h> declares `struct tm'. */
#undef TM_IN_SYS_TIME

/* Experimental BTREE support */
#undef USE_BTREE

/* Experimental BTREE PRESORT ARRAYS */
#undef USE_PRESORT_ARRAY

/* Version number of package */
#undef VERSION

/* Number of bits in a file offset, on hosts where this is settable. */
#undef _FILE_OFFSET_BITS

/* Define for large files, on AIX-style hosts. */
#undef _LARGE_FILES

/* Define to empty if `const' does not conform to ANSI C. */
#undef const

/* Define to `int' if <sys/types.h> doesn't define. */
#undef gid_t

/* Define to `int' if <sys/types.h> does not define. */
#undef pid_t

/* Define to `unsigned' if <sys/types.h> does not define. */
#undef size_t

/* Define to `int' if <sys/types.h> doesn't define. */
#undef uid_t

/* Define as `fork' if `vfork' does not work. */
#undef vfork
/*
$Id: index.h 1736 2005-05-12 15:41:22Z karman $
**


    This file is part of Swish-e.

    Swish-e is free software; you can redistribute it and/or modify
    it under the terms of the GNU General Public License as published by
    the Free Software Foundation; either version 2 of the License, or
    (at your option) any later version.

    Swish-e is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
    GNU General Public License for more details.

    You should have received a copy of the GNU General Public License
    along  with Swish-e; if not, write to the Free Software
    Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
    
    See the COPYING file that accompanies the Swish-e distribution for details
    of the GNU GPL and the special exception available for linking against
    the Swish-e library.
    
** Mon May  9 18:19:34 CDT 2005
** added GPL


*/

#ifndef __HasSeenModule_Index
#define __HasSeenModule_Index       1

struct dev_ino
{
    dev_t   dev;
    ino_t   ino;
    struct dev_ino *next;
};

struct IgnoreLimitPositions
{
    int     n;                  /* Number of entries per file */
    int    *pos;                /* Store metaID1,position1, metaID2,position2 ..... */
};

/* This is used to build a list of the metaIDs that are currently in scope when indexing words */

typedef struct
{
    int    *array;              /* list of metaIDs that need to be indexed */
    int     max;                /* max size of table */
    int     num;                /* number in list */
    int     defaultID;          /* default metaID (should always be one, I suppose) */
}
METAIDTABLE;


/*
   -- module data
*/


struct MOD_Index
{
    /* entry vars */
    METAIDTABLE metaIDtable;
    ENTRYARRAY *entryArray;
    ENTRY  *hashentries[VERYBIGHASHSIZE];
    char    hashentriesdirty[VERYBIGHASHSIZE]; /* just a 0/1 flag */

    /* Work buffer used while compressing locations during the indexing process */
    unsigned char *compression_buffer;
    int     len_compression_buffer;

    unsigned char *worddata_buffer;  /* Buffer to store worddata */
    int    len_worddata_buffer;     /* Max size of the buffer */
    int    sz_worddata_buffer;      /* Space being used in worddata_buffer */

    /* File counter */
    int     filenum;

    /* index tmp (both FS and HTTP methods) */
    char   *tmpdir;

    /* Filenames of the swap files */
    char   *swap_location_name[MAX_LOC_SWAP_FILES]; /* Location info file */

    /* handlers for both files */
    FILE   *fp_loc_write[MAX_LOC_SWAP_FILES];       /* Location (writing) */
    FILE   *fp_loc_read[MAX_LOC_SWAP_FILES];        /* Location (reading) */

    struct dev_ino *inode_hash[BIGHASHSIZE];

    /* Buffers used by indexstring */
    int     lenswishword;
    char   *swishword;
    int     lenword;
    char   *word;

    /* Economic mode (-e) */
    int     swap_locdata;       /* swap location data */

    /* Pointer to swap functions */
    sw_off_t    (*swap_tell) (FILE *);
    size_t  (*swap_write) (const void *, size_t, size_t, FILE *);
    int     (*swap_seek) (FILE *, sw_off_t, int);
    size_t  (*swap_read) (void *, size_t, size_t, FILE *);
    int     (*swap_close) (FILE *);
    int     (*swap_putc) (int, FILE *);
    int     (*swap_getc) (FILE *);

    /* IgnoreLimit option values */
    int     plimit;
    int     flimit;
    /* Number of words from IgnoreLimit */
    int     nIgnoreLimitWords;
    struct swline *IgnoreLimitWords;

    /* Positions from stopwords from IgnoreLimit */
    struct IgnoreLimitPositions **IgnoreLimitPositionsArray;

    /* Index in blocks of chunk_size files */
    int     chunk_size;

    /* Controls the size of the zone used to store locations during chunk processing */
    int     optimalChunkLocZoneSize;

    /* variable to handle free memory space for locations inside currentChunkLocZone */

    LOCATION *freeLocMemChain;

    MEM_ZONE *perDocTmpZone;
    MEM_ZONE *currentChunkLocZone;
    MEM_ZONE *totalLocZone;
    MEM_ZONE *entryZone;

    int     update_mode;    /* Set to 1 when in update mode */
                            /* Set to 2 when in remove mode */
};

void    initModule_Index(SWISH *);
void    freeModule_Index(SWISH *);
int     configModule_Index(SWISH *, StringList *);


void    do_index_file(SWISH * sw, FileProp * fprop);

ENTRY  *getentry(SWISH * , char *);
void    addentry(SWISH *, ENTRY *, int, int, int, int);

void    addCommonProperties(SWISH * sw, FileProp * fprop, FileRec * fi, char *title, char *summary, int start);


int     getfilecount(IndexFILE *);

int     getNumberOfIgnoreLimitWords(SWISH *);
void    getPositionsFromIgnoreLimitWords(SWISH * sw);

char   *ruleparse(SWISH *, char *);

#define isIgnoreFirstChar(header,c) (header)->ignorefirstcharlookuptable[(int)((unsigned char)(c))]
#define isIgnoreLastChar(header,c) (header)->ignorelastcharlookuptable[(int)((unsigned char)(c))]
#define isBumpPositionCounterChar(header,c) (header)->bumpposcharslookuptable[(int)((unsigned char)(c))]
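
/*
 * Illustrative sketch, not part of the original Swish-e source.  It shows
 * the lookup-table pattern behind the three macros above: a 256-entry int
 * array is indexed by character code, and the character is converted to
 * unsigned char first because plain char may be signed, in which case a
 * byte of 0x80 or above would otherwise produce a negative index.  The
 * demo_* names and the table contents are hypothetical.
 */
#if 0   /* example only, excluded from the build */
```c
#include <string.h>

/* Hypothetical header holding a single lookup table. */
struct demo_chars {
    int ignorefirst[256];   /* non-zero: strip this character from word starts */
};

#define DEMO_IS_IGNORE_FIRST(h, c) ((h)->ignorefirst[(unsigned char)(c)])

/* Flag every character of `chars` in the table, clearing it first. */
void demo_set_ignore_first(struct demo_chars *h, const char *chars)
{
    memset(h->ignorefirst, 0, sizeof(h->ignorefirst));
    for (; *chars; chars++)
        h->ignorefirst[(unsigned char)*chars] = 1;
}

/* Return a pointer past any leading characters flagged in the table. */
const char *demo_strip_first(const struct demo_chars *h, const char *word)
{
    while (*word && DEMO_IS_IGNORE_FIRST(h, *word))
        word++;
    return word;
}
```
#endif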


void    computehashentry(ENTRY **, ENTRY *);

void    sort_words(SWISH *);

int     indexstring(SWISH * sw, char *s, int filenum, int structure, int numMetaNames, int *metaID, int *position);

void    addsummarytofile(IndexFILE *, int, char *);

void    BuildSortedArrayOfWords(SWISH *, IndexFILE *);



void    PrintHeaderLookupTable(int ID, int table[], int table_size, FILE * fp);
void    coalesce_all_word_locations(SWISH * sw, IndexFILE * indexf);
void    coalesce_word_locations(SWISH * sw, ENTRY * e);

void    adjustWordPositions(unsigned char *worddata, int *sz_worddata, int n_files, struct IgnoreLimitPositions **ilp);

#endif
������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������swish-e-2.4.7/src/btree.h���������������������������������������������������������������������������0000664�0000771�0001750�00000004501�11166010110�012054� �����������������������������������������������������������������������������������������������������������������������������������������������������������������������������0000000�0000000������������������������������������������������������������������������������������������������������������������������������������������������������������������������/* 
$Id: btree.h 1946 2007-10-22 14:56:35Z karpet $


    This file is part of Swish-e.

    Swish-e is free software; you can redistribute it and/or modify
    it under the terms of the GNU General Public License as published by
    the Free Software Foundation; either version 2 of the License, or
    (at your option) any later version.

    Swish-e is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
    GNU General Public License for more details.

    You should have received a copy of the GNU General Public License
    along  with Swish-e; if not, write to the Free Software
    Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
    
    See the COPYING file that accompanies the Swish-e distribution for details
    of the GNU GPL and the special exception available for linking against
    the Swish-e library.
    
** Mon May  9 18:19:34 CDT 2005
** added GPL


*/


typedef struct BTREE_Page
{
    sw_off_t           next;    /* Next Page */
    sw_off_t           prev;    /* Previous Page */
    unsigned int       size;    /* Size of page */
    unsigned int       n;       /* Number of keys in page */
    unsigned int       flags;
    unsigned int       data_end;

    sw_off_t           page_number;
    int                modified;
    int                in_use;

    struct BTREE_Page  *next_cache;

    unsigned char data[0];        /* Page data */
} BTREE_Page;

#define BTREE_CACHE_SIZE 97

typedef struct BTREE
{
    sw_off_t root_page;
    int page_size;
    struct BTREE_Page *cache[BTREE_CACHE_SIZE];
    int levels;
    sw_off_t tree[1024];
          /* Values for sequential reading */
    sw_off_t current_page;
    unsigned long current_position;

    FILE *fp;
} BTREE;

BTREE *BTREE_Create(FILE *fp, unsigned int size);
BTREE *BTREE_Open(FILE *fp, int size, sw_off_t root_page);
sw_off_t BTREE_Close(BTREE *bt);
int BTREE_Insert(BTREE *b, unsigned char *key, int key_len, sw_off_t data_pointer);
sw_off_t BTREE_Search(BTREE *b, unsigned char *key, int key_len, unsigned char **found, int *found_len, int exact_match);
sw_off_t BTREE_Next(BTREE *b, unsigned char **found, int *found_len);
int BTREE_Update(BTREE *b, unsigned char *key, int key_len, sw_off_t new_data_pointer);

�����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������swish-e-2.4.7/src/swstring.c������������������������������������������������������������������������0000664�0000771�0001750�00000056472�11166010110�012644� �����������������������������������������������������������������������������������������������������������������������������������������������������������������������������0000000�0000000������������������������������������������������������������������������������������������������������������������������������������������������������������������������/*
$Id: swstring.c 1736 2005-05-12 15:41:22Z karman $
** Copyright (C) 1995, 1996, 1997, 1998 Hewlett-Packard Company
** Originally by Kevin Hughes, kev@kevcom.com, 3/11/94
**

    This file is part of Swish-e.

    Swish-e is free software; you can redistribute it and/or modify
    it under the terms of the GNU General Public License as published by
    the Free Software Foundation; either version 2 of the License, or
    (at your option) any later version.

    Swish-e is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
    GNU General Public License for more details.

    You should have received a copy of the GNU General Public License
    along  with Swish-e; if not, write to the Free Software
    Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
    
    See the COPYING file that accompanies the Swish-e distribution for details
    of the GNU GPL and the special exception available for linking against
    the Swish-e library.
    
Mon May  9 10:57:22 CDT 2005 -- added GPL notice

Much of this has been re-written since the original SWISH. How much is really
still copyright HP?


**---------------------------------------------------------
** ** ** PATCHED 5/13/96, CJC
** Added MatchAndChange for regex in replace rule G.Hill 2/10/98
**
** change sprintf to snprintf to avoid corruption
** added safestrcpy() macro to avoid corruption from strcpy overflow
** SRE 11/17/99
**
** fixed cast to int problems pointed out by "gcc -Wall"
** SRE 2/22/00
**
** 2001-02-xx  rasc  makeItLow, strtolower  optimized/new
**			   iso handling, minor bugfixes
**
** 2001-02-xx  jruiz, rasc:  -- IMPORTANT NOTE --
**                   due to ISO charsets, arguments to tolower, isspace,
**                   strcmp, etc. have to be cast to (unsigned char)!!
**                   otherwise some chars may fail.
**
** 2001-03-08 rasc   rewritten and enhanced suffix routines
** 2001-04-10 rasc   str_dirname, str_basename, changed char_decode_C_ESC
**
*/

#include <ctype.h>
#include "swish.h"
#include "mem.h"
#include "index.h"
#include "swish_qsort.h"
#include "swstring.h"
#include "error.h"



/* Case-insensitive strstr(). */
/* Jose Ruiz 02/2001 Faster one */
char   *lstrstr(char *s, char *t)
{
    int     lens;
    int     lent;
    int     first = tolower((unsigned char) *t);

    lent = strlen(t);
    lens = strlen(s);
    for (; lens && lent <= lens; lens--, s++)
    {
        if (tolower((int) ((unsigned char) *s)) == first)
        {
            if (lent == 1)
                return s;
            if (strncasecmp(s + 1, t + 1, lent - 1) == 0)
                return s;
        }
    }
    return NULL;
}
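
/*
 * Illustration only, not part of the original file.  A minimal standalone
 * sketch of the same idea as lstrstr(): a case-insensitive substring search
 * that casts every byte through (unsigned char) before tolower().  The name
 * demo_ci_strstr is hypothetical.  Like lstrstr() above appears to do, it
 * reports "not found" for an empty needle.
 */

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: case-insensitive substring search. */
const char *demo_ci_strstr(const char *hay, const char *needle)
{
    size_t nlen, hlen, i;

    assert(hay != NULL && needle != NULL);
    nlen = strlen(needle);
    hlen = strlen(hay);

    /* An empty needle, or one longer than the haystack, never matches. */
    if (nlen == 0 || nlen > hlen)
        return NULL;

    for (; hlen >= nlen; hay++, hlen--)
    {
        for (i = 0; i < nlen; i++)
        {
            /* Cast through unsigned char so bytes above 127 are safe for tolower(). */
            if (tolower((unsigned char) hay[i]) != tolower((unsigned char) needle[i]))
                break;
        }
        if (i == nlen)
            return hay;         /* every byte of the needle matched here */
    }
    return NULL;
}
```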

/* Gets the next word in a line. If the word's in quotes,
** include blank spaces in the word or phrase.
   -- 2001-02-11 rasc  totally rewritten, respect escapes like \"
   -- 2001-11-09 moseley rewritten again - doesn't check for missing end quote
   -- Always returns a string, but may be empty.  
*/

static char   *getword(char **in_buf)
{
    unsigned char quotechar;
    unsigned char uc;
    char   *s = *in_buf;
    char   *start = *in_buf;
    char    buf[MAXWORDLEN + 1];
    char   *cur_char = buf;
    int     backslash = 0;


    quotechar = '\0';

    s = str_skip_ws(s);

    /* anything to read? */
    if (!*s)
    {
        *in_buf = s;
        return estrdup("\0");
    }


    if (*s == '\"' || *s == '\'')
        quotechar = *s++;

    /* find end of "more words" or word */

    while (*s)
    {
        uc = (unsigned char) *s;

        if (uc == '\\' && !backslash && quotechar) // Mar 17, 2002 - only enable backslash inside of quotes
        {
            s++;
            backslash++;
            continue;
        }

        /* Can't see why we would need to escape these, can you? - always fed a single line */
        if (uc == '\n' || uc == '\r')
        {
            s++;
            break;
        }


        if (!backslash)
        {
            /* break on ending quote or unquoted space */

            if (uc == quotechar || (!quotechar && isspace((int) uc)))
            {
                s++;            // past quote or space char.
                break;
            }

        } else
            backslash = 0;


        *cur_char++ = *s++;

        if (cur_char - buf > MAXWORDLEN)
            progerr("Parsed word '%s' exceeded max length of %d", start, MAXWORDLEN);
    }

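    if (backslash)
    {
        /* A trailing backslash is kept literally; make sure it and the NUL still fit */
        if (cur_char - buf >= MAXWORDLEN)
            progerr("Parsed word '%s' exceeded max length of %d", start, MAXWORDLEN);
        *cur_char++ = '\\';
    }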


    *cur_char = '\0';

    *in_buf = s;

    return estrdup(buf);

}
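
/*
 * Illustration only, not part of the original file.  A simplified sketch of
 * the word splitting that getword() performs: skip leading white space,
 * honour an optional opening quote, and treat a backslash inside quotes as an
 * escape.  The name demo_next_word and its caller-supplied buffer are
 * assumptions; unlike getword() it truncates silently instead of calling
 * progerr(), and an empty quoted word ("") is indistinguishable from
 * "no word left".
 */

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: copy the next word from *in into out (at most
 * outsz - 1 bytes) and advance *in past it.  Returns the word length;
 * 0 means no word was left.
 */
size_t demo_next_word(const char **in, char *out, size_t outsz)
{
    const char *s = *in;
    char quote = '\0';
    size_t n = 0;

    assert(outsz > 0);

    while (*s && isspace((unsigned char) *s))
        s++;

    if (*s == '"' || *s == '\'')
        quote = *s++;

    while (*s)
    {
        if (quote && *s == '\\' && s[1] != '\0')
        {
            s++;                /* keep the escaped character literally */
        }
        else if (*s == quote || (!quote && isspace((unsigned char) *s)))
        {
            s++;                /* step past the closing quote or the space */
            break;
        }
        if (n + 1 < outsz)
            out[n++] = *s;
        s++;
    }
    out[n] = '\0';
    *in = s;
    return n;
}
```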


/* Gets the value of a variable in a line of the configuration file.
** Basically, anything in quotes or an argument to a variable.
*/

char   *getconfvalue(char *line, char *var)
{
    int     i;
    char   *c;
    int     lentmpvalue;
    char   *tmpvalue,
           *p;

    if ((c = (char *) lstrstr(line, var)) != NULL)
    {
        if (c != line)
            return NULL;
        c += strlen(var);
        while (isspace((int) ((unsigned char) *c)) || *c == '\"')
            c++;
        if (*c == '\0')
            return NULL;
        tmpvalue = (char *) emalloc((lentmpvalue = MAXSTRLEN) + 1);
        for (i = 0; *c != '\0' && *c != '\"' && *c != '\n' && *c != '\r'; c++)
        {
            if (i == lentmpvalue)
            {
                lentmpvalue *= 2;
                tmpvalue = (char *) erealloc(tmpvalue, lentmpvalue + 1);
            }
            tmpvalue[i++] = *c;
        }
        tmpvalue[i] = '\0';
        /* Do not waste memory !! Resize word */
        p = tmpvalue;
        tmpvalue = estrdup(p);
        efree(p);
        return tmpvalue;
    } else
        return NULL;
}


/* In a string, replaces all occurrences of "oldpiece" with "newpiece".
** The input string is freed; the caller owns the returned string.
** This is not really bulletproof yet: an empty "oldpiece" is not handled.
*/
/* 05/00 Jose Ruiz 
** Totally rewritten
*/
char   *replace(char *string, char *oldpiece, char *newpiece)
{
    int     limit,
            curpos,
            lennewpiece,
            lenoldpiece,
            curnewlen;
    char   *c,
           *p,
           *q;
    int     lennewstring;
    char   *newstring;

    newstring = (char *) emalloc((lennewstring = strlen(string) * 2) + 1);
    lennewpiece = strlen(newpiece);
    lenoldpiece = strlen(oldpiece);
    c = string;
    q = newstring;
    curnewlen = 0;
    while ((p = (char *) strstr(c, oldpiece)))
    {
        limit = p - c;
        curnewlen += (limit + lennewpiece);
        if (curnewlen > lennewstring)
        {
            curpos = q - newstring;
            lennewstring = curnewlen + 200;
            newstring = (char *) erealloc(newstring, lennewstring + 1);
            q = newstring + curpos;
        }
        memcpy(q, c, limit);
        q += limit;
        memcpy(q, newpiece, lennewpiece);
        q += lennewpiece;
        c = p + lenoldpiece;
    }
    curnewlen += strlen(c);
    if (curnewlen > lennewstring)
    {
        curpos = q - newstring;
        lennewstring = curnewlen + 200;
        newstring = (char *) erealloc(newstring, lennewstring + 1);
        q = newstring + curpos;
    }
    strcpy(q, c);
    efree(string);
    return newstring;
}




/*----------------------------------------------------*/


/*
  -- Check if a file with a particular suffix should be indexed
  -- according to the settings in the configuration file.
  -- 2001-03-08 rasc   rewritten (optimize and match also
  --                   e.g. ".htm.de" or ".html.gz")
*/

int     isoksuffix(char *filename, struct swline *rulelist)
{
    char   *s,
           *fe;


    if (!rulelist)
        return 1;               /* no suffixlist */

    /* basically do a right to left compare */
    fe = (filename + strlen(filename));
    while (rulelist)
    {
        s = fe - strlen(rulelist->line);
        if (s >= filename)
        {                       /* no negative overflow! */
            if (!strcasecmp(rulelist->line, s))
            {
                return 1;
            }
        }
        rulelist = rulelist->next;
    }

    return 0;
}
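
/*
 * Illustration only, not part of the original file.  The right-to-left
 * comparison used by isoksuffix() can be sketched for a single suffix as
 * follows.  The name demo_has_suffix is hypothetical, and the comparison is
 * written out by hand so the sketch depends only on standard C.
 */

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: does name end with suffix, ignoring case? */
int demo_has_suffix(const char *name, const char *suffix)
{
    size_t nlen, slen, i;

    assert(name != NULL && suffix != NULL);
    nlen = strlen(name);
    slen = strlen(suffix);
    if (slen > nlen)
        return 0;               /* suffix longer than the name */

    name += nlen - slen;        /* compare only the tail */
    for (i = 0; i < slen; i++)
        if (tolower((unsigned char) name[i]) != tolower((unsigned char) suffix[i]))
            return 0;
    return 1;
}
```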




/* 05/00 Jose Ruiz
** Function to copy strings 
** Reallocate memory if needed
** Returns the string copied
** [see also estrredup() and estrdup()]
*/
char   *SafeStrCopy(char *dest, char *orig, int *initialsize)
{
    int     len,
            oldlen;

    len = strlen(orig);
    oldlen = *initialsize;
    if (len > oldlen || !oldlen)
    {
        *initialsize = len + 200; /* 200 extra chars!!! */
        if (oldlen)
            efree(dest);
        dest = (char *) emalloc(*initialsize + 1);
    }
    memcpy(dest, orig, len);
    *(dest + len) = '\0';
    return (dest);
}

/* Comparison routine to sort a string - See sortstring */
int     ccomp(const void *s1, const void *s2)
{
    return (*(unsigned char *) s1 - *(unsigned char *) s2);
}

/* Sort a string  removing dups */
void    sortstring(char *s)
{
    int     i,
            j,
            len;

    len = strlen(s);
    if (len < 2)
        return;                 /* nothing to do; also avoids writing s[1] for "" */
    swish_qsort(s, len, 1, &ccomp);
    for (i = 1, j = 1; i < len; i++)
        if (s[i] != s[j - 1])
            s[j++] = s[i];
    s[j] = '\0';

}
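
/*
 * Illustration only, not part of the original file.  The sort-then-compact
 * technique of sortstring() in a standalone sketch that uses the standard
 * qsort() in place of swish_qsort().  The names demo_byte_cmp and
 * demo_sort_unique are hypothetical.
 */

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

static int demo_byte_cmp(const void *a, const void *b)
{
    return *(const unsigned char *) a - *(const unsigned char *) b;
}

/* Hypothetical helper: sort the bytes of s in place and drop duplicates. */
void demo_sort_unique(char *s)
{
    size_t len, i, j;

    assert(s != NULL);
    len = strlen(s);
    if (len < 2)
        return;                 /* nothing to sort or deduplicate */

    qsort(s, len, 1, demo_byte_cmp);

    /* j counts the bytes kept so far; s[0] is always kept. */
    for (i = 1, j = 1; i < len; i++)
        if (s[i] != s[j - 1])
            s[j++] = s[i];
    s[j] = '\0';
}
```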

/* Merges two strings removing dups and ordering results */
char   *mergestrings(char *s1, char *s2)
{
    int     i,
            j,
            ilen1,
            ilen2,
            ilent;
    char   *s,
           *p;

    ilen1 = strlen(s1);
    ilen2 = strlen(s2);
    ilent = ilen1 + ilen2;
    s = emalloc(ilent + 1);
    p = emalloc(ilent + 1);
    if (ilen1)
        memcpy(s, s1, ilen1);
    if (ilen2)
        memcpy(s + ilen1, s2, ilen2);
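    if (!ilent)
    {
        /* Both inputs are empty: return "" without reading s[0] or writing p[1] */
        efree(s);
        *p = '\0';
        return (p);
    }
    swish_qsort(s, ilent, 1, &ccomp);
    for (i = 1, j = 1, p[0] = s[0]; i < ilent; i++)
        if (s[i] != p[j - 1])
            p[j++] = s[i];
    p[j] = '\0';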
    efree(s);
    return (p);
}

void    makelookuptable(char *s, int *l)
{
    int     i;

    for (i = 0; i < 256; i++)
        l[i] = 0;
    for (; *s; s++)
        l[(int) ((unsigned char) *s)] = 1;
}
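
/*
 * Illustration only, not part of the original file.  A 256-entry table like
 * the one makelookuptable() fills turns "is this byte in the set?" into a
 * single array access.  The names demo_make_lookup and demo_in_set are
 * hypothetical.
 */

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: mark every byte that occurs in chars with 1 in table[256]. */
void demo_make_lookup(const char *chars, int table[256])
{
    size_t i;

    assert(chars != NULL);
    for (i = 0; i < 256; i++)
        table[i] = 0;
    for (; *chars != '\0'; chars++)
        table[(unsigned char) *chars] = 1;
}

/* Hypothetical helper: constant-time membership test against the table. */
int demo_in_set(const int table[256], char c)
{
    return table[(unsigned char) c];
}
```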

void    makeallstringlookuptables(SWISH * sw)
{
    makelookuptable("aeiouAEIOU", sw->isvowellookuptable);
}

/* 06/00 Jose Ruiz- Parses a line into a StringList
** 02/2001 Jose Ruiz - Added extra NULL at the end
*/
StringList *parse_line(char *line)
{
    StringList *sl;
    int     cursize,
            maxsize;
    char   *p;

    if (!line)
        return (NULL);

    if ((p = strchr(line, '\n')))
        *p = '\0';

    cursize = 0;
    sl = (StringList *) emalloc(sizeof(StringList));

    sl->word = (char **) emalloc((maxsize = 2) * sizeof(char *));

    p = line;

    while ((p = getword(&line)))
    {
        /* getword returns "" (never NULL) when nothing is left, so free it if we are not using it */
        if ( !*p) {
            efree( p );
            break;
        }
        
        if (cursize == maxsize)
            sl->word = (char **) erealloc(sl->word, (maxsize *= 2) * sizeof(char *));

        sl->word[cursize++] = (char *) p;
    }
    sl->n = cursize;

    /* Add an extra NULL */
    if (cursize == maxsize)
        sl->word = (char **) erealloc(sl->word, (maxsize += 1) * sizeof(char *));

    sl->word[cursize] = NULL;

    return sl;
}

/* Frees memory used by a StringList
*/
void    freeStringList(StringList * sl)
{
    if (sl)
    {
        while (sl->n)
            efree(sl->word[--sl->n]);
        efree(sl->word);
        efree(sl);
    }
}

/* 10/00 Jose Ruiz
** Function to copy len bytes from orig to dest+off_dest
** Reallocate memory if needed
** Returns the pointer to the new area
*/
unsigned char *SafeMemCopy(unsigned char *dest, unsigned char *orig, int off_dest, int *sz_dest, int len)
{
    if (len > (*sz_dest - off_dest))
    {
        *sz_dest = len + off_dest;
        if (dest)
            dest = (unsigned char *) erealloc(dest, *sz_dest);
        else
            dest = (unsigned char *) emalloc(*sz_dest);
    }
    memcpy(dest + off_dest, orig, len);
    return (dest);
}


/* Routine to check if a string contains only numbers */
int     isnumstring(unsigned char *s)
{
    if (!s || !*s)
        return 0;
    for (; *s; s++)
        if (!isdigit((int) (*s)))
            break;
    if (*s)
        return 0;
    return 1;
}

void    remove_newlines(char *s)
{
    char   *p;

    if (!s || !*s)
        return;
    for (p = s; p;)
        if ((p = strchr(p, '\n')))
            *p++ = ' ';
    for (p = s; p;)
        if ((p = strchr(p, '\r')))
            *p++ = ' ';
}


void    remove_tags(char *s)
{
    int     intag;
    char   *p,
           *q;

    if (!s || !*s)
        return;
    for (p = q = s, intag = 0; *q; q++)
    {
        switch (*q)
        {
        case '<':
            intag = 1;
            /* jmruiz 02/2001 change <tag> by a space */
            *p++ = ' ';
            break;
        case '>':
            intag = 0;
            break;
        default:
            if (!intag)
            {
                *p++ = *q;
            }
            break;
        }
    }
    *p = '\0';
}

/* #### Function to convert binary data of length len to a string */
unsigned char *bin2string(unsigned char *data, int len)
{
    unsigned char *s = NULL;

    if (data && len)
    {
        s = emalloc(len + 1);
        memcpy(s, data, len);
        s[len] = '\0';
    }
    return (s);
}

/* #### */






/* ------------------------------------------------------------ */




/*
  -- Skip white spaces...
  -- position to non space character
  -- return: ptr. to non space char or \0
  -- 2001-01-30  rasc
*/

char   *str_skip_ws(char *s)
{
    while (*s && isspace((int) (unsigned char) *s))
        s++;
    return s;
}

/*************************************
* Trim trailing white space
* Returns void
**************************************/

void  str_trim_ws(char *string)
{
    int i = strlen( string );

    while ( i  && isspace( (int) (unsigned char) string[i-1]) )
        string[--i] = '\0';
}







/*
  -- decode a character escape sequence
  -- input: ptr to \... escape sequence (C-escapes)
  -- return: character code
             se:  string ptr to char after control sequence.
                 (ignore, if NULL ptr)
  -- 2001-02-04  rasc
  -- 2001-04-10  rasc   handle  '\''\0'  (safety!)
*/


char    charDecode_C_Escape(char *s, char **se)
{
    char    c,
           *se2;

    if (*s != '\\')
    {
        /* no escape   */
        c = *s;                 /* return char */

    } else
    {

        switch (*(++s))
        {                       /* can be optimized ... */
        case 'a':
            c = '\a';
            break;
        case 'b':
            c = '\b';
            break;
        case 'f':
            c = '\f';
            break;
        case 'n':
            c = '\n';
            break;
        case 'r':
            c = '\r';
            break;
        case 't':
            c = '\t';
            break;
        case 'v':
            c = '\v';
            break;

        case 'x':              /* Hex  \xff  */
            c = (char) strtoul(++s, &se2, 16);
            s = --se2;
            break;

        case '0':              /* Oct  \0,  \012 */
            c = (char) strtoul(s, &se2, 8);
            s = --se2;
            break;

        case '\0':             /* ouch!! null after \ */
            s--;                /* it's a "\"    */
            /* fall through */
        default:
            c = *s;             /* return the escaped character itself */
            break;
        }

    }

    if (se)
        *se = s + 1;
    return c;
}





/* 
   -- strtolower (make this string to lowercase)
   -- The string itself will be converted
   -- Return: ptr to string
   -- 2001-02-09  rasc:  former makeItLow() been a little optimized

   !!!  most tolower() don't map umlauts, etc. 
   !!!  you have to use the right cast! (unsigned char)
   !!!  or an ISO mapping...
*/

char   *strtolower(char *s)
{
    unsigned char *p = (unsigned char *) s;

    while (*p)
    {
        *p = tolower((unsigned char) *p);
        p++;
    }
    return s;
}






/* ---------------------------------------------------------- */
/* ISO characters conversion/mapping  handling                */



/*
  -- character map to normalize chars for search and store
  -- characters are all mapped to lowercase
  -- umlauts and special characters are mapped to ascii7 chars
  -- control chars/ special chars are mapped to " "
  -- 2001-02-10 rasc
*/


static const unsigned char iso8859_to_ascii7_lower_map[] = {
    ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', /*  0 */
    ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', /*  8 */
    ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', /* 16 */
    ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ',
    ' ', '!', '"', '#', '$', '%', '&', '\'', /* 32 */
    '(', ')', '*', '+', ',', '-', '.', '/',
    '0', '1', '2', '3', '4', '5', '6', '7', /* 48 */
    '8', '9', ':', ';', '<', '=', '>', '?',
    '@', 'a', 'b', 'c', 'd', 'e', 'f', 'g', /* 64 */
    'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o',
    'p', 'q', 'r', 's', 't', 'u', 'v', 'w', /* 80 */
    'x', 'y', 'z', '[', '\\', ']', '^', '_',
    '`', 'a', 'b', 'c', 'd', 'e', 'f', 'g', /* 96 */
    'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o',
    'p', 'q', 'r', 's', 't', 'u', 'v', 'w',
    'x', 'y', 'z', '{', '|', '}', '~', ' ',
    ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', /* 128 */
    ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ',
    ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', /* 144 */
    ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ',
    ' ', '!', 'c', 'l', 'o', 'y', '|', 0xA7, /* 160; 0xA7 (section sign) maps to itself */
    '\"', 'c', ' ', '\"', ' ', '-', 'r', ' ',
    ' ', ' ', '2', '3', '\'', 'u', ' ', '.', /* 176 */
    ' ', '1', ' ', '"', ' ', ' ', ' ', '?',
    'a', 'a', 'a', 'a', 'a', 'a', 'e', 'c', /* 192 */
    'e', 'e', 'e', 'e', 'i', 'i', 'i', 'i',
    'd', 'n', 'o', 'o', 'o', 'o', 'o', ' ', /* 208 */
    'o', 'u', 'u', 'u', 'u', 'y', ' ', 's',
    'a', 'a', 'a', 'a', 'a', 'a', 'e', 'c', /* 224 */
    'e', 'e', 'e', 'e', 'i', 'i', 'i', 'i',
    'd', 'n', 'o', 'o', 'o', 'o', 'o', ' ', /* 240 */
    'o', 'u', 'u', 'u', 'u', 'y', ' ', 'y'
};



/*
  -- "normalize" ISO character for store and search
  -- operations. This means convert it to ascii7 lower case.
  -- Return: char
  -- 2001-02-11  rasc
*/

unsigned char char_ISO_normalize(unsigned char c)
{
    return iso8859_to_ascii7_lower_map[c];
}




/*
  -- "normalize" an ISO string in place for store and search
  -- operations. This means convert it to ascii7 lower case.
  -- Return: ptr to string
  -- 2001-02-11  rasc
*/


char   *str_ISO_normalize(char *s)
{
    unsigned char *p;

    p = (unsigned char *) s;
    while (*p)
    {
        *p = iso8859_to_ascii7_lower_map[*p];
        p++;
    }
    return s;
}


/* 02/2001 Jmruiz - Builds a string from a StringList starting at the
nth element; words are joined with single spaces */
unsigned char *StringListToString(StringList * sl, int n)
{
    int     i,
            j;
    unsigned char *s;
    int     len_s,
            len_w;

    s = emalloc((len_s = 256) + 1);
    /* append each word, growing the buffer as needed */
    for (i = n, j = 0; i < sl->n; i++)
    {
        len_w = strlen(sl->word[i]);
        if (len_s < (j + len_w + 1))
            s = erealloc(s, (len_s += len_w + 1) + 1);
        if (i != n)
        {
            *(s + j) = ' ';
            j++;
        }
        memcpy(s + j, sl->word[i], len_w);
        j += len_w;
    }
    *(s + j) = '\0';
    return s;
}




/* ---------------------------------------------------------- */



/* 
  -- translate chars 
  -- rewrite string itself via an character translation table
  -- translation table is a int[256] 
  -- return: ptr to string itself
*/

unsigned char *TranslateChars(int trlookup[], unsigned char *s)
{
    unsigned char *p;

    p = s;
    while (*p)
    {
        *p = (unsigned char) trlookup[(int) *p];
        p++;
    }
    return s;
}



/*
   -- Build a character translation table
   -- characters "from" will be converted in "to"
   -- result is stored in a lookuptable fixed size
   -- does also special translation rules like :ascii7:
   -- return: 0/1 param fail/ok
*/

int     BuildTranslateChars(int trlookup[], unsigned char *from, unsigned char *to)
{
    int     i;

    /* default init = 1:1 translation */
    for (i = 0; i < 256; i++)
        trlookup[i] = i;

    if (!from)
        return 0;               /* No param! */

    /* special cases, one param  */
    if (!strcmp( (char *)from, ":ascii7:"))
    {
        for (i = 0; i < 256; i++)
            trlookup[i] = (int) char_ISO_normalize((unsigned char) i);
        return 1;
    }

    if (!to)
        return 0;               /* missing second param */

    /* alter table for "non 1:1" translation... */
    while (*from && *to)
        trlookup[(int) *from++] = (int) *to++;
    if (*to || *from)
        return 0;               /* length the same? no? -> err */

    return 1;
}
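
/*
 * Illustration only, not part of the original file.  A standalone sketch of
 * the table-driven translation done by BuildTranslateChars() and
 * TranslateChars().  The names demo_build_table and demo_translate are
 * hypothetical.  One deliberate difference: the lengths are checked before
 * the table is changed, so a failed call leaves the identity mapping.  The
 * special ":ascii7:" rule is not covered.
 */

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: fill table[256] so that each byte in "from" maps to
 * the byte at the same index in "to"; all other bytes map to themselves.
 * Returns 1 on success, 0 if the two strings differ in length.
 */
int demo_build_table(unsigned char table[256], const char *from, const char *to)
{
    size_t i;

    assert(from != NULL && to != NULL);
    for (i = 0; i < 256; i++)
        table[i] = (unsigned char) i;

    if (strlen(from) != strlen(to))
        return 0;

    for (i = 0; from[i] != '\0'; i++)
        table[(unsigned char) from[i]] = (unsigned char) to[i];
    return 1;
}

/* Hypothetical helper: rewrite s in place through the table. */
char *demo_translate(const unsigned char table[256], char *s)
{
    char *p;

    for (p = s; *p != '\0'; p++)
        *p = (char) table[(unsigned char) *p];
    return s;
}
```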




/* ---------------------------------------------------------- */


/*
  -- cstr_basename
  -- return basename of a document path
  -- return: (char *) copy of filename
*/

char   *cstr_basename(char *path)
{
    return (char *) estrdup(str_basename(path));
}


/*
  -- str_basename
  -- return basename of a document path
  -- return: (char *) ptr into(!) path string
*/

char   *str_basename(char *path)
{
    char   *s;

    s = strrchr(path, '/');
    return (s) ? s + 1 : path;
}


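/*
  -- cstr_dirname (copy)
  -- return dirname of a document path
  -- return: (char *) ptr on copy(!) of path; caller must free it
  -- note: returns "." when path has no '/', and "" when the only
  --       '/' is the leading one (e.g. "/file")
*/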

char   *cstr_dirname(char *path)
{
    char   *s;
    char   *dir;
    int     len;

    s = strrchr(path, '/');

    if (!s)
    {
        dir = (char *) estrdup(" ");
        *dir = (*path == '/') ? '/' : '.';
    } else
    {
        len = s - path;
        dir = emalloc(len + 1);
        strncpy(dir, path, len);
        *(dir + len) = '\0';
    }

    return dir;
}
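
/*
 * Illustration only, not part of the original file.  A sketch of splitting a
 * path at its last '/' without allocating, in the spirit of str_basename()
 * and cstr_dirname().  The names demo_basename and demo_dirname are
 * hypothetical.  As an assumption borrowed from the usual dirname
 * convention, a path such as "/file" yields "/" here.
 */

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: pointer to the part of path after the last '/'. */
const char *demo_basename(const char *path)
{
    const char *slash;

    assert(path != NULL);
    slash = strrchr(path, '/');
    return slash ? slash + 1 : path;
}

/*
 * Hypothetical helper: copy the directory part of path into out.
 * Returns 0 on success, -1 if out is too small.
 */
int demo_dirname(const char *path, char *out, size_t outsz)
{
    const char *slash;
    size_t len;

    assert(path != NULL && out != NULL);
    slash = strrchr(path, '/');

    if (slash == NULL)
    {
        path = ".";             /* no directory part: current directory */
        len = 1;
    }
    else if (slash == path)
        len = 1;                /* the only slash is the leading one: root */
    else
        len = (size_t) (slash - path);

    if (len + 1 > outsz)
        return -1;
    memcpy(out, path, len);
    out[len] = '\0';
    return 0;
}
```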



/* estrdup - like strdup except we call our emalloc routine explicitly
** as it does better memory management and tracking 
** Note: emalloc will report error and not return if no memory
*/

char   *estrdup(char *str)
{
    char   *p;
    
    if (!str)
        return NULL;

    if ((p = emalloc(strlen(str) + 1)))
        return strcpy(p, str);

    return NULL;
}


char   *estrndup(char *s, size_t n)
{
    size_t  lens = strlen(s);
    size_t  newlen;
    char   *news;

    if (lens < n)
        newlen = lens;
    else
        newlen = n;

    if (newlen < n)
        news = emalloc(n + 1);
    else
        news = emalloc(newlen + 1);
    memcpy(news, s, newlen);
    news[newlen] = '\0';
    return news;
}


/*
   -- estrredup
   -- do free on s1 and make copy of s2
   -- this is used, when s1 is replaced by s2
   -- 2001-02-15 rasc

*/

char   *estrredup(char *s1, char *s2)
{
    if (s1)
        efree(s1);
    return estrdup(s2);
}


/* 04/00 Jose Ruiz */
/* Simple routine for comparing pointers to integers in order to
get an ascending sort with qsort */
/* Compares pairs of integers: the first values decide, the second break ties */
int     icomp2(const void *s1, const void *s2)
{
    int     rc,
           *p1,
           *p2;

    rc = (*(int *) s1 - *(int *) s2);
    if (rc)
        return (rc);
    else
    {
        p1 = (int *) s1;
        p2 = (int *) s2;
        return (*(++p1) - *(++p2));
    }
}
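
/*
 * Illustration only, not part of the original file.  A sketch of how a
 * two-integer comparator such as icomp2() is used with qsort(): the element
 * size is two ints, so each pair moves as a unit.  The names demo_pair_cmp
 * and demo_sort_pairs are hypothetical.  The sketch compares with
 * (x > y) - (x < y) rather than subtraction, which is a substitution made
 * here to stay safe for values far apart.
 */

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical comparator: order int pairs by first value, then by second. */
static int demo_pair_cmp(const void *a, const void *b)
{
    const int *p = (const int *) a;
    const int *q = (const int *) b;

    /* (x > y) - (x < y) yields -1, 0 or 1 without the overflow risk of x - y. */
    if (p[0] != q[0])
        return (p[0] > q[0]) - (p[0] < q[0]);
    return (p[1] > q[1]) - (p[1] < q[1]);
}

/* Hypothetical helper: sort npairs consecutive (first, second) pairs in place. */
void demo_sort_pairs(int *pairs, size_t npairs)
{
    assert(pairs != NULL);
    qsort(pairs, npairs, 2 * sizeof(int), demo_pair_cmp);
}
```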


/* Functions to format a long with commas. */
/* Should really do this with locales. */
/* Maybe not the best file for this */

static void thousep(char *s1, const char *s2)
{
 if (*s2) {
         switch(strlen(s2) % 3)
         {
                 do { *s1++ = ',';
         case 0:      *s1++ = *s2++;
         case 2:      *s1++ = *s2++;
         case 1:      *s1++ = *s2++;
                 } while (*s2);
         }
 }

        *s1 = '\0';
}

char comma_buffer[100];

const char *comma_long( unsigned long u )
{
    char buf[60];

    sprintf( buf, "%lu", u );
    thousep( comma_buffer, buf );
    return comma_buffer;
}
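
/*
 * Illustration only, not part of the original file.  thousep() inserts the
 * commas with a switch that jumps into a do/while loop; the same grouping
 * can be sketched with a plain loop.  The name demo_commas is hypothetical,
 * and the caller supplies the output buffer instead of sharing a global one.
 */

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: write u into out with a comma between each group of
 * three digits.  out must hold at least 40 bytes, which is enough for a
 * 64-bit value (20 digits, 6 commas, terminator).
 */
char *demo_commas(unsigned long u, char *out)
{
    char digits[32];
    size_t len, i, j = 0;

    assert(out != NULL);
    len = (size_t) sprintf(digits, "%lu", u);

    for (i = 0; i < len; i++)
    {
        /* A comma goes before every digit that starts a new group of three. */
        if (i > 0 && (len - i) % 3 == 0)
            out[j++] = ',';
        out[j++] = digits[i];
    }
    out[j] = '\0';
    return out;
}
```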

������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������swish-e-2.4.7/src/keychar_out.h���������������������������������������������������������������������0000664�0000771�0001750�00000002207�11166010110�013271� �����������������������������������������������������������������������������������������������������������������������������������������������������������������������������0000000�0000000������������������������������������������������������������������������������������������������������������������������������������������������������������������������/*
$Id: keychar_out.h 1736 2005-05-12 15:41:22Z karman $


    This file is part of Swish-e.

    Swish-e is free software; you can redistribute it and/or modify
    it under the terms of the GNU General Public License as published by
    the Free Software Foundation; either version 2 of the License, or
    (at your option) any later version.

    Swish-e is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
    GNU General Public License for more details.

    You should have received a copy of the GNU General Public License
    along  with Swish-e; if not, write to the Free Software
    Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
    
    See the COPYING file that accompanies the Swish-e distribution for details
    of the GNU GPL and the special exception available for linking against
    the Swish-e library.
    
** Mon May  9 18:19:34 CDT 2005
** added GPL


**
** 2001-03-20 rasc   own module for this routine  (from swish.c)
**
*/


void OutputKeyChar (SWISH *sw, int keychar);

�����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������swish-e-2.4.7/src/date_time.h�����������������������������������������������������������������������0000664�0000771�0001750�00000002310�11166010110�012702� �����������������������������������������������������������������������������������������������������������������������������������������������������������������������������0000000�0000000������������������������������������������������������������������������������������������������������������������������������������������������������������������������/*
$Id: date_time.h 1736 2005-05-12 15:41:22Z karman $
**


    This file is part of Swish-e.

    Swish-e is free software; you can redistribute it and/or modify
    it under the terms of the GNU General Public License as published by
    the Free Software Foundation; either version 2 of the License, or
    (at your option) any later version.

    Swish-e is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
    GNU General Public License for more details.

    You should have received a copy of the GNU General Public License
    along  with Swish-e; if not, write to the Free Software
    Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
    
    See the COPYING file that accompanies the Swish-e distribution for details
    of the GNU GPL and the special exception available for linking against
    the Swish-e library.
    
** Mon May  9 18:19:34 CDT 2005
** added GPL


**
**  Date / Time routines
**
** 2001-03-20 rasc   own module for this routine  (from swish.c)
**
*/



double  TimeElapsed(void);
double  TimeCPU(void);
char    *getTheDateISO(void);
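
/*
 * Illustration only: the prototypes above do not show how the date string
 * is built.  A minimal sketch of an ISO-style formatter, assuming a
 * "YYYY-MM-DD HH:MM:SS" layout in UTC.  The name format_iso_utc and the
 * exact layout are assumptions made for this example and are not taken
 * from date_time.c.
 */

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical helper, not the Swish-e implementation: format a UTC
 * timestamp as "YYYY-MM-DD HH:MM:SS" into a caller-supplied buffer.
 * Returns the number of characters written, or 0 if the conversion
 * fails or the buffer is too small. */
static size_t format_iso_utc(time_t when, char *out, size_t outlen)
{
    struct tm *utc;

    assert(out != NULL);

    utc = gmtime(&when);
    if (utc == NULL)
        return 0;

    return strftime(out, outlen, "%Y-%m-%d %H:%M:%S", utc);
}
```

/*
 * Passing the buffer in, rather than returning a static or allocated
 * string, keeps ownership obvious to the caller.  That is a design choice
 * of this sketch, not a statement about getTheDateISO().
 */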

swish-e-2.4.7/src/html.h
/*
$Id: html.h 1736 2005-05-12 15:41:22Z karman $


    This file is part of Swish-e.

    Swish-e is free software; you can redistribute it and/or modify
    it under the terms of the GNU General Public License as published by
    the Free Software Foundation; either version 2 of the License, or
    (at your option) any later version.

    Swish-e is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
    GNU General Public License for more details.

    You should have received a copy of the GNU General Public License
    along  with Swish-e; if not, write to the Free Software
    Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
    
    See the COPYING file that accompanies the Swish-e distribution for details
    of the GNU GPL and the special exception available for linking against
    the Swish-e library.
    
** Mon May  9 18:19:34 CDT 2005
** added GPL


*/

/*
   seems to be a very old module of swish
   some serious work to do in html.c and html.h!! 
   but anyway it seems to work...
 */


/* Just the prototypes */

int countwords_HTML(SWISH *sw, FileProp *fprop, FileRec *fi, char *buffer);

int parsecomment (SWISH *, char *, int, int, int, int *);
char   *parseHTMLtitle(SWISH *,char *buffer);
int     isoktitle(SWISH *sw, char *title);
swish-e-2.4.7/src/xml.c
/*
$Id: xml.c 1736 2005-05-12 15:41:22Z karman $
**


    This file is part of Swish-e.

    Swish-e is free software; you can redistribute it and/or modify
    it under the terms of the GNU General Public License as published by
    the Free Software Foundation; either version 2 of the License, or
    (at your option) any later version.

    Swish-e is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
    GNU General Public License for more details.

    You should have received a copy of the GNU General Public License
    along  with Swish-e; if not, write to the Free Software
    Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
    
    See the COPYING file that accompanies the Swish-e distribution for details
    of the GNU GPL and the special exception available for linking against
    the Swish-e library.
    
    
**
**
** 2001-03-17  rasc  save real_filename as title (instead full real_path)
**                   was: compatibility issue to v 1.x.x
** 2001-05-09  rasc  entities changed (new module)
**
** 2001-07-25  moseley complete rewrite to use James Clark's Expat parser
**

** Mon May  9 10:56:06 CDT 2005 -- added GPL notice

** BUGS:
**      UndefinedMetaTags ignore is not coded
*/

#include "swish.h"
#include "merge.h"
#include "mem.h"
#include "swstring.h"
#include "search.h"
#include "docprop.h"
#include "error.h"
#include "index.h"
#include "metanames.h"

#include "expat/xmlparse/xmlparse.h"   // James Clark's Expat

#define BUFFER_CHUNK_SIZE 20000

typedef struct {
    char   *buffer;     // text for buffer
    int     cur;        // pointer to end of buffer
    int     max;        // max size of buffer
    int     defaultID;  // default ID for no meta names.
} CHAR_BUFFER;


// I think that the property system can deal with StoreDescription in a cleaner way.
// This code shouldn't need to know about StoreDescription.

typedef struct {
    struct metaEntry    *meta;
    int                 save_size;   /* save max size */
    char                *tag;        /* summary tag */
    int                 active;      /* inside summary */
} SUMMARY_INFO;
    

typedef struct {
    CHAR_BUFFER text_buffer;    // buffer for collecting text

    // CHAR_BUFFER prop_buffer;  // someday, may want a separate property buffer if want to collect tags within props

    SUMMARY_INFO    summary;     // argh.

    char       *ignore_tag;     // tag that triggered ignore (currently used for both)
    int         total_words;
    int         word_pos;
    int         filenum;
    XML_Parser *parser;
    INDEXDATAHEADER *header;
    SWISH      *sw;
    FileProp   *fprop;
    FileRec    *thisFileEntry;
    
} PARSE_DATA;


/* Prototypes */
static void start_hndl(void *data, const char *el, const char **attr);
static void end_hndl(void *data, const char *el);
static void char_hndl(void *data, const char *txt, int txtlen);
static void append_buffer( CHAR_BUFFER *buf, const char *txt, int txtlen );
static void flush_buffer( PARSE_DATA  *parse_data );
static void comment_hndl(void *data, const char *txt);
static char *isIgnoreMetaName(SWISH * sw, char *tag);




/*********************************************************************
*   Entry to index an XML file.
*
*   Creates an XML_Parser object and parses buffer
*
*   Returns:
*       Count of words indexed
*
*   ToDo:
*       This is a stream parser, so could avoid loading entire document into RAM before parsing
*
*********************************************************************/

int countwords_XML (SWISH *sw, FileProp *fprop, FileRec *fi, char *buffer)
{
    PARSE_DATA          parse_data;
    XML_Parser          p = XML_ParserCreate(NULL);
    IndexFILE          *indexf = sw->indexlist;
    struct StoreDescription *stordesc = fprop->stordesc;


    /* Set defaults  */
    memset(&parse_data, 0, sizeof(parse_data));

    parse_data.header = &indexf->header;
    parse_data.parser = p;
    parse_data.sw     = sw;
    parse_data.fprop  = fprop;
    parse_data.filenum = fi->filenum;
    parse_data.word_pos= 1;  /* compress doesn't like zero */
    parse_data.thisFileEntry = fi;


    /* Don't really like this, as mentioned above */
    if ( stordesc && (parse_data.summary.meta = getPropNameByName(parse_data.header, AUTOPROPERTY_SUMMARY)))
    {
        /* Set property limit size for this document type, and store previous size limit */
        parse_data.summary.save_size = parse_data.summary.meta->max_len;
        parse_data.summary.meta->max_len = stordesc->size;
        parse_data.summary.tag = stordesc->field;
    }
        
    
    addCommonProperties(sw, fprop, fi, NULL,NULL, 0);



    if (!p)
        progerr("Failed to create XML parser object for '%s'", fprop->real_path );


    /* Set event handlers */
    XML_SetUserData( p, (void *)&parse_data );          // local data to pass around
    XML_SetElementHandler(p, start_hndl, end_hndl);
    XML_SetCharacterDataHandler(p, char_hndl);

    if( sw->indexComments )
        XML_SetCommentHandler( p, comment_hndl );

    //XML_SetProcessingInstructionHandler(p, proc_hndl);

    if ( !XML_Parse(p, buffer, fprop->fsize, 1) )
        progwarn("XML parse error in file '%s' line %d.  Error: %s",
                     fprop->real_path, XML_GetCurrentLineNumber(p),XML_ErrorString(XML_GetErrorCode(p))); 


    /* clean up */
    XML_ParserFree(p);

    /* Flush any text left in the buffer, and free the buffer */
    flush_buffer( &parse_data );

    if ( parse_data.text_buffer.buffer )
        efree( parse_data.text_buffer.buffer );


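    /* Restore the size in the StoreDescription property.  Test the pointers */
    /* rather than save_size, since a saved limit of zero must be restored too. */
    if ( stordesc && parse_data.summary.meta )
        parse_data.summary.meta->max_len = parse_data.summary.save_size;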
        
    return parse_data.total_words;
}
    
/*********************************************************************
*   Start Tag Event Handler
*
*   These routines check to see if a given meta tag should be indexed
*   and if the tags should be added as a property
*
*   To Do:
*       deal with attributes!
*
*********************************************************************/


static void start_hndl(void *data, const char *el, const char **attr)
{
    PARSE_DATA *parse_data = (PARSE_DATA *)data;
    struct metaEntry *m;
    SWISH *sw = parse_data->sw;
    char  tag[MAXSTRLEN + 1];


    /* return if within an ignore block */
    if ( parse_data->ignore_tag )
        return;

    /* Flush any text in the buffer */
    flush_buffer( parse_data );


    if(strlen(el) >= MAXSTRLEN)  // easy way out
    {
        progwarn("Warning: Tag found in %s is too long: '%s'", parse_data->fprop->real_path, el );
        return;
    }

    strcpy(tag,(char *)el);
    strtolower( tag );  // $$$ swish ignores case in xml tags!



    /* Bump on all meta names, unless overridden */
    /* Done before the ignore tag check since still need to bump */

    if (!isDontBumpMetaName(sw->dontbumpstarttagslist, tag))
        parse_data->word_pos++;


    /* check for ignore tag (should probably remove char handler for speed) */
    if ( (parse_data->ignore_tag = isIgnoreMetaName( sw, tag )))
        return;


    /* Check for metaNames */

    if ( (m  = getMetaNameByName( parse_data->header, tag)) )
        m->in_tag++;

    else
    {
        if (sw->UndefinedMetaTags == UNDEF_META_AUTO)
        {
            if (sw->verbose)
                printf("!!!Adding automatic MetaName '%s' found in file '%s'\n", tag, parse_data->fprop->real_path);

            addMetaEntry( parse_data->header, tag, META_INDEX, 0)->in_tag++;
        }


        /* If set to "error" on undefined meta tags, then error */
        if (sw->UndefinedMetaTags == UNDEF_META_ERROR)
            progerr("UndefinedMetaTags=error.  Found meta name '%s' in file '%s', not listed in MetaNames in config", tag, parse_data->fprop->real_path);
    }


    /* Check property names */

    if ( (m  = getPropNameByName( parse_data->header, tag)) )
        m->in_tag++;


    /* Look to enable StoreDescription */
    {
        SUMMARY_INFO    *summary = &parse_data->summary;
        if ( summary->tag && (strcasecmp( tag, summary->tag ) == 0 ))
            summary->active++;
    }

}


/*********************************************************************
*   End Tag Event Handler
*
*
*
*********************************************************************/


static void end_hndl(void *data, const char *el)
{
    PARSE_DATA *parse_data = (PARSE_DATA *)data;
    char  tag[MAXSTRLEN + 1];
    struct metaEntry *m;

    if(strlen(el) >= MAXSTRLEN)  // same limit as start_hndl, so both handlers skip the same tags
    {
        progwarn("Warning: Tag found in %s is too long: '%s'", parse_data->fprop->real_path, el );
        return;
    }

    strcpy(tag,(char *)el);
    strtolower( tag );

    if ( parse_data->ignore_tag )
    {
        if  (strcmp( parse_data->ignore_tag, tag ) == 0)
            parse_data->ignore_tag = NULL;  // don't free since it's a pointer to the config setting
        return;
    }

    /* Flush any text in the buffer */
    flush_buffer( parse_data );


    /* Don't allow matching across tag boundary */
    if (!isDontBumpMetaName(parse_data->sw->dontbumpendtagslist, tag))
       parse_data->word_pos++;
    


    /* Flag that we are not in tag anymore - tags must be balanced, of course. */

    if ( ( m = getMetaNameByName( parse_data->header, tag) ) )
        if ( m->in_tag )
            m->in_tag--;


    if ( ( m = getPropNameByName( parse_data->header, tag) ) )
        if ( m->in_tag )
            m->in_tag--;


    /* Look to disable StoreDescription */
    {
        SUMMARY_INFO    *summary = &parse_data->summary;
        if ( summary->tag && (strcasecmp( tag, summary->tag ) == 0 ))
            summary->active--;
    }

}
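
/*
 * Illustration only: the start and end handlers above track nesting with
 * the in_tag counter, incrementing on open and decrementing on close only
 * while the counter is non-zero.  A minimal sketch of that pattern follows.
 * tag_state, tag_open and tag_close are hypothetical names that stand in
 * for struct metaEntry and the handler bodies.
 */

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the in_tag field of struct metaEntry. */
typedef struct {
    int in_tag;   /* nesting depth: greater than zero while inside the tag */
} tag_state;

/* Called on a start tag: one more level of nesting. */
static void tag_open(tag_state *t)
{
    assert(t != NULL);
    t->in_tag++;
}

/* Called on an end tag.  The guard means a stray close tag leaves the
 * depth at zero instead of driving it negative. */
static void tag_close(tag_state *t)
{
    assert(t != NULL);
    if (t->in_tag)
        t->in_tag--;
}
```

/*
 * A counter rather than a boolean lets the same tag nest inside itself and
 * still be considered "open" until the outermost close is seen.
 */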

/*********************************************************************
*   Character Data Event Handler
*
*   This does the actual adding of text to the index and adding properties
*   if any tags have been found to index
*
*
*********************************************************************/

static void char_hndl(void *data, const char *txt, int txtlen)
{
    PARSE_DATA         *parse_data = (PARSE_DATA *)data;


    /* If currently in an ignore block, then return */
    if ( parse_data->ignore_tag )
        return;

    /* Buffer the text */
    append_buffer( &parse_data->text_buffer, txt, txtlen );

    /* Some day, might want to have a separate property buffer if need to collect more than plain text */
    // append_buffer( parse_data->prop_buffer, txt, txtlen );

}

/*********************************************************************
*   Append character data to the end of the buffer
*
*   Buffer is extended/created if needed
*
*   ToDo: Flush buffer if it gets too large
*
*
*********************************************************************/

static void append_buffer( CHAR_BUFFER *buf, const char *txt, int txtlen )
{

    if ( !txtlen )  // shouldn't happen
        return;


    /* (re)allocate buf if needed */
    
    if ( buf->cur + txtlen >= buf->max )
    {
        buf->max = ( buf->max + BUFFER_CHUNK_SIZE+1 < buf->cur + txtlen )
                    ? buf->cur + txtlen+1
                    : buf->max + BUFFER_CHUNK_SIZE+1;

        buf->buffer = erealloc( buf->buffer, buf->max+1 );
    }


    memcpy( (void *) &(buf->buffer[buf->cur]), txt, txtlen );
    buf->cur += txtlen;
}
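
/*
 * Illustration only: append_buffer() grows its storage by whichever is
 * larger, one chunk or the amount actually needed, and keeps room for a
 * terminating NUL.  The same idea as a standalone sketch, using the
 * standard realloc() instead of erealloc().  sketch_buf, sketch_append and
 * SKETCH_CHUNK are hypothetical names, and unlike the original this
 * version reports allocation failure to the caller.
 */

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_CHUNK 16   /* small chunk so growth is easy to observe */

typedef struct {
    char  *data;  /* heap storage, NULL until the first append */
    size_t cur;   /* bytes in use, not counting the terminating NUL */
    size_t max;   /* bytes allocated */
} sketch_buf;

/* Append len bytes, growing by at least one chunk.  One spare byte is
 * always kept so the contents stay NUL-terminated.  Returns 0 on success
 * and -1 on allocation failure, in which case the buffer is unchanged. */
static int sketch_append(sketch_buf *b, const char *txt, size_t len)
{
    assert(b != NULL);

    if (len == 0)
        return 0;

    if (b->cur + len + 1 > b->max) {
        size_t want   = b->cur + len + 1;        /* minimum that fits */
        size_t grown  = b->max + SKETCH_CHUNK;   /* one chunk larger */
        size_t newmax = want > grown ? want : grown;
        char  *p      = realloc(b->data, newmax);

        if (p == NULL)
            return -1;

        b->data = p;
        b->max  = newmax;
    }

    memcpy(b->data + b->cur, txt, len);
    b->cur += len;
    b->data[b->cur] = '\0';
    return 0;
}
```

/*
 * Resetting cur to zero, as flush_buffer() does, reuses the allocation for
 * the next run of text; the caller frees data once at the end.
 */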




/*********************************************************************
*   Flush buffer - adds words to index, and properties
*
*    2001-08 jmruiz Change structure from IN_FILE | IN_META to IN_FILE
*    Since structure does not make much sense in XML, using only IN_FILE
*    saves memory and disk space (one byte per location)
*
*
*********************************************************************/
static void flush_buffer( PARSE_DATA  *parse_data )
{
    CHAR_BUFFER *buf = &parse_data->text_buffer;
    SWISH       *sw = parse_data->sw;

    /* anything to do? */
    if ( !buf->cur )
        return;

    buf->buffer[buf->cur] = '\0';


    /* Index the text */
    parse_data->total_words +=
        indexstring( sw, buf->buffer, parse_data->filenum, IN_FILE, 0, NULL, &(parse_data->word_pos) );


    /* Add the properties */
    addDocProperties( parse_data->header, &(parse_data->thisFileEntry->docProperties), (unsigned char *)buf->buffer, buf->cur, parse_data->fprop->real_path );


    /* yuck.  Ok, add to summary, if active */
    {
        SUMMARY_INFO    *summary = &parse_data->summary;
        if ( summary->active )
            addDocProperty( &(parse_data->thisFileEntry->docProperties), summary->meta, (unsigned char *)buf->buffer, buf->cur, 0 );
    }


    /* clear the buffer */
    buf->cur = 0;
}



/*********************************************************************
*   Comments
*
*   Should be able to call the char_hndl
*
*   To Do:
*       Can't use DontBump with comments.  Might need a config variable for that.
*
*********************************************************************/
static void comment_hndl(void *data, const char *txt)
{
    PARSE_DATA  *parse_data = (PARSE_DATA *)data;
    SWISH       *sw = parse_data->sw;
    

    /* Bump position around comments - hard coded, always done to prevent phrase matching */
    parse_data->word_pos++;

    /* Index the text */
    parse_data->total_words +=
        indexstring( sw, (char *)txt, parse_data->filenum, IN_COMMENTS, 0, NULL, &(parse_data->word_pos) );


    parse_data->word_pos++;

}
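
/*
 * Illustration only: the handlers bump word_pos at tag and comment
 * boundaries so that words on either side are not adjacent.  A minimal
 * sketch of why that blocks phrase matching, assuming the indexer gives
 * consecutive words consecutive positions and that a phrase requires
 * adjacent positions.  Both functions are hypothetical and are not part
 * of Swish-e.
 */

```c
#include <assert.h>

/* Hypothetical check: two words form a phrase only when the second sits
 * at the position immediately after the first. */
static int is_adjacent(int first_pos, int second_pos)
{
    assert(first_pos > 0 && second_pos > 0);
    return second_pos == first_pos + 1;
}

/* Assign positions to two words separated by `bumps` boundaries and
 * report whether they would still match as a phrase. */
static int phrase_matches_across(int bumps)
{
    int word_pos = 1;          /* positions start at 1, as in countwords_XML */
    int first    = word_pos++; /* the first word takes the current position */

    word_pos += bumps;         /* each boundary bumps the position once */
    return is_adjacent(first, word_pos);
}
```

/*
 * With no boundary the two words are adjacent; a single bump is already
 * enough to separate them.
 */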



/*********************************************************************
*   check if a tag is an IgnoreTag
*
*   Note: this returns a pointer to the config set tag, so don't free it!
*
*
*********************************************************************/

static char *isIgnoreMetaName(SWISH * sw, char *tag)
{
    struct swline *tmplist = sw->ignoremetalist;

    if (!tmplist)
        return NULL;
        
    while (tmplist)
    {
        if (strcmp(tag, tmplist->line) == 0)
            return tmplist->line;

        tmplist = tmplist->next;
    }

    return NULL;
}
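
/*
 * Illustration only: isIgnoreMetaName() walks a singly linked list and
 * returns the list's own string, which is why the caller must not free the
 * result.  The same lookup as a standalone sketch.  name_node and
 * find_name are hypothetical stand-ins for struct swline and the function
 * above.
 */

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct swline. */
typedef struct name_node {
    struct name_node *next;
    const char       *line;
} name_node;

/* Return the list's own copy of the matching name, or NULL if absent.
 * The result is borrowed: it stays owned by the list. */
static const char *find_name(const name_node *list, const char *tag)
{
    assert(tag != NULL);

    for (; list != NULL; list = list->next)
        if (strcmp(tag, list->line) == 0)
            return list->line;

    return NULL;
}
```

/*
 * Because the loop condition already handles an empty list, no separate
 * NULL check is needed before the walk.
 */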


swish-e-2.4.7/src/swish_qsort.c
/*
 $Id: swish_qsort.c 1717 2005-05-09 23:36:20Z karman $
 *
 * Copyright (c) 1992, 1993
 *	The Regents of the University of California.  All rights reserved.
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions
 * are met:
 * 1. Redistributions of source code must retain the above copyright
 *    notice, this list of conditions and the following disclaimer.
 * 2. Redistributions in binary form must reproduce the above copyright
 *    notice, this list of conditions and the following disclaimer in the
 *    documentation and/or other materials provided with the distribution.
 
 * 4. Neither the name of the University nor the names of its contributors
 *    may be used to endorse or promote products derived from this software
 *    without specific prior written permission.
 *
 * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
 * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
 * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
 * ARE DISCLAIMED.  IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
 * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
 * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
 * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
 * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
 * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
 * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
 * SUCH DAMAGE.
 *
 * $FreeBSD: src/lib/libc/stdlib/qsort.c,v 1.8 1999/08/28 00:01:35 peter Exp $
 *
 * Renamed swish_qsort for use with swish. [wsm] June 2001
 *
 
 * Mon May  9 18:02:15 CDT 2005
 * #3 in BSD license removed per 
 * ftp://ftp.cs.berkeley.edu/pub/4bsd/README.Impt.License.Change
 * this also makes it GPL friendly.
 
 */

#include <stdlib.h>

#define inline

typedef int		 cmp_t (const void *, const void *);
static inline char	*med3 (char *, char *, char *, cmp_t *);
static inline void	 swapfunc (char *, char *, int, int);

#ifndef min
#define min(a, b)	((a) < (b) ? (a) : (b))
#endif

/*
 * Qsort routine from Bentley & McIlroy's "Engineering a Sort Function".
 */
#define swapcode(TYPE, parmi, parmj, n) { 		\
	long i = (n) / sizeof (TYPE); 			\
	register TYPE *pi = (TYPE *) (parmi); 		\
	register TYPE *pj = (TYPE *) (parmj); 		\
	do { 						\
		register TYPE	t = *pi;		\
		*pi++ = *pj;				\
		*pj++ = t;				\
        } while (--i > 0);				\
}

#define SWAPINIT(a, es) swaptype = ((char *)a - (char *)0) % sizeof(long) || \
	es % sizeof(long) ? 2 : es == sizeof(long)? 0 : 1;
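
/*
 * Illustration only: SWAPINIT picks one of three swap strategies from the
 * alignment of the array and the element size.  The same decision written
 * as a function, as a sketch.  classify_swaptype is a hypothetical name,
 * and uintptr_t is used here in place of the pointer subtraction in the
 * macro.
 */

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sketch of the classification SWAPINIT performs:
 *   0 - each element is exactly one aligned long: swap with one assignment
 *   1 - elements are aligned multiples of long:   swap long by long
 *   2 - anything else:                            swap byte by byte */
static int classify_swaptype(const void *base, size_t es)
{
    assert(es > 0);

    if ((uintptr_t)base % sizeof(long) != 0 || es % sizeof(long) != 0)
        return 2;

    return es == sizeof(long) ? 0 : 1;
}
```

/*
 * The value is computed once per pass and then consulted by every swap, so
 * the common case of long-sized elements avoids a function call entirely.
 */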

static inline void
swapfunc(a, b, n, swaptype)
	char *a, *b;
	int n, swaptype;
{
	if(swaptype <= 1)
		swapcode(long, a, b, n)
	else
		swapcode(char, a, b, n)
}

#define swap(a, b)					\
	if (swaptype == 0) {				\
		long t = *(long *)(a);			\
		*(long *)(a) = *(long *)(b);		\
		*(long *)(b) = t;			\
	} else						\
		swapfunc(a, b, es, swaptype)

#define vecswap(a, b, n) 	if ((n) > 0) swapfunc(a, b, n, swaptype)

static inline char *
med3(a, b, c, cmp)
	char *a, *b, *c;
	cmp_t *cmp;
{
	return cmp(a, b) < 0 ?
	       (cmp(b, c) < 0 ? b : (cmp(a, c) < 0 ? c : a ))
              :(cmp(b, c) > 0 ? b : (cmp(a, c) < 0 ? a : c ));
}
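
/*
 * Illustration only: med3() returns whichever of three elements is the
 * median, using at most three comparisons.  The same decision tree with
 * ANSI prototypes and a sample comparator, as a sketch.  median3 and
 * cmp_int are hypothetical names for this example.
 */

```c
#include <assert.h>
#include <stddef.h>

typedef int cmp_fn(const void *, const void *);

/* Return a pointer to the median of *a, *b and *c under cmp. */
static const void *median3(const void *a, const void *b, const void *c,
                           cmp_fn *cmp)
{
    assert(cmp != NULL);

    return cmp(a, b) < 0
        ? (cmp(b, c) < 0 ? b : (cmp(a, c) < 0 ? c : a))
        : (cmp(b, c) > 0 ? b : (cmp(a, c) < 0 ? a : c));
}

/* Three-way integer comparison that cannot overflow. */
static int cmp_int(const void *x, const void *y)
{
    int a = *(const int *)x;
    int b = *(const int *)y;

    return (a > b) - (a < b);
}
```

/*
 * Choosing the pivot as a median of samples makes already-sorted and
 * reverse-sorted input far less likely to trigger quadratic behaviour.
 */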

void
swish_qsort(void *a, size_t n, size_t es, cmp_t *cmp)
{
	char *pa, *pb, *pc, *pd, *pl, *pm, *pn;
	int d, r, swaptype, swap_cnt;

loop:	SWAPINIT(a, es);
	swap_cnt = 0;
	if (n < 7) {
		for (pm = (char *)a + es; pm < (char *)a + n * es; pm += es)
			for (pl = pm; pl > (char *)a && cmp(pl - es, pl) > 0;
			     pl -= es)
				swap(pl, pl - es);
		return;
	}
	pm = (char *)a + (n / 2) * es;
	if (n > 7) {
		pl = a;
		pn = (char *)a + (n - 1) * es;
		if (n > 40) {
			d = (n / 8) * es;
			pl = med3(pl, pl + d, pl + 2 * d, cmp);
			pm = med3(pm - d, pm, pm + d, cmp);
			pn = med3(pn - 2 * d, pn - d, pn, cmp);
		}
		pm = med3(pl, pm, pn, cmp);
	}
	swap(a, pm);
	pa = pb = (char *)a + es;

	pc = pd = (char *)a + (n - 1) * es;
	for (;;) {
		while (pb <= pc && (r = cmp(pb, a)) <= 0) {
			if (r == 0) {
				swap_cnt = 1;
				swap(pa, pb);
				pa += es;
			}
			pb += es;
		}
		while (pb <= pc && (r = cmp(pc, a)) >= 0) {
			if (r == 0) {
				swap_cnt = 1;
				swap(pc, pd);
				pd -= es;
			}
			pc -= es;
		}
		if (pb > pc)
			break;
		swap(pb, pc);
		swap_cnt = 1;
		pb += es;
		pc -= es;
	}
	if (swap_cnt == 0) {  /* Switch to insertion sort */
		for (pm = (char *)a + es; pm < (char *)a + n * es; pm += es)
			for (pl = pm; pl > (char *)a && cmp(pl - es, pl) > 0;
			     pl -= es)
				swap(pl, pl - es);
		return;
	}

	pn = (char *)a + n * es;
	r = min(pa - (char *)a, pb - pa);
	vecswap(a, pb - r, r);
	r = min(pd - pc, pn - pd - es);
	vecswap(pb, pn - r, r);
	if ((r = pb - pa) > es)
		swish_qsort(a, r / es, es, cmp);
	if ((r = pd - pc) > es) {
		/* Iterate rather than recurse to save stack space */
		a = pn - r;
		n = r / es;
		goto loop;
	}
/*		qsort(pn - r, r / es, es, cmp);*/
}
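
/*
 * Illustration only: swish_qsort() takes the same four arguments as the
 * standard qsort().  A minimal sketch of a comparator and a call, with the
 * standard qsort() used as a stand-in so the example compiles on its own.
 * compare_ints and sort_ints are hypothetical names.
 */

```c
#include <assert.h>
#include <stdlib.h>

/* Comparator in the form cmp_t expects: negative, zero, or positive.
 * Written as two comparisons so it cannot overflow the way (a - b) can. */
static int compare_ints(const void *x, const void *y)
{
    int a = *(const int *)x;
    int b = *(const int *)y;

    return (a > b) - (a < b);
}

/* Sort an int array in ascending order.  The standard qsort() is used
 * here as a stand-in; swish_qsort() takes the same four arguments. */
static void sort_ints(int *values, size_t count)
{
    assert(values != NULL || count == 0);

    if (count > 1)
        qsort(values, count, sizeof values[0], compare_ints);
}
```

/*
 * Neither routine promises a stable sort, so a comparator that needs a
 * deterministic order for equal keys should break ties itself.
 */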
swish-e-2.4.7/src/filter.h
/*
$Id: filter.h 1841 2006-10-19 18:51:28Z whmoseley $
**


    This file is part of Swish-e.

    Swish-e is free software; you can redistribute it and/or modify
    it under the terms of the GNU General Public License as published by
    the Free Software Foundation; either version 2 of the License, or
    (at your option) any later version.

    Swish-e is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
    GNU General Public License for more details.

    You should have received a copy of the GNU General Public License
    along  with Swish-e; if not, write to the Free Software
    Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
    
    See the COPYING file that accompanies the Swish-e distribution for details
    of the GNU GPL and the special exception available for linking against
    the Swish-e library.
    
** Mon May  9 18:19:34 CDT 2005
** added GPL


**
**
** 2001-02-28 rasc    own module started for filters
** 2001-04-09 rasc    enhancing filters
*/


#ifndef __HasSeenModule_Filter
#define __HasSeenModule_Filter	1


/* Module data and structures */



typedef struct FilterList  /* 2002-03-16 moseley */
{
    struct FilterList   *next;
    char                *prog;      /* program name to run */
    regex_list          *regex;     /* list of regular expressions */
    char                *suffix;    /* or plain text suffix */
    StringList          *options;   /* list of parsed options */

} FilterList;
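
/*
 * Illustration only: each FilterList entry selects files either by regular
 * expression or by a plain-text suffix.  A minimal sketch of the suffix
 * case, assuming a case-sensitive comparison against the end of the file
 * name.  has_suffix is a hypothetical helper; the matching rules in
 * filter.c may differ.
 */

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: does filename end with suffix (case-sensitive)?
 * Returns 1 on a match and 0 otherwise. */
static int has_suffix(const char *filename, const char *suffix)
{
    size_t flen, slen;

    assert(filename != NULL && suffix != NULL);

    flen = strlen(filename);
    slen = strlen(suffix);

    if (slen > flen)
        return 0;

    return strcmp(filename + (flen - slen), suffix) == 0;
}
```

/*
 * Walking the list and returning the first entry whose suffix matches
 * would give earlier configuration lines priority over later ones; that
 * ordering is an assumption of this sketch.
 */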




/* Global module data */

struct MOD_Filter {
   /* public:  */
   /* none */
   
   /* private: don't use outside this module! */
    char   *filterdir;              /* 1998-08-07 rasc */ /* deprecated */
    FilterList *filterlist;  /* 2002-03-16 moseley */

};





/* exported Prototypes */

void initModule_Filter   (SWISH *sw);
void freeModule_Filter   (SWISH *sw);
int  configModule_Filter (SWISH *sw, StringList *sl);

struct FilterList *hasfilter (SWISH *sw, char *filename);
FILE *FilterOpen (FileProp *fprop);
int FilterClose (FileProp *fprop);


#endif


swish-e-2.4.7/src/Makefile.in
# Makefile.in generated by automake 1.9.6 from Makefile.am.
# @configure_input@

# Copyright (C) 1994, 1995, 1996, 1997, 1998, 1999, 2000, 2001, 2002,
# 2003, 2004, 2005  Free Software Foundation, Inc.
# This Makefile.in is free software; the Free Software Foundation
# gives unlimited permission to copy and/or distribute it,
# with or without modifications, as long as this notice is preserved.

# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY, to the extent permitted by law; without
# even the implied warranty of MERCHANTABILITY or FITNESS FOR A
# PARTICULAR PURPOSE.

@SET_MAKE@




srcdir = @srcdir@
top_srcdir = @top_srcdir@
VPATH = @srcdir@
pkgdatadir = $(datadir)/@PACKAGE@
pkglibdir = $(libdir)/@PACKAGE@
pkgincludedir = $(includedir)/@PACKAGE@
top_builddir = ..
am__cd = CDPATH="$${ZSH_VERSION+.}$(PATH_SEPARATOR)" && cd
INSTALL = @INSTALL@
install_sh_DATA = $(install_sh) -c -m 644
install_sh_PROGRAM = $(install_sh) -c
install_sh_SCRIPT = $(install_sh) -c
INSTALL_HEADER = $(INSTALL_DATA)
transform = $(program_transform_name)
NORMAL_INSTALL = :
PRE_INSTALL = :
POST_INSTALL = :
NORMAL_UNINSTALL = :
PRE_UNINSTALL = :
POST_UNINSTALL = :
build_triplet = @build@
host_triplet = @host@
bin_PROGRAMS = swish-e$(EXEEXT)
EXTRA_PROGRAMS = libtest$(EXEEXT)
subdir = src
DIST_COMMON = $(include_HEADERS) $(srcdir)/Makefile.am \
	$(srcdir)/Makefile.in $(srcdir)/acconfig.h.in
ACLOCAL_M4 = $(top_srcdir)/aclocal.m4
am__aclocal_m4_deps = $(top_srcdir)/config/acinclude.m4 \
	$(top_srcdir)/configure.in
am__configure_deps = $(am__aclocal_m4_deps) $(CONFIGURE_DEPENDENCIES) \
	$(ACLOCAL_M4)
mkinstalldirs = $(SHELL) $(top_srcdir)/config/mkinstalldirs
CONFIG_HEADER = acconfig.h
CONFIG_CLEAN_FILES =
am__vpath_adj_setup = srcdirstrip=`echo "$(srcdir)" | sed 's|.|.|g'`;
am__vpath_adj = case $$p in \
    $(srcdir)/*) f=`echo "$$p" | sed "s|^$$srcdirstrip/||"`;; \
    *) f=$$p;; \
  esac;
am__strip_dir = `echo $$p | sed -e 's|^.*/||'`;
am__installdirs = "$(DESTDIR)$(libdir)" "$(DESTDIR)$(bindir)" \
	"$(DESTDIR)$(libexecdir)" "$(DESTDIR)$(includedir)"
libLTLIBRARIES_INSTALL = $(INSTALL)
LTLIBRARIES = $(lib_LTLIBRARIES) $(noinst_LTLIBRARIES)
am__DEPENDENCIES_1 =
am_libswish_e_la_OBJECTS = search.lo swish2.lo swish_words.lo \
	proplimit.lo rank.lo db_read.lo result_sort.lo hash.lo \
	compress.lo db_native.lo ramdisk.lo check.lo error.lo list.lo \
	mem.lo swstring.lo docprop.lo metanames.lo headers.lo \
	swish_qsort.lo date_time.lo double_metaphone.lo stemmer.lo \
	soundex.lo
libswish_e_la_OBJECTS = $(am_libswish_e_la_OBJECTS)
am_libswishindex_la_OBJECTS = fs.lo http.lo httpserver.lo extprog.lo \
	bash.lo methods.lo html.lo txt.lo xml.lo entities.lo index.lo \
	merge.lo pre_sort.lo file.lo filter.lo parse_conffile.lo \
	swregex.lo db_write.lo docprop_write.lo getruntime.lo
libswishindex_la_OBJECTS = $(am_libswishindex_la_OBJECTS)
binPROGRAMS_INSTALL = $(INSTALL_PROGRAM)
PROGRAMS = $(bin_PROGRAMS)
am_libtest_OBJECTS = libtest.$(OBJEXT)
libtest_OBJECTS = $(am_libtest_OBJECTS)
libtest_DEPENDENCIES = libswish-e.la
am_swish_e_OBJECTS = swish.$(OBJEXT) keychar_out.$(OBJEXT) \
	dump.$(OBJEXT) result_output.$(OBJEXT)
swish_e_OBJECTS = $(am_swish_e_OBJECTS)
swish_e_DEPENDENCIES = libswishindex.la libswish-e.la
libexecSCRIPT_INSTALL = $(INSTALL_SCRIPT)
SCRIPTS = $(libexec_SCRIPTS)
DEFAULT_INCLUDES = -I. -I$(srcdir) -I.
depcomp = $(SHELL) $(top_srcdir)/config/depcomp
am__depfiles_maybe = depfiles
COMPILE = $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) \
	$(CPPFLAGS) $(AM_CFLAGS) $(CFLAGS)
LTCOMPILE = $(LIBTOOL) --tag=CC --mode=compile $(CC) $(DEFS) \
	$(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) \
	$(AM_CFLAGS) $(CFLAGS)
CCLD = $(CC)
LINK = $(LIBTOOL) --tag=CC --mode=link $(CCLD) $(AM_CFLAGS) $(CFLAGS) \
	$(AM_LDFLAGS) $(LDFLAGS) -o $@
SOURCES = $(libswish_e_la_SOURCES) $(EXTRA_libswish_e_la_SOURCES) \
	$(libswishindex_la_SOURCES) $(EXTRA_libswishindex_la_SOURCES) \
	$(libtest_SOURCES) $(swish_e_SOURCES)
DIST_SOURCES = $(libswish_e_la_SOURCES) $(EXTRA_libswish_e_la_SOURCES) \
	$(libswishindex_la_SOURCES) $(EXTRA_libswishindex_la_SOURCES) \
	$(libtest_SOURCES) $(swish_e_SOURCES)
RECURSIVE_TARGETS = all-recursive check-recursive dvi-recursive \
	html-recursive info-recursive install-data-recursive \
	install-exec-recursive install-info-recursive \
	install-recursive installcheck-recursive installdirs-recursive \
	pdf-recursive ps-recursive uninstall-info-recursive \
	uninstall-recursive
includeHEADERS_INSTALL = $(INSTALL_HEADER)
HEADERS = $(include_HEADERS)
ETAGS = etags
CTAGS = ctags
DIST_SUBDIRS = $(SUBDIRS)
DISTFILES = $(DIST_COMMON) $(DIST_SOURCES) $(TEXINFOS) $(EXTRA_DIST)
ACLOCAL = @ACLOCAL@
ALLOCA = @ALLOCA@
AMDEP_FALSE = @AMDEP_FALSE@
AMDEP_TRUE = @AMDEP_TRUE@
AMTAR = @AMTAR@
AR = @AR@
AS = @AS@
AUTOCONF = @AUTOCONF@
AUTOHEADER = @AUTOHEADER@
AUTOMAKE = @AUTOMAKE@
AWK = @AWK@
BTREE_OBJS = @BTREE_OBJS@
BUILDDOCS_FALSE = @BUILDDOCS_FALSE@
BUILDDOCS_TRUE = @BUILDDOCS_TRUE@
CC = @CC@
CCDEPMODE = @CCDEPMODE@
CFLAGS = @CFLAGS@
CPP = @CPP@
CPPFLAGS = @CPPFLAGS@
CXX = @CXX@
CXXCPP = @CXXCPP@
CXXDEPMODE = @CXXDEPMODE@
CXXFLAGS = @CXXFLAGS@
CYGPATH_W = @CYGPATH_W@
DEFS = @DEFS@
DEPDIR = @DEPDIR@
DLLTOOL = @DLLTOOL@
ECHO = @ECHO@
ECHO_C = @ECHO_C@
ECHO_N = @ECHO_N@
ECHO_T = @ECHO_T@
EGREP = @EGREP@
EXEEXT = @EXEEXT@
F77 = @F77@
FFLAGS = @FFLAGS@
INSTALLDOCS_FALSE = @INSTALLDOCS_FALSE@
INSTALLDOCS_TRUE = @INSTALLDOCS_TRUE@
INSTALL_DATA = @INSTALL_DATA@
INSTALL_PROGRAM = @INSTALL_PROGRAM@
INSTALL_SCRIPT = @INSTALL_SCRIPT@
INSTALL_STRIP_PROGRAM = @INSTALL_STRIP_PROGRAM@
LARGEFILES_MACROS = @LARGEFILES_MACROS@
LDFLAGS = @LDFLAGS@
LIBOBJS = @LIBOBJS@
LIBS = @LIBS@
LIBTOOL = @LIBTOOL@
LIBXML2_CFLAGS = @LIBXML2_CFLAGS@
LIBXML2_LIB = @LIBXML2_LIB@
LIBXML2_OBJS = @LIBXML2_OBJS@
LIBXML_REQUIRED_VERSION = @LIBXML_REQUIRED_VERSION@
LN_S = @LN_S@
LTLIBOBJS = @LTLIBOBJS@
MAINT = @MAINT@
MAINTAINER_MODE_FALSE = @MAINTAINER_MODE_FALSE@
MAINTAINER_MODE_TRUE = @MAINTAINER_MODE_TRUE@
MAKEINFO = @MAKEINFO@
OBJDUMP = @OBJDUMP@
OBJEXT = @OBJEXT@
PACKAGE = @PACKAGE@
PACKAGE_BUGREPORT = @PACKAGE_BUGREPORT@
PACKAGE_NAME = @PACKAGE_NAME@
PACKAGE_STRING = @PACKAGE_STRING@
PACKAGE_TARNAME = @PACKAGE_TARNAME@
PACKAGE_VERSION = @PACKAGE_VERSION@
PATH_SEPARATOR = @PATH_SEPARATOR@
PCRE_CFLAGS = @PCRE_CFLAGS@
PCRE_CONFIG = @PCRE_CONFIG@
PCRE_LIBS = @PCRE_LIBS@
PCRE_REQUIRED_VERSION = @PCRE_REQUIRED_VERSION@
PERL = @PERL@
POD2MAN = @POD2MAN@
RANLIB = @RANLIB@
SET_MAKE = @SET_MAKE@
SHELL = @SHELL@
STRIP = @STRIP@
SWISH_WEB = @SWISH_WEB@
VERSION = @VERSION@
XML2_CONFIG = @XML2_CONFIG@
Z_CFLAGS = @Z_CFLAGS@
Z_LIBS = @Z_LIBS@
ac_ct_AR = @ac_ct_AR@
ac_ct_AS = @ac_ct_AS@
ac_ct_CC = @ac_ct_CC@
ac_ct_CXX = @ac_ct_CXX@
ac_ct_DLLTOOL = @ac_ct_DLLTOOL@
ac_ct_F77 = @ac_ct_F77@
ac_ct_OBJDUMP = @ac_ct_OBJDUMP@
ac_ct_RANLIB = @ac_ct_RANLIB@
ac_ct_STRIP = @ac_ct_STRIP@
am__fastdepCC_FALSE = @am__fastdepCC_FALSE@
am__fastdepCC_TRUE = @am__fastdepCC_TRUE@
am__fastdepCXX_FALSE = @am__fastdepCXX_FALSE@
am__fastdepCXX_TRUE = @am__fastdepCXX_TRUE@
am__include = @am__include@
am__leading_dot = @am__leading_dot@
am__quote = @am__quote@
am__tar = @am__tar@
am__untar = @am__untar@
bindir = @bindir@
build = @build@
build_alias = @build_alias@
build_cpu = @build_cpu@
build_os = @build_os@
build_vendor = @build_vendor@
datadir = @datadir@
exec_prefix = @exec_prefix@
host = @host@
host_alias = @host_alias@
host_cpu = @host_cpu@
host_os = @host_os@
host_vendor = @host_vendor@
includedir = @includedir@
infodir = @infodir@
install_sh = @install_sh@
libdir = @libdir@
libexecdir = @libexecdir@
localstatedir = @localstatedir@
mandir = @mandir@
mkdir_p = @mkdir_p@
oldincludedir = @oldincludedir@
prefix = @prefix@
program_transform_name = @program_transform_name@
sbindir = @sbindir@
sharedstatedir = @sharedstatedir@
sysconfdir = @sysconfdir@
target_alias = @target_alias@
SUBDIRS = expat replace snowball

# Using AM_CPPFLAGS instead of per-target flags means object names
# are not renamed.  With per-target _CPPFLAGS, configure.in would have
# to use the target-prefixed names for all optional objects passed in
# (e.g. $BTREE_OBJS).
AM_CPPFLAGS = -Dlibexecdir=\"${libexecdir}\"  \
		-DPATH_SEPARATOR=\"${PATH_SEPARATOR}\" \
		$(Z_CFLAGS) $(PCRE_CFLAGS) $(LIBXML2_CFLAGS) -Ireplace


# Until we can figure out how to use AM_INIT_AUTOMAKE([-Wall])
AM_CFLAGS = -Wall @LARGEFILES_MACROS@
swish_e_SOURCES = swish.c swish.h  keychar_out.c keychar_out.h dump.c dump.h result_output.c result_output.h
swish_e_LDADD = libswishindex.la libswish-e.la
libtest_SOURCES = libtest.c 
libtest_LDADD = libswish-e.la
libtest_LDFLAGS = -static
lib_LTLIBRARIES = libswish-e.la
libswish_e_la_LDFLAGS = -no-undefined -version-info 2:0:0 $(Z_LIBS) $(PCRE_LIBS)
libswish_e_la_LIBADD = $(BTREE_OBJS) replace/libreplace.la snowball/libsnowball.la
libswish_e_la_DEPENDENCIES = $(libswish_e_la_LIBADD)
libswish_e_la_SOURCES = \
	config.h \
	search.c search.h \
	swish2.c  \
	swish_words.c swish_words.h \
	proplimit.c proplimit.h \
        rank.c rank.h \
	db_read.c db.h \
	result_sort.c result_sort.h \
	hash.c hash.h \
	compress.c compress.h \
	db_native.c db_native.h \
	ramdisk.c ramdisk.h \
	check.c check.h \
	error.c error.h \
	list.c list.h \
	mem.c mem.h sys.h\
	swstring.c swstring.h \
	docprop.c docprop.h \
	metanames.c metanames.h \
	headers.c headers.h \
	swish_qsort.c swish_qsort.h \
	date_time.c date_time.h \
	double_metaphone.c double_metaphone.h \
	stemmer.c stemmer.h \
	soundex.c soundex.h

EXTRA_libswish_e_la_SOURCES = \
	btree.c btree.h \
	array.c array.h \
	worddata.c worddata.h \
	fhash.c fhash.h

noinst_LTLIBRARIES = libswishindex.la
libswishindex_la_LIBADD = expat/libswexpat.la $(LIBXML2_OBJS) snowball/libsnowball.la
libswishindex_la_LDFLAGS = $(LIBXML2_LIB) $(Z_LIBS) $(PCRE_LIBS) 
libswishindex_la_DEPENDENCIES = $(libswishindex_la_LIBADD)
EXTRA_libswishindex_la_SOURCES = parser.c parser.h
libswishindex_la_SOURCES = \
	fs.c fs.h \
	http.c http.h \
	httpserver.c httpserver.h \
	extprog.c extprog.h \
        bash.c bash.h \
	methods.c \
	html.c html.h \
	txt.c txt.h \
	xml.c xml.h \
	entities.c entities.h \
	index.c index.h \
	merge.c merge.h \
	pre_sort.c \
	file.c file.h \
        filter.c filter.h \
	parse_conffile.c parse_conffile.h \
	swregex.c swregex.h \
	db_write.c  \
	docprop_write.c \
        getruntime.c getruntime.h

include_HEADERS = swish-e.h
libexec_SCRIPTS = swishspider
EXTRA_DIST = swishspider
all: acconfig.h
	$(MAKE) $(AM_MAKEFLAGS) all-recursive

.SUFFIXES:
.SUFFIXES: .c .lo .o .obj
$(srcdir)/Makefile.in: @MAINTAINER_MODE_TRUE@ $(srcdir)/Makefile.am  $(am__configure_deps)
	@for dep in $?; do \
	  case '$(am__configure_deps)' in \
	    *$$dep*) \
	      cd $(top_builddir) && $(MAKE) $(AM_MAKEFLAGS) am--refresh \
		&& exit 0; \
	      exit 1;; \
	  esac; \
	done; \
	echo ' cd $(top_srcdir) && $(AUTOMAKE) --foreign  src/Makefile'; \
	cd $(top_srcdir) && \
	  $(AUTOMAKE) --foreign  src/Makefile
.PRECIOUS: Makefile
Makefile: $(srcdir)/Makefile.in $(top_builddir)/config.status
	@case '$?' in \
	  *config.status*) \
	    cd $(top_builddir) && $(MAKE) $(AM_MAKEFLAGS) am--refresh;; \
	  *) \
	    echo ' cd $(top_builddir) && $(SHELL) ./config.status $(subdir)/$@ $(am__depfiles_maybe)'; \
	    cd $(top_builddir) && $(SHELL) ./config.status $(subdir)/$@ $(am__depfiles_maybe);; \
	esac;

$(top_builddir)/config.status: $(top_srcdir)/configure $(CONFIG_STATUS_DEPENDENCIES)
	cd $(top_builddir) && $(MAKE) $(AM_MAKEFLAGS) am--refresh

$(top_srcdir)/configure: @MAINTAINER_MODE_TRUE@ $(am__configure_deps)
	cd $(top_builddir) && $(MAKE) $(AM_MAKEFLAGS) am--refresh
$(ACLOCAL_M4): @MAINTAINER_MODE_TRUE@ $(am__aclocal_m4_deps)
	cd $(top_builddir) && $(MAKE) $(AM_MAKEFLAGS) am--refresh

acconfig.h: stamp-h1
	@if test ! -f $@; then \
	  rm -f stamp-h1; \
	  $(MAKE) stamp-h1; \
	else :; fi

stamp-h1: $(srcdir)/acconfig.h.in $(top_builddir)/config.status
	@rm -f stamp-h1
	cd $(top_builddir) && $(SHELL) ./config.status src/acconfig.h
$(srcdir)/acconfig.h.in: @MAINTAINER_MODE_TRUE@ $(am__configure_deps) 
	cd $(top_srcdir) && $(AUTOHEADER)
	rm -f stamp-h1
	touch $@

distclean-hdr:
	-rm -f acconfig.h stamp-h1
install-libLTLIBRARIES: $(lib_LTLIBRARIES)
	@$(NORMAL_INSTALL)
	test -z "$(libdir)" || $(mkdir_p) "$(DESTDIR)$(libdir)"
	@list='$(lib_LTLIBRARIES)'; for p in $$list; do \
	  if test -f $$p; then \
	    f=$(am__strip_dir) \
	    echo " $(LIBTOOL) --mode=install $(libLTLIBRARIES_INSTALL) $(INSTALL_STRIP_FLAG) '$$p' '$(DESTDIR)$(libdir)/$$f'"; \
	    $(LIBTOOL) --mode=install $(libLTLIBRARIES_INSTALL) $(INSTALL_STRIP_FLAG) "$$p" "$(DESTDIR)$(libdir)/$$f"; \
	  else :; fi; \
	done

uninstall-libLTLIBRARIES:
	@$(NORMAL_UNINSTALL)
	@set -x; list='$(lib_LTLIBRARIES)'; for p in $$list; do \
	  p=$(am__strip_dir) \
	  echo " $(LIBTOOL) --mode=uninstall rm -f '$(DESTDIR)$(libdir)/$$p'"; \
	  $(LIBTOOL) --mode=uninstall rm -f "$(DESTDIR)$(libdir)/$$p"; \
	done

clean-libLTLIBRARIES:
	-test -z "$(lib_LTLIBRARIES)" || rm -f $(lib_LTLIBRARIES)
	@list='$(lib_LTLIBRARIES)'; for p in $$list; do \
	  dir="`echo $$p | sed -e 's|/[^/]*$$||'`"; \
	  test "$$dir" != "$$p" || dir=.; \
	  echo "rm -f \"$${dir}/so_locations\""; \
	  rm -f "$${dir}/so_locations"; \
	done

clean-noinstLTLIBRARIES:
	-test -z "$(noinst_LTLIBRARIES)" || rm -f $(noinst_LTLIBRARIES)
	@list='$(noinst_LTLIBRARIES)'; for p in $$list; do \
	  dir="`echo $$p | sed -e 's|/[^/]*$$||'`"; \
	  test "$$dir" != "$$p" || dir=.; \
	  echo "rm -f \"$${dir}/so_locations\""; \
	  rm -f "$${dir}/so_locations"; \
	done
libswish-e.la: $(libswish_e_la_OBJECTS) $(libswish_e_la_DEPENDENCIES) 
	$(LINK) -rpath $(libdir) $(libswish_e_la_LDFLAGS) $(libswish_e_la_OBJECTS) $(libswish_e_la_LIBADD) $(LIBS)
libswishindex.la: $(libswishindex_la_OBJECTS) $(libswishindex_la_DEPENDENCIES) 
	$(LINK)  $(libswishindex_la_LDFLAGS) $(libswishindex_la_OBJECTS) $(libswishindex_la_LIBADD) $(LIBS)
install-binPROGRAMS: $(bin_PROGRAMS)
	@$(NORMAL_INSTALL)
	test -z "$(bindir)" || $(mkdir_p) "$(DESTDIR)$(bindir)"
	@list='$(bin_PROGRAMS)'; for p in $$list; do \
	  p1=`echo $$p|sed 's/$(EXEEXT)$$//'`; \
	  if test -f $$p \
	     || test -f $$p1 \
	  ; then \
	    f=`echo "$$p1" | sed 's,^.*/,,;$(transform);s/$$/$(EXEEXT)/'`; \
	   echo " $(INSTALL_PROGRAM_ENV) $(LIBTOOL) --mode=install $(binPROGRAMS_INSTALL) '$$p' '$(DESTDIR)$(bindir)/$$f'"; \
	   $(INSTALL_PROGRAM_ENV) $(LIBTOOL) --mode=install $(binPROGRAMS_INSTALL) "$$p" "$(DESTDIR)$(bindir)/$$f" || exit 1; \
	  else :; fi; \
	done

uninstall-binPROGRAMS:
	@$(NORMAL_UNINSTALL)
	@list='$(bin_PROGRAMS)'; for p in $$list; do \
	  f=`echo "$$p" | sed 's,^.*/,,;s/$(EXEEXT)$$//;$(transform);s/$$/$(EXEEXT)/'`; \
	  echo " rm -f '$(DESTDIR)$(bindir)/$$f'"; \
	  rm -f "$(DESTDIR)$(bindir)/$$f"; \
	done

clean-binPROGRAMS:
	@list='$(bin_PROGRAMS)'; for p in $$list; do \
	  f=`echo $$p|sed 's/$(EXEEXT)$$//'`; \
	  echo " rm -f $$p $$f"; \
	  rm -f $$p $$f ; \
	done
libtest$(EXEEXT): $(libtest_OBJECTS) $(libtest_DEPENDENCIES) 
	@rm -f libtest$(EXEEXT)
	$(LINK) $(libtest_LDFLAGS) $(libtest_OBJECTS) $(libtest_LDADD) $(LIBS)
swish-e$(EXEEXT): $(swish_e_OBJECTS) $(swish_e_DEPENDENCIES) 
	@rm -f swish-e$(EXEEXT)
	$(LINK) $(swish_e_LDFLAGS) $(swish_e_OBJECTS) $(swish_e_LDADD) $(LIBS)
install-libexecSCRIPTS: $(libexec_SCRIPTS)
	@$(NORMAL_INSTALL)
	test -z "$(libexecdir)" || $(mkdir_p) "$(DESTDIR)$(libexecdir)"
	@list='$(libexec_SCRIPTS)'; for p in $$list; do \
	  if test -f "$$p"; then d=; else d="$(srcdir)/"; fi; \
	  if test -f $$d$$p; then \
	    f=`echo "$$p" | sed 's|^.*/||;$(transform)'`; \
	    echo " $(libexecSCRIPT_INSTALL) '$$d$$p' '$(DESTDIR)$(libexecdir)/$$f'"; \
	    $(libexecSCRIPT_INSTALL) "$$d$$p" "$(DESTDIR)$(libexecdir)/$$f"; \
	  else :; fi; \
	done

uninstall-libexecSCRIPTS:
	@$(NORMAL_UNINSTALL)
	@list='$(libexec_SCRIPTS)'; for p in $$list; do \
	  f=`echo "$$p" | sed 's|^.*/||;$(transform)'`; \
	  echo " rm -f '$(DESTDIR)$(libexecdir)/$$f'"; \
	  rm -f "$(DESTDIR)$(libexecdir)/$$f"; \
	done

mostlyclean-compile:
	-rm -f *.$(OBJEXT)

distclean-compile:
	-rm -f *.tab.c

@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/array.Plo@am__quote@
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/bash.Plo@am__quote@
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/btree.Plo@am__quote@
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/check.Plo@am__quote@
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/compress.Plo@am__quote@
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/date_time.Plo@am__quote@
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/db_native.Plo@am__quote@
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/db_read.Plo@am__quote@
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/db_write.Plo@am__quote@
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/docprop.Plo@am__quote@
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/docprop_write.Plo@am__quote@
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/double_metaphone.Plo@am__quote@
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/dump.Po@am__quote@
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/entities.Plo@am__quote@
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/error.Plo@am__quote@
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/extprog.Plo@am__quote@
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/fhash.Plo@am__quote@
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/file.Plo@am__quote@
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/filter.Plo@am__quote@
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/fs.Plo@am__quote@
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/getruntime.Plo@am__quote@
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/hash.Plo@am__quote@
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/headers.Plo@am__quote@
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/html.Plo@am__quote@
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/http.Plo@am__quote@
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/httpserver.Plo@am__quote@
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/index.Plo@am__quote@
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/keychar_out.Po@am__quote@
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/libtest.Po@am__quote@
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/list.Plo@am__quote@
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/mem.Plo@am__quote@
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/merge.Plo@am__quote@
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/metanames.Plo@am__quote@
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/methods.Plo@am__quote@
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/parse_conffile.Plo@am__quote@
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/parser.Plo@am__quote@
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/pre_sort.Plo@am__quote@
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/proplimit.Plo@am__quote@
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/ramdisk.Plo@am__quote@
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/rank.Plo@am__quote@
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/result_output.Po@am__quote@
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/result_sort.Plo@am__quote@
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/search.Plo@am__quote@
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/soundex.Plo@am__quote@
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/stemmer.Plo@am__quote@
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/swish.Po@am__quote@
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/swish2.Plo@am__quote@
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/swish_qsort.Plo@am__quote@
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/swish_words.Plo@am__quote@
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/swregex.Plo@am__quote@
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/swstring.Plo@am__quote@
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/txt.Plo@am__quote@
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/worddata.Plo@am__quote@
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/xml.Plo@am__quote@

.c.o:
@am__fastdepCC_TRUE@	if $(COMPILE) -MT $@ -MD -MP -MF "$(DEPDIR)/$*.Tpo" -c -o $@ $<; \
@am__fastdepCC_TRUE@	then mv -f "$(DEPDIR)/$*.Tpo" "$(DEPDIR)/$*.Po"; else rm -f "$(DEPDIR)/$*.Tpo"; exit 1; fi
@AMDEP_TRUE@@am__fastdepCC_FALSE@	source='$<' object='$@' libtool=no @AMDEPBACKSLASH@
@AMDEP_TRUE@@am__fastdepCC_FALSE@	DEPDIR=$(DEPDIR) $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@
@am__fastdepCC_FALSE@	$(COMPILE) -c $<

.c.obj:
@am__fastdepCC_TRUE@	if $(COMPILE) -MT $@ -MD -MP -MF "$(DEPDIR)/$*.Tpo" -c -o $@ `$(CYGPATH_W) '$<'`; \
@am__fastdepCC_TRUE@	then mv -f "$(DEPDIR)/$*.Tpo" "$(DEPDIR)/$*.Po"; else rm -f "$(DEPDIR)/$*.Tpo"; exit 1; fi
@AMDEP_TRUE@@am__fastdepCC_FALSE@	source='$<' object='$@' libtool=no @AMDEPBACKSLASH@
@AMDEP_TRUE@@am__fastdepCC_FALSE@	DEPDIR=$(DEPDIR) $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@
@am__fastdepCC_FALSE@	$(COMPILE) -c `$(CYGPATH_W) '$<'`

.c.lo:
@am__fastdepCC_TRUE@	if $(LTCOMPILE) -MT $@ -MD -MP -MF "$(DEPDIR)/$*.Tpo" -c -o $@ $<; \
@am__fastdepCC_TRUE@	then mv -f "$(DEPDIR)/$*.Tpo" "$(DEPDIR)/$*.Plo"; else rm -f "$(DEPDIR)/$*.Tpo"; exit 1; fi
@AMDEP_TRUE@@am__fastdepCC_FALSE@	source='$<' object='$@' libtool=yes @AMDEPBACKSLASH@
@AMDEP_TRUE@@am__fastdepCC_FALSE@	DEPDIR=$(DEPDIR) $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@
@am__fastdepCC_FALSE@	$(LTCOMPILE) -c -o $@ $<

mostlyclean-libtool:
	-rm -f *.lo

clean-libtool:
	-rm -rf .libs _libs

distclean-libtool:
	-rm -f libtool
uninstall-info-am:
install-includeHEADERS: $(include_HEADERS)
	@$(NORMAL_INSTALL)
	test -z "$(includedir)" || $(mkdir_p) "$(DESTDIR)$(includedir)"
	@list='$(include_HEADERS)'; for p in $$list; do \
	  if test -f "$$p"; then d=; else d="$(srcdir)/"; fi; \
	  f=$(am__strip_dir) \
	  echo " $(includeHEADERS_INSTALL) '$$d$$p' '$(DESTDIR)$(includedir)/$$f'"; \
	  $(includeHEADERS_INSTALL) "$$d$$p" "$(DESTDIR)$(includedir)/$$f"; \
	done

uninstall-includeHEADERS:
	@$(NORMAL_UNINSTALL)
	@list='$(include_HEADERS)'; for p in $$list; do \
	  f=$(am__strip_dir) \
	  echo " rm -f '$(DESTDIR)$(includedir)/$$f'"; \
	  rm -f "$(DESTDIR)$(includedir)/$$f"; \
	done

# This directory's subdirectories are mostly independent; you can cd
# into them and run `make' without going through this Makefile.
# To change the values of `make' variables: instead of editing Makefiles,
# (1) if the variable is set in `config.status', edit `config.status'
#     (which will cause the Makefiles to be regenerated when you run `make');
# (2) otherwise, pass the desired values on the `make' command line.
$(RECURSIVE_TARGETS):
	@failcom='exit 1'; \
	for f in x $$MAKEFLAGS; do \
	  case $$f in \
	    *=* | --[!k]*);; \
	    *k*) failcom='fail=yes';; \
	  esac; \
	done; \
	dot_seen=no; \
	target=`echo $@ | sed s/-recursive//`; \
	list='$(SUBDIRS)'; for subdir in $$list; do \
	  echo "Making $$target in $$subdir"; \
	  if test "$$subdir" = "."; then \
	    dot_seen=yes; \
	    local_target="$$target-am"; \
	  else \
	    local_target="$$target"; \
	  fi; \
	  (cd $$subdir && $(MAKE) $(AM_MAKEFLAGS) $$local_target) \
	  || eval $$failcom; \
	done; \
	if test "$$dot_seen" = "no"; then \
	  $(MAKE) $(AM_MAKEFLAGS) "$$target-am" || exit 1; \
	fi; test -z "$$fail"

mostlyclean-recursive clean-recursive distclean-recursive \
maintainer-clean-recursive:
	@failcom='exit 1'; \
	for f in x $$MAKEFLAGS; do \
	  case $$f in \
	    *=* | --[!k]*);; \
	    *k*) failcom='fail=yes';; \
	  esac; \
	done; \
	dot_seen=no; \
	case "$@" in \
	  distclean-* | maintainer-clean-*) list='$(DIST_SUBDIRS)' ;; \
	  *) list='$(SUBDIRS)' ;; \
	esac; \
	rev=''; for subdir in $$list; do \
	  if test "$$subdir" = "."; then :; else \
	    rev="$$subdir $$rev"; \
	  fi; \
	done; \
	rev="$$rev ."; \
	target=`echo $@ | sed s/-recursive//`; \
	for subdir in $$rev; do \
	  echo "Making $$target in $$subdir"; \
	  if test "$$subdir" = "."; then \
	    local_target="$$target-am"; \
	  else \
	    local_target="$$target"; \
	  fi; \
	  (cd $$subdir && $(MAKE) $(AM_MAKEFLAGS) $$local_target) \
	  || eval $$failcom; \
	done && test -z "$$fail"
tags-recursive:
	list='$(SUBDIRS)'; for subdir in $$list; do \
	  test "$$subdir" = . || (cd $$subdir && $(MAKE) $(AM_MAKEFLAGS) tags); \
	done
ctags-recursive:
	list='$(SUBDIRS)'; for subdir in $$list; do \
	  test "$$subdir" = . || (cd $$subdir && $(MAKE) $(AM_MAKEFLAGS) ctags); \
	done

ID: $(HEADERS) $(SOURCES) $(LISP) $(TAGS_FILES)
	list='$(SOURCES) $(HEADERS) $(LISP) $(TAGS_FILES)'; \
	unique=`for i in $$list; do \
	    if test -f "$$i"; then echo $$i; else echo $(srcdir)/$$i; fi; \
	  done | \
	  $(AWK) '    { files[$$0] = 1; } \
	       END { for (i in files) print i; }'`; \
	mkid -fID $$unique
tags: TAGS

TAGS: tags-recursive $(HEADERS) $(SOURCES) acconfig.h.in $(TAGS_DEPENDENCIES) \
		$(TAGS_FILES) $(LISP)
	tags=; \
	here=`pwd`; \
	if ($(ETAGS) --etags-include --version) >/dev/null 2>&1; then \
	  include_option=--etags-include; \
	  empty_fix=.; \
	else \
	  include_option=--include; \
	  empty_fix=; \
	fi; \
	list='$(SUBDIRS)'; for subdir in $$list; do \
	  if test "$$subdir" = .; then :; else \
	    test ! -f $$subdir/TAGS || \
	      tags="$$tags $$include_option=$$here/$$subdir/TAGS"; \
	  fi; \
	done; \
	list='$(SOURCES) $(HEADERS) acconfig.h.in $(LISP) $(TAGS_FILES)'; \
	unique=`for i in $$list; do \
	    if test -f "$$i"; then echo $$i; else echo $(srcdir)/$$i; fi; \
	  done | \
	  $(AWK) '    { files[$$0] = 1; } \
	       END { for (i in files) print i; }'`; \
	if test -z "$(ETAGS_ARGS)$$tags$$unique"; then :; else \
	  test -n "$$unique" || unique=$$empty_fix; \
	  $(ETAGS) $(ETAGSFLAGS) $(AM_ETAGSFLAGS) $(ETAGS_ARGS) \
	    $$tags $$unique; \
	fi
ctags: CTAGS
CTAGS: ctags-recursive $(HEADERS) $(SOURCES) acconfig.h.in $(TAGS_DEPENDENCIES) \
		$(TAGS_FILES) $(LISP)
	tags=; \
	here=`pwd`; \
	list='$(SOURCES) $(HEADERS) acconfig.h.in $(LISP) $(TAGS_FILES)'; \
	unique=`for i in $$list; do \
	    if test -f "$$i"; then echo $$i; else echo $(srcdir)/$$i; fi; \
	  done | \
	  $(AWK) '    { files[$$0] = 1; } \
	       END { for (i in files) print i; }'`; \
	test -z "$(CTAGS_ARGS)$$tags$$unique" \
	  || $(CTAGS) $(CTAGSFLAGS) $(AM_CTAGSFLAGS) $(CTAGS_ARGS) \
	     $$tags $$unique

GTAGS:
	here=`$(am__cd) $(top_builddir) && pwd` \
	  && cd $(top_srcdir) \
	  && gtags -i $(GTAGS_ARGS) $$here

distclean-tags:
	-rm -f TAGS ID GTAGS GRTAGS GSYMS GPATH tags

distdir: $(DISTFILES)
	@srcdirstrip=`echo "$(srcdir)" | sed 's|.|.|g'`; \
	topsrcdirstrip=`echo "$(top_srcdir)" | sed 's|.|.|g'`; \
	list='$(DISTFILES)'; for file in $$list; do \
	  case $$file in \
	    $(srcdir)/*) file=`echo "$$file" | sed "s|^$$srcdirstrip/||"`;; \
	    $(top_srcdir)/*) file=`echo "$$file" | sed "s|^$$topsrcdirstrip/|$(top_builddir)/|"`;; \
	  esac; \
	  if test -f $$file || test -d $$file; then d=.; else d=$(srcdir); fi; \
	  dir=`echo "$$file" | sed -e 's,/[^/]*$$,,'`; \
	  if test "$$dir" != "$$file" && test "$$dir" != "."; then \
	    dir="/$$dir"; \
	    $(mkdir_p) "$(distdir)$$dir"; \
	  else \
	    dir=''; \
	  fi; \
	  if test -d $$d/$$file; then \
	    if test -d $(srcdir)/$$file && test $$d != $(srcdir); then \
	      cp -pR $(srcdir)/$$file $(distdir)$$dir || exit 1; \
	    fi; \
	    cp -pR $$d/$$file $(distdir)$$dir || exit 1; \
	  else \
	    test -f $(distdir)/$$file \
	    || cp -p $$d/$$file $(distdir)/$$file \
	    || exit 1; \
	  fi; \
	done
	list='$(DIST_SUBDIRS)'; for subdir in $$list; do \
	  if test "$$subdir" = .; then :; else \
	    test -d "$(distdir)/$$subdir" \
	    || $(mkdir_p) "$(distdir)/$$subdir" \
	    || exit 1; \
	    distdir=`$(am__cd) $(distdir) && pwd`; \
	    top_distdir=`$(am__cd) $(top_distdir) && pwd`; \
	    (cd $$subdir && \
	      $(MAKE) $(AM_MAKEFLAGS) \
	        top_distdir="$$top_distdir" \
	        distdir="$$distdir/$$subdir" \
	        distdir) \
	      || exit 1; \
	  fi; \
	done
check-am: all-am
check: check-recursive
all-am: Makefile $(LTLIBRARIES) $(PROGRAMS) $(SCRIPTS) $(HEADERS) \
		acconfig.h
install-binPROGRAMS: install-libLTLIBRARIES

installdirs: installdirs-recursive
installdirs-am:
	for dir in "$(DESTDIR)$(libdir)" "$(DESTDIR)$(bindir)" "$(DESTDIR)$(libexecdir)" "$(DESTDIR)$(includedir)"; do \
	  test -z "$$dir" || $(mkdir_p) "$$dir"; \
	done
install: install-recursive
install-exec: install-exec-recursive
install-data: install-data-recursive
uninstall: uninstall-recursive

install-am: all-am
	@$(MAKE) $(AM_MAKEFLAGS) install-exec-am install-data-am

installcheck: installcheck-recursive
install-strip:
	$(MAKE) $(AM_MAKEFLAGS) INSTALL_PROGRAM="$(INSTALL_STRIP_PROGRAM)" \
	  install_sh_PROGRAM="$(INSTALL_STRIP_PROGRAM)" INSTALL_STRIP_FLAG=-s \
	  `test -z '$(STRIP)' || \
	    echo "INSTALL_PROGRAM_ENV=STRIPPROG='$(STRIP)'"` install
mostlyclean-generic:

clean-generic:

distclean-generic:
	-test -z "$(CONFIG_CLEAN_FILES)" || rm -f $(CONFIG_CLEAN_FILES)

maintainer-clean-generic:
	@echo "This command is intended for maintainers to use"
	@echo "it deletes files that may require special tools to rebuild."
clean: clean-recursive

clean-am: clean-binPROGRAMS clean-generic clean-libLTLIBRARIES \
	clean-libtool clean-noinstLTLIBRARIES mostlyclean-am

distclean: distclean-recursive
	-rm -rf ./$(DEPDIR)
	-rm -f Makefile
distclean-am: clean-am distclean-compile distclean-generic \
	distclean-hdr distclean-libtool distclean-tags

dvi: dvi-recursive

dvi-am:

html: html-recursive

info: info-recursive

info-am:

install-data-am: install-includeHEADERS

install-exec-am: install-binPROGRAMS install-libLTLIBRARIES \
	install-libexecSCRIPTS

install-info: install-info-recursive

install-man:

installcheck-am:

maintainer-clean: maintainer-clean-recursive
	-rm -rf ./$(DEPDIR)
	-rm -f Makefile
maintainer-clean-am: distclean-am maintainer-clean-generic

mostlyclean: mostlyclean-recursive

mostlyclean-am: mostlyclean-compile mostlyclean-generic \
	mostlyclean-libtool

pdf: pdf-recursive

pdf-am:

ps: ps-recursive

ps-am:

uninstall-am: uninstall-binPROGRAMS uninstall-includeHEADERS \
	uninstall-info-am uninstall-libLTLIBRARIES \
	uninstall-libexecSCRIPTS

uninstall-info: uninstall-info-recursive

.PHONY: $(RECURSIVE_TARGETS) CTAGS GTAGS all all-am check check-am \
	clean clean-binPROGRAMS clean-generic clean-libLTLIBRARIES \
	clean-libtool clean-noinstLTLIBRARIES clean-recursive ctags \
	ctags-recursive distclean distclean-compile distclean-generic \
	distclean-hdr distclean-libtool distclean-recursive \
	distclean-tags distdir dvi dvi-am html html-am info info-am \
	install install-am install-binPROGRAMS install-data \
	install-data-am install-exec install-exec-am \
	install-includeHEADERS install-info install-info-am \
	install-libLTLIBRARIES install-libexecSCRIPTS install-man \
	install-strip installcheck installcheck-am installdirs \
	installdirs-am maintainer-clean maintainer-clean-generic \
	maintainer-clean-recursive mostlyclean mostlyclean-compile \
	mostlyclean-generic mostlyclean-libtool mostlyclean-recursive \
	pdf pdf-am ps ps-am tags tags-recursive uninstall uninstall-am \
	uninstall-binPROGRAMS uninstall-includeHEADERS \
	uninstall-info-am uninstall-libLTLIBRARIES \
	uninstall-libexecSCRIPTS

# Tell versions [3.59,3.63) of GNU make to not export all variables.
# Otherwise a system limit (for SysV at least) may be exceeded.
.NOEXPORT:
/* swish-e-2.4.7/src/bash.h
$Id: bash.h 1799 2006-06-11 02:28:19Z augur $

    This file is part of Swish-e.

    Swish-e is free software; you can redistribute it and/or modify
    it under the terms of the GNU General Public License as published by
    the Free Software Foundation; either version 2 of the License, or
    (at your option) any later version.

    Swish-e is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
    GNU General Public License for more details.

    You should have received a copy of the GNU General Public License
    along  with Swish-e; if not, write to the Free Software
    Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
    
    See the COPYING file that accompanies the Swish-e distribution for details
    of the GNU GPL and the special exception available for linking against
    the Swish-e library.
    
** Mon May  9 18:19:34 CDT 2005
** added GPL

*/


#define xmalloc emalloc
#define xfree efree
#define xrealloc erealloc

#define FS_EXISTS		0x1
#if defined(_WIN32) && !defined(__CYGWIN__)
#define FS_EXECABLE		0x1
#else
#define FS_EXECABLE		0x2
#endif

/* horrible Win32 hack */
#if defined _WIN32 || defined(__VMS)
/* Fake group functions... */
#define GETGROUPS_T int
#define getegid() 0
#define geteuid() 0
#define getgid()  0
#endif

#define savestring(x) (char *)strcpy((char *)xmalloc(1 + strlen (x)), (x))
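
/*
 * Illustrative sketch only, not part of the original header. The
 * savestring() macro above evaluates its argument twice and relies on
 * xmalloc (mapped to emalloc) for the allocation. A function form that
 * avoids the double evaluation might look like the following; the name
 * ex_savestring is hypothetical, and plain malloc is used here because
 * emalloc's failure behaviour is not shown in this file.
 */

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Return a heap-allocated copy of s, or NULL if allocation fails.
 * The caller owns the result and must free() it. */
static char *ex_savestring(const char *s)
{
    size_t n;
    char *copy;

    assert(s != NULL);
    n = strlen(s) + 1;          /* include the terminating NUL */
    copy = malloc(n);
    if (copy != NULL)
        memcpy(copy, s, n);
    return copy;
}
```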

extern int file_status(const char *name);
extern int absolute_program(const char *string);
extern char *get_next_path_element(const char *path_list, int *path_index_pointer);
extern char *make_full_pathname(const char *path, const char *name, int name_len);
/* swish-e-2.4.7/src/worddata.c
$Id: worddata.c 1946 2007-10-22 14:56:35Z karpet $


    This file is part of Swish-e.

    Swish-e is free software; you can redistribute it and/or modify
    it under the terms of the GNU General Public License as published by
    the Free Software Foundation; either version 2 of the License, or
    (at your option) any later version.

    Swish-e is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
    GNU General Public License for more details.

    You should have received a copy of the GNU General Public License
    along  with Swish-e; if not, write to the Free Software
    Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
    
    See the COPYING file that accompanies the Swish-e distribution for details
    of the GNU GPL and the special exception available for linking against
    the Swish-e library.
    
Mon May  9 10:57:22 CDT 2005 -- added GPL notice


*/
    
    
/********************************************************************************************
 * Here are some comments about how this works...
 * Note by José Manuel Ruiz - April 27, 2005
 *
 *
 * The information is stored in fixed-size pages. The page size is defined
 * by WORDDATA_PageSize. Ideally this value is equal to, or at least a
 * multiple of, the I/O page size.
 *
 * If the worddata to be inserted does not fit in one page, it is stored
 * in several contiguous pages. This is the reason for two functions:
 *
 * WORDDATA_Put    --> For worddata that fits in one page
 * WORDDATA_PutBig --> For worddata that does not fit in one page
 *
 * (There are equivalents for read and delete)
 *
 * For data that fits in one page...
 *
 * The data is stored in chunks of blocks. The size of the basic block is
 * defined by WORDDATA_BlockSize.
 *
 * The worddata is prefixed by 3 bytes:
 * - The first byte is an id from 1 to 0xff (so we can store up to 255
 *   worddatas in a page). This number is unique and identifies a worddata
 *   chunk.
 *
 * - Bytes 2 and 3 hold the size of the worddata. To get the size, compute
 *   (p[1] << 8) + p[2]
 *
 * For example, with a block size of 16 bytes, a worddata of 14 bytes needs
 * 32 bytes (2 blocks of 16 bytes): 1 byte for the id, 2 bytes for the
 * length (00 0E) and 14 bytes for the data (17 bytes rounded up to 32
 * bytes). So in this case we waste 15 bytes, but this is the worst case.
 *
 ********************************************************************************************/
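
/*
 * Illustrative sketch only, not part of the original source. The record
 * layout described above (1-byte id, 2-byte size, data, rounded up to
 * whole blocks) can be expressed roughly as follows. All ex_ names and
 * the EX_ constants are hypothetical; the block size of 16 assumes the
 * default 4096-byte page.
 */

```c
#include <assert.h>

enum { EX_BLOCK_SIZE = 16, EX_PREFIX_SIZE = 3 };

/* Write the 3-byte prefix: the id, then the size, high byte first. */
static void ex_put_prefix(unsigned char *p, unsigned id, unsigned size)
{
    assert(id >= 1 && id <= 0xff);
    assert(size <= 0xffff);
    p[0] = (unsigned char)id;
    p[1] = (unsigned char)(size >> 8);
    p[2] = (unsigned char)(size & 0xff);
}

/* Recover the size as the comment describes: (p[1] << 8) + p[2]. */
static unsigned ex_get_size(const unsigned char *p)
{
    return ((unsigned)p[1] << 8) + p[2];
}

/* Bytes consumed in the page: prefix plus data, rounded up to whole blocks. */
static unsigned ex_footprint(unsigned size)
{
    return (size + EX_PREFIX_SIZE + EX_BLOCK_SIZE - 1)
           & ~(unsigned)(EX_BLOCK_SIZE - 1);
}
```

/*
 * With these helpers, a 14-byte worddata gets the size bytes 00 0E and
 * occupies two 16-byte blocks, matching the worked example above.
 */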

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
#define DEBUG
*/

#include "swish.h"
#include "mem.h"
#include "compress.h"
#include "worddata.h"
#include "error.h"

/* WORDDATA page size */
/* !!!! Must not be less than 4096 and not greater than 65536 */
#define WORDDATA_PageSize 4096
/* WORDDATA Block size */
/* !!!! For a WORDDATA_PageSize of 4096, the Block size is 16 */
#define WORDDATA_BlockSize (WORDDATA_PageSize >> 8)
/* Using 1 byte for storing the block id, this leads to only 255 max blocks */
#define WORDDATA_Max_Blocks_Per_Page 0xff


/* Round to WORDDATA_PageSize */
#define WORDDATA_RoundPageSize(n) (sw_off_t)(((sw_off_t)(n) + (sw_off_t)(WORDDATA_PageSize - 1)) & (sw_off_t)(~(WORDDATA_PageSize - 1)))

/* Round a number to the upper BlockSize */
#define WORDDATA_RoundBlockSize(n) (((n) + WORDDATA_BlockSize - 1) & (~(WORDDATA_BlockSize - 1)))

/* Let's assign the first block for the header */
/* Check this if we put more data in the header !!!! */
#define WORDDATA_PageHeaderSize (WORDDATA_BlockSize) 

/* Max data size to fit in a single page */
#define WORDDATA_MaxDataSize (WORDDATA_PageSize - WORDDATA_PageHeaderSize)

#define WORDDATA_PageData(pg) ((pg)->data + WORDDATA_PageHeaderSize)

#define WORDDATA_SetBlocksInUse(pg,num) ( *(int *)((pg)->data + 0 * sizeof(int)) = PACKLONG(num))
#define WORDDATA_GetBlocksInUse(pg,num) ( (num) = UNPACKLONG(*(int *)((pg)->data + 0 * sizeof(int))))

#define WORDDATA_SetNumRecords(pg,num) ( *(int *)((pg)->data + 1 * sizeof(int)) = PACKLONG(num))
#define WORDDATA_GetNumRecords(pg,num) ( (num) = UNPACKLONG(*(int *)((pg)->data + 1 * sizeof(int))))


/* Routine to write the page to disk 
*/
int WORDDATA_WritePageToDisk(FILE *fp, WORDDATA_Page *pg)
{
    /* Set the page basic data*/
    /* Number of blocks in use */
    WORDDATA_SetBlocksInUse(pg,pg->used_blocks);
    /* Number of worddata entries in the page */
    WORDDATA_SetNumRecords(pg,pg->n);
    /* Seek to file pointer and write */
    sw_fseek(fp,(sw_off_t)(pg->page_number * (sw_off_t)WORDDATA_PageSize),SEEK_SET);
    if ( sw_fwrite(pg->data,WORDDATA_PageSize,1,fp) != 1 )
        progerrno("Failed to write page to disk: "); 
    return 1;
}

int WORDDATA_WritePage(WORDDATA *b, WORDDATA_Page *pg)
{
int hash = (int)(pg->page_number % (sw_off_t)WORDDATA_CACHE_SIZE);
WORDDATA_Page *tmp;
    /* Mark page as modified */
    pg->modified =1;
    /* If page is already in cache return. If not, add it to the cache */
    if((tmp = b->cache[hash]))
    {
        while(tmp)
        {
            if(tmp->page_number == pg->page_number)
            {
                return 0;
            }
            tmp = tmp->next_cache;
        }
    }
    pg->next_cache = b->cache[hash];
    b->cache[hash] = pg;
    return 0;
}

int WORDDATA_FlushCache(WORDDATA *b)
{
int i;
WORDDATA_Page *tmp, *next;
    for(i = 0; i < WORDDATA_CACHE_SIZE; i++)
    {
        if((tmp = b->cache[i]))
        {
            while(tmp)
            {
                next = tmp->next_cache;
                if(tmp->modified)
                {
                    WORDDATA_WritePageToDisk(b->fp, tmp);
                    tmp->modified = 0;
                }
                if(tmp != b->cache[i])
                    efree(tmp);

                tmp = next;
            }
            b->cache[i]->next_cache = NULL;
        }
    }
    return 0;
}

/* Routine to remove the page page_number from the cache */
void WORDDATA_CleanCachePage(WORDDATA *b,sw_off_t page_number)
{
int hash = (int)(page_number % (sw_off_t)WORDDATA_CACHE_SIZE);
WORDDATA_Page *tmp,*next,*prev = NULL;

    /* Search for page in cache */
    if((tmp = b->cache[hash]))
    {
        while(tmp)
        {
            if(tmp->page_number == page_number)
            {
                next = tmp->next_cache;
                efree(tmp);
                if(prev) 
                    prev->next_cache = next;
                else
                    b->cache[hash] = next;
                return;
            }
            prev = tmp;
            tmp = tmp->next_cache;
        }
    }
}

int WORDDATA_CleanCache(WORDDATA *b)
{
int i;
WORDDATA_Page *tmp,*next;
    for(i = 0; i < WORDDATA_CACHE_SIZE; i++)
    {
        if((tmp = b->cache[i]))
        {
            while(tmp)
            {
                next = tmp->next_cache;
                efree(tmp);
                tmp = next;
            }
            b->cache[i] = NULL;
        }
    }
    return 0;
}

WORDDATA_Page *WORDDATA_ReadPageFromDisk(FILE *fp, sw_off_t page_number)
{
WORDDATA_Page *pg = (WORDDATA_Page *)emalloc(sizeof(WORDDATA_Page) + WORDDATA_PageSize);

    if(sw_fseek(fp,(sw_off_t)(page_number * (sw_off_t)WORDDATA_PageSize),SEEK_SET)!=0 || ((sw_off_t)(page_number * (sw_off_t)WORDDATA_PageSize) != sw_ftell(fp)))
        progerrno("Failed to read page from disk: "); 

    sw_fread(pg->data,WORDDATA_PageSize, 1, fp);

    /* Load  basic data from page */
    /* Blocks in use */
    WORDDATA_GetBlocksInUse(pg,pg->used_blocks);
    /* Number of entries */
    WORDDATA_GetNumRecords(pg,pg->n);

    /* Page number */
    pg->page_number = page_number;
    /* Mark page as not modified */
    pg->modified = 0;
    return pg;
}

WORDDATA_Page *WORDDATA_ReadPage(WORDDATA *b, sw_off_t page_number)
{
int hash = (int)(page_number % (sw_off_t)WORDDATA_CACHE_SIZE);
WORDDATA_Page *tmp;

    /* Search for page in cache */
    if((tmp = b->cache[hash]))
    {
        while(tmp)
        {
            if(tmp->page_number == page_number)
            {
                return tmp;
            }
            tmp = tmp->next_cache;
        }
    }

    /* Not in cache. Read it from disk */
    tmp = WORDDATA_ReadPageFromDisk(b->fp, page_number);
    tmp->modified = 0;

    /* mark page as being used */
    tmp->in_use = 1;

    /* Add page to cache */
    tmp->next_cache = b->cache[hash];
    b->cache[hash] = tmp;
    return tmp;
}
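
/*
 * Illustration only: a minimal sketch of the bucket-chain cache used by
 * WORDDATA_ReadPage and WORDDATA_WritePage, where a page is hashed by
 * page_number modulo the cache size and chained through next_cache.
 * struct ex_page, EX_CACHE_SIZE, ex_cache_find and ex_cache_add are
 * hypothetical names; the real bucket count is WORDDATA_CACHE_SIZE,
 * which is defined outside this file.
 */

```c
#include <stddef.h>

#define EX_CACHE_SIZE 8   /* assumed bucket count, for the example only */

struct ex_page {
    long page_number;
    struct ex_page *next_cache;
};

/* Return the cached page with this number, or NULL on a miss. */
static struct ex_page *ex_cache_find(struct ex_page **cache, long page_number)
{
    struct ex_page *p = cache[page_number % EX_CACHE_SIZE];

    while (p && p->page_number != page_number)
        p = p->next_cache;
    return p;
}

/* Push a page onto the front of its bucket chain. */
static void ex_cache_add(struct ex_page **cache, struct ex_page *pg)
{
    int h = (int)(pg->page_number % EX_CACHE_SIZE);

    pg->next_cache = cache[h];
    cache[h] = pg;
}
```

/*
 * Pages whose numbers differ by a multiple of the bucket count share a
 * chain, so a lookup may have to walk past unrelated pages.
 */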

/* Routine to get a new page */
WORDDATA_Page *WORDDATA_NewPage(WORDDATA *b)
{
WORDDATA_Page *pg;
sw_off_t offset;
FILE *fp = b->fp;
int hash;
int i;
sw_off_t page_number = (sw_off_t)0;
unsigned char empty_buffer[WORDDATA_PageSize];
    /* Let's see if we have a previous available page */
    if(b->num_Reusable_Pages)
    {
        /* First, look for a page of the same size */
        for(i = 0; i < b->num_Reusable_Pages ; i++)
        {
            if(WORDDATA_PageSize == b->Reusable_Pages[i].page_size)
                break;
        }
        /* If not found, let's try with a bigger one if it exists */
        if(i == b->num_Reusable_Pages)
        {
            for(i = 0; i < b->num_Reusable_Pages ; i++)
            {
                if(WORDDATA_PageSize < b->Reusable_Pages[i].page_size)
                    break;
            }
        }
        /* If we got one page return it */
        if(i != b->num_Reusable_Pages)
        {
            page_number = b->Reusable_Pages[i].page_number;
            if(WORDDATA_PageSize == b->Reusable_Pages[i].page_size)
            {
                for(++i;i<b->num_Reusable_Pages;i++)
                {
                    /* remove page */
                    b->Reusable_Pages[i-1].page_number=b->Reusable_Pages[i].page_number;
                    b->Reusable_Pages[i-1].page_size=b->Reusable_Pages[i].page_size;
                }
                b->num_Reusable_Pages--;
            }
            else
            {
                b->Reusable_Pages[i].page_number ++;
                b->Reusable_Pages[i].page_size -= WORDDATA_PageSize;
            }
        }
    }
    /* If there is not any reusable page let's get it from disk */
    if(! page_number)
    {
        /* Get file pointer */
        if(sw_fseek(fp,(sw_off_t)0,SEEK_END) !=0)
            progerrno("Internal error seeking: "); 


        offset = sw_ftell(fp);
        /* Round up file pointer */
        offset = WORDDATA_RoundPageSize(offset);

        /* Set new file pointer - data will be aligned */
        if(sw_fseek(fp,offset, SEEK_SET)!=0 || offset != sw_ftell(fp))
            progerrno("Internal error seeking: "); 

        /* Reserve space in file */
        memset(empty_buffer,'0',WORDDATA_PageSize);

        if(sw_fwrite(empty_buffer,1,WORDDATA_PageSize,fp)!=WORDDATA_PageSize || ((sw_off_t)WORDDATA_PageSize + offset) != sw_ftell(fp))
            progerrno("Failed to write page data: ");

        page_number = offset / (sw_off_t)WORDDATA_PageSize;
    }

    pg = (WORDDATA_Page *)emalloc(sizeof(WORDDATA_Page) + WORDDATA_PageSize);
    memset(pg,0,sizeof(WORDDATA_Page) + WORDDATA_PageSize);
    /* Initialize the in-memory page header */

    pg->used_blocks = 0;
    pg->n = 0; /* Number of records */

    pg->page_number = page_number;

    /* add to cache */
    pg->modified = 1;
    pg->in_use = 1;
    hash = (int)(pg->page_number % (sw_off_t)WORDDATA_CACHE_SIZE);
    pg->next_cache = b->cache[hash];
    b->cache[hash] = pg;
    return pg;
}
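
/*
 * Illustration only: a minimal sketch of the reusable-page search performed
 * at the top of WORDDATA_NewPage and WORDDATA_PutBig. An exact-size run is
 * preferred; otherwise the first larger run is split and its remainder is
 * kept. struct ex_free_run and ex_take_run are hypothetical names, and the
 * value 0 means "nothing reusable", as page_number does in the code above.
 */

```c
struct ex_free_run {
    long page_number;  /* first page of the free run */
    long page_size;    /* length of the run in bytes */
};

/* Take `size` bytes from the free list; return the first page number, or 0. */
static long ex_take_run(struct ex_free_run *runs, int *n, long size, long page_bytes)
{
    int i, j;
    long page;

    /* First pass: look for a run of exactly the requested size */
    for (i = 0; i < *n && runs[i].page_size != size; i++)
        ;
    /* Second pass: settle for the first run that is larger */
    if (i == *n)
        for (i = 0; i < *n && runs[i].page_size <= size; i++)
            ;
    if (i == *n)
        return 0;

    page = runs[i].page_number;
    if (runs[i].page_size == size) {
        /* Exact fit: remove the entry by shifting the tail down */
        for (j = i + 1; j < *n; j++)
            runs[j - 1] = runs[j];
        (*n)--;
    } else {
        /* Larger run: hand out its head and keep the remainder */
        runs[i].page_number += size / page_bytes;
        runs[i].page_size -= size;
    }
    return page;
}
```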

void WORDDATA_FreePage(WORDDATA *b, WORDDATA_Page *pg)
{
int hash = (int)(pg->page_number % (sw_off_t)WORDDATA_CACHE_SIZE);

WORDDATA_Page *tmp;

    tmp = b->cache[hash];

    while(tmp)
    {
        if (tmp->page_number != pg->page_number)
            tmp = tmp->next_cache;
        else
        {
            tmp->in_use = 0;
            break;
        }
    }
}

WORDDATA *WORDDATA_New(FILE *fp)
{
WORDDATA *b;
    b = (WORDDATA *) emalloc(sizeof(WORDDATA));
    memset(b,0,sizeof(WORDDATA));
    b->fp = fp;
    return b;
}


WORDDATA *WORDDATA_Open(FILE *fp)
{
    return WORDDATA_New(fp);
}

void WORDDATA_Close(WORDDATA *bt)
{
    WORDDATA_FlushCache(bt);
    WORDDATA_CleanCache(bt);
    efree(bt);
}


sw_off_t WORDDATA_PutBig(WORDDATA *b, unsigned int len, unsigned char *data)
{
sw_off_t offset;
unsigned long p_len = (unsigned long)PACKLONG((unsigned long)len);
int size = WORDDATA_RoundPageSize(sizeof(p_len) + len);
FILE *fp = b->fp;
sw_off_t id;
sw_off_t page_number = (sw_off_t)0;
int i;
    /* Let's see if we have a previous available page */
    if(b->num_Reusable_Pages)
    {
        /* First, look for a page of the same size */
        for(i = 0; i < b->num_Reusable_Pages ; i++)
        {
            if(size == b->Reusable_Pages[i].page_size)
                break;
        }
        /* If not found, let's try with a bigger one if it exists */
        if(i == b->num_Reusable_Pages)
        {
            for(i = 0; i < b->num_Reusable_Pages ; i++)
            {
                if(size < b->Reusable_Pages[i].page_size)
                    break;
            }
        }
        /* If we got one page return it */
        if(i != b->num_Reusable_Pages)
        {
            page_number = b->Reusable_Pages[i].page_number;
            if(size == b->Reusable_Pages[i].page_size)
            {
                for(++i;i<b->num_Reusable_Pages;i++)
                {
                    /* remove page */
                    b->Reusable_Pages[i-1].page_number=b->Reusable_Pages[i].page_number;
                    b->Reusable_Pages[i-1].page_size=b->Reusable_Pages[i].page_size;
                }
                b->num_Reusable_Pages--;
            }
            else
            {
                b->Reusable_Pages[i].page_number += size/WORDDATA_PageSize;
                b->Reusable_Pages[i].page_size -= size;
            }
        }
    }
    if(! page_number)
    {
        /* Get file pointer */
        if(sw_fseek(fp,(sw_off_t)0,SEEK_END) !=0)
            progerrno("Internal error seeking: "); 

        offset = sw_ftell(fp);
        /* Round up file pointer */
        offset = WORDDATA_RoundPageSize(offset);
    }
    else
    {
        offset = page_number * (sw_off_t)WORDDATA_PageSize;
    }
    /* Set new file pointer - data will be aligned */
    if(sw_fseek(fp,offset, SEEK_SET)!=0 || offset != sw_ftell(fp))
        progerrno("Internal error seeking: "); 

    id = (sw_off_t)(((offset / (sw_off_t)WORDDATA_PageSize)) << (sw_off_t)8);

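    /* Write packed length */
    if(sw_fwrite(&p_len,1,sizeof(p_len),fp) != sizeof(p_len))
        progerrno("Failed to write worddata length: ");
    /* Write data */
    if(sw_fwrite(data,1,len,fp) != len)
        progerrno("Failed to write worddata: ");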

    /* New offset */
    offset = sw_ftell(fp);
    /* Round up file pointer */
    offset = WORDDATA_RoundPageSize(offset);
    /* Set new file pointer - data will be aligned */
    if(sw_fseek(fp,offset, SEEK_SET)!=0 || offset != sw_ftell(fp))
        progerrno("Internal error seeking: "); 

    b->lastid = id;
    return id;
}


sw_off_t WORDDATA_Put(WORDDATA *b, unsigned int len, unsigned char *data)
{
int required_length;
int free_blocks;
int i, r_id, r_len, tmp;
int last_id, free_id;
unsigned char *p,*q;
unsigned char buffer[WORDDATA_PageSize];
WORDDATA_Page *last_page=NULL;
    /* Check if data fits in a single page */
    /* We need 1 byte for the id plus two bytes for the size */ 
    required_length = len + 1 + 2;
    /* Round it to the upper block size */
    required_length = WORDDATA_RoundBlockSize(required_length);
    if(required_length > WORDDATA_MaxDataSize)
    {
        /* Store long record in file */
        return WORDDATA_PutBig(b,len,data);
    }

    /* let's see if the data fits in the last page */
    /* First - Check for a page with a Del Operation */
    if(b->last_del_page)
    {
        free_blocks = WORDDATA_Max_Blocks_Per_Page - b->last_del_page->used_blocks;
        if(!(required_length > (free_blocks * WORDDATA_BlockSize)))
        {
            last_page = b->last_del_page;
        }
    }
    if(!last_page)
    {
        if( b->last_put_page)
        {
            /* Now check for the last page in a put operation */
            free_blocks = WORDDATA_Max_Blocks_Per_Page - b->last_put_page->used_blocks;
            if(required_length > (free_blocks * WORDDATA_BlockSize))
            {
                WORDDATA_FreePage(b,b->last_put_page);

                /* Save some memory - Do some flush of the data */
                if(!(b->page_counter % WORDDATA_CACHE_SIZE))
                {
                    WORDDATA_FlushCache(b);
                    WORDDATA_CleanCache(b);
                    b->page_counter = 0;
                    b->last_get_page = b->last_put_page = b->last_del_page =  0;
                }
                b->page_counter++;
                b->last_put_page = WORDDATA_NewPage(b);
            }
        }
        else
        {
            /* Save some memory - Do some flush of the data */
            if(!(b->page_counter % WORDDATA_CACHE_SIZE))
            {
                WORDDATA_FlushCache(b);
                WORDDATA_CleanCache(b);
                b->page_counter = 0;
                b->last_get_page = b->last_put_page = b->last_del_page =  0;
            }
            b->page_counter++;
            b->last_put_page = WORDDATA_NewPage(b);
        }
        last_page = b->last_put_page;
    }
    
    for(i = 0, free_id = 0, last_id = 0, p = WORDDATA_PageData(last_page); i < last_page->n; i++)
    {
        /* Get the record id */
        r_id = (int) (p[0]);
        /* Get the record length */
        r_len = ((((int)(p[1])) << 8) + (int)(p[2]));
        if((r_id - last_id) > 1)   /* find a reusable id */
        {
            free_id = last_id + 1;
            break;
        }
        last_id = r_id;
        p += WORDDATA_RoundBlockSize((3 + r_len));
    }
    if(!free_id)
        free_id = last_id + 1; /* The first block (0) is for the header */

    /* Let's use a temporary buffer and make the modifications in it */
    q = buffer;
    /* Init the buffer with the page content */
    /* p points to the start of the offset for the new worddata in the "real" page */
    memcpy(q,WORDDATA_PageData(last_page), p - WORDDATA_PageData(last_page));
    q += p - WORDDATA_PageData(last_page);
    /* Put id and size for worddata */
    q[0] = (unsigned char) free_id;
    q[1] = (unsigned char) (len >> 8);
    q[2] = (unsigned char) (len & 0xff);
    /* Put data */
    memcpy(q+3,data,len);
    /* Point to the next block */
    q += WORDDATA_RoundBlockSize((3 + len));
    /* Write worddata with ids greater than the new one after it */
    for(;i < last_page->n; i++)
    {
        /* Get the record length */
        r_len = ((((int)(p[1])) << 8) + (int)(p[2]));
        tmp = WORDDATA_RoundBlockSize((3 + r_len));
        memcpy(q,p,tmp);
        p += tmp;
        q += tmp;
    }
    /* Write the temp buffer into the page */
    memcpy(WORDDATA_PageData(last_page),buffer,q - buffer);
    last_page->n++;
    last_page->used_blocks += required_length / WORDDATA_BlockSize;
    WORDDATA_WritePage(b,last_page);

    /* Return the pointer to the data as page_number + id */
    /* The least significant byte is the id. The rest are for the page number */
    b->lastid=(sw_off_t)((sw_off_t)(last_page->page_number << (sw_off_t)8) + (sw_off_t)free_id);
    return(b->lastid);
}
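
/*
 * Illustration only: a minimal sketch of how the value returned by
 * WORDDATA_Put combines a page number and a record id, and how
 * WORDDATA_Get and WORDDATA_Del split it again. The ex_* names are
 * hypothetical, and long long stands in for sw_off_t, whose real
 * definition lives outside this file.
 */

```c
/* Combine a page number and a record id (1..255) into one handle. */
static long long ex_make_id(long long page_number, int id)
{
    return (page_number << 8) + (long long)id;
}

/* The page number is everything above the low byte. */
static long long ex_id_page(long long global_id)
{
    return global_id >> 8;
}

/* The record id is the low byte; 0 marks a record stored by WORDDATA_PutBig. */
static int ex_id_record(long long global_id)
{
    return (int)(global_id & 0xff);
}
```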

unsigned char *WORDDATA_GetBig(WORDDATA *b, sw_off_t page_number, unsigned int *len)
{
sw_off_t offset = page_number * (sw_off_t)WORDDATA_PageSize;
unsigned long p_len;
unsigned char *data;
    sw_fseek(b->fp, offset, SEEK_SET);
    sw_fread(&p_len,1,sizeof(p_len),b->fp);
    *len = UNPACKLONG(p_len);
    data = (unsigned char *)emalloc(*len);
    sw_fread(data,1,*len,b->fp);
    return data;
}

unsigned char *WORDDATA_Get(WORDDATA *b, sw_off_t global_id, unsigned int *len)
{
/* Get the page number and id from the global_id */
sw_off_t page_number = global_id >> (sw_off_t)8;
int id = (int)(global_id & (sw_off_t)0xff);
int r_id=-1,r_len=-1;
int i;
unsigned char *p;
unsigned char *data;

    /* Special case. If id is null, the data did not fit in a normal page */
    /* So, go to get a big Worddata */
    if(!id)
    {
        return WORDDATA_GetBig(b,page_number,len);
    }
    /* reset last_get_page */
    if(b->last_get_page)
        WORDDATA_FreePage(b,b->last_get_page);

    b->last_get_page = WORDDATA_ReadPage(b,page_number);

    /* Search for the id in the page */
    for(i = 0, p = WORDDATA_PageData(b->last_get_page); i < b->last_get_page->n; i++)
    {
        /* Get the id */
        r_id = (int) (p[0]);
        /* Get the record length */
        r_len = ((((int)(p[1])) << 8) + (int)(p[2]));
        if(r_id == id)   /* find the id */
            break;
        p += WORDDATA_RoundBlockSize((3 + r_len));
    }

    /* If found read worddata */
    if(id == r_id)
    {
        data = (unsigned char *) emalloc(r_len);
        memcpy(data , p + 3 , r_len);
        *len = r_len;
    }
    else   /* Error */
    {
        data = NULL;
        *len = 0;
    }
    return data;
}
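
/*
 * Illustration only: a minimal sketch of the linear scan that WORDDATA_Get
 * and WORDDATA_Del use to locate a record inside a page. Records are packed
 * back to back, each occupying its 3-byte prefix plus data rounded up to
 * the block size. EX_WALK_BLOCK and ex_find_record are hypothetical names,
 * and the 16-byte block is an assumption taken from the 4096-byte page.
 */

```c
#include <stddef.h>

#define EX_WALK_BLOCK 16   /* assumed block size */

/* Find record `id` among n packed records. Returns a pointer to its payload
 * and sets *len, or returns NULL and sets *len to 0 if the id is absent. */
static const unsigned char *ex_find_record(const unsigned char *p, int n,
                                           int id, unsigned int *len)
{
    int i;

    for (i = 0; i < n; i++) {
        unsigned int r_len = ((unsigned int)p[1] << 8) + p[2];

        if (p[0] == id) {
            *len = r_len;
            return p + 3;
        }
        /* Skip this record: prefix plus data, rounded up to whole blocks */
        p += (3 + r_len + EX_WALK_BLOCK - 1) & ~(unsigned int)(EX_WALK_BLOCK - 1);
    }
    *len = 0;
    return NULL;
}
```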

void WORDDATA_DelBig(WORDDATA *b, sw_off_t page_number, unsigned int *len)
{
sw_off_t offset = page_number * (sw_off_t)WORDDATA_PageSize;
unsigned long p_len;
    sw_fseek(b->fp, offset, SEEK_SET);
    sw_fread(&p_len,1,sizeof(p_len),b->fp);
    *len = UNPACKLONG(p_len) + sizeof(p_len);

    if(b->num_Reusable_Pages < WORDDATA_MAX_REUSABLE_PAGES)
    {
       b->Reusable_Pages[b->num_Reusable_Pages].page_number = page_number;
       b->Reusable_Pages[b->num_Reusable_Pages++].page_size = WORDDATA_RoundPageSize(*len);
    }
}

void WORDDATA_Del(WORDDATA *b, sw_off_t global_id, unsigned int *len)
{
sw_off_t page_number = global_id >> (sw_off_t)8;
int id = (int)(global_id & (sw_off_t)0xff);
int r_id=-1,r_len=-1,tmp;
int i;
unsigned char *p, *q;
int deleted_length;

    if(!id)
    {
        WORDDATA_DelBig(b,page_number,len);
        return;
    }
    if(b->last_del_page)
        WORDDATA_FreePage(b,b->last_del_page);

    b->last_del_page = WORDDATA_ReadPage(b,page_number);

    for(i = 0, p = WORDDATA_PageData(b->last_del_page); i < b->last_del_page->n; i++)
    {
        /* Get the id */
        r_id = (int) (p[0]);
        /* Get the record length */
        r_len = ((((int)(p[1])) << 8) + (int)(p[2]));
        if(r_id == id)   /* id found */
            break;
        p += WORDDATA_RoundBlockSize((3 + r_len));
    }

    if(id == r_id)
    {
        *len = r_len;
        /* Include the 3-byte prefix, as WORDDATA_Put does when it adds blocks */
        deleted_length = WORDDATA_RoundBlockSize((3 + r_len));
        /* Move the rest of the worddata to keep it contiguous (remove the hole) */
        /* q points to the hole, p to the next record */
        q = p;
        p += WORDDATA_RoundBlockSize((3 + r_len));
        for(++i;i < b->last_del_page->n; i++)
        {
           /* Get the record length */
           r_len = ((((int)(p[1])) << 8) + (int)(p[2]));
           tmp = WORDDATA_RoundBlockSize((3 + r_len));
           memcpy(q,p,tmp);
           p += tmp;
           q += tmp;
        }
        b->last_del_page->n--;
        b->last_del_page->used_blocks -= deleted_length / WORDDATA_BlockSize;
        if(!b->last_del_page->n)
        {
            if(b->num_Reusable_Pages < WORDDATA_MAX_REUSABLE_PAGES)
            {
                b->Reusable_Pages[b->num_Reusable_Pages].page_number = page_number;
                b->Reusable_Pages[b->num_Reusable_Pages++].page_size = WORDDATA_PageSize;
            }
            /* If this page was also used in a put or get operation we must
            ** also reset it */
            if(b->last_get_page && b->last_get_page->page_number == b->last_del_page->page_number)
                 b->last_get_page = 0;
            if(b->last_put_page && b->last_put_page->page_number == b->last_del_page->page_number)
                 b->last_put_page = 0;
            /* Finally remove the page from the cache if it exists */
            WORDDATA_CleanCachePage(b,b->last_del_page->page_number);
            /* And reset it */
            b->last_del_page = 0;
        }
        else
        {
            WORDDATA_WritePage(b,b->last_del_page);
        }
    }
    else   /* Error */
    {
        *len = 0;
    }

    return;
}


#ifdef DEBUG

#include <time.h>

#define N_TEST 5000

#if defined(_WIN32) && !defined(__CYGWIN__)
#define FILEMODE_READ           "rb"
#define FILEMODE_WRITE          "wb"
#define FILEMODE_READWRITE      "rb+"
#elif defined(__VMS)
#define FILEMODE_READ           "rb"
#define FILEMODE_WRITE          "wb"
#define FILEMODE_READWRITE      "rb+"
#else
#define FILEMODE_READ           "r"
#define FILEMODE_WRITE          "w"
#define FILEMODE_READWRITE      "r+"
#endif

int main()
{
FILE *fp;
WORDDATA *bt;
int i,len;
static unsigned long nums[N_TEST];
    srand(time(NULL));



    fp = sw_fopen("kkkkk",FILEMODE_WRITE);
    sw_fclose(fp);
    fp = sw_fopen("kkkkk",FILEMODE_READWRITE);

    sw_fwrite("aaa",1,3,fp);

printf("\n\nIndexing\n\n");

    bt = WORDDATA_Open(fp);
    for(i=0;i<N_TEST;i++)
    {
        nums[i] = WORDDATA_Put(bt,16,"1234567890123456");
        if(memcmp(WORDDATA_Get(bt,nums[i],&len),"1234567890123456",16)!=0)
            printf("\n\nmal %d\n\n",i);
        if(!(i%1000))
        {
            WORDDATA_FlushCache(bt);
            printf("%d            \r",i);
        }
    }

    WORDDATA_Close(bt);
    sw_fclose(fp);

printf("\n\nUnfreed %d\n\n",num);
printf("\n\nSearching\n\n");

    fp = sw_fopen("kkkkk",FILEMODE_READ);
    bt = WORDDATA_Open(fp);

    for(i=0;i<N_TEST;i++)
    {
        if(memcmp(WORDDATA_Get(bt,nums[i],&len),"1234567890123456",16)!=0)
            printf("\n\nmal %d\n\n",i);
        if(!(i%1000))
            printf("%d            \r",i);
    }

    WORDDATA_Close(bt);

    sw_fclose(fp);
printf("\n\nUnfreed %d\n\n",num);

}

#endif
swish-e-2.4.7/src/result_sort.h
/*
$Id: result_sort.h 1736 2005-05-12 15:41:22Z karman $
**

    This file is part of Swish-e.

    Swish-e is free software; you can redistribute it and/or modify
    it under the terms of the GNU General Public License as published by
    the Free Software Foundation; either version 2 of the License, or
    (at your option) any later version.

    Swish-e is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
    GNU General Public License for more details.

    You should have received a copy of the GNU General Public License
    along  with Swish-e; if not, write to the Free Software
    Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
    
    See the COPYING file that accompanies the Swish-e distribution for details
    of the GNU GPL and the special exception available for linking against
    the Swish-e library.
    
** Mon May  9 18:19:34 CDT 2005
** added GPL


**
** 2001-01  jose   initial coding
**
*/



#ifndef __HasSeenModule_ResultSort
#define __HasSeenModule_ResultSort	1

#ifdef __cplusplus
extern "C" {
#endif

/*
   -- global module data structure
*/

struct MOD_ResultSort
{

    /* Sorted index flag: TRUE means use the sorted index */
    int isPreSorted;
    /* Structure for presorted properties - used by the index process */
    struct swline *presortedindexlist;

    /* Sort order translation table arrays */
    /* Case-sensitive translation table */
    int iSortTranslationTable[256];
    /* Ignore-case translation table */
    int iSortCaseTranslationTable[256];
    
};





void initModule_ResultSort (SWISH *);
void freeModule_ResultSort (SWISH *);
int configModule_ResultSort (SWISH *sw, StringList *sl);


int compare_results(const void *s1, const void *s2);


int     sortresults(RESULTS_OBJECT *results);



int *CreatePropSortArray(IndexFILE *indexf, struct metaEntry *m, FileRec *fi, int free_cache );
void sortFileProperties(SWISH *sw, IndexFILE *indexf);


void initStrCmpTranslationTable(int *);
void initStrCaseCmpTranslationTable(int *);

int sw_strcasecmp(unsigned char *,unsigned char *, int *);
int sw_strcmp(unsigned char *,unsigned char *, int *);

int *LoadSortedProps( IndexFILE *indexf, struct metaEntry *m );

#ifdef __cplusplus
}
#endif /* __cplusplus */

#endif /* __HasSeenModule_ResultSort  */
swish-e-2.4.7/src/index.c
/*
$Id: index.c 1945 2007-10-22 14:54:07Z karpet $
**
** Copyright (C) 1995, 1996, 1997, 1998 Hewlett-Packard Company
** Originally by Kevin Hughes, kev@kevcom.com, 3/11/94
**
    This file is part of Swish-e.

    Swish-e is free software; you can redistribute it and/or modify
    it under the terms of the GNU General Public License as published by
    the Free Software Foundation; either version 2 of the License, or
    (at your option) any later version.

    Swish-e is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
    GNU General Public License for more details.

    You should have received a copy of the GNU General Public License
    along  with Swish-e; if not, write to the Free Software
    Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
    
    See the COPYING file that accompanies the Swish-e distribution for details
    of the GNU GPL and the special exception available for linking against
    the Swish-e library.
    
** Mon May  9 15:51:39 CDT 2005
** added GPL


**--------------------------------------------------------------------
** ** ** PATCHED 5/13/96, CJC
**
** Added code to countwords and countwordstr to disregard the last char
** if required by the config.h
** G. Hill  3/12/97  ghill@library.berkeley.edu
**
** Changed addentry, countwords, countwordstr, parsecomment, rintindex
** added createMetaEntryList, getMeta, parseMetaData
** to support METADATA
** G. Hill 3/18/97 ghill@library.berkeley.edu
**
** Changed removestops to support printing of stop words
** G. Hill 4/7/97
**
** Changed countwords, countwordstr, and parseMetaData to disregard the
** first char if required by the config.h
** G.Hill 10/16/97  ghill@library.berkeley.edu
**
** Added stripIgnoreLastChars and isIgnoreLastChar routines which iteratively
** remove all ignore characters from the end of each word.
** P. Bergner  10/5/97  bergner@lcse.umn.edu
**
** Added stripIgnoreFirstChars and isIgnoreFirstChar to make stripping of
** the ignore first chars iterative.
** G. Hill 11/19/97 ghill@library.berkeley.edu
**
** Added possibility of use of quotes and brackets in meta CONTENT countwords, parsemetadata
** G. Hill 1/14/98
**
** Added regex for replace rule G.Hill 1/98
**
** REQMETANAME - don't index meta tags not specified in MetaNames
** 10/11/99 - Bill Moseley
**
** change sprintf to snprintf to avoid corruption, use MAXPROPLEN instead of literal "20",
** added include of merge.h - missing declaration caused compile error in prototypes,
** added word length arg to Stem() call for strcat overflow checking in stemmer.c
** added safestrcpy() macro to avoid corruption from strcpy overflow
** SRE 11/17/99
**
** fixed misc problems pointed out by "gcc -Wall"
** SRE 2/22/00
**
** Added code for storing word positions in index file 
** Jose Ruiz 3/00 jmruiz@boe.es
**
** 04/00 - Jose Ruiz
** Added code for a hash table in index file for searching words
** via getfileinfo in search.c (Lots of addons). Better perfomance
** with big databases and or heavy searchs (a* or b* or c*)
**
** 04/00 - Jose Ruiz
** Improved number compression function (compress)
** New number decompress function
** Both converted into macros for better performance
**
** 07/00 and 08/00 - Jose Ruiz
** Many modifications to make some functions thread safe
**
** 08/00 - Jose Ruiz
** New function indexstring. Up to now there were 4 functions doing almost
** the same thing: countwords, countwordstr, parseMetaData and parsecomment
** From now on, these 4 functions call indexstring, which is the common part
** to all of them. In fact, countwordstr, parseMetaData and parsecomment
** are now simple frontends to indexstring
**
** 2000-11 - rasc
** some redesign, place common index code into a common routine
** FileProp structures, routines
**
** --
** TODO
** $$ there is still some redesign to be done.
** $$ swish-e was originally designed to index html only. So the routines
** $$ are, for historical reasons, scattered
** $$ (e.g. isoktitle (), is ishtml() etc.)
**
** 2000-12 Jose Ruiz
** obsolete routine ishtml removed
** isoktitle moved to html.c
**
** 2001-03-02 rasc   Header: write translatecharacters
** 2001-03-14 rasc   resultHeaderOutput  -H n
** 2001-03-24 rasc   timeroutines rearranged
** 2001-06-08 wsm    Store word after ENTRY to save memory
** 2001-08    jmruiz All locations stuff rewritten to save memory
**
*/

#include "swish.h"
#include "mem.h"
#include "swstring.h"
#include "index.h"
#include "hash.h"
#include "check.h"
#include "search.h"
#include "merge.h"
#include "docprop.h"
#include "stemmer.h"
#include "double_metaphone.h"
#include "error.h"
#include "file.h"
#include "compress.h"
/* Removed due to problems with patents
#include "deflate.h"
*/
#include "html.h"
#include "xml.h"
#include "parser.h"
#include "txt.h"
#include "metanames.h"
#include "result_sort.h"
#include "result_output.h"
#include "filter.h"
#include "date_time.h"
#include "db.h"
#include "dump.h"
#include "swish_qsort.h"
#include "swish_words.h"
#include "list.h"

static void index_path_parts( SWISH *sw, char *path, path_extract_list *list, INDEXDATAHEADER *header, docProperties **properties );
static void SwapLocData(SWISH *,ENTRY *,unsigned char *,int);
static void unSwapLocData(SWISH *,int, ENTRY *);
static void sortSwapLocData(ENTRY *);

/* 
  -- init structures for this module
*/


void initModule_Index (SWISH  *sw)
{
    int i;
    struct MOD_Index *idx;

    idx = (struct MOD_Index *) emalloc(sizeof(struct MOD_Index));
    memset( idx, 0, sizeof( struct MOD_Index ) );
    sw->Index = idx;

    idx->filenum = 0;
    idx->entryArray = NULL;

    idx->len_compression_buffer = MAXSTRLEN;  /* For example */
    idx->compression_buffer=(unsigned char *)emalloc(idx->len_compression_buffer);

    idx->len_worddata_buffer = MAXSTRLEN;  /* For example */
    idx->worddata_buffer=(unsigned char *)emalloc(idx->len_worddata_buffer);
    idx->sz_worddata_buffer = 0;

    /* Init  entries hash table */
    for (i=0; i<VERYBIGHASHSIZE; i++)
    {
        idx->hashentries[i] = NULL;
        idx->hashentriesdirty[i] = 0;
    }


        /* Economic flag and temp files*/
    idx->swap_locdata = SWAP_LOC_DEFAULT;


    for(i=0;i<BIGHASHSIZE;i++) idx->inode_hash[i]=NULL;

    /* initialize buffers used by indexstring */
    idx->word = (char *) emalloc((idx->lenword = MAXWORDLEN) + 1);
    idx->swishword = (char *) emalloc((idx->lenswishword = MAXWORDLEN) + 1);

    idx->plimit=PLIMIT;
    idx->flimit=FLIMIT;
    idx->nIgnoreLimitWords = 0;
    idx->IgnoreLimitPositionsArray = NULL;

       /* Swapping access file functions */
    idx->swap_tell = sw_ftell;
    idx->swap_write = sw_fwrite;
    idx->swap_close = sw_fclose;
    idx->swap_seek = sw_fseek;
    idx->swap_read = sw_fread;
    idx->swap_getc = sw_fgetc;
    idx->swap_putc = sw_fputc;

    for( i = 0; i <MAX_LOC_SWAP_FILES ; i++)
    {
        idx->swap_location_name[i] = NULL;
        idx->fp_loc_write[i] = NULL;
        idx->fp_loc_read[i] = NULL;
    }
    /* Index in blocks of chunk_size documents */
    idx->chunk_size = INDEX_DEFAULT_CHUNK_SIZE;

    /* Use this value to avoid using big zones just as a temporary location storage */
    idx->optimalChunkLocZoneSize = INDEX_DEFAULT_OPTIMAL_CHUNK_ZONE_SIZE_FOR_LOCATIONS;

    idx->freeLocMemChain = NULL;

    /* memory zones for common structures */
    idx->perDocTmpZone = Mem_ZoneCreate("Per Doc Temporal Zone", 0, 0);
    idx->currentChunkLocZone = Mem_ZoneCreate("Current Chunk Locators", 0, 0);
    idx->totalLocZone = Mem_ZoneCreate("All Locators", 0, 0);
    idx->entryZone = Mem_ZoneCreate("struct ENTRY", 0, 0);

    /* table for storing which metaIDs to index */
    idx->metaIDtable.max = 200;  /* totally random guess */
    idx->metaIDtable.num = 0;
    idx->metaIDtable.array = (int *)emalloc( idx->metaIDtable.max * sizeof(int) );
    idx->metaIDtable.defaultID = -1;


    /* $$$ this is only a fix while http.c and httpserver.c still exist */
    idx->tmpdir = estrdup(".");

    /* By default, we are not in update mode */
    idx->update_mode = MODE_INDEX;

    return;
}


/* 
  -- release all wired memory for this module
  -- 2001-04-11 rasc
*/

void freeModule_Index (SWISH *sw)
{
  struct MOD_Index *idx = sw->Index;
  int i;

/* we need to call the real free here */

  for( i = 0; i < MAX_LOC_SWAP_FILES ; i++)
  {
      if (idx->swap_location_name[i] && isfile(idx->swap_location_name[i]))
      {
          if (idx->fp_loc_read[i])  
             idx->swap_close(idx->fp_loc_read[i]);

          if (idx->fp_loc_write[i])
             idx->swap_close(idx->fp_loc_write[i]);

          remove(idx->swap_location_name[i]);
      }


      if (idx->swap_location_name[i])
          efree(idx->swap_location_name[i]);
  }

  if(idx->tmpdir) efree(idx->tmpdir);        

        /* Free compression buffer */    
  efree(idx->compression_buffer);
        /* free worddata buffer */
  efree(idx->worddata_buffer);

    /* free word buffers used by indexstring */
  efree(idx->word);
  efree(idx->swishword);

  /* free IgnoreLimit stuff */
  if(idx->IgnoreLimitPositionsArray)
  {
      for(i=0; i<sw->indexlist->header.totalfiles; i++)
      {
          if(idx->IgnoreLimitPositionsArray[i])
          {
              efree(idx->IgnoreLimitPositionsArray[i]->pos);
              efree(idx->IgnoreLimitPositionsArray[i]);
          }
      }
      efree(idx->IgnoreLimitPositionsArray);
  }

  /* should be freed by now!!! But just in case... */
  if (idx->entryZone)
      Mem_ZoneFree(&idx->entryZone);

  if (idx->totalLocZone)
      Mem_ZoneFree(&idx->totalLocZone);
  if (idx->currentChunkLocZone)
      Mem_ZoneFree(&idx->currentChunkLocZone);
  if (idx->perDocTmpZone)
      Mem_ZoneFree(&idx->perDocTmpZone);


  if ( idx->entryArray )
    efree( idx->entryArray);


  efree( idx->metaIDtable.array );

       /* free module data */
  efree (idx);
  sw->Index = NULL;


  return;
}


/*
** ----------------------------------------------
** 
**  Module config code starts here
**
** ----------------------------------------------
*/


/*
 -- Config Directives
 -- Configuration directives for this Module
 -- return: 0/1 = none/config applied
*/

int configModule_Index (SWISH *sw, StringList *sl)

{
  struct MOD_Index *idx = sw->Index;
  char *w0    = sl->word[0];
  int  retval = 1;
  char *env_tmp = NULL;

  if (strcasecmp(w0, "tmpdir") == 0)
  {
     if (sl->n == 2)
     {
        idx->tmpdir = erealloc( idx->tmpdir, strlen( sl->word[1] ) + 1 );
        strcpy( idx->tmpdir, sl->word[1] );
        normalize_path( idx->tmpdir );

        if (!isdirectory(idx->tmpdir))
           progerr("%s: %s is not a directory", w0, idx->tmpdir);

       if ( !( env_tmp = getenv("TMPDIR")) )
            if ( !(env_tmp = getenv("TMP")) )
                env_tmp = getenv("TEMP");

        if ( env_tmp )
            progwarn("Configuration setting for TmpDir '%s' will be overridden by environment setting '%s'", idx->tmpdir, env_tmp );

           
     }
     else
        progerr("%s: requires one value", w0);
  }
  else if (strcasecmp(w0, "IgnoreLimit") == 0)
  {
     if (sl->n == 3)
     {
        idx->plimit = atol(sl->word[1]);
        idx->flimit = atol(sl->word[2]);
     }
     else
        progerr("%s: requires two values", w0);
  }
  else 
  {
      retval = 0;                   /* not a module directive */
  }
  return retval;
}
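
/*
** Illustration only, not part of swish-e: a minimal sketch of the
** directive dispatch pattern used by configModule_Index above
** (case-insensitive match on the first word, then an argument count
** check). The names demo_limits and demo_apply_directive are
** hypothetical, and the error path returns -1 here instead of calling
** progerr().
**
```c
#include <assert.h>
#include <stdlib.h>
#include <strings.h>   // strcasecmp (POSIX)

struct demo_limits {
    long plimit;
    long flimit;
};

// Apply one already tokenised directive.
// Returns 1 if handled, 0 if the directive is unknown,
// and -1 if the number of words is wrong.
static int demo_apply_directive(struct demo_limits *cfg, int n, const char **word)
{
    assert(cfg != NULL && n > 0);

    if (strcasecmp(word[0], "IgnoreLimit") == 0)
    {
        if (n != 3)
            return -1;
        cfg->plimit = atol(word[1]);
        cfg->flimit = atol(word[2]);
        return 1;
    }
    return 0;   // not a directive this sketch knows about
}
```
**
** A caller would try each module in turn and stop at the first one
** that returns non-zero.
*/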

/**************************************************************************
*   Remove a file from the index.  Used when the parser aborts
*   while indexing.  Typically because of FileRules.
*
**************************************************************************/


static void remove_last_file_from_list(SWISH * sw, IndexFILE * indexf)
{
    struct MOD_Index *idx = sw->Index;
    int i;
    ENTRY *ep, *prev_ep;
    LOCATION *l;


    /* Should be removed */
    if(idx->filenum == 0 || indexf->header.totalfiles == 0)
        progerr("Internal error in remove_last_file_from_list");


    /* walk the hash list to remove words */
    for (i = 0; i < VERYBIGHASHSIZE; i++)
    {
        if (idx->hashentriesdirty[i])
        {
            idx->hashentriesdirty[i] = 0;
            for (ep = idx->hashentries[i], prev_ep =NULL; ep; ep = ep->next)
            {
                if(ep->currentChunkLocationList && (ep->currentChunkLocationList != ep->currentlocation))
                {
                    if(ep->currentChunkLocationList->filenum == idx->filenum)
                    {
                        /* Now remove locations */
                        /* Go until filenum changes or reach the compressed
                        ** area (currentlocation)
                        */
                        for(l = ep->currentChunkLocationList; l; l = l->next)
                        {
                            if(ep->currentlocation == l || (l->filenum != idx->filenum))
                                break;
                        }
                        /* Adjust tfrequency if entry is in file */
                        if(l != ep->currentChunkLocationList)
                        {
                            /* Remove last filenum chunks */
                            ep->currentChunkLocationList = l;

                            /* Decrease word frequency */
                            ep->tfrequency--;

                            /* Reset last_filenum. At this moment
                            ** ep->u1.last_filenum contains idx->filenum
                            ** and, after removing the last file, it should
                            ** point to the previous one, but we do not know
                            ** its value because the previous chunks can be
                            ** compressed. Fortunately, we are only using
                            ** this value in the addentry routine to ensure
                            ** that we are in a new file. So, resetting
                            ** the value to 0 should be enough
                            */
                            ep->u1.last_filenum = 0;
                        }
                    }
                    /* If there are no locations we must also remove the word */
                    /* Do not call efree to remove the entry, entries use
                    ** a MemZone (entryZone) - Will be freed later */
                    if(!ep->currentChunkLocationList)
                    {
                        if(!ep->allLocationList)
                        {
                            if(!prev_ep)
                            {
                                idx->hashentries[i] = ep->next;
                            }
                            else
                            {
                                prev_ep->next = ep->next;
                            }
                            /* Adjust word counters */
                            idx->entryArray->numWords--;
                            indexf->header.totalwords--;
                        }
                    }
                }
                else
                {
                    prev_ep = ep;
                }
            }
        }
    }
    /* Decrease index filenum and totalfiles counter */
    idx->filenum--;
    indexf->header.totalfiles--;
}


/**************************************************************************
*  Index just the file name (or the title) for NoContents files
*  $$$ this can be removed if libxml2 is used full time
**************************************************************************/
static int index_no_content(SWISH * sw, FileProp * fprop, FileRec *fi, char *buffer)
{
    struct MOD_Index   *idx = sw->Index;
    char               *title = "";
    int                 n;
    int                 position = 1;       /* Position of word */
    int                 metaID = 1;         /* THIS ASSUMES that that's the default ID number */


    /* Look for title if HTML document */
    
    if (fprop->doctype == HTML)
    {
        title = parseHTMLtitle( sw , buffer );

        if (!isoktitle(sw, title))
            return -2;  /* skipped because of title */
    }


#ifdef HAVE_LIBXML2
    if (fprop->doctype == HTML2 || !fprop->doctype)
        return parse_HTML( sw, fprop, fi, buffer );
#endif


    addCommonProperties( sw, fprop, fi, title, NULL, 0 );


    n = indexstring( sw, *title == '\0' ? fprop->real_path : title , idx->filenum, IN_FILE, 1, &metaID, &position);


    /** ??? $$$ doesn't look right -- check this ***/
    if ( *title != '\0' )
        efree( title );
 
    return n;
}


/*********************************************************************
** 2001-08 jmruiz - A couple of specialized routines to be used with
** locations and MemZones. The main goal is to avoid malloc/realloc/free,
** which produce a lot of fragmentation.
**
** The memory will be allocated in blocks of LOC_BLOCK_SIZE bytes inside
** a zone. (I have tried both 32 and 64. 32 looks fine.)
** In this way, there is some overhead because when a new block is
** requested from the MemZone, the space is not recovered. But this is
** only true for the current document because the MemZone is reset
** once the document is processed. Then, the space is recovered
** after a Mem_ZoneReset is issued.
**
** 2001-09 jmruiz Improved. Now unused space is recovered when asking
** for space. Free blocks are maintained using a linked list.
********************************************************************/

#define LOC_BLOCK_SIZE 32  /* Must be greater than sizeof(LOCATION) and a power of 2 */
#define LOC_MIN_SIZE   ((sizeof(LOCATION) + LOC_BLOCK_SIZE - 1) & (~(LOC_BLOCK_SIZE - 1)))

struct  loc_chain {
    struct loc_chain *next;
    int size;
};
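
/*
** Illustration only, not part of swish-e: a minimal sketch of the
** round-up-to-a-multiple trick that LOC_MIN_SIZE relies on. The name
** round_up_to_block is hypothetical. The mask is only valid when the
** block size is a power of two, which is why LOC_BLOCK_SIZE carries
** that requirement.
**
```c
#include <assert.h>
#include <stddef.h>

// Round n up to the next multiple of block.
// block must be a non-zero power of two.
static size_t round_up_to_block(size_t n, size_t block)
{
    // A power of two has exactly one bit set, so block & (block - 1) is 0.
    assert(block != 0 && (block & (block - 1)) == 0);

    // Adding block - 1 pushes n past the next boundary unless it is
    // already on one; the mask then clears the low bits.
    return (n + block - 1) & ~(block - 1);
}
```
**
** With a block of 32, sizes 1..32 round to 32 and sizes 33..64 round
** to 64.
*/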

/********************************************************************
** 2001-08 jmruiz
** Routine to allocate memory inside a zone for a plain LOCATION
** (frequency is 1). Since we are asking for LOC_BLOCK_SIZE bytes, we
** are losing some of the space.
** The advantage is that we do not need to call realloc so often. In
** fact, most realloc functions work this way: they ask for more memory
** to avoid the overhead of the sequence malloc, memcpy, free.
********************************************************************/

LOCATION *alloc_location(struct MOD_Index *idx,int size)
{
    struct loc_chain *tmp = (struct loc_chain *) idx->freeLocMemChain;
    struct loc_chain *big = NULL;
    LOCATION *tmp2 = NULL;
    int avail = 0;
    struct loc_chain *p_avail = NULL;

    /* Search for a previously freed location of the same size */
    while(tmp)
    {
        if(tmp->size == size)
        {
            if(!tmp2)
                idx->freeLocMemChain = (LOCATION *)tmp->next;
            else
                tmp2->next = (LOCATION *)tmp->next;
            return (LOCATION *)tmp;
        }
        else if(tmp->size > size)
        {
            /* Just reserve it to be used if we do not find a match */
            big = tmp;
        }
        else
        {
            p_avail = tmp;
            avail = tmp->size;
            /* Check for consecutive blocks */
            while(((unsigned char *)tmp + tmp->size) == (unsigned char *)tmp->next)
            {
                avail += tmp->next->size;
                if(avail == size)
                {
                    if(!tmp2)
                       idx->freeLocMemChain = (LOCATION *)tmp->next->next;
                    else
                       tmp2->next = (LOCATION *)tmp->next->next;
                    return (LOCATION *)p_avail;
                }
                else if(avail > size)
                {
                    break;
                }
                else
                {
                    tmp = tmp->next;
                }
            }
        }
        tmp2 = (LOCATION *)tmp;
        tmp = tmp->next;
    }
    /* Perhaps we have a block with greater size */
    if(big)
    {
        /* Split it */
        while(big->size > size)
        {
            big->size >>= 1;
            tmp = (struct loc_chain *) ((unsigned char *)big + big->size);
            tmp->next = big->next;
            tmp->size = big->size;
            if(tmp->size == size)
                return (LOCATION *)tmp;
            big->next = tmp;
            big = tmp;
        }
    }
    /* No memory in free chain of the same size - ask the zone for size bytes */
    return (LOCATION *)Mem_ZoneAlloc(idx->currentChunkLocZone, size);
}


LOCATION *new_location(struct MOD_Index *idx)
{
    return (LOCATION *)alloc_location(idx, LOC_MIN_SIZE);
}


int is_location_full(int size)
{
    int i;

    /* Fast test. Since LOC_BLOCK_SIZE is the minimum size ... */
    if(size % LOC_BLOCK_SIZE)
        return 0;  /* not a multiple of LOC_BLOCK_SIZE, so not a full block */
    /* Check if size is a power of 2 (32,64,128,256,...) in binary ..000100... */
    for(i=LOC_BLOCK_SIZE;;i <<= 1)
    {
        if(size>i)
        {
            continue;
        }
        if((size & i) == size)
        {
            return 1;
        } 
        else
        {
            break;
        }
    }
    return 0;
}
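
/*
** Illustration only, not part of swish-e: the loop in is_location_full
** tests whether size is one of 32, 64, 128, ... The same test can be
** sketched with the usual clear-lowest-bit trick. The name
** is_pow2_block is hypothetical, and unlike the routine above this
** sketch rejects every size below the minimum block, including 0.
**
```c
#include <assert.h>

// Return 1 when size is a power of two and at least min_block, else 0.
static int is_pow2_block(unsigned int size, unsigned int min_block)
{
    assert(min_block != 0);

    if (size < min_block)
        return 0;

    // size & (size - 1) clears the lowest set bit; the result is zero
    // only when that was the single bit set.
    return (size & (size - 1)) == 0;
}
```
*/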

/********************************************************************
** 2001-08 jmruiz
** Routine to reallocate memory inside a zone for a previously allocated
** LOCATION (frequency > 1).
** A new block is allocated only if the previous one becomes full
********************************************************************/
LOCATION *add_position_location(void *oldp, struct MOD_Index *idx, int frequency)
{
        LOCATION *newp = NULL;
        struct loc_chain *tmp = NULL;
        int oldsize; 

        oldsize = sizeof(LOCATION) + (frequency - 1) * sizeof(int);

        /* Check for available size in block */
        if(is_location_full(oldsize))
        {
            /* Not enough room - allocate a new block of twice the size */
            newp = (LOCATION *)alloc_location(idx,oldsize << 1);
            memcpy((void *)newp,(void *)oldp,oldsize);
            /* Add old zone to the free chain of blocks */
            tmp = (struct loc_chain *)oldp;
            tmp->next = (struct loc_chain *)idx->freeLocMemChain;
            tmp->size = oldsize;
            idx->freeLocMemChain = (LOCATION *) tmp;
        }
        else
            /* Enough size */
            newp = oldp;

        return newp;
}
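
/*
** Illustration only, not part of swish-e: add_position_location grows
** a LOCATION by doubling its block each time the block fills up. The
** sketch below counts how many such doublings (and therefore copies)
** are needed to reach a given size, to show that the count grows
** logarithmically. The name count_doublings is hypothetical.
**
```c
#include <assert.h>
#include <stddef.h>

// Count how many times a buffer that doubles when full has to grow
// to hold `used` bytes, starting from `initial` bytes.
static int count_doublings(size_t initial, size_t used)
{
    int grows = 0;
    size_t capacity = initial;

    assert(initial != 0);

    while (capacity < used)
    {
        capacity <<= 1;   // same growth step as oldsize << 1 above
        grows++;
    }
    return grows;
}
```
**
** Starting from 32 bytes, reaching 1024 bytes takes five doublings.
*/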

#ifdef USE_BTREE
/**********************************************************************
* file_is_newer_than_existing 
*       - returns true if the new file (based on fprop) is newer than the existing
*
* Call with:
*   fprop - file prop for the new file
*   sw    - current SWISH
*   existing_number - file number
*
* Returns:
*   true if new file is newer than existing and existing should be deleted
*
***********************************************************************/

static int file_is_newer_than_existing( SWISH *sw,  FileProp * fprop, int existing_filenum)
{
    IndexFILE   *indexf = sw->indexlist;
    int         ret;
    propEntry   *existing_prop;
    propEntry   *new_prop;
    FileRec     fi;
    int         error_flag;
    unsigned long tmp;

    /* Get the date from the existing file */


    /* Fetch metaEntry */
    if(!indexf->modified_meta)
        indexf->modified_meta = getPropNameByName( &indexf->header, AUTOPROPERTY_LASTMODIFIED );


    memset(&fi, 0, sizeof( FileRec ));
    fi.filenum = existing_filenum;
    existing_prop = ReadSingleDocPropertiesFromDisk(indexf, &fi, indexf->modified_meta->metaID, 0 );

    /* Don't need the fi struct any more, but can't call freefileinfo() */
    if ( fi.prop_index )
        efree( fi.prop_index );



    /* Create a property for the new file */

    if ( !fprop->mtime ) /* was a time provided?  */
        new_prop = NULL;  /* $$$ should be MISSING or NO_PROPERTY */

    else /* create a property */
    {
        tmp = PACKLONG(fprop->mtime);
        new_prop = CreateProperty( indexf->modified_meta, (unsigned char *)&tmp, sizeof( tmp ), 1, &error_flag );
    }


    /* -1 = existing < new_prop | +1 existing > new_prop */
    ret = Compare_Properties( indexf->modified_meta, existing_prop, new_prop );


    /* Skip if the new file is older than or the same age as the existing file */
    /* and at least one of the files has a date */
    /* if neither file has a date then the new one will replace existing */

    if (ret >= 0 && ( existing_prop || new_prop ) )
    {
        if (sw->verbose >= 3)
        {
           if ( new_prop )
                printf(" - Update mode - File '%s' same or older than existing filenum: %d - (Skipping it)\n", fprop->real_path, existing_filenum);
            else
                printf(" - Update mode - File '%s' exists with date, but new file does not have a date.  Keeping existing file. filenum: %d\n", fprop->real_path, existing_filenum);
        }

        freeProperty( existing_prop );
        freeProperty( new_prop );

        return 0;
    }



    /* new file is newer so replace old file */

    if (sw->verbose >= 3)
        printf(" - Update mode - File '%s' replaced existing file number %d because %s\n",
                    fprop->real_path,
                    existing_filenum,
                    existing_prop && new_prop
                        ? "it is newer than the existing file"
                        : existing_prop || new_prop
                            ? "only the new file has a date"
                            : "neither file has a date"
         );

    freeProperty( existing_prop );
    freeProperty( new_prop );

    return 1;  /* go ahead and remove */
}
#endif


/***************************************************************************
* check_for_replace -  Tests if replacing a file in update mode
* Call with:
*   SWISH       *sw     - the swish object
*   FileProp    *fprop  - file properties
*
* Returns:
*   true if ok to index new file
*   false means skip indexing this file
*
****************************************************************************/
static int check_for_replace( SWISH *sw, FileProp * fprop )
{

#ifndef USE_BTREE
    return 1;
#else

    int         existing_filenum;
    int         existing_is_deleted;
    int         existing_word_count = 0;
    IndexFILE   *indexf = sw->indexlist;
    int         update_mode = sw->Index->update_mode;
    char        *update_string;
    int         return_value;
    int         delete_existing = 0;  /* flag to delete existing file */

    /* Return true if not in update or remove mode */
    if ( MODE_UPDATE != update_mode && MODE_REMOVE != update_mode )
        return 1;


    update_string = MODE_UPDATE == update_mode ? "Update" : "Remove";

    /* Never index in remove mode.  In update mode, index the file if it is new or was already deleted (see below) */

    return_value  = MODE_UPDATE == update_mode ? 1 : 0;  /* how to exit based on mode */



    existing_filenum = DB_ReadFileNum(sw, fprop->real_path, indexf->DB);

    /* If the file already exists then likely will remove the file, so lookup the old word count */
    /* Assumes that zero words indicates that a file was already deleted */
    if ( existing_filenum )
        existing_word_count = getTotalWordsInFile( indexf, existing_filenum );

    existing_is_deleted = existing_filenum && ( 0 == existing_word_count);  /* just to make it clear */



    if ( sw->verbose >= 5 )
        printf("\nFile %s.  Existing filenum: %d.  Existing is deleted: %d Existing wordcount: %d\n",
                fprop->real_path, existing_filenum, existing_is_deleted, existing_word_count );




    /* Return early if the file is not in the index or has already been deleted */

    if ( !existing_filenum )
    {
        if ( sw->verbose >= 4 )
            printf(" - %s Mode - File '%s' is a new file and does not exist in the index\n", update_string, fprop->real_path );

        return return_value;
    }



    if ( existing_is_deleted )
    {
        if ( sw->verbose >= 4 )
            printf(" - %s Mode - Existing file '%s' has already been deleted from the index\n", update_string, fprop->real_path );

        return return_value;

    }


    /* At this point we have an existing file */


    /* MODE_REMOVE is always deleted.  In update mode, delete if current file is newer than existing */


    delete_existing = MODE_REMOVE == update_mode
                        ? 1
                        : file_is_newer_than_existing( sw, fprop, existing_filenum  );


    if ( delete_existing )
    {

        DB_RemoveFileNum(sw,existing_filenum,indexf->DB);
        indexf->header.removedfiles++;
        indexf->header.removed_word_positions += existing_word_count;

        if ( sw->verbose >= 3 )
            printf(" - %s Mode - Removed existing file '%s' (#%d) from index\n", update_string, fprop->real_path, existing_filenum );

        return return_value;  /* keep indexing in update mode, otherwise don't index */

    }

    return 0;  /* don't index the new file -- we are either keeping or are in remove mode */

#endif

}
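
/*
** Illustration only, not part of swish-e: the return value of
** check_for_replace (in a USE_BTREE build) can be summarised as a pure
** decision function. The names demo_mode and demo_should_index are
** hypothetical, and the side effects of the real routine (removing the
** existing file, updating header counters) are deliberately left out.
**
```c
#include <assert.h>

enum demo_mode { DEMO_INDEX, DEMO_UPDATE, DEMO_REMOVE };

// Return 1 when the incoming file should be indexed, 0 when it should
// be skipped.
//   exists  - the path is already present in the index
//   deleted - the existing entry was already marked deleted
//   newer   - the incoming file is newer than the existing one
static int demo_should_index(enum demo_mode mode, int exists, int deleted, int newer)
{
    if (mode == DEMO_INDEX)
        return 1;                    // plain indexing never consults the index

    if (!exists || deleted)
        return mode == DEMO_UPDATE;  // nothing to replace

    if (mode == DEMO_REMOVE)
        return 0;                    // existing entry goes away, nothing is added

    return newer ? 1 : 0;            // update: replace only with a newer file
}
```
*/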


/***********************************************************************
   -- Start the real indexing process for a file.
   -- This routine will be called by the different indexing methods
   -- (httpd, filesystem, etc.)
   -- The indexed file may be the
   --   - real file on filesystem
   --   - tmpfile or work file (shadow of the real file)
   -- Checks if file has to be sent through a filter (file stream)
   -- 2000-11-19 rasc
***********************************************************************/

void    do_index_file(SWISH * sw, FileProp * fprop)
{
    int     (*countwords)(SWISH *sw,FileProp *fprop, FileRec *fi, char *buffer);
    IndexFILE   *indexf = sw->indexlist;
    int         wordcount;
    char        *rd_buffer = NULL;   /* complete file read into buffer */
    struct MOD_Index *idx = sw->Index;
    char        strType[30];
    int         i;
    FileRec     fi;  /* place to hold doc properties */

    memset( &fi, 0, sizeof( FileRec ) );


    wordcount = -1;



    /* skip file if the last_mod date is older than the check date */

    if (sw->mtime_limit && fprop->mtime < sw->mtime_limit)
    {
        if (sw->verbose >= 3)
            progwarn("Skipping %s: last_mod date is too old\n", fprop->real_path);

        /* external program must seek past this data (fseek fails) */
        if (fprop->fp)
            flush_stream( fprop );

        return;
    }


    /* Upon entry, if fprop->fp is non-NULL then it's already opened and ready to be read from.
       This is the case with "prog" external programs, *except* when a filter is selected for the file type.
       If a filter is used with "prog" a temporary file was created (fprop->work_file), and
       fprop->fp will be NULL (as is with http and fs access methods).
       2001-05-13 moseley
    */



    /* Get input file handle */
    if (fprop->hasfilter)
    {
        /* This checks for a non-null file handle -- but may return true with popen regardless */
        if ( !FilterOpen(fprop) )
            progerr("Failed to open filter for file '%s'",fprop->real_path);
    }

    else if ( !fprop->fp )
    {
        fprop->fp = sw_fopen(fprop->work_path, F_READ_TEXT );

        if ( !fprop->fp )
        {
            progwarnno("Failed to open: '%s': ", fprop->work_path);
            return;
        }
    }
    else  /* Already open - flag to prevent closing the stream used with "prog" */
        fprop->external_program++;




    /** Replace the path for ReplaceRules **/

    if ( sw->replaceRegexps )
    {
        int     matched = 0;
        fprop->real_path = process_regex_list( fprop->real_path, sw->replaceRegexps, &matched );
    }


    /* Check for the need to remove an existing file first */
    if ( !check_for_replace( sw, fprop ) )
    {
        /* report file skipped if not in remove mode
         * all files are skipped in remove mode */

        if ( sw->verbose >= 1 && MODE_REMOVE != sw->Index->update_mode )
            printf("Document '%s' not added to index.\n\n", fprop->real_path);

        if (fprop->fp)
            flush_stream( fprop );

        return;
    }





    /** Read the buffer, if not a stream parser **/

#ifdef HAVE_LIBXML2
    if ( !fprop->doctype || fprop->doctype == HTML2 || fprop->doctype == XML2 || fprop->doctype == TXT2 )
        rd_buffer = NULL;
    else
#endif
    /* -- Read  all data, last 1 is flag that we are expecting text only */
    rd_buffer = read_stream(sw, fprop, 1);


    /* just for fun, so we can report the total bytes read */
    sw->indexlist->total_bytes += fprop->fsize;


    /* Set which parser to use */
    
    switch (fprop->doctype)
    {

    case TXT:
        strcpy(strType,"TXT");
        countwords = countwords_TXT;
        break;

    case HTML:
        strcpy(strType,"HTML");
        countwords = countwords_HTML;
        break;

    case XML:
        strcpy(strType,"XML");
        countwords = countwords_XML;
        break;

    case WML:
        strcpy(strType,"WML");
        countwords = countwords_HTML;
        break;

#ifdef HAVE_LIBXML2
    case XML2:
        strcpy(strType,"XML2");
        countwords = parse_XML;
        break;

    case HTML2:
        strcpy(strType,"HTML2");
        countwords = parse_HTML;
        break;

    case TXT2:
        strcpy(strType,"TXT2");
        countwords = parse_TXT;
        break;

    default:
        strcpy(strType,"DEFAULT (HTML2)");
        countwords = parse_HTML;
        break;

#else

    /* Default if libxml not installed */
    default:
        strcpy(strType,"DEFAULT (HTML)");
        countwords = countwords_HTML;
        break;
#endif
        
    }


    if (sw->verbose >= 3)
        printf(" - Using %s parser - ",strType);


    /* Check for NoContents flag and just save the path name */
    /* $$$ Note, really need to only read_stream if reading from a pipe. */
    /* $$$ waste of disk IO and memory if reading from file system */

    if (fprop->index_no_content)
        countwords = index_no_content;


    /* Make sure all meta flags are cleared (in case a parser aborts) */
    ClearInMetaFlags( &indexf->header );




    /* Now bump the file counter  */
    idx->filenum++;
    indexf->header.totalfiles++;
    fi.filenum = idx->filenum;

    /** PARSE **/
    wordcount = countwords(sw, fprop, &fi, rd_buffer);





    if (!fprop->external_program)  /* external_program is not set if a filter is in use */
    {
        if (fprop->hasfilter)
            FilterClose(fprop); /* close filter pipe - should the filter be flushed? */
        else
            sw_fclose(fprop->fp); /* close file */
    }
    /* Else, it's -S prog so make sure we read all the bytes we are supposed to read! */
    /* Can remove the check for fprop->bytes_read once read_stream is no longer used */

    else if ( fprop->bytes_read && fprop->bytes_read < fprop->fsize )
        flush_stream( fprop );


    if (sw->verbose >= 3)
    {
        if (wordcount > 0)
            printf(" (%d words)\n", wordcount);
        else if (wordcount == 0)
            printf(" (no words indexed)\n");
        else if (wordcount == -1)
            printf(" (not opened)\n");
        else if (wordcount == -2)
            printf(" (Skipped due to 'FileRules title' setting)\n");
        else if (wordcount == -3)
            printf(" (Skipped due to Robots Exclusion Rule in meta tag)\n");
        fflush(stdout);
    }


    /* If indexing aborted, remove the last file entry */
    if ( wordcount == -3 || wordcount == -2 )
    {
        remove_last_file_from_list( sw, indexf );
        return;
    }


    /* Return if the file was not indexed */
    if ( wordcount < 0 )
        return;


    if ( DEBUG_MASK & DEBUG_PROPERTIES )
        dump_file_properties( indexf, &fi );


    /* write properties to disk, and release docprop array (and the prop index array) */
    /* Currently this just passes sw, and assumes only one index file when indexing */
    WritePropertiesToDisk( sw , &fi );
	
#ifdef USE_BTREE
    /* Add the value pair (real_path, filenum) to the database */
    DB_WriteFileNum(sw,fi.filenum,fprop->real_path,strlen(fprop->real_path),indexf->DB);
    /* We always need this value in USE_BTREE mode */
    setTotalWordsPerFile(indexf, fi.filenum - 1,wordcount);
#else
    /* Save total words per file */
    if ( !indexf->header.ignoreTotalWordCountWhenRanking )
    {
        setTotalWordsPerFile(indexf, fi.filenum - 1,wordcount);
    }

#endif
    


    /* Compress the entries */
    {
        ENTRY       *ep;

        /* walk the hash list, and compress entries */
        for (i = 0; i < VERYBIGHASHSIZE; i++)
        {
            if (idx->hashentriesdirty[i])
            {
                idx->hashentriesdirty[i] = 0;
                for (ep = idx->hashentries[i]; ep; ep = ep->next)
                    CompressCurrentLocEntry(sw, ep);
            }
        }

        /* Coalesce word positions into a more optimal layout, to avoid having to keep the location data contiguous */
        if(idx->filenum && ((!(idx->filenum % idx->chunk_size)) || (Mem_ZoneSize(idx->currentChunkLocZone) > idx->optimalChunkLocZoneSize)))
        {
            for (i = 0; i < VERYBIGHASHSIZE; i++)
                for (ep = idx->hashentries[i]; ep; ep = ep->next)
                    coalesce_word_locations(sw, ep);
            /* Make zone available for reuse */
            Mem_ZoneReset(idx->currentChunkLocZone);
            idx->freeLocMemChain = NULL;

        }
    }


    /* Make zone available for reuse */
    Mem_ZoneReset(idx->perDocTmpZone);


    return;
}


ENTRY  *getentry(SWISH * sw, char *word)
{
    IndexFILE *indexf = sw->indexlist;
    struct MOD_Index *idx = sw->Index;
    int     hashval;
    ENTRY *e;

    if (!idx->entryArray)
    {
        idx->entryArray = (ENTRYARRAY *) emalloc(sizeof(ENTRYARRAY));
        idx->entryArray->numWords = 0;
        idx->entryArray->elist = NULL;
    }
    /* Compute hash value of word */
    hashval = verybighash(word);


    /* Look for the word in the hash array */
    for (e = idx->hashentries[hashval]; e; e = e->next)
        if (strcmp(e->word, word) == 0)
            break;

    /* flag the hash bucket as used by this file, so that the locations can be "compressed" in do_index_file */
    idx->hashentriesdirty[hashval] = 1;


    /* Word found, return it */
    if (e)
        return e;

    /* Word not found, so create a new word */

    e = (ENTRY *) Mem_ZoneAlloc(idx->entryZone, sizeof(ENTRY) + strlen(word));
    strcpy(e->word, word);
    e->next = idx->hashentries[hashval];
    idx->hashentries[hashval] = e;

    /* Init values */
    e->tfrequency = 0;  
    e->u1.last_filenum = 0; 
    e->currentlocation = NULL;
    e->currentChunkLocationList = NULL;  
    e->allLocationList = NULL;

    idx->entryArray->numWords++;
    indexf->header.totalwords++;

    return e;
}
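
#if 0
/*
 * Illustration only, not part of swish-e: a minimal sketch of the
 * "look up, else create and push on the bucket chain" pattern that
 * getentry() follows.  All demo_* names, the table size and the hash
 * function are assumptions made for this example.  The sketch uses a C99
 * flexible array member and plain malloc(), whereas getentry() allocates
 * sizeof(ENTRY) + strlen(word) from a memory zone.
 */
```c
#include <stdlib.h>
#include <string.h>

#define DEMO_HASHSIZE 101               /* hypothetical table size */

typedef struct demo_entry {
    struct demo_entry *next;            /* next entry in the same bucket */
    int  tfrequency;                    /* number of files using the word */
    char word[];                        /* word text stored inline */
} demo_entry;

static demo_entry *demo_table[DEMO_HASHSIZE];
static int demo_total_words;

/* Simple multiplicative string hash, reduced to a bucket number */
static unsigned demo_hash(const char *s)
{
    unsigned h = 0;

    while (*s)
        h = h * 31u + (unsigned char) *s++;
    return h % DEMO_HASHSIZE;
}

/* Return the entry for word, creating it on first sight */
static demo_entry *demo_getentry(const char *word)
{
    unsigned    h = demo_hash(word);
    size_t      len = strlen(word);
    demo_entry *e;

    for (e = demo_table[h]; e; e = e->next)
        if (strcmp(e->word, word) == 0)
            return e;                   /* word already known */

    e = malloc(sizeof(demo_entry) + len + 1);
    if (!e)
        abort();
    memcpy(e->word, word, len + 1);     /* copies the trailing NUL too */
    e->tfrequency = 0;
    e->next = demo_table[h];            /* push on the front of the chain */
    demo_table[h] = e;
    demo_total_words++;
    return e;
}
```
/* Calling demo_getentry() twice with the same word returns the same
 * pointer and counts the word only once. */
#endif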

/* Adds a word to the master index tree.
*/

void   addentry(SWISH * sw, ENTRY *e, int filenum, int structure, int metaID, int position)
{
    int     found;
    LOCATION *tp, *newtp, *prevtp;
    IndexFILE *indexf = sw->indexlist;
    struct MOD_Index *idx = sw->Index;


    indexf->total_word_positions_cur_run++;

    if ( DEBUG_MASK & DEBUG_WORDS )
    {
        struct metaEntry *m = getMetaNameByID(&indexf->header, metaID);

        printf("    Adding:[%d:%s(%d)]   '%s'   Pos:%d  Struct:0x%0X (", filenum, m ? m->metaName : "PROP_UNKNOWN", metaID, e->word, position, structure);
        
        if ( structure & IN_EMPHASIZED ) printf(" EM");
        if ( structure & IN_HEADER ) printf(" HEADING");
        if ( structure & IN_COMMENTS ) printf(" COMMENT");
        if ( structure & IN_META ) printf(" META");
        if ( structure & IN_BODY ) printf(" BODY");
        if ( structure & IN_HEAD ) printf(" HEAD");
        if ( structure & IN_TITLE ) printf(" TITLE");
        if ( structure & IN_FILE ) printf(" FILE");
        printf(" )\n");
    }


    /* Check for first time */
    if(!e->tfrequency)
    {
        /* create a location record */
        tp = (LOCATION *) new_location(idx);
        tp->filenum = filenum;
        tp->frequency = 1;
        tp->metaID = metaID;
        tp->posdata[0] = SET_POSDATA(position,structure);
        tp->next = NULL;

        e->currentChunkLocationList = tp;
        e->tfrequency = 1;
        e->u1.last_filenum = filenum;

        return;
    }

    /* Word found -- look for same metaID and filenum */
    /* $$$ To do it right, should probably compare the structure, too */
    /* Note: filename not needed due to compress we are only looking at the current file */
    /* Oct 18, 2001 -- filename is needed since merge adds words in non-filenum order */

    tp = e->currentChunkLocationList;
    found = 0;

    while (tp != e->currentlocation)
    {
        if(tp->metaID == metaID && tp->filenum == filenum  )
        {
            found =1;
            break;
        }
        tp = tp->next;
    }

    /* matching metaID NOT found.  So, add a new LOCATION record onto the word */
    /* This expands the size of the location array for this word by one */
    
    if(!found)
    {
        /* create the new LOCATION entry */
        tp = (LOCATION *) new_location(idx);
        tp->filenum = filenum;
        tp->frequency = 1;            /* count of times this word in this file:metaID */
        tp->metaID = metaID;
        tp->posdata[0] = SET_POSDATA(position,structure);

        /* add the new LOCATION onto the array */
        tp->next = e->currentChunkLocationList;
        e->currentChunkLocationList = tp;

        /* Count number of different files that this word is used in */
        if ( e->u1.last_filenum != filenum )
        {
            e->tfrequency++;
            e->u1.last_filenum = filenum;
        }

        return; /* all done */
    }


    /* Otherwise, found matching LOCATION record (matches filenum and metaID) */
    /* Just add the position number onto the end by expanding the size of the LOCATION record */

    /* 2001/08 jmruiz - Much better memory usage occurs if we use MemZones */
    /* MemZone will be reset when the doc is completely processed */

    newtp = add_position_location(tp, idx, tp->frequency);

    if(newtp != tp)
    {
        if(e->currentChunkLocationList == tp)
            e->currentChunkLocationList = newtp;
        else
            for(prevtp = e->currentChunkLocationList;;prevtp = prevtp->next)
            {
                if(prevtp->next == tp)
                {
                    prevtp->next = newtp;
                    break;
                }
            }
        tp = newtp;
    }

    tp->posdata[tp->frequency++] = SET_POSDATA(position,structure);

}
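
#if 0
/*
 * Illustration only, not part of swish-e: addentry() stores each word
 * occurrence as one unsigned int built by SET_POSDATA(position, structure).
 * The sketch below shows one way such packing can work.  The bit layout
 * (structure flags in the low 8 bits, position above them) and all DEMO_*
 * names are assumptions for this example; check the real macros in the
 * swish-e headers before relying on any of it.
 */
```c
/* Hypothetical layout: structure flags in the low 8 bits, position above */
#define DEMO_STRUCT_BITS 8u
#define DEMO_STRUCT_MASK 0xFFu

#define DEMO_SET_POSDATA(pos, st) \
    (((unsigned int)(pos) << DEMO_STRUCT_BITS) | ((unsigned int)(st) & DEMO_STRUCT_MASK))
#define DEMO_GET_POSITION(pd)   ((unsigned int)(pd) >> DEMO_STRUCT_BITS)
#define DEMO_GET_STRUCTURE(pd)  ((unsigned int)(pd) & DEMO_STRUCT_MASK)

/* Move a packed occurrence `removed` positions earlier, keeping its flags */
static unsigned int demo_shift_position(unsigned int pd, unsigned int removed)
{
    return DEMO_SET_POSDATA(DEMO_GET_POSITION(pd) - removed, DEMO_GET_STRUCTURE(pd));
}
```
/* Packing both values into one integer keeps each occurrence at a fixed
 * size, which is what lets posdata[] grow by one slot per occurrence. */
#endif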


/*******************************************************************
*   Adds common file properties to the last entry in the file array
*   (which should be the current one)
*
*
*   Call with:
*       *sw         - needed for indexing words
*       *fprop      - file properties (path, mtime, size)
*       *fi         - file record that receives the properties
*       *title      - document title, or NULL
*       *summary    - document summary, or NULL (why here?)
*       start       - start position of a sub-document
*
*   Returns:
*       void
*
*   Note:
*       Uses cached meta entries (created in metanames.c) to save the
*       metaEntry lookup by name costs
*
********************************************************************/

void    addCommonProperties( SWISH *sw, FileProp *fprop, FileRec *fi, char *title, char *summary, int start )
{
    struct metaEntry *q;
    docProperties   **properties = &fi->docProperties;
    unsigned long   tmp;
    int             metaID;
    INDEXDATAHEADER *header = &sw->indexlist->header;
    char            *filename = fprop->real_path;  /* should always have a path */
    int             filenum = fi->filenum;
    


    /* Check if filename is internal swish metadata -- should be! */

    if ((q = getPropNameByName(header, AUTOPROPERTY_DOCPATH)))
        addDocProperty( properties, q, (unsigned char *)filename, strlen(filename),0);


    /* Perhaps we want it to be indexed ... */
    if ((q = getMetaNameByName(header, AUTOPROPERTY_DOCPATH)))
    {
        int     metaID,
                positionMeta;

        metaID = q->metaID;
        positionMeta = 1;
        indexstring(sw, filename, filenum, IN_FILE, 1, &metaID, &positionMeta);
    }


    /* This allows extracting out parts of a path and indexing as a separate meta name */
    if ( sw->pathExtractList )
        index_path_parts( sw, fprop->orig_path, sw->pathExtractList, header, properties );
        


    /* Check if title is internal swish metadata */
    if ( title )
    {
        if ( (q = getPropNameByName(header, AUTOPROPERTY_TITLE)))
            addDocProperty(properties, q, (unsigned char *)title, strlen(title),0);


         /* Perhaps we want it to be indexed ... */
        if ( (q = getMetaNameByName(header, AUTOPROPERTY_TITLE)))
        {
            int     positionMeta;

            metaID = q->metaID;
            positionMeta = 1;
            indexstring(sw, title, filenum, IN_FILE, 1, &metaID, &positionMeta);
        }
    }


    if ( summary )
    {
        if ( (q = getPropNameByName(header, AUTOPROPERTY_SUMMARY)))
            addDocProperty(properties, q, (unsigned char *)summary, strlen(summary),0);

        
        if ( (q = getMetaNameByName(header, AUTOPROPERTY_SUMMARY)))
        {
            int     metaID,
                    positionMeta;

            metaID = q->metaID;
            positionMeta = 1;
            indexstring(sw, summary, filenum, IN_FILE, 1, &metaID, &positionMeta);
        }
    }



    /* Currently don't allow indexing by date or size or position */

    /* mtime is a time_t, but we don't have an entry for NOT A TIME.  Does anyone care about the first second of 1970? */

    if ( fprop->mtime && (q = getPropNameByName(header, AUTOPROPERTY_LASTMODIFIED)))
    {
        tmp = (unsigned long) fprop->mtime;
        tmp = PACKLONG(tmp);      /* make it portable */
        addDocProperty(properties, q, (unsigned char *) &tmp, sizeof(tmp),1);
    }

    if ( (q = getPropNameByName(header, AUTOPROPERTY_DOCSIZE)))
    {
        /* Use the disk size, if available */
        tmp = (unsigned long) ( fprop->source_size ? fprop->source_size : fprop->fsize);
        tmp = PACKLONG(tmp);      /* make it portable */
        addDocProperty(properties, q, (unsigned char *) &tmp, sizeof(tmp),1);
    }


    if ( (q = getPropNameByName(header, AUTOPROPERTY_STARTPOS)))
    {
        tmp = (unsigned long) start;
        tmp = PACKLONG(tmp);      /* make it portable */
        addDocProperty(properties, q, (unsigned char *) &tmp, sizeof(tmp),1);
    }

}


/*******************************************************************
*   extracts out parts from a path name and indexes that part
*
********************************************************************/
static void index_path_parts( SWISH *sw, char *path, path_extract_list *list, INDEXDATAHEADER *header, docProperties **properties )
{
    int metaID;
    int positionMeta = 1;
    int matched = 0;  /* flag if any patterns matched */
    
    while ( list )
    {
        char *str = process_regex_list( estrdup(path), list->regex, &matched );

        if ( !matched )
        {
            /* use default? */
            if ( list->meta_entry->extractpath_default )
            {
                metaID = list->meta_entry->metaID;
                indexstring(sw, list->meta_entry->extractpath_default, sw->Index->filenum, IN_FILE, 1, &metaID, &positionMeta);
            }
        }
        else
        {
            struct metaEntry *q;

            metaID = list->meta_entry->metaID;
            indexstring(sw, str, sw->Index->filenum, IN_FILE, 1, &metaID, &positionMeta);

            if ((q = getPropNameByName(header, list->meta_entry->metaName )))
                addDocProperty( properties, q, (unsigned char *)str, strlen(str),0);
        }

        /* str starts as an estrdup() copy of path, so free it whether or not a pattern matched */
        efree( str );

        matched = 0;
        list = list->next;
    }
}


/* Returns the number of files recorded in the
** index header.
*/

int     getfilecount(IndexFILE * indexf)
{
    return indexf->header.totalfiles;
}



/* Removes words that occur in at least _plimit_ percent of the files and
** in at least _flimit_ files (marks them as stopwords, that is).
*/
/* 05/00 Jose Ruiz
** Recompute positions when a stopword is removed from lists.
** This piece of code is hard to follow because the first goal
** was getting the best possible performance.
** The main problem is to recalculate word positions for all
** the words after removing the automatic stop words. This means
** looking at every word's positions for each automatic stop word
** and decrementing them.
*/
/* 2001/02 jmruiz - rewritten - the whole process is done in one pass to achieve
better performance */
/* 2001-08 jmruiz - rewritten - adapted to new locations and zone schema */
/* 2002-07 jmruiz - rewritten - adapted to new -e schema */

int getNumberOfIgnoreLimitWords(SWISH *sw)
{
    return sw->Index->nIgnoreLimitWords;
}


void getPositionsFromIgnoreLimitWords(SWISH * sw)
{
    int     i,
            k,
            m,
            stopwords,
            percent,
            chunk_size,
            metaID,
            frequency,
            tmpval,
            filenum;
    unsigned int    *posdata;
    unsigned int     local_posdata[MAX_STACK_POSITIONS];

    LOCATION *l, *next;
    ENTRY  *ep,
           *ep2;
    ENTRY **estop = NULL;
    int     estopsz = 0,
            estopmsz = 0;
    int     totalwords;
    IndexFILE *indexf = sw->indexlist;
    int     totalfiles = getfilecount(indexf);
    struct IgnoreLimitPositions **filepos = NULL;
    struct IgnoreLimitPositions *fpos;
    struct MOD_Index *idx = sw->Index;
    unsigned char *p, *q, *compressed_data, flag;
    int     last_loc_swap;

    stopwords = 0;
    totalwords = indexf->header.totalwords;

    idx->nIgnoreLimitWords = 0;
    idx->IgnoreLimitPositionsArray = NULL;

    if (!totalwords || idx->plimit >= NO_PLIMIT)
        return;

    if (sw->verbose)
    {
        printf("\r  Getting IgnoreLimit stopwords: ...");
        fflush(stdout);
    }


    if (!estopmsz)
    {
        estopmsz = 1;
        estop = (ENTRY **) emalloc(estopmsz * sizeof(ENTRY *));
    }

    
    /* this is the easy part: Remove the automatic stopwords from the hash array */
    /* Builds a list estop[] of ENTRY's that need to be removed */

    for (i = 0; i < VERYBIGHASHSIZE; i++)
    {
        for (ep2 = NULL, ep = sw->Index->hashentries[i]; ep; ep = ep->next)
        {
            percent = (ep->tfrequency * 100) / totalfiles;
            if (percent >= idx->plimit && ep->tfrequency >= idx->flimit)
            {
                add_word_to_hash_table( &indexf->header.hashstoplist, ep->word, HASHSIZE);

                /* for printing words removed at the end */
                idx->IgnoreLimitWords = addswline( idx->IgnoreLimitWords, ep->word);
                

                stopwords++;
                /* unlink the ENTRY from the hash */
                if (ep2)
                    ep2->next = ep->next;
                else
                    sw->Index->hashentries[i] = ep->next;

                totalwords--;
                sw->Index->entryArray->numWords--;
                indexf->header.totalwords--;

                /* Reallocate if more space is needed */
                if (estopsz == estopmsz)
                {
                    estopmsz *= 2;
                    estop = (ENTRY **) erealloc(estop, estopmsz * sizeof(ENTRY *));
                }

                /* estop is an array of ENTRY's that need to be removed */
                estop[estopsz++] = ep;
            }
            else
                ep2 = ep;
        }
    }



    /* If we have automatic stopwords we have to recalculate word positions */

    if (estopsz)
    {
        /* Build an array with all the files positions to be removed */
        filepos = (struct IgnoreLimitPositions **) emalloc(totalfiles * sizeof(struct IgnoreLimitPositions *));

        for (i = 0; i < totalfiles; i++)
            filepos[i] = NULL;

        /* Process each automatic stop word */
        for (i = 0; i < estopsz; i++)
        {
            ep = estop[i];

            if (sw->verbose)
            {
                printf("\r  Getting IgnoreLimit stopwords: %25s",ep->word);
                fflush(stdout);
            }

            if(sw->Index->swap_locdata)
            {
                /* jmruiz - Be careful with these lines! With a lot of words
                ** this code can be very slow and may need to be rethought.
                ** Fortunately, usually only a few words trigger the IgnoreLimit option.
                */
                last_loc_swap = (verybighash(ep->word) * (MAX_LOC_SWAP_FILES - 1)) / (VERYBIGHASHSIZE - 1);
                unSwapLocData(sw, last_loc_swap, ep );
            }
      
            /* Run through location list to get positions */
            for(l=ep->allLocationList;l;)
            {
                 compressed_data = (unsigned char *) l;
                 /* Preserve next element */
                 next = *(LOCATION **)compressed_data;
                 /* Jump pointer to next element */
                 p = compressed_data + sizeof(LOCATION *);

                 metaID = uncompress2(&p);

                 memcpy((char *)&chunk_size,(char *)p,sizeof(chunk_size));
                 p += sizeof(chunk_size);

                 filenum = 0;
                 while(chunk_size)
                 {                   /* Read on all items */
                     q = p;
                     uncompress_location_values(&p,&flag,&tmpval,&frequency);
                     filenum += tmpval;

                     if(frequency > MAX_STACK_POSITIONS)
                         posdata = (unsigned int *) emalloc(frequency * sizeof(int));
                     else
                         posdata = local_posdata;

                     uncompress_location_positions(&p,flag,frequency,posdata);

                     chunk_size -= (p-q);
         
                     /* Now build the list by filenum of meta/position info */

                     if (!filepos[filenum - 1])
                     {
                         fpos = (struct IgnoreLimitPositions *) emalloc(sizeof(struct IgnoreLimitPositions));
                         fpos->pos = (int *) emalloc(frequency * 2 * sizeof(int));
                         fpos->n = 0;
                         filepos[filenum - 1] = fpos;
                     }
                     else /* file exists in array.  just append the meta and position data */
                     {
                         fpos = filepos[filenum - 1];
                         fpos->pos = (int *) erealloc(fpos->pos, (fpos->n + frequency) * 2 * sizeof(int));
                     }

                     for (m = fpos->n * 2, k = 0; k < frequency; k++)
                     {
                         fpos->pos[m++] = metaID;
                         fpos->pos[m++] = GET_POSITION(posdata[k]);
                     }

                     fpos->n += frequency;

                     if(posdata != local_posdata)
                         efree(posdata);
                }
                l = next;
            }
            if(sw->Index->swap_locdata)
                Mem_ZoneReset(idx->totalLocZone);
        }

        /* sort each file's entries by metaID/position */
        for (i = 0; i < totalfiles; i++)
        {
            if (filepos[i])
                swish_qsort(filepos[i]->pos, filepos[i]->n, 2 * sizeof(int), &icomp2);
        }
    }

    idx->nIgnoreLimitWords = estopsz;
    idx->IgnoreLimitPositionsArray = filepos;

    if (sw->verbose)
    {
        printf("\r  Getting IgnoreLimit stopwords: Complete                            \n");
        fflush(stdout);
    }
}
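
#if 0
/*
 * Illustration only, not part of swish-e: the expression
 *     (hash * (MAX_LOC_SWAP_FILES - 1)) / (VERYBIGHASHSIZE - 1)
 * used above scales a hash value linearly onto a swap-file number.  The
 * sketch repeats that arithmetic with small, made-up sizes so the mapping
 * is easy to check by hand.  DEMO_HASHSIZE and DEMO_SWAP_FILES are
 * assumptions and do not reflect the real constants.
 */
```c
#define DEMO_HASHSIZE   1000    /* hypothetical stand-in for VERYBIGHASHSIZE */
#define DEMO_SWAP_FILES 10      /* hypothetical stand-in for MAX_LOC_SWAP_FILES */

/* Map a hash value in [0, DEMO_HASHSIZE - 1] to a swap file in
 * [0, DEMO_SWAP_FILES - 1] using integer arithmetic only */
static int demo_swap_file_for_hash(int hashval)
{
    return (hashval * (DEMO_SWAP_FILES - 1)) / (DEMO_HASHSIZE - 1);
}
```
/* Consecutive hash values land in the same swap file, which is why a
 * loop over the hash table only has to reload swap data when the computed
 * file number changes. */
#endif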

/* 2001-08 jmruiz - Adjust positions if there were IgnoreLimit stopwords.
** In all cases, removes the null end-of-chunk marks */
void adjustWordPositions(unsigned char *worddata, int *sz_worddata, int n_files, struct IgnoreLimitPositions **ilp)
{
    int     frequency,
            metaID,
            tmpval,
            r_filenum, 
            w_filenum;
    unsigned int       *posdata;
    int     i,j,k;
    unsigned long    r_nextposmeta;
    unsigned char   *w_nextposmeta;
    unsigned int     local_posdata[MAX_STACK_POSITIONS];
    unsigned char r_flag, *w_flag;
    unsigned char *p, *q;

	/* TODO
	total_word_pos count for index header -- should be adjusted to reflect any positions +/- */
	

    p = worddata;

    tmpval = uncompress2(&p);     /* tfrequency */
    metaID = uncompress2(&p);     /* metaID */
    r_nextposmeta =  UNPACKLONG2(p); 
    w_nextposmeta = p;
    p += sizeof(long);

    q = p;
    r_filenum = w_filenum = 0;
    while(1)
    {                   /* Read on all items */
        uncompress_location_values(&p,&r_flag,&tmpval,&frequency);
        r_filenum += tmpval;
       
        if(frequency <= MAX_STACK_POSITIONS)
            posdata = local_posdata;
        else
            posdata = (unsigned int *) emalloc(frequency * sizeof(int));

        uncompress_location_positions(&p,r_flag,frequency,posdata);

        if(n_files && ilp && ilp[r_filenum - 1])
        {
            for(i = 0; i < ilp[r_filenum - 1]->n; i++)
            {
                tmpval = ilp[r_filenum - 1]->pos[2 * i];
                if( tmpval >= metaID)
                    break;
            }
            if(tmpval == metaID)
            {
                for(j = 0; j < frequency ; j++)
                {
                    for(k = i; k < ilp[r_filenum - 1]->n ; k++)
                    {
                        if(ilp[r_filenum - 1]->pos[2 * k] != metaID || 
                            ilp[r_filenum - 1]->pos[2 * k + 1] > GET_POSITION(posdata[j]))
                            break;  /* End */
                    }
                    posdata[j] = SET_POSDATA(GET_POSITION(posdata[j]) - (k-i), GET_STRUCTURE(posdata[j]));
                }
            } 
        }
               /* Store the filenum incrementally to save space */
        compress_location_values(&q,&w_flag,r_filenum - w_filenum,frequency, posdata);
        w_filenum = r_filenum;

               /* store positions */
        compress_location_positions(&q,w_flag,frequency,posdata);

        if(posdata != local_posdata)
            efree(posdata);

        if(!p[0])       /* End of chunk mark */
        {
            r_filenum = 0;  /* reset filenum */
            p++;
        }
        if ((p - worddata) == *sz_worddata)
             break;   /* End of worddata */

        if ((unsigned long)(p - worddata) == r_nextposmeta)
        {
            if(q != p)
                PACKLONG2(q - worddata, w_nextposmeta);

            metaID = uncompress2(&p);
            q = compress3(metaID,q);

            r_nextposmeta = UNPACKLONG2(p); 
            p += sizeof(long);

            w_nextposmeta = q;
            q += sizeof(long);

            w_filenum = 0;
        }
    }
    *sz_worddata = q - worddata;
    PACKLONG2(*sz_worddata, w_nextposmeta);
}
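
#if 0
/*
 * Illustration only, not part of swish-e: the core of the adjustment in
 * adjustWordPositions() is "subtract from each kept position the number
 * of removed stopword positions at or before it".  The sketch shows that
 * step on plain int arrays.  demo_adjust_positions is a hypothetical
 * helper; unlike the loop above, which restarts its scan for every
 * position, it assumes both arrays are sorted ascending and advances a
 * single cursor.
 */
```c
#include <stddef.h>

/* kept[]    - positions of a word that stays in the index (sorted)
 * removed[] - positions of removed stopwords in the same file/meta (sorted)
 * Each kept position is decremented by the count of removed positions
 * that are less than or equal to it. */
static void demo_adjust_positions(int *kept, size_t n_kept,
                                  const int *removed, size_t n_removed)
{
    size_t j, k = 0;

    for (j = 0; j < n_kept; j++)
    {
        while (k < n_removed && removed[k] <= kept[j])
            k++;                    /* k = removed positions seen so far */
        kept[j] -= (int) k;
    }
}
```
/* With kept = {1, 3, 5, 8} and removed = {2, 4, 6} the kept positions
 * become {1, 2, 3, 5}: the gaps left by the stopwords are closed. */
#endif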



/*
** This is an all new ranking algorithm. I can't say it is based on anything,
** but it does seem to be better than what was used before!
** 2001/05 wsm
**
** Parameters:
**    sw
**        Pointer to SWISH structure
**
**    freq
**        Number of times this word appeared in this file
**
**    tfreq
**        Number of files this word appeared in this index (not used for ranking)
**
**    words
**        Number of words in this file
**
**    structure
**        Bit mask of context where this word appeared
**
**    ignoreTotalWordCount
**        Ignore total word count when ranking (config file parameter)
*/



int     entrystructcmp(const void *e1, const void *e2)
{
    const ENTRY *ep1 = *(ENTRY * const *) e1;
    const ENTRY *ep2 = *(ENTRY * const *) e2;

    return (strcmp(ep1->word, ep2->word));
}


/* Sorts the words */
void    sort_words(SWISH * sw)
{
    int     i,
            j;
    ENTRY  *e;


    if (!sw->Index->entryArray || !sw->Index->entryArray->numWords)
        return;


    if (sw->verbose)
    {
        printf("Sorting %s words alphabetically\n", comma_long( sw->Index->entryArray->numWords ) );
        fflush(stdout);
    }

    /* Build the array with the pointers to the entries */
    sw->Index->entryArray->elist = (ENTRY **) emalloc(sw->Index->entryArray->numWords * sizeof(ENTRY *));

    /* Fill the array with all the entries */
    for (i = 0, j = 0; i < VERYBIGHASHSIZE; i++)
        for (e = sw->Index->hashentries[i]; e; e = e->next)
            sw->Index->entryArray->elist[j++] = e;

    /* Sort them */
    swish_qsort(sw->Index->entryArray->elist, sw->Index->entryArray->numWords, sizeof(ENTRY *), &entrystructcmp);
}
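
#if 0
/*
 * Illustration only, not part of swish-e: sort_words() sorts an array of
 * pointers, so the comparator receives pointers to those pointers and has
 * to dereference once before it reaches the struct, as entrystructcmp()
 * does.  The sketch shows the same shape with the standard qsort(); the
 * demo_* names are assumptions for this example.
 */
```c
#include <stdlib.h>
#include <string.h>

typedef struct demo_word {
    const char *word;
} demo_word;

/* qsort() hands us pointers to the array elements, i.e. demo_word ** */
static int demo_wordcmp(const void *a, const void *b)
{
    const demo_word *wa = *(demo_word * const *) a;
    const demo_word *wb = *(demo_word * const *) b;

    return strcmp(wa->word, wb->word);
}

/* Sort an array of n pointers by the word each one points to */
static void demo_sort_words(demo_word **list, size_t n)
{
    qsort(list, n, sizeof(demo_word *), demo_wordcmp);
}
```
/* Sorting the pointer array leaves the structs themselves where they
 * are, so other lists that reference them (the hash chains, here) stay
 * valid. */
#endif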



/* Sort chunk locations of entry e by metaID, filenum */
static void    sortChunkLocations(ENTRY * e)
{
    int     i,
            j,
            k,
            filenum,metaID,frequency;
    unsigned char flag;
    unsigned char *ptmp,
           *ptmp2,
           *compressed_data;
    int    *pi = NULL;
    LOCATION *l, *prev = NULL, **lp;

    /* Very trivial case */
    if (!e)
        return;

    if(!e->currentChunkLocationList)
        return;

    /* Get the number of locations in chunk */
    for(i = 0, l = e->currentChunkLocationList; l; i++)
        l=*(LOCATION **)l;    /* Get next location */

    /* Compute the width of one array element: two ints plus a pointer */
    j = 2 * sizeof(int) + sizeof(void *);

    /* Allocate the array */
    ptmp = (void *) emalloc(j * i);

    /* Build an array with the elements to compare
       and pointers to data */

    for(l = e->currentChunkLocationList, ptmp2 = ptmp; l; )
    {
        pi = (int *) ptmp2;

        compressed_data = (unsigned char *)l;
        /* Jump next offset */
        compressed_data += sizeof(LOCATION *);

        metaID = uncompress2(&compressed_data);
        uncompress_location_values(&compressed_data,&flag,&filenum,&frequency);
        pi[0] = metaID;
        pi[1] = filenum;
        ptmp2 += 2 * sizeof(int);

        lp = (LOCATION **)ptmp2;
        *lp = l;
        ptmp2 += sizeof(void *);
        l=*(LOCATION **)l;    /* Get next location */
    }

    /* Sort them */
    swish_qsort(ptmp, i, j, &icomp2);

    /* Store results */
    for (k = 0, ptmp2 = ptmp; k < i; k++)
    {
        ptmp2 += 2 * sizeof(int);

        l = *(LOCATION **)ptmp2;
        if(!k)
            e->currentChunkLocationList = l;
        else
            prev->next =l;
        ptmp2 += sizeof(void *);
        prev = l;
    }
    l->next =NULL;

    /* Free the memory of the array */
    efree(ptmp);
}

void    coalesce_all_word_locations(SWISH * sw, IndexFILE * indexf)
{
    int     i;
    ENTRY  *epi;

    for (i = 0; i < VERYBIGHASHSIZE; i++)
    {
        if ((epi = sw->Index->hashentries[i]))
        {
            while (epi)
            {
                coalesce_word_locations(sw, epi);
                epi = epi->next;
            }
        }
    }

}

/* Write the index entries that hold the word, rank, and other information.
*/


#ifndef USE_BTREE
void    write_index(SWISH * sw, IndexFILE * indexf)
{
    int     i;
    ENTRYARRAY *ep;
    ENTRY  *epi;
    int     totalwords;
    int     percent, lastPercent, n;
    int     last_loc_swap;

#define DELTA 10


    if ( !(ep = sw->Index->entryArray ))
        return;  /* nothing to do */

    totalwords = ep->numWords;

    DB_InitWriteWords(sw, indexf->DB);

    if (sw->verbose)
    {
        printf("  Writing word text: ...");
        fflush(stdout);
    }

    /* This is no longer needed, so free it as soon as possible */
    Mem_ZoneFree(&sw->Index->perDocTmpZone);


    /* This is no longer needed, so free it as soon as possible */
    Mem_ZoneFree(&sw->Index->currentChunkLocZone);

    /* If we are swapping locs to file, reset memory zone */
    if(sw->Index->swap_locdata)
        Mem_ZoneReset(sw->Index->totalLocZone);

    n = lastPercent = 0;
    for (i = 0; i < totalwords; i++)
    {
        if ( sw->verbose && totalwords > 10000 )  // just some random guess
        {
            n++;
            percent = (n * 100)/totalwords;
            if (percent - lastPercent >= DELTA )
            {
                printf("\r  Writing word text: %3d%%", percent );
                fflush(stdout);
                lastPercent = percent;
            }
        }

        epi = ep->elist[i];

        /* why check for stopwords here?  removestopwords could have removed them */
        if ( !is_word_in_hash_table( indexf->header.hashstoplist, epi->word ) )
        {
            /* Write word to index file */
            write_word(sw, epi, indexf);
        }
        else
            epi->u1.wordID = (sw_off_t)-1;  /* flag as a stop word */
    }    

    if (sw->verbose)
    {
        printf("\r  Writing word text: Complete\n" );
        printf("  Writing word hash: ...");
        fflush(stdout);
    }



    n = lastPercent = 0;
    for (i = 0; i < VERYBIGHASHSIZE; i++)
    {
        if ( sw->verbose )
        {
            n++;
            percent = (n * 100)/VERYBIGHASHSIZE;
            if (percent - lastPercent >= DELTA )
            {
                printf("\r  Writing word hash: %3d%%", percent );
                fflush(stdout);
                lastPercent = percent;
            }
        }


        if ((epi = sw->Index->hashentries[i]))
        {
            while (epi)
            {
                /* If it is not a stopword write it */
                if (epi->u1.wordID > (sw_off_t)0)  
                    DB_WriteWordHash(sw, epi->word,epi->u1.wordID,indexf->DB);
                epi = epi->next;
            }
        }
    }

    if (sw->verbose)
    {
        printf("\r  Writing word hash: Complete\n" );
        printf("  Writing word data: ...");
        fflush(stdout);
    }


    n = lastPercent = last_loc_swap = -1;
    for (i = 0; i < VERYBIGHASHSIZE; i++)
    {
         /* If we are in economic mode -e restore locations */
        if(sw->Index->swap_locdata)
        {
            if (((i * (MAX_LOC_SWAP_FILES - 1)) / (VERYBIGHASHSIZE - 1)) != last_loc_swap)
            {
                /* Free memory that is no longer needed */
                Mem_ZoneReset(sw->Index->totalLocZone);
                last_loc_swap = (i * (MAX_LOC_SWAP_FILES - 1)) / (VERYBIGHASHSIZE - 1);
                unSwapLocData(sw, last_loc_swap, NULL );
            }
        }
        if ((epi = sw->Index->hashentries[i]))
        {
            while (epi)
            {
                /* If we are in economic mode -e we must sort locations by metaID, filenum */
                if(sw->Index->swap_locdata)
                {
                    sortSwapLocData(epi);
                }
                if ( sw->verbose && totalwords > 10000 )  // just some random guess
                {
                    n++;
                    percent = (n * 100)/totalwords;
                    if (percent - lastPercent >= DELTA )
                    {
                        printf("\r  Writing word data: %3d%%", percent );
                        fflush(stdout);
                        lastPercent = percent;
                    }
                }
                if (epi->u1.wordID > (sw_off_t)0)   /* Not a stopword */
                {
                    build_worddata(sw, epi);
                    write_worddata(sw, epi, indexf);
                }
                epi = epi->next;
            }
        }
    }
    if (sw->verbose)
        printf("\r  Writing word data: Complete\n" );


    DB_EndWriteWords(sw, indexf->DB);

       /* free all ENTRY structs at once */
    Mem_ZoneFree(&sw->Index->entryZone);

       /* free all location compressed data */
    Mem_ZoneFree(&sw->Index->totalLocZone);

    efree(ep->elist);
}

#else

void    write_index(SWISH * sw, IndexFILE * indexf)
{
    int     i;
    ENTRYARRAY *ep;
    ENTRY  *epi;
    int     totalwords;
    int     percent, lastPercent, n;
    int     last_loc_swap;

    long    old_wordid;
    unsigned char *buffer =NULL;
    int     sz_buffer = 0;
#define DELTA 10


    if ( !(ep = sw->Index->entryArray ))
        return;  /* nothing to do */

    totalwords = ep->numWords;


    /* Write words */
    DB_InitWriteWords(sw, indexf->DB);

    if (sw->verbose)
    {
        printf("  Writing word text: ...");
        fflush(stdout);
    }

    /* This is no longer needed, so free it as soon as possible */
    Mem_ZoneFree(&sw->Index->perDocTmpZone);


    /* This is no longer needed, so free it as soon as possible */
    Mem_ZoneFree(&sw->Index->currentChunkLocZone);

    /* If we are swapping locs to file, reset memory zone */
    if(sw->Index->swap_locdata)
        Mem_ZoneReset(sw->Index->totalLocZone);

    n = lastPercent = last_loc_swap = -1;
    for (i = 0; i < VERYBIGHASHSIZE; i++)
    {
         /* If we are in economic mode -e restore locations */
        if(sw->Index->swap_locdata)
        {
            if (((i * (MAX_LOC_SWAP_FILES - 1)) / (VERYBIGHASHSIZE - 1)) != last_loc_swap)
            {
                /* Free memory that is no longer needed */
                Mem_ZoneReset(sw->Index->totalLocZone);
                last_loc_swap = (i * (MAX_LOC_SWAP_FILES - 1)) / (VERYBIGHASHSIZE - 1);
                unSwapLocData(sw, last_loc_swap, NULL );
            }
        }
        if ((epi = sw->Index->hashentries[i]))
        {
            while (epi)
            {
                /* If we are in economic mode -e we must sort locations by metaID, filenum */
                if(sw->Index->swap_locdata)
                {
                    sortSwapLocData(epi);
                }
                if ( sw->verbose && totalwords > 10000 )  // just some random guess
                {
                    n++;
                    percent = (n * 100)/totalwords;
                    if (percent - lastPercent >= DELTA )
                    {
                        printf("\r  Writing word data: %3d%%", percent );
                        fflush(stdout);
                        lastPercent = percent;
                    }
                }
                /* why check for stopwords here?  removestopwords could have removed them */
                if ( !is_word_in_hash_table( indexf->header.hashstoplist, epi->word ) )                     /* Not a stopword */
                {
                    /* Build worddata buffer */
                    build_worddata(sw, epi);
                    /* let's see if word is already in the index */
                    old_wordid = read_worddata(sw, epi, indexf, &buffer, &sz_buffer);
                    /* If exists, we have to add the new worddata buffer to the old one */
                    if(old_wordid)
                    {
                         add_worddata(sw, buffer, sz_buffer);
                         efree(buffer);
                         buffer = NULL;
                         sz_buffer = 0;
                         delete_worddata(sw, old_wordid, indexf);
                         write_worddata(sw, epi, indexf);
                         update_wordID(sw, epi, indexf);
                    }
                    else
                    {
                         /* Reset last error. It was set in read_worddata if
                         ** word was not found */
                         sw->lasterror = RC_OK;
                         /* Write word to index file */
                         write_worddata(sw, epi, indexf);
                         write_word(sw, epi, indexf);
                    }
                }
                epi = epi->next;
            }
        }
    }
    if (sw->verbose)
    {
        printf("\r  Writing word text: Complete\n" );
        fflush(stdout);
    }


    DB_EndWriteWords(sw, indexf->DB);

       /* free all ENTRY structs at once */
    Mem_ZoneFree(&sw->Index->entryZone);

       /* free all location compressed data */
    Mem_ZoneFree(&sw->Index->totalLocZone);

    efree(ep->elist);
}


#endif




static void addword( char *word, SWISH * sw, int filenum, int structure, int numMetaNames, int *metaID, int *word_position)
{
    int     i;

    /* Add the word for each nested metaname. */
    for (i = 0; i < numMetaNames; i++)
        (void) addentry(sw, getentry(sw,word), filenum, structure, metaID[i], *word_position);

    (*word_position)++;
}




/* Gets the next white-space delimited word */
int next_word( char **buf, char **word, int *lenword )
{
    int     i;

    /* skip any whitespace */
    while ( **buf && isspace( (unsigned char) **buf) )
        (*buf)++;

    i = 0;
    while ( **buf && !isspace( (unsigned char) **buf) )
    {
        /* reallocate buffer, if needed */
        if ( i == *lenword )
        {
            *lenword *= 2;
            *word = erealloc(*word, *lenword + 1);
        }

        (*word)[i++] = **buf;
        (*buf)++;
    }

    if ( i )
    {
        (*word)[i] = '\0';
        return 1;
    }

    return 0;
}
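
/*
 * Illustration only, not part of swish-e: a minimal sketch of the same
 * whitespace tokenizer with a doubling buffer. It assumes plain realloc()
 * in place of erealloc(), and the name demo_next_word is hypothetical.
 */

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for next_word(): copies the next whitespace-delimited
 * token from *buf into *word, doubling the buffer whenever it fills up.
 * *word must point to a malloc'ed buffer of at least *lenword + 1 bytes.
 * Returns 1 if a token was found, 0 at end of input. */
static int demo_next_word(const char **buf, char **word, int *lenword)
{
    int i = 0;

    assert(*lenword > 0);

    /* skip leading whitespace */
    while (**buf && isspace((unsigned char) **buf))
        (*buf)++;

    while (**buf && !isspace((unsigned char) **buf))
    {
        /* grow the buffer, keeping one extra byte for the terminator */
        if (i == *lenword)
        {
            char *tmp = realloc(*word, (size_t) *lenword * 2 + 1);
            if (!tmp)
                abort();            /* out of memory: give up in this sketch */
            *word = tmp;
            *lenword *= 2;
        }
        (*word)[i++] = **buf;
        (*buf)++;
    }

    if (!i)
        return 0;

    (*word)[i] = '\0';
    return 1;
}
```

/*
 * Because the buffer may move, the caller has to keep using the updated
 * *word and *lenword after every call, which is why indexstring() copies
 * them back into the MOD_Index struct before returning.
 */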

/* Gets the next non WordChars delimited word */
/* Bumps position if needed */
int next_swish_word(SWISH * sw, char **buf, char **word, int *lenword, int *word_position )
{
    int     i;
    IndexFILE *indexf = sw->indexlist;
    int     bump_flag = 0;

    /* skip non-wordchars and check for bump chars */
    while ( **buf && !iswordchar(indexf->header, **buf ) )
    {
        if (!bump_flag && isBumpPositionCounterChar(&indexf->header, (int) **buf))
            bump_flag++;
            
        (*buf)++;
    }

    i = 0;
    while ( **buf && iswordchar(indexf->header, **buf) )
    {
        /* It doesn't really make sense to have a WordChar that's also a bump char */
        if (!bump_flag && isBumpPositionCounterChar(&indexf->header, (int) **buf))
            bump_flag++;


        /* reallocate buffer, if needed */
        if ( i == *lenword )
        {
            *lenword *= 2;
            *word = erealloc(*word, *lenword + 1);
        }

        (*word)[i++] = **buf;
        (*buf)++;
    }

    /* If any bump chars were found then bump to prevent phrase matching */
    if ( bump_flag )
        (*word_position)++;

    if ( i )
    {
        (*word)[i] = '\0';
        stripIgnoreLastChars(&indexf->header, *word);
        stripIgnoreFirstChars(&indexf->header, *word);

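        /* NOTE: *word is the buffer pointer, which is never NULL here, so this
         * always returns 1, even when stripping leaves an empty string */
        return *word ? 1 : 0;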
    }

    return 0;
}
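
/*
 * Illustration only, not part of swish-e: a minimal sketch of how a bump
 * character between two words leaves a gap in the position numbers, so the
 * two words can no longer match as a phrase. The function name, the plain
 * character-set strings and the starting position of 1 are assumptions made
 * for this sketch; the real code uses lookup tables in the index header.
 */

```c
#include <assert.h>
#include <string.h>

/* Records the position assigned to each word of s in positions[].
 * wordchars lists the characters that form words, bumpchars the characters
 * that force an extra position increment. Returns the number of words. */
static int demo_positions(const char *s, const char *wordchars,
                          const char *bumpchars, int *positions, int max_words)
{
    int count = 0;
    int position = 1;

    assert(max_words > 0);

    while (*s && count < max_words)
    {
        int bump = 0;

        /* skip non-word characters, remembering whether a bump char was seen */
        while (*s && !strchr(wordchars, *s))
        {
            if (strchr(bumpchars, *s))
                bump = 1;
            s++;
        }

        if (!*s)
            break;                  /* only separators were left */

        /* consume the word itself */
        while (*s && strchr(wordchars, *s))
            s++;

        /* at most one extra increment per word, as with bump_flag */
        if (bump)
            position++;

        positions[count++] = position++;
    }

    return count;
}
```

/*
 * With "." as the only bump character, "two.three" gets positions that
 * differ by two, while "one two" stays adjacent.
 */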

/******************************************************************
*  Build the list of metaIDs that need to be indexed
*
*  Returns number of IDs found
*
*
******************************************************************/
static int build_metaID_list( SWISH *sw )
{
    struct  MOD_Index   *idx = sw->Index;
    METAIDTABLE         *metas = &idx->metaIDtable;
    IndexFILE           *indexf = sw->indexlist;
    INDEXDATAHEADER     *header = &indexf->header;
    struct metaEntry    *m;
    int                 i;


    /* cache the default metaID for speed */
    if ( metas->defaultID == -1 )
    {
        m = getMetaNameByName( header, AUTOPROPERTY_DEFAULT );
        metas->defaultID = m ? m->metaID : 0;
    }    


    metas->num = 0;


    /* It would be smart to track the number of metas flagged, so as not to loop through all of them for every lookup */
    
    for ( i = 0; i < header->metaCounter; i++)
    {
        m = header->metaEntryArray[i];

        if ( (m->metaType & META_INDEX) && m->in_tag )
        {
            if ( ++metas->num > metas->max )
                /* erealloc() takes a size in bytes, so scale the element count by sizeof(int) */
                metas->array = (int *)erealloc( metas->array, (metas->max = metas->num + 200) * sizeof(int) );

            metas->array[metas->num - 1] = m->metaID;
        }
    }

    /* If no metas found to index, then add default metaID */
    if ( !metas->num && metas->defaultID  )
        metas->array[metas->num++] = metas->defaultID;

    return metas->num;
}
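
/*
 * Illustration only, not part of swish-e: a minimal sketch of growing an
 * int array in blocks, in the way build_metaID_list() grows metas->array.
 * A realloc()-style call takes a size in bytes, so the element count has to
 * be multiplied by sizeof(int). The type and function names are hypothetical
 * and plain realloc() stands in for erealloc().
 */

```c
#include <assert.h>
#include <stdlib.h>

#define DEMO_GROW_BY 200    /* elements added per reallocation */

/* Hypothetical growable list of ints */
typedef struct
{
    int *array;     /* storage, NULL until first use */
    int  num;       /* elements in use */
    int  max;       /* elements allocated */
} demo_id_list;

static void demo_id_list_push(demo_id_list *list, int id)
{
    assert(list->num <= list->max);

    if (list->num == list->max)
    {
        int  new_max = list->max + DEMO_GROW_BY;
        int *tmp = realloc(list->array, (size_t) new_max * sizeof(int));

        if (!tmp)
            abort();        /* out of memory: give up in this sketch */

        list->array = tmp;
        list->max = new_max;
    }

    list->array[list->num++] = id;
}
```

/*
 * Keeping the capacity (max) in elements and converting to bytes only at the
 * allocation call keeps the two units from being mixed up.
 */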
    

/******************************************************************
*  Index a string
*
*
******************************************************************/

/* 05/2001 Jose Ruiz - Changed word and swishword buffers to make this routine ** thread safe */


int     indexstring(SWISH * sw, char *s, int filenum, int structure, int numMetaNames, int *metaID, int *position)
{
    int     wordcount = 0;

    IndexFILE *indexf = sw->indexlist;

    char   *buf_pos;        /* pointer to current position */
    char   *cur_pos;        /* pointer to position with a word */

    struct  MOD_Index *idx = sw->Index;

                            /* Assign word buffers */
    char   *word = idx->word;
    int     lenword = idx->lenword;
    char   *swishword = idx->swishword;
    int     lenswishword = idx->lenswishword;

    struct swline *sp_stem = NULL;

    int not_fuzzy = FUZZY_NONE == indexf->header.fuzzy_data->stemmer->fuzzy_mode;

    /* Generate list of metaIDs to index unless passed in */
    if ( !metaID )
    {
        if ( !(numMetaNames = build_metaID_list( sw )) )
            return 0;
        else
            metaID = idx->metaIDtable.array;
    }

    /* current pointer into buffer */
    buf_pos = s;


    /* get the next word as defined by whitespace */
    while ( next_word( &buf_pos, &word, &lenword ) )
    {
        if ( DEBUG_MASK & DEBUG_PARSED_WORDS )
            printf("White-space found word '%s'\n", word );

    
        strtolower(word);

        /* is this a useful feature? */
        if ( indexf->header.hashuselist.count )
        {
            if ( is_word_in_hash_table( indexf->header.hashuselist, word ) )
            {
                addword(word, sw, filenum, structure, numMetaNames, metaID, position );
                wordcount++;
            }

            continue;                
        }


        /* Check for buzzwords */
        if ( indexf->header.hashbuzzwordlist.count )
        {
            /* only strip when buzzwords are being used since stripped again as a "swish word" */
            stripIgnoreLastChars(&indexf->header, word);
            stripIgnoreFirstChars(&indexf->header, word);
            if ( !*word ) /* stripped clean? */
                continue;

            if ( is_word_in_hash_table( indexf->header.hashbuzzwordlist, word ) )
            {
                addword(word, sw, filenum, structure, numMetaNames, metaID, position );
                wordcount++;
                continue;
            }
        }





        /* Translate chars */
        TranslateChars(indexf->header.translatecharslookuptable, (unsigned char *)word);

        cur_pos = word;



        /* Now split the word up into "swish words" */

        while ( next_swish_word( sw, &cur_pos, &swishword, &lenswishword, position ) )
        {

            /* Weed out Numbers - or anything that's all the listed chars */
            if ( indexf->header.numberchars_used_flag )
            {
                unsigned char *c = (unsigned char *)swishword;

                /* look for any char that's NOT in the lookup table */
                while ( *c ) {
                    if ( !indexf->header.numbercharslookuptable[(int) *c ] )
                        break;
                    c++;
                }

                /* if got all the way through the string then it's only those chars */
                if ( !*c )
                    continue; /* skip this word */
            }


            /* Check Begin & EndCharacters */
            if (!indexf->header.begincharslookuptable[(int) ((unsigned char) swishword[0])])
                continue;

            if (!indexf->header.endcharslookuptable[(int) ((unsigned char) swishword[strlen(swishword) - 1])])
                continue;


            /* limit by stopwords, min/max length, max number of digits, ... */

            if (!isokword(sw, swishword, indexf))
                continue;

            /* Now translate word if fuzzy mode */

            if ( not_fuzzy )
            {
                addword(swishword, sw, filenum, structure, numMetaNames, metaID, position );
                wordcount++;
            }

            else
            {
                char **current_word;
                int  not_first = 0;
                FUZZY_WORD *fw;

                sp_stem = NULL;  /* pointer to cached stemmed word */

#ifndef NOSTEMCACHE

                /* 2003/08 jmruiz */
                /* Snowball's stemming is very slow. So, let's try some */
                /* caching.                                             */
                /* Cache for stemming */
                /* Only use it if we are not in economic mode (-e) */

                /* Note that double-metaphone may produce two values, so don't cache 
                 * or figure a way to invalidate the cache entry if there's two words.
                 */

                if( !idx->swap_locdata )
                {
                    if((sp_stem = is_word_in_hash_table( indexf->hashstemcache, swishword)))
                    {
                        /* cache hit! */

                        if ( sp_stem->other.data )  /* might be null if the stemmer returned more than one value */
                        {
                            addword(sp_stem->other.data, sw, filenum, structure, numMetaNames, metaID, position );
                            wordcount++;
                            continue;  /* done, continue on */
                        }
                    }
                    else
                    {
                       sp_stem = add_word_to_hash_table( &indexf->hashstemcache, swishword, VERYBIGHASHSIZE);
                       sp_stem->other.data = NULL;
                   }
                }
#endif /* NOSTEMCACHE */


                /* Convert the word */
                fw = fuzzy_convert( indexf->header.fuzzy_data, swishword );

                current_word = fw->word_list;
                while ( *current_word )
                {
                    /* when a word stems to more than one word all words should have the same position number */
                    /* currently, that's only a rare case with double metaphone */

                    if ( not_first++ )
                         (*position)--;

                    addword(*current_word, sw, filenum, structure, numMetaNames, metaID, position );
                    wordcount++;

                    /* If using the stem cache, add reference to the word just entered in the index */
                    /* but since the "data" payload currently can hold only one value, only cache if one stem */

                    if ( sp_stem && !sp_stem->other.data && ( 1 == fw->list_size) )
                        sp_stem->other.data = (getentry(sw,*current_word))->word;


                    current_word++; /* move to next word in list */
                }

                fuzzy_free_word( fw );
            }
       }
    }

           /* Buffers can be reallocated, so reassign them */
    idx->word = word;
    idx->lenword = lenword;
    idx->swishword = swishword;
    idx->lenswishword = lenswishword;

    return wordcount;
}


/* Coalesce the current word location into the linked list */
void add_coalesced(SWISH *sw, ENTRY *e, unsigned char *coalesced, int sz_coalesced, int metaID)
{
    int        tmp;
    LOCATION  *tloc, *tprev;
    LOCATION **tmploc, **tmploc2;
    unsigned char *tp;


    /* Check for economic mode (-e) and swap data to disk */
    if(sw->Index->swap_locdata)
    {
        tmploc = (LOCATION **)coalesced;
        *tmploc = (LOCATION *)e;   /* Preserve e in buffer */
                                   /* The cast is for avoiding the warning */                                 
        SwapLocData(sw, e, coalesced, sz_coalesced);
        return;
    }

    /* Add to the linked list keeping the data sorted by metaname, filenum */
    for(tprev =NULL, tloc = e->allLocationList; tloc; )
    {
        tp = (unsigned char *)tloc + sizeof(void *);
        tmp = uncompress2(&tp); /* Read metaID */
        if(tmp > metaID)
             break;
        tprev = tloc;
        tmploc = (LOCATION **)tloc;
        tloc = *tmploc;
    }

    if(! tprev)
    {
        tmploc = (LOCATION **)coalesced;
        *tmploc = e->allLocationList;
        e->allLocationList = (LOCATION *)coalesced;
    }
    else
    {
        tmploc = (LOCATION **)coalesced;
        tmploc2 = (LOCATION **)tprev;
        *tmploc = *tmploc2;
        *tmploc2 = (LOCATION *)coalesced;
    }
}
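
/*
 * Illustration only, not part of swish-e: a minimal sketch of the sorted
 * insert done by add_coalesced(). The real code keeps the "next" pointer in
 * the first bytes of a compressed buffer and decodes the metaID with
 * uncompress2(); this sketch assumes an ordinary struct instead, and all
 * names are hypothetical.
 */

```c
#include <assert.h>
#include <stddef.h>

typedef struct demo_loc
{
    struct demo_loc *next;
    int metaID;
} demo_loc;

/* Inserts node so that the list stays ordered by ascending metaID.
 * A node goes after any existing nodes with the same metaID, because the
 * scan only stops at the first node with a larger metaID. */
static void demo_insert_sorted(demo_loc **head, demo_loc *node)
{
    demo_loc *prev = NULL, *cur;

    assert(node != NULL);

    for (cur = *head; cur; cur = cur->next)
    {
        if (cur->metaID > node->metaID)
            break;
        prev = cur;
    }

    if (!prev)
    {
        /* new first element */
        node->next = *head;
        *head = node;
    }
    else
    {
        /* splice in after prev */
        node->next = prev->next;
        prev->next = node;
    }
}
```

/*
 * Inserting after equal keys preserves arrival order within one metaID,
 * which is what keeps chunks for the same metaname in filenum order.
 */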


void    coalesce_word_locations(SWISH * sw, ENTRY *e)
{
    int      curmetaID, metaID,
             curfilenum, filenum,
             frequency,
             num_locs,
             worst_case_size;
    unsigned int tmp;
    unsigned char *p, *q, *size_p = NULL;
    unsigned char uflag, *cflag;
    LOCATION *loc, *next;
    static unsigned char static_buffer[COALESCE_BUFFER_MAX_SIZE];
    unsigned char *buffer;
    unsigned int sz_buffer;
    unsigned char *coalesced_buffer;
    unsigned int     *posdata;
    unsigned int      local_posdata[MAX_STACK_POSITIONS];


    /* Check for new locations in the current chunk */
    if(!e->currentChunkLocationList)
        return;

    /* Sort all pending word locations by metaID, filenum */
    sortChunkLocations(e);

    /* Init buffer to static buffer */
    buffer = static_buffer;
    sz_buffer = COALESCE_BUFFER_MAX_SIZE;

    /* Init vars */
    curmetaID = 0;
    curfilenum = 0;
    q = buffer;     /* Destination buffer */
    num_locs = 0;   /* Number of coalesced LOCATIONS */

    /* Run on all locations */
    for(loc = e->currentChunkLocationList; loc; )
    {
        p = (unsigned char *) loc;

        /* get next LOCATION in linked list*/
        next = * (LOCATION **) loc;
        p += sizeof(LOCATION *);

        /* get metaID of LOCATION */
        metaID = uncompress2(&p);

        /* Check for new metaID */
        if(metaID != curmetaID)
        {
            /* If previous data exists, add it to the linked list */
            if(curmetaID)
            {
                /* add to the linked list and reset values */
                /* Update the size of chunk's data in *size_p */
                tmp = q - (size_p + sizeof(unsigned int));

                /* Write the size */
                memcpy(size_p, (char *)&tmp, sizeof(tmp) );

                /* Add to the linked list keeping the data sorted by metaname, filenum */
                /* Allocate memory space */
                coalesced_buffer = (unsigned char *)Mem_ZoneAlloc(sw->Index->totalLocZone,q-buffer);

                /* Copy content to it */
                memcpy(coalesced_buffer,buffer,q-buffer);

                /* Add to the linked list */
                add_coalesced(sw, e, coalesced_buffer, q - buffer, curmetaID);
            }

            /* Reset values */
            curfilenum = 0;
            curmetaID = metaID;
            q = buffer + sizeof(void *);   /* Make room for linked list pointer */        
            q = compress3(metaID,q);  /* Add metaID */
            size_p = q;      /* Preserve position for size */
            q += sizeof(unsigned int);     /* Make room for size */
            num_locs = 0;
        }
        uncompress_location_values(&p,&uflag,&filenum,&frequency);
        worst_case_size = sizeof(unsigned char *) + (3 + frequency) * MAXINTCOMPSIZE;

        while ((q + worst_case_size) - buffer > (int)sz_buffer)
        {
            if(!num_locs)
            {
                //progerr("Buffer too short in coalesce_word_locations. Increase COALESCE_BUFFER_MAX_SIZE in config.h and rebuild.");
                /* Allocate new buffer */
                unsigned char * new_buffer = emalloc(sz_buffer * 2 + worst_case_size);
                memcpy(new_buffer,buffer,sz_buffer);
                sz_buffer = sz_buffer * 2 +worst_case_size;
                if(buffer != static_buffer)
                    efree(buffer);

                /* Adjust pointers */
                q = new_buffer + (q - buffer);
                size_p = new_buffer + (size_p -buffer);
                /* Assign buffer and continue */
                buffer = new_buffer;
                break;
            }
            /* add to the linked list and reset values */
            /* Update the size of chunk's data in *size_p */
            tmp = q - (size_p + sizeof(unsigned int));  /* tmp contains the size */

            /* Write the size */
            memcpy(size_p,(char *)&tmp,sizeof(tmp));

            /* Add to the linked list keeping the data sorted by metaname, filenum */
            /* Allocate memory space */
            coalesced_buffer = (unsigned char *)Mem_ZoneAlloc(sw->Index->totalLocZone,q-buffer);

            /* Copy content to it */
            memcpy(coalesced_buffer,buffer,q-buffer);

            /* Add to the linked list */
            add_coalesced(sw, e, coalesced_buffer, q - buffer, curmetaID);


            /* Reset values */

            curfilenum = 0;
            curmetaID = metaID;
            q = buffer + sizeof(void *);   /* Make room for linked list pointer */
            q = compress3(metaID,q);
            size_p = q;      /* Preserve position for size */
            q += sizeof(unsigned int);     /* Make room for size */
            num_locs = 0;
        }

        if(frequency > MAX_STACK_POSITIONS)
            posdata = (unsigned int *)emalloc(frequency * sizeof(int));
        else
            posdata = local_posdata;

        uncompress_location_positions(&p,uflag,frequency,posdata);

                /* Store the filenum incrementally to save space */
        compress_location_values(&q,&cflag,filenum - curfilenum,frequency, posdata);

        curfilenum = filenum;

        compress_location_positions(&q,cflag,frequency,posdata);

        if(frequency > MAX_STACK_POSITIONS)
            efree(posdata);

        num_locs++;

        loc = next;
    }
    if (num_locs)
    {
        /* add to the linked list and reset values */

        /* Update the size of chunk's data in *size_p */
        tmp = q - (size_p + sizeof(unsigned int));  /* tmp contains the size */
        /* Write the size */
        memcpy(size_p,(char *)&tmp,sizeof(tmp));

        /* Add to the linked list keeping the data sorted by metaname, filenum */
        /* Allocate memory space */
        coalesced_buffer = (unsigned char *)Mem_ZoneAlloc(sw->Index->totalLocZone,q-buffer);

        /* Copy content to it */
        memcpy(coalesced_buffer,buffer,q-buffer);

        /* Add to the linked list */
        add_coalesced(sw, e, coalesced_buffer, q - buffer, curmetaID);
    }
    e->currentChunkLocationList = NULL;
    e->currentlocation = NULL;

    /* If we are swapping locs to file, also reset the corresponding memory zone */
    if(sw->Index->swap_locdata)
        Mem_ZoneReset(sw->Index->totalLocZone);


    /* Free buffer if not using static buffer */
    if(buffer != static_buffer)
        efree(buffer);
}
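
/*
 * Illustration only, not part of swish-e: a minimal sketch of why
 * coalesce_word_locations() stores "filenum - curfilenum" instead of the
 * file number itself. Small differences need fewer bytes in a
 * variable-length integer encoding. The 7-bits-per-byte format below is an
 * assumption made for this sketch and is not claimed to match the encoding
 * used by compress3() or compress_location_values(); all names are
 * hypothetical.
 */

```c
#include <assert.h>
#include <stddef.h>

/* Writes v with 7 payload bits per byte, low bits first; the high bit is
 * set on every byte except the last. Returns the new write position. */
static unsigned char *demo_put_varint(unsigned char *q, unsigned int v)
{
    while (v >= 0x80)
    {
        *q++ = (unsigned char) ((v & 0x7f) | 0x80);
        v >>= 7;
    }
    *q++ = (unsigned char) v;
    return q;
}

/* Reads one value written by demo_put_varint(). Returns the new read position. */
static const unsigned char *demo_get_varint(const unsigned char *p, unsigned int *v)
{
    unsigned int shift = 0;

    *v = 0;
    while (*p & 0x80)
    {
        *v |= (unsigned int) (*p++ & 0x7f) << shift;
        shift += 7;
    }
    *v |= (unsigned int) *p++ << shift;
    return p;
}

/* Encodes ascending file numbers as differences. Returns bytes written. */
static size_t demo_encode_filenums(const int *filenums, int n, unsigned char *out)
{
    unsigned char *q = out;
    int i, cur = 0;

    for (i = 0; i < n; i++)
    {
        assert(filenums[i] >= cur);     /* input must be sorted */
        q = demo_put_varint(q, (unsigned int) (filenums[i] - cur));
        cur = filenums[i];
    }
    return (size_t) (q - out);
}

/* Rebuilds the file numbers by summing the differences. */
static void demo_decode_filenums(const unsigned char *in, int n, int *filenums)
{
    int i, cur = 0;

    for (i = 0; i < n; i++)
    {
        unsigned int delta;

        in = demo_get_varint(in, &delta);
        cur += (int) delta;
        filenums[i] = cur;
    }
}
```

/*
 * The scheme relies on the locations being sorted by filenum within one
 * metaID, which is why the chunk is sorted before it is coalesced.
 */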



/* 09/00 Jose Ruiz
** Function to swap location data to a temporary file or ramdisk to save
** memory. Unfortunately we cannot use it with the IgnoreLimit option
** enabled.
** The data has been compressed previously in memory.
** Returns nothing; a failed write is reported through progerr().
*/
static void SwapLocData(SWISH * sw, ENTRY *e, unsigned char *buf, int lenbuf)
{
    int     idx_swap_file;
    struct  MOD_Index *idx = sw->Index;

    /* 2002-07 jmruiz - Get the corresponding swap file */
    /* Get the corresponding hash value to this word  */
    /* IMPORTANT!!!! - The routine being used here to compute the hash */  
    /* must be the same used to store the words */
    /* Then we must get the corresponding swap file index */
    /* Since we cannot have so many swap files (VERYBIGHASHSIZE for verybighash */
    /* routine) we must scale the hash into SWAP_FILES */
    idx_swap_file = (verybighash(e->word) * (MAX_LOC_SWAP_FILES -1))/(VERYBIGHASHSIZE -1);

    if (!idx->fp_loc_write[idx_swap_file])
    {
        idx->fp_loc_write[idx_swap_file] = create_tempfile(sw, F_WRITE_BINARY, "loc", &idx->swap_location_name[idx_swap_file], 0 );
    }

    compress1(lenbuf, idx->fp_loc_write[idx_swap_file], idx->swap_putc);
    if (idx->swap_write(buf, 1, lenbuf, idx->fp_loc_write[idx_swap_file]) != (unsigned int) lenbuf)
    {
        progerr("Cannot write location to swap file");
    }
}
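
/*
 * Illustration only, not part of swish-e: a minimal sketch of the scaling
 * used above to map a hash bucket to a swap file index. The two sizes are
 * hypothetical values chosen for this sketch; the real VERYBIGHASHSIZE and
 * MAX_LOC_SWAP_FILES come from the swish-e headers. Because the mapping
 * never decreases as the hash grows, write_index() can walk the hash
 * buckets in order and load each swap file only once.
 */

```c
#include <assert.h>

#define DEMO_HASH_SIZE   100003     /* hypothetical number of hash buckets */
#define DEMO_SWAP_FILES  377        /* hypothetical number of swap files */

/* Maps a hash in [0, DEMO_HASH_SIZE - 1] onto [0, DEMO_SWAP_FILES - 1] */
static int demo_swap_file_index(unsigned int hash)
{
    assert(hash < DEMO_HASH_SIZE);

    /* largest product is 100002 * 376, which fits in 32 bits */
    return (int) ((hash * (DEMO_SWAP_FILES - 1)) / (DEMO_HASH_SIZE - 1));
}
```

/*
 * With larger table sizes the product could overflow an unsigned int, so a
 * wider type would be needed for the multiplication.
 */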

/* 2002-07 jmruiz - New -e schema */
/* Get location data from swap file */
/* If ep is NULL, all data will be restored */
/* If ep is not NULL, only the locations for that entry will be read */
static void unSwapLocData(SWISH * sw, int idx_swap_file, ENTRY *ep)
{
    unsigned char *buf;
    int     lenbuf;
    struct MOD_Index *idx = sw->Index;
    ENTRY *e;
    LOCATION *l;
    FILE *fp;

    /* Check if some swap file is being used */
    if(!idx->fp_loc_write[idx_swap_file] && !idx->fp_loc_read[idx_swap_file])
       return;

    /* Check if the file is opened for write and close it */
    if(idx->fp_loc_write[idx_swap_file])
    {
        /* Write a 0 to mark the end of locations */
        if ( idx->swap_putc(0,idx->fp_loc_write[idx_swap_file]) == EOF )
            progerrno("Failed to write location mark to index: ");

        idx->swap_close(idx->fp_loc_write[idx_swap_file]);
        idx->fp_loc_write[idx_swap_file] = NULL;
    }

    /* Reopen in read mode (for faster reads, I suppose) */
    if(!idx->fp_loc_read[idx_swap_file])
    {
        if (!(idx->fp_loc_read[idx_swap_file] = sw_fopen(idx->swap_location_name[idx_swap_file], F_READ_BINARY)))
            progerrno("Could not open temp file %s: ", idx->swap_location_name[idx_swap_file]);
    }
    else
    {
        /* File already opened for read -> reset pointer */
        sw_fseek(idx->fp_loc_read[idx_swap_file],(sw_off_t)0,SEEK_SET);
    }

    fp = idx->fp_loc_read[idx_swap_file];
    while((lenbuf = uncompress1(fp, idx->swap_getc)))
    {
        if(ep == NULL)
        {
            buf = (unsigned char *) Mem_ZoneAlloc(idx->totalLocZone,lenbuf);
            idx->swap_read(buf, lenbuf, 1, fp);
            e = *(ENTRY **)buf;
            /* Store the locations in reverse order - Faster. They will be
            ** sorted later */
            l = (LOCATION *) buf;
            l->next = e->allLocationList;
            e->allLocationList = l;
        }
        else
        {
            idx->swap_read(&e,sizeof(ENTRY *),1,fp);
            if(ep == e)
            {
                buf = (unsigned char *) Mem_ZoneAlloc(idx->totalLocZone,lenbuf);             
                memcpy(buf,&e,sizeof(ENTRY *));
                idx->swap_read(buf + sizeof(ENTRY *),lenbuf - sizeof(ENTRY *),1,fp);
                /* Store the locations in reverse order - Faster. They will be
                ** sorted later */
                l = (LOCATION *) buf;
                l->next = e->allLocationList;
                e->allLocationList = l;
            }
            else
            {
                /* Just advance file pointer */
                idx->swap_seek(fp,(sw_off_t)(lenbuf - sizeof(ENTRY *)),SEEK_CUR);
            }
        }
    }
}

/* 2002-07 jmruiz - Sorts unswapped location data by metaname, filenum */
static void sortSwapLocData(ENTRY *e)
{
    int i, j, k, metaID;
    int    *pi = NULL;
    unsigned char *ptmp,
           *ptmp2,
           *compressed_data;
    LOCATION **tmploc;
    LOCATION *l, *prev=NULL, **lp;

    /* Count the locations */
    for(i = 0, l = e->allLocationList; l;i++, l = l->next);
    
    /* Very trivial case */
    if(i < 2)
        return;

    /* */
    /* Now, let's sort by metanum, offset in file */

    /* Compute the width of one array element for the sort */
    j = 2 * sizeof(int) + sizeof(void *);

    /* Compute array size */
    ptmp = (void *) emalloc(j * i);

    /* Build an array with the elements to compare
       and pointers to data */

    /* Very important to remember - data was read from the loc
    ** swap file in reverse order, so, to get data sorted
    ** by filenum we just need to use a reverse counter (i - k)
    ** as the other value for sorting (it has the same effect
    ** as filenum)
    */
    for(k=0, ptmp2 = ptmp, l = e->allLocationList ; k < i; k++, l = l->next)
    {
        pi = (int *) ptmp2;

        compressed_data = (unsigned char *)l;
        /* Skip the linked-list pointer stored at the start of the LOCATION */
        compressed_data += sizeof(LOCATION *);

        metaID = uncompress2(&compressed_data);
        pi[0] = metaID;
        pi[1] = i-k;
        ptmp2 += 2 * sizeof(int);

        lp = (LOCATION **)ptmp2;
        *lp = l;
        ptmp2 += sizeof(void *);
    }

    /* Sort them */
    swish_qsort(ptmp, i, j, &icomp2);

    /* Store results */
    for (k = 0, ptmp2 = ptmp; k < i; k++)
    {
        ptmp2 += 2 * sizeof(int);

        l = *(LOCATION **)ptmp2;
        if(!k)
            e->allLocationList = l;
        else
        {
            tmploc = (LOCATION **)prev;
            *tmploc = l;
        }
        ptmp2 += sizeof(void *);
        prev = l;
    }
    tmploc = (LOCATION **)l;
    *tmploc = NULL;

    /* Free the memory of the sorting array */
    efree(ptmp);

}
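
/*
 * Illustration only, not part of swish-e: a minimal sketch of the technique
 * in sortSwapLocData(). Each list node gets a sort record holding two integer
 * keys (metaID and a reverse counter) plus a pointer back to the node; the
 * records are sorted and the list is relinked in the new order. The sketch
 * assumes standard qsort() and an ordinary struct in place of swish_qsort(),
 * icomp2() and the packed byte array; all names are hypothetical.
 */

```c
#include <assert.h>
#include <stdlib.h>

typedef struct demo_node
{
    struct demo_node *next;
    int metaID;
} demo_node;

typedef struct
{
    int        metaID;      /* primary key */
    int        order;       /* secondary key: reverse counter */
    demo_node *node;        /* the node this record stands for */
} demo_sort_rec;

static int demo_cmp(const void *a, const void *b)
{
    const demo_sort_rec *x = a, *y = b;

    if (x->metaID != y->metaID)
        return x->metaID < y->metaID ? -1 : 1;
    if (x->order != y->order)
        return x->order < y->order ? -1 : 1;
    return 0;
}

/* Sorts a list that is in reverse arrival order. Returns the new head. */
static demo_node *demo_sort_list(demo_node *head)
{
    int i, n = 0;
    demo_node *l;
    demo_sort_rec *recs;

    for (l = head; l; l = l->next)
        n++;

    if (n < 2)
        return head;        /* nothing to sort */

    recs = malloc((size_t) n * sizeof(*recs));
    if (!recs)
        abort();            /* out of memory: give up in this sketch */

    /* n - i is largest for the node read last, so sorting on it restores
     * arrival order within one metaID */
    for (i = 0, l = head; i < n; i++, l = l->next)
    {
        recs[i].metaID = l->metaID;
        recs[i].order = n - i;
        recs[i].node = l;
    }

    qsort(recs, (size_t) n, sizeof(*recs), demo_cmp);
    assert(recs[0].node != NULL);

    /* relink the nodes in sorted order */
    for (i = 0; i < n - 1; i++)
        recs[i].node->next = recs[i + 1].node;
    recs[n - 1].node->next = NULL;

    head = recs[0].node;
    free(recs);
    return head;
}
```

/*
 * Sorting small records instead of the nodes themselves leaves the location
 * data where it is; only the link pointers change.
 */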
/* File: swish-e-2.4.7/src/docprop.h */
/*

$Id: docprop.h 1736 2005-05-12 15:41:22Z karman $


    This file is part of Swish-e.

    Swish-e is free software; you can redistribute it and/or modify
    it under the terms of the GNU General Public License as published by
    the Free Software Foundation; either version 2 of the License, or
    (at your option) any later version.

    Swish-e is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
    GNU General Public License for more details.

    You should have received a copy of the GNU General Public License
    along  with Swish-e; if not, write to the Free Software
    Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
    
    See the COPYING file that accompanies the Swish-e distribution for details
    of the GNU GPL and the special exception available for linking against
    the Swish-e library.
    
** Mon May  9 18:19:34 CDT 2005
** added GPL


************************************************************************************
 * DocProperties.c, DocProperties.h
 *
 * Functions to manage the index's Document Properties 
 *
 * File Created.
 * M. Gaulin 8/10/98
 * Jose Ruiz 2000/10 many modifications
 * Jose Ruiz 2001/01 many modifications
 *
 * 2001-01-26  rasc  getPropertyByname changed
 * 2001-02-09  rasc  printSearchResultProperties changed
 */

#ifdef __cplusplus
extern "C" {
#endif

void freeProperty( propEntry *prop );
void freeDocProperties (docProperties *);
void freefileinfo(FileRec *);

unsigned char *storeDocProperties (docProperties *, int *);

unsigned char *allocatePropIOBuffer(SWISH *sw, unsigned long buf_needed );

propEntry *getDocProperty( RESULT *result, struct metaEntry **meta_entry, int metaID, int max_size );
propEntry *CreateProperty(struct metaEntry *meta_entry, unsigned char *propValue, int propLen, int preEncoded, int *error_flag );
void addDocProperties( INDEXDATAHEADER *header, docProperties **docProperties, unsigned char *propValue, int propLen, char *filename );
int addDocProperty (docProperties **, struct metaEntry * , unsigned char* ,int, int );
int Compare_Properties( struct metaEntry *meta_entry, propEntry *p1, propEntry *p2 );

unsigned char *fetchDocProperties ( FileRec *, char * );


void swapDocPropertyMetaNames (docProperties **, struct metaMergeEntry *);

char *getResultPropAsString(RESULT *, int);
char *DecodeDocProperty( struct metaEntry *meta_entry, propEntry *prop );
void getSwishInternalProperties(FileRec *, IndexFILE *);


PropValue *getResultPropValue (RESULT *r, char *name, int ID);
void    freeResultPropValue(PropValue *pv);



void     WritePropertiesToDisk( SWISH *sw , FileRec *fi);
propEntry *ReadSingleDocPropertiesFromDisk( IndexFILE *indexf, FileRec *fi, int metaID, int max_size );
docProperties *ReadAllDocPropertiesFromDisk( IndexFILE *indexf, int filenum );



/*
   -- Mapping AutoProperties <-> METANAMES  
   -- should be the same
*/

/* all AutoProperties start with this string ! */


#define AUTOPROPERTY_DEFAULT      "swishdefault"
#define AUTOPROPERTY_REC_COUNT    "swishreccount"
#define AUTOPROPERTY_RESULT_RANK  "swishrank"
#define AUTOPROPERTY_FILENUM      "swishfilenum"
#define AUTOPROPERTY_INDEXFILE    "swishdbfile"

#define AUTOPROPERTY_DOCPATH      "swishdocpath"
#define AUTOPROPERTY_TITLE        "swishtitle"
#define AUTOPROPERTY_DOCSIZE      "swishdocsize"
#define AUTOPROPERTY_LASTMODIFIED "swishlastmodified"
#define AUTOPROPERTY_SUMMARY      "swishdescription"
#define AUTOPROPERTY_STARTPOS     "swishstartpos"

#ifdef __cplusplus
}
#endif /* __cplusplus */

/* File: swish-e-2.4.7/src/rank.h */

/* 

$Id: rank.h 1945 2007-10-22 14:54:07Z karpet $

    This file is part of Swish-e.

    Swish-e is free software; you can redistribute it and/or modify
    it under the terms of the GNU General Public License as published by
    the Free Software Foundation; either version 2 of the License, or
    (at your option) any later version.

    Swish-e is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
    GNU General Public License for more details.

    You should have received a copy of the GNU General Public License
    along  with Swish-e; if not, write to the Free Software
    Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
    
    See the COPYING file that accompanies the Swish-e distribution for details
    of the GNU GPL and the special exception available for linking against
    the Swish-e library.
    
** Mon May  9 18:19:34 CDT 2005
** added GPL



added getrankIDF and getrankDEF to allow multiple ranking schemes
karman Mon Aug 30 07:01:31 CDT 2004
*/
#ifndef RANK_H
#define RANK_H 1

#include "mem.h"
#include "search.h"

int getrank( RESULT *r );
int getrankDEF( RESULT *r );
int getrankIDF( RESULT *r );
int scale_word_score( int score );

#endif
/* File: swish-e-2.4.7/src/fs.c */

/*

$Id: fs.c 1736 2005-05-12 15:41:22Z karman $

    This file is part of Swish-e.

    Swish-e is free software; you can redistribute it and/or modify
    it under the terms of the GNU General Public License as published by
    the Free Software Foundation; either version 2 of the License, or
    (at your option) any later version.

    Swish-e is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
    GNU General Public License for more details.

    You should have received a copy of the GNU General Public License
    along  with Swish-e; if not, write to the Free Software
    Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
    
    See the COPYING file that accompanies the Swish-e distribution for details
    of the GNU GPL and the special exception available for linking against
    the Swish-e library.
    
** Mon May  9 15:51:39 CDT 2005
** added GPL

**
** change sprintf to snprintf to avoid corruption,
** added safestrcpy() macro to avoid corruption from strcpy overflow,
** and use MAXKEYLEN as string length vs. literal "34"
** SRE 11/17/99
**
** 2000-11     jruiz,rasc  some redesign
** 2001-04-07  rasc        fixed FileRule pathname
**
*/

#include "swish.h"
#include "mem.h"
#include "swstring.h"
#include "index.h"
#include "hash.h"
#include "file.h"
#include "list.h"
#include "fs.h"
#include "check.h"
#include "error.h"
#include "xml.h"
#include "txt.h"
#include "parse_conffile.h"
#include "swish_qsort.h"

typedef struct
{
    int     currentsize;
    int     maxsize;
    char  **filenames;
}
DOCENTRYARRAY;



#define MAXKEYLEN 34            /* Hash key -- allow for 64 bit inodes */


static int get_rules( char *name, StringList *sl, PATH_LIST *pathlist );
static int check_FileTests( char *path, PATH_LIST *test );
static void indexadir(SWISH *, char *);
static void indexafile(SWISH *, char *);
static void printfile(SWISH *, char *);
static void printfiles(SWISH *, DOCENTRYARRAY *);
static void printdirs(SWISH *, DOCENTRYARRAY *);
static DOCENTRYARRAY *adddocentry(DOCENTRYARRAY * e, char *filename);
static void split_path(char *path, char **directory, char **file);

/*
  -- init structures for this module
*/

void    initModule_FS(SWISH * sw)
{
    struct MOD_FS *fs;

    fs = (struct MOD_FS *) emalloc(sizeof(struct MOD_FS));

    memset(fs, 0, sizeof(struct MOD_FS));
    sw->FS = fs;

}


/*
  -- release all wired memory for this module
*/

void    freeModule_FS(SWISH * sw)
{
    struct MOD_FS *fs = sw->FS;

    /* Free fs parameters */

    free_regex_list( &fs->filerules.pathname );
    free_regex_list( &fs->filerules.dirname );
    free_regex_list( &fs->filerules.filename );
    free_regex_list( &fs->filerules.dircontains );
    free_regex_list( &fs->filerules.title );

    free_regex_list( &fs->filematch.pathname );
    free_regex_list( &fs->filematch.dirname );
    free_regex_list( &fs->filematch.filename );
    free_regex_list( &fs->filematch.dircontains );
    free_regex_list( &fs->filematch.title );


    /* free module data */
    efree(fs);
    sw->FS = NULL;

    return;
}


/*
 -- Config Directives
 -- Configuration directives for this Module
 -- return: 0/1 = none/config applied
 Aug 1, 2001 -- these probably should be pre-compiled regular expressions,
                and their memory should be freed on exit.  moseley
*/



int     configModule_FS(SWISH * sw, StringList * sl)
{
    struct MOD_FS *fs = sw->FS;
    char   *w0 = sl->word[0];

    if (strcasecmp(w0, "FileRules") == 0)
        return get_rules( w0, sl, &fs->filerules );

    if (strcasecmp(w0, "FileMatch") == 0)
        return get_rules( w0, sl, &fs->filematch );


    if (strcasecmp(w0, "FollowSymLinks") == 0)
    {
        fs->followsymlinks = getYesNoOrAbort(sl, 1, 1);
        return 1;
    }

    return 0;                   /* not one of our parameters */

}

static int get_rules( char *name, StringList *sl, PATH_LIST *pathlist )
{
    char   *w1;
    char   *both;
    int     regex_pattern = 0;


    if (sl->n < 4)
    {
        printf("err: Wrong number of parameters in %s\n", name);
        return 0;
    }


    /* For "is" make sure it matches the entire pattern */
    /* A bit ugly */

    if ( strcasecmp(sl->word[2], "is") == 0 )
    {
        int    i;
        /* make patterns match the full string */
        for ( i = 3; i < sl->n; i++ )
        {
            int len = strlen( sl->word[i] );
            char *new;
            char *old = sl->word[i];

            if ( (strcasecmp( old, "not" ) == 0) && i < sl->n-1 )
                continue;

            new = emalloc( len + 3 );
            
            if ( sl->word[i][0] != '^' )
            {
                strcpy( new, "^" );
                strcat( new, sl->word[i] );
            }
            else
                strcpy( new, sl->word[i] );

            if ( sl->word[i][len-1] != '$' )
                strcat( new, "$" );

            sl->word[i] = new;
            efree( old );
        }
                
    }

    else if ( strcasecmp(sl->word[2], "regex") == 0 )
        regex_pattern++;

    
    else if ( !(strcasecmp(sl->word[2], "contains") == 0) )
    {
        printf("err: %s must be followed by [is|contains|regex]\n", name);
        return 0;
    }

    w1 = sl->word[1];

    both = emalloc( strlen( name ) + strlen( w1 ) + 2 );
    strcpy( both, name );
    strcat( both, " ");
    strcat( both, w1 );

    

    if ( strcasecmp(w1, "pathname") == 0 )
        add_regex_patterns( both, &pathlist->pathname, &(sl->word)[3], regex_pattern );

    else if ( strcasecmp(w1, "filename") == 0 )
        add_regex_patterns( both, &pathlist->filename, &(sl->word)[3], regex_pattern );        
        
    else if ( strcasecmp(w1, "dirname") == 0 )
        add_regex_patterns( both, &pathlist->dirname, &(sl->word)[3], regex_pattern );

    else if ( strcasecmp(w1, "title") == 0 )
        add_regex_patterns( both, &pathlist->title, &(sl->word)[3], regex_pattern );        

    else if ( strcasecmp(w1, "directory") == 0 )
        add_regex_patterns( both, &pathlist->dircontains, &(sl->word)[3], regex_pattern );

    else
    {
        printf("err: '%s' - invalid parameter '%s'\n", both, w1 );
        efree( both );      /* do not leak the label on the error path */
        return 0;
    }


    efree( both );
    return 1;
}    
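
/*
 * Illustrative sketch only, not part of Swish-e: the "is" branch of
 * get_rules() above turns each pattern into a full-string match by adding
 * a leading '^' and a trailing '$' when they are missing.  The helper name
 * example_anchor_pattern() is hypothetical, and it uses plain malloc()
 * instead of Swish-e's emalloc().  As an assumption of this sketch, an
 * empty pattern is handled explicitly and becomes "^$".
 */

```c
#include <stdlib.h>
#include <string.h>

/* Returns a newly allocated copy of `pat` anchored as ^pat$.
 * Each anchor is added only if it is not already present.
 * The caller frees the result; NULL is returned on allocation failure. */
static char *example_anchor_pattern(const char *pat)
{
    size_t len = strlen(pat);
    int need_head = (len == 0 || pat[0] != '^');
    int need_tail = (len == 0 || pat[len - 1] != '$');
    char *out = malloc(len + 3);    /* '^' + pattern + '$' + NUL */
    char *p = out;

    if (out == NULL)
        return NULL;

    if (need_head)
        *p++ = '^';
    memcpy(p, pat, len);
    p += len;
    if (need_tail)
        *p++ = '$';
    *p = '\0';

    return out;
}
```

/*
 * With this sketch, "index.html", "^index.html" and "index.html$" all
 * become "^index.html$", so a rule such as "FileRules filename is ..."
 * cannot match a mere substring of the file name.
 */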


        
        

        


/* Have we already indexed a file or directory?
** This function is used to avoid multiple index entries
** or endless looping due to symbolic links.
*/

static int     fs_already_indexed(SWISH * sw, char *path)
{
#ifndef NO_SYMBOLIC_FILE_LINKS
    struct dev_ino *p;
    struct stat buf;
    char    key[MAXKEYLEN];     /* Hash key -- allow for 64 bit inodes */
    unsigned hashval;

    if (stat(path, &buf))
        return 0;

    /* Create hash key:  string contains device and inode. */
    /* Avoid snprintf -> MAXKEYLEN is big enough for two longs
       snprintf( key, MAXKEYLEN, "%lx/%lx", (unsigned long)buf.st_dev,
       (unsigned long)buf.st_ino  );
     */
    sprintf(key, "%lx/%lx", (unsigned long) buf.st_dev, (unsigned long) buf.st_ino);

    hashval = bighash(key);     /* Search hash for this file. */
    for (p = sw->Index->inode_hash[hashval]; p != NULL; p = p->next)
        if (p->dev == buf.st_dev && p->ino == buf.st_ino)
        {                       /* We found it. */
            if (sw->verbose >= 3)
                printf("Skipping %s:  %s\n", path, "Already indexed.");
            return 1;
        }

    /* Not found, make new entry. */
    p = (struct dev_ino *) Mem_ZoneAlloc(sw->Index->entryZone,sizeof(struct dev_ino));

    p->dev = buf.st_dev;
    p->ino = buf.st_ino;
    p->next = sw->Index->inode_hash[hashval];
    sw->Index->inode_hash[hashval] = p; /* Aug 1, 2001 -- this is not freed */
#endif

    return 0;
}
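
/*
 * Illustrative sketch only, not part of Swish-e: fs_already_indexed()
 * above identifies a file by its (device, inode) pair, which is why a
 * symlink and its target count as the same document.  The helper name
 * example_same_file() is hypothetical and assumes a POSIX stat().
 */

```c
#include <sys/types.h>
#include <sys/stat.h>

/* Returns 1 if both paths resolve to the same file (same device and
 * inode), 0 if they differ or if either path cannot be stat()ed. */
static int example_same_file(const char *a, const char *b)
{
    struct stat sa, sb;

    if (stat(a, &sa) != 0 || stat(b, &sb) != 0)
        return 0;

    return sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino;
}
```

/*
 * Because stat() follows symbolic links, two different path names that
 * lead to the same file compare equal here, just as they hash to the
 * same entry in the inode table above.
 */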


/* Recursively goes into a directory and calls the word-indexing
** functions for each file that's found.
*/

static void    indexadir(SWISH * sw, char *dir)
{
    int             allgoodfiles = 0;
    DIR             *dfd;

#ifdef NEXTSTEP
    struct direct   *dp;
#else
    struct dirent   *dp;
#endif
    int             pathbuflen;
    char            *pathname;
    DOCENTRYARRAY   *sortfilelist = NULL;
    DOCENTRYARRAY   *sortdirlist = NULL;
    int             dirlen = strlen( dir );
    struct MOD_FS   *fs = sw->FS;

    /* First check if it's a symlink and if so are they allowed */
    if (!fs->followsymlinks && islink(dir))
        return;


    /* logic is not well defined - here we only check dirname */
    /* but that bypasses pathname checks -- but that's checked per-file */
    /* This allows one to override File* directory checks */
    /* but allows a pathname check to be limited to full paths */
    /* This also means you can avoid indexing an entire directory tree with FileRules dirname, */
    /* but using a FileRules pathname allows recursion into the directory */

    /* Reject entire directory due to FileRules dirname */
    if ( *dir && match_regex_list( dir, fs->filerules.dirname, "FileRules dirname" ) )
        return;


    /* 
     * Now mark this directory as visited.  This is done before dircontains
     * because dircontains would have the same results regardless if "dir" is
     * a symlink or not, so it only needs to be done once.
     */

    if (fs_already_indexed(sw, dir))
        return;


    /* Handle "FileRules directory" directive */
    /*  - Check all files within the directory before proceeding -- means reading the directory twice */
    /*  - All files are checked. */

    if (fs->filematch.dircontains || fs->filerules.dircontains )
    {
        if ((dfd = opendir(dir)) == NULL)
        {
            if ( sw->verbose )
                progwarnno("Failed to open dir '%s' :", dir );
            return;
        }

        while ((dp = readdir(dfd)) != NULL)
        {
            if ( match_regex_list( dp->d_name, fs->filerules.dircontains,"FileRules dircontains" ) )
            {
                closedir( dfd );
                return;  /* prevents recursion into subdirs of this directory */
            }

            if ( match_regex_list( dp->d_name, fs->filematch.dircontains, "FileMatch dircontains" ) )
            {
                allgoodfiles++;
                break;
            }

        }
        closedir(dfd);
    }


    /* Now, build list of files and directories */

    pathbuflen = MAXFILELEN;
    pathname = (char *) emalloc( pathbuflen + 1 ); 

    if ((dfd = opendir(dir)) == NULL)
    {
        if ( sw->verbose )
            progwarnno("Failed to open dir '%s' :", dir );
        efree(pathname);    /* allocated just above; do not leak it */
        return;
    }


    if ( dirlen == 1 && *dir == '/' ) /* case of root dir */
        dirlen = 0;

    while ((dp = readdir(dfd)) != NULL)
    {
        int filelen = strlen( dp->d_name );

        /* For security reasons, don't index dot files */
        /* Check for hidden under Windows? */

        if ((dp->d_name)[0] == '.')
            continue;


        /* Build full path to file */

        /* reallocate filename buffer, if needed (dir + path + '/' ) */
        if ( dirlen + filelen + 1 > pathbuflen )
        {
            pathbuflen = dirlen + filelen + 200;
            pathname = (char *) erealloc(pathname, pathbuflen + 1);
        }

        if ( dirlen )
            memcpy(pathname, dir, dirlen);

        pathname[dirlen] = '/';  /* Add path separator */
        memcpy(pathname + dirlen + 1, dp->d_name, filelen);
        pathname[ dirlen + filelen + 1] = '\0';

        /* Check if the path is a symlink */
        if ( !fs->followsymlinks && islink( pathname ) )
            continue;


        if ( isdirectory(pathname) )
        {
            sortdirlist = (DOCENTRYARRAY *) adddocentry(sortdirlist, pathname);
        }
        else
        {
            /*
             * "allgoodfiles" was set above if ANY file in the directory matched
             * FileMatch directory [...]
             * Otherwise, run the FileMatch tests to see if it should be included.
             */

            if ( allgoodfiles || check_FileTests( pathname, &fs->filematch ) ) 
            {
                sortfilelist = (DOCENTRYARRAY *) adddocentry(sortfilelist, pathname);
                continue;
            }


            if (!isoksuffix(dp->d_name, sw->suffixlist))
                continue;


            /* Check FileRules for rejects  */
            if ( check_FileTests( pathname, &fs->filerules ) )
                continue;

            sortfilelist = (DOCENTRYARRAY *) adddocentry(sortfilelist, pathname);
        }
    }

    efree(pathname);

    closedir(dfd);

    printfiles(sw, sortfilelist);
    printdirs(sw, sortdirlist);
}

/* Calls the word-indexing function for a single file.
*/

static void    indexafile(SWISH * sw, char *path)
{
    struct MOD_FS *fs = sw->FS;


    if (!fs->followsymlinks && islink(path))
        return;


    /* Check for File|Pathmatch, and index if any match */
    if ( check_FileTests( path, &fs->filematch ) ) 
    {
        printfile(sw, path);
        return;
    }

    /* This is likely faster, so do it first */
    if (!isoksuffix(path, sw->suffixlist))
        return;

    /* Check FileRules for rejects  */
    if ( check_FileTests( path, &fs->filerules ) )
        return;


    /* Passed all tests, so index */
    printfile(sw, path);
}

/**********************************************************
* Process FileTests
*
* Returns 1 = something matched
*
**********************************************************/
static int check_FileTests( char *path, PATH_LIST *test )
{
    char *dir;
    char *file;


    if ( match_regex_list( path, test->pathname, "File[Rules|Match] pathname"  ) )
        return 1;

    if ( !( test->dirname || test->filename ) )
        return 0;

    split_path( path, &dir, &file );

    if ( *dir && match_regex_list( dir, test->dirname, "File[Rules|Match] dirname" ) )
    {
        efree( dir );
        efree( file );
        return 1;
    }
    
    if ( *file && match_regex_list( file, test->filename, "File[Rules|Match] filename" ) )
    {
        efree( dir );
        efree( file );
        return 1;
    }

    efree( dir );
    efree( file );
    return 0;
}

/***************************************************
*  Note that this is mostly a duplicate of above,
*  but was designed to work with both path and URLs
*
*  Probably should settle on one
*  Also, this returns "" on empty dirs, where above returns " "
*  Mar 2002 -- and is only called by fs.c...
*  May 2002, moved to fs.c.  Why isn't basename library used for this?
***************************************************/

static void    split_path(char *path, char **directory, char **file)
{
    char   *p1,
           *p2,
           *p3;

    /* look for last DIRDELIMITER (FS) and last / (HTTP) */
    //p1 = strrchr( path, DIRDELIMITER);
    p1 = strrchr( path, '/');
    p2 = strrchr( path, '/');

    if (p1 && p2)
    {                           /* if both are found, use the longest. */
        if (p1 >= p2)
            p3 = p1;
        else
            p3 = p2;
    } else if (p1 && !p2)
        p3 = p1;
    else if (!p1 && p2)
        p3 = p2;
    else
        p3 = NULL;

    /* Set directory */
    if (!p3)
        *directory = (char *) estrdup((char *) "");
    else
    {
        char c = *++p3;

        *p3 = '\0';
        *directory = (char *) estrdup((char *) path);
        *p3 = c;
        path = p3;
    }

    *file = (char *) estrdup((char *) path);
}
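
/*
 * Illustrative sketch only, not part of Swish-e: the same split as
 * split_path() above (directory keeps its trailing '/', file is the
 * rest), but written without temporarily modifying the input string.
 * The name example_split_path() is hypothetical, and plain malloc()
 * stands in for estrdup().
 */

```c
#include <stdlib.h>
#include <string.h>

/* Splits `path` at its last '/'.  On success returns 0 and stores two
 * newly allocated strings in *dir and *file (caller frees both).
 * On allocation failure returns -1 and sets both to NULL. */
static int example_split_path(const char *path, char **dir, char **file)
{
    const char *slash = strrchr(path, '/');
    size_t dirlen = slash ? (size_t)(slash - path) + 1 : 0;
    size_t filelen = strlen(path + dirlen);

    *dir = malloc(dirlen + 1);
    *file = malloc(filelen + 1);
    if (*dir == NULL || *file == NULL)
    {
        free(*dir);
        free(*file);
        *dir = *file = NULL;
        return -1;
    }

    memcpy(*dir, path, dirlen);
    (*dir)[dirlen] = '\0';
    memcpy(*file, path + dirlen, filelen + 1);  /* includes the NUL */

    return 0;
}
```

/*
 * For "/a/b/c.txt" this yields the directory "/a/b/" and the file
 * "c.txt"; a path without any '/' yields an empty directory string.
 */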




/* Indexes the words in the file
*/

static void    printfile(SWISH * sw, char *filename)
{
    char   *s;
    FileProp *fprop;

    if ( !filename )
        return;

    /* Only index files once */
    if ( fs_already_indexed( sw, filename ) )
        return;

    if (sw->verbose >= 3)
    {
        /* Only display file name */
        if ((s = (char *) strrchr(filename, '/')) == NULL)
            printf("  %s", filename);
        else
            printf("  %s", s + 1);
        fflush(stdout);
    }


    fprop = file_properties(filename, filename, sw);

    do_index_file(sw, fprop);

    free_file_properties(fprop);
}

/* 2001-08 Jose Ruiz */
/* function for comparing filenames to get all filenames in a dir sorted
** Original addsortentry used strcmp - So, I use the same routine here
** What about Win32?
*/
int     compfilenames(const void *s1, const void *s2)
{
    char *r1 = *(char * const *) s1;
    char *r2 = *(char * const *) s2;

    return strcmp(r1,r2);
}

/* Indexes the words in all the files in the array of files
** The array is sorted alphabetically
*/

static void    printfiles(SWISH * sw, DOCENTRYARRAY * e)
{
    int     i;

    if (e)
    {
        /* 2001-08 sorting of filenames moved here - Do we really
        ** need to sort them?  - Adjust it in config.h */
        if(e->currentsize)
        {
            if(SORT_FILENAMES)
            {
                swish_qsort(e->filenames, e->currentsize, sizeof(char *), compfilenames);
            }
        }

        for (i = 0; i < e->currentsize; i++)
        {
            printfile(sw, e->filenames[i]);
            efree( e->filenames[i] );
        }

        /* free the array and filenames */
        efree(e->filenames);
        efree(e);
    }
}

/* Prints out the directory names as things are getting indexed.
** Calls indexadir() so directories in the array are indexed,
** in alphabetical order...
*/

static void    printdirs(SWISH * sw, DOCENTRYARRAY * e)
{
    int     i;

    if (e)
    {
        /* 2001-08 sorting of dirs moved here - Do we really
        ** need to sort them? - Adjust it in config.h */
        if(e->currentsize)
        {
            if(SORT_FILENAMES)
            {
                swish_qsort(e->filenames, e->currentsize, sizeof(char *), compfilenames);
            }
        }

        for (i = 0; i < e->currentsize; i++)
        {
            if (sw->verbose >= 3)
                printf("\nIn dir \"%s\":\n", e->filenames[i]);
            else if (sw->verbose >= 2)
                printf("Checking dir \"%s\"...\n", e->filenames[i]);

            indexadir(sw, e->filenames[i]);
            efree(e->filenames[i]);
        }
        efree(e->filenames);
        efree(e);
    }
}

/* Appends a copy of a file name to the array, allocating the array on
** first use.  The entries are sorted later, in printfiles() and printdirs().
*/

static DOCENTRYARRAY *adddocentry(DOCENTRYARRAY * e, char *filename)
{
    if (e == NULL)
    {
        e = (DOCENTRYARRAY *) emalloc(sizeof(DOCENTRYARRAY));
        e->maxsize = VERYBIGHASHSIZE; /* Put what you like */
        e->filenames = (char **) emalloc(e->maxsize * sizeof(char *));

        e->currentsize = 1;
        e->filenames[0] = (char *) estrdup(filename);
    }
    else
    {
        if ((e->currentsize + 1) == e->maxsize)
        {
            e->maxsize += 1000;
            e->filenames = (char **) erealloc(e->filenames, e->maxsize * sizeof(char *));
        }
        e->filenames[e->currentsize++] = (char *) estrdup(filename);
    }
    return e;
}




/********************************************************/
/*					"Public" functions					*/
/********************************************************/

void    fs_indexpath(SWISH * sw, char *path)
{

    normalize_path( path );  /* flip backslashes and remove trailing slash */


    if (isdirectory(path))
    {
        if (sw->verbose >= 2)
            printf("\nChecking dir \"%s\"...\n", path);

        indexadir(sw, path);
    }

    else if (isfile(path))
    {
        if (sw->verbose >= 2)
            printf("\nChecking file \"%s\"...\n", path);
        indexafile(sw, path);
    }
    else
        progwarnno("Invalid path '%s': ", path);
}




struct _indexing_data_source_def FileSystemIndexingDataSource = {
    "File-System",
    "fs",
    fs_indexpath,
    configModule_FS
};
/* File: swish-e-2.4.7/src/compress.c */

/*

$Id: compress.c 1945 2007-10-22 14:54:07Z karpet $

** Copyright (C) 1995, 1996, 1997, 1998 Hewlett-Packard Company
** Originally by Kevin Hughes, kev@kevcom.com, 3/11/94
**

    This file is part of Swish-e.

    Swish-e is free software; you can redistribute it and/or modify
    it under the terms of the GNU General Public License as published by
    the Free Software Foundation; either version 2 of the License, or
    (at your option) any later version.

    Swish-e is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
    GNU General Public License for more details.

    You should have received a copy of the GNU General Public License
    along  with Swish-e; if not, write to the Free Software
    Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
    
    See the COPYING file that accompanies the Swish-e distribution for details
    of the GNU GPL and the special exception available for linking against
    the Swish-e library.
    
** Mon May  9 15:51:39 CDT 2005
** added GPL
** karman: how much of this file is still original??



** 2001-02-12 rasc   errormsg "print" changed...
**
*/

#include "swish.h"
#include "swstring.h"
#include "compress.h"
#include "mem.h"
#include "error.h"
#include "merge.h"
#include "search.h"
#include "docprop.h"
#include "index.h"
#include "hash.h"
#include "ramdisk.h"
#include "swish_qsort.h"
#include "file.h"

#ifdef HAVE_ZLIB
#include <zlib.h>
#define Z_BUFSIZE 16384
#endif

/* Surfing the web I found this:
** it is a very simple macro that can be used in *PACKLONG* routines to 
** detect if we need to spend some cycles for [un]packing the number in a
** portable format
*/
#ifndef LITTLE_ENDIAN
static const int swish_endian_test_value = 1;
#define LITTLE_ENDIAN (*(const unsigned char *)&swish_endian_test_value)
#endif

/* 2001-05 jmruiz */
/* Routines for compressing numbers - Macros converted to routines */

/* 2002-11 jmruiz */
/* Get required size in bytes for a given compressed number */
int sizeofcompint(int number)
{
    int size = 0;

    do
    {
        size++;
    } 
    while ((number >>= 7));
    return size;
}

/* Compresses a number and writes it to a file */
void    compress1(int num, FILE * fp, int (*f_putc) (int, FILE *))
{
    int     _i = 0,
            _r = num;
    unsigned char _s[MAXINTCOMPSIZE];

    /* Trivial case: 0 */
    if(!_r)
    {
        if (f_putc(0,fp) == EOF )
            progerrno("compress1 failed to write null: ");

        return;
    }

    /* Any other case ... */
    while (_r)
    {
        _s[_i++] = _r & 127;
        _r >>= 7;
    }
    while (--_i >= 0)
        if ( f_putc(_s[_i] | (_i ? 128 : 0), fp) == EOF )
            progerrno("compress1 failed to write: ");
}

/* Compresses a number and writes it to a buffer, filling it backwards */
/* buffer must be previously allocated */
/* returns the decreased buffer pointer after storing the compressed number in it */
unsigned char *SW_compress2(int num, unsigned char *buffer)
{
    int     _i = num;

    /* Trivial case: 0 */
    if(!_i)
    {
        *buffer-- = 0; 
        return buffer;      /* return the decreased pointer, not NULL */
    }

    /* Any other case ... */
    while (_i)
    {
        *buffer = _i & 127;
        if (_i != num)
            *buffer |= 128;
        _i >>= 7;
        buffer--;
    }

    return buffer;
}

/* Compresses a number and writes it to a buffer */
/* buffer must be previously allocated */
/* returns the incremented buffer pointer after storing the compressed number in it */
unsigned char *compress3(int num, unsigned char *buffer)
{
    int     _i = 0,
            _r = num;
    unsigned char _s[MAXINTCOMPSIZE];

    /* Trivial case: 0 */
    if(!_r)
    {
        *buffer++ = 0;
        return buffer;
    }

    /* Any other case ... */
    while (_r)
    {
        _s[_i++] = _r & 127;
        _r >>= 7;
    }
    while (--_i >= 0)
        *buffer++ = (_s[_i] | (_i ? 128 : 0));

    return buffer;
}

/* Uncompress a number from a file */
int     uncompress1(FILE * fp, int (*f_getc) (FILE *))
{
    int     _c;
    int     num = 0;
    
    /* printf("uncompress: _c = %d num = %d\n", _c, num); */

    do
    {
        _c = (int) f_getc(fp);
        
        if (_c < 0) {
             progerr("_c is < 0 in uncompress1()\n");
        }
        
        num <<= 7;
        num |= _c & 127;
        
        /* printf("uncompress: _c = %d num = %d\n", _c, num); */
        
        if (!num)
            break;
    }
    while (_c & 128);

    return num;
}

/* same routine but this works with a memory forward buffer instead of file */
/* it also increases the buffer pointer */
int     uncompress2(unsigned char **buffer)
{
    int     _c;
    int     num = 0;
    unsigned char *p = *buffer;

    do
    {
        _c = (int) ((unsigned char) *p++);
        num <<= 7;
        num |= _c & 127;
        if (!num)
            break;
    }
    while (_c & 128);

    *buffer = p;
    return num;
}
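
/*
 * Illustrative sketch only, not part of Swish-e: a round trip through
 * the 7-bits-per-byte encoding used by compress3() and uncompress2()
 * above.  The most significant group is written first, and every byte
 * except the last has its high bit (128) set.  The example_* names are
 * hypothetical, and the sketch uses unsigned int where the routines
 * above use int.
 */

```c
#include <stddef.h>

/* Writes `num` at `out` and returns the pointer just past the last byte.
 * The caller must provide room for up to 5 bytes (32-bit unsigned int). */
static unsigned char *example_vint_put(unsigned int num, unsigned char *out)
{
    unsigned char tmp[(sizeof(unsigned int) * 8 + 6) / 7];
    int n = 0;

    /* Collect 7-bit groups, least significant first */
    do
    {
        tmp[n++] = (unsigned char) (num & 127);
        num >>= 7;
    }
    while (num);

    /* Emit them most significant first; flag all but the last byte */
    while (--n >= 0)
        *out++ = (unsigned char) (tmp[n] | (n ? 128 : 0));

    return out;
}

/* Reads one number from *in and advances *in past it. */
static unsigned int example_vint_get(const unsigned char **in)
{
    const unsigned char *p = *in;
    unsigned int num = 0;
    unsigned char c;

    do
    {
        c = *p++;
        num = (num << 7) | (c & 127u);
    }
    while (c & 128);

    *in = p;
    return num;
}
```

/*
 * For example, 300 is 2 * 128 + 44, so it is stored as the two bytes
 * 0x82 0x2C, while any value below 128 takes a single byte.
 */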

/* Routines to make long integers portable */
unsigned long PACKLONG(unsigned long num)
{
    unsigned long tmp = 0L;
    unsigned char *s;
    int sz_long = sizeof(unsigned long);  

    if (num && LITTLE_ENDIAN)
    {
        s = (unsigned char *) &tmp;
        while(sz_long)
            *s++ = (unsigned char) ((num >> ((--sz_long)<<3)) & 0xFF);

        return tmp;
    }
    return num;
}

/* Same routine - Packs long in buffer */
void    PACKLONG2(unsigned long num, unsigned char *s)
{
    int sz_long = sizeof(unsigned long);

    if(LITTLE_ENDIAN)
    {
        while(sz_long)
            *s++ = (unsigned char) ((num >> ((--sz_long)<<3)) & 0xFF);
    }
    else
    {
        memcpy(s,(unsigned char *)&num,sz_long);
    }
}


unsigned long UNPACKLONG(unsigned long num)
{
    int sz_long = sizeof(unsigned long);
    unsigned long tmp = 0;
    unsigned char *s = (unsigned char *) &num;

    if(LITTLE_ENDIAN)
    {
        while(sz_long)
            tmp += (unsigned long) *s++ << ((--sz_long)<<3);  /* widen before shifting */
        return tmp;
    }
    return num;
}

/* Same routine - UnPacks long from buffer */
unsigned long UNPACKLONG2(unsigned char *s)
{
    int sz_long = sizeof(unsigned long);
    unsigned long tmp = 0;

    if(LITTLE_ENDIAN)
    {
        while(sz_long)
            tmp += (unsigned long) *s++ << ((--sz_long)<<3);  /* widen before shifting */
    }
    else
    {
        memcpy((unsigned char *)&tmp,s,sz_long);
    }
    return tmp;
}

/* 2003/10/28 jmruiz - Routines to make file offsets portable */
/* sw_off_t is a type defined in config.h to be 32 or 64 bit */
sw_off_t PACKFILEOFFSET(sw_off_t num)
{
    sw_off_t tmp = (sw_off_t)0;
    unsigned char *s;
    int sz_off_t = sizeof(sw_off_t);

    if (num && LITTLE_ENDIAN)
    {
        s = (unsigned char *) &tmp;
        while(sz_off_t)
            *s++ = (unsigned char) ((num >> (sw_off_t)((--sz_off_t)<<3)) & (sw_off_t)0xFF);

        return tmp;
    }
    return num;
}

/* Same routine - Packs file offset into a buffer */
void    PACKFILEOFFSET2(sw_off_t num, unsigned char *s)
{
    int sz_off_t = sizeof(sw_off_t);

    if(LITTLE_ENDIAN)
    {
        while(sz_off_t)
            *s++ = (unsigned char) ((num >> (sw_off_t)((--sz_off_t)<<3)) & (sw_off_t)0xFF);
    }
    else
    {
        memcpy(s,(unsigned char *)&num,sz_off_t);
    }
}

/* Routine to unpack a file offset */
sw_off_t UNPACKFILEOFFSET(sw_off_t num)
{
    int sz_off_t = sizeof(sw_off_t);
    sw_off_t tmp = (sw_off_t)0;
    unsigned char *s = (unsigned char *) &num;

    if(LITTLE_ENDIAN)
    {
        while(sz_off_t)
            tmp += (sw_off_t)((sw_off_t)(*s++) << (sw_off_t)((--sz_off_t)<<3));
        return tmp;
    }
    return num;
}

/* Same routine - UnPacks file offset from buffer */
sw_off_t UNPACKFILEOFFSET2(unsigned char *s)
{
    int sz_off_t = sizeof(sw_off_t);
    sw_off_t tmp = (sw_off_t)0;

    if(LITTLE_ENDIAN)
    {
        while(sz_off_t)
            tmp += (sw_off_t)((sw_off_t)(*s++) << (sw_off_t)((--sz_off_t)<<3));
    }
    else
    {
        memcpy((unsigned char *)&tmp,s,sz_off_t);
    }
    return tmp;
}
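
/*
 * Illustrative sketch only, not part of Swish-e: the PACK and UNPACK
 * routines above store numbers most significant byte first so that an
 * index is portable between machines.  Using shifts on a fixed-width
 * type gives the same byte order without testing the host's endianness.
 * The example_* names and the fixed 32-bit width are assumptions of
 * this sketch; the routines above work on unsigned long and sw_off_t.
 */

```c
#include <stdint.h>

/* Stores `num` at out[0..3], most significant byte first. */
static void example_pack_be32(uint32_t num, unsigned char *out)
{
    out[0] = (unsigned char) (num >> 24);
    out[1] = (unsigned char) (num >> 16);
    out[2] = (unsigned char) (num >> 8);
    out[3] = (unsigned char) num;
}

/* Reads a value stored by example_pack_be32().  Each byte is widened
 * to uint32_t before shifting so that no bits are lost. */
static uint32_t example_unpack_be32(const unsigned char *in)
{
    return ((uint32_t) in[0] << 24) |
           ((uint32_t) in[1] << 16) |
           ((uint32_t) in[2] << 8)  |
            (uint32_t) in[3];
}
```

/*
 * Packing 0x01020304 yields the bytes 01 02 03 04 on any host, and
 * unpacking those bytes returns the original value.
 */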

/***********************************************************************************
*   09/00 Jose Ruiz 
*   Function to compress location data in memory 
*
*   Compresses a LOCATION entry
*
*   A single position LOCATION goes from 20 to 3 bytes.
*   three positions goes from 28 to 5.
* 
************************************************************************************/

#define IS_FLAG              0x80  /* Binary 10000000 */   
#define COMMON_STRUCTURE     0x60  /* Binary 01100000 */
#define COMMON_IN_FILE       0x20  /* Binary 00100000 */
#define COMMON_IN_HTML_BODY  0x40  /* Binary 01000000 */
#define POS_4_BIT            0x10  /* Binary 00010000 */
/************************************************************************

From Jose on Feb 13, 2002

IS_FLAG is to indicate that the byte is a flag. As far as I remember, I 
needed it to avoid null values.

When COMMON_STRUCTURE is on, this means that all the positions
have the same structure value. This helps a lot with non html
files and can save a lot of space.

When FREQ_AND_POS_EQ_1 is on, this means that freq is 1 and
pos[0]=1. Mmm, I am not sure if this is very useful now. Let me
explain better. This was useful for xml files with fields that contain
just one value. For example:
<code>00001</code>
<date>20001231</date>
But now I am not sure this is useful, because a long time ago I
changed the position counter so that it is no longer reset after each
field change.
I need to check this.

POS_4_BIT indicates that all positions are within 16 positions
of each other and can thus be stored as 2 per byte.  Position numbers are
stored as a delta from the previous position.

Here's indexing /usr/doc:


23840 files indexed.  177638538 total bytes.  19739102 total words.
Elapsed time: 00:04:42 CPU time: 00:03:09
Indexing done!
      4 bit = 843,081 (total length = 10,630,425) 12 bytes/chunk
  not 4 bit = 13,052,904 (length 83,811,498) 6 bytes/chunk

I wonder if storing the initial position would improve that much.




*************************************************************************/
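
/*
** Illustrative sketch (not part of swish-e) of the POS_4_BIT idea described
** above: positions become deltas from the previous position and, when every
** delta is below 16, the deltas are packed two per byte. The helper names
** to_deltas, pack_nibbles and unpack_nibbles are hypothetical, and the
** structure bits that swish-e keeps in each posdata entry are left out.

```c
#include <assert.h>
#include <stddef.h>

// Turn ascending positions into deltas from the previous position.
// pos[0] is kept as-is. Returns 1 if every delta fits in 4 bits.
static int to_deltas(unsigned int *pos, size_t n)
{
    int fits = 1;
    size_t i;

    for (i = n; i-- > 1; ) {
        pos[i] -= pos[i - 1];
        if (pos[i] >= 16)
            fits = 0;
    }
    return fits;
}

// Pack n deltas (each 0..15) two per byte, first delta in the high nibble.
// Returns the number of bytes written: (n + 1) / 2.
static size_t pack_nibbles(const unsigned int *deltas, size_t n, unsigned char *out)
{
    size_t j;

    for (j = 0; j < n; j++) {
        assert(deltas[j] < 16);
        if (j % 2)
            out[j / 2] |= (unsigned char) deltas[j];
        else
            out[j / 2] = (unsigned char) (deltas[j] << 4);
    }
    return (n + 1) / 2;
}

// Reverse of pack_nibbles: read n deltas back out of the packed bytes.
static void unpack_nibbles(const unsigned char *in, size_t n, unsigned int *deltas)
{
    size_t j;

    for (j = 0; j < n; j++)
        deltas[j] = (j % 2) ? (in[j / 2] & 0x0Fu) : (unsigned int) (in[j / 2] >> 4);
}
```

** For positions 5, 8, 9, 20 the first position is kept as-is and the deltas
** 3, 1, 11 pack into the two bytes 0x31 0xB0.
*/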





void compress_location_values(unsigned char **buf,unsigned char **flagp,int filenum,int frequency, unsigned int *posdata)
{
    unsigned char *p = *buf;
    unsigned char *flag;
    int structure = GET_STRUCTURE(posdata[0]);
    int common_structure = COMMON_STRUCTURE;
    int i;

    /* Make room for flag and init it */
    flag = p;
    *flagp = p;
    p++;

    *flag = IS_FLAG;

    /* Add file number */
    p = compress3(filenum, p);


    /* Check for special case frequency == 1 and position[0] < 128  && structure == IN_FILE */
    if(frequency == 1 && (GET_POSITION(posdata[0]) < 128) && structure == IN_FILE)
    {
        /* Remove IS_FLAG and store position in the lower 7 bits */
        /* In this way we have 0bbbbbbb in *flag 
        ** where bbbbbbb is the position and the leading 0 bit
        ** indicates that frequency is 1 and position is < 128 */
        *flag = (unsigned char) ((int)(GET_POSITION(posdata[0])));
    }
    else
    {
        /* Otherwise IS_FLAG is set */
        /* Now, let's see if all positions have the same structure to 
        ** get better compression */
        for(i=1;i<frequency;i++)
        {
            if(structure != GET_STRUCTURE(posdata[i]))
            {
                common_structure = 0;
                break;
            }
        }
        if(frequency < 16)
             (*flag) |= frequency; /* Store frequency in flag - low 4 bits */
        else                       
             p = compress3(frequency, p); /* Otherwise store it as a compressed number; the low 4 bits stay 0 */
        /* Add structure if it is equal for all positions */
        if(common_structure)
        {
            switch(structure)
            {
                case IN_FILE:
                    *flag |= COMMON_IN_FILE; 
                     break;     

                case IN_BODY | IN_FILE:
                    *flag |= COMMON_IN_HTML_BODY; 
                     break;     

                default:         
                    *p++ = (unsigned char) structure;
                    *flag |= COMMON_STRUCTURE;
                    break;
            }
        }
    }
    *buf = p;
}

void compress_location_positions(unsigned char **buf,unsigned char *flag,int frequency, unsigned int *posdata)
{
    unsigned char *p = *buf;
    int i, j;

    if((*flag) & IS_FLAG)
    { 
        (*flag) |= POS_4_BIT;

        for(i = frequency - 1; i > 0 ; i--)
        {
            posdata[i] = SET_POSDATA(GET_POSITION(posdata[i]) - GET_POSITION(posdata[i-1]),GET_STRUCTURE(posdata[i]));
            if( GET_POSITION(posdata[i]) >= 16)
                (*flag) &= ~POS_4_BIT; 
        }

        /* Always write first position "as is" */
        p = compress3(GET_POSITION(posdata[0]), p);

        /* write the position data starting at 1 */
        if((*flag) & POS_4_BIT)
        {
            for (i = 1, j = 0; i < frequency ; i++, j++)
            {
                if(j % 2)
                    p[j/2] |= (unsigned char) GET_POSITION(posdata[i]);
                else
                    p[j/2] = (unsigned char) GET_POSITION(posdata[i]) << 4;
            }
            p += ((j + 1)/2);
        }
        else
        {
            for (i = 1; i < frequency; i++)
                p = compress3(GET_POSITION(posdata[i]), p);
        }

        /* Write out the structure bytes */
        if(! (*flag & COMMON_STRUCTURE))
            for(i = 0; i < frequency; i++)
                *p++ = (unsigned char) GET_STRUCTURE(posdata[i]);

        *buf = p;
    }
}

static unsigned char *compress_location(SWISH * sw, LOCATION * l)
{
    unsigned char *p,
           *q;
    int     i,
            max_size;
    unsigned char *flag;
    struct MOD_Index *idx = sw->Index;

    /* check if the work buffer is long enough */
    /* just to avoid buffer overruns */
    /* In the worst case an integer will need MAXINTCOMPSIZE bytes */
    /* but fortunately this is very uncommon */

/* 2002/01 JMRUIZ
** Added an extra byte (MAXINTCOMPSIZE+1) for each position's structure
*/

    max_size = sizeof(unsigned char) + sizeof(LOCATION *) + (((sizeof(LOCATION) / sizeof(int) + 1) + (l->frequency - 1)) * (MAXINTCOMPSIZE + sizeof(unsigned char)));


    /* reallocate if needed */
    if (max_size > idx->len_compression_buffer)
    {
        idx->len_compression_buffer = max_size + 200;
        idx->compression_buffer = erealloc(idx->compression_buffer, idx->len_compression_buffer);
    }


    /* Pointer to the buffer */
    p = idx->compression_buffer;

    /* Add extra bytes for handling linked list */
//***JMRUIZ

    memcpy(p,&l->next,sizeof(LOCATION *));
    p += sizeof(LOCATION *);

    /* Add the metaID */
    p = compress3(l->metaID,p);

    compress_location_values(&p,&flag,l->filenum,l->frequency, l->posdata);
    compress_location_positions(&p,flag,l->frequency,l->posdata);



    /* Get the length of all the data */
    i = p - idx->compression_buffer;


    /* Did we overrun our buffer? */
    if (i > idx->len_compression_buffer)
        progerr("Internal error in compress_location routine");


    q = (unsigned char *) Mem_ZoneAlloc(idx->currentChunkLocZone, i);
    memcpy(q, idx->compression_buffer, i);

    return (unsigned char *) q;
}


void uncompress_location_values(unsigned char **buf,unsigned char *flag, int *filenum,int *frequency)
{
    unsigned char *p = *buf;

    *frequency = 0;

    *flag = *p++;

    if(!((*flag) & IS_FLAG))
    {
        *frequency = 1;
    }
    else
        (*frequency) |= (*flag) & 15;   /* Binary 00001111 */

    *filenum = uncompress2(&p);

    if(! (*frequency))
        *frequency = uncompress2(&p);

    *buf = p;
}
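
/*
** Illustrative sketch (not part of swish-e) of how the leading flag byte is
** interpreted by the two uncompress_location_* routines. The struct and the
** function name decode_flag are hypothetical; the mask values mirror the
** IS_FLAG, COMMON_STRUCTURE and POS_4_BIT defines above.

```c
#include <assert.h>

enum { F_IS_FLAG = 0x80, F_COMMON_MASK = 0x60, F_POS_4_BIT = 0x10, F_FREQ_MASK = 0x0F };

struct flag_info {
    int single_position;   // 1 when the byte itself is the position (0..127)
    int position;          // valid only when single_position is 1
    int frequency;         // 0 means "stored separately after the file number"
    int common_bits;       // 0x00, 0x20, 0x40 or 0x60
    int four_bit_deltas;   // 1 when deltas are packed two per byte
};

static struct flag_info decode_flag(unsigned char flag)
{
    struct flag_info info = {0, 0, 0, 0, 0};

    if (!(flag & F_IS_FLAG)) {
        // High bit clear: frequency is 1 and the byte is the position.
        info.single_position = 1;
        info.position = flag;
        info.frequency = 1;
        return info;
    }
    info.frequency = flag & F_FREQ_MASK;
    info.common_bits = flag & F_COMMON_MASK;
    info.four_bit_deltas = (flag & F_POS_4_BIT) != 0;
    return info;
}
```

** A byte of 0x2A decodes as a single position 42 in IN_FILE, while 0xB3
** (IS_FLAG, COMMON_IN_FILE, POS_4_BIT, frequency 3) describes three
** positions that share one structure and use 4 bit deltas.
*/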

unsigned long four_bit_count = 0;
unsigned long four_bit_bytes = 0;
unsigned long not_four = 0;
unsigned long not_four_bytes = 0;
unsigned long four_bit_called = 0;
unsigned long not_four_called;


void uncompress_location_positions(unsigned char **buf, unsigned char flag, int frequency, unsigned int *posdata)
{
    int i, j, tmp;
    unsigned char *p = *buf;
    int common_structure = 0;
    int structure = 0;

    /* Check for special case frequency == 1 and position[0] < 128 and structure == IN_FILE */
    if (!(flag & IS_FLAG))
    {
        structure = IN_FILE;
        posdata[0] =  SET_POSDATA((int)(flag),structure);
    }
    else
    {
        /* Check for common structure */
        if ((tmp =(flag & COMMON_STRUCTURE)))
        {
            common_structure = COMMON_STRUCTURE;
            switch(tmp)
            {
                case COMMON_IN_FILE:
                    structure = IN_FILE;
                    break;

                case COMMON_IN_HTML_BODY:
                    structure = IN_FILE | IN_BODY;
                    break;

                default:
                    structure = (int)((unsigned char) *p++);
                    break;
            }
        }

        /* First position is always "as is" */
        posdata[0] = (unsigned int)uncompress2(&p);

        /* Check if positions were stored as two values per byte or the old "compress" style */
        if(flag & POS_4_BIT)
        {
            for (i = 1, j = 0; i < frequency; i++, j++)
            {
                if(j%2)
                    posdata[i] = (unsigned int)((unsigned int)p[j/2] & (unsigned int)0x0F);
                else
                    posdata[i] = (unsigned int)((unsigned int)p[j/2] >> (unsigned int)4);
            }
            p += ((j + 1)/2);
        }
        else
        {
            for (i = 1; i < frequency; i++)
            {
                tmp = uncompress2(&p);
                posdata[i] = (unsigned int)tmp;
            }
        }
        /* Positions were stored as deltas from the previous one, so restore them */
        for(i = 1; i < frequency; i++)
            posdata[i] += posdata[i-1];

        /* Get structure */
        for(i = 0; i < frequency; i++)
        {
            if(!common_structure)
                structure = (int)((unsigned char) *p++);

            posdata[i] = SET_POSDATA(posdata[i],structure);
        }
    }
    /* Update buffer pointer */
    *buf = p;
}


/* 09/00 Jose Ruiz
** Compress all location data of an entry that is not yet compressed
*/
void    CompressCurrentLocEntry(SWISH * sw, ENTRY * e)
{
    LOCATION *l, *prev, *next, *comp;
   
    for(l = e->currentChunkLocationList,prev = NULL ; l != e->currentlocation; )
    {
        next = l->next;
        comp = (LOCATION *) compress_location(sw, l);
        if(l == e->currentChunkLocationList)
            e->currentChunkLocationList =comp;
        if(prev)
            memcpy(prev, &comp, sizeof(LOCATION *));   /* Use memcpy to avoid alignment problems */
        prev = comp;
        l = next;        
    } 
    e->currentlocation = e->currentChunkLocationList;
}


/* 2002/11 jmruiz
** Simple routine to compress worddata using zlib where available
** 
** 2004/06 jmruiz 
** The economic flag selects a code path that uses less RAM.
** Compressing worddata with zlib's compress2 routine needs extra RAM
** because compress2 requires a separate buffer to hold the
** compressed data.
** So anyone indexing in economic mode (-e switch) may hit the
** annoying "Out of RAM" message if the computer does not have
** enough RAM to allocate that buffer.
** To avoid that, economic mode uses zlib's low level deflate routines
** (deflateInit, deflate and deflateEnd) with two local buffers:
** local_buffer_in and local_buffer_out.
** The original data is copied in chunks to local_buffer_in
** and compressed to local_buffer_out by each call to zlib's deflate
** routine. The compressed chunks are then copied back over the original
** worddata area, taking care not to overrun the buffer.
**
** On exit returns the new size of the compressed buffer
*/
int compress_worddata(unsigned char *wdata,int wdata_size, int economic)
{
#ifndef HAVE_ZLIB
    return wdata_size;
#else
    unsigned char  *WDataBuf;     /* For compressing and uncompressing */
    uLongf          dest_size;
    int             zlib_status = 0;
    int             off_wdata, len_chunk, off_out;
    unsigned char   local_buffer_out[Z_BUFSIZE];/* Just to avoid emalloc/efree overhead and for deflate method */
    unsigned char   local_buffer_in[Z_BUFSIZE];/* Just to avoid emalloc/efree overhead*/


    /* Don't bother compressing smaller items */
    if ( wdata_size < MIN_WORDDATA_COMPRESS_SIZE )
        return wdata_size;

    if(economic)
    {                    /* -e switch is set. Use deflate* routines */
        z_stream z;      /* zlib compression stream */

        z.zalloc = (alloc_func)0;   /* init zlib compression stream */
        z.zfree = (free_func)0;
        z.opaque = (voidpf)0;

        if(Z_OK != deflateInit(&z, 9))
            return wdata_size;

        z.avail_in  = 0;
        z.next_out = (Bytef*)local_buffer_out;
        z.avail_out  = Z_BUFSIZE;

        dest_size = 0;
        off_wdata = 0;
        off_out = 0;
        for(;;)
        {
            if (off_wdata == wdata_size)
                break;     /* No more data */ 
            else
            {
                if (z.avail_in==0) 
                {            /* Fill local_buffer_in with more data */
                    len_chunk = Min(Z_BUFSIZE,(wdata_size - off_wdata));
                    if(!len_chunk)  /* No more data to compress: exit */
                        break;     
                    memcpy(local_buffer_in,wdata + off_wdata, len_chunk);
                    off_wdata += len_chunk;
                    z.next_in = local_buffer_in;
                    z.avail_in = len_chunk;
                }
            }
                       /* Compress local_buffer_in */
                       /* Z_NO_FLUSH flag achieves better results */
            zlib_status = deflate(&z, Z_NO_FLUSH);
                       /* get the size of compressed data */
            len_chunk = Z_BUFSIZE - z.avail_out;
            if(len_chunk)
            {
                       /* Check for buffer overrun */
                if((off_out + len_chunk) >= off_wdata)
                {
                    /* We are in buffer overrun condition but if we are in 
                    ** the first chunk we can recover the original data
                    ** from local_buffer_in
                    */
                    if(off_wdata <= Z_BUFSIZE)
                    {
                        deflateEnd(&z);
                        memcpy(wdata,local_buffer_in,off_wdata);   /* Restore the overwritten prefix; local_buffer_in holds only the first off_wdata bytes */
                        return wdata_size;   
                    }
                    else
                        progerr("WordData Compression Error. Unable to compress worddata in economic mode. Remove switch -e from your command line or add \"CompressPositions Yes\" to your config file");
                }
                       /* Copy the compressed data onto the original buffer */
                       /* off_out contains the current length of the total 
                       ** compressed data
                       */
                memcpy(wdata + off_out, local_buffer_out, len_chunk);
                off_out += len_chunk;
            }
                       /* reset local_buffer_out to next step */
            z.next_out = (Bytef*)local_buffer_out;
            z.avail_out  = Z_BUFSIZE;
            if(zlib_status != Z_OK)
                break;
        }

                       /* We have used Z_NO_FLUSH to achieve better
                       ** results. So, we have to issue a deflate with
                       ** Z_FINISH flag to flush the pending data
                       ** in local_buffer_out
                       */
        for(;;)
        {
            zlib_status = deflate(&z, Z_FINISH);
                       /* get the size of compressed data */
            len_chunk = Z_BUFSIZE - z.avail_out;
            if(len_chunk)
            {
                       /* Check for buffer overrun */
                if((off_out + len_chunk) >= off_wdata)
                {
                    /* We are in buffer overrun condition but if we are in 
                    ** the first chunk we can recover the original data
                    ** from local_buffer_in
                    */
                    if(off_wdata <= Z_BUFSIZE)
                    {
                        deflateEnd(&z);
                        memcpy(wdata,local_buffer_in,off_wdata);   /* Restore the overwritten prefix; local_buffer_in holds only the first off_wdata bytes */
                        return wdata_size;   
                    }
                    else
                        progerr("WordData Compression Error. Unable to compress worddata in economic mode. Remove switch -e from your command line or add \"CompressPositions Yes\" to your config file");
                }
                       /* Copy the compressed data onto the original buffer */
                       /* off_out contains the current length of the total 
                       ** compressed data
                       */
                memcpy(wdata + off_out, local_buffer_out, len_chunk);
                off_out += len_chunk;
            }
                       /* reset local_buffer_out to next step */
            z.next_out = (Bytef*)local_buffer_out;
            z.avail_out  = Z_BUFSIZE;
            if(zlib_status != Z_OK)
                break;
        }
        deflateEnd(&z); 
        dest_size = off_out;
    }
    else
    {
        /* Buffer should be +1% + a few bytes. */
        dest_size = (uLongf)(wdata_size + ( wdata_size / 100 ) + 1000);  // way more than should be needed

        /* Get an output buffer */
        if( dest_size > Z_BUFSIZE )
            WDataBuf = (unsigned char *) emalloc((int)dest_size );
        else
            WDataBuf = local_buffer_out;

        zlib_status = compress2((Bytef *)WDataBuf, &dest_size, wdata, wdata_size, 9);
        if ( zlib_status != Z_OK )
            progerr("WordData Compression Error.  zlib compress2 returned: %d  Worddata size: %d compress buf size: %d", zlib_status, wdata_size, (int)dest_size);

        /* Make sure it's compressed enough -- should check that destsize is not > MAXINT */
        if ( (int)dest_size < wdata_size )
        {
            memcpy(wdata,WDataBuf,(int)dest_size);
        }
        else
        {
            dest_size = wdata_size;
        }

        if ( WDataBuf != local_buffer_out)
            efree(WDataBuf);
    }

    return (int)dest_size;
#endif
}


/* 2002/11 jmruiz
** Routine to uncompress worddata
*/
void uncompress_worddata(unsigned char **buf, int *buf_size, int saved_bytes)
{
#ifdef HAVE_ZLIB
    unsigned char *new_buf;
    int             zlib_status = 0;
    uLongf          new_buf_size = (uLongf)(*buf_size + saved_bytes);

    if(! saved_bytes)     /* nothing to do */
        return;   
    new_buf= (unsigned char *) emalloc(*buf_size + saved_bytes);
    zlib_status = uncompress(new_buf, &new_buf_size, *buf, (uLongf)*buf_size );
    if ( zlib_status != Z_OK )
    {
        // $$$ make sure this works ok if returning null $$$
        progwarn("Failed to uncompress Property. zlib uncompress returned: %d.  uncompressed size: %d buf_len: %d saved_bytes: %d\n",
            zlib_status, (int)new_buf_size, *buf_size, saved_bytes );
        efree(new_buf);   /* don't leak the scratch buffer on failure */
        return;
    }
    efree(*buf);
    *buf_size = (int)new_buf_size;
    *buf = new_buf;
#else
    if ( saved_bytes )
        progerr("The index was created with zlib compression.\n This version of swish was not compiled with zlib");
#endif
}


/* 2002/09 jmruiz
** This routine replaces the longs in worddata with shorter compressed
** numbers.
**
** There are two reasons for using compressed numbers in worddata
** instead of longs:
**   - Compressed numbers are more portable: longs are usually 4 bytes
**     long on a 32 bit machine but on a 64 bit alpha they are 8 bytes
**     long (this is a waste of space).
**   - The obvious one is that compressed numbers use less disk space
**
** BTW, any change in worddata will also affect dump.c, merge.c and search.c
** (getfileinfo routine).
**
**  worddata has the following format before entering the routine
**  <tfreq><metaID><nextposmetaID><data><metaID><nextposmetaID><data>...
**
**  On entry, nextposmetaID is the offset to the next metaID,
**  in bytes, counted from the beginning of worddata.
**  It is a packed long number (sizeof(long) bytes).
**
**  Exiting this routine, nextposmetaID has changed to be the size of
**  the data block and is stored as a compressed number.
**
**  In other words, worddata has the following format:
**  <tfreq><metaID><data_len><data><metaID><data_len><data>...
**
*/
void    remove_worddata_longs(unsigned char *worddata,int *sz_worddata)
{
    unsigned char *src,*dst;   //source and dest pointers for worddata
    unsigned int metaID, tfrequency, data_len;
    unsigned long nextposmetaID;

    src = worddata;

    /* Jump over tfrequency and get first metaID */
    tfrequency = uncompress2(&src);     /* tfrequency */
    metaID = uncompress2(&src);     /* metaID */
    dst = src;

    while(1)
    {
        /* Get offset to next one */
        nextposmetaID = UNPACKLONG2(src);
        src += sizeof(long);

        /* Compute data length for this metaID */
        data_len = (int)nextposmetaID - (src - worddata);

        /* Store data_len as a compressed number */
        dst = compress3(data_len,dst);

        /* This must not happen. Anyway check it */
        if(dst > src)
            progerr("Internal error in remove_worddata_longs");

        /* dst may be lower than src and the regions may overlap, so use memmove */
        memmove(dst,src,data_len);

        /* Increase pointers */
        src += data_len;
        dst += data_len;

        /* Check if we are at the end of the buffer */
        if ((src - worddata) == *sz_worddata)
            break;   /* End of worddata */

        /* Get next metaID */
        metaID = uncompress2(&src);
        dst = compress3(metaID,dst);
    }
    /* Adjust to new size */
    *sz_worddata = dst - worddata;
}
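
/*
** Illustrative sketch (not part of swish-e) of the in-place rewrite done by
** remove_worddata_longs, under simplifying assumptions: each record is
** <id:1 byte><next_off:4 bytes, MSB first><data>, every data block is
** shorter than 256 bytes, and the new length field is a single byte rather
** than a compressed number. The name shrink_records is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

// Rewrites <id><next_off><data>... to <id><len><data>... in place.
// next_off is the absolute offset of the following record.
// Returns the new total size.
static size_t shrink_records(unsigned char *buf, size_t size)
{
    unsigned char *src = buf, *dst = buf;

    while ((size_t) (src - buf) < size) {
        unsigned long next_off;
        size_t len;

        *dst++ = *src++;                        // copy the id byte
        next_off = ((unsigned long) src[0] << 24) | ((unsigned long) src[1] << 16)
                 | ((unsigned long) src[2] << 8)  |  (unsigned long) src[3];
        src += 4;
        len = (size_t) next_off - (size_t) (src - buf);   // data bytes in this record
        assert(len < 256 && dst < src);

        *dst++ = (unsigned char) len;
        memmove(dst, src, len);                 // source and destination may overlap
        src += len;
        dst += len;
    }
    return (size_t) (dst - buf);
}
```

** The write pointer never passes the read pointer because every record
** gets shorter, which is the same invariant the progerr check above guards.
*/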
swish-e-2.4.7/src/merge.h0000664000077100017500000000314511166010110012055 00000000000000
/*
** Copyright (C) 1995, 1996, 1997, 1998 Hewlett-Packard Company
** Originally by Kevin Hughes, kev@kevcom.com, 3/11/94
**

    This file is part of Swish-e.

    Swish-e is free software; you can redistribute it and/or modify
    it under the terms of the GNU General Public License as published by
    the Free Software Foundation; either version 2 of the License, or
    (at your option) any later version.

    Swish-e is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
    GNU General Public License for more details.

    You should have received a copy of the GNU General Public License
    along  with Swish-e; if not, write to the Free Software
    Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
    
    See the COPYING file that accompanies the Swish-e distribution for details
    of the GNU GPL and the special exception available for linking against
    the Swish-e library.
    
** Mon May  9 18:19:34 CDT 2005
** added GPL


**-----------------------------------------------------------
** Added remapVar and initmapentrylist to fix the merge option -M
** G. Hill 3/7/97
**
*/
#ifndef __HasSeenModule_Merge
#define __HasSeenModule_Merge  1

/* used by docprop.c, but maybe should be in merge.c */
struct metaMergeEntry {
	struct metaMergeEntry* next;
	char* metaName;
	int oldMetaID;
	int newMetaID;
	int metaType;
};


/* called by swish.c */
void readmerge (char *, char *, char *, int);
void merge_indexes( SWISH *sw_input, SWISH *sw_output );


#endif


swish-e-2.4.7/src/list.c0000664000077100017500000001734711166010110011735 00000000000000
/*
** Copyright (C) 1995, 1996, 1997, 1998 Hewlett-Packard Company
** Originally by Kevin Hughes, kev@kevcom.com, 3/11/94
**

    This file is part of Swish-e.

    Swish-e is free software; you can redistribute it and/or modify
    it under the terms of the GNU General Public License as published by
    the Free Software Foundation; either version 2 of the License, or
    (at your option) any later version.

    Swish-e is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
    GNU General Public License for more details.

    You should have received a copy of the GNU General Public License
    along  with Swish-e; if not, write to the Free Software
    Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
    
    See the COPYING file that accompanies the Swish-e distribution for details
    of the GNU GPL and the special exception available for linking against
    the Swish-e library.
    
** Mon May  9 15:09:27 CDT 2005
** added GPL




**
** 1998-07-04  addfilter   ( R. Scherg)
** 2001-02-28  rasc  -- addfilter removed here
**
*/

#include "swish.h"
#include "list.h"
#include "mem.h"
#include "metanames.h"
#include "swstring.h"
#include "hash.h"


struct swline *newswline_n(char *line, int size)
{
struct swline *newnode;

    newnode = (struct swline *)emalloc(sizeof(struct swline) + size);
    strncpy(newnode->line,line,size);
    newnode->line[size] = '\0';  /* strncpy does not terminate when line has size or more chars */
    newnode->next = NULL;

    return newnode;
}

struct swline *newswline(char *line)
{
struct swline *newnode;
int size = strlen(line);  /* Compute strlen only once */

    newnode = (struct swline *)emalloc(sizeof(struct swline) + size);
    memcpy(newnode->line,line,size + 1);
    newnode->next = NULL;

    return newnode;
}

struct swline *addswline(struct swline *rp, char *line)
{
struct swline *newnode;

    newnode = newswline(line);

    if (rp == NULL)
        rp = newnode;
    else
        rp->other.nodep->next = newnode;
    
    rp->other.nodep = newnode;
    
    return rp;
}

struct swline *dupswline(struct swline *rp)
{
struct swline *tmp=NULL, *tmp2=NULL;
struct swline *newnode;

    while(rp)
    {
        newnode = newswline(rp->line);
        
        if(!tmp)
            tmp = newnode;
        else
            tmp2->next=newnode;
        tmp2 = newnode;
        rp=rp->next;
    }
    if(tmp)
        tmp->other.nodep = tmp2;  /* Put last in nodep */
    return tmp;
}

void addindexfile(SWISH *sw, char *line)
{
    IndexFILE *head = sw->indexlist;
    IndexFILE *indexf = (IndexFILE *) emalloc(sizeof(IndexFILE));

    memset( indexf, 0, sizeof(IndexFILE) );

    indexf->sw = sw;  /* save parent object */
    indexf->line = estrdup(line);
    init_header(&indexf->header);
    indexf->next = NULL;


    /* Add default meta names -- these will be replaced if reading from an index file */
    add_default_metanames(indexf);

    /* Add index to end of list */

    if ( head == NULL )  /* first entry? */
        sw->indexlist = head = indexf;
    else
        head->nodep->next = indexf;  /* point the previous last one to the new last one */
    
    head->nodep = indexf;  /* set the last pointer */
}


void freeswline(struct swline *tmplist)
{
    struct swline *tmplist2;

    while (tmplist) {
        tmplist2 = tmplist->next;
        efree(tmplist);
        tmplist = tmplist2;
    }
}




void init_header(INDEXDATAHEADER *header)
{

    header->lenwordchars=header->lenbeginchars=header->lenendchars=header->lenignorelastchar=header->lenignorefirstchar=header->lenbumpposchars=MAXCHARDEFINED;

    header->wordchars = (char *)emalloc(header->lenwordchars + 1);
        header->wordchars = SafeStrCopy(header->wordchars,WORDCHARS,&header->lenwordchars);
        sortstring(header->wordchars);  /* Sort chars and remove dups */
        makelookuptable(header->wordchars,header->wordcharslookuptable);

    header->beginchars = (char *)emalloc(header->lenbeginchars + 1);
        header->beginchars = SafeStrCopy(header->beginchars,BEGINCHARS,&header->lenbeginchars);
        sortstring(header->beginchars);  /* Sort chars and remove dups */
        makelookuptable(header->beginchars,header->begincharslookuptable);

    header->endchars = (char *)emalloc(header->lenendchars + 1);
        header->endchars = SafeStrCopy(header->endchars,ENDCHARS,&header->lenendchars);
        sortstring(header->endchars);  /* Sort chars and remove dups */
        makelookuptable(header->endchars,header->endcharslookuptable);

    header->ignorelastchar = (char *)emalloc(header->lenignorelastchar + 1);
        header->ignorelastchar = SafeStrCopy(header->ignorelastchar,IGNORELASTCHAR,&header->lenignorelastchar);
        sortstring(header->ignorelastchar);  /* Sort chars and remove dups */
        makelookuptable(header->ignorelastchar,header->ignorelastcharlookuptable);

    header->ignorefirstchar = (char *)emalloc(header->lenignorefirstchar + 1);
        header->ignorefirstchar = SafeStrCopy(header->ignorefirstchar,IGNOREFIRSTCHAR,&header->lenignorefirstchar);
        sortstring(header->ignorefirstchar);  /* Sort chars and remove dups */
        makelookuptable(header->ignorefirstchar,header->ignorefirstcharlookuptable);


    header->bumpposchars = (char *)emalloc(header->lenbumpposchars + 1);
    header->bumpposchars[0]='\0';

    header->lenindexedon=header->lensavedasheader=header->lenindexn=header->lenindexd=header->lenindexp=header->lenindexa=MAXSTRLEN;
    header->indexn = (char *)emalloc(header->lenindexn + 1);header->indexn[0]='\0';
    header->indexd = (char *)emalloc(header->lenindexd + 1);header->indexd[0]='\0';
    header->indexp = (char *)emalloc(header->lenindexp + 1);header->indexp[0]='\0';
    header->indexa = (char *)emalloc(header->lenindexa + 1);header->indexa[0]='\0';
    header->savedasheader = (char *)emalloc(header->lensavedasheader + 1);header->savedasheader[0]='\0';
    header->indexedon = (char *)emalloc(header->lenindexedon + 1);header->indexedon[0]='\0';

    header->ignoreTotalWordCountWhenRanking = 1;
    header->minwordlimit = MINWORDLIMIT;
    header->maxwordlimit = MAXWORDLIMIT;

    makelookuptable("",header->bumpposcharslookuptable); 

    BuildTranslateChars(header->translatecharslookuptable,(unsigned char *)"",(unsigned char *)"");


    /* this is to ignore numbers */
    header->numberchars_used_flag = 0;  /* not used by default*/


    /* initialize the stemmer/fuzzy structure to None */
    header->fuzzy_data = set_fuzzy_mode( header->fuzzy_data, "None" );
}


void free_header(INDEXDATAHEADER *header)
{
    if(header->lenwordchars) efree(header->wordchars);
    if(header->lenbeginchars) efree(header->beginchars);
    if(header->lenendchars) efree(header->endchars);
    if(header->lenignorefirstchar) efree(header->ignorefirstchar);
    if(header->lenignorelastchar) efree(header->ignorelastchar);
    if(header->lenindexn) efree(header->indexn);
    if(header->lenindexa) efree(header->indexa);
    if(header->lenindexp) efree(header->indexp);
    if(header->lenindexd) efree(header->indexd);
    if(header->lenindexedon) efree(header->indexedon);        
    if(header->lensavedasheader) efree(header->savedasheader);    
    if(header->lenbumpposchars) efree(header->bumpposchars);

    
    /* Free the hashed word arrays */
    free_word_hash_table( &header->hashstoplist );
    free_word_hash_table( &header->hashbuzzwordlist );
    free_word_hash_table( &header->hashuselist );
    

    /* $$$ temporary until metas and props are separated */
    if ( header->propIDX_to_metaID )
        efree( header->propIDX_to_metaID );

    if ( header->metaID_to_PropIDX )
        efree( header->metaID_to_PropIDX );

    /* free up the stemmer */
    free_fuzzy_mode( header->fuzzy_data );


#ifndef USE_BTREE
    if ( header->TotalWordsPerFile )
        efree( header->TotalWordsPerFile );
#endif

}



swish-e-2.4.7/src/list.h
/*
** Copyright (C) 1995, 1996, 1997, 1998 Hewlett-Packard Company
** Originally by Kevin Hughes, kev@kevcom.com, 3/11/94
**

    This file is part of Swish-e.

    Swish-e is free software; you can redistribute it and/or modify
    it under the terms of the GNU General Public License as published by
    the Free Software Foundation; either version 2 of the License, or
    (at your option) any later version.

    Swish-e is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
    GNU General Public License for more details.

    You should have received a copy of the GNU General Public License
    along  with Swish-e; if not, write to the Free Software
    Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
    
    See the COPYING file that accompanies the Swish-e distribution for details
    of the GNU GPL and the special exception available for linking against
    the Swish-e library.
    
** Mon May  9 18:19:34 CDT 2005
** added GPL


*/

#ifdef __cplusplus
extern "C" {
#endif

struct swline *newswline_n(char *line, int size);
struct swline *newswline(char *line);
struct swline *addswline (struct swline *rp, char *line);
struct swline *dupswline (struct swline *rp);
void addindexfile(struct SWISH *sw, char *line);
void freeswline (struct swline *ptr);
void init_header (INDEXDATAHEADER *header);
void free_header (INDEXDATAHEADER *header);

#ifdef __cplusplus
}
#endif /* __cplusplus */

swish-e-2.4.7/src/txt.h
/*
$Id: txt.h 1736 2005-05-12 15:41:22Z karman $
**


    This file is part of Swish-e.

    Swish-e is free software; you can redistribute it and/or modify
    it under the terms of the GNU General Public License as published by
    the Free Software Foundation; either version 2 of the License, or
    (at your option) any later version.

    Swish-e is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
    GNU General Public License for more details.

    You should have received a copy of the GNU General Public License
    along  with Swish-e; if not, write to the Free Software
    Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
    
    See the COPYING file that accompanies the Swish-e distribution for details
    of the GNU GPL and the special exception available for linking against
    the Swish-e library.
    
** Mon May  9 18:15:43 CDT 2005
** added GPL

*/

int countwords_TXT (SWISH *sw, FileProp *fprop, FileRec *fi, char *buffer);

swish-e-2.4.7/src/db_read.c
/*

$Id: db_read.c 1945 2007-10-22 14:54:07Z karpet $
    This file is part of Swish-e.

    Swish-e is free software; you can redistribute it and/or modify
    it under the terms of the GNU General Public License as published by
    the Free Software Foundation; either version 2 of the License, or
    (at your option) any later version.

    Swish-e is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
    GNU General Public License for more details.

    You should have received a copy of the GNU General Public License
    along  with Swish-e; if not, write to the Free Software
    Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
    
    See the COPYING file that accompanies the Swish-e distribution for details
    of the GNU GPL and the special exception available for linking against
    the Swish-e library.
    
** Mon May  9 15:51:39 CDT 2005
** added GPL


**
** 2001-05-07 jmruiz init coding
**
*/

#include "swish.h"
#include "mem.h"
#include "swstring.h"
#include "index.h"
#include "hash.h"
#include "date_time.h"
#include "compress.h"
#include "error.h"
#include "metanames.h"
#include "db.h"
#include "db_native.h"
#include "stemmer.h"
// #include "db_berkeley_db.h"

#ifndef min
#define min(a, b)    ((a) < (b) ? (a) : (b))
#endif

/*
  -- init structures for this module
*/

void initModule_DB (SWISH  *sw)
{
          /* Allocate structure */
   initModule_DBNative(sw);
   // initModule_DB_db(sw);
   return;
}


/*
  -- release all wired memory for this module
*/

void freeModule_DB (SWISH *sw)
{
  freeModule_DBNative(sw);
  // freeModule_DB_db(sw);
  return;
}



/* ---------------------------------------------- */





static void load_word_hash_from_buffer(WORD_HASH_TABLE *table_ptr, char *buffer);


/* 04/2002 jmruiz
** Function to read all of a word's data from the index DB
*/





sw_off_t read_worddata(SWISH * sw, ENTRY * ep, IndexFILE * indexf, unsigned char **buffer, int *sz_buffer)
{
sw_off_t wordID;
char *word = ep->word;
int saved_bytes = 0;

    DB_InitReadWords(sw, indexf->DB);
    DB_ReadWordHash(sw, word, &wordID, indexf->DB);

    if(!wordID)
    {    
        DB_EndReadWords(sw, indexf->DB);
        sw->lasterror = WORD_NOT_FOUND;
        *buffer = NULL;
        *sz_buffer = 0;
        return (sw_off_t)0;
   } 
   DB_ReadWordData(sw, wordID, buffer, sz_buffer, &saved_bytes, indexf->DB);
   uncompress_worddata(buffer,sz_buffer,saved_bytes);
   DB_EndReadWords(sw, indexf->DB);
   return wordID;
}





/* General read DB routines - Common to all DB */

/* Reads the file offset table in the index file.
*/

/* Reads the header of an index file and stores its settings
** (wordchars, beginchars, etc.) in the INDEXDATAHEADER structure.
*/

// $$$ to be rewritten as function = smaller code (rasc)

#define parse_int_from_buffer(num,s) ((num) = UNPACKLONG2((s)))
#define parse_int2_from_buffer(num1,num2,s) \
do { \
	(num1) = UNPACKLONG2((s)); \
	(num2) = UNPACKLONG2((s) + sizeof(long)); \
} while (0)
#define parse_int3_from_buffer(num1,num2,num3,s) \
do { \
	(num1) = UNPACKLONG2((s)); \
	(num2) = UNPACKLONG2((s) + sizeof(long)); \
	(num3) = UNPACKLONG2((s) + 2 * sizeof(long)); \
} while (0)
#define parse_int4_from_buffer(num1,num2,num3,num4,s) \
do { \
	(num1) = UNPACKLONG2((s)); \
	(num2) = UNPACKLONG2((s) + sizeof(long)); \
	(num3) = UNPACKLONG2((s) + 2 * sizeof(long)); \
	(num4) = UNPACKLONG2((s) + 3 * sizeof(long)); \
} while (0)
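
/*
 * Sketch only, not part of Swish-e: one way the "$$$ to be rewritten as
 * function" note above might be addressed. unpack_long_be() and
 * parse_ints_from_buffer() are hypothetical names. The byte order
 * (most significant byte first, sizeof(unsigned long) bytes per value)
 * is an assumption made for illustration; the real UNPACKLONG2 may differ.
 */

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for UNPACKLONG2: read one unsigned long stored
   most significant byte first, using sizeof(unsigned long) bytes. */
static unsigned long unpack_long_be(const unsigned char *s)
{
    unsigned long value = 0;
    size_t i;

    for (i = 0; i < sizeof(unsigned long); i++)
        value = (value << 8) | s[i];
    return value;
}

/* Read `count` consecutive packed values from `s` into `out`. */
static void parse_ints_from_buffer(unsigned long *out, size_t count,
                                   const unsigned char *s)
{
    size_t i;

    assert(out != NULL && s != NULL);
    for (i = 0; i < count; i++)
        out[i] = unpack_long_be(s + i * sizeof(unsigned long));
}
```

/*
 * With a helper like this, the COUNTSHEADER_ID case could fill a
 * four-element array in a single call instead of using a multi-statement
 * macro.
 */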

void    read_header(SWISH *sw, INDEXDATAHEADER *header, void *DB)
{
    int     id,
            len;
    unsigned long    tmp, tmp1, tmp2, tmp3, tmp4;
    unsigned char   *buffer;

    DB_InitReadHeader(sw, DB);

    DB_ReadHeaderData(sw, &id,&buffer,&len,DB);

    while (id)
    {
        switch (id)
        {
        case INDEXHEADER_ID:
        case INDEXVERSION_ID:
        case MERGED_ID:
        case DOCPROPENHEADER_ID:
            break;
        case WORDCHARSHEADER_ID:
            header->wordchars = SafeStrCopy(header->wordchars, (char *)buffer, &header->lenwordchars);
            sortstring(header->wordchars);
            makelookuptable(header->wordchars, header->wordcharslookuptable);
            break;
        case BEGINCHARSHEADER_ID:
            header->beginchars = SafeStrCopy(header->beginchars, (char *)buffer, &header->lenbeginchars);
            sortstring(header->beginchars);
            makelookuptable(header->beginchars, header->begincharslookuptable);
            break;
        case ENDCHARSHEADER_ID:
            header->endchars = SafeStrCopy(header->endchars, (char *)buffer, &header->lenendchars);
            sortstring(header->endchars);
            makelookuptable(header->endchars, header->endcharslookuptable);
            break;
        case IGNOREFIRSTCHARHEADER_ID:
            header->ignorefirstchar = SafeStrCopy(header->ignorefirstchar, (char *)buffer, &header->lenignorefirstchar);
            sortstring(header->ignorefirstchar);
            makelookuptable(header->ignorefirstchar, header->ignorefirstcharlookuptable);
            break;
        case IGNORELASTCHARHEADER_ID:
            header->ignorelastchar = SafeStrCopy(header->ignorelastchar, (char *)buffer, &header->lenignorelastchar);
            sortstring(header->ignorelastchar);
            makelookuptable(header->ignorelastchar, header->ignorelastcharlookuptable);
            break;

        /* replaced by fuzzy_mode Aug 20, 2002     
        case STEMMINGHEADER_ID:
            parse_int_from_buffer(tmp,buffer);
            header-> = tmp;
            break;
        case SOUNDEXHEADER_ID:
            parse_int_from_buffer(tmp,buffer);
            header->applySoundexRules = tmp;
            break;
        */

        case FUZZYMODEHEADER_ID:
            parse_int_from_buffer(tmp,buffer);
            header->fuzzy_data = get_fuzzy_mode(header->fuzzy_data, tmp);
            break;
            
        case IGNORETOTALWORDCOUNTWHENRANKING_ID:
            parse_int_from_buffer(tmp,buffer);
            header->ignoreTotalWordCountWhenRanking = tmp;
            break;
        case MINWORDLIMHEADER_ID:
            parse_int_from_buffer(tmp,buffer);
            header->minwordlimit = tmp;
            break;
        case MAXWORDLIMHEADER_ID:
            parse_int_from_buffer(tmp,buffer);
            header->maxwordlimit = tmp;
            break;
        case SAVEDASHEADER_ID:
            header->savedasheader = SafeStrCopy(header->savedasheader, (char *)buffer, &header->lensavedasheader);
            break;
        case NAMEHEADER_ID:
            header->indexn = SafeStrCopy(header->indexn, (char *)buffer, &header->lenindexn);
            break;
        case DESCRIPTIONHEADER_ID:
            header->indexd = SafeStrCopy(header->indexd, (char *)buffer, &header->lenindexd);
            break;
        case POINTERHEADER_ID:
            header->indexp = SafeStrCopy(header->indexp, (char *)buffer, &header->lenindexp);
            break;
        case MAINTAINEDBYHEADER_ID:
            header->indexa = SafeStrCopy(header->indexa, (char *)buffer, &header->lenindexa);
            break;
        case INDEXEDONHEADER_ID:
            header->indexedon = SafeStrCopy(header->indexedon, (char *)buffer, &header->lenindexedon);
            break;
        case COUNTSHEADER_ID:
            parse_int4_from_buffer(tmp1,tmp2,tmp3,tmp4,buffer);
            header->totalwords = tmp1;
            header->totalfiles = tmp2;
            header->total_word_positions = tmp3;
            header->removedfiles = tmp4;
            break;

        case TOTALWORDS_REMOVED_ID:  /* Added here instead of above to keep index compatible */
            parse_int_from_buffer(tmp, buffer);
            header->removed_word_positions = tmp;
            break;

/* removed due to patent problems
        case FILEINFOCOMPRESSION_ID:
            ReadHeaderInt(itmp, fp);
            header->applyFileInfoCompression = itmp;
            break;
*/
        case TRANSLATECHARTABLE_ID:
            parse_integer_table_from_buffer(header->translatecharslookuptable, sizeof(header->translatecharslookuptable) / sizeof(int), (char *)buffer);
            break;

        case STOPWORDS_ID:
            load_word_hash_from_buffer(&header->hashstoplist, (char *)buffer);
            break;

        case METANAMES_ID:
            parse_MetaNames_from_buffer(header, (char *)buffer);
            break;

        case BUZZWORDS_ID:
            load_word_hash_from_buffer(&header->hashbuzzwordlist, (char *)buffer);
            break;

#ifndef USE_BTREE
        case TOTALWORDSPERFILE_ID:
            if ( !header->ignoreTotalWordCountWhenRanking )
            {
                header->TotalWordsPerFile = emalloc( header->totalfiles * sizeof(int) );
                parse_integer_table_from_buffer(header->TotalWordsPerFile, header->totalfiles, (char *)buffer);
            }
            break;
#endif

        default:
            progerr("Severe index error in header.  Unknown index header ID: %d", id );
            break;
        }
        efree(buffer);
        DB_ReadHeaderData(sw, &id,&buffer,&len,DB);
    }
    DB_EndReadHeader(sw, DB);
}

/* Reads the metaNames from the index
*/

void    parse_MetaNames_from_buffer(INDEXDATAHEADER *header, char *buffer)
{
    int     len;
    int     num_metanames;
    int     metaType,
            i,
            alias,
            sort_len,
            bias,
            metaID;
    char   *word;
    unsigned char   *s = (unsigned char *)buffer;
    struct metaEntry *m;


    /* First clear out the default metanames */
    freeMetaEntries( header );

    num_metanames = uncompress2(&s);

    for (i = 0; i < num_metanames; i++)
    {
        len = uncompress2(&s);
        word = emalloc(len +1);
        memcpy(word,s,len); s += len;
        word[len] = '\0';
        /* Read metaID */
        metaID = uncompress2(&s);
        /* metaType was saved as metaType+1 */
        metaType = uncompress2(&s);

        alias = uncompress2(&s) - 1;

        sort_len = uncompress2(&s);

        bias = uncompress2(&s) - RANK_BIAS_RANGE - 1;


        /* add the meta tag */
        if ( !(m = addNewMetaEntry(header, word, metaType, metaID)))
            progerr("failed to add new meta entry '%s:%d'", word, metaID );

        m->alias = alias;
        m->rank_bias = bias;
        m->sort_len = sort_len;

        efree(word);
    }
}


static void load_word_hash_from_buffer(WORD_HASH_TABLE *table_ptr, char *buffer)
{
    int     len;
    int        num_words;
    int     i;
    char   *word = NULL;

    unsigned char   *s = (unsigned char *)buffer;

    num_words = uncompress2(&s);
    
    for (i=0; i < num_words ; i++)   
    {
        len = uncompress2(&s);
        word = emalloc(len+1);
        memcpy(word,s,len); s += len;
        word[len] = '\0';

        add_word_to_hash_table( table_ptr, word , HASHSIZE);
        efree(word);
    }
}
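
/*
 * Sketch only, not part of Swish-e: the buffer layout consumed above is
 * "word count, then (length, bytes) for each word". The example below
 * decodes that layout in isolation. read_varint() is a hypothetical
 * stand-in for uncompress2(); its encoding (7 payload bits per byte, most
 * significant group first, high bit set on every byte except the last) is
 * an assumption for illustration. decode_word_list() is likewise a
 * hypothetical name and uses malloc() rather than emalloc().
 */

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for uncompress2(): decode one variable-length
   integer and advance the caller's read pointer past it. */
static int read_varint(const unsigned char **sp)
{
    const unsigned char *p = *sp;
    int num = 0;
    int c;

    do {
        c = *p++;
        num = (num << 7) | (c & 0x7f);
    } while (c & 0x80);

    *sp = p;
    return num;
}

/* Decode "count, then (length, bytes) per word" into malloc'd strings.
   Returns the number of words stored; the caller frees each one. */
static int decode_word_list(const unsigned char *s, char **words, int max_words)
{
    int num_words = read_varint(&s);
    int i;

    assert(words != NULL);
    for (i = 0; i < num_words && i < max_words; i++) {
        int len = read_varint(&s);

        words[i] = malloc((size_t)len + 1);
        if (words[i] == NULL)
            break;
        memcpy(words[i], s, (size_t)len);   /* words are not NUL-terminated in the buffer */
        words[i][len] = '\0';
        s += len;
    }
    return i;
}
```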





void parse_integer_table_from_buffer(int table[], int table_size, char *buffer)
{
    int     tmp,i;
    unsigned char    *s = (unsigned char *)buffer;

    tmp = uncompress2(&s);   /* Skip the number of elements */
    for (i = 0; i < table_size; i++)
    {
        tmp = uncompress2(&s); /* Get the next element (stored as value + 1) */
        table[i] = tmp - 1;
    }
}

/* Used by rank.c */

int getTotalWordsInFile( IndexFILE *indexf, int filenum )
{
    if ( filenum < 1 || filenum > indexf->header.totalfiles )
        progerr("getTotalWordsInFile passed an invalid file number");

    /* This is still one too many layers */
#ifdef USE_BTREE
    return DB_CheckFileNum( indexf->sw, filenum, indexf->DB );
#else
    if ( indexf->header.ignoreTotalWordCountWhenRanking )
        progerr("Can't return total words -- index was built with IgnoreTotalWordCountWhenRanking enabled");
    else
        return indexf->header.TotalWordsPerFile[filenum - 1];

#endif
    return 0;  /* make the compiler quiet */
}

/*------------------------------------------------------*/
/*---------- General entry point of DB module ----------*/


void   *DB_Open (SWISH *sw, char *dbname, int mode)
{
   return sw->Db->DB_Open(sw, dbname,mode);
}

void    DB_Close(SWISH *sw, void *DB)
{
   sw->Db->DB_Close(DB);
}


int     DB_InitReadHeader(SWISH *sw, void *DB)
{
   return sw->Db->DB_InitReadHeader(DB);
}

int     DB_ReadHeaderData(SWISH *sw, int *id, unsigned char **s, int *len, void *DB)
{
   return sw->Db->DB_ReadHeaderData(id, s, len, DB);
}

int     DB_EndReadHeader(SWISH *sw, void *DB)
{
   return sw->Db->DB_EndReadHeader(DB);
}



int     DB_InitReadWords(SWISH *sw, void *DB)
{
   return sw->Db->DB_InitReadWords(DB);
}

int     DB_ReadWordHash(SWISH *sw, char *word, sw_off_t *wordID, void *DB)
{
   return sw->Db->DB_ReadWordHash(word, wordID, DB);
}

int     DB_ReadFirstWordInvertedIndex(SWISH *sw, char *word, char **resultword, sw_off_t *wordID, void *DB)
{
   return sw->Db->DB_ReadFirstWordInvertedIndex(word, resultword, wordID, DB);
}

int     DB_ReadNextWordInvertedIndex(SWISH *sw, char *word, char **resultword, sw_off_t *wordID, void *DB)
{
   return sw->Db->DB_ReadNextWordInvertedIndex(word, resultword, wordID, DB);
}

long    DB_ReadWordData(SWISH *sw, sw_off_t wordID, unsigned char **worddata, int *data_size, int *saved_bytes, void *DB)
{
   return sw->Db->DB_ReadWordData(wordID, worddata, data_size, saved_bytes, DB);
}

int     DB_EndReadWords(SWISH *sw, void *DB)
{
   return sw->Db->DB_EndReadWords(DB);
}


int     DB_CheckFileNum(SWISH *sw, int filenum, void *DB)
{
   return sw->Db->DB_CheckFileNum(filenum, DB);
}

int     DB_ReadFileNum(SWISH *sw, unsigned char *filedata, void *DB)
{
   return sw->Db->DB_ReadFileNum( filedata, DB);
}

 
int     DB_InitReadSortedIndex(SWISH *sw, void *DB)
{
   return sw->Db->DB_InitReadSortedIndex(DB);
}

int     DB_ReadSortedIndex(SWISH *sw, int propID, unsigned char **data, int *sz_data,void *DB)
{
   return sw->Db->DB_ReadSortedIndex(propID, data, sz_data,DB);
}

/* ******* This is now a macro and accesses the native data by default
int     DB_ReadSortedData(SWISH *sw, int *data,int index, int *value, void *DB)
{
   return sw->Db->DB_ReadSortedData(data,index,value,DB);
}
*******/

int     DB_EndReadSortedIndex(SWISH *sw, void *DB)
{
   return sw->Db->DB_EndReadSortedIndex(DB);
}


void    DB_ReadPropPositions(SWISH *sw, IndexFILE *indexf, FileRec *fi, void *db)
{
    sw->Db->DB_ReadPropPositions( indexf, fi, db);
}


char *DB_ReadProperty(SWISH *sw, IndexFILE *indexf, FileRec *fi, int propID, int *buf_len, int *uncompressed_len, void *db)
{
    return sw->Db->DB_ReadProperty( indexf, fi, propID, buf_len, uncompressed_len, db );
}



#ifdef USE_BTREE

int       DB_ReadTotalWordsPerFile(SWISH *sw, int index, int *value, void *DB)
{
    return sw->Db->DB_ReadTotalWordsPerFile(sw, index, value, DB);
}

#endif



/* 11/00 Function to read all words starting with a character */
char   *getfilewords(SWISH * sw, int c, IndexFILE * indexf)
{
    int     i,
            j;
    int     wordlen;
    char   *buffer, *resultword;
    int     bufferpos,
            bufferlen;
    unsigned char    word[2];
    sw_off_t    wordID;

    

    if (!c)
        return "";
    /* Check if already read */
    j = (int) ((unsigned char) c);
    if (indexf->keywords[j])
        return (indexf->keywords[j]);

    DB_InitReadWords(sw, indexf->DB);

    word[0]=(unsigned char)c;
    word[1]='\0';

    DB_ReadFirstWordInvertedIndex(sw, (char *)word, &resultword, &wordID, indexf->DB);
    i = (int) ((unsigned char) c);
    if (!wordID)
    {
        DB_EndReadWords(sw, indexf->DB);
        sw->lasterror = WORD_NOT_FOUND;
        return "";
    }

    wordlen = strlen(resultword);    
    bufferlen = wordlen + MAXSTRLEN * 10;
    bufferpos = 0;
    buffer = emalloc(bufferlen + 1);
    buffer[0] = '\0';


    memcpy(buffer, resultword, wordlen);
    efree(resultword);
    if (c != (int)((unsigned char) buffer[bufferpos]))
    {
        buffer[bufferpos] = '\0';
        indexf->keywords[j] = buffer;
        return (indexf->keywords[j]);
    }

    buffer[bufferpos + wordlen] = '\0';
    bufferpos += wordlen + 1;

    /* Look for occurrences */
    DB_ReadNextWordInvertedIndex(sw, (char *)word, &resultword, &wordID, indexf->DB);
    while (wordID)
    {
        wordlen = strlen(resultword);
        if ((bufferpos + wordlen + 1 + 1) > bufferlen)
        {
            bufferlen += MAXSTRLEN + wordlen + 1 + 1;
            buffer = (char *) erealloc(buffer, bufferlen + 1);
        }
        memcpy(buffer + bufferpos, resultword, wordlen);
        efree(resultword);
        if (c != (int)((unsigned char)buffer[bufferpos]))
        {
            buffer[bufferpos] = '\0';
            break;
        }
        
        buffer[bufferpos + wordlen] = '\0';
        bufferpos += wordlen + 1;
        DB_ReadNextWordInvertedIndex(sw, (char *)word, &resultword, &wordID, indexf->DB);
    }
    buffer[bufferpos] = '\0';
    indexf->keywords[j] = buffer;
    return (indexf->keywords[j]);
}
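
/*
 * Sketch only, not part of Swish-e: getfilewords() above returns the words
 * packed back to back, each terminated by '\0', with an empty string
 * marking the end of the list. The example below walks a buffer with that
 * layout. count_packed_words() is a hypothetical helper name.
 */

```c
#include <assert.h>
#include <string.h>

/* Count the words in a buffer laid out as "word\0word\0...\0\0". */
static int count_packed_words(const char *buffer)
{
    const char *p = buffer;
    int count = 0;

    assert(buffer != NULL);
    while (*p != '\0') {
        count++;
        p += strlen(p) + 1;   /* skip the word and its terminator */
    }
    return count;
}
```

/*
 * The same loop shape can visit each word instead of counting: on each
 * pass, p points at the start of the current word.
 */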


swish-e-2.4.7/src/win32/dirent.h
/* 
 * @(#) dirent.h 2.0 17 Jun 91   Public Domain. 
 * 
 *  A public domain implementation of BSD directory routines for 
 *  MS-DOS.  Written by Michael Rendell ({uunet,utai}michael@garfield), 
 *  August 1987 
 * 
 *  Enhanced and ported to OS/2 by Kai Uwe Rommel; added scandir() prototype 
 *  December 1989, February 1990 
 *  Change of MAXPATHLEN for HPFS, October 1990 
 *   
 *  Unenhanced and ported to Windows NT by Bill Gallagher 
 *  17 Jun 91 
 *  changed d_name to char * instead of array, removed non-std extensions 
 *  
 *  Cleanup, other hackery, Summer '92, Brian Moran , brianmo@microsoft.com 
 */ 

#ifndef _DIRENT
#define _DIRENT

#include <direct.h>
#include <sys/types.h>
struct dirent 
{
    ino_t    d_ino;                   /* a bit of a farce */ 
    short    d_reclen;                /* more farce */ 
    short    d_namlen;                /* length of d_name */ 
    char    *d_name;
}; 
 
struct _dircontents 
{ 
    char *_d_entry; 
    struct _dircontents *_d_next; 
}; 
 
typedef struct _dirdesc 
{ 
    int  dd_id;			   /* uniquely identify each open directory*/ 
    long dd_loc;			/* where we are in directory entry */ 
    struct _dircontents *dd_contents;	/* pointer to contents of dir */ 
    struct _dircontents *dd_cp;		/* pointer to current position */ 
} 
DIR; 

extern DIR *opendir(char *); 
extern struct dirent *readdir(DIR *); 
extern void seekdir(DIR *, long); 
extern long telldir(DIR *); 
extern void closedir(DIR *); 
#define rewinddir(dirp) seekdir(dirp, 0L) 

#endif /* _DIRENT */

/* end of dirent.h */ 
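
/*
 * Sketch only, not part of Swish-e: typical use of the opendir(),
 * readdir() and closedir() routines declared above. count_dir_entries()
 * is a hypothetical helper name. The sketch includes the system
 * <dirent.h> so it can be built on a POSIX host; on Win32 the header
 * above would be included instead.
 */

```c
#include <assert.h>
#include <dirent.h>
#include <stddef.h>

/* Count the entries in a directory; returns -1 if it cannot be opened. */
static int count_dir_entries(char *path)
{
    DIR *dirp;
    struct dirent *entry;
    int count = 0;

    assert(path != NULL);
    dirp = opendir(path);
    if (dirp == NULL)
        return -1;

    /* readdir() returns NULL once every entry has been visited */
    while ((entry = readdir(dirp)) != NULL)
        count++;

    closedir(dirp);
    return count;
}
```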
swish-e-2.4.7/src/win32/libswishe.dsp
# Microsoft Developer Studio Project File - Name="libswishe" - Package Owner=<4>
# Microsoft Developer Studio Generated Build File, Format Version 6.00
# ** DO NOT EDIT **

# TARGTYPE "Win32 (x86) Static Library" 0x0104

CFG=libswishe - Win32 Debug
!MESSAGE This is not a valid makefile. To build this project using NMAKE,
!MESSAGE use the Export Makefile command and run
!MESSAGE 
!MESSAGE NMAKE /f "libswishe.mak".
!MESSAGE 
!MESSAGE You can specify a configuration when running NMAKE
!MESSAGE by defining the macro CFG on the command line. For example:
!MESSAGE 
!MESSAGE NMAKE /f "libswishe.mak" CFG="libswishe - Win32 Debug"
!MESSAGE 
!MESSAGE Possible choices for configuration are:
!MESSAGE 
!MESSAGE "libswishe - Win32 Release" (based on "Win32 (x86) Static Library")
!MESSAGE "libswishe - Win32 Debug" (based on "Win32 (x86) Static Library")
!MESSAGE 

# Begin Project
# PROP AllowPerConfigDependencies 0
# PROP Scc_ProjName ""
# PROP Scc_LocalPath ""
CPP=cl.exe
RSC=rc.exe

!IF  "$(CFG)" == "libswishe - Win32 Release"

# PROP BASE Use_MFC 0
# PROP BASE Use_Debug_Libraries 0
# PROP BASE Output_Dir "Release"
# PROP BASE Intermediate_Dir "Release"
# PROP BASE Target_Dir ""
# PROP Use_MFC 0
# PROP Use_Debug_Libraries 0
# PROP Output_Dir "tmp/libswishe_Release"
# PROP Intermediate_Dir "tmp/libswishe_Release"
# PROP Target_Dir ""
# ADD BASE CPP /nologo /W3 /GX /O2 /D "WIN32" /D "NDEBUG" /D "_MBCS" /D "_LIB" /YX /FD /c
# ADD CPP /nologo /MD /W3 /GX /O2 /I "." /I "../../../pcre/include" /I "../../../zlib/include" /I "..\replace" /D "HAVE_PCRE" /D "HAVE_CONFIG_H" /D "WIN32" /D "NDEBUG" /D "_MBCS" /D "_LIB" /D "HAVE_ZLIB" /D "HAVE_LIBXML2" /YX /FD /c
# ADD BASE RSC /l 0x409 /d "NDEBUG"
# ADD RSC /l 0x409 /d "NDEBUG"
BSC32=bscmake.exe
# ADD BASE BSC32 /nologo
# ADD BSC32 /nologo
LIB32=link.exe -lib
# ADD BASE LIB32 /nologo
# ADD LIB32 /nologo /out:"libswish-e.lib"

!ELSEIF  "$(CFG)" == "libswishe - Win32 Debug"

# PROP BASE Use_MFC 0
# PROP BASE Use_Debug_Libraries 1
# PROP BASE Output_Dir "Debug"
# PROP BASE Intermediate_Dir "Debug"
# PROP BASE Target_Dir ""
# PROP Use_MFC 0
# PROP Use_Debug_Libraries 1
# PROP Output_Dir "tmp/libswishe_Debug"
# PROP Intermediate_Dir "tmp/libswishe_Debug"
# PROP Target_Dir ""
# ADD BASE CPP /nologo /W3 /Gm /GX /ZI /Od /D "WIN32" /D "_DEBUG" /D "_MBCS" /D "_LIB" /YX /FD /GZ /c
# ADD CPP /nologo /W3 /Gm /GX /ZI /Od /I "." /I "../../../pcre/include" /I "../../../zlib/include" /I "..\replace" /D "HAVE_PCRE" /D "HAVE_CONFIG_H" /D "WIN32" /D "_DEBUG" /D "_MBCS" /D "_LIB" /D "HAVE_ZLIB" /D "HAVE_LIBXML2" /YX /FD /GZ /c
# ADD BASE RSC /l 0x409 /d "_DEBUG"
# ADD RSC /l 0x409 /d "_DEBUG"
BSC32=bscmake.exe
# ADD BASE BSC32 /nologo
# ADD BSC32 /nologo
LIB32=link.exe -lib
# ADD BASE LIB32 /nologo
# ADD LIB32 /nologo /out:"tmp/libswishe_Debug\libswish-e.lib"

!ENDIF 

# Begin Target

# Name "libswishe - Win32 Release"
# Name "libswishe - Win32 Debug"
# Begin Group "Source Files"

# PROP Default_Filter "cpp;c;cxx;rc;def;r;odl;idl;hpj;bat"
# Begin Source File

SOURCE=..\snowball\api.c
# End Source File
# Begin Source File

SOURCE=..\check.c
# End Source File
# Begin Source File

SOURCE=..\compress.c
# End Source File
# Begin Source File

SOURCE=..\date_time.c
# End Source File
# Begin Source File

SOURCE=..\db_native.c
# End Source File
# Begin Source File

SOURCE=..\db_read.c
# End Source File
# Begin Source File

SOURCE=..\docprop.c
# End Source File
# Begin Source File

SOURCE=..\double_metaphone.c
# End Source File
# Begin Source File

SOURCE=..\error.c
# End Source File
# Begin Source File

SOURCE=..\hash.c
# End Source File
# Begin Source File

SOURCE=..\headers.c
# End Source File
# Begin Source File

SOURCE=..\list.c
# End Source File
# Begin Source File

SOURCE=..\mem.c
# End Source File
# Begin Source File

SOURCE=..\metanames.c
# End Source File
# Begin Source File

SOURCE=..\proplimit.c
# End Source File
# Begin Source File

SOURCE=..\ramdisk.c
# End Source File
# Begin Source File

SOURCE=..\rank.c
# End Source File
# Begin Source File

SOURCE=..\result_sort.c
# End Source File
# Begin Source File

SOURCE=..\search.c
# End Source File
# Begin Source File

SOURCE=..\soundex.c
# End Source File
# Begin Source File

SOURCE=..\snowball\stem_de.c
# End Source File
# Begin Source File

SOURCE=..\snowball\stem_dk.c
# End Source File
# Begin Source File

SOURCE=..\snowball\stem_en1.c
# End Source File
# Begin Source File

SOURCE=..\snowball\stem_en2.c
# End Source File
# Begin Source File

SOURCE=..\snowball\stem_es.c
# End Source File
# Begin Source File

SOURCE=..\snowball\stem_fi.c
# End Source File
# Begin Source File

SOURCE=..\snowball\stem_fr.c
# End Source File
# Begin Source File

SOURCE=..\snowball\stem_it.c
# End Source File
# Begin Source File

SOURCE=..\snowball\stem_nl.c
# End Source File
# Begin Source File

SOURCE=..\snowball\stem_no.c
# End Source File
# Begin Source File

SOURCE=..\snowball\stem_pt.c
# End Source File
# Begin Source File

SOURCE=..\snowball\stem_ru.c
# End Source File
# Begin Source File

SOURCE=..\snowball\stem_se.c
# End Source File
# Begin Source File

SOURCE=..\stemmer.c
# End Source File
# Begin Source File

SOURCE=..\swish2.c
# End Source File
# Begin Source File

SOURCE=..\swish_qsort.c
# End Source File
# Begin Source File

SOURCE=..\swish_words.c
# End Source File
# Begin Source File

SOURCE=..\swstring.c
# End Source File
# Begin Source File

SOURCE=..\snowball\utilities.c
# End Source File
# End Group
# Begin Group "Header Files"

# PROP Default_Filter "h;hpp;hxx;hm;inl"
# End Group
# End Target
# End Project
swish-e-2.4.7/src/win32/dirent.c
/* 
   dir.c for MS-DOS by Samuel Lam <skl@van-bc.UUCP>, June/87 
*/ 
 
/* #ifdef WIN32 */
/* 
 * @(#)dir.c 1.4 87/11/06 Public Domain. 
 * 
 *  A public domain implementation of BSD directory routines for 
 *  MS-DOS.  Written by Michael Rendell ({uunet,utai}michael@garfield), 
 *  August 1987 
 *  Ported to OS/2 by Kai Uwe Rommel 
 *  December 1989, February 1990 
 *  Ported to Windows NT 22 May 91 
 *    other mods Summer '92 brianmo@microsoft.com 
 *  opendirx() was horribly written, very inefficient, and did not take care
 *    of all cases.  It is still not too clean, but it is far more efficient.
 *    Changes made by Gordon Chaffee (chaffee@bugs-bunny.cs.berkeley.edu)
 */ 
 
 
/*Includes: 
 *	crt 
 */ 
#include <windows.h>
#include <stdlib.h> 
#include <string.h> 
#include <sys\types.h> 
#include <sys\stat.h> 
#define _SWISH_PORT
#include "dirent.h" 
#include "acconfig.h"

/* 
 *	NT specific 
 */ 
#include <stdio.h> 
 
/* 
 *	random typedefs 
 */ 
#define HDIR        HANDLE 
#define HFILE       HANDLE 
#define PHFILE      PHANDLE 
 
/* 
 *	local functions 
 */ 
static char *getdirent(char *); 
static void free_dircontents(struct _dircontents *); 
 
static HDIR				FindHandle; 
static WIN32_FIND_DATA	FileFindData; 
 
static struct dirent dp; 
 
DIR *opendirx(char *name, char *pattern) 
{ 
    struct stat statb; 
    DIR *dirp; 
    char c; 
    char *s; 
    struct _dircontents *dp; 
    int len;
    int unc;
    char path[ SW_MAXPATHNAME ]; 
    register char *ip, *op;
	
    for (ip = name, op = path; ; op++, ip++) {
		*op = *ip;
		if (*ip == '\0') {
			break;
		}
    }

    len = ip - name;
    if (len > 0) {
		/* Windows NT required a trailing '/' at some point.  Now it MUST NOT have one.  */
//		unc = ((path[0] == '\\' || path[0] == '/') &&
//			(path[1] == '\\' || path[1] == '/'));
		unc = 0;
		c = path[len - 1];
		if (unc) {
			if (c != '\\' && c != '/') {
				path[len] = '/';
				len++;
				path[len] ='\0';
			}
		} else {
			if ((c == '\\' || c == '/') && (len > 1)) {
				len--;
				path[len] = '\0';
				
				if (path[len - 1] == ':' ) {
					path[len] = '/'; len++;
					path[len] = '.'; len++;
					path[len] = '\0';
				}
			} else if (c == ':' ) {
				path[len] = '.';
				len++;
				path[len] ='\0';
			}
		}
    } else {
		unc = 0;
		path[0] = '.';
		path[1] = '\0';
		len = 1;
    }
	
    if (stat(path, &statb) < 0 || (statb.st_mode & S_IFMT) != S_IFDIR) {
		return NULL; 
    }
	
    dirp = malloc(sizeof(DIR));
    if (dirp == NULL) {
		return dirp;
    }
	
    c = path[len - 1];
    if (c == '.' ) {
		if (len == 1) {
			len--;
		} else {
			c = path[len - 2];
			if (c == '\\' || c == ':') {
				len--;
			} else {
				path[len] = '/';
				len++;
			}
		}
    } else if (!unc && ((len != 1) || (c != '\\' && c != '/'))) {
		path[len] = '/';
		len++;
    }
    strcpy(path + len, pattern);
	
    dirp -> dd_loc = 0; 
    dirp -> dd_contents = dirp -> dd_cp = NULL; 
	
    if ((s = getdirent(path)) == NULL) {
		return dirp;
    }
	
    do 
    { 
		if (((dp = malloc(sizeof(struct _dircontents))) == NULL) || 
			((dp -> _d_entry = malloc(strlen(s) + 1)) == NULL)      ) 
		{ 
			if (dp) 
				free(dp); 
			free_dircontents(dirp -> dd_contents); 
			free(dirp);	/* do not leak the DIR itself on allocation failure */
			
			return NULL; 
		} 
		
		if (dirp -> dd_contents) 
			dirp -> dd_cp = dirp -> dd_cp -> _d_next = dp; 
		else 
			dirp -> dd_contents = dirp -> dd_cp = dp; 
		
		strcpy(dp -> _d_entry, s); 
		dp -> _d_next = NULL; 
		
    } 
    while ((s = getdirent(NULL)) != NULL); 
	
    dirp -> dd_cp = dirp -> dd_contents; 
    return dirp; 
} 
 
DIR *opendir(char *name)
{
	return opendirx(name, "*");
} 
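
/*
 * Illustrative sketch, not part of the original port: the trailing-separator
 * handling that opendirx() applies before it calls stat(), pulled out into a
 * standalone helper so the individual cases are easier to follow.  The name
 * normalize_dir_path() and its signature are assumptions made for this
 * example only.  The UNC branch is left out because opendirx() forces
 * unc = 0.
 */
```c
#include <stddef.h>
#include <string.h>

/* Copies name into out (capacity cap) and normalizes it the way the
 * non-UNC branch of opendirx() does.  Returns the new length, or -1 if
 * out is too small.  Hypothetical helper, for illustration only. */
int normalize_dir_path(const char *name, char *out, size_t cap)
{
    size_t len = strlen(name);
    char c;

    /* Leave room for up to two appended characters plus the terminator. */
    if (len + 3 > cap)
        return -1;
    memcpy(out, name, len + 1);

    if (len == 0) {                      /* empty name means current dir */
        strcpy(out, ".");
        return 1;
    }

    c = out[len - 1];
    if ((c == '\\' || c == '/') && len > 1) {
        out[--len] = '\0';               /* drop one trailing separator */
        if (out[len - 1] == ':') {       /* "C:/" becomes "C:/." */
            out[len++] = '/';
            out[len++] = '.';
            out[len] = '\0';
        }
    } else if (c == ':') {               /* "C:" becomes "C:." */
        out[len++] = '.';
        out[len] = '\0';
    }
    return (int)len;
}
```
/*
 * Unlike opendirx(), which copies into a fixed SW_MAXPATHNAME buffer without
 * checking the length, this sketch takes the buffer capacity explicitly and
 * refuses input that would not fit.
 */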

void closedir(DIR * dirp) 
{ 
	free_dircontents(dirp -> dd_contents); 
	free(dirp); 
} 
 
struct dirent *readdir(DIR * dirp) 
{ 
	/* static struct dirent dp; */ 
	if (dirp -> dd_cp == NULL) 
		return NULL; 
	
	/*strcpy(dp.d_name,dirp->dd_cp->_d_entry); */ 
	
	dp.d_name = dirp->dd_cp->_d_entry; 
	
	dp.d_namlen = dp.d_reclen = 
		strlen(dp.d_name); 
	
	dp.d_ino = dirp->dd_loc+1; /* fake the inode */ 
	
	dirp -> dd_cp = dirp -> dd_cp -> _d_next; 
	dirp -> dd_loc++; 
	
	
	return &dp; 
} 

void seekdir(DIR * dirp, long off) 
{ 
	long i = off; 
	struct _dircontents *dp; 
	
	if (off >= 0) 
	{ 
		for (dp = dirp -> dd_contents; --i >= 0 && dp; dp = dp -> _d_next); 
		
		dirp -> dd_loc = off - (i + 1); 
		dirp -> dd_cp = dp; 
	} 
} 
 
 
long telldir(DIR * dirp) 
{ 
	return dirp -> dd_loc; 
} 
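
/*
 * Illustrative sketch, not part of the original port: a portable model of
 * the list walk in seekdir() above.  It may help show why dd_loc ends up
 * clamped to the number of cached entries when the requested offset runs
 * past the end of the list.  The struct and function names below are
 * assumptions made for this example; they stand in for struct _dircontents
 * and the dd_loc / dd_cp fields of DIR.
 */
```c
#include <stddef.h>

/* Stand-in for struct _dircontents: one node per cached entry. */
struct entry {
    const char *name;
    struct entry *next;
};

/* Stand-in for the two DIR fields that seekdir() updates. */
struct cursor {
    long loc;              /* like dd_loc */
    struct entry *cp;      /* like dd_cp  */
};

/* Same walk as seekdir(): step off nodes from head, stopping early at
 * the end of the list.  Negative offsets are ignored. */
void model_seek(struct cursor *cur, struct entry *head, long off)
{
    long i = off;
    struct entry *e;

    if (off < 0)
        return;
    for (e = head; --i >= 0 && e; e = e->next)
        ;
    cur->loc = off - (i + 1);   /* number of nodes actually skipped */
    cur->cp = e;                /* NULL once the end is reached */
}
```
/*
 * With a three-entry list, seeking to offset 2 leaves the cursor on the
 * third entry with loc == 2, while seeking to offset 10 leaves cp == NULL
 * and loc == 3, so a following readdir() would report end of directory.
 */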

static void free_dircontents(struct _dircontents * dp) 
{ 
	struct _dircontents *odp; 
	
	while (dp) 
	{ 
		if (dp -> _d_entry) 
			free(dp -> _d_entry); 
		
		dp = (odp = dp) -> _d_next; 
		free(odp); 
	} 
} 
/* end of "free_dircontents" */ 

static char *getdirent(char *dir) 
{ 
    int got_dirent; 
	
    if (dir != NULL) 
    {				       /* get first entry */ 
		if ((FindHandle = FindFirstFile( dir, &FileFindData )) 
			== INVALID_HANDLE_VALUE) 
		{ 
			return NULL; 
		} 
		got_dirent = 1;
    } 
    else				       /* get next entry */ 
		got_dirent = FindNextFile( FindHandle, &FileFindData ); 
	
    if (got_dirent) 
		return FileFindData.cFileName; 
    else 
    { 
		FindClose(FindHandle); 
		return NULL; 
    } 
} 
/* end of getdirent() */ 

struct passwd * _cdecl
getpwnam(char *name)
{
    return NULL;
}

struct passwd * _cdecl
getpwuid(int uid)
{
    return NULL;
}

int
getuid()
{
    return 0;
}

void _cdecl
endpwent(void)
{
}

/* #endif */
swish-e-2.4.7/src/win32/swishe.dsw
Microsoft Developer Studio Workspace File, Format Version 6.00
# WARNING: DO NOT EDIT OR DELETE THIS WORKSPACE FILE!

###############################################################################

Project: "API"=".\API.dsp" - Package Owner=<4>

Package=<5>
{{{
}}}

Package=<4>
{{{
}}}

###############################################################################

Project: "dllswish"=".\dllswishe.dsp" - Package Owner=<4>

Package=<5>
{{{
}}}

Package=<4>
{{{
    Begin Project Dependency
    Project_Dep_Name libswishe
    End Project Dependency
}}}

###############################################################################

Project: "libswishe"=".\libswishe.dsp" - Package Owner=<4>

Package=<5>
{{{
}}}

Package=<4>
{{{
}}}

###############################################################################

Project: "libswishindex"=".\libswishindex.dsp" - Package Owner=<4>

Package=<5>
{{{
}}}

Package=<4>
{{{
}}}

###############################################################################

Project: "swishe"=".\swishe.dsp" - Package Owner=<4>

Package=<5>
{{{
}}}

Package=<4>
{{{
    Begin Project Dependency
    Project_Dep_Name libswishindex
    End Project Dependency
    Begin Project Dependency
    Project_Dep_Name dllswish
    End Project Dependency
}}}

###############################################################################

Global:

Package=<5>
{{{
}}}

Package=<3>
{{{
}}}

###############################################################################

swish-e-2.4.7/src/win32/release.nsi
Name "SWISH-E"
OutFile "swishsetup.exe"

!define VERSION "2.4.3"

; Some default compiler settings (uncomment and change at will):
SetCompress auto ; (can be off or force)
SetDatablockOptimize on ; (can be off)
CRCCheck on ; (can be off)
AutoCloseWindow false ; (can be true to have the window close automatically at the end)
ShowInstDetails hide ; (can be show to have them shown, or nevershow to disable)
SetDateSave on ; (can be on to have files restored to their original date)

LicenseText "You may redistribute SWISH-E under the following terms:"
LicenseData "../../COPYING.txt"

InstallDir "C:\SWISH-E"
InstallDirRegKey HKEY_LOCAL_MACHINE "SOFTWARE\SWISH-E Team\SWISH-E\${VERSION}" ""
;DirShow show ; (make this hide to not let the user change it)
DirText "Select location where to install SWISH-E:"

ComponentText "Which components do you require?"

# defines SF_*, SECTION_OFF and some macros
!include Sections.nsh

Page license
Page directory
# Page custom InstallActivePerl ": ActivePerl Detection"
Page components
Page instfiles


;--------------------------------
; Installer Sections

; Basic Sections
!include filebase.nsh

swish-e-2.4.7/src/win32/build-perl.bat
perl Makefile.pl LIBS="../src/win32/libswish-e.lib ../../zlib/lib/zlib.lib msvcrt.lib" OPTIMIZE="-MD -Zi -DNDEBUG -O1 -I../src" SWISHIGNOREVER=1  SWISHSKIPTEST=1
swish-e-2.4.7/src/win32/build.sh
#!/bin/sh
# This script documents how to build SWISH-E for Win32 under Linux

# To jumpstart your development here are pcre, libxml2, and zlib:
# http://www.webaugur.com/wares/files/swish-e/builddir.zip

# You also need the following from Debian (unstable?):
# apt-get install mingw32 mingw32-binutils mingw32-runtime

# Host Arch
HA=i586-mingw32msvc

# Build System Arch
BA=i686-linux

# Remove the cache for our configure script else we will have problems.
rm -f config.cross.cache

# Take note of the host, target and build options.  If you're building
# on another OS you will want to change these.
#   The libxml2, zlib, and pcre options each point at that library's build directory.
./configure --prefix=${PWD}/../prefix \
        --cache-file=config.cross.cache \
	--disable-docs \
        --host=${HA} \
        --target=${HA} \
        --build=${BA} \
        --with-libxml2=$PWD/../libxml2 \
        --with-zlib=$PWD/../zlib \
        --with-pcre=$PWD/../pcre \
        --enable-shared

# Build Binaries
make

# Build SWISH::API
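# pushd/popd are bash builtins and are not available in a plain /bin/sh,
# so run the Perl build in a subshell instead.
(cd perl && make -f Makefile.mingw)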

swish-e-2.4.7/src/win32/dist.sh
#!/bin/sh

# You need a native Linux version of makensis.  This is not easy to come by.
# Here you'll find a binary tarball and Debian package:
#     http://webaugur.com/wares/files/makensis/

# You'll also need the following from Debian: 
# apt-get install sysutils



# Convert our documentation and scripts to DOS format
find . -type f -regex ".*/\(README\|COPYING\)" -exec mv {} {}.txt \;
find . -type f -regex ".*/\(.*\(\.\)\(txt\|html\|pm\|pl\|html\|tt\|tmpl\|cgi\)\|swishspider\)" -exec unix2dos {} \; 2>&1

# Build the installer executable, assumes makensis is in PATH
makensis src/win32/release.nsi

swish-e-2.4.7/src/win32/filebase.nsh
Section "Required Components" SecProgram
    SectionIn 1 RO
    
    ; Make sure there are no old versions hanging around
    Delete "$INSTDIR\swish-e.exe"
    Delete "$INSTDIR\bin\libxml2-2.dll"
    Delete "$INSTDIR\bin\zlib1.dll"
    Delete "$INSTDIR\bin\pcre*.dll"
    Delete "$INSTDIR\bin\libswish-e*.dll"

    ; System Files
    SetOutPath "$SYSDIR"
    
    ; LibXML2, PCRE, and ZLib
    File "../../../libxml2/bin/libxml2-2.dll"
    File "../../../zlib/bin/zlib1.dll"
    File "../../../pcre/bin/pcre.dll"
    File "../../../pcre/bin/pcreposix.dll"
    File "../.libs/libswish-e-2.dll"

    ; install swish-e to the bin folder
    SetOutPath "$INSTDIR\bin\"
    File "../.libs/swish-e.exe"
    
    ; Local Files
    SetOutPath "$INSTDIR\"
    File "../../COPYING.txt"
    File "fixperl.pl"

    ; Misc Documents
    SetOutPath "$INSTDIR\share\doc\swish-e"
    File "../../INSTALL"
    File "../../README.txt"
    File "../../README.cvs"
    
    ; Local Helper Files
    SetOutPath "$INSTDIR\lib\swish-e\"
    File "../swishspider"
    File "../../prog-bin/spider.pl.in"
    


    ; Create shortcuts on the Start Menu
    SetOutPath "$SMPROGRAMS\SWISH-E\"
    CreateShortcut "$SMPROGRAMS\\SWISH-E\\Browse Files.lnk" "$INSTDIR\\"
    WriteINIStr "$SMPROGRAMS\\SWISH-E\\Website.url" "InternetShortcut" "URL" "http://swish-e.org/"
    WriteINIStr "$SMPROGRAMS\\SWISH-E\\Documentation.url" "InternetShortcut" "URL" "http://swish-e.org/docs/"
    CreateShortcut "$SMPROGRAMS\\SWISH-E\\License.lnk" "$INSTDIR\\COPYING.txt"
    SetOutPath "$SMPROGRAMS\SWISH-E\PERL_Resources\"
    WriteINIStr "$SMPROGRAMS\\SWISH-E\\PERL_Resources\\Install_ActivePerl.url" "InternetShortcut" "URL" "http://www.activestate.com/Products/Download/Download.plex?id=ActivePerl"
    WriteINIStr "$SMPROGRAMS\\SWISH-E\\PERL_Resources\\CPAN_PERL_Modules.url" "InternetShortcut" "URL" "http://search.cpan.org/"
SectionEnd ; end of default section

; User and Developer Documentation
Section "Documentation" SecDoc
;    SetOutPath "$INSTDIR\share\doc\swish-e\html"
;    File "../../html/*.html"
;    File "../../html/*.css"
    SetOutPath "$INSTDIR\share\doc\swish-e\pod"
    File "../../pod/*.pod"
SectionEnd

; the static libraries are 20 MB.
;Section "Developer Files" SecDev
;    SetOutPath "$INSTDIR\lib\"
;    File "../.libs/libswish-e.a"
;    File "../.libs/libswish-e.dll.a"
;    File "../.libs/libswishindex.a"
;
;    SetOutPath "$INSTDIR\include\"
;    File "../swish-e.h"
;SectionEnd

SubSection "Document Filters" SubSecFilters
    Section "MS Office Filters" SecDocFilter
        WriteRegStr HKEY_LOCAL_MACHINE "SOFTWARE\SWISH-E Team\SWISH-E\${VERSION}\Options" "catdoc" "1"
        
        SetOutPath "$INSTDIR\bin\"
        File "../../../catdoc/catdoc.exe"
        File "../../../catdoc/catppt.exe"
        File "../../../catdoc/xls2csv.exe"
        
        SetOutPath "$INSTDIR\share\doc\catdoc"
    	File "../../../catdoc/COPYING"
    	
    	SetOutPath "$INSTDIR\charsets\"
        File "../../../catdoc/charsets/*"
    SectionEnd
    Section "PDF Filter" SecPDFFilter
        WriteRegStr HKEY_LOCAL_MACHINE "SOFTWARE\SWISH-E Team\SWISH-E\${VERSION}\Options" "xpdf" "1"
        
        SetOutPath "$INSTDIR\bin\"
        File "../../../xpdf/pdfinfo.exe"
        File "../../../xpdf/pdftotext.exe"
        
        SetOutPath "$INSTDIR\share\doc\xpdf\"
        File "../../../xpdf/COPYING"
        File "../../../xpdf/README"
        File "../../../xpdf/pdfinfo.txt"
        File "../../../xpdf/pdftotext.txt"
        File "../../../xpdf/sample-xpdfrc"
        File "../../../xpdf/xpdfrc.txt"
    SectionEnd
SubSectionEnd        
    
SubSection "PERL Support" SubSecPerlSupport
    Section /o "PERL API" SecPerlApi
        WriteRegStr HKEY_LOCAL_MACHINE "SOFTWARE\SWISH-E Team\SWISH-E\${VERSION}\Options" "Perl" "1"
        WriteRegStr HKEY_LOCAL_MACHINE "SOFTWARE\SWISH-E Team\SWISH-E\${VERSION}\Options" "PerlApi" "1"
        
        ; SWISH::API Scripts
        SetOutPath "$INSTDIR\lib\swish-e\perl\SWISH\"
        File "../../perl/API.pm"
        
        ; SWISH::API Binaries go into $PERL/lib/auto/SWISH/API
        Call ActivePerlLocation
        Pop $R1
        SetOutPath "$R1\site\lib\auto\SWISH\API\"
        File "../../perl/blib/arch/auto/SWISH/API/*"
    SectionEnd

    Section /o "Perl CGI Scripts" SecPerlCgi
        WriteRegStr HKEY_LOCAL_MACHINE "SOFTWARE\SWISH-E Team\SWISH-E\${VERSION}\Options" "PerlCgi" "1"

        ; swish.cgi scripts
        SetOutPath "$INSTDIR\lib\swish-e\"
        File "../../example/swish.cgi.in"
        File "../../example/search.cgi.in"

        ; swish.cgi other stuff
        SetOutPath "$INSTDIR\share\swish-e\"
        File "../../example/swish.tt"
        File "../../example/swish.tmpl"
        File "../../example/swish.gif"
        File "../../example/README.txt"

        ; swish.cgi templates
        SetOutPath "$INSTDIR\share\swish-e\templates\"
        File "../../example/templates/search.tt"
        File "../../example/templates/page_layout"
        File "../../example/templates/common_header"
        File "../../example/templates/common_footer"
        File "../../example/templates/style.css"
        File "../../example/templates/markup.css"

        ; swish.cgi Modules
        SetOutPath "$INSTDIR\lib\swish-e\perl\SWISH\"
        File "../../example/modules/SWISH/DateRanges.pm"
        File "../../example/modules/SWISH/DefaultHighlight.pm"
        File "../../example/modules/SWISH/PhraseHighlight.pm"
        File "../../example/modules/SWISH/SimpleHighlight.pm"
        File "../../example/modules/SWISH/TemplateDefault.pm"
        File "../../example/modules/SWISH/TemplateDumper.pm"
        File "../../example/modules/SWISH/TemplateFrame.pm"
        File "../../example/modules/SWISH/TemplateHTMLTemplate.pm"
        File "../../example/modules/SWISH/TemplateToolkit.pm"
        File "../../example/modules/SWISH/ParseQuery.pm"
    SectionEnd
    
    Section /o "PERL Filters" SecPerlFilter
        WriteRegStr HKEY_LOCAL_MACHINE "SOFTWARE\SWISH-E Team\SWISH-E\${VERSION}\Options" "Perl" "1"
        WriteRegStr HKEY_LOCAL_MACHINE "SOFTWARE\SWISH-E Team\SWISH-E\${VERSION}\Options" "PerlFilters" "1"
        SetOutPath "$INSTDIR\bin\"
        File "../../filters/swish-filter-test.in"
        
        SetOutPath "$INSTDIR\share\doc\swish-e\examples\filter-bin\"
        ; Example Filter Scripts
        File "../../filter-bin/_binfilter.sh"
        File "../../filter-bin/_pdf2html.pl"
	File "../../filter-bin/swish_filter.pl.in"

	# SWISH::Filter API
        SetOutPath "$INSTDIR\lib\swish-e\perl\SWISH\"
        File "../../filters/SWISH/Filter.pm.in"
        SetOutPath "$INSTDIR\lib\swish-e\perl\SWISH\Filters"
        File "../../filters/SWISH/Filters/*.pm"

    SectionEnd
    
    Section /o "PERL -S prog Examples" SecPerlMethod
        WriteRegStr HKEY_LOCAL_MACHINE "SOFTWARE\SWISH-E Team\SWISH-E\${VERSION}\Options" "Perl" "1"
        WriteRegStr HKEY_LOCAL_MACHINE "SOFTWARE\SWISH-E Team\SWISH-E\${VERSION}\Options" "PerlMethods" "1"
        SetOutPath "$INSTDIR\share\doc\swish-e\examples\prog-bin\"
        File "../../prog-bin/*.pl"
        File "../../prog-bin/*.pl.in"
        File "../../prog-bin/*.pm"
    SectionEnd
SubSectionEnd
 
Section "Examples" SecExample
    SetOutPath "$INSTDIR\share\doc\swish-e\examples\conf\"
    File "../../conf/*.config"
    File "../../conf/README.txt"
    SetOutPath "$INSTDIR\share\doc\swish-e\examples\conf\stopwords\"
    File "../../conf/stopwords/*.txt"

    ; Rename text files so Windows has a clue
    Rename "$INSTDIR/share/doc/swish-e/conf/README" "$INSTDIR/conf/README.txt"
SectionEnd ; end of section 'Examples'

Section "-post" ; (post install section, happens last after any optional sections)
    ; add any commands that need to happen after any optional sections here
    WriteRegStr HKEY_LOCAL_MACHINE "SOFTWARE\SWISH-E Team\SWISH-E\${VERSION}" "" "$INSTDIR"
    WriteRegStr HKEY_LOCAL_MACHINE "SOFTWARE\SWISH-E Team\SWISH-E" "CurrentVersion" "${VERSION}"
    
    ; fixperl needs these values:  installdir libexecdir perlmoduledir pkgdatadir swishbinary
    WriteRegStr HKEY_LOCAL_MACHINE "SOFTWARE\SWISH-E Team\SWISH-E\${VERSION}" "installdir" "$INSTDIR"
    WriteRegStr HKEY_LOCAL_MACHINE "SOFTWARE\SWISH-E Team\SWISH-E\${VERSION}" "libexecdir" "$INSTDIR\lib\swish-e"
    WriteRegStr HKEY_LOCAL_MACHINE "SOFTWARE\SWISH-E Team\SWISH-E\${VERSION}" "mylibexecdir" "$INSTDIR\lib\swish-e"
    WriteRegStr HKEY_LOCAL_MACHINE "SOFTWARE\SWISH-E Team\SWISH-E\${VERSION}" "perlmoduledir" "$INSTDIR\lib\swish-e\perl"
    WriteRegStr HKEY_LOCAL_MACHINE "SOFTWARE\SWISH-E Team\SWISH-E\${VERSION}" "pkgdatadir" "$INSTDIR\lib\swish-e"
    ; swish-e.exe is installed into $INSTDIR\bin above, so point the registry value there
    WriteRegStr HKEY_LOCAL_MACHINE "SOFTWARE\SWISH-E Team\SWISH-E\${VERSION}" "swishbinary" "$INSTDIR\bin\swish-e.exe"
    
    ; Uninstaller information
    WriteRegStr HKEY_LOCAL_MACHINE "Software\Microsoft\Windows\CurrentVersion\Uninstall\SWISH-E" "DisplayName" "SWISH-E (remove only)"
    WriteRegStr HKEY_LOCAL_MACHINE "Software\Microsoft\Windows\CurrentVersion\Uninstall\SWISH-E" "UninstallString" '"$INSTDIR\uninst.exe"'
    
    ; Add swish-e.exe to the Win32 Path
    WriteRegStr HKEY_LOCAL_MACHINE "SOFTWARE\Microsoft\Windows\CurrentVersion\AppPaths\swish-e.exe" "" "$INSTDIR\bin\swish-e.exe"
    WriteRegStr HKEY_LOCAL_MACHINE "SOFTWARE\Microsoft\Windows\CurrentVersion\AppPaths\swish-e.exe" "Path" "$INSTDIR\bin;$INSTDIR\lib\swish-e"
    
    WriteUninstaller "uninst.exe"
    
    ; Clean out older versions of Swish-e and libs
    Delete "$INSTDIR/swish-e.exe"
    Delete "$INSTDIR/*.dll"
    
    ; Is ActivePerl Installed?
    Call IsActivePerlInstalled
    Pop $R0
    StrCmp $R0 0 endofpost
    
    ; Did SWISH-E Install any Perl modules?
    Call IsSwishPerlInstalled
    Pop $R0
    StrCmp $R0 0 endofpost
    
    ; If both were true we'll run fixperl.pl
    Call ActivePerlLocation
    Pop $R1
    Exec '$R1bin\perl.exe "$INSTDIR\fixperl.pl"'
    
    endofpost:
SectionEnd ; end of -post section

Section Uninstall
    ; add delete commands to delete whatever files/registry keys/etc you installed here.
    ; UninstallText "This will remove SWISH-E from your system"
    Delete "$INSTDIR\uninst.exe"
    DeleteRegKey HKEY_LOCAL_MACHINE "SOFTWARE\SWISH-E Team\SWISH-E\${VERSION}"
    DeleteRegValue HKEY_LOCAL_MACHINE "SOFTWARE\SWISH-E Team\SWISH-E" "CurrentVersion"
    DeleteRegKey HKEY_LOCAL_MACHINE "SOFTWARE\Microsoft\Windows\CurrentVersion\Uninstall\SWISH-E"
    DeleteRegKey HKEY_LOCAL_MACHINE "SOFTWARE\Microsoft\Windows\CurrentVersion\AppPaths\swish-e.exe"
    
;    UnRegDLL "$SYSDIR\swishctl.dll"
    
    ; Other files
    RMDir /r "$INSTDIR"
    RMDir /r "$SMPROGRAMS\SWISH-E"
    
    ; SWISH::API Binaries were installed into $PERL\site\lib\auto\SWISH\API
    Call un.ActivePerlLocation
    Pop $R1
    RMDir /r "$R1\site\lib\auto\SWISH" 
SectionEnd ; end of uninstall section


Function .onInit
    Call IsActivePerlInstalled
    Pop $R0
    StrCmp $R0 1 endselect
    
    ; Do Not Select Perl Modules
    MessageBox MB_OK "ActivePerl 5.8 is not installed!  SWISH-E Perl support will not work and has been deselected."
    Goto end
    
    ; Select Perl Modules
    endselect:
    !insertmacro SelectSection ${SecPerlFilter}
    !insertmacro SelectSection ${SecPerlApi}
    !insertmacro SelectSection ${SecPerlMethod}
    !insertmacro SelectSection ${SecPerlCgi}
    
    end:
FunctionEnd                                                                                                                                                                                                 

Function DownloadActivePerl
; http://downloads.activestate.com/ActivePerl/Windows/5.8/ActivePerl-5.8.0.806-MSWin32-x86.msi    
    GetTempFileName $R0
;    File /oname=$R0 perlpage.ini
    InstallOptions::dialog $R0
    Pop $R1
    StrCmp $R1 "cancel" done
    StrCmp $R1 "back" done
    StrCmp $R1 "success" done
;    error: MessageBox MB_OK|MB_ICONSTOP "InstallOptions error:$/r$/n$R1"
    done:

    ; Is ActivePerl Installed?
    Call IsActivePerlInstalled
    Pop $R0
    StrCmp $R0 1 end
    MessageBox MB_YESNO "Would you like to install ActivePerl 5.8?" IDNO end
    
    ; Download ActivePerl to SWISH-E Install Directory
    NSISdl::download http://downloads.activestate.com/ActivePerl/Windows/5.8/ActivePerl-5.8.0.806-MSWin32-x86.msi "$INSTDIR/ActivePerl-5.8.0.806.msi"
    Pop $R0 ;Get the return value
    StrCmp $R0 "success" succeed
    MessageBox MB_OK "ActivePerl Download Failed: $R0"
    Goto end
    
    ; Attempt to install ActivePerl
    succeed:
    ExecShell open "$INSTDIR/ActivePerl-5.8.0.806.msi"
    end:
FunctionEnd

;--------------------------------
 ; IsActivePerlInstalled
 ; Based on IsFlashInstalled
 ; By Yazno, http://yazno.tripod.com/powerpimpit/
 ; Returns on top of stack
 ; 0 (ActivePerl is not installed)
 ; or
 ; 1 (ActivePerl is installed)
 ;
 ; Usage:
 ;   Call IsActivePerlInstalled
 ;   Pop $R0
 ;   ; $R0 at this point is "1" or "0"

 Function IsActivePerlInstalled
  Push $R0
  ClearErrors
  ReadRegStr $R0 HKEY_LOCAL_MACHINE "Software\ActiveState\ActivePerl\" "CurrentVersion"
  IfErrors lbl_na
    StrCpy $R0 1
  Goto lbl_end
  lbl_na:
    StrCpy $R0 0
  lbl_end:
  Exch $R0
 FunctionEnd
 
 Function ActivePerlLocation
  Push $R0
  ClearErrors
  ReadRegStr $R0 HKEY_LOCAL_MACHINE "Software\ActiveState\ActivePerl\" "CurrentVersion"
  ReadRegStr $R1 HKEY_LOCAL_MACHINE "Software\ActiveState\ActivePerl\$R0" ""
  IfErrors lbl_na
  Goto lbl_end
  lbl_na:
    StrCpy $R0 0
  lbl_end:
  Exch $R1
 FunctionEnd
 
 Function un.ActivePerlLocation
  Push $R0
  ClearErrors
  ReadRegStr $R0 HKEY_LOCAL_MACHINE "Software\ActiveState\ActivePerl\" "CurrentVersion"
  ReadRegStr $R1 HKEY_LOCAL_MACHINE "Software\ActiveState\ActivePerl\$R0" ""
  IfErrors lbl_na
  Goto lbl_end
  lbl_na:
    StrCpy $R0 0
  lbl_end:
  Exch $R1
 FunctionEnd
 Function IsSwishPerlInstalled
  Push $R0
  ClearErrors
  ReadRegStr $R0 HKEY_LOCAL_MACHINE "Software\SWISH-E Team\SWISH-E\${VERSION}\Options" "Perl"
  IfErrors lbl_na
    StrCpy $R0 1
  Goto lbl_end
  lbl_na:
    StrCpy $R0 0
  lbl_end:
  Exch $R0
 FunctionEnd
swish-e-2.4.7/src/win32/fixperl.pl
use File::Find;
use File::Basename;
use Win32::TieRegistry 0.20 ( KEY_READ );

# Global registry access options
my $registry_options = {
    Delimiter => '/',
    Access    => KEY_READ(),
};

# First, let's get location of ActivePerl
my $active_key = "LMachine/Software/ActiveState/ActivePerl";
my $perl_binary = get_perl_binary($active_key);



# Our Registry keys that we need
my $install_key = "LMachine/Software/SWISH-E Team/SWISH-E";
my @config_options = qw/installdir mylibexecdir libexecdir perlmoduledir pkgdatadir swishbinary /;

# Fetch data from registry
my $config = get_registry_data( $install_key, @config_options );

die "Failed to read the registry [$install_key].  Cannot continue\n"
    unless ref $config;

for ( @config_options ) {
   die "Failed to read registry [$install_key/$_]\n"
       unless $config->{$_};
}

# Add "perlbinary" into the config hash
$config->{perlbinary} = $perl_binary;
push @config_options, 'perlbinary';

# Now look for .in files to update at install time
find( {wanted => \&wanted }, $config->{installdir} );

sub wanted{ 
    return if -d;
    return if !-r;
    return unless /\.in$/;
    

    my $filename = $_;
    my $basename = basename($filename, qw{.in});
    # open files
    open ( INF, "<$filename" ) or die "Failed to open [$filename] for reading:$!";
    open ( OUTF, ">$basename") or die "Failed to open [$basename] for output: $!";

    my $count = 0;
    while ( <INF> ) {
        for my $setting ( @config_options ) {
            $count += s/qw\( \@\@$setting\@\@ \)/'$config->{$setting}'/g;
            $count += s/\@\@$setting\@\@/$config->{$setting}/g;
        }
        print OUTF;
    }
    close INF;
    close OUTF or warn "Failed to close [$basename]: $!\n";

    printf("%20s --> %20s (%3d changes )\n", $filename, $basename, $count);



    # normal people will see this.  let's not scare them.
    unlink $filename or warn "Failed to unlink '$filename': $!\n";
}


# This fetches data from a registry entry based on a CurrentVersion lookup.

sub get_registry_data {
    my ( $top_level_key, @params ) = @_;

    my %data;

    my $key = Win32::TieRegistry->new( $top_level_key, $registry_options );
    unless ( $key ) {
        warn "Can't access registry key [$top_level_key]: $^E\n";
        return;
    }

    my $cur_version = $key->GetValue("CurrentVersion");
    unless ( $cur_version ) {
        warn "Failed to get current version from registry [$top_level_key]\n";
        return;
    }

    $data{CurrentVersion} = $cur_version;

    my $cur_key = $key->Open($cur_version);
    unless ( $cur_key ) {
        warn "Failed to find registry entry [$key\\$cur_version]\n";
        return;
    }

    # Load registry entries
    $data{$_} = $cur_key->GetValue($_) for @params;

    return \%data;
}

sub get_perl_binary {
    my $key = shift;

    # Get "(default)" key for install directory
    my $reg = get_registry_data( $key, "" );

    return unless ref $reg && $reg->{""};

    my $perl_build = $reg->{CurrentVersion};

    my $perl_binary = $reg->{""} . ( $reg->{""} =~ /\\$/ ? 'bin\perl.exe' : '\bin\perl.exe');

    if ( -x $perl_binary ) {
        warn "Found Perl at: $perl_binary (build $perl_build)\n";
        return $perl_binary;
    }

    warn "Failed to find perl binary [$perl_binary]\n";
    return;
}

��������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������swish-e-2.4.7/src/win32/libswishindex.dsp�����������������������������������������������������������0000775�0000771�0001750�00000011302�11166010104�015133� �����������������������������������������������������������������������������������������������������������������������������������������������������������������������������0000000�0000000������������������������������������������������������������������������������������������������������������������������������������������������������������������������# Microsoft Developer Studio Project File - Name="libswishindex" - Package Owner=<4>
# Microsoft Developer Studio Generated Build File, Format Version 6.00
# ** DO NOT EDIT **

# TARGTYPE "Win32 (x86) Static Library" 0x0104

CFG=libswishindex - Win32 Debug
!MESSAGE This is not a valid makefile. To build this project using NMAKE,
!MESSAGE use the Export Makefile command and run
!MESSAGE 
!MESSAGE NMAKE /f "libswishindex.mak".
!MESSAGE 
!MESSAGE You can specify a configuration when running NMAKE
!MESSAGE by defining the macro CFG on the command line. For example:
!MESSAGE 
!MESSAGE NMAKE /f "libswishindex.mak" CFG="libswishindex - Win32 Debug"
!MESSAGE 
!MESSAGE Possible choices for configuration are:
!MESSAGE 
!MESSAGE "libswishindex - Win32 Release" (based on "Win32 (x86) Static Library")
!MESSAGE "libswishindex - Win32 Debug" (based on "Win32 (x86) Static Library")
!MESSAGE 

# Begin Project
# PROP AllowPerConfigDependencies 0
# PROP Scc_ProjName ""
# PROP Scc_LocalPath ""
CPP=cl.exe
RSC=rc.exe

!IF  "$(CFG)" == "libswishindex - Win32 Release"

# PROP BASE Use_MFC 0
# PROP BASE Use_Debug_Libraries 0
# PROP BASE Output_Dir "libswishindex___Win32_Release"
# PROP BASE Intermediate_Dir "libswishindex___Win32_Release"
# PROP BASE Target_Dir ""
# PROP Use_MFC 0
# PROP Use_Debug_Libraries 0
# PROP Output_Dir "tmp/libswishindex_Release"
# PROP Intermediate_Dir "tmp/libswishindex_Release"
# PROP Target_Dir ""
# ADD BASE CPP /nologo /W3 /GX /O2 /D "WIN32" /D "NDEBUG" /D "_MBCS" /D "_LIB" /YX /FD /c
# ADD CPP /nologo /MD /W3 /GX /O2 /I "." /I "../replace" /I "../../../pcre/include" /I "../../../zlib/include" /I "../../../iconv/include" /I "../../../libxml2/include" /I ".." /D "WIN32" /D "NDEBUG" /D "_MBCS" /D "_LIB" /D "HAVE_PCRE" /D "HAVE_LIBXML2" /D "HAVE_ZLIB" /D "HAVE_CONFIG_H" /YX /FD /c
# ADD BASE RSC /l 0x409 /d "NDEBUG"
# ADD RSC /l 0x409 /d "NDEBUG"
BSC32=bscmake.exe
# ADD BASE BSC32 /nologo
# ADD BSC32 /nologo
LIB32=link.exe -lib
# ADD BASE LIB32 /nologo
# ADD LIB32 /nologo /out:"libswishindex.lib"

!ELSEIF  "$(CFG)" == "libswishindex - Win32 Debug"

# PROP BASE Use_MFC 0
# PROP BASE Use_Debug_Libraries 1
# PROP BASE Output_Dir "libswishindex___Win32_Debug"
# PROP BASE Intermediate_Dir "libswishindex___Win32_Debug"
# PROP BASE Target_Dir ""
# PROP Use_MFC 0
# PROP Use_Debug_Libraries 1
# PROP Output_Dir "libswishindex___Win32_Debug"
# PROP Intermediate_Dir "libswishindex___Win32_Debug"
# PROP Target_Dir ""
# ADD BASE CPP /nologo /W3 /Gm /GX /ZI /Od /D "WIN32" /D "_DEBUG" /D "_MBCS" /D "_LIB" /YX /FD /GZ /c
# ADD CPP /nologo /W3 /Gm /GX /ZI /Od /I "." /I "../replace" /I "../../../pcre/include" /I "../../../libxml2/include" /I "../../../zlib/include" /I ".." /D "WIN32" /D "_MBCS" /D "_LIB" /D "HAVE_PCRE" /D "HAVE_LIBXML2" /D "HAVE_ZLIB" /D "HAVE_CONFIG_H" /D "_DEBUG" /YX /FD /GZ /c
# ADD BASE RSC /l 0x409 /d "_DEBUG"
# ADD RSC /l 0x409 /d "_DEBUG"
BSC32=bscmake.exe
# ADD BASE BSC32 /nologo
# ADD BSC32 /nologo /o"tmp/libswishindex___Win32_Debug/libswishindex.bsc"
LIB32=link.exe -lib
# ADD BASE LIB32 /nologo
# ADD LIB32 /nologo

!ENDIF 

# Begin Target

# Name "libswishindex - Win32 Release"
# Name "libswishindex - Win32 Debug"
# Begin Group "Source Files"

# PROP Default_Filter "cpp;c;cxx;rc;def;r;odl;idl;hpj;bat"
# Begin Source File

SOURCE=..\bash.c
# End Source File
# Begin Source File

SOURCE=..\db_write.c
# End Source File
# Begin Source File

SOURCE=.\dirent.c
# End Source File
# Begin Source File

SOURCE=..\docprop_write.c
# End Source File
# Begin Source File

SOURCE=..\entities.c
# End Source File
# Begin Source File

SOURCE=..\extprog.c
# End Source File
# Begin Source File

SOURCE=..\file.c
# End Source File
# Begin Source File

SOURCE=..\filter.c
# End Source File
# Begin Source File

SOURCE=..\fs.c
# End Source File
# Begin Source File

SOURCE=..\getruntime.c
# End Source File
# Begin Source File

SOURCE=..\html.c
# End Source File
# Begin Source File

SOURCE=..\http.c
# End Source File
# Begin Source File

SOURCE=..\httpserver.c
# End Source File
# Begin Source File

SOURCE=..\index.c
# End Source File
# Begin Source File

SOURCE=..\merge.c
# End Source File
# Begin Source File

SOURCE=..\methods.c
# End Source File
# Begin Source File

SOURCE=..\replace\mkstemp.c
# End Source File
# Begin Source File

SOURCE=..\parse_conffile.c
# End Source File
# Begin Source File

SOURCE=..\parser.c
# End Source File
# Begin Source File

SOURCE=..\pre_sort.c
# End Source File
# Begin Source File

SOURCE=..\swregex.c
# End Source File
# Begin Source File

SOURCE=..\txt.c
# End Source File
# Begin Source File

SOURCE=..\xml.c
# End Source File
# End Group
# Begin Group "Header Files"

# PROP Default_Filter "h;hpp;hxx;hm;inl"
# Begin Source File

SOURCE=..\sys.h
# End Source File
# End Group
# End Target
# End Project
������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������swish-e-2.4.7/src/win32/acconfig.h������������������������������������������������������������������0000775�0000771�0001750�00000007441�11166010104�013502� �����������������������������������������������������������������������������������������������������������������������������������������������������������������������������0000000�0000000������������������������������������������������������������������������������������������������������������������������������������������������������������������������/*
    $Id: acconfig.h 1736 2005-05-12 15:41:22Z karman $

    This file is part of Swish-e.

    Swish-e is free software; you can redistribute it and/or modify
    it under the terms of the GNU General Public License as published by
    the Free Software Foundation; either version 2 of the License, or
    (at your option) any later version.

    Swish-e is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
    GNU General Public License for more details.

    You should have received a copy of the GNU General Public License
    along with Swish-e; if not, write to the Free Software
    Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
    
    See the COPYING file that accompanies the Swish-e distribution for details
    of the GNU GPL and the special exception available for linking against
    the Swish-e library.

* 
*   2005-05-09 - DLN - Added GPL with linking clause.
* 
*/

#ifdef _WIN32

/* Special Inclusions  */

#include <windows.h>    /* Win32 API */
#include <stdlib.h>		/* _sleep() */
#include <process.h>	/* _getpid() */
#include <io.h>			/* _umask(), _mktemp, etc */
#include <fcntl.h>		/* Most io.h functions want this */
#include <sys/types.h>	/* Most io.h functions want this */
#include <sys/stat.h>	/* Most io.h functions want this */


/* We define this in OS-specific code which includes config.h  */

#ifndef _SWISH_PORT
#  include "dirent.h"
#  include "pcreposix.h"
#  include "mkstemp.h"
#endif


/* ifdef logic  */
#define NO_GETTOD		/* Win32 has no Get Time Of Day  */
#undef HAVE_LSTAT		/* Win32 has no symbolic links */
#undef INDEXPERMS		/* Win32 has no chmod() - DLN 2001-11-05 Umm, yes it does... */
#define HAVE_SYS_TIMEB_H /* _ftime(), struct _timeb */ 
#define HAVE_STDLIB_H	/* We need stdlib.h instead of unistd.h  */
#define HAVE_PROCESS_H  /* _getpid is here  */
#define HAVE_VARARGS_H  /* va_list, vsnprintf, etc */
#define HAVE_LIBXML2 1  /* enable libxml2 XML parser */
#define HAVE_MEMCPY  1  /* sys.h explodes without this */
#define STDC_HEADERS 1  /* We have Standard C Headers, I think */

/* Environment Stuff */
#define HAVE_SETENV     /* Enable PATH setting */
#define PATH_SEPARATOR ";" /* PATH separator string */

#define HAVE_STRING_H   /* For mkstemp from libiberty  */

/* Macros which rewrite values  */
#define SWISH_VERSION "2.4.3"	/* Should we find a better way to handle this */
#define VERSION SWISH_VERSION   /* Some things want this  */

#define libexecdir "/usr/local/lib"  /* Microsoft CPP is brain damaged */


/* Internal SWISH-E File Access Modes */
#define FILEMODE_READ		"rb"	/* Read only */
#define FILEMODE_WRITE		"wb"	/* Write only */
#define FILEMODE_READWRITE	"rb+"	/* Read Write */

/* External POSIX File Access Modes */
#define O_RDWR _O_RDWR
#define O_CREAT _O_CREAT
#define O_EXCL _O_EXCL
#define O_BINARY _O_BINARY

/* Stat Stuff; borrowed from linux/stat.h */
#define S_ISLNK(m)      (((m) & S_IFMT) == S_IFLNK)
#define S_ISREG(m)      (((m) & S_IFMT) == S_IFREG)
#define S_ISDIR(m)      (((m) & S_IFMT) == S_IFDIR)
#define S_ISCHR(m)      (((m) & S_IFMT) == S_IFCHR)
#define S_ISBLK(m)      (((m) & S_IFMT) == S_IFBLK)
#define S_ISFIFO(m)     (((m) & S_IFMT) == S_IFIFO)
#define S_ISSOCK(m)     (((m) & S_IFMT) == S_IFSOCK)


/* Win32 filename lengths  */
#define SW_MAXPATHNAME 4096
#define SW_MAXFILENAME 256

/* Type definitions */
typedef int pid_t;			/* process ID */
typedef int mode_t;         /* file permission mode ID */

/* Rewrite ANSI functions to Win32 equivalents */
#define popen		_popen
#define pclose		_pclose
#define strcasecmp	stricmp
#define strncasecmp	strnicmp
#define sleep		_sleep
#define getpid		_getpid
#define umask       _umask
#define vsnprintf   _vsnprintf
#define stat	    _stat

#endif
�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������swish-e-2.4.7/src/win32/swishe.dsp������������������������������������������������������������������0000664�0000771�0001750�00000015435�11166010104�013571� �����������������������������������������������������������������������������������������������������������������������������������������������������������������������������0000000�0000000������������������������������������������������������������������������������������������������������������������������������������������������������������������������# Microsoft Developer Studio Project File - Name="swishe" - Package Owner=<4>
# Microsoft Developer Studio Generated Build File, Format Version 6.00
# ** DO NOT EDIT **

# TARGTYPE "Win32 (x86) Console Application" 0x0103

CFG=swishe - Win32 Debug
!MESSAGE This is not a valid makefile. To build this project using NMAKE,
!MESSAGE use the Export Makefile command and run
!MESSAGE 
!MESSAGE NMAKE /f "swishe.mak".
!MESSAGE 
!MESSAGE You can specify a configuration when running NMAKE
!MESSAGE by defining the macro CFG on the command line. For example:
!MESSAGE 
!MESSAGE NMAKE /f "swishe.mak" CFG="swishe - Win32 Debug"
!MESSAGE 
!MESSAGE Possible choices for configuration are:
!MESSAGE 
!MESSAGE "swishe - Win32 Release" (based on "Win32 (x86) Console Application")
!MESSAGE "swishe - Win32 Debug" (based on "Win32 (x86) Console Application")
!MESSAGE 

# Begin Project
# PROP AllowPerConfigDependencies 0
# PROP Scc_ProjName ""
# PROP Scc_LocalPath ""
CPP=cl.exe
RSC=rc.exe

!IF  "$(CFG)" == "swishe - Win32 Release"

# PROP BASE Use_MFC 0
# PROP BASE Use_Debug_Libraries 0
# PROP BASE Output_Dir "Release"
# PROP BASE Intermediate_Dir "Release"
# PROP BASE Target_Dir ""
# PROP Use_MFC 0
# PROP Use_Debug_Libraries 0
# PROP Output_Dir "tmp/swishe_Release"
# PROP Intermediate_Dir "tmp/swishe_Release"
# PROP Ignore_Export_Lib 0
# PROP Target_Dir ""
# ADD BASE CPP /nologo /W3 /GX /O2 /D "WIN32" /D "NDEBUG" /D "_CONSOLE" /D "_MBCS" /YX /FD /c
# ADD CPP /nologo /MD /W3 /GX /O2 /I "." /I "../replace" /I "..\..\..\expat\xmlparse" /I "..\..\..\zlib\include" /I "..\..\..\expat\xmltok" /I "../../../libxml2/include" /I "../../../pcre/include" /D "WIN32" /D "NDEBUG" /D "_CONSOLE" /D "_MBCS" /D "HAVE_PCRE" /D "HAVE_CONFIG_H" /D "HAVE_ZLIB" /FR /YX /FD /c
# ADD BASE RSC /l 0x409 /d "NDEBUG"
# ADD RSC /l 0x409 /d "NDEBUG"
BSC32=bscmake.exe
# ADD BASE BSC32 /nologo
# ADD BSC32 /nologo
LINK32=link.exe
# ADD BASE LINK32 kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib uuid.lib odbc32.lib odbccp32.lib kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib uuid.lib odbc32.lib odbccp32.lib /nologo /subsystem:console /machine:I386
# ADD LINK32 kernel32.lib dllswish-e.lib libswishindex.lib ../expat/xmltok/Release/xmltok.lib ../expat/xmlparse/Release/xmlparse.lib ../../../libxml2/lib/libxml2.lib ../../../zlib/lib/zdll.lib ../../../pcre/lib/pcreposix.lib /nologo /subsystem:console /machine:I386 /out:"swish-e.exe"
# SUBTRACT LINK32 /pdb:none

!ELSEIF  "$(CFG)" == "swishe - Win32 Debug"

# PROP BASE Use_MFC 0
# PROP BASE Use_Debug_Libraries 1
# PROP BASE Output_Dir "Debug"
# PROP BASE Intermediate_Dir "Debug"
# PROP BASE Target_Dir ""
# PROP Use_MFC 0
# PROP Use_Debug_Libraries 1
# PROP Output_Dir "tmp/swishe_Debug"
# PROP Intermediate_Dir "tmp/swishe_Debug"
# PROP Ignore_Export_Lib 0
# PROP Target_Dir ""
# ADD BASE CPP /nologo /W3 /Gm /GX /ZI /Od /D "WIN32" /D "_DEBUG" /D "_CONSOLE" /D "_MBCS" /YX /FD /GZ /c
# ADD CPP /nologo /W3 /Gm /GX /ZI /Od /I "." /I "../replace" /I "..\..\..\expat\xmlparse" /I "../../../libxml2/include" /I "..\..\..\expat\xmltok" /I "../../../pcre/include" /I "../../../zlib" /D "HAVE_PCRE" /D "HAVE_CONFIG_H" /D "HAVE_ZLIB" /D "_CONSOLE" /D "WIN32" /D "_DEBUG" /D "_MBCS" /YX /FD /I /GZ /c
# ADD BASE RSC /l 0x409 /d "_DEBUG"
# ADD RSC /l 0x409 /d "_DEBUG"
BSC32=bscmake.exe
# ADD BASE BSC32 /nologo
# ADD BSC32 /nologo
LINK32=link.exe
# ADD BASE LINK32 kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib uuid.lib odbc32.lib odbccp32.lib kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib uuid.lib odbc32.lib odbccp32.lib /nologo /subsystem:console /debug /machine:I386 /pdbtype:sept
# ADD LINK32 kernel32.lib libswish-e.lib libswishindex.lib ../expat/xmltok/Release/xmltok.lib ../expat/xmlparse/Release/xmlparse.lib ../../../libxml2/lib/libxml2.lib ../../../zlib/lib/zdll.lib ../../../pcre/lib/pcreposix.lib /nologo /subsystem:console /debug /machine:I386 /out:"swish-e.exe" /pdbtype:sept
# SUBTRACT LINK32 /pdb:none

!ENDIF 

# Begin Target

# Name "swishe - Win32 Release"
# Name "swishe - Win32 Debug"
# Begin Group "Source Files"

# PROP Default_Filter "cpp;c;cxx;rc;def;r;odl;idl;hpj;bat"
# Begin Source File

SOURCE=..\dump.c
# End Source File
# Begin Source File

SOURCE=..\keychar_out.c
# End Source File
# Begin Source File

SOURCE=..\result_output.c
# End Source File
# Begin Source File

SOURCE=..\swish.c
# End Source File
# End Group
# Begin Group "Header Files"

# PROP Default_Filter "h;hpp;hxx;hm;inl"
# Begin Source File

SOURCE=.\acconfig.h
# End Source File
# Begin Source File

SOURCE=..\check.h
# End Source File
# Begin Source File

SOURCE=..\compress.h
# End Source File
# Begin Source File

SOURCE=..\config.h
# End Source File
# Begin Source File

SOURCE=.\config.h
# End Source File
# Begin Source File

SOURCE=..\date_time.h
# End Source File
# Begin Source File

SOURCE=..\deflate.h
# End Source File
# Begin Source File

SOURCE=.\dirent.h
# End Source File
# Begin Source File

SOURCE=..\docprop.h
# End Source File
# Begin Source File

SOURCE=..\error.h
# End Source File
# Begin Source File

SOURCE=..\extprog.h
# End Source File
# Begin Source File

SOURCE=..\file.h
# End Source File
# Begin Source File

SOURCE=..\filter.h
# End Source File
# Begin Source File

SOURCE=..\fs.h
# End Source File
# Begin Source File

SOURCE=..\hash.h
# End Source File
# Begin Source File

SOURCE=..\html.h
# End Source File
# Begin Source File

SOURCE=..\http.h
# End Source File
# Begin Source File

SOURCE=..\httpserver.h
# End Source File
# Begin Source File

SOURCE=..\index.h
# End Source File
# Begin Source File

SOURCE=..\keychar_out.h
# End Source File
# Begin Source File

SOURCE=..\list.h
# End Source File
# Begin Source File

SOURCE=..\lst.h
# End Source File
# Begin Source File

SOURCE=..\mem.h
# End Source File
# Begin Source File

SOURCE=..\merge.h
# End Source File
# Begin Source File

SOURCE=..\metanames.h
# End Source File
# Begin Source File

SOURCE=..\parse_conffile.h
# End Source File
# Begin Source File

SOURCE=..\result_output.h
# End Source File
# Begin Source File

SOURCE=..\result_sort.h
# End Source File
# Begin Source File

SOURCE=..\search.h
# End Source File
# Begin Source File

SOURCE=..\search_alt.h
# End Source File
# Begin Source File

SOURCE=..\soundex.h
# End Source File
# Begin Source File

SOURCE=..\stemmer.h
# End Source File
# Begin Source File

SOURCE=..\string.h
# End Source File
# Begin Source File

SOURCE=..\swish.h
# End Source File
# Begin Source File

SOURCE=..\txt.h
# End Source File
# Begin Source File

SOURCE=..\xml.h
# End Source File
# End Group
# Begin Group "Resource Files"

# PROP Default_Filter "ico;cur;bmp;dlg;rc2;rct;bin;rgs;gif;jpg;jpeg;jpe"
# End Group
# End Target
# End Project
�����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������swish-e-2.4.7/src/array.h���������������������������������������������������������������������������0000664�0000771�0001750�00000003263�11166010110�012075� �����������������������������������������������������������������������������������������������������������������������������������������������������������������������������0000000�0000000������������������������������������������������������������������������������������������������������������������������������������������������������������������������/* 
$Id: array.h 1946 2007-10-22 14:56:35Z karpet $

    This file is part of Swish-e.

    Swish-e is free software; you can redistribute it and/or modify
    it under the terms of the GNU General Public License as published by
    the Free Software Foundation; either version 2 of the License, or
    (at your option) any later version.

    Swish-e is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
    GNU General Public License for more details.

    You should have received a copy of the GNU General Public License
    along  with Swish-e; if not, write to the Free Software
    Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
    
    See the COPYING file that accompanies the Swish-e distribution for details
    of the GNU GPL and the special exception available for linking against
    the Swish-e library.
    
** Mon May  9 18:19:34 CDT 2005
** added GPL

*/


typedef struct ARRAY_Page
{
    sw_off_t           next;    /* Next Page */

    sw_off_t           page_number;
    int                modified;
    int                in_use;

    struct ARRAY_Page  *next_cache;

    unsigned char data[0];        /* Page data */
} ARRAY_Page;

#define ARRAY_CACHE_SIZE 97

typedef struct ARRAY
{
    sw_off_t root_page;
    int page_size;
    struct ARRAY_Page *cache[ARRAY_CACHE_SIZE];
    int levels;

    FILE *fp;
} ARRAY;

ARRAY *ARRAY_Create(FILE *fp);
ARRAY *ARRAY_Open(FILE *fp, sw_off_t root_page);
sw_off_t ARRAY_Close(ARRAY *bt);
int ARRAY_Put(ARRAY *b, int index, unsigned long value);
unsigned long ARRAY_Get(ARRAY *b, int index);
���������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������swish-e-2.4.7/src/soundex.c�������������������������������������������������������������������������0000664�0000771�0001750�00000013175�11166010110�012442� �����������������������������������������������������������������������������������������������������������������������������������������������������������������������������0000000�0000000������������������������������������������������������������������������������������������������������������������������������������������������������������������������/*

$Id: soundex.c 1736 2005-05-12 15:41:22Z karman $

    This file is part of Swish-e.

    Swish-e is free software; you can redistribute it and/or modify
    it under the terms of the GNU General Public License as published by
    the Free Software Foundation; either version 2 of the License, or
    (at your option) any later version.

    Swish-e is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
    GNU General Public License for more details.

    You should have received a copy of the GNU General Public License
    along with Swish-e; if not, write to the Free Software
    Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
    
    See the COPYING file that accompanies the Swish-e distribution for details
    of the GNU GPL and the special exception available for linking against
    the Swish-e library.


Tue May 10 08:00:20 CDT 2005
added GPL and removed OpenSA copyright per David Norris, author
*********************************************************************************
**
 * Reference: Adapted from Knuth, D.E. (1973) The art of computer programming;
 *    Volume 3: Sorting and searching.  Addison-Wesley Publishing Company:
 *    Reading, Mass. Page 392.
 *
 * 1. Retain the first letter of the name, and drop all occurrences of
 *    a, e, h, i, o, u, w, y in other positions.
 *
 * 2. Assign the following numbers to the remaining letters after the first:
 *      b, f, p, v -> 1                         l -> 4
 *      c, g, j, k, q, s, x, z -> 2             m, n -> 5
 *      d, t -> 3                               r -> 6
 *
 * 3. If two or more letters with the same code were adjacent in the original
 *    name (before step 1), omit all but the first.
 *
 * 4. Convert to the form ``letter, digit, digit, digit'' by adding trailing
 *    zeros (if there are less than three digits), or by dropping rightmost
 *    digits (if there are more than three).
 *
 * The examples given in the book are:
 *
 *      Euler, Ellery           E460
 *      Gauss, Ghosh            G200
 *      Hilbert, Heilbronn      H416
 *      Knuth, Kant             K530
 *      Lloyd, Ladd             L300
 *      Lukasiewicz, Lissajous  L222
 *
 * Most algorithms fail in two ways:
 *  1. they omit adjacent letters with the same code AFTER step 1, not before.
 *  2. they do not omit adjacent letters with the same code at the beginning
 *     of the name.
 *
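 * A minimal standalone sketch of steps 1-4 above, for illustration only.
 * It is not the function defined in this file: soundex_sketch(), code_of()
 * and soundex_digit[] are hypothetical names, the first letter is
 * upper-cased to match the table of examples, and only ASCII letters are
 * handled.  Non-letter characters are skipped without breaking adjacency.
 *
```c
#include <ctype.h>
#include <string.h>

// Digit for each letter a..z; '0' marks the dropped letters (a e h i o u w y).
static const char soundex_digit[] = "01230120022455012623010202";

// Returns the digit character for an ASCII letter, or 0 for anything else.
static char code_of(char c)
{
    int lower = tolower((unsigned char)c);

    if (lower < 'a' || lower > 'z')
        return 0;
    return soundex_digit[lower - 'a'];
}

// Writes "letter, digit, digit, digit" plus a NUL into out[5].
// Returns 0 on success, -1 if the name does not start with an ASCII letter.
static int soundex_sketch(const char *name, char out[5])
{
    char last = code_of(name[0]);   // code of the previous letter seen
    size_t i, j = 1;

    if (last == 0)
        return -1;

    strcpy(out, "0000");            // step 4: pad with trailing zeros
    out[0] = (char)toupper((unsigned char)name[0]);

    for (i = 1; name[i] != '\0' && j < 4; i++) {
        char code = code_of(name[i]);

        if (code == 0)
            continue;               // not a letter: ignore it entirely
        if (code != '0' && code != last)
            out[j++] = code;        // steps 1-3: keep a new, non-dropped code
        last = code;                // a dropped letter resets adjacency
    }
    return 0;
}
```
 *
 * Under these assumptions, soundex_sketch("Lukasiewicz", code) should leave
 * "L222" in code, and a name that does not start with a letter returns -1.
 *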
 */

#include "swish.h"
#include "stemmer.h"  /* For constants */
#include "swstring.h" /* for estrdup */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>

#include "soundex.h"

FUZZY_WORD *soundex( FUZZY_OBJECT *fi, const char *inword)
   {
        FUZZY_WORD *fw = create_fuzzy_word( inword, 1 ); /* create place to store stemmed word */
        char word[MAXWORDLEN+1];
	/* Misc Stuff  */
	char u, l ;
	int i, j, n;
	/* Resultant Sound Code  */
	char soundCode[5] = "0000";
	/* Group Number Lookup Table  */
	static char soundTable[26] =
	{0,						/* A  */
	 '1',					/* B  */
	 '2',					/* C  */
	 '3',					/* D  */
	 0,						/* E  */
	 '1',					/* F  */
	 '2',					/* G  */
	 0,						/* H  */
	 0,						/* I  */
	 '2',					/* J  */
	 '2',					/* K  */
	 '4',					/* L  */
	 '5',					/* M  */
	 '5',					/* N  */
	 0,						/* O  */
	 '1',					/* P  */
	 '2',					/* Q  */
	 '6',					/* R  */
	 '2',					/* S  */
	 '3',					/* T  */
	 0,						/* U  */
	 '1',					/* V  */
	 0,						/* W  */
	 '2',					/* X  */
	 0,						/* Y  */
	 '2'};					/* Z  */

    /* Make sure the word is not too large from the start. */
    if ( strlen( inword ) >= MAXWORDLEN )
    {
        fw->error =  STEM_WORD_TOO_BIG;
        return fw;
    }


    /* make working copy */
    strcpy( word, inword );

  

#ifdef _DEBUG
	/* Debug to console  */
	printf("# %15s: %s ", "soundex.c", word);
#endif

	/* Make sure it actually starts with a letter  */
	if(!isalpha((int)((unsigned char)word[0]))) 
        {
            fw->error = STEM_NOT_ALPHA;
            return fw;
        }

#ifdef _DEBUG
	/* Debug to console  */
	printf("isalpha, ");
#endif
	
	/* Get string length and make sure it's at least 3 characters  */
	if((n = (int)strlen(word)) < 3) 
        {
            fw->error = STEM_TOO_SMALL;
            return fw;
        }
#ifdef _DEBUG
	/* Debug to console  */
	printf("=>3, ");
#endif

        /* If it looks like a 4-character soundex code we don't want to touch it. */

        /* Hmm.  Just because it looks like a duck doesn't mean it is one.
         * The source is not supposed to be soundex already, so this doesn't make a lot of sense.  - moseley */
#ifdef skip_section
        
        if((n = (int)strlen(word)) == 4){
                if( isdigit( (int)(unsigned char)word[1] ) 
                 && isdigit( (int)(unsigned char)word[2] ) 
                 && isdigit( (int)(unsigned char)word[3] ) )
                       return STEM_OK;  /* Hum, probably not right */
        }
#endif

	/* Convert chars to lower case and strip non-letter chars  */
	j = 0;
	for (i = 0; i < n; i++) {
		u = tolower((unsigned char)word[i]);
		if ((u > 96) && (u < 123)) {
			 word[j] = u;
			j++;
		}
	}

	/* terminate string  */
	 word[j] = 0;

	/* String length again  */
	n = strlen(word);

	soundCode[0] = word[0];

	/* remember the code of the first char  */
	l = soundTable[((word[0]) - 97)];

	j = 1;

	/* build soundex string  */
	for (i = 1; i < n && j < 4; i++) {
		u = soundTable[((word[i]) - 97)];

		if (u != l) {
			if (u != 0) {
				soundCode[(int) j++] = u;
			}
			l = u;
		}
	}


    fw->free_strings = 1; /* flag that we are creating a string */
    fw->string_list[0] = estrdup( soundCode );
    return fw;

}
���������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������swish-e-2.4.7/src/merge.c���������������������������������������������������������������������������0000664�0000771�0001750�00000106604�11166010110�012054� �����������������������������������������������������������������������������������������������������������������������������������������������������������������������������0000000�0000000������������������������������������������������������������������������������������������������������������������������������������������������������������������������/*

$Id: merge.c 1945 2007-10-22 14:54:07Z karpet $


    This file is part of Swish-e.

    Swish-e is free software; you can redistribute it and/or modify
    it under the terms of the GNU General Public License as published by
    the Free Software Foundation; either version 2 of the License, or
    (at your option) any later version.

    Swish-e is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
    GNU General Public License for more details.

    You should have received a copy of the GNU General Public License
    along  with Swish-e; if not, write to the Free Software
    Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
    
    See the COPYING file that accompanies the Swish-e distribution for details
    of the GNU GPL and the special exception available for linking against
    the Swish-e library.
    
** Mon May  9 15:07:32 CDT 2005
** added GPL
    
**-----------------------------------------------------------------
**
**  rewritten from scratch - moseley Oct 17, 2001
**
*/

#include <assert.h>             /* for bug hunting */
#include "swish.h"
#include "mem.h"
#include "swstring.h"
#include "merge.h"
#include "error.h"
#include "search.h"
#include "index.h"
#include "hash.h"
#include "file.h"
#include "docprop.h"
#include "list.h"
#include "compress.h"
#include "metanames.h"
#include "db.h"
#include "dump.h"
#include "result_sort.h"
#include "swish_qsort.h"
#include "result_output.h"
#include "parse_conffile.h"
#include "stemmer.h"
#include "headers.h"

static void dup_header( SWISH *sw_input, SWISH *sw_output );
static void check_header_match( IndexFILE *in_index, SWISH *sw_output );
static void make_meta_map( IndexFILE *in_index, SWISH *sw_output);
static void load_filename_sort( SWISH *sw, IndexFILE *cur_index );
static IndexFILE *get_next_file_in_order( SWISH *sw_input );
static void add_file( FILE *filenum_map, IndexFILE *cur_index, SWISH *sw_output );
static int *get_map( FILE *filenum_map, IndexFILE *cur_index );
static void dump_index_words(SWISH * sw, IndexFILE * indexf, SWISH *sw_output );
static void write_word_pos( IndexFILE *indexf, SWISH *sw_output, int *file_num_map, int filenum, ENTRY *e, int metaID, unsigned int posdata );


// #define DEBUG_MERGE

/****************************************************************************
*  merge_indexes -- reads from input indexes, and outputs a new index
*
*
*****************************************************************************/

void merge_indexes( SWISH *sw_input, SWISH *sw_output )
{
    IndexFILE   *cur_index;
    FILE        *filenum_map;
    char        *tmpfilename;
    struct MOD_Index *idx_output = sw_output->Index;
    ENTRY       *e, *prev;
    int          hash,
                 sz_worddata,
                 saved_bytes,
                 tmpval,
                 filenum,
                 metaID = 0,
                 frequency,
                 loc_count = 0,
                 word_count = 0;
    sw_off_t     wordID;
    int          metadata_length = 0;
    unsigned char   *worddata;
    unsigned char   *s, *start;
    unsigned char   flag;
    unsigned int          local_posdata[MAX_STACK_POSITIONS];
    unsigned int         *posdata;
    int          i;

    /*******************************************************************************
    * Get ready to merge the indexes.  For each index:
    *   - check that it has the correct headers
    *   - create meta entries in output index, and create a map to convert metas
    *   - load an array of file numbers sorted by filename so we can merge-sort the filenames
    *   - set some initial defaults.
    *********************************************************************************/

    cur_index = sw_input->indexlist;
    while( cur_index  )
    {
        printf("Input index '%s' has %d files and %d words\n", cur_index->line, cur_index->header.totalfiles, cur_index->header.totalwords);

        if ( cur_index == sw_input->indexlist )
            /* Duplicate the first index's header into the output index */
            dup_header( sw_input, sw_output );
        else
            check_header_match( cur_index, sw_output );  // errors if headers don't match - don't really need to check first one since it was the one that was dupped


        make_meta_map( cur_index, sw_output);        // add metas to new index, and create map

        load_filename_sort( sw_input, cur_index );   // so can read in filename order

        cur_index->current_file = 0;
        cur_index->cur_prop = NULL;

#ifdef DEBUG_MERGE
        dump_metanames( sw_input, cur_index, 1 );
        dump_metanames( sw_output, sw_output->indexlist, 0 );
#endif

        cur_index = cur_index->next;
    }


#ifdef DEBUG_MERGE
    printf("----- Output Header (requires -H9) ----------\n");
    print_index_headers( sw_output->indexlist );
    printf("\n\n");
#endif



    /****************************************************************************
    *  Now, read in filename order (so can throw out duplicates)
    *  - read properties and write out to new index
    *  - write a temporary file of records to identify
    *       - indexfile
    *       - old filenum to new filenum mapping
    *       - total words per file, if set
    ****************************************************************************/

    /* place to store file number map and total words per file */
    filenum_map = create_tempfile(sw_input, F_WRITE_BINARY, "fnum", &tmpfilename, 0 );

    while( (cur_index = get_next_file_in_order( sw_input )) )
        add_file( filenum_map, cur_index, sw_output );



    /* Don't need the pre-sorted indexes any more */
    for ( cur_index = sw_input->indexlist; cur_index; cur_index = cur_index->next )
    {
        efree( cur_index->path_order );
        cur_index->path_order = NULL;
    }

    fclose( filenum_map );

    if ( !(filenum_map = fopen( tmpfilename, F_READ_BINARY )) )
        progerrno("failed to reopen '%s' :", tmpfilename );



    /****************************************************************************
    *  Finally, read the indexes one-by-one to read word and position data
    *  - reads through the temp file for each index to build a filenumber map
    *
    ****************************************************************************/

    /* 08/2002 jmruiz
    ** First of all, get all the words
    */
    cur_index = sw_input->indexlist;
    while( cur_index )
    {
        dump_index_words(sw_input, cur_index, sw_output);
        /* Get file_num_map for later processing */
        cur_index->merge_file_num_map = get_map( filenum_map, cur_index );
        cur_index = cur_index->next;
    }

    /* At this point we have all the words. Now we have to get worddata
    * and merge it
    */
    word_count = 0;
    printf("Processing words in index '%s': %6d words\r", sw_output->indexlist->line, word_count);
    fflush(stdout);
    /* walk the hash list to merge worddata */
    for (hash = 0; hash < VERYBIGHASHSIZE; hash++)
    {
        if (idx_output->hashentriesdirty[hash])
        {
            idx_output->hashentriesdirty[hash] = 0;
            for (e = idx_output->hashentries[hash]; e; e = e->next)
            {
                word_count++;
                /* Look up the word in every input index and get its worddata */
                cur_index = sw_input->indexlist;
                while( cur_index )
                {
                    DB_ReadWordHash(sw_input, e->word, &wordID, cur_index->DB);
                    /* If word exists in the index */
                    if(wordID)
                    {

                        DB_ReadWordData(sw_input, wordID, &worddata, &sz_worddata, &saved_bytes, cur_index->DB);
                        uncompress_worddata(&worddata,&sz_worddata,saved_bytes);

                        /* Now, parse word's data */
                        s = worddata;
                        tmpval = uncompress2(&s);     /* tfrequency */
                        metaID = uncompress2(&s);     /* metaID */

                        if (metaID)
                        {
                            metadata_length = uncompress2(&s);
                        }

                        filenum = 0;
                        start = s;

                        while(1)
                        {                   /* Read on all items */
                            uncompress_location_values(&s,&flag,&tmpval,&frequency);
                            filenum += tmpval;
                            /* Use stack array when possible to avoid malloc/free overhead */
                            if(frequency > MAX_STACK_POSITIONS)
                                posdata = (unsigned int *) emalloc(frequency * sizeof(unsigned int));
                            else
                                posdata = local_posdata;

                            /* Read the positions */
                            uncompress_location_positions(&s,flag,frequency,posdata);


                            /* now we have the word data */
                            for (i = 0; i < frequency; i++, loc_count++)
                                write_word_pos( cur_index, sw_output, cur_index->merge_file_num_map, filenum, e, metaID, posdata[i]);

                            if(e->tfrequency)
                            {
                                /* 08/2002 jmruiz - Call CompressCurrentLocEntry from time
                                ** to time to help addentry.
                                ** Otherwise addentry has to walk linked lists of positions
                                ** with thousands of elements, which makes the merge process
                                ** very slow.
                                */
                                if(!(loc_count % 100))
                                    CompressCurrentLocEntry(sw_output, e);
                            }


                            if(posdata != local_posdata)
                                efree(posdata);

                            /* Check for end of worddata */
                            if ((s - worddata) == sz_worddata)
                                break;   /* End of worddata */

                            /* Check for end of current metaID data */
                            if ( metadata_length == (s - start))
                            {
                                filenum = 0;
                                metaID = uncompress2(&s);
                                metadata_length = uncompress2(&s);
                                start = s;
                            }
                        }

                        if(e->tfrequency)
                            CompressCurrentLocEntry(sw_output, e);

                        efree(worddata);
                    }
                    cur_index = cur_index->next;
                }
                /* Coalesce the locations for each word to save memory.
                ** This makes use of the -e feature.
                ** Because we are processing one word at a time we can
                ** coalesce its data just once.
                */
                coalesce_word_locations(sw_output,e);

                if(!(word_count % 1000))
                {
                    /* Make zone available for reuse and save memory */
                    Mem_ZoneReset(sw_output->Index->currentChunkLocZone);
                    sw_output->Index->freeLocMemChain = NULL;
                    printf("Processing words in index '%s': %6d words\r", sw_output->indexlist->line, word_count);
                    fflush(stdout);   /* no newline is printed, so flush to make the progress line visible */
                }
            }
        }
    }

    printf("Processing words in index '%s': %6d words\n", sw_output->indexlist->line, word_count);
    fflush(stdout);

    cur_index = sw_input->indexlist;
    while( cur_index )
    {
        /* free the maps */
        efree( cur_index->merge_file_num_map );
        efree( cur_index->meta_map );
        cur_index->meta_map = NULL;
        cur_index = cur_index->next;
    }


#ifdef DEBUG_MERGE
    printf("----- Final Output Header (requires -H9) ----------\n");
    print_index_headers( sw_output->indexlist );
#endif

    remove( tmpfilename );
    efree( tmpfilename );


    /* 2002/09 MERGE fix jmruiz */
    /* Finally, remove words from the hash array with tfrequency == 0 */
    /* walk the hash list and unlink entries that ended up with no locations */
    for (word_count = 0, hash = 0; hash < VERYBIGHASHSIZE; hash++)
    {
        for (prev = NULL, e = idx_output->hashentries[hash]; e; e = e->next)
        {
            if( ! e->tfrequency )
            {
                word_count++;
                if( ! prev)   /* First in list */
                {
                    idx_output->hashentries[hash] = e->next;
                }
                else
                {
                    prev->next = e->next;
                }
                /* Adjust counters */
                idx_output->entryArray->numWords--;
                sw_output->indexlist->header.totalwords--;
            }
            else
            {
                prev = e;
            }
        }
    }
    printf("Removed %6d words no longer present in docs for index '%s'\n",
       word_count, sw_output->indexlist->line);

    /* 2002/09 MERGE FIX end */



}

/****************************************************************************
*  dup_header -- duplicates a header
*
*  rereads the header from the data base, and clears out some values
*
*****************************************************************************/

static void dup_header( SWISH *sw_input, SWISH *sw_output )
{
    INDEXDATAHEADER *out_header = &sw_output->indexlist->header;

    // probably need to free the sw_output header from what's created in swishnew.

    /* Read in the header from the first merge file and store in the output file */
    read_header(sw_input, out_header, sw_input->indexlist->DB);

    out_header->totalfiles = 0;

    /* $$$ This needs to be fixed */
    out_header->removedfiles = 0;
    out_header->removed_word_positions = 0;
    out_header->totalwords = 0;

    freeMetaEntries( out_header );

    /* Remove the date from the index */

    if ( out_header->indexedon )
    {
        efree( out_header->indexedon );
        out_header->indexedon = NULL;
        out_header->lenindexedon = 0;
    }
}

/****************************************************************************
*  check_header_match -- makes sure that the important settings match
*
*
*****************************************************************************/

// This assumes that the size will always precede the content.
typedef struct
{
    int     len;
    char    *str;
} *HEAD_CMP;

static void compare_header( char *index, char *name, void *in, void *out )
{
    HEAD_CMP    in_item = (HEAD_CMP)in;
    HEAD_CMP    out_item = (HEAD_CMP)out;

    if ( in_item->len != out_item->len )
        progerr("Header %s in index %s doesn't match output header in length", name, index );

    if ( strcmp( (const char *)in_item->str, (const char *)out_item->str ))
        progerr("Header %s in index %s doesn't match output header", name, index );

    //if ( memcmp( (const void *)in_item->str, (const void *)out_item->str, in_item->len ) )
    //    progerr("Header %s in index %s doesn't match output header", name, index );




}


static void check_header_match( IndexFILE *in_index, SWISH *sw_output )
{
    INDEXDATAHEADER *out_header = &sw_output->indexlist->header;
    INDEXDATAHEADER *in_header = &in_index->header;

    compare_header( in_index->line, "WordCharacters", &in_header->lenwordchars,  &out_header->lenwordchars );
    compare_header( in_index->line, "BeginCharacters", &in_header->lenbeginchars,  &out_header->lenbeginchars );
    compare_header( in_index->line, "EndCharacters", &in_header->lenendchars,  &out_header->lenendchars );

    compare_header( in_index->line, "IgnoreLastChar", &in_header->lenignorelastchar,  &out_header->lenignorelastchar );
    compare_header( in_index->line, "IgnoreFirstChar", &in_header->lenignorefirstchar,  &out_header->lenignorefirstchar );

    compare_header( in_index->line, "BumpPositionChars", &in_header->lenbumpposchars,  &out_header->lenbumpposchars );


    if ( fuzzy_mode_value(in_header->fuzzy_data) != fuzzy_mode_value(out_header->fuzzy_data) )
        progerr("FuzzyIndexingMode in index %s of '%s' doesn't match '%s'",
            in_index->line,
            fuzzy_string( in_header->fuzzy_data ),
            fuzzy_string( out_header->fuzzy_data ));

    if ( in_header->ignoreTotalWordCountWhenRanking != out_header->ignoreTotalWordCountWhenRanking )
        progerr("ignoreTotalWordCountWhenRanking setting doesn't match for index %s", in_index->line );

    /* memcmp() takes a byte count, so compare the whole table rather than sizeof(table) / sizeof(int) bytes */
    if ( memcmp( &in_header->translatecharslookuptable, &out_header->translatecharslookuptable, sizeof(in_header->translatecharslookuptable) ) )
        progerr("TranslateChars header doesn't match for index %s", in_index->line );


    //??? need to compare stopword lists

    //??? need to compare buzzwords

}

/****************************************************************************
*  make_meta_map - adds metanames to output index and creates map
*
*
*****************************************************************************/

static void make_meta_map( IndexFILE *in_index, SWISH *sw_output)
{
    INDEXDATAHEADER *out_header = &sw_output->indexlist->header;
    INDEXDATAHEADER *in_header = &in_index->header;
    int             i;
    struct metaEntry *in_meta;
    struct metaEntry *out_meta;
    int             *meta_map;

    meta_map = emalloc( sizeof( int ) * (in_header->metaCounter + 1) );
    memset( meta_map, 0, sizeof( int ) * (in_header->metaCounter + 1) );

    for( i = 0; i < in_header->metaCounter; i++ )
    {
        in_meta = in_header->metaEntryArray[i];


        /* Try to see if it's an existing metaname */
        out_meta = is_meta_index( in_meta )
                   ? getMetaNameByNameNoAlias( out_header, in_meta->metaName )
                   : getPropNameByNameNoAlias( out_header, in_meta->metaName );



        /* if meta from input header is not found in the output header then add it */
        if ( !out_meta )
            out_meta = cloneMetaEntry( out_header, in_meta ); /* can't fail */


        /* Validate that the two metas are indeed the same */
        /* This should be done in metanames.c, but error messages are harder */

        if (out_meta->metaType != in_meta->metaType )
            progerr("meta name %s in index %s has a different type than in output index", in_meta->metaName, in_index->line );

        if (out_meta->sort_len != in_meta->sort_len )
            progerr("meta name %s in index %s has different sort length than in output index", in_meta->metaName, in_index->line );

        if (out_meta->rank_bias != in_meta->rank_bias )
            progerr("meta name %s in index %s has different rank bias than in output index", in_meta->metaName, in_index->line );




        /* Now, save the mapping */
        meta_map[ in_meta->metaID ] = out_meta->metaID;


        /* 
         * now here's a pain, and lots of room for screw up.
         * Basically, check for alias mappings, and that they are correct
         * you can say title is an alias for swishtitle in one index, and then say
         * title is an alias for doctitle in another index, which would be an error.
         * So, if title is an alias for swishtitle, then the output index either
         * needs to have that alias already, or it must be created.
         */

        if ( in_meta->alias )
        {
            struct metaEntry *in_alias;
            struct metaEntry *out_alias;

            /* Grab alias meta entry so we can look it up in the out_header */

            in_alias = is_meta_index( in_meta )
                   ? getMetaNameByID( in_header, in_meta->alias )
                   : getPropNameByID( in_header, in_meta->alias );


            /* This should not happen -- it would be a very broken input header */
            if ( !in_alias )
                progerr("Failed to lookup alias for %s in index %s", in_meta->metaName, in_index->line );


            /* now lookup the alias in the out_header by name */
            out_alias = is_meta_index( in_alias )
                   ? getMetaNameByNameNoAlias( out_header, in_alias->metaName )
                   : getPropNameByNameNoAlias( out_header, in_alias->metaName );


            /* 
             * should be there, since it would have been added earlier 
             * the real metas must be added before the aliases 
             * */

            if ( !out_alias )
                progerr("Failed to lookup alias for %s in output index", out_meta->metaName );


            /* If this is new (i.e. not yet an alias in the output index), then just assign it */
            if ( !out_meta->alias )
                out_meta->alias = out_alias->metaID;

            /* else, if it is already an alias, but points someplace else, we have a problem */
            else if ( out_meta->alias != out_alias->metaID )
                progerr("In index %s metaname '%s' is an alias for '%s'(%d).  But another input index already mapped '%s' to '%s'(%d)", 
                        in_index->line, in_meta->metaName, in_alias->metaName, in_alias->metaID,
                        out_meta->metaName,
                        is_meta_index( out_meta )
                            ?  getMetaNameByID( out_header,  out_meta->alias )->metaName
                            :  getPropNameByID( out_header,  out_meta->alias )->metaName,
                        out_meta->alias
                        );
        }
    }

    in_index->meta_map = meta_map;


#ifdef DEBUG_MERGE
    printf(" %s   ->   %s  ** Meta Map **\n", in_index->line, sw_output->indexlist->line );
    for ( i=0; i<in_header->metaCounter + 1;i++)
        printf("%4d  ->  %3d\n", i, meta_map[i] );
#endif

}

/****************************************************************************
*  compnums - qsort callback used by load_filename_sort; orders file numbers
*  by their position in the pre-sorted path array
*
*
*****************************************************************************/

static int  *sorted_data;  /* Static array to make the qsort function a bit quicker */

static int     compnums(const void *s1, const void *s2)
{
    int         a = *(int *)s1; // filenumber passed from qsort
    int         b = *(int *)s2;
    int         v1 = sorted_data[ a-1 ];
    int         v2 = sorted_data[ b-1 ];

    // return v1 <=> v2;

    if ( v1 < v2 )
        return -1;
    if ( v1 > v2 )
        return 1;

    return 0;
}

/******************************************************************************
* load_filename_sort -
*
*   Creates an array used for sorting file names.
*   Uses the pre-sorted array, if available, otherwise, creates one.
*
*******************************************************************************/

static void load_filename_sort( SWISH *sw, IndexFILE *cur_index )
{
    struct metaEntry *path_meta = getPropNameByName( &cur_index->header, AUTOPROPERTY_DOCPATH );
    int         i;
    int         *sort_array;
    int         totalfiles = cur_index->header.totalfiles;

    if ( !path_meta )
        progerr("Can't merge index %s.  It doesn't contain the property %s", cur_index->line, AUTOPROPERTY_DOCPATH );


    /* Save for looking up pathname when sorting */
    cur_index->path_meta = path_meta;

    /* Case is important for most OS when comparing file names */
    cur_index->path_meta->metaType &= ~META_IGNORE_CASE;



    cur_index->modified_meta = getPropNameByName( &cur_index->header, AUTOPROPERTY_LASTMODIFIED );


    /*
     * Since USE_PRESORT_ARRAY has a different internal format than what is generated
     * by CreatePropSortArray() we must ALWAYS create an actual integer
     * array totalfiles long.
     * 
     * $$$ The problem is that with USE_PRESORT_ARRAY the format is different
     *     before and after saving the array to disk
     */

#ifdef USE_PRESORT_ARRAY
    if ( 1 )
#else
    if ( !LoadSortedProps( cur_index, path_meta ) )
#endif

    {
        FileRec fi;
        memset( &fi, 0, sizeof( FileRec ));
        path_meta->sorted_data = CreatePropSortArray( cur_index, path_meta, &fi, 1 );
    }


    /* So the qsort compare function can read it */
    sorted_data = path_meta->sorted_data;


    if ( !sorted_data )
        progerr("failed to load or create sorted properties for index %s", cur_index->line );


    sort_array = emalloc(  totalfiles * sizeof( int ) );
    memset( sort_array, 0, totalfiles * sizeof( int ) );


    /* build an array with file numbers and sort into filename order */
    for ( i = 0; i < totalfiles; i++ )
        sort_array[i] = i+1;  // file numbers start at one


    swish_qsort( sort_array, totalfiles, sizeof( int ), &compnums);

    cur_index->path_order = sort_array;

    /* $$$ can this be freed when using BTREE??? */
    efree( path_meta->sorted_data );
    path_meta->sorted_data = NULL;
}

/****************************************************************************
*  get_next_file_in_order -- grabs the next file entry from all the indexes
*  in filename (and then modified date) order
*
*
*****************************************************************************/

/* This isn't really accurate, as some other file may come and replace the newer */

static void print_file_removed(IndexFILE *older, propEntry *op, IndexFILE *newer, propEntry *np )
{

    char *p1, *d1, *p2, *d2;
    p1 = DecodeDocProperty( older->path_meta, older->cur_prop );
    d1 = DecodeDocProperty( older->modified_meta, op );

    p2 = DecodeDocProperty( newer->path_meta, newer->cur_prop );
    d2 = DecodeDocProperty( newer->modified_meta, np );

    printf("Replaced file '%s:%s %s' with '%s:%s %s'\n",
         older->line,
         *p1 ? p1 : "(file name not defined)",
         *d1 ? d1 : "(date not defined)",
         newer->line,
         *p2 ? p2 : "(file name not defined)",
         *d2 ? d2 : "(date not defined)"
    );

    efree( p1 );
    efree( d1 );
    efree( p2 );
    efree( d2 );

}


static IndexFILE *get_next_file_in_order( SWISH *sw_input )
{
    IndexFILE   *winner = NULL;
    IndexFILE   *cur_index = sw_input->indexlist;
    FileRec     fi;
    int         ret;
    propEntry   *wp, *cp;

    memset(&fi, 0, sizeof( FileRec ));

    for ( cur_index = sw_input->indexlist; cur_index; cur_index = cur_index->next )
    {
        /* don't use cached props, as they belong to a different index! */
        if ( fi.prop_index )
            efree( fi.prop_index );
        memset(&fi, 0, sizeof( FileRec ));

        /* still some to read in this index? */
        if ( cur_index->current_file >= cur_index->header.totalfiles )
            continue;



        /* get file number from lookup table */
        fi.filenum = cur_index->path_order[cur_index->current_file];

        if ( !cur_index->cur_prop )
            cur_index->cur_prop = ReadSingleDocPropertiesFromDisk(cur_index, &fi, cur_index->path_meta->metaID, 0 );


        if ( !winner )
        {
            winner = cur_index;
            continue;
        }

        ret = Compare_Properties( cur_index->path_meta, cur_index->cur_prop, winner->cur_prop );

        if ( ret != 0 )
        {
            if ( ret < 0 )  /* take cur_index if it's smaller */
                winner = cur_index;

            continue;
        }



        /* if they are the same name, then take the newest, and increment the older one */


        /* read the modified time for the current file */
        /* Use the same fi record, because it has the cached prop seek locations */
        cp = ReadSingleDocPropertiesFromDisk(cur_index, &fi, cur_index->modified_meta->metaID, 0 );


        /* read the modified time for the current winner */
        if ( fi.prop_index )
            efree( fi.prop_index );
        memset(&fi, 0, sizeof( FileRec ));

        fi.filenum = winner->path_order[winner->current_file];
        /* use the winner's own meta entry; meta IDs are not guaranteed to match across indexes */
        wp = ReadSingleDocPropertiesFromDisk(winner, &fi, winner->modified_meta->metaID, 0 );

        ret = Compare_Properties( cur_index->modified_meta, cp, wp );



        /* If current is greater (newer) then throw away winner */
        if ( ret > 0 )
        {
            print_file_removed( winner, wp, cur_index, cp);
            winner->current_file++;
            if ( winner->cur_prop )
                efree( winner->cur_prop );
            winner->cur_prop = NULL;
            winner = cur_index;
        }
        /* else, keep winner, and throw away current */
        else
        {
            print_file_removed(cur_index, cp, winner, wp );
            cur_index->current_file++;
            if ( cur_index->cur_prop )
                efree( cur_index->cur_prop );

            cur_index->cur_prop = NULL;
        }

        freeProperty( cp );
        freeProperty( wp );

    }

    if ( fi.prop_index )
        efree( fi.prop_index );


    if ( !winner )
        return NULL;


    winner->filenum = winner->path_order[winner->current_file++];

#ifdef DEBUG_MERGE
    printf("   Files in order: index %s file# %d winner\n", winner->line, winner->filenum );
#endif

    /* free prop, as it's not needed anymore */
    if ( winner->cur_prop )
        efree( winner->cur_prop );
    winner->cur_prop = NULL;


    return winner;
}


/****************************************************************************
*  add_file
*
*  Now, read in filename order (so can throw out duplicates)
*  - read properties and write out to new index
*  - write a temporary file of records to identify
*       - indexfile
*       - old filenum to new filenum mapping
*       - total words per file, if set
****************************************************************************/

static void add_file( FILE *filenum_map, IndexFILE *cur_index, SWISH *sw_output )
{
    FileRec             fi;
    IndexFILE           *indexf = sw_output->indexlist;
    struct MOD_Index    *idx = sw_output->Index;
    docProperties       *d;
    int                 i;
    propEntry           *tmp;
    docProperties       *docProperties=NULL;
    struct metaEntry    meta_entry;


    meta_entry.metaName = "(default)";  /* for error message, I think */


    memset( &fi, 0, sizeof( FileRec ));


#ifdef DEBUG_MERGE
    printf("Reading Properties from input index '%s' file %d\n", cur_index->line, cur_index->filenum);
#endif

    /* read the properties and map them as needed */
    d = ReadAllDocPropertiesFromDisk( cur_index, cur_index->filenum );


#ifdef DEBUG_MERGE
    fi.docProperties = d;
    dump_file_properties( cur_index, &fi );
#endif



    /* all these off-by-one things are a mess */

    /* read through all the property slots, and map them, as needed */
    for ( i = 0; i < d->n; i++ )
        if ( (tmp = d->propEntry[i]) )
        {
            meta_entry.metaID = cur_index->meta_map[ i ];
            addDocProperty(&docProperties, &meta_entry, tmp->propValue, tmp->propLen, 1 );
        }

#ifdef DEBUG_MERGE
    printf(" after mapping file %s\n", indexf->line);
    fi.docProperties = docProperties;
    dump_file_properties( cur_index, &fi );
    printf("\n");
#endif


    /* Now bump the file counter  */
    idx->filenum++;
    indexf->header.totalfiles++;

    if ( docProperties )  /* always true */
    {
        fi.filenum = idx->filenum;
        fi.docProperties = docProperties;

        WritePropertiesToDisk( sw_output , &fi );

        freeDocProperties( d );
    }




    /* now write out the data to be used for mapping file for a given index. */
    //    compress1( cur_index->filenum, filenum_map, fputc );   // what file number this came from

    if ( fwrite( &cur_index->filenum, sizeof(int), 1, filenum_map) != 1 )
        progerrno("Failed to write mapping data: ");

    if ( fwrite( &cur_index, sizeof(IndexFILE *), 1, filenum_map) != 1 )        // what index
        progerrno("Failed to write mapping data: ");


    /* Save total words per file */
    if ( !indexf->header.ignoreTotalWordCountWhenRanking )
    {
        INDEXDATAHEADER *header = &indexf->header;
        int idx1 = fi.filenum - 1;

        if ( !header->TotalWordsPerFile || idx1 >= header->TotalWordsPerFileMax )
        {
            header->TotalWordsPerFileMax += 20000;  /* random guess -- could be a config setting */
            header->TotalWordsPerFile = erealloc( header->TotalWordsPerFile, header->TotalWordsPerFileMax * sizeof(int) );
        }

        header->TotalWordsPerFile[idx1] = cur_index->header.TotalWordsPerFile[cur_index->filenum-1];
    }
}

/****************************************************************************
*  get_map -- builds an old_filenum -> new_filenum map
*
*  This lets you look up an old file number and map it to a new file number
*
****************************************************************************/

static int *get_map( FILE *filenum_map, IndexFILE *cur_index )
{
    int         *array = emalloc( (cur_index->header.totalfiles+1) * sizeof( int ) );
    IndexFILE   *idf;
    int         filenum;
    int         new_filenum = 0;



    memset( array, 0, (cur_index->header.totalfiles+1) * sizeof( int ) );


    clearerr( filenum_map );
    fseek( filenum_map, 0, SEEK_SET );  /* start at beginning */

    while ( 1 )
    {
        new_filenum++;

        if (!fread( &filenum, sizeof(int), 1, filenum_map))
            break;


        if(!fread( &idf, sizeof(IndexFILE *), 1, filenum_map))
            break;

        if ( idf == cur_index )
            array[filenum] = new_filenum;

    }

    return array;
}

/****************************************************************************
*  dump_index_words -- reads the index to get all the words
****************************************************************************/

static void dump_index_words(SWISH * sw, IndexFILE * indexf, SWISH *sw_output)
{
    int         j;
    int         word_count = 0;
    char        word[2];
    char       *resultword;
    sw_off_t    wordID;

    DB_InitReadWords(sw, indexf->DB);


    printf("Getting words in index '%s': %3d words\r", indexf->line, word_count);
    fflush(stdout);

    for(j=0;j<256;j++)
    {

        word[0] = (unsigned char) j; word[1] = '\0';
        DB_ReadFirstWordInvertedIndex(sw, word,&resultword,&wordID,indexf->DB);

        while(wordID)
        {
            /* Add resultword to output */
            getentry(sw_output, resultword);
            efree(resultword);
            DB_ReadNextWordInvertedIndex(sw, word,&resultword,&wordID,indexf->DB);
            word_count++;
            if(!(word_count % 10000))
                printf("Getting words in index '%s': %3d words\r", indexf->line, word_count);
        }
    }
    printf("Getting words in index '%s': %6d words\n", indexf->line, word_count);

    DB_EndReadWords(sw, indexf->DB);

}
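
/*
 * Illustration only, not part of swish-e.  A note on the periodic progress
 * report in dump_index_words(): unary ! binds tighter than %, so
 * "!n % k" means "(!n) % k" and is not a test for "n is a multiple of k".
 * The two hypothetical helpers below make the difference visible.
 */

```c
#include <assert.h>

/* Parsed as (!count) % interval: for interval > 1 this is true only
 * when count is 0. */
static int due_unparenthesized(int count, int interval)
{
    assert(interval > 0);
    return !count % interval;
}

/* True whenever count is a multiple of interval. */
static int due_parenthesized(int count, int interval)
{
    assert(interval > 0);
    return !(count % interval);
}
```

/*
 * With interval 10000, due_parenthesized() fires at 10000, 20000, and so
 * on, which is what a progress counter needs.
 */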

/****************************************************************************
*  Writes a word out to the index
*
*
****************************************************************************/

static void write_word_pos( IndexFILE *indexf, SWISH *sw_output, int *file_num_map, int filenum, ENTRY *e, int metaID, unsigned int posdata )
{
    int         new_file;
    int         new_meta;

#ifdef DEBUG_MERGE
    printf("\nindex %s '%s' Struct: %d Pos: %d",
    indexf->line, e->word, GET_STRUCTURE(posdata), GET_POSITION(posdata) );


    if ( !(new_file = file_num_map[ filenum ]) )
    {
        printf("  file: %d **File deleted!**\n", filenum);
        return;
    }

    if ( !(new_meta = indexf->meta_map[ metaID ] ))
    {
        printf("  file: %d **Failed to map meta ID **\n", filenum);
        return;
    }

    printf("  File: %d -> %d  Meta: %d -> %d\n", filenum, new_file, metaID, new_meta );

    addentry( sw_output, e, new_file, GET_STRUCTURE(posdata), new_meta, GET_POSITION(posdata) );

    return;


#else


    if ( !(new_file = file_num_map[ filenum ]) )
        return;

    if ( !(new_meta = indexf->meta_map[ metaID ] ))
        return;

    addentry( sw_output, e, new_file, GET_STRUCTURE(posdata), new_meta, GET_POSITION(posdata) );

    return;

#endif


}

swish-e-2.4.7/src/fs.h
/* 
fs.h
$Id: fs.h 1736 2005-05-12 15:41:22Z karman $

    This file is part of Swish-e.

    Swish-e is free software; you can redistribute it and/or modify
    it under the terms of the GNU General Public License as published by
    the Free Software Foundation; either version 2 of the License, or
    (at your option) any later version.

    Swish-e is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
    GNU General Public License for more details.

    You should have received a copy of the GNU General Public License
    along  with Swish-e; if not, write to the Free Software
    Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
    
    See the COPYING file that accompanies the Swish-e distribution for details
    of the GNU GPL and the special exception available for linking against
    the Swish-e library.
    
** Mon May  9 18:19:34 CDT 2005
** added GPL



*/
#ifndef __HasSeenModule_FS
#define __HasSeenModule_FS       1


#ifdef __cplusplus
extern "C" {
#endif


/*
   -- module data
*/

typedef struct
{
    regex_list  *pathname;
    regex_list  *dirname;
    regex_list  *filename;
    regex_list  *dircontains;
    regex_list  *title;
    
}
PATH_LIST;

struct MOD_FS
{
    PATH_LIST   filerules;
    PATH_LIST   filematch;
    int         followsymlinks;

};


void initModule_FS (SWISH *);
void freeModule_FS (SWISH *);
int  configModule_FS (SWISH *, StringList *);

#ifdef __cplusplus
}
#endif /* __cplusplus */



#endif
swish-e-2.4.7/src/parse_conffile.c
/*
$Id: parse_conffile.c 1945 2007-10-22 14:54:07Z karpet $
**

    This file is part of Swish-e.

    Swish-e is free software; you can redistribute it and/or modify
    it under the terms of the GNU General Public License as published by
    the Free Software Foundation; either version 2 of the License, or
    (at your option) any later version.

    Swish-e is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
    GNU General Public License for more details.

    You should have received a copy of the GNU General Public License
    along  with Swish-e; if not, write to the Free Software
    Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
    
    See the COPYING file that accompanies the Swish-e distribution for details
    of the GNU GPL and the special exception available for linking against
    the Swish-e library.
    
** Mon May  9 14:55:38 CDT 2005
** added GPL

*/

/* New file created from file.c 02/2001 jmruiz */
/* Contains routines for parsing the configuration file */

/*
** 2001-02-15 rasc    ResultExtFormatName
** 2001-03-03 rasc    EnableAltaVistaSyntax
**                    code optimize: getYesNoOrAbort
** 2001-03-13 rasc    SwishSearchOperators, SwishSearchDefaultRule
** 2001-03-16 rasc    TruncateDocSize nbytes
** 2001-04-09 rasc    Filters: options (opt.)
**
*/


#include <limits.h>     // for INT_MAX, ULONG_MAX

#include "swish.h"
#include "swstring.h"
#include "mem.h"
#include "list.h"
#include "file.h"
#include "metanames.h"
#include "hash.h"
#include "error.h"
#include "entities.h"
#include "filter.h"
#include "index.h"
#include "search.h"
/* #include "search_alt.h" */
#include "parse_conffile.h"
#include "merge.h"   /* Argh, needed for docprop.h */
#include "docprop.h"
#include "result_output.h"
/* removed stuff
#include "deflate.h"
*/
#include "result_sort.h"
#include "db.h"
#include "extprog.h"
#include "stemmer.h"
#ifdef HAVE_ZLIB
#include <zlib.h>
#endif




static int read_integer( char *string,  char *message, int low, int high );
static void Build_ReplaceRules( char *name, char **params, regex_list **reg_list );
static  void add_ExtractPath( char * name, SWISH *sw, struct metaEntry *m, char **params );
static int     getDocTypeOrAbort(StringList * sl, int n);
static int parseconfline(SWISH *, StringList *);
static void get_undefined_meta_flags( char *w0, StringList * sl, UndefMetaFlag *setting );

static void readwordsfile(WORD_HASH_TABLE *table_ptr, char *stopw_file);
static void word_hash_config(StringList *sl, WORD_HASH_TABLE *table_ptr );

static char *read_line_from_file( int * linenum, FILE *fp );
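
/*
 * Illustration only, not part of swish-e.  read_integer(), declared above,
 * takes a string plus a low and high bound.  Its body is not shown here, so
 * the following is only a sketch of how a range-checked integer parse can
 * be written with strtol().  Assumptions: the name parse_int_in_range is
 * hypothetical, and errors are reported through the return value instead of
 * aborting the way progerr() does.
 */

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Parse a base-10 integer and require low <= value <= high.
 * Returns 1 and stores the value in *out on success, 0 on any error. */
static int parse_int_in_range(const char *s, int low, int high, int *out)
{
    char *end;
    long  val;

    assert(s != NULL && out != NULL);

    errno = 0;
    val = strtol(s, &end, 10);

    if (end == s || *end != '\0')    /* no digits, or trailing junk */
        return 0;

    if (errno == ERANGE)             /* does not fit in a long */
        return 0;

    if (val < low || val > high)     /* outside the allowed range */
        return 0;

    *out = (int)val;
    return 1;
}
```

/*
 * For example, a directive limited to 0..4 accepts "4" and rejects "5",
 * "3x", and the empty string.
 */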

void fuzzy_or_die( IndexFILE *indexf, char *mode )
{
    indexf->header.fuzzy_data = set_fuzzy_mode( indexf->header.fuzzy_data, mode );
    if ( !indexf->header.fuzzy_data )
        progerr("Invalid FuzzyIndexingMode '%s' in config file", mode );
}
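
/*
 * Illustration only, not part of swish-e.  getdefaults() below compares the
 * first word of each config line against directive names with strcasecmp()
 * and then checks the word count (sl->n, which includes the directive
 * itself).  The sketch below expresses the same two checks as a small
 * lookup table.  Assumptions: the table, the enum, and check_directive are
 * hypothetical; only four directives are listed; strcasecmp() comes from
 * the POSIX <strings.h> header.
 */

```c
#include <assert.h>
#include <stddef.h>
#include <strings.h>   /* strcasecmp (POSIX) */

enum arg_rule {
    ARGS_EXACTLY_ONE,            /* n == 2: directive plus one value */
    ARGS_AT_LEAST_ONE            /* n  > 1: directive plus one or more values */
};

struct directive {
    const char   *name;
    enum arg_rule rule;
};

static const struct directive directives[] = {
    { "IndexDir",    ARGS_AT_LEAST_ONE },
    { "IndexFile",   ARGS_EXACTLY_ONE  },
    { "IndexReport", ARGS_EXACTLY_ONE  },
    { "NoContents",  ARGS_AT_LEAST_ONE },
};

/* name is the first word on the line; n counts all words on the line.
 * Returns 1 if the directive is known and the count is acceptable,
 * 0 if the count is wrong, -1 if the directive is unknown. */
static int check_directive(const char *name, int n)
{
    size_t i;

    assert(name != NULL);

    for (i = 0; i < sizeof directives / sizeof directives[0]; i++) {
        if (strcasecmp(name, directives[i].name) != 0)
            continue;

        if (directives[i].rule == ARGS_EXACTLY_ONE)
            return n == 2;

        return n > 1;
    }

    return -1;
}
```

/*
 * A table like this keeps the argument-count rules in one place; the chain
 * of if blocks used below has the advantage that each directive's handling
 * sits next to its check.
 */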

/* Reads the configuration file and puts all the right options
** in the right variables and structures.
*/

void    getdefaults(SWISH * sw, char *conffile, int *hasdir, int *hasindex, int hasverbose)
{
    int     i,
            gotdir,
            gotindex;
    char   *line = NULL;
    FILE   *fp;
    int     linenumber = 0;
    int     baddirective = 0;
    StringList *sl;
    IndexFILE *indexf = NULL;
    unsigned char *StringValue = NULL;
    struct swline *tmplist;
    char   *w0;

    gotdir = gotindex = 0;

    if ((fp = fopen(conffile, F_READ_TEXT)) == NULL || !isfile(conffile))
        progerrno("Couldn't open the configuration file '%s': ", conffile);

    if ( sw->verbose >= 2 )
        printf("Parsing config file '%s'\n", conffile );


    /* Init default index file */
    addindexfile(sw, INDEXFILE);
    indexf = sw->indexlist;



    sl = NULL;

    while ( !feof( fp ) )
    {
        /* Free previous line */
        if ( line )
            efree( line );

        /* Read a line */
        line = read_line_from_file( &linenumber, fp );

        if ( sl )
            freeStringList(sl);

        /* Parse line */
        if (!(sl = parse_line(line)))
            continue;

        if (!sl->n)
            continue;

        w0 = sl->word[0];       /* config directive = first word */

        if (w0[0] == '#')
            continue;           /* comment */



        if (strcasecmp(w0, "IndexDir") == 0)
        {
            if (sl->n > 1)
            {
                if (!*hasdir)
                {
                    gotdir = 1;
                    grabCmdOptions(sl, 1, &sw->dirlist);
                }
            }
            else
                progerr("%s: requires at least one value", w0);

            continue;
        }


        if (strcasecmp(w0, "IncludeConfigFile") == 0)
        {
            if (sl->n == 2)
            {
                normalize_path( sl->word[1] );
                getdefaults(sw, sl->word[1], hasdir, hasindex, hasverbose);
            }
            else
                progerr("%s: requires one value", w0);

            continue;
        }


        if (strcasecmp(w0, "NoContents") == 0)
        {
            if (sl->n > 1)
            {
                grabCmdOptions(sl, 1, &sw->nocontentslist);
            }
            else
                progerr("%s: requires at least one value", w0);

            continue;
        }


        if (strcasecmp(w0, "IndexFile") == 0)
        {
            if (!(*hasindex))
            {
                if (sl->n == 2)
                {
                    gotindex = 1;
                    if (indexf->line)
                        efree(indexf->line);
                    indexf->line = estrdup(sl->word[1]);
                    normalize_path( indexf->line );
                }
                else
                    progerr("%s: requires one value", w0);
            }

            continue;
        }


        if (strcasecmp(w0, "IndexReport") == 0)
        {
            if (sl->n == 2)
            {
                if (!hasverbose)
                    sw->verbose =  read_integer( sl->word[1], w0, 0, 4 );
            }
            else
                progerr("%s: requires one value", w0);
            continue;
        }

/* karman: it would be nice to be able to override the ParserWarnLevel in
           conf file via cmd line -W N
           right now, it is the opposite: conf file overrides -W at cmd line
*/

        if (strcasecmp(w0, "ParserWarnLevel") == 0)
        {
            if (sl->n == 2)
                sw->parser_warn_level = read_integer( sl->word[1], w0, 0, 9 );
            else
                progerr("%s: requires one value", w0);
            continue;
        }


        if (strcasecmp(w0, "obeyRobotsNoIndex") == 0)
        {
            sw->obeyRobotsNoIndex = getYesNoOrAbort(sl, 1, 1);
            continue;
        }


        if (strcasecmp(w0, "AbsoluteLinks") == 0)
        {
            sw->AbsoluteLinks = getYesNoOrAbort(sl, 1, 1);
            continue;
        }




        if (strcasecmp(w0, "MinWordLimit") == 0)
        {
            if (sl->n == 2)
            {
                indexf->header.minwordlimit = read_integer( sl->word[1], w0, 0, INT_MAX );
            }
            else
                progerr("%s: requires one value", w0);
            continue;
        }

        if (strcasecmp(w0, "MaxWordLimit") == 0)
        {
            if (sl->n == 2)
            {
                indexf->header.maxwordlimit = read_integer( sl->word[1], w0, 0, INT_MAX );
            }
            else
                progerr("%s: requires one value", w0);

            continue;
        }


        if (strcasecmp(w0, "IndexComments") == 0)
        {
            sw->indexComments = getYesNoOrAbort(sl, 1, 1);
            continue;
        }


        if (strcasecmp(w0, "IgnoreNumberChars") == 0)
        {
            if (sl->n == 2)
            {
                indexf->header.numberchars = SafeStrCopy(indexf->header.numberchars, sl->word[1], &indexf->header.lennumberchars);
                sortstring(indexf->header.numberchars);
                makelookuptable(indexf->header.numberchars, indexf->header.numbercharslookuptable);
                indexf->header.numberchars_used_flag = 1;  /* Flag that it is used */
            }
            else
                progerr("%s: requires one value (a set of characters)", w0);

            continue;
        }


        if (strcasecmp(w0, "WordCharacters") == 0)
        {
            if (sl->n == 2)
            {
                indexf->header.wordchars = SafeStrCopy(indexf->header.wordchars, sl->word[1], &indexf->header.lenwordchars);
                sortstring(indexf->header.wordchars);
                makelookuptable(indexf->header.wordchars, indexf->header.wordcharslookuptable);
            }
            else
                progerr("%s: requires one value", w0);

            continue;
        }


        if (strcasecmp(w0, "BeginCharacters") == 0)
        {
            if (sl->n == 2)
            {
                indexf->header.beginchars = SafeStrCopy(indexf->header.beginchars, sl->word[1], &indexf->header.lenbeginchars);
                sortstring(indexf->header.beginchars);
                makelookuptable(indexf->header.beginchars, indexf->header.begincharslookuptable);
            }
            else
                progerr("%s: requires one value", w0);
            continue;
        }

        if (strcasecmp(w0, "EndCharacters") == 0)
        {
            if (sl->n == 2)
            {
                indexf->header.endchars = SafeStrCopy(indexf->header.endchars, sl->word[1], &indexf->header.lenendchars);
                sortstring(indexf->header.endchars);
                makelookuptable(indexf->header.endchars, indexf->header.endcharslookuptable);
            }
            else
                progerr("%s: requires one value", w0);

            continue;
        }


        if (strcasecmp(w0, "IgnoreLastChar") == 0)
        {
            if (sl->n == 2)
            {
                indexf->header.ignorelastchar = SafeStrCopy(indexf->header.ignorelastchar, sl->word[1], &indexf->header.lenignorelastchar);
                sortstring(indexf->header.ignorelastchar);
                makelookuptable(indexf->header.ignorelastchar, indexf->header.ignorelastcharlookuptable);
            }                   /* Do nothing */
            /* else progerr("%s: requires one value",w0); */

            continue;
        }

        if (strcasecmp(w0, "IgnoreFirstChar") == 0)
        {
            if (sl->n == 2)
            {
                indexf->header.ignorefirstchar = SafeStrCopy(indexf->header.ignorefirstchar, sl->word[1], &indexf->header.lenignorefirstchar);
                sortstring(indexf->header.ignorefirstchar);
                makelookuptable(indexf->header.ignorefirstchar, indexf->header.ignorefirstcharlookuptable);
            }                   /* Do nothing */
            /*  else progerr("%s: requires one value",w0); */

            continue;
        }


        if (strcasecmp(w0, "ReplaceRules") == 0)
        {
            if (sl->n > 2)
                Build_ReplaceRules( w0, sl->word, &sw->replaceRegexps );
            else
                progerr("%s: requires at least two values", w0);

            continue;
        }


        if (strcasecmp(w0, "IndexName") == 0)
        {
            if (sl->n > 1)
            {
                StringValue = StringListToString(sl, 1);
                indexf->header.indexn = SafeStrCopy(indexf->header.indexn, (char *)StringValue, &indexf->header.lenindexn);
                efree(StringValue);
            }
            else
                progerr("%s: requires a value", w0);
            continue;
        }


        if (strcasecmp(w0, "IndexDescription") == 0)
        {
            if (sl->n > 1)
            {
                StringValue = StringListToString(sl, 1);
                indexf->header.indexd = SafeStrCopy(indexf->header.indexd, (char *)StringValue, &indexf->header.lenindexd);
                efree(StringValue);
            }
            else
                progerr("%s: requires a value", w0);

            continue;
        }


        if (strcasecmp(w0, "IndexPointer") == 0)
        {
            if (sl->n > 1)
            {
                StringValue = StringListToString(sl, 1);
                indexf->header.indexp = SafeStrCopy(indexf->header.indexp, (char *)StringValue, &indexf->header.lenindexp);
                efree(StringValue);
            }
            else
                progerr("%s: requires a value", w0);

            continue;
        }


        if (strcasecmp(w0, "IndexAdmin") == 0)
        {
            if (sl->n > 1)
            {
                StringValue = StringListToString(sl, 1);
                indexf->header.indexa = SafeStrCopy(indexf->header.indexa, (char *)StringValue, &indexf->header.lenindexa);
                efree(StringValue);
            }
            else
                progerr("%s: requires a value", w0);

            continue;
        }


        if (strcasecmp(w0, "UseStemming") == 0)
        {
            progwarn("UseStemming is deprecated.  See FuzzyIndexingMode in the docs");
            if ( getYesNoOrAbort(sl, 1, 1) )
                fuzzy_or_die( indexf, "Stemming_en" );

            continue;
        }

        if (strcasecmp(w0, "UseSoundex") == 0)
        {
            if ( getYesNoOrAbort(sl, 1, 1) )
                fuzzy_or_die( indexf, "Soundex" );

            continue;
        }


        if (strcasecmp(w0, "FuzzyIndexingMode") == 0)
        {
            if (sl->n != 2)
                progerr("%s: requires one value", w0);

            fuzzy_or_die( indexf, sl->word[1] );
            continue;
        }



        if (strcasecmp(w0, "IgnoreTotalWordCountWhenRanking") == 0)
        {
            indexf->header.ignoreTotalWordCountWhenRanking = getYesNoOrAbort(sl, 1, 1);
            continue;
        }



        if (strcasecmp(w0, "TranslateCharacters") == 0)
        {
            if (sl->n >= 2)
            {
                if (!BuildTranslateChars(indexf->header.translatecharslookuptable, (unsigned char *)sl->word[1], (unsigned char *)sl->word[2]))
                {
                    progerr("%s: requires two values (same length) or one translation rule", w0);
                }
            }
            continue;
        }


        if (strcasecmp(w0, "ExtractPath") == 0)
        {
            struct metaEntry *m;
            char **words;

            if (sl->n < 4)
                progerr("%s: requires at least three values: metaname, expression type, and an expression/strings", w0);

            if ( !( m = getMetaNameByName( &indexf->header, sl->word[1])) )
                m = addMetaEntry(&indexf->header, sl->word[1], META_INDEX, 0);

            words = sl->word;
            words++;  /* past metaname */
            add_ExtractPath( w0, sw, m, words );

            continue;
        }

        if (strcasecmp(w0, "ExtractPathDefault") == 0)
        {
            struct metaEntry *m;

            if (sl->n != 3)
                progerr("%s: requires two values: metaname default_value", w0);

            if ( !( m = getMetaNameByName( &indexf->header, sl->word[1])) )
                m = addMetaEntry(&indexf->header, sl->word[1], META_INDEX, 0);

            if ( m->extractpath_default )
                progerr("%s already defined for meta '%s' as '%s'", w0, m->metaName, m->extractpath_default );

            m->extractpath_default = estrdup( sl->word[2] );

            continue;
        }




        if (strcasecmp(w0, "MetaNames") == 0)
        {
            if (sl->n <= 1)
                progerr("%s: requires at least one value", w0);

            for (i = 1; i < sl->n; i++)
            {
                if ( getMetaNameByName( &indexf->header, sl->word[i]) )
                    progerr("%s - name '%s' is already a MetaName", w0, sl->word[i] );

                addMetaEntry(&indexf->header, sl->word[i], META_INDEX, 0);
            }

            continue;
        }



        if (strcasecmp(w0, "MetaNameAlias") == 0)
        {
            struct metaEntry *meta_entry;
            struct metaEntry *new_meta;

            if (sl->n < 3)
                progerr("%s: requires at least two values", w0);


            /* Make sure first entry is not an alias */
            /* Lookup entry, and do not follow alias */
            if ( !(meta_entry = getMetaNameByNameNoAlias( &indexf->header, sl->word[1]) ) )
                progerr("%s - name '%s' not a MetaName", w0, sl->word[1] );


            if ( meta_entry->alias )
                progerr("%s - name '%s' must not be an alias", w0, sl->word[1] );


            for (i = 2; i < sl->n; i++)
            {
                if ( getMetaNameByNameNoAlias( &indexf->header, sl->word[i]) )
                    progerr("%s - name '%s' is already a MetaName or MetaNameAlias", w0, sl->word[i] );

                new_meta = addMetaEntry(&indexf->header, sl->word[i], meta_entry->metaType, 0);
                new_meta->alias = meta_entry->metaID;
            }

            continue;
        }


        /* Allow setting a bias on MetaNames */

        if (strcasecmp(w0, "MetaNamesRank") == 0)
        {
            struct metaEntry *meta_entry;
            int               rank = 0;

            if (sl->n < 3)
                progerr("%s: requires two or more values: a rank (integer) and a list of meta names", w0);


            rank = read_integer( sl->word[1], w0, -RANK_BIAS_RANGE, RANK_BIAS_RANGE  );  // NOTE: if this is changed db.c must match


            for (i = 2; i < sl->n; i++)
            {
                /* already exists? */
                if ( (meta_entry = getMetaNameByNameNoAlias( &indexf->header, sl->word[i])) )
                {
                    if ( meta_entry->alias )
                        progerr("Can't assign a rank to metaname '%s': it is an alias", meta_entry->metaName );

                    if ( meta_entry->rank_bias )
                        progwarn("Why are you redefining the rank of metaname '%s'?", meta_entry->metaName );
                }
                else
                    meta_entry = addMetaEntry(&indexf->header, sl->word[i], META_INDEX, 0);


                meta_entry->rank_bias = rank;
            }

            continue;
        }





        /* Meta name to extract out <a href> links */
        if (strcasecmp(w0, "HTMLLinksMetaName") == 0)
        {
            if (sl->n <= 1)
                progerr("%s: requires one value", w0);

            if ( !( sw->links_meta = getMetaNameByName( &indexf->header, sl->word[1]) ))
                sw->links_meta = addMetaEntry(&indexf->header, sl->word[1], META_INDEX, 0);

            continue;
        }


        /* What to do with IMG ALT attributes? */
        if (strcasecmp(w0, "IndexAltTagMetaName") == 0)
        {
            if (sl->n <= 1)
                progerr("%s: requires one value", w0);

            if ( strcasecmp( sl->word[1], "as-text" ) == 0)
            {
                sw->IndexAltTag = 1;
                if ( sw->IndexAltTagMeta )
                {
                    efree( sw->IndexAltTagMeta );
                    sw->IndexAltTagMeta = NULL;
                }
            }
            else
            {
                sw->IndexAltTag = 1;
                if ( sw->IndexAltTagMeta )
                {
                    efree( sw->IndexAltTagMeta );
                    sw->IndexAltTagMeta = NULL;
                }
                sw->IndexAltTagMeta = estrdup( sl->word[1] );
            }
            continue;
        }




        /* Meta name to extract out <img src> links */
        if (strcasecmp(w0, "ImageLinksMetaName") == 0)
        {
            if (sl->n <= 1)
                progerr("%s: requires one value", w0);

            if ( !( sw->images_meta = getMetaNameByName( &indexf->header, sl->word[1]) ))
                sw->images_meta = addMetaEntry(&indexf->header, sl->word[1], META_INDEX, 0);

            continue;
        }





        if (strcasecmp(w0, "PropCompressionLevel") == 0)
        {

#ifdef HAVE_ZLIB
            if (sl->n == 2)
            {
                sw->PropCompressionLevel = read_integer( sl->word[1], w0, 0, 9 );
            }
            else
                progerr("%s: requires one value", w0);
#else
            progwarn("%s: Swish not built with zlib support -- cannot compress", w0);
#endif
            continue;
        }




        if (strcasecmp(w0, "PropertyNames") == 0)
        {
            if (sl->n <= 1)
                progerr("%s: requires at least one value", w0);

            for (i = 1; i < sl->n; i++)
            {
                if ( getPropNameByName( &indexf->header, sl->word[i]) )
                    progerr("%s - name '%s' is already a PropertyName", w0, sl->word[i] );

                addMetaEntry(&indexf->header, sl->word[i], META_PROP|META_STRING|META_IGNORE_CASE, 0);
            }

            continue;
        }


        if (strcasecmp(w0, "PropertyNamesUseStrcoll") == 0)
#ifndef HAVE_STRCOLL
            progerr("Option %s is not available on this platform",w0);
#else
        {
            struct metaEntry *m;

            if (sl->n <= 1)
                progerr("%s: requires at least one value", w0);

            for (i = 1; i < sl->n; i++)
            {
                if ( !(m = getPropNameByName( &indexf->header, sl->word[i])) )
                    addMetaEntry(&indexf->header, sl->word[i], META_PROP|META_STRING|META_USE_STRCOLL, 0);
                else
                {
                    if ( !is_meta_string( m ) )
                        progerr("%s - name '%s' is not a STRING type of Property", w0, sl->word[i] );

                    m->metaType |= META_USE_STRCOLL;
                }
            }

            continue;
        }
#endif

        if (strcasecmp(w0, "PropertyNamesIgnoreCase") == 0)
        {
            struct metaEntry *m;

            if (sl->n <= 1)
                progerr("%s: requires at least one value", w0);

            for (i = 1; i < sl->n; i++)
            {
                if ( !(m = getPropNameByName( &indexf->header, sl->word[i])) )
                    addMetaEntry(&indexf->header, sl->word[i], META_PROP|META_STRING|META_IGNORE_CASE, 0);
                else
                {
                    if ( !is_meta_string( m ) )
                        progerr("%s - name '%s' is not a STRING type of Property", w0, sl->word[i] );

                    m->metaType |= META_IGNORE_CASE;
                }
            }

            continue;
        }



        if (strcasecmp(w0, "PropertyNamesCompareCase") == 0)
        {
            struct metaEntry *m;

            if (sl->n <= 1)
                progerr("%s: requires at least one value", w0);

            for (i = 1; i < sl->n; i++)
            {
                if ( !(m = getPropNameByName( &indexf->header, sl->word[i])) )
                    addMetaEntry(&indexf->header, sl->word[i], META_PROP|META_STRING, 0);
                else
                {

                    if ( !is_meta_string( m ) )
                        progerr("%s - name '%s' is not a STRING type of Property", w0, sl->word[i] );

                    m->metaType &= ~META_IGNORE_CASE;
                }
            }

            continue;
        }

       /* --- this is duplicating.. */

        if (strcasecmp(w0, "PropertyNamesNoStripChars") == 0)
        {
            struct metaEntry *m;

            if (sl->n <= 1)
                progerr("%s: requires at least one value", w0);

            for (i = 1; i < sl->n; i++)
            {
                if ( !(m = getPropNameByName( &indexf->header, sl->word[i])) )
                    addMetaEntry(&indexf->header, sl->word[i], META_PROP|META_STRING|META_IGNORE_CASE|META_NOSTRIP, 0);
                else
                {
                    if ( !is_meta_string( m ) )
                        progerr("%s - name '%s' is not a STRING type of Property", w0, sl->word[i] );

                    m->metaType |= META_NOSTRIP;
                }
            }

            continue;
        }



        if (strcasecmp(w0, "PropertyNamesStripChars") == 0)
        {
            struct metaEntry *m;

            if (sl->n <= 1)
                progerr("%s: requires at least one value", w0);

            for (i = 1; i < sl->n; i++)
            {
                if ( !(m = getPropNameByName( &indexf->header, sl->word[i])) )
                    addMetaEntry(&indexf->header, sl->word[i], META_PROP|META_STRING|META_IGNORE_CASE, 0);
                else
                {

                    if ( !is_meta_string( m ) )
                        progerr("%s - name '%s' is not a STRING type of Property", w0, sl->word[i] );

                    m->metaType &= ~META_NOSTRIP;
                }
            }

            continue;
        }



        if (strcasecmp(w0, "PropertyNamesNumeric") == 0)
        {
            if (sl->n <= 1)
                progerr("%s: requires at least one value", w0);

            for (i = 1; i < sl->n; i++)
            {
                if ( getPropNameByName( &indexf->header, sl->word[i]) )
                    progerr("%s - name '%s' is already a PropertyName", w0, sl->word[i] );

                addMetaEntry(&indexf->header, sl->word[i], META_PROP|META_NUMBER, 0);
            }

            continue;
        }
        if (strcasecmp(w0, "PropertyNamesDate") == 0)
        {
            if (sl->n <= 1)
                progerr("%s: requires at least one value", w0);

            for (i = 1; i < sl->n; i++)
            {
                if ( getPropNameByName( &indexf->header, sl->word[i]) )
                    progerr("%s - name '%s' is already a PropertyName", w0, sl->word[i] );

                addMetaEntry(&indexf->header, sl->word[i], META_PROP|META_DATE, 0);
            }

            continue;
        }


        if (strcasecmp(w0, "PropertyNameAlias") == 0)
        {
            struct metaEntry *meta_entry;
            struct metaEntry *new_meta;

            if (sl->n < 3)
                progerr("%s: requires at least two values", w0);


            /* Make sure first entry is not an alias */
            /* Lookup entry, and do not follow alias */
            if ( !(meta_entry = getPropNameByNameNoAlias( &indexf->header, sl->word[1]) ) )
                progerr("%s - name '%s' not a PropertyName", w0, sl->word[1] );


            if ( meta_entry->alias )
                progerr("%s - name '%s' must not be an alias", w0, sl->word[1] );


            for (i = 2; i < sl->n; i++)
            {
                if ( getPropNameByNameNoAlias( &indexf->header, sl->word[i]) )
                    progerr("%s - name '%s' is already a PropertyName or PropertyNameAlias", w0, sl->word[i] );

                new_meta = addMetaEntry(&indexf->header, sl->word[i], meta_entry->metaType, 0);
                new_meta->alias = meta_entry->metaID;
            }

            continue;
        }


        /* This allows setting a limit on a property's string length */
        // One question is whether this should set the length on the alias or on the real property.
        // If on the alias then you could really fine tune:
        //    PropertyNames description
        //    PropertyNameAlias description td h1 h2 h3
        //    PropertyNamesMaxLength 5000 description
        //    PropertyNamesMaxLength 100 td
        //    PropertyNamesMaxLength 10 h1 h2 h3
        // then the total length would be 5000, but each one would be limited, too.  It is hard to imagine
        // that being useful, so the current design only allows assigning a length to a non-alias.


        if (strcasecmp(w0, "PropertyNamesMaxLength") == 0)
        {
            struct metaEntry *meta_entry;
            int               max_length = 0;

            if (sl->n < 3)
                progerr("%s: requires two or more values, a length and a list of property names", w0);


            max_length = read_integer( sl->word[1], w0, 0, INT_MAX );


            for (i = 2; i < sl->n; i++)
            {
                /* already exists? */
                if ( (meta_entry = getPropNameByNameNoAlias( &indexf->header, sl->word[i])) )
                {
                    if ( meta_entry->alias )
                        progerr("Can't assign a length to property '%s': it is an alias", meta_entry->metaName );

                    if ( meta_entry->max_len )
                        progwarn("Why are you redefining the max length of property '%s'?", meta_entry->metaName );

                    if ( !is_meta_string( meta_entry ) )
                        progerr("%s - name '%s' is not a STRING type of Property", w0, sl->word[i] );
                }
                else
                    meta_entry = addMetaEntry(&indexf->header, sl->word[i], META_PROP|META_STRING, 0);


                meta_entry->max_len = max_length;
            }

            continue;
        }

        /* Set the sort length */

        if (strcasecmp(w0, "PropertyNamesSortKeyLength") == 0)
        {
            struct metaEntry *meta_entry;
            int               max_length = 0;

            if (sl->n < 3)
                progerr("%s: requires two or more values, a length and a list of property names", w0);


            max_length = read_integer( sl->word[1], w0, 1, INT_MAX );


            for (i = 2; i < sl->n; i++)
            {
                /* already exists? */
                if ( (meta_entry = getPropNameByNameNoAlias( &indexf->header, sl->word[i])) )
                {
                    if ( meta_entry->alias )
                        progerr("Can't assign a length to property '%s': it is an alias", meta_entry->metaName );

                    if ( meta_entry->max_len )
                        progwarn("Why are you redefining the max sort key length of property '%s'?", meta_entry->metaName );

                    if ( !is_meta_string( meta_entry ) )
                        progerr("%s - name '%s' is not a STRING type of Property", w0, sl->word[i] );
                }
                else
                    meta_entry = addMetaEntry(&indexf->header, sl->word[i], META_PROP|META_STRING, 0);


                meta_entry->sort_len = max_length;
            }

            continue;
        }



        /* Hashed word lists */

        if ( !strcasecmp(w0, "IgnoreWords") || !strcasecmp(w0, "StopWords"))
        {
            word_hash_config( sl, &indexf->header.hashstoplist );
            continue;
        }

        if (strcasecmp(w0, "BuzzWords") == 0)  /* 2001-04-24 moseley */
        {
            word_hash_config( sl, &indexf->header.hashbuzzwordlist );
            continue;
        }

        if (strcasecmp(w0, "UseWords") == 0)
        {
            word_hash_config( sl, &indexf->header.hashuselist );
            continue;
        }



        /* IndexVerbose is supported for backwards compatibility */
        if (strcasecmp(w0, "IndexVerbose") == 0)
        {
            sw->verbose = getYesNoOrAbort(sl, 1, 1);
            if (sw->verbose)
                sw->verbose = 3;

            continue;
        }


        if (strcasecmp(w0, "IndexOnly") == 0)
        {
            if (sl->n > 1)
            {
                grabCmdOptions(sl, 1, &sw->suffixlist);
            }
            else
                progerr("%s: requires at least one value", w0);

            continue;
        }


        if (strcasecmp(w0, "IndexContents") == 0)
        {
            if (sl->n > 2)
            {
                struct IndexContents *ic = (struct IndexContents *) emalloc(sizeof(struct IndexContents));

                ic->DocType = getDocTypeOrAbort(sl, 1);
                ic->patt = NULL;

                for (i = 2; i < sl->n; i++)
                    ic->patt = addswline(ic->patt, sl->word[i]);

                if (sw->indexcontents)
                    ic->next = sw->indexcontents;
                else
                    ic->next = NULL;

                sw->indexcontents = ic;
            }
            else
                progerr("%s: requires at least two values", w0);

            continue;
        }


        /* $$$ this needs fixing */
        if (strcasecmp(w0, "StoreDescription") == 0)
        {
            if (sl->n == 3 || sl->n == 4)
            {
                struct StoreDescription *sd = (struct StoreDescription *) emalloc(sizeof(struct StoreDescription));

                sd->DocType = getDocTypeOrAbort(sl, 1);
                sd->size = 0;
                sd->field = NULL;
                i = 2;

                if (sl->word[i][0] == '<' && sl->word[i][strlen(sl->word[i]) - 1] == '>')
                {
                    sl->word[i][strlen(sl->word[i]) - 1] = '\0';
                    sd->field = estrdup(sl->word[i] + 1);
                    i++;
                }

                if (i < sl->n && isnumstring(  (unsigned char *)sl->word[i] ))
                {
                    sd->size = read_integer( sl->word[i], w0, 0, INT_MAX );
                }
                if (sl->n == 3 && !sd->field && !sd->size)
                    progerr("%s: second parameter must be <fieldname> or a number", w0);
                if (sl->n == 4 && sd->field && !sd->size)
                    progerr("%s: third parameter must be empty or a number", w0);
                if (sw->storedescription)
                    sd->next = sw->storedescription;
                else
                    sd->next = NULL;

                sw->storedescription = sd;

                /* Make sure there's a property name */
                if ( !getPropNameByName( &indexf->header, AUTOPROPERTY_SUMMARY) )
                    addMetaEntry(&indexf->header, AUTOPROPERTY_SUMMARY, META_PROP|META_STRING, 0);
            }
            else
                progerr("%s: requires two or three values", w0);

            continue;
        }


        if (strcasecmp(w0, "DefaultContents") == 0)
        {
            if (sl->n == 2 )
            {
                sw->DefaultDocType = getDocTypeOrAbort(sl, 1);
            }
            else
                progerr("%s: requires one value -- a parser type", w0);

            continue;
        }


        if (strcasecmp(w0, "BumpPositionCounterCharacters") == 0)
        {
            if (sl->n > 1)
            {
                indexf->header.bumpposchars = SafeStrCopy(indexf->header.bumpposchars, sl->word[1], &indexf->header.lenbumpposchars);
                sortstring(indexf->header.bumpposchars);
                makelookuptable(indexf->header.bumpposchars, indexf->header.bumpposcharslookuptable);
            }
            else
                progerr("%s: requires at least one value", w0);

            continue;
        }


        /* #### Added UndefinedMetaTags as defined by Bill Moseley */
        if (strcasecmp(w0, "UndefinedMetaTags") == 0)
        {
            get_undefined_meta_flags( w0, sl, &sw->UndefinedMetaTags );
            if ( !sw->UndefinedMetaTags )
                progerr("%s: possible values are error, ignore, index or auto", w0);

            continue;
        }


        if (strcasecmp(w0, "UndefinedXMLAttributes") == 0)
        {
            get_undefined_meta_flags( w0, sl, &sw->UndefinedXMLAttributes );
            continue;
        }



        if (strcasecmp(w0, "IgnoreMetaTags") == 0)
        {
            if (sl->n > 1)
            {
                grabCmdOptions(sl, 1, &sw->ignoremetalist);
                /* Go lowercase */
                for (tmplist = sw->ignoremetalist; tmplist; tmplist = tmplist->next)
                    (void)strtolower(tmplist->line);
            }
            else
                progerr("%s: requires at least one value", w0);

            continue;
        }


        if (strcasecmp(w0, "XMLClassAttributes") == 0)
        {
            if (sl->n > 1)
            {
                grabCmdOptions(sl, 1, &sw->XMLClassAttributes);
                /* Go lowercase */
                for (tmplist = sw->XMLClassAttributes; tmplist; tmplist = tmplist->next)
                    (void)strtolower(tmplist->line);
            }
            else
                progerr("%s: requires at least one value", w0);

            continue;
        }


        if (strcasecmp(w0, "DontBumpPositionOnStartTags") == 0)
        {
            if (sl->n > 1)
                grabCmdOptions(sl, 1, &sw->dontbumpstarttagslist);
            else
                progerr("%s: requires at least one value", w0);

            continue;
        }

        if (strcasecmp(w0, "DontBumpPositionOnEndTags") == 0)
        {
            if (sl->n > 1)
                grabCmdOptions(sl, 1, &sw->dontbumpendtagslist);
            else
                progerr("%s: requires at least one value", w0);

            continue;
        }

        if (strcasecmp(w0, "TruncateDocSize") == 0)
        {                       /* rasc 2001-03 */
            if (sl->n == 2 && isnumstring( (unsigned char *)sl->word[1] ))
                sw->truncateDocSize = atol(sl->word[1]);
            else
                progerr("%s: requires size parameter in bytes", w0);

            continue;
        }

        if (strcasecmp(w0, "CompressPositions") == 0)
        {
            sw->compressPositions = getYesNoOrAbort(sl, 1, 1);
            continue;
        }


        else if (configModule_Entities(sw, sl));
        else if (configModule_Filter(sw, sl)); /* rasc */
        else if (configModule_ResultOutput(sw, sl)); /* rasc */
        else if (configModule_ResultSort(sw, sl)); /* jmruiz */
        else if (configModule_Index(sw, sl)); /* jmruiz */
        else if (configModule_Prog(sw, sl));
        else if (!parseconfline(sw, sl))
        {
            printf("Bad directive on line #%d of file %s: %s\n", linenumber, conffile, line);
            if ( ++baddirective > 30 )
                progerr("Too many errors.  Can not continue.");
        }

    }

    freeStringList(sl);

    fclose(fp);

    if (baddirective)
        exit(1);
    if (gotdir && !(*hasdir))
        *hasdir = 1;
    if (gotindex && !(*hasindex))
        *hasindex = 1;
}

/*************************************************************************
*  Fetch an integer
*
*************************************************************************/

static int read_integer( char *string,  char *message, int low, int high )
{
    char *badchar;
    long  num;
    int   result;

    if ( !string )
        progerr("'%s' requires an integer between %d and %d.", message, low, high );

    num = strtol( string, &badchar, 10 );

    if ( num == LONG_MAX || num == LONG_MIN )
        progerrno("'%s': Failed to convert '%s' to a number: ", message, string );

    if ( *badchar )
        progerr("Invalid char '%c' found in argument to '%s %s'", badchar[0], message, string);

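    /* Check the range while the value is still a long, so that an
       out-of-range value cannot wrap when it is narrowed to int */
    if ( num < (long)low || num > (long)high )
        progerr("'%s' value of '%ld' is not an integer between %d and %d.", message, num, low, high );

    result = (int)num;

    return result;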
}
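
/*
 * Illustrative sketch only, not part of swish-e: read_integer() above
 * parses a directive argument with strtol() and rejects trailing junk
 * and out-of-range values.  The same checks can be written as a
 * standalone helper that returns an error code instead of aborting.
 * The name parse_int_in_range() and its return convention are
 * assumptions made for this example.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

// Hypothetical helper: returns 0 and stores the value in *out on
// success, or -1 on any parse or range error.
int parse_int_in_range(const char *s, int low, int high, int *out)
{
    char *end;
    long  num;

    assert(out != NULL);

    if (s == NULL || *s == '\0')
        return -1;

    errno = 0;                      // strtol never clears errno itself
    num = strtol(s, &end, 10);

    if (errno == ERANGE)            // value does not fit in a long
        return -1;
    if (end == s || *end != '\0')   // no digits at all, or trailing junk
        return -1;
    if (num < low || num > high)    // compare as long, before narrowing
        return -1;

    *out = (int)num;
    return 0;
}
```

 * Clearing errno before the call and testing for ERANGE afterwards
 * separates a genuine overflow from a legitimate LONG_MAX or LONG_MIN.
 */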




/*
  -- some config helper routines
*/

/*
  --  check if word "n" in StringList  is yes/no
  --  "lastparam": 0/1 = is param last one for config directive?
  --  returns 1 (yes) or 0 (no)
  --  aborts if not "yes" or "no" (and prints first word of array)
  --  aborts if lastparam set and is not last param...
  --  2001-03-04 rasc
*/

int     getYesNoOrAbort(StringList * sl, int n, int lastparam)
{
    if (lastparam && n < (sl->n - 1))
    {
        progerr("%s has too many parameters", sl->word[0]);
        return 0;
    }

    if (n < sl->n)
    {
        if (!strcasecmp(sl->word[n], "yes") || !strcasecmp(sl->word[n], "on") || !strcasecmp(sl->word[n], "1") )
            return 1;

        if (!strcasecmp(sl->word[n], "no") || !strcasecmp(sl->word[n], "off") ||!strcasecmp(sl->word[n], "0"))
            return 0;
    }
    progerr("%s requires parameter #%d of yes|on|1 or no|off|0", sl->word[0], n);
    return 0;
}
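
/*
 * Illustrative sketch only, not part of swish-e: the yes/no matching
 * in getYesNoOrAbort() above can also be expressed as two small word
 * tables.  The helper name parse_yes_no() and the -1 "not recognized"
 * return value are assumptions made for this example; the real
 * function aborts through progerr() instead.

```c
#include <assert.h>
#include <stddef.h>
#include <strings.h>   // strcasecmp (POSIX)

// Hypothetical helper: 1 for yes|on|1, 0 for no|off|0, -1 otherwise.
int parse_yes_no(const char *word)
{
    static const char *const yes_words[] = { "yes", "on", "1" };
    static const char *const no_words[]  = { "no", "off", "0" };
    size_t i;

    assert(word != NULL);

    for (i = 0; i < sizeof(yes_words) / sizeof(yes_words[0]); i++)
        if (strcasecmp(word, yes_words[i]) == 0)
            return 1;

    for (i = 0; i < sizeof(no_words) / sizeof(no_words[0]); i++)
        if (strcasecmp(word, no_words[i]) == 0)
            return 0;

    return -1;
}
```

 */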


static  void add_ExtractPath( char *name, SWISH *sw, struct metaEntry *m, char **params )
{
    path_extract_list *list = sw->pathExtractList;
    path_extract_list *last = NULL;


    while ( list && list->meta_entry != m  )
    {
        last = list;
        list = list->next;
    }


    /* no list entry exists for this meta entry yet, so create one and append it */
    if ( !list )
    {
        list = emalloc( sizeof( path_extract_list ));
        if ( last )
            last->next = list;
        else
            sw->pathExtractList = list;

        list->meta_entry = m;
        list->regex = NULL;
        list->next  = NULL;
    }

    /* now add regular expression to list */
    Build_ReplaceRules( name, params, &list->regex ); /* compile and add to list of expression */
}


/********************************************************
*  Free an ExtractPath list
*
*********************************************************/
void free_Extracted_Path( SWISH *sw )
{
    path_extract_list *list = sw->pathExtractList;
    path_extract_list *next;

    while ( list )
    {
        next = list->next;
        free_regex_list( &list->regex );
        efree( list );
        list = next;
    }

    sw->pathExtractList = NULL;
}

/*********************************************************************
*  Builds regex substitution strings of the FileRules type
*  But also includes ExtractPath
*
*********************************************************************/

static void Build_ReplaceRules( char *name, char **params, regex_list **reg_list )
{
    char *pattern = NULL;
    char *replace = NULL;
    int   cflags = REG_EXTENDED;
    int   global = 0;

    params++;

    /* these two could be optimized, of course */

    if ( strcasecmp( params[0], "append") == 0 )
    {
        pattern = estrdup("$");
        replace = estrdup( params[1] );
    }

    else if  ( strcasecmp( params[0], "prepend") == 0 )
    {
        pattern = estrdup("^");
        replace = estrdup(params[1]);
    }


    else if  ( strcasecmp( params[0], "remove") == 0 )
    {
        pattern = estrdup(params[1]);
        replace = estrdup( "" );
        global++;
    }


    else if  ( strcasecmp( params[0], "replace") == 0 )
    {
        pattern = estrdup(params[1]);
        replace = estrdup(params[2]);
        global++;
    }


    /* This should probably be moved to swregex.c */
    else if  ( strcasecmp( params[0], "regex") == 0 )
    {
        add_replace_expression( name, reg_list, params[1] );
        return;
    }


    else
        progerr("%s: unknown argument '%s'.  Must be prepend|append|remove|replace|regex.", name, params[0] );


    add_regular_expression( reg_list, pattern, replace, cflags, global, 0 );

    efree( pattern );
    efree( replace );
}
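
/*
 * Illustrative sketch only, not part of swish-e: Build_ReplaceRules()
 * above turns the keywords prepend, append, remove and replace into a
 * (pattern, replacement, global) triple.  The mapping is shown here in
 * isolation, with an explicit argument count check added.  The struct
 * and function names are assumptions made for this example, and the
 * "regex" keyword is left out because the original hands it to a
 * separate routine.

```c
#include <assert.h>
#include <stddef.h>
#include <strings.h>   // strcasecmp (POSIX)

struct rule
{
    const char *pattern;   // regular expression to match
    const char *replace;   // replacement text
    int         global;    // non-zero: replace every match
};

// argv[0] is the keyword and argc counts the keyword plus its arguments.
// Returns 0 on success, -1 for an unknown keyword or a missing argument.
int keyword_to_rule(int argc, const char *const *argv, struct rule *out)
{
    assert(argv != NULL && out != NULL);

    if (argc < 2)
        return -1;

    if (strcasecmp(argv[0], "append") == 0)
    {
        out->pattern = "$";          // anchor at end of string
        out->replace = argv[1];
        out->global  = 0;
    }
    else if (strcasecmp(argv[0], "prepend") == 0)
    {
        out->pattern = "^";          // anchor at start of string
        out->replace = argv[1];
        out->global  = 0;
    }
    else if (strcasecmp(argv[0], "remove") == 0)
    {
        out->pattern = argv[1];
        out->replace = "";           // delete every match
        out->global  = 1;
    }
    else if (strcasecmp(argv[0], "replace") == 0)
    {
        if (argc < 3)
            return -1;
        out->pattern = argv[1];
        out->replace = argv[2];
        out->global  = 1;
    }
    else
        return -1;

    return 0;
}
```

 */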




/*
  --  check if word "n" in StringList  is a DocumentType
  --  returns (doctype-id)
  --  aborts if not a DocumentType, or no param
  --  2001-03-04 rasc
*/


int     strtoDocType( char * s )
{
    static struct
    {
        char    *type;
        int      id;
    }
    doc_map[] =
    {
        {"TXT", TXT},
        {"HTML", HTML},
        {"XML", XML},
        {"WML", WML},
#ifdef HAVE_LIBXML2
        {"XML2", XML2 },
        {"HTML2", HTML2 },
        {"TXT2", TXT2 },
        {"XML*", XML2 },
        {"HTML*", HTML2 },
        {"TXT*", TXT2 },
#else
        {"XML*", XML },
        {"HTML*", HTML },
        {"TXT*", TXT }
#endif
    };
    int i;

    for (i = 0; i < (int)(sizeof(doc_map) / sizeof(doc_map[0])); i++)
        if ( strcasecmp(doc_map[i].type, s) == 0 )
            return doc_map[i].id;

    return 0;
}

static int     getDocTypeOrAbort(StringList * sl, int n)
{
    int doctype;

    if (n < sl->n)
    {
        doctype = strtoDocType( sl->word[n] );

        if (!doctype )
            progerr("%s: Unknown document type \"%s\"", sl->word[0], sl->word[n]);
        else
            return doctype;
    }

    progerr("%s: missing %d. parameter", sl->word[0], n);
    return 0;                   /* never happens */
}


/*
  -- helper routine for misc. indexing methods
  -- (called via "jump" function array)
 02/2001 Rewritten Jmruiz
 */

void    grabCmdOptions(StringList * sl, int start, struct swline **listOfWords)
{
    int     i;

    for (i = start; i < sl->n; i++)
        *listOfWords = (struct swline *) addswline(*listOfWords, sl->word[i]);
    return;
}



/* --------------------------------------------------------- */




/*
  read stop words from file
  lines beginning with # are comments
  2000-06-15 rasc

*/


static void word_hash_config( StringList *sl, WORD_HASH_TABLE *table_ptr )
{
    int i;

    if (sl->n < 2)
        progerr("%s: requires at least one value", sl->word[0]);


    if (lstrstr(sl->word[1], "SwishDefault"))
        progwarn("SwishDefault is obsolete. See the CHANGES file.");


    if (lstrstr(sl->word[1], "File:"))
    {
        if (sl->n == 3)
        {
            normalize_path( sl->word[2] );
            readwordsfile(table_ptr, sl->word[2]);
            return;
        }
        else
            progerr("%s File: requires path", sl->word[0]);
    }


    for (i = 1; i < sl->n; i++)
        add_word_to_hash_table( table_ptr, strtolower(sl->word[i]), HASHSIZE);
}



static void    readwordsfile(WORD_HASH_TABLE *table_ptr, char *stopw_file)
{
    char    line[MAXSTRLEN];
    FILE   *fp;
    StringList *sl;
    int     i;


    /* Note: this reports "Success" when trying to open a directory. Too lazy to fix now */

    if ((fp = fopen(stopw_file, F_READ_TEXT)) == NULL || !isfile(stopw_file))
        progerrno("Couldn't open the word file '%s': ", stopw_file);


    /* read all lines and store each word as stopword */

    while (fgets(line, MAXSTRLEN, fp) != NULL)
    {
        if (line[0] == '#' || line[0] == '\n')
            continue;

        sl = parse_line(line);
        if (sl && sl->n)
        {
            for (i = 0; i < sl->n; i++)
                add_word_to_hash_table( table_ptr, strtolower(sl->word[i]), HASHSIZE);

            freeStringList(sl);
        }
    }

    fclose(fp);
    return;
}



static int     parseconfline(SWISH * sw, StringList * sl)
{
    /* invoke routine to parse config file lines */
    return (*IndexingDataSource->parseconfline_fn) (sw, (void *) sl);
}



static void get_undefined_meta_flags( char *w0, StringList * sl, UndefMetaFlag *setting )
{
    if (sl->n != 2)
        progerr("%s: requires one value", w0);

    if (strcasecmp(sl->word[1], "error") == 0)
        *setting = UNDEF_META_ERROR;

    else if (strcasecmp(sl->word[1], "ignore") == 0)
        *setting = UNDEF_META_IGNORE;

    else if (strcasecmp(sl->word[1], "disable") == 0)  // default for xml attributes
        *setting = UNDEF_META_DISABLE;

    else if (strcasecmp(sl->word[1], "auto") == 0)
        *setting = UNDEF_META_AUTO;

    else if (strcasecmp(sl->word[1], "index") == 0)
        *setting = UNDEF_META_INDEX;
    else
        progerr("%s: possible values are error, ignore, index or auto", w0);
}


void freeSwishConfigOptions( SWISH *sw )
{
    /** Ah, these should all be in their own structure **/

    /* string lists */
    if (sw->dirlist)
        freeswline(sw->dirlist);

    if (sw->suffixlist)
        freeswline(sw->suffixlist);

    if (sw->nocontentslist)
        freeswline(sw->nocontentslist);

    if (sw->ignoremetalist)
        freeswline(sw->ignoremetalist);

    if (sw->XMLClassAttributes)
        freeswline(sw->XMLClassAttributes);

    if (sw->dontbumpstarttagslist)
        freeswline(sw->dontbumpstarttagslist);

    if (sw->dontbumpendtagslist)
        freeswline(sw->dontbumpendtagslist);


    /* IndexContents */
    {
        struct IndexContents *next;
        while ( sw->indexcontents )
        {
             next = sw->indexcontents->next;

             if ( sw->indexcontents->patt )
                freeswline( sw->indexcontents->patt );

             efree( sw->indexcontents );
             sw->indexcontents = next;
        }
    }


    /* StoreDescription */
    {
        struct StoreDescription *next;
        while ( sw->storedescription )
        {
             next = sw->storedescription->next;
             if ( sw->storedescription->field )
                efree( sw->storedescription->field );

             efree( sw->storedescription );
             sw->storedescription = next;
        }
    }


}



#define LINE_BUF_LEN MAXSTRLEN
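
/*
 * Illustrative sketch only, not part of swish-e: the reader below joins
 * physical lines that end in a backslash followed by a newline.  The
 * core test can be isolated as a small helper that works on one buffer.
 * The name strip_continuation() is an assumption made for this example.

```c
#include <assert.h>
#include <string.h>

// Returns 1 and truncates the buffer when it ends in backslash-newline,
// which tells the caller to append the next physical line; returns 0
// and leaves the buffer untouched otherwise.
int strip_continuation(char *line)
{
    size_t len;

    assert(line != NULL);

    len = strlen(line);

    if (len >= 2 && line[len - 2] == '\\' && line[len - 1] == '\n')
    {
        line[len - 2] = '\0';   // drop both the backslash and the newline
        return 1;
    }

    return 0;
}
```

 * Checking len >= 2 first keeps the index arithmetic from reaching
 * before the start of the buffer on empty or one-character lines.
 */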

static char *read_line_from_file( int * linenum, FILE *fp )
{
    char * line = NULL;         /* output buffer */
    int  buf_size = 0;

    /* Initialize the buffer */
    buf_size = LINE_BUF_LEN * sizeof( char );
    line = emalloc( buf_size );
    *line = '\0';


    /* repeat until we have either a full line or no line */
    while( 1 )
    {
        int cur_len = strlen( line );


        /* Make sure there's at least LINE_BUF_LEN room in the buffer */
        if ( buf_size - cur_len < LINE_BUF_LEN )
        {
            buf_size = cur_len + LINE_BUF_LEN;
            line = erealloc( line, buf_size );
        }

        /* Read line, if there is one */
        if ( !fgets( &(line[cur_len]), LINE_BUF_LEN, fp ) ) 
            break;

        (*linenum)++;

        /* Look for continuation mark (backslash+\n) and strip it so the next line is appended */
        cur_len = strlen( line );

        if ( cur_len < 2 ) break;

        if ( line[cur_len-2] == '\\' && line[cur_len-1] == '\n' )
            line[cur_len-2] = '\0';
        else
            break;
    }
    return line;
}

swish-e-2.4.7/src/soundex.h
/*
** soundex.h

$Id: soundex.h 1736 2005-05-12 15:41:22Z karman $


    This file is part of Swish-e.

    Swish-e is free software; you can redistribute it and/or modify
    it under the terms of the GNU General Public License as published by
    the Free Software Foundation; either version 2 of the License, or
    (at your option) any later version.

    Swish-e is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
    GNU General Public License for more details.

    You should have received a copy of the GNU General Public License
    along  with Swish-e; if not, write to the Free Software
    Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
    
    See the COPYING file that accompanies the Swish-e distribution for details
    of the GNU GPL and the special exception available for linking against
    the Swish-e library.
    
** Mon May  9 18:19:34 CDT 2005
** added GPL



*/

FUZZY_WORD *soundex( FUZZY_OBJECT *fi, const char *inword);

swish-e-2.4.7/src/result_output.c
/*
$Id: result_output.c 1736 2005-05-12 15:41:22Z karman $
**
    This file is part of Swish-e.

    Swish-e is free software; you can redistribute it and/or modify
    it under the terms of the GNU General Public License as published by
    the Free Software Foundation; either version 2 of the License, or
    (at your option) any later version.

    Swish-e is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
    GNU General Public License for more details.

    You should have received a copy of the GNU General Public License
    along  with Swish-e; if not, write to the Free Software
    Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
    
    See the COPYING file that accompanies the Swish-e distribution for details
    of the GNU GPL and the special exception available for linking against
    the Swish-e library.
    
    
** Mon May  9 14:50:41 CDT 2005
** added GPL

***************************************************************************************

   -- This module does result output for swish-e
   -- This module implements the handling of the
   -- "-x fmt" cmd option.
   -- basically: handle output fmts like:  -x "%c|<swishtitle fmt=/%20s/>\n"
   -- 
   -- License: see swish licence file

   -- 2001-01  R. Scherg  (rasc)   initial coding

   -- 2001-02-09 rasc    make propertynames always lowercase!  (may change)
                         this is to get the same handling as metanames...
   -- 2001-02-28 rasc    -b and counter corrected...
   -- 2001-03-13 rasc    result header output routine  -H <n>
   -- 2001-04-12 rasc    Module init rewritten

*/


//** should really "compile" the -x format string for each index, which means
//   basically looking up the properties only once for each index.


/* Prints the final results of a search.
   2001-01-01  rasc  Standard is swish 1.x default output

   If the extended format string option is set, an alternate
   user-defined result output is possible (format like strftime or printf).
   In this case -d (delimiter) is obsolete.
     e.g. : -x "result: COUNT:%c \t URL:%u\n"
*/


/* $$$ Remark / ToDO:
   -- The code is a prototype and needs optimizing:
   -- format control string is parsed on each result entry. (very bad!)
   -- ToDO: build an "action array" from an initial parsing of fmt
   --       ctrl string.
   --       on each entry step thru this action output list
   --       seems to be simple, but has to be done.
   -- but for now: get this stuff running on the easy way. 
   -- (rasc 2000-12)
   $$$ 
*/





#include <ctype.h>
#include <string.h>
#include <time.h>
#include <stdio.h>
#include <stdarg.h>

#include "swish.h"
#include "mem.h"
#include "swstring.h"
#include "merge.h"
#include "metanames.h"
#include "search.h"
#include "docprop.h"
#include "error.h"
#include "result_output.h"
#include "parse_conffile.h"  // for the fuzzy to string function


/* private module prototypes */

static void printExtResultEntry(SWISH * sw, FILE * f, char *fmt, RESULT * r);
static char *printResultControlChar(FILE * f, char *s);
static char *printTagAbbrevControl(SWISH * sw, FILE * f, char *s, RESULT * r);
static char *parsePropertyResultControl(char *s, char **propertyname, char **subfmt);
static void printPropertyResultControl(FILE * f, char *propname, char *subfmt, RESULT * r);
static void printStandardResultProperties(FILE *f, RESULT *r);

static struct ResultExtFmtStrList *addResultExtFormatStr(struct ResultExtFmtStrList *rp, char *name, char *fmtstr);


/*
** ----------------------------------------------
** 
**  Module management code starts here
**
** ----------------------------------------------
*/



/* 
  -- init structures for this module
*/

void    initModule_ResultOutput(SWISH * sw)
{
    struct MOD_ResultOutput *md;

    md = (struct MOD_ResultOutput *) emalloc(sizeof(struct MOD_ResultOutput));
    memset(md, 0, sizeof(struct MOD_ResultOutput));  

    sw->ResultOutput = md;

    md->resultextfmtlist = NULL;

    /* cmd options */
    md->extendedformat = NULL;  /* -x :cmd param  */
    md->stdResultFieldDelimiter = NULL; /* -d :old 1.x result output delimiter */

    return;
}


/* 
  -- release all wired memory for this module
  -- 2001-04-11 rasc
*/

void    freeModule_ResultOutput(SWISH * sw)
{
    struct MOD_ResultOutput *md = sw->ResultOutput;
    struct ResultExtFmtStrList *l,
           *ln;


    if (md->stdResultFieldDelimiter)
        efree(md->stdResultFieldDelimiter); /* -d :free swish 1.x delimiter */
    /* was not emalloc!# efree (md->extendedformat);               -x stuff */


    l = md->resultextfmtlist;   /* free ResultExtFormatName */
    while (l)
    {
        efree(l->name);
        efree(l->fmtstr);
        ln = l->next;
        efree(l);
        l = ln;
    }
    md->resultextfmtlist = NULL;


    /* Free display props arrays -- only used for -p list */

    /* First the common part to all the index files */
    if (md->propNameToDisplay)
    {
        int i;

        for( i=0; i < md->numPropertiesToDisplay; i++ )
            efree(md->propNameToDisplay[i]);

        efree(md->propNameToDisplay);
    }
    md->propNameToDisplay=NULL;

    if (md->propIDToDisplay)
    {
        int i;
        IndexFILE *indexf;
        for( i = 0, indexf = sw->indexlist; indexf; i++, indexf = indexf->next)
        {
            efree(md->propIDToDisplay[i]);
        }
        efree(md->propIDToDisplay);
    } 
    md->propIDToDisplay=NULL;

    md->numPropertiesToDisplay=0;
    md->currentMaxPropertiesToDisplay=0;

    /* free module data */
    efree(sw->ResultOutput);
    sw->ResultOutput = NULL;

    return;
}



/*
** ----------------------------------------------
** 
**  Module config code starts here
**
** ----------------------------------------------
*/


/*
 -- Config Directives
 -- Configuration directives for this Module
 -- return: 0/1 = none/config applied
*/

int     configModule_ResultOutput(SWISH * sw, StringList * sl)
{
    struct MOD_ResultOutput *md = sw->ResultOutput;
    char   *w0 = sl->word[0];
    int     retval = 1;



    /* $$$ this will not work unless swish is reading the config file also for search ... */

    if (strcasecmp(w0, "ResultExtFormatName") == 0)
    {                           /* 2001-02-15 rasc */
        /* ResultExt...   name  fmtstring */
        if (sl->n == 3)
        {
            md->resultextfmtlist = (struct ResultExtFmtStrList *) addResultExtFormatStr(md->resultextfmtlist, sl->word[1], sl->word[2]);
        }
        else
            progerr("%s: requires \"name\" \"fmtstr\"", w0);
    }
    else
    {
        retval = 0;             /* not a module directive */
    }

    return retval;
}





/*
** ----------------------------------------------
** 
**  Module code starts here
**
** ----------------------------------------------
*/



/*
   -- Init the print of result entry in extended output format.
   -- The parsed propertynames will be stored for result handling
   -- Only user properties will be stored.
   -- Routine has to be executed prior to search/result storing...
   -- (This behavior is for historic reasons and may change)
   -- ($$ this routine may build the print action list in the future...)
   2001-02-07   rasc
*/

void    initPrintExtResult(SWISH * sw, char *fmt)
{
    FILE   *f;
    char   *propname;
    char   *subfmt;

    f = (FILE *) NULL;          /* no output, just parsing!!! */


    while (*fmt)
    {                           /* loop fmt string */

        switch (*fmt)
        {

        case '%':              /* swish abbreviation controls */
            /* ignore (dummy param), because autoprop */
            fmt = printTagAbbrevControl(sw, f, fmt, NULL);
            break;

        case '<':
            /* -- Property - Control: read Property Tag  <name> */
            /* -- Save User PropertyNames for result handling   */
            // Oct 16, 2001 - moseley: Seem like this should lookup the property
            // and error if not found, plus, it should cache the propID to avoid lookups
            // when returning results.  Would parse the -x format for each index.
            fmt = parsePropertyResultControl(fmt, &propname, &subfmt);

            efree(subfmt);
            efree(propname);
            break;

        case '\\':             /* format controls */
            fmt = printResultControlChar(f, fmt);
            break;


        default:               /* an output character in fmt string */
            fmt++;
            break;
        }

    }

}





/* ------------------------------------------------------------ */




/*
  -- Output the result entries in the given order
  -- outputformat depends on some cmd opt settings
  This frees memory as it goes along, so this can't be called from the library.
*/

void    printSortedResults(RESULTS_OBJECT *results, int begin, int maxhits)
{
    SWISH  *sw = results->sw;
    struct MOD_ResultOutput *md = sw->ResultOutput;
    RESULT *r = NULL;
    FILE   *f_out;



    f_out = stdout;

    /* -b is begin +1 */
    if ( begin )
    {
        begin--;
        if ( begin < 0 )
            begin = 0;
    }


    /* Seek, and report errors if trying to seek past eof */

    if ( SwishSeekResult(results, begin) < 0 )
        SwishAbortLastError( results->sw );



    /* If maxhits = 0 then display all */
    if (maxhits <= 0)
        maxhits = -1;


    if ( md->extendedformat ) /* are we using -x for output? */
    {
        while ( (r = SwishNextResult(results)) && maxhits )
        {
            printExtResultEntry(sw, f_out, md->extendedformat, r);
            freefileinfo( &r->fi );
            if ( maxhits > 0)
                maxhits--;
        }
    }


    else /* not using -x and maybe -p (old style) */
    {
        char *format;
        char *delimiter;

        if ((delimiter = (md->stdResultFieldDelimiter)) )
        {
            format = emalloc( (3* strlen( delimiter )) + 100 );
            /* warning -- the delimiter is user data and ends up inside a -x style */
            /* format string, so any '%', '<' or backslash in it is interpreted later */
            sprintf( format, "%%r%s%%p%s%%t%s%%l", delimiter, delimiter, delimiter );
        }
        else
            format = estrdup( "%r %p \"%t\" %l" );


        while ( (r = SwishNextResult(results)) && maxhits )
        {
            printExtResultEntry(sw, f_out, format, r);
            printStandardResultProperties(f_out, r);  /* print any -p properties */
            freefileinfo( &r->fi );
            fprintf(f_out, "\n");

            if ( maxhits > 0)
                maxhits--;
        }

        efree( format );
    }

}
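
/*
   The paging rules used by printSortedResults() above (-b is 1-based, and
   a maxhits of zero or less means "no limit") can be sketched in isolation.
   This is a minimal illustration, not code from Swish-e: count_printed()
   and its parameters are hypothetical, and unlike the real routine it
   returns 0 instead of aborting when the start position is past the end.

```c
#include <assert.h>

// Hypothetical helper: how many results would be printed?
//   total   = number of hits available
//   begin   = 1-based first result (0 or less means "start at the first hit")
//   maxhits = maximum number to print (0 or less means "print all")
static int count_printed(int total, int begin, int maxhits)
{
    int printed = 0;
    int pos;

    if (begin > 0)
        begin--;            // convert the 1-based -b value to a 0-based offset
    if (begin < 0)
        begin = 0;

    if (maxhits <= 0)
        maxhits = -1;       // -1 is never decremented, so the loop is unbounded

    for (pos = begin; pos < total && maxhits; pos++)
    {
        printed++;
        if (maxhits > 0)
            maxhits--;
    }
    return printed;
}
```

   For example, count_printed(10, 3, 4) covers results 3 through 6 and
   returns 4, while count_printed(10, 9, 5) runs out of results after 2.
*/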





/*
   -- print a result entry in extended output format
   -- Format characters: see switch cases...
   -- f_out == NULL, use STDOUT
   -- fmt = output format
   -- r = result entry to print
   2001-01-01   rasc
*/

static void printExtResultEntry(SWISH * sw, FILE * f_out, char *fmt, RESULT * r)
{
    FILE   *f;
    char   *propname;
    char   *subfmt;


    f = (f_out) ? f_out : stdout;

    while (*fmt)
    {                           /* loop fmt string */

        switch (*fmt)
        {

        case '%':              /* swish abbreviation controls */
            fmt = printTagAbbrevControl(sw, f, fmt, r);
            break;

        case '<':
            /* Property - Control: read and print Property Tag  <name> */
            fmt = parsePropertyResultControl(fmt, &propname, &subfmt);
            printPropertyResultControl(f, propname, subfmt, r);
            efree(subfmt);
            efree(propname);
            break;

        case '\\':             /* print format controls */
            fmt = printResultControlChar(f, fmt);
            break;


        default:               /* just output the character in fmt string */
            if (f)
                fputc(*fmt, f);
            fmt++;
            break;
        }

    }


}







/*  -- parse print control and print it
    --  output on file <f>
    --  *s = "\....."
    -- return: string ptr to char after control sequence.
*/

static char *printResultControlChar(FILE * f, char *s)
{
    char    c,
           *se;

    if (*s != '\\')
        return s;

    c = charDecode_C_Escape(s, &se);
    if (f)
        fputc(c, f);
    return se;
}





/*  -- parse % control and print it
    --  in fact expand shortcut to fullnamed autoproperty tag
    --    output on file <f>, NULL = parse only mode
    --  *s = "%.....
    -- return: string ptr to char after control sequence.
*/

static char *printTagAbbrevControl(SWISH * sw, FILE * f, char *s, RESULT * r)
{
    char   *t;
    char    buf[MAXWORDLEN];

    if (*s != '%')
        return s;
    t = NULL;

    switch (*(++s))
    {
    case 'c':
        t = AUTOPROPERTY_REC_COUNT;
        break;
    case 'd':
        t = AUTOPROPERTY_SUMMARY;
        break;
    case 'D':
        t = AUTOPROPERTY_LASTMODIFIED;
        break;
    case 'I':
        t = AUTOPROPERTY_INDEXFILE;
        break;
    case 'p':
        t = AUTOPROPERTY_DOCPATH;
        break;
    case 'r':
        t = AUTOPROPERTY_RESULT_RANK;
        break;
    case 'l':
        t = AUTOPROPERTY_DOCSIZE;
        break;
    case 'S':
        t = AUTOPROPERTY_STARTPOS;
        break;
    case 't':
        t = AUTOPROPERTY_TITLE;
        break;

    case '%':
        if (f)
            fputc('%', f);
        break;
    default:
        progerr("Formatstring: unknown abbrev '%%%c'", *s);
        break;

    }

    if (t)
    {
        sprintf(buf, "<%s>", t); /* create <...> tag */
        if (f)
            printExtResultEntry(sw, f, buf, r);
        else
            initPrintExtResult(sw, buf); /* parse only ! */
    }
    return ++s;
}




/*  -- parse <tag fmt="..."> control
    --  *s = "<....." format control string
    --       possible subformat:  fmt="...", fmt=/..../, etc. 
    -- return: string ptr to char after control sequence.
    --         **propertyname = Tagname  (or NULL)
    --         **subfmt = NULL or subformat
*/

static char *parsePropertyResultControl(char *s, char **propertyname, char **subfmt)
{
    char   *s1;
    char    c;
    int     len;


    *propertyname = NULL;
    *subfmt = NULL;

    s = str_skip_ws(s);
    if (*s != '<')
        return s;
    s = str_skip_ws(++s);


    /* parse propertyname */

    s1 = s;
    while (*s)
    {                           /* read to end of propertyname */
        if ((*s == '>') || isspace((unsigned char) *s))
        {                       /* delim > or whitespace ? */
            break;              /* break on delim */
        }
        s++;
    }
    len = s - s1;
    *propertyname = (char *) emalloc(len + 1);
    strncpy(*propertyname, s1, len);
    *(*propertyname + len) = '\0';


    if (*s == '>')
        return ++s;             /* no fmt, return */
    s = str_skip_ws(s);


    /* parse optional fmt=<c>...<c>  e.g. fmt="..." */

    if (!strncmp(s, "fmt=", 4))
    {
        s += 4;                 /* skip "fmt="  */
        c = *(s++);             /* string delimiter */
        s1 = s;
        while (*s)
        {                       /* read to end of delim. char */
            if (*s == c)
            {                   /* c or \c */
                if (*(s - 1) != '\\')
                    break;      /* break on delim c */
            }
            s++;
        }

        len = s - s1;
        *subfmt = (char *) emalloc(len + 1);
        strncpy(*subfmt, s1, len);
        *(*subfmt + len) = '\0';
    }


    /* stupid "find end of tag" */

    while (*s && *s != '>')
        s++;
    if (*s == '>')
        s++;

    return s;
}
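
/*
   The tag grammar handled by parsePropertyResultControl() above can be
   tried out with a standalone sketch.  Everything here is hypothetical
   (parse_tag() and copy_span() are not Swish-e functions) and it uses
   plain malloc()/free() rather than emalloc()/efree().  It follows the
   same steps: read the name up to '>' or whitespace, read an optional
   fmt= value enclosed by an arbitrary delimiter character, then skip to
   the closing '>'.  As an extra assumption it ignores a "fmt=" that is
   the last thing in the string.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

// Copy len bytes of s into a fresh NUL-terminated buffer (caller frees).
static char *copy_span(const char *s, size_t len)
{
    char *out = malloc(len + 1);
    if (!out)
        abort();
    memcpy(out, s, len);
    out[len] = '\0';
    return out;
}

// Parse "<name>" or "<name fmt=Q...Q>" where Q is any delimiter character.
// Returns a pointer to the first character after the tag.
static const char *parse_tag(const char *s, char **name, char **subfmt)
{
    const char *start;
    char quote;

    *name = NULL;
    *subfmt = NULL;

    if (*s != '<')
        return s;
    s++;
    while (isspace((unsigned char) *s))
        s++;

    // property name: everything up to '>' or whitespace
    start = s;
    while (*s && *s != '>' && !isspace((unsigned char) *s))
        s++;
    *name = copy_span(start, (size_t) (s - start));

    while (isspace((unsigned char) *s))
        s++;

    // optional sub-format: fmt=Q...Q, where a backslash escapes Q
    if (strncmp(s, "fmt=", 4) == 0 && s[4] != '\0')
    {
        s += 4;
        quote = *s++;
        start = s;
        while (*s && !(*s == quote && s[-1] != '\\'))
            s++;
        *subfmt = copy_span(start, (size_t) (s - start));
    }

    // skip anything else up to and including the closing '>'
    while (*s && *s != '>')
        s++;
    if (*s == '>')
        s++;
    return s;
}
```

   Calling parse_tag() on "<author fmt=/%20s/>|x" yields the name
   "author", the sub-format "%20s", and a return pointer at "|x".
*/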




/*
  -- Print the result value of propertytag <name> on file <f>
  -- if a format is given use it (data type dependent)
  -- string and numeric types use printf control format strings
  -- date formats use strftime format strings.
*/


static void printPropertyResultControl(FILE * f, char *propname, char *subfmt, RESULT * r)
{
    char   *fmt;
    PropValue *pv;
    char   *s;
    int     n;


    pv = getResultPropValue(r, propname, 0);

    /* If returning NULL then it's an invalid property name */
    if (!pv)
    {
        /* Could abort with SwishAbortLastError() instead, but that's ugly in the */
        /* middle of output.  It would be nice to check the format strings (and */
        /* cache the meta name lookups) before generating results, but that would */
        /* need to be done for each index. */
        if (f)
            fprintf(f, "(null)");
        return;
    }


#ifdef USE_DOCPATH_AS_TITLE
    if ( ( PROP_UNDEFINED == pv->datatype ) && strcmp( AUTOPROPERTY_TITLE, propname ) == 0 )
    {
        freeResultPropValue( pv );

        pv = getResultPropValue(r, AUTOPROPERTY_DOCPATH, 0);

        if ( !pv )  /* in this case, let it slide */
            return;

        /* Just display the base name */
        if ( PROP_STRING == pv->datatype )
        {
            char *c = estrdup( str_basename( pv->value.v_str ) );
            efree( pv->value.v_str );
            pv->value.v_str = c;
        }
    }
#endif



    switch (pv->datatype)
    {
        /* use passed or default fmt */

    case PROP_INTEGER:
        fmt = (subfmt) ? subfmt : "%d";
        if (f)
            fprintf(f, fmt, pv->value.v_int);
        break;


    case PROP_ULONG:
        fmt = (subfmt) ? subfmt : "%lu";
        if (f)
            fprintf(f, fmt, pv->value.v_ulong);
        break;


    case PROP_STRING:
        fmt = (subfmt) ? subfmt : "%s";

        /* -- get rid of \n\r in string! */  // there shouldn't be any in the first place, I believe
        for (s = pv->value.v_str; *s; s++)
        {
            if ('\t' != *s && isspace((unsigned char) *s))
                *s = ' ';
        }

        /* $$$ ToDo: escaping of delimiter characters  $$$ */
        /* $$$ Also ToDo, escapeHTML entities (need config directive) */

        if (f)
            fprintf(f, fmt, (char *) pv->value.v_str);

        break;


    case PROP_DATE:
        fmt = (subfmt) ? subfmt : DATE_FORMAT_STRING;
        if (!strcmp(fmt, "%ld"))
        {
            /* special: Print date as serial int (for Bill) */
            if (f)
                fprintf(f, fmt, (long) pv->value.v_date);
        }
        else
        {
            /* fmt is strftime format control! */
            s = (char *) emalloc(MAXWORDLEN + 1);
            n = strftime(s, (size_t) MAXWORDLEN, fmt, localtime(&(pv->value.v_date)));
            if (n && f)
                fputs(s, f);    /* s is data, not a format: it may contain '%' */
            efree(s);
        }
        break;

    case PROP_FLOAT:
        fmt = (subfmt) ? subfmt : "%f";
        if (f)
            fprintf(f, fmt, (double) pv->value.v_float);
        break;

    case PROP_UNDEFINED:
        break;  /* Do nothing */

    default:
        progerr("Swish-e database error.  Unknown property type accessing property '%s'", propname);
        break;

    }


    freeResultPropValue(pv);
}
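
/*
   The PROP_DATE branch above has two modes: "%ld" prints the raw serial
   value, and any other format goes through strftime().  A minimal
   standalone sketch of that split follows.  format_date() and
   print_date() are hypothetical names, and the sketch uses gmtime()
   instead of localtime() purely so that its output does not depend on
   the local time zone.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

// Hypothetical helper: render t into buf according to fmt.
// Returns 1 on success, 0 if nothing usable was written.
static int format_date(char *buf, size_t size, const char *fmt, time_t t)
{
    struct tm *tm;

    if (strcmp(fmt, "%ld") == 0)
    {
        // special case: the date as a serial number
        int n = snprintf(buf, size, "%ld", (long) t);
        return n >= 0 && (size_t) n < size;
    }

    tm = gmtime(&t);        // assumption: UTC keeps the example deterministic
    if (!tm)
        return 0;

    return strftime(buf, size, fmt, tm) > 0;
}

// Write the formatted date as data, never as a printf format string.
static void print_date(FILE *f, const char *fmt, time_t t)
{
    char buf[128];

    if (f && format_date(buf, sizeof buf, fmt, t))
        fputs(buf, f);
}
```

   The strftime() result is written with fputs() because it can contain a
   literal '%' (for example from a "%%" in the format), which must not be
   interpreted a second time by fprintf().
*/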



/*
  -------------------------------------------
  Result config stuff
  -------------------------------------------
*/


/*
  -- some code for  -x fmtByName:
  -- e.g.  ResultExtFormatName   myformat   "<swishtitle>|....\n"
  --       ResultExtFormatName   yourformat "%c|%t|%p|<author fmt=/%20s/>\n"
  --
  --    swish -w ... -x myformat ...
  --
  --  2001-02-15 rasc
*/


/*
   -- add name and string to list 
*/

static struct ResultExtFmtStrList *addResultExtFormatStr(struct ResultExtFmtStrList *rp, char *name, char *fmtstr)
{
    struct ResultExtFmtStrList *newnode;


    newnode = (struct ResultExtFmtStrList *) emalloc(sizeof(struct ResultExtFmtStrList));

    newnode->name = (char *) estrdup(name);
    newnode->fmtstr = (char *) estrdup(fmtstr);

    newnode->next = NULL;

    if (rp == NULL)
        rp = newnode;
    else
        rp->nodep->next = newnode;

    rp->nodep = newnode;
    return rp;
}



/* 
   -- check if name is a predefined format
   -- case sensitive
   -- return fmtstring for formatname or NULL
*/


char   *hasResultExtFmtStr(SWISH * sw, char *name)
{
    struct ResultExtFmtStrList *rfl;

    rfl = sw->ResultOutput->resultextfmtlist;
    if (!rfl)
        return (char *) NULL;

    while (rfl)
    {
        if (!strcmp(name, rfl->name))
            return rfl->fmtstr;
        rfl = rfl->next;
    }

    return (char *) NULL;
}
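
/*
   addResultExtFormatStr() and hasResultExtFmtStr() above form a small
   named-format registry: a singly linked list whose head node also
   remembers the last node, so appending does not walk the list.  The
   sketch below shows the same idea with standard malloc()/free().  All
   names in it (fmt_node, fmt_append, fmt_lookup, fmt_free) are
   hypothetical and not part of Swish-e.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct fmt_node
{
    char *name;
    char *fmtstr;
    struct fmt_node *next;
    struct fmt_node *tail;      // only meaningful in the head node
};

static char *dup_string(const char *s)
{
    size_t len = strlen(s) + 1;
    char *copy = malloc(len);
    if (!copy)
        abort();
    memcpy(copy, s, len);
    return copy;
}

// Append a (name, fmtstr) pair; returns the (possibly new) head.
static struct fmt_node *fmt_append(struct fmt_node *head, const char *name, const char *fmtstr)
{
    struct fmt_node *node = malloc(sizeof *node);
    if (!node)
        abort();
    node->name = dup_string(name);
    node->fmtstr = dup_string(fmtstr);
    node->next = NULL;
    node->tail = NULL;

    if (!head)
        head = node;                // first entry becomes the head
    else
        head->tail->next = node;    // link after the cached last node

    head->tail = node;
    return head;
}

// Case-sensitive lookup; returns the format string or NULL.
static const char *fmt_lookup(const struct fmt_node *head, const char *name)
{
    for (; head; head = head->next)
        if (strcmp(name, head->name) == 0)
            return head->fmtstr;
    return NULL;
}

static void fmt_free(struct fmt_node *head)
{
    while (head)
    {
        struct fmt_node *next = head->next;
        free(head->name);
        free(head->fmtstr);
        free(head);
        head = next;
    }
}
```

   Entries keep their insertion order, and a lookup for "B" does not
   match an entry registered as "b".
*/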





/*
  -------------------------------------------
  result header stuff
  -------------------------------------------
*/

/* $$$ result header stuff should be moved to swish.c (or a module that is not part of the library). */



/*
  -- print a line for the result output header
  -- the verbose level is checked for output
  -- <min_verbose> has to be <= sw->...headerOutVerbose
  -- otherwise nothing is output
  -- return: 0/1  (not printed/printed)
  -- 2001-03-13  rasc
*/

int     resultHeaderOut(SWISH * sw, int min_verbose, char *printfmt, ...)
{
    va_list args;

    /* current verbose level is below min_verbose, no output */
    if (min_verbose > sw->headerOutVerbose)
        return 0;

    /* print header info... */
    va_start(args, printfmt);
    vfprintf(stdout, printfmt, args);
    va_end(args);
    return 1;
}
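
/*
   The verbosity gate in resultHeaderOut() above is a thin wrapper around
   vfprintf().  A minimal standalone sketch follows; header_out() is a
   hypothetical name, and it takes the current verbose level and the
   output stream as parameters instead of reading them from a SWISH
   structure.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

// Print a header line only if the current verbose level is high enough.
// Returns 1 if the line was printed, 0 if it was suppressed.
static int header_out(FILE *f, int verbose, int min_verbose, const char *fmt, ...)
{
    va_list args;

    if (min_verbose > verbose)
        return 0;           // line needs more verbosity than requested

    va_start(args, fmt);
    vfprintf(f, fmt, args);
    va_end(args);
    return 1;
}
```

   With a verbose level of 1, header_out(stdout, 1, 2, "# hidden\n")
   prints nothing and returns 0, while a line with min_verbose 1 is
   printed.
*/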








/*******************************************************************
*   Displays the "old" style properties for -p
*
*   Call with:
*       *RESULT
*
*   I think this could be done in result_output.c by creating a standard
*   -x format (plus properties) for use when there isn't one already.
*
*
********************************************************************/
static void printStandardResultProperties(FILE *f, RESULT *r)
{
    int     i;
    IndexFILE *tmp, *indexf = r->db_results->indexf;
    SWISH  *sw = indexf->sw;
    struct  MOD_ResultOutput *md = sw->ResultOutput;
    char   *s;
    char   *propValue;
    int    *metaIDs = NULL;


    if (md->numPropertiesToDisplay == 0)
        return;

    for( i = 0, tmp = sw->indexlist; tmp ; i++, tmp = tmp->next )
    {
        if(tmp == indexf)
        {
           metaIDs = md->propIDToDisplay[i];
           break;
        }
    }

    if(!metaIDs)
        return;

    for ( i = 0; i < md->numPropertiesToDisplay; i++ )
    {
        propValue = s = getResultPropAsString( r, metaIDs[ i ] );

        if (sw->ResultOutput->stdResultFieldDelimiter)
            fprintf(f, "%s", sw->ResultOutput->stdResultFieldDelimiter);
        else
            fprintf(f, " \"");	/* default is to quote the string, with leading space */

        /* print value, handling newlines and quotes */
        while (*propValue)
        {
            /* no longer check for double-quote.  User should pick a good value for -d or use -x */

            if (*propValue == '\n')
                fprintf(f, " ");

            else
                fprintf(f,"%c", *propValue);

            propValue++;
        }

        //fprintf(f,"%s", propValue);

        if (!sw->ResultOutput->stdResultFieldDelimiter)
            fprintf(f,"\"");	/* default is to quote the string */

        efree( s );            
    }
}




void addSearchResultDisplayProperty(SWISH *sw, char *propName)
{
    struct  MOD_ResultOutput *md = sw->ResultOutput;

	/* add a property to the list of properties that will be displayed */
	if (md->numPropertiesToDisplay >= md->currentMaxPropertiesToDisplay)
	{
		if(md->currentMaxPropertiesToDisplay) {
			md->currentMaxPropertiesToDisplay+=2;
			md->propNameToDisplay=(char **)erealloc(md->propNameToDisplay,md->currentMaxPropertiesToDisplay*sizeof(char *));
		} else {
			md->currentMaxPropertiesToDisplay=5;
			md->propNameToDisplay=(char **)emalloc(md->currentMaxPropertiesToDisplay*sizeof(char *));
		}
	}
	md->propNameToDisplay[md->numPropertiesToDisplay++] = estrdup(propName);
}
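
/*
   addSearchResultDisplayProperty() above grows its name array on demand:
   it starts with room for 5 entries and adds 2 slots each time the array
   is full.  The sketch below reproduces that policy with realloc().  The
   name_list type and its functions are hypothetical and only illustrate
   the growth pattern.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct name_list
{
    char **names;
    int count;
    int capacity;
};

// Append a copy of name, growing the array when it is full.
static void name_list_add(struct name_list *list, const char *name)
{
    size_t len = strlen(name) + 1;
    char *copy;

    if (list->count >= list->capacity)
    {
        int new_capacity = list->capacity ? list->capacity + 2 : 5;
        // realloc(NULL, n) behaves like malloc(n), so one call covers both cases
        char **grown = realloc(list->names, (size_t) new_capacity * sizeof *grown);
        if (!grown)
            abort();
        list->names = grown;
        list->capacity = new_capacity;
    }

    copy = malloc(len);
    if (!copy)
        abort();
    memcpy(copy, name, len);
    list->names[list->count++] = copy;
}

static void name_list_free(struct name_list *list)
{
    int i;

    for (i = 0; i < list->count; i++)
        free(list->names[i]);
    free(list->names);
    list->names = NULL;
    list->count = 0;
    list->capacity = 0;
}
```

   After eight additions the capacity has gone 5, 7, 9.  Growing by a
   fixed step is fine for a short -p list; a list expected to get large
   would usually double its capacity instead.
*/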





/* For faster processing, look up the IDs of the properties to display */
int initSearchResultProperties(SWISH *sw)
{
    IndexFILE *indexf;
    int i, j, index_count;
    struct MOD_ResultOutput *md = sw->ResultOutput;
    struct metaEntry *meta_entry;


	/* lookup selected property names */

	if (md->numPropertiesToDisplay == 0)
		return RC_OK;

	/* get number of index files */
	for( index_count = 0, indexf = sw->indexlist; indexf; index_count++, indexf = indexf->next );

	md->propIDToDisplay = (int **) emalloc(index_count * sizeof(int *));

	for( i = 0 ,indexf = sw->indexlist; i < index_count; i++, indexf = indexf->next )
		md->propIDToDisplay[i]=(int *) emalloc(md->numPropertiesToDisplay*sizeof(int));

	for (i = 0; i < md->numPropertiesToDisplay; i++)
	{
		makeItLow(md->propNameToDisplay[i]);

		/* Get ID for each index file */
		for( j = 0, indexf = sw->indexlist; j < index_count; j++, indexf = indexf->next )
		{
		    if ( !(meta_entry = getPropNameByName( &indexf->header, md->propNameToDisplay[i])))
			{
				progerr ("Unknown Display property name \"%s\"", md->propNameToDisplay[i]);
				return (sw->lasterror=UNKNOWN_PROPERTY_NAME_IN_SEARCH_DISPLAY);
			}
			else
			    md->propIDToDisplay[j][i] = meta_entry->metaID;
		}
	}
	return RC_OK;
}



/* swish-e-2.4.7/src/libtest.c */
/*
$Id: libtest.c 2291 2009-03-31 01:56:00Z karpet $
**
    This file is part of Swish-e.

    Swish-e is free software; you can redistribute it and/or modify
    it under the terms of the GNU General Public License as published by
    the Free Software Foundation; either version 2 of the License, or
    (at your option) any later version.

    Swish-e is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
    GNU General Public License for more details.

    You should have received a copy of the GNU General Public License
    along  with Swish-e; if not, write to the Free Software
    Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
    
    See the COPYING file that accompanies the Swish-e distribution for details
    of the GNU GPL and the special exception available for linking against
    the Swish-e library.
    
** Mon May  9 15:22:55 CDT 2005
** added GPL

**---------------------------------------------------------
*
*   Example program for interfacing a C program with the Swish-e C library.
*
*   ./libtest [optional index file]
*
*   use quotes for more than one file
*       ./libtest index.swish-e
*       ./libtest 'index1 index2 index3'
*
*   See the perl/API.xs file for more detail
*
*/


#include <stdio.h>
#include <string.h>   /* strlen() */
#include "swish-e.h"  /* use locally for testing */



#define MEM_TEST 1

#ifdef MEM_TEST
#include "mem.h"   // for mem_summary only
#endif



#define DISPLAY_COUNT 10  // max to display


static void display_results( SW_HANDLE, SW_RESULTS );
static void print_error_or_abort( SW_HANDLE swish_handle );

static void print_index_headers( SW_HANDLE swish_handle, SW_RESULTS results );
static void print_index_metadata( SW_HANDLE swish_handle );
static void print_header_value( SW_HANDLE swish_handle, const char *name, SWISH_HEADER_VALUE head_value, SWISH_HEADER_TYPE head_type );
static void demo_stemming( SW_RESULTS results );
static void stem_it( SW_RESULT r, char *word );


int     main(int argc, char **argv)
{
    SW_HANDLE   swish_handle = NULL;    /* Database handle */
    SW_SEARCH   search = NULL;          /* search handle -- holds search parameters */
    SW_RESULTS  results = NULL;         /* results handle -- holds list of results */

    char    input_buf[200];
    char   *index_file_list;



    SwishErrorsToStderr();      /* Send any errors or warnings to stderr (default is stdout) */

    /* Connect to the indexes specified */

    index_file_list = argv[1] && *(argv[1]) ? argv[1] : "index.swish-e";

    swish_handle = SwishInit( index_file_list );


    /* set ranking scheme. default is 0 */
    
    SwishRankScheme( swish_handle, 1 );

    /* return raw values */
    SwishReturnRawRank( swish_handle, 1 );


    /* Check for errors after every call */

    if ( SwishError( swish_handle ) )
        print_error_or_abort( swish_handle );  /* print an error or abort -- see below */


    /* Here's a short-cut to searching that creates a search object and searches at the same time */

    results = SwishQuery( swish_handle, "foo OR bar" );

    if ( SwishError( swish_handle ) )
        print_error_or_abort( swish_handle );  /* print an error or abort -- see below */
    else
    {
        display_results( swish_handle, results );

        printf( "Testing SW_ResultsToSW_HANDLE() = '%s'\n",
            SW_ResultsToSW_HANDLE( results ) == swish_handle ? "OK" : "Not OK" );

        demo_stemming( results );

        Free_Results_Object( results );
    }

    /* This may change since it only supports 8-bit chars */
    {
        const char *words = SwishWordsByLetter( swish_handle, "index.swish-e", 'f' );
        char *tmp = (char *)words;
        printf("Words that begin with 'f': ");
        for(;tmp && tmp[0]; tmp += strlen(tmp)+1 )
            printf("%s \n", tmp);

        printf("\n");
    }

    /* 
     * Stem a word -- this method is somewhat deprecated.
     * It stores the stemmed word in a single location in the SW_OBJECT
     */

    {
        char *stemmed = SwishStemWord( swish_handle, "running" );
        printf("SwishStemWord 'running' => '%s'\n\n", stemmed ? stemmed : "Failed to stem" );
    }


    /* Typical use of the library is to create a search object */
    /* and use the search object to make multiple queries */

    /* Create a search object for searching - the query string is optional */
    /* Remember to free the search object when done */

    search = New_Search_Object( swish_handle, "foo" );




    /* Adjust some of the search parameters if different than the defaults */
    SwishSetSort( search, "swishrank desc" );

    // SwishSetStructure( search, IN_TITLE );  /* limit to title only */

    /* Set Limit parameters like */

    /*****

    SwishSetSearchLimit( search, "swishtitle", "a", "z" );
    SwishSetSearchLimit( search, "age", "18", "65" );

    if ( SwishError( swish_handle ) )  // e.g. can't define two limits for same prop name
        print_error_or_abort( swish_handle );

    // use SwishResetLimit() if you wish to change the parameters on an active search object

    *****/


    /* Now we are ready to search  */


    while ( 1 )
    {
        printf("Enter search words: ");
        if ( !fgets( input_buf, sizeof(input_buf), stdin ) )
            break;


        results = SwishExecute( search, input_buf );

        /* check for errors */

        if ( SwishError( swish_handle ) )
        {
            print_error_or_abort( swish_handle );

            if ( results ) /* probably always true */
                Free_Results_Object( results );

            continue;
        }

        display_results( swish_handle, results );
        Free_Results_Object( results );

#ifdef MEM_TEST
        /* It's expected to see some memory used here since a swish_handle exists */
        Mem_Summary("End of loop", 1);
#endif

    }

    Free_Search_Object( search );
    SwishClose( swish_handle );


    /* Look for memory leaks -- configure swish-e with --enable-memtrace to use */
#ifdef MEM_TEST
    Mem_Summary("At end of program", 1);
#endif

    return 0;
}
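
/*
   The SwishWordsByLetter() loop in main() above steps through its buffer
   one NUL-terminated word at a time and stops at an empty string.  The
   sketch below applies the same traversal to a plain buffer so it can be
   tried without an index.  print_packed_words() is a hypothetical helper,
   and the buffer layout is inferred from that loop rather than taken from
   the library documentation.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

// Walk a buffer of back-to-back NUL-terminated words that ends with an
// empty string.  Prints each word to f (if f is not NULL) and returns
// the number of words seen.
static int print_packed_words(FILE *f, const char *buf)
{
    const char *word;
    int count = 0;

    for (word = buf; word && word[0]; word += strlen(word) + 1)
    {
        if (f)
            fprintf(f, "%s\n", word);
        count++;
    }
    return count;
}
```

   A string literal such as "foo\0fish\0" has this layout, because the
   compiler appends one more NUL after the explicit one.
*/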

/* Display some standard properties -- see perl/API.xs for how to get at the data */

static void display_results( SW_HANDLE swish_handle, SW_RESULTS results )
{
    SW_RESULT result;
    int       hits;
    int       first = 1;

    if ( !results )  /* better safe than sorry */
        return;



    /* Display the set of headers for the index(es) */
    print_index_headers( swish_handle, results );


    /* Try to get metadata from the index */
    print_index_metadata( swish_handle );

    hits = SwishHits( results );

    if ( 0 == hits )
    {
        printf("no results!\n");
        return;
    }


    printf("# Total Results: %d\n", hits );




    if ( SwishSeekResult(results, 0 ) < 0 )  // how to seek to a page of results
    {
        print_error_or_abort( swish_handle );  /* seek past end of file */
        return;
    }

   

    while ( (result = SwishNextResult( results )) )
    {

        /* This SwishResultPropertyStr() will work for all types of props */
        /* But SwishResultPropertyULong() can be used to return numeric types */
        /* Should probably check for errors after every call  */
        /* SwishResultPropertyULong will return ULONG_MAX if the value cannot be returned */
        /* that could mean an error, or just that there was not a property assigned (which is not an error) */

        printf("Path: %s\n  Rank: %lu\n  Size: %lu\n  Title: %s\n  Index: %s\n  Modified: %s\n  Record #: %lu\n  File   #: %lu\n\n",
            SwishResultPropertyStr   ( result, "swishdocpath" ),
            SwishResultPropertyULong ( result, "swishrank" ),
            SwishResultPropertyULong ( result, "swishdocsize" ),
            SwishResultPropertyStr   ( result, "swishtitle"),
            SwishResultPropertyStr   ( result, "swishdbfile" ),
            SwishResultPropertyStr   ( result, "swishlastmodified" ),
            SwishResultPropertyULong ( result, "swishreccount" ),  /* can figure this out in loop, of course */
            SwishResultPropertyULong ( result, "swishfilenum" )
        );



        /* Generally not useful, but also can lookup Index header data via the current result */
        {
            SWISH_HEADER_VALUE header_value;
            SWISH_HEADER_TYPE  header_type;
            const char *example = "WordCharacters";
            
            header_value = SwishResultIndexValue( result, example, &header_type );
            print_header_value( swish_handle, example, header_value, header_type );
        }

        if ( first )
        {
            printf( "Testing SW_ResultToSW_HANDLE() = '%s'\n",
                SW_ResultToSW_HANDLE( result ) == swish_handle ? "OK" : "Not OK" );

            first = 0;
        }
            
    }

    
}


/**********************************************************************
* print_index_headers
*
*   This displays the standard headers associated with an index
*
*   Pass in:
*       swish_handle -- for standard headers
*
*   Note:
*       The SWISH_HEADER value, and the data it points to, is only
*       valid during the current call.
*
*
***********************************************************************/

static void print_index_headers( SW_HANDLE swish_handle, SW_RESULTS results )
{
    const char **header_names = SwishHeaderNames(swish_handle);  /* fetch the list of available header names */
    const char **index_name = SwishIndexNames( swish_handle );
    SWISH_HEADER_VALUE header_value;
    SWISH_HEADER_TYPE  header_type;

    /* display for each index */

    while ( *index_name )
    {
        const char **cur_header = header_names;

        while ( *cur_header )
        {
            header_value = SwishHeaderValue( swish_handle, *index_name, *cur_header, &header_type );
            print_header_value( swish_handle, *cur_header, header_value, header_type );


            cur_header++;  /* move to next header name */
        }


        /* Now print out results-specific data */

        header_value = SwishParsedWords( results, *index_name );
        print_header_value( swish_handle, "Parsed Words", header_value, SWISH_LIST );

        header_value = SwishRemovedStopwords( results, *index_name );
        print_header_value( swish_handle, "Removed Stopwords", header_value, SWISH_LIST );


        index_name++;  /* move to next index file */
    }
}

static void print_header_value( SW_HANDLE swish_handle, const char *name, SWISH_HEADER_VALUE head_value, SWISH_HEADER_TYPE head_type )
{
    const char **string_list;
    
    printf("# %s:", name );

    switch ( head_type )
    {
        case SWISH_STRING:
            printf(" %s\n", head_value.string ? head_value.string : "" );
            return;

        case SWISH_NUMBER:
            printf(" %lu\n", head_value.number );
            return;

        case SWISH_BOOL:
            printf(" %s\n", head_value.boolean ? "Yes" : "No" );
            return;

        case SWISH_LIST:
            string_list = head_value.string_list;
            
            while ( *string_list )
            {
                printf(" %s", *string_list );
                string_list++;
            }
            printf("\n");
            return;

        case SWISH_HEADER_ERROR:
            print_error_or_abort( swish_handle );
            return;

        default:
            printf(" Unknown header type '%d'\n", (int)head_type );
            return;
    }
}


/**********************************************************************
* print_index_metadata
*
*   This displays the metanames and property names in each index.
*
*   Pass in:
*       swish_handle -- for the index names and their meta/property lists
*
*   Note:
*       The SWISH_HEADER value, and the data it points to, is only
*       valid during the current call.
*
*
***********************************************************************/

static void print_index_metadata( SW_HANDLE swish_handle )
{
    const char **index_name = SwishIndexNames( swish_handle );
    
    while ( *index_name ) {
      SWISH_META_LIST meta_list = SwishMetaList( swish_handle, *index_name );
      SWISH_META_LIST prop_list = SwishPropertyList( swish_handle, *index_name );

      while ( *meta_list ) {
	printf("# Meta: " );
	printf( "%s ", SwishMetaName(*meta_list));
	printf( "type=%d ", SwishMetaType(*meta_list));
	printf( "id=%d ", SwishMetaID(*meta_list));
	printf("\n");
	meta_list++;
      }      
      while ( *prop_list ) {
	printf("# Property: " );
	printf( "%s ", SwishMetaName(*prop_list));
	printf( "type=%d ", SwishMetaType(*prop_list));
	printf( "id=%d ", SwishMetaID(*prop_list));
	printf("\n");
	prop_list++;
      }      
      index_name++;
    }
}


/*************************************************************
*  print_error_or_abort -- display an error message / abort
*
*   This displays the error message, and aborts if it's a critical
*   error.  This is overkill -- normally a critical error means
*   that you should call SwishClose() and start over.
*
*   On searches, an error means that the search could not be completed.
*
*
**************************************************************/

static void print_error_or_abort( SW_HANDLE swish_handle )
{
    if ( !SwishError( swish_handle ) )
        return;

    /* On critical errors simply exit -- normally you would SwishClose() and loop */

    if ( SwishCriticalError( swish_handle ) )
       SwishAbortLastError( swish_handle );   /* prints message and exits */


    /* print a message */        
    fprintf(stderr,
        "err: Number [%d], Type [%s],  Optional Message: [%s]\n",
        SwishError( swish_handle ),
        SwishErrorString( swish_handle ),
        SwishLastErrorMsg( swish_handle )
    );
}

/*
 * This shows how to use the stemmer based on a result.
 * It's done this way because a result is related to a
 * specific index (where a result list may contain results
 * from many indexes).
 * Typically, the stemmer is used at search time to highlight words
 * so it would be based on a given result.
 */

static void demo_stemming( SW_RESULTS results )
{
    SW_RESULT r;

    printf("\n-- Stemmer Test --\n");


    if ( !SwishHits( results ) )
    {
        printf("Couldn't test stemming because search returned no results\n");
        return;
    }

    if (SwishSeekResult( results, 0) )
    {
        printf("Failed to seek to result 0\n");
        return;
    }
    r = SwishNextResult( results );

    if ( !r )
    {
        printf("Failed to get first result\n");
        return;
    }

    printf("Fuzzy Mode: %s\n", SwishFuzzyMode( r ) );

    stem_it( r, "running" );
    stem_it( r, "runs" );
    stem_it( r, "12345" );
    stem_it( r, "abc3def" );
    stem_it( r, "");
    stem_it( r, "sugar" );  /* produces two metaphones */
}

static void stem_it( SW_RESULT r, char *word )
{
    const char **word_list;
    SW_FUZZYWORD fw;

    printf(" [%s] : ", word );
    
    fw = SwishFuzzyWord( r, word );
    printf(" Status: %d", SwishFuzzyWordError(fw) );
    printf(" Word Count: %d\n", SwishFuzzyWordCount(fw) );

    printf("   words:");
    word_list = SwishFuzzyWordList( fw );
    while ( *word_list )
    {
        printf(" %s", *word_list );
        word_list++;
    }
    
    printf("\n");

    SwishFuzzyWordFree( fw );
}

/* ---- File: swish-e-2.4.7/src/methods.c ---- */

/*

$Id: methods.c 1736 2005-05-12 15:41:22Z karman $


    This file is part of Swish-e.

    Swish-e is free software; you can redistribute it and/or modify
    it under the terms of the GNU General Public License as published by
    the Free Software Foundation; either version 2 of the License, or
    (at your option) any later version.

    Swish-e is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
    GNU General Public License for more details.

    You should have received a copy of the GNU General Public License
    along  with Swish-e; if not, write to the Free Software
    Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
    
    See the COPYING file that accompanies the Swish-e distribution for details
    of the GNU GPL and the special exception available for linking against
    the Swish-e library.
    
** Mon May  9 14:58:45 CDT 2005
** added GPL

*/

#include "swish.h"

#ifdef ALLOW_FILESYSTEM_INDEXING_DATA_SOURCE
extern struct _indexing_data_source_def FileSystemIndexingDataSource;
#endif

#ifdef ALLOW_HTTP_INDEXING_DATA_SOURCE
extern struct _indexing_data_source_def HTTPIndexingDataSource;
#endif

#ifdef ALLOW_EXTERNAL_PROGRAM_DATA_SOURCE
extern struct _indexing_data_source_def ExternalProgramDataSource;
#endif    


struct _indexing_data_source_def *data_sources[] = {

#ifdef ALLOW_FILESYSTEM_INDEXING_DATA_SOURCE
    &FileSystemIndexingDataSource,
#endif

#ifdef ALLOW_HTTP_INDEXING_DATA_SOURCE
    &HTTPIndexingDataSource,
#endif

#ifdef ALLOW_EXTERNAL_PROGRAM_DATA_SOURCE
    &ExternalProgramDataSource,
#endif    

    0
};
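
/*
 * Illustration only, not part of Swish-e: a minimal sketch of how a
 * zero-terminated table like data_sources[] can be searched by the short
 * name given to -S (for example "prog").  The struct layout, the table
 * contents other than "External-Program"/"prog", and every sketch_* name
 * are assumptions made for this example.
 */

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, reduced stand-in for struct _indexing_data_source_def */
struct sketch_source {
    const char *name;        /* long, human-readable name */
    const char *short_name;  /* name matched against the -S argument */
};

static struct sketch_source sketch_fs   = { "File-System", "fs" };
static struct sketch_source sketch_prog = { "External-Program", "prog" };

/* Zero-terminated table, same shape as data_sources[] above */
static struct sketch_source *sketch_sources[] = {
    &sketch_fs,
    &sketch_prog,
    0
};

/* Walk the table until the terminating 0; return NULL if nothing matches */
struct sketch_source *sketch_find_source(const char *short_name)
{
    struct sketch_source **s;

    for (s = sketch_sources; *s; s++)
        if (strcmp((*s)->short_name, short_name) == 0)
            return *s;

    return NULL;
}
```

/*
 * Because the table ends in 0, sources compiled out with #ifdef simply
 * disappear from the table and the loop needs no separate element count.
 */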

/* ---- File: swish-e-2.4.7/src/compress.h ---- */

/*
** Copyright (C) 1995, 1996, 1997, 1998 Hewlett-Packard Company
** Originally by Kevin Hughes, kev@kevcom.com, 3/11/94
**
$Id: compress.h 1736 2005-05-12 15:41:22Z karman $

    This file is part of Swish-e.

    Swish-e is free software; you can redistribute it and/or modify
    it under the terms of the GNU General Public License as published by
    the Free Software Foundation; either version 2 of the License, or
    (at your option) any later version.

    Swish-e is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
    GNU General Public License for more details.

    You should have received a copy of the GNU General Public License
    along  with Swish-e; if not, write to the Free Software
    Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
    
    See the COPYING file that accompanies the Swish-e distribution for details
    of the GNU GPL and the special exception available for linking against
    the Swish-e library.
    
** Mon May  9 18:19:34 CDT 2005
** added GPL


*/

int sizeofcompint(int number);
void compress1(int num, FILE *fp, int (*f_putc)(int , FILE *));
/* unsigned char *compress2(int num, unsigned char *buffer);*/
unsigned char *compress3(int num, unsigned char *buffer);

int uncompress1(FILE *fp, int (*f_getc)(FILE *fp));
int uncompress2(unsigned char **buffer);


unsigned long PACKLONG(unsigned long num);
void PACKLONG2(unsigned long num, unsigned char *buffer);

unsigned long UNPACKLONG(unsigned long num);
unsigned long UNPACKLONG2(unsigned char *buffer);


sw_off_t PACKFILEOFFSET(sw_off_t num);
void PACKFILEOFFSET2(sw_off_t num, unsigned char *buffer);

sw_off_t UNPACKFILEOFFSET(sw_off_t num);
sw_off_t UNPACKLONGFILEOFFSET2(unsigned char *buffer);

void compress_location_values(unsigned char **buf,unsigned char **flagp,int filenum,int frequency, unsigned int *posdata);
void compress_location_positions(unsigned char **buf,unsigned char *flag,int frequency, unsigned int *posdata);

void uncompress_location_values(unsigned char **buf,unsigned char *flag, int *filenum,int *frequency);
void uncompress_location_positions(unsigned char **buf, unsigned char flag, int frequency, unsigned int *posdata);

void CompressCurrentLocEntry(SWISH *, ENTRY *);

int compress_worddata(unsigned char *, int, int );
void uncompress_worddata(unsigned char **,int *, int);
void    remove_worddata_longs(unsigned char *,int *);

/* Here is the worst case size for a compressed number 
** MAXINTCOMPSIZE stands for MAXimum INTeger COMPressed SIZE
**
** There are many places in the code in which we allocate
** space for a compressed number. In the worst case this size is 5
** for 32 bit number, 10 for a 64 bit number.
**
** The way this compression works is reserving the first bit 
** in each byte to store a flag. The flag is set in all bytes
** except for the last one.
** This only gives 7 bits per byte to store the number.
**
** For example, to store 1000 (binary 1111101000) we will get:
** 
** 1st byte    2nd byte
** 10000111    01101000
** ^           ^
** |           |
** |           Flag cleared to indicate that this is the last byte
** |
** Flag set to indicate that more bytes follow this one
**
** So, to compress a 32 bit number we need 5 bytes and for
** a 64 bit number we will use 10 bytes for the worst case
*/
#define MAXINTCOMPSIZE (((sizeof(int) * 8) / 7) + (((sizeof(int) * 8) % 7) ? 1 : 0))
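
/*
 * Illustration only, not the Swish-e implementation: a minimal sketch of
 * the 7-bits-per-byte scheme described in the comment above, emitting the
 * most significant group first as in the 1000 example.  The names
 * sketch_compress and sketch_uncompress are hypothetical; the real
 * routines are compress1/compress3 and uncompress1/uncompress2 and may
 * differ in detail.
 */

```c
#include <stddef.h>

/* Encode num into buf; returns the number of bytes written.
 * buf needs room for ceil(bits-in-unsigned-int / 7) bytes. */
size_t sketch_compress(unsigned int num, unsigned char *buf)
{
    unsigned char tmp[(sizeof(unsigned int) * 8 + 6) / 7];
    size_t n = 0, i;

    /* split into 7-bit groups, least significant group first */
    do {
        tmp[n++] = (unsigned char)(num & 0x7F);
        num >>= 7;
    } while (num);

    /* write the groups back in reverse order */
    for (i = 0; i < n; i++) {
        unsigned char b = tmp[n - 1 - i];

        if (i + 1 < n)
            b |= 0x80;          /* flag set: more bytes follow */
        buf[i] = b;
    }
    return n;
}

/* Decode one number and advance *bufp past it */
unsigned int sketch_uncompress(const unsigned char **bufp)
{
    const unsigned char *p = *bufp;
    unsigned int num = 0;
    unsigned char b;

    do {
        b = *p++;
        num = (num << 7) | (b & 0x7F);
    } while (b & 0x80);         /* flag cleared on the last byte */

    *bufp = p;
    return num;
}
```

/*
 * With this sketch, 1000 encodes to the two bytes 0x87 0x68, which are
 * the bit patterns 10000111 and 01101000 shown in the comment above.
 */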


/* ---- File: swish-e-2.4.7/src/headers.h ---- */

/*
$Id: headers.h 1736 2005-05-12 15:41:22Z karman $

    This file is part of Swish-e.

    Swish-e is free software; you can redistribute it and/or modify
    it under the terms of the GNU General Public License as published by
    the Free Software Foundation; either version 2 of the License, or
    (at your option) any later version.

    Swish-e is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
    GNU General Public License for more details.

    You should have received a copy of the GNU General Public License
    along  with Swish-e; if not, write to the Free Software
    Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
    
    See the COPYING file that accompanies the Swish-e distribution for details
    of the GNU GPL and the special exception available for linking against
    the Swish-e library.
    
** Mon May  9 18:19:34 CDT 2005
** added GPL



*/

#ifndef HEADERS_H
#define HEADERS_H 1

#ifdef __cplusplus
extern "C" {
#endif


/*************************************************************************
*  Index File Header access interface
*
*   Notes:
*       Must match up with the public library interface headers.
*
*
**************************************************************************/


typedef enum {
    SWISH_NUMBER,
    SWISH_STRING,
    SWISH_LIST,
    SWISH_BOOL,
    SWISH_WORD_HASH,
    SWISH_OTHER_DATA,
    SWISH_HEADER_ERROR /* must check error in this case */
} SWISH_HEADER_TYPE;

typedef union
{
    const char           *string;
    const char          **string_list;
          unsigned long   number;
          int             boolean;
} SWISH_HEADER_VALUE;


void print_index_headers( IndexFILE *indexf );

IndexFILE *indexf_by_name( SWISH *sw, const char *index_name );

#ifdef __cplusplus
}
#endif /* __cplusplus */


#endif /* !HEADERS_H */



/* ---- File: swish-e-2.4.7/src/extprog.c ---- */

/*

$Id: extprog.c 1945 2007-10-22 14:54:07Z karpet $

    This file is part of Swish-e.

    Swish-e is free software; you can redistribute it and/or modify
    it under the terms of the GNU General Public License as published by
    the Free Software Foundation; either version 2 of the License, or
    (at your option) any later version.

    Swish-e is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
    GNU General Public License for more details.

    You should have received a copy of the GNU General Public License
    along  with Swish-e; if not, write to the Free Software
    Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
    
    See the COPYING file that accompanies the Swish-e distribution for details
    of the GNU GPL and the special exception available for linking against
    the Swish-e library.
    
** Mon May  9 15:51:39 CDT 2005
** added GPL
** removed Copyright HP statement, since moseley created this file from scratch.

** Mar 27, 2001 - created moseley
**
*/

#include "acconfig.h"

#ifdef HAVE_UNISTD_H
#include <unistd.h>
#endif

#include "swish.h"
#include "mem.h"
#include "swstring.h"
#include "index.h"
#include "file.h"
#include "error.h"
#include "parse_conffile.h"
#include "bash.h" /* for locating a program */

static char *find_command_in_path(const char *name, const char *path_list, int *path_index);
static char *get_env_path_with_libexecdir( void );


struct MOD_Prog
{
    /* prog system specific configuration parameters */
    struct swline *progparameterslist;
};


/* 
  -- init structures for this module
*/

void initModule_Prog (SWISH  *sw)
{
    struct MOD_Prog *self;

    self = (struct MOD_Prog *) emalloc(sizeof(struct MOD_Prog));
    sw->Prog = self;

    /* initialize buffers used by indexstring */
    self->progparameterslist = (struct swline *) NULL;

    return;
}

void freeModule_Prog (SWISH *sw)
{
    struct MOD_Prog *self = sw->Prog;


    if ( self->progparameterslist )
        efree( self->progparameterslist );

    efree ( self );
    sw->Prog = NULL;

    return;
}

int configModule_Prog (SWISH *sw, StringList *sl)

{
    struct MOD_Prog *self = sw->Prog;
    char *w0    = sl->word[0];

    if (strcasecmp(w0, "SwishProgParameters") == 0)
    {
        if (sl->n > 1)
        {
            grabCmdOptions(sl, 1, &self->progparameterslist);
        }
        else
            progerr("%s: requires at least one value", w0);
    }

    else 
    {
        return 0;                   /* not a module directive */
    }

    return 1;
}



static FILE   *open_external_program(SWISH * sw, char *prog)
{
    char   *cmd;
    char   *full_path;
    FILE   *fp;
    size_t  total_len;
    struct swline *progparameterslist = sw->Prog->progparameterslist;
    int    path_index = 0;  /* index into $PATH */
    char  *env_path = get_env_path_with_libexecdir();
	
    if ( ! strcmp( prog, "stdin") )
    {
        efree( env_path );  /* not needed when reading from stdin */
        return stdin;
    }

    normalize_path( prog );  /* flip backslashes to forward slashes */

    full_path = find_command_in_path( (const char *)prog, env_path, &path_index );
    if ( !full_path )
        progerr("Failed to find program '%s' in PATH: %s ", prog, env_path );

    efree( env_path );

    if ( sw->verbose )
        printf("External Program found: %s\n", full_path );



    /* get total length of configuration parameters */
    total_len = 0;
    while (progparameterslist)
    {
        total_len += strlen(progparameterslist->line) + 1; /* separate by spaces */
        progparameterslist = progparameterslist->next;
    }

    cmd = emalloc(total_len + strlen( full_path ) + 1);
    strcpy(cmd, full_path);
    efree( full_path );


#if defined(_WIN32) && !defined(__CYGWIN__)

    make_windows_path( cmd );

#endif


    progparameterslist = sw->Prog->progparameterslist;
    while (progparameterslist)
    {
        strcat(cmd, " ");
        strcat(cmd, progparameterslist->line);
        progparameterslist = progparameterslist->next;
    }

    fp = popen(cmd, F_READ_TEXT);

    if (!fp)
        progerrno("Failed to spawn external program '%s': ", cmd);

    efree(cmd);
    return fp;
}
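
/*
 * Illustration only, not part of Swish-e: a minimal sketch of the
 * two-pass technique used above, where the total length is measured
 * first and the program path and its parameters are then joined with
 * single spaces.  sketch_join_command and its NULL-terminated array
 * argument are assumptions for this example; the real code walks a
 * struct swline list and uses emalloc().  No shell quoting is done here
 * either, so parameters containing spaces are split by the shell that
 * popen() invokes.
 */

```c
#include <stdlib.h>
#include <string.h>

/* Returns a malloc()ed "program param1 param2 ..." string, or NULL */
char *sketch_join_command(const char *program, const char *const *params)
{
    size_t len = strlen(program) + 1;   /* +1 for the terminating null */
    const char *const *p;
    char *cmd, *end;

    /* first pass: one separating space plus the text of each parameter */
    for (p = params; *p; p++)
        len += strlen(*p) + 1;

    cmd = malloc(len);
    if (!cmd)
        return NULL;

    /* second pass: copy, tracking the end to avoid rescanning the string */
    strcpy(cmd, program);
    end = cmd + strlen(cmd);
    for (p = params; *p; p++) {
        *end++ = ' ';
        strcpy(end, *p);
        end += strlen(*p);
    }
    return cmd;
}
```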

/* To make filters work with prog, need to write the file out to a temp file */
/* It will be faster to do the filtering from within the "prog" program */
/* This may not be safe if running as a threaded app, and I'm not clear on how portable this is */
/* This also uses read_stream to read in the file -- so the entire file is read into memory instead of chunked to the temp file */

/* Notice that the data is read out in TEXT mode -- this is because it's read from the */
/* external program in TEXT mode.  Binary files will be modified while in memory */
/* (under Windows) but writing back in TEXT mode should restore the file to its */
/* original binary format for use by the filter.  Really, don't use FileFilter with -S prog */

static void    save_to_temp_file(SWISH *sw, FileProp *fprop)
{
    FILE   *out;
    char   *rd_buffer = NULL;   /* complete file read into buffer */
    size_t  bytes;
    struct FilterList *filter_save = fprop->hasfilter;


    /* slurp entire file into memory -- yuck */
    fprop->hasfilter = NULL;  /* force reading fprop->fsize bytes */
    rd_buffer = read_stream(sw, fprop, 0);

    fprop->hasfilter = filter_save;

    /* Save content to a temporary file */
    efree( fprop->work_path );
    out = create_tempfile(sw, F_WRITE_TEXT, "fltr", &fprop->work_path, 0 );

    bytes = fwrite( rd_buffer, 1, fprop->fsize, out );

    if ( bytes != (size_t)fprop->fsize )
        progerrno("Failed to write temporary filter file '%s': ", fprop->work_path);


    /* hide the fact that it's an external program */
    fprop->fp = (FILE *) NULL;


/* **JMRUIZ    efree(rd_buffer);  */
    fclose( out );

}



static void    extprog_indexpath(SWISH * sw, char *prog)
{
    FileProp *fprop;
    FILE   *fp;
    char   *ln;
    char   *real_path;
    long    fsize;
    time_t  mtime;
    int     index_no_content;
    long    truncate_doc_size;
    int     docType = 0;
    int     original_update_mode = sw->Index->update_mode;

    mtime = 0;
    fsize = -1;
    index_no_content = 0;
    real_path = NULL;

    fp = open_external_program(sw, prog);

    ln = emalloc(MAXSTRLEN + 1);

    truncate_doc_size = sw->truncateDocSize;
    sw->truncateDocSize = 0;    /* can't truncate -- prog should make sure doc is not too large */
    // $$$ This is no longer true with libxml push parser

    // $$$ next time, break out the header parsing in its own function, please

    /* loop on headers */
    while (fgets(ln, MAXSTRLEN, fp) != NULL)
    {
        char    *end;
        char    *line;
        int     has_filter = 0;

        line = str_skip_ws(ln); /* skip leading white space */
        end = strrchr(line, '\n'); /* replace \n with null -- better to remove trailing white space */

        /* trim white space */
        if (end)
        {
            while ( end > line && isspace( (int)*(end-1) ) )
                end--;

            *end = '\0';
        }

        if (strlen(line) == 0) /* blank line indicates body */
        {
            if (!real_path)
                progerr("External program failed to return required headers Path-Name:");

            if ( fsize == -1)
                progerr("External program failed to return required headers Content-Length: when processing file '%s'", real_path);

            if ( fsize == 0 && sw->verbose >= 2)
                progwarn("External program returned zero Content-Length when processing file '%s'", real_path);



            /* Create the FileProp entry to describe this "file" */

            /* This is not great -- really should make creating a fprop more generic */
            /* this was done because file.c assumed that the "file" was on disk */
            /* which has changed over time due to filters, http, and prog */

            fprop = init_file_properties();  
            fprop->real_path = real_path;
            fprop->work_path = estrdup( real_path );
            fprop->orig_path = estrdup( real_path );

            /* Set the doc type from the header */
            if ( docType ) 
            {
                fprop->doctype   = docType;
                docType = 0;
            } 


            /* set real_path, doctype, index_no_content, filter, stordesc */
            init_file_prop_settings(sw, fprop);

            fprop->fp = fp; /* stream to read from */
            fprop->fsize = fsize; /* how much to read */
            fprop->source_size = fsize;  /* original size of input document - should be an extra header! */
            fprop->mtime = mtime;

            /* header can force index_no_content */
            if (index_no_content)
                fprop->index_no_content++;


            /*  the quick hack to make filters work is for FilterOpen
             *  to see that fprop->fp is set, read it into a buffer,
             *  write it to a temporary file, then call the filter
             *  program as is normally done.  But it is much smarter to
             *  simply filter in the prog, after all.  Faster, too.
             */

            if (fprop->hasfilter)
            {
                save_to_temp_file( sw , fprop );
                has_filter++; /* save locally, in case it gets reset somewhere else */
            }

            if (sw->verbose >= 3)
                printf("%s", real_path);
            else if (sw->verbose >= 2)
                printf("Processing %s...\n", real_path);


            do_index_file(sw, fprop);

            /* Reset Update Mode */
            sw->Index->update_mode = original_update_mode;


            if ( has_filter && remove( fprop->work_path ) )
                progwarnno("Error removing temporary file '%s': ", fprop->work_path);

            free_file_properties(fprop);
            // efree(real_path); free_file_properties will free the paths
            real_path = NULL;
            mtime = 0;
            fsize = -1;  /* so a missing Content-Length is detected for the next document too */
            index_no_content = 0;
            
        }


        else /* we are reading headers */
        {
            if (strncasecmp(line, "Content-Length", 14) == 0)
            {
                char *x = strchr(line, ':');
                if (!x)
                    progerr("Failed to parse Content-Length header '%s'", line);
                fsize = strtol(++x, NULL, 10);
                continue;
            }

            if (strncasecmp(line, "Last-Mtime", 10) == 0)
            {
                char *x = strchr(line, ':');
                if (!x)
                    progerr("Failed to parse Last-Mtime header '%s'", line);
                mtime = strtol(++x, NULL, 10);
                continue;
            }

            if (strncasecmp(line, "No-Contents:", 12) == 0)
            {
                index_no_content++;
                continue;
            }

            if (strncasecmp(line, "Charset", 7) == 0)
            {
                continue;
            }

            if (strncasecmp(line, "Path-Name", 9) == 0)
            {
                char *x = strchr(line, ':');
                if (!x)
                    progerr("Failed to parse Path-Name header '%s'", line);

                x = str_skip_ws(++x);
                if (!*x)
                    progerr("Failed to find path name in Path-Name header '%s'", line);

                real_path = emalloc(strlen(x) + 1);
                strcpy(real_path, x);
                continue;
            }

            if (strncasecmp(line, "Document-Type", 13) == 0)
            {
                char *x = strchr(line, ':');
                if (!x)
                    progerr("Failed to parse Document-Type '%s'", line);

                x = str_skip_ws(++x);
                if (!*x)
                    progerr("Failed to find document type in Document-Type header '%s'", line);

                if ( !(docType = strtoDocType( x )) )
                    progerr("document type '%s' not a valid Swish-e document type in Document-Type header '%s'", x, line);

                continue;
            }

           /* new Update-Mode: [Update|Remove|Index] header
            * for this to work, swish-e has to be compiled with incremental option and
            * in update mode (-u) so that index is opened in read/write mode
            * dpavlin 2004-12-09
            */

            if (strncasecmp(line, "Update-Mode", 11) == 0)
            {
				char *x = strchr(line, ':');
#ifndef USE_BTREE
                progerr("Cannot use Update-Mode header with this version of Swish-e.  Rebuild with --enable-incremental.");
#endif
                /* April 8, 2005 - If update mode is set on the initial indexing job
                 * then the btree code fails.  So for now restrict this feature to
                 * when -u or -r is specified on the command line */
                if ( MODE_UPDATE != original_update_mode && MODE_REMOVE != original_update_mode )
                    progerr("Cannot use 'Update-Mode' header without -u or -r");

                if (!x)
                    progerr("Failed to parse Update-Mode '%s'", line);

                x = str_skip_ws(++x);
                if (!*x)
                    progerr("Failed to parse Update-Mode header '%s'", line);

               /* should we dump error here? It seems to work without update mode! - dpavlin
                * I say just let it run. Without -u or -r the index is recreated, though
                * that may be more of an issue.
                * In fact, maybe with USE_BTREE we need a way to explicitly say to 
                * clear the index.  Forgetting to use -u or -r can be a big mistake. - moseley
                * 
               if (sw->Index->update_mode != MODE_UPDATE && sw->Index->update_mode != MODE_REMOVE)
                       progwarn("Update-Mode header is supported only if swish-e is invoked in update (-u) mode");
                */



               if ( strncasecmp(x, "Update", 6) == 0 ) {
                       sw->Index->update_mode = MODE_UPDATE;
                       if ( sw->verbose >= 4 ) printf( "Input file selected: %s (MODE_UPDATE)\n", x );
               } else if ( strncasecmp(x, "Remove", 6) == 0 ) {
                       sw->Index->update_mode = MODE_REMOVE;
                       if ( sw->verbose >= 4 ) printf( "Input file selected: %s (MODE_REMOVE)\n", x );
               } else if ( strncasecmp(x, "Index", 5) == 0 ) {
                       sw->Index->update_mode = MODE_UPDATE;
                       if ( sw->verbose >= 4 ) printf( "Input file selected: %s (MODE_UPDATE)\n", x );
               } else {
                       progerr("Unknown Update-Mode: %s", x);
               }

                continue;
            }




            progwarn("Unknown header line: '%s' from program %s", line, prog);

        }
    }

    efree(ln);

    /* restore the setting */
    sw->truncateDocSize = truncate_doc_size;

    if ( fp != stdin )
        if ( pclose(fp) == -1 )
            progwarnno("Failed to properly close external program: ");
    
}
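
/*
 * Illustration only, not part of Swish-e: a minimal sketch of what one
 * document looks like on the stream that extprog_indexpath() reads,
 * namely header lines, a blank line, then exactly Content-Length bytes
 * of body.  The header names come from the parser above; the function
 * sketch_format_document and its parameters are hypothetical.
 */

```c
#include <stdio.h>
#include <string.h>

/* Format one document into out.  Returns the number of bytes written
 * (excluding the terminating null), or -1 if out is too small. */
int sketch_format_document(char *out, size_t outsize,
                           const char *path, long mtime, const char *body)
{
    size_t body_len = strlen(body);     /* Content-Length counts body bytes */
    int n;

    n = snprintf(out, outsize,
                 "Path-Name: %s\n"
                 "Content-Length: %lu\n"
                 "Last-Mtime: %ld\n"
                 "\n"                   /* blank line ends the headers */
                 "%s",
                 path, (unsigned long)body_len, mtime, body);

    if (n < 0 || (size_t)n >= outsize)
        return -1;

    return n;
}
```

/*
 * A program used with -S prog would write one such block per document to
 * stdout.  Path-Name and Content-Length are the two headers the parser
 * above requires; Last-Mtime is optional.
 */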





/* Don't use old method of config checking */
static int     extprog_parseconfline(SWISH * sw, StringList *l)
{
    return 0;
}



struct _indexing_data_source_def ExternalProgramDataSource = {
    "External-Program",
    "prog",
    extprog_indexpath,
    extprog_parseconfline
};


/* From GNU Which */
static char *find_command_in_path(const char *name, const char *path_list, int *path_index)
{
  char *found = NULL, *full_path;
  int status, name_len;
  int absolute_path_given = 0;
  char *abs_path = NULL;
  name_len = strlen(name);

  if (!absolute_program(name))
    absolute_path_given = 0;
  else
  {
    char *p;
    absolute_path_given = 1;

    if (abs_path)
      xfree(abs_path);

    if (*name != '.' && *name != '/' && *name != '~')
    {
      abs_path = (char *)xmalloc(3 + name_len);
      strcpy(abs_path, "./");
      strcat(abs_path, name);
    }
    else
    {
      abs_path = (char *)xmalloc(1 + name_len);
      strcpy(abs_path, name);
    }

    path_list = abs_path;
    p = strrchr(abs_path, '/');
    *p++ = 0;
    name = p;
  }

  while (path_list && path_list[*path_index])
  {
    char *path;

    if (absolute_path_given)
    {
      path = savestring(path_list);
      *path_index = strlen(path);
    }
    else
      path = get_next_path_element(path_list, path_index);

    if (!path)
      break;

#ifdef SKIPTHIS
    if (*path == '~')
    {
      char *t = tilde_expand(path);
      xfree(path);
      path = t;

      if (skip_tilde)
      {
        xfree(path);
        continue;
      }
    }

    if (skip_dot && *path != '/')
    {
      xfree(path);
      continue;
    }

    found_path_starts_with_dot = (*path == '.');
#endif

    full_path = make_full_pathname(path, name, name_len);
    xfree(path);

    status = file_status(full_path);

    /* This is different from "where" because it stops at the first found file */
    /* but where (and shells) continue to find first executable program in path */
    if (status & FS_EXISTS) 
    {
      if (status & FS_EXECABLE)
      {
        found = full_path;
        break;
      }
      else
          progwarn("Found '%s' in PATH but is not executable", full_path);
    }
    xfree(full_path);
  }

  return (found);
}


static char *get_env_path_with_libexecdir( void )
{
    char *pathbuf;
    char *path = getenv("PATH");
    char *execdir = get_libexec();  /* allocated -- must be freed or returned */

    if ( !path )
        return execdir;

    pathbuf = (char *)emalloc( strlen( path ) + strlen( execdir ) + strlen( PATH_SEPARATOR ) + 1 );

    sprintf(pathbuf, "%s%s%s", path, PATH_SEPARATOR, execdir );
    efree( execdir );
    return pathbuf;
}


/* ---- File: swish-e-2.4.7/src/httpserver.c ---- */

/*
$Id: httpserver.c 1736 2005-05-12 15:41:22Z karman $
**
** Copyright (C) 1995, 1996, 1997, 1998 Hewlett-Packard Company
** Originally by Kevin Hughes, kev@kevcom.com, 3/11/94
**

    This file is part of Swish-e.

    Swish-e is free software; you can redistribute it and/or modify
    it under the terms of the GNU General Public License as published by
    the Free Software Foundation; either version 2 of the License, or
    (at your option) any later version.

    Swish-e is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
    GNU General Public License for more details.

    You should have received a copy of the GNU General Public License
    along  with Swish-e; if not, write to the Free Software
    Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
    
    See the COPYING file that accompanies the Swish-e distribution for details
    of the GNU GPL and the special exception available for linking against
    the Swish-e library.
    
** Mon May  9 15:51:39 CDT 2005
** added GPL

**--------------------------------------------------------------------
** All the code in this file added by Ron Klachko ron@ckm.ucsf.edu 9/98
**
** change sprintf to snprintf to avoid corruption
** SRE 11/17/99
**
** fixed cast to int problems pointed out by "gcc -Wall"
** SRE 2/22/00
** 
*/

/*
** httpserver.c
*/

#include "acconfig.h"

#ifdef HAVE_UNISTD_H
#include <unistd.h>
#endif

#include <time.h>
#include <stdarg.h>

#include "swish.h"
#include "mem.h"
#include "swstring.h"
#include "index.h"

#include "http.h"
#include "httpserver.h"
#include "file.h"


/* The list of servers that we are acting on.
**/
static httpserverinfo *servers = 0;


static void parserobotstxt(char *robots_buffer, int buflen, httpserverinfo *server);
static char *isolatevalue(char *line, char *keyword, int *plen);
static int serverinlist(char *url, struct swline *list);



/* Find the robot rules for this URL.  If we haven't retrieved them
** yet, do so now.
**/
httpserverinfo *getserverinfo(SWISH *sw, char *url)
{
    httpserverinfo *server;
    char *method;
    int methodlen;
    char *serverport;
    int serverportlen;
    static int lencontenttype=0;
    static char *contenttype=NULL;
    static int lenbuffer=0;
    static char *buffer=NULL;
    FILE *fp;
    struct MOD_Index *idx = sw->Index;
    time_t  last_modified;

    // argh, this is ugly
    char   *file_prefix;  // prefix for use with files written by swishspider -- should just be on the stack!
    

    if(!lenbuffer)buffer=emalloc((lenbuffer=MAXSTRLEN)+1);
    if(!lencontenttype)contenttype=emalloc((lencontenttype=MAXSTRLEN)+1);

    if ((method = url_method(url, &methodlen)) == 0) {
		return 0;
    }
    if ((serverport = url_serverport(url, &serverportlen)) == 0) {
		return 0;
    }
	
    /* Search for the rules
    **/
    for (server = servers; server; server = server->next) {
		if (equivalentserver(sw, url, server->baseurl)) {
			return server;
		}
    }
    
    /* Create a new entry for this server and add it to the list.
    **/
    server = (httpserverinfo *)emalloc(sizeof(httpserverinfo));
	
    /* +3 for the ://, +1 for the trailing /, +1 for the terminating null
    **/
    server->baseurl = (char *)emalloc(methodlen + serverportlen + 5);
    /* These 4 lines avoid a call to the non-ANSI snprintf.  This may not be
     the best way, but it ensures no buffer overruns */
    memcpy (server->baseurl,method,methodlen);
    memcpy (server->baseurl+methodlen,"://",3);
    memcpy (server->baseurl+methodlen+3,serverport,serverportlen);
    strcpy (server->baseurl+methodlen+3+serverportlen,"/");
    
    server->lastretrieval = 0;
    server->robotrules = 0;
    server->next = servers;
    servers = server;
	
    /* Only http(s) servers can have full rules, all the other ones just get dummies
    ** (this is useful for holding last retrieval)
    **
    ** http://info.webcrawler.com/mak/projects/robots/norobots.html holds what
    ** many people consider the official web exclusion rules.  Unfortunately,
    ** the rules are not consistent about how records are formed.  One line
    ** states "the file consists of one or more records separated by one or more
    ** blank lines" while another states "the record starts with one or more User-agent
    ** lines, followed by one or more Disallow lines."
    **
    ** So, does a blank line after a User-agent line end a record?  The spec is
    ** unclear on this matter.  If the next legal line after the blank line is
    ** a Disallow line, the blank line should most likely be ignored.  But what
    ** if the next line is another User-agent line?  For example:
    **
    ** User-agent: MooBot
    **
    ** User-agent: CreepySpider
    ** Disallow: /cgi-bin
    **
    ** One interpretation (based on blank lines terminating records) is that MooBot
    ** may visit any location (since there are no Disallows for it).  Another
    ** interpretation (based on records needing both User-agent and Disallow lines)
    ** is that MooBot may not visit /cgi-bin
    **
    ** While poking around, I found at least one site (www.sun.com) that uses blank
    ** lines within records.  Because of that, I have decided to rely on records
    ** having both User-agent and Disallow lines (the second interpretation above).
    **/
    if (strncmp(server->baseurl, "http", 4) == 0) {
		if((int)(strlen(server->baseurl)+20)>=lenbuffer) {
			lenbuffer=strlen(server->baseurl)+20+200;
			buffer=erealloc(buffer,lenbuffer+1);
		}
		sprintf(buffer, "%srobots.txt", server->baseurl);



        file_prefix = emalloc( strlen(idx->tmpdir) + MAXPIDLEN + strlen("/swishspider@.contents+fill") );
        sprintf(file_prefix, "%s/swishspider@%ld", idx->tmpdir, (long) lgetpid());


		if (get(sw,contenttype, &last_modified, &server->lastretrieval, file_prefix, buffer) == 200)
		{
		    char   *robots_buffer;
		    int     filelen;
		    int     bytes_read;
		    
			if((int)(strlen(idx->tmpdir)+MAXPIDLEN+30)>=lenbuffer) {
				lenbuffer=strlen(idx->tmpdir)+MAXPIDLEN+30+200;
				buffer=erealloc(buffer,lenbuffer+1);
			}
			sprintf(buffer, "%s/swishspider@%ld.contents", idx->tmpdir, (long)lgetpid());
			fp = fopen(buffer, F_READ_TEXT);

			/* Only parse if the spider's contents file could be opened */
			if (fp) {
				filelen = getsize(buffer);

				/* +1 leaves room for the terminator that next_line() may write */
				robots_buffer = emalloc( filelen + 1 );
				bytes_read = fread(robots_buffer, 1, filelen, fp);
				robots_buffer[bytes_read] = '\0';
				parserobotstxt( robots_buffer, bytes_read, server );

				efree( robots_buffer );

				fclose(fp); /* Have to close before unlink on Windows */
			}
		}
		efree( file_prefix );
		
		cmdf(unlink, "%s/swishspider@%ld.response", idx->tmpdir, (long)lgetpid());
		cmdf(unlink, "%s/swishspider@%ld.contents", idx->tmpdir, (long)lgetpid());
		cmdf(unlink, "%s/swishspider@%ld.links", idx->tmpdir, (long)lgetpid());
    }
	
    return server;
}


int urldisallowed(SWISH *sw, char *url)
{
    httpserverinfo *server;
    robotrules *rule;
    char *uri;
    int urilen;
	
    if ((server = getserverinfo(sw, url)) == 0) {
		return 1;
    }
    if ((uri = url_uri(url, &urilen)) == 0) {
		return 1;
    }
	
    for (rule = server->robotrules; rule; rule = rule->next) {
		if (strncmp(uri, rule->disallow, strlen(rule->disallow)) == 0) {
			return 1;
		}
    }
	
    return 0;
}
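
/*
** Illustration only, not part of the build.  urldisallowed() above treats
** every Disallow value as a plain path prefix.  The sketch below shows the
** same test on a fixed array of prefixes; path_is_disallowed() and its
** parameters are hypothetical names, not part of the Swish-e API.
**
```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

// Return 1 if uri starts with any of the n prefixes in rules, else 0.
static int path_is_disallowed(const char *uri, const char *const *rules, size_t n)
{
    size_t i;

    assert(uri != NULL);
    for (i = 0; i < n; i++) {
        // compare only as many characters as the rule has
        if (strncmp(uri, rules[i], strlen(rules[i])) == 0)
            return 1;
    }
    return 0;
}
```
**
** With the rules "/cgi-bin" and "/tmp/", both "/cgi-bin/search" and
** "/cgi-binary" are disallowed, because a prefix match does not stop at a
** path separator, while "/tmp" (no trailing slash) is still allowed.
*/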

// Quick fix to parse line endings from Mac and Windows as well as Unix.
// Pass in:
//      char **next_start == address of a char * that points to where the next line starts.
//      char *last_char   == pointer to last char in buffer.  Buffer MUST have room for one more char
// 
// Returns the start of the next line, or NULL when there are no more lines

static char *next_line( char **next_start, char *last_char  )
{
    char *buffer = *next_start;
    char *start;


    // skip over any leading new lines or cr.
    while ( buffer <= last_char && ( *buffer == '\0' || *buffer == '\n' || *buffer == '\r' ) )
        buffer++;

    if ( buffer > last_char )
        return NULL;

    start = buffer;  // start of this line

    // Now find the end of this string
    while ( buffer <= last_char && ( *buffer != '\0' && *buffer != '\n' && *buffer != '\r' ) )
        buffer++;

    *buffer = '\0';  // mark the end of the string

    buffer++;
    *next_start = buffer;

    return start;
}
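
/*
** Illustration only, not part of the build.  The in-place splitting done by
** next_line() above can be sketched as a standalone helper that collects all
** line starts in one pass.  split_lines() and its parameters are hypothetical
** names; as with next_line(), the buffer needs one spare byte after the data.
**
```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

// Split buf[0..len-1] in place into lines; buf must have len+1 writable bytes.
// Stores up to max line starts in out and returns how many were stored.
static size_t split_lines(char *buf, size_t len, char **out, size_t max)
{
    char *p = buf;
    char *end = buf + len;   // one past the last data byte
    size_t count = 0;

    assert(buf != NULL && out != NULL);
    while (count < max) {
        // skip line terminators (and NULs written by earlier splits)
        while (p < end && (*p == '\0' || *p == '\n' || *p == '\r'))
            p++;
        if (p >= end)
            break;
        out[count++] = p;
        // advance to the end of this line
        while (p < end && *p != '\0' && *p != '\n' && *p != '\r')
            p++;
        *p = '\0';           // may write the spare byte at buf[len]
        if (p < end)
            p++;
    }
    return count;
}
```
**
** Because CR, LF and NUL are all treated as separators, Unix, Mac and Windows
** line endings split the same way, and blank lines are never returned.
*/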

static char useragent[] = "user-agent:";
static char disallow[] = "disallow:";
static char swishspider[] = "swishspider";

static void parserobotstxt(char *robots_buffer, int buflen, httpserverinfo *server)
{
    char *buffer;
    char *bufend = robots_buffer + buflen -1;  // last char of string
    char *next_start = robots_buffer;
    
    enum {START, USERAGENT, DISALLOW} state = START;
    enum {SPECIFIC, GENERIC, SKIPPING} useragentstate = SKIPPING;
    char *p;
    int len;
    robotrules *entry;
    robotrules *entry2;
	
    server->useragent = 0;

    buffer = NULL;

    while ( (buffer = next_line( &next_start, bufend ) ) )
    {
        if ( strchr( buffer, '#' ) )
            *(strchr( buffer, '#' )) = '\0';

		if ((*buffer == '#') || (*buffer == '\0'))
			continue;

		
		if (strncasecmp(buffer, useragent, sizeof(useragent) - 1) == 0) {
			switch (state) {
			case DISALLOW:
				/* Since we found our specific user-agent, we can
				** skip the rest of the file.
				**/
				if (useragentstate == SPECIFIC) {
					return;
				}
				
				useragentstate = SKIPPING;
				
				/* explicit fallthrough */
				
			case START:
			case USERAGENT:
				state = USERAGENT;
				
				if (useragentstate != SPECIFIC) {
					p = isolatevalue(buffer, useragent, &len);
					
					if ((len == (sizeof(swishspider) - 1)) &&
						(strncasecmp(p, swishspider, sizeof(swishspider) - 1) == 0) ) {
						useragentstate = SPECIFIC;
						
						/* We might have already parsed generic rules,
						** so clean them up if necessary.
						*/
						if (server->useragent) {
							efree(server->useragent);
						}
						for (entry = server->robotrules; entry; ) {
							entry2 = entry->next;
							efree(entry->disallow);  /* free the rule string too */
							efree(entry);
							entry = entry2;
						}
						server->robotrules = 0;
						
						server->useragent = (char *)emalloc(len + 1);
						strncpy(server->useragent, p, len);
						*(server->useragent + len) = '\0';
						
					}
					else if ((len == 1) && (*p == '*')) {
						useragentstate = GENERIC;
						server->useragent = (char *)emalloc(2);
						strcpy(server->useragent, "*"); /* emalloc'd 2 bytes, no safestrcpy */
					}
					
				}
				
				
				break;
				
			}
		}
		
		if (strncasecmp(buffer, disallow, sizeof(disallow) - 1) == 0) {
			state = DISALLOW;
			if (useragentstate != SKIPPING) {
				p = isolatevalue(buffer, disallow, &len);
				if (len) {
					entry = (robotrules *)emalloc(sizeof(robotrules));
					entry->next = server->robotrules;
					server->robotrules = entry;
					entry->disallow = (char *)emalloc(len + 1);
					strncpy(entry->disallow, p, len);
					*(entry->disallow + len) = '\0';
				}
			}
		}
    }
}


static char *isolatevalue(char *line, char *keyword, int *plen)
{

    /* Find the beginning of the value  **/
    for (line += strlen(keyword); *line && isspace((int)((unsigned char)*line)); line++ ) { /* cast to int 2/22/00 */
    }

    if ( !strlen(line) )
    {
        *plen = 0;
        return line;
    }
	
    /* Strip off trailing spaces  **/
    for (*plen = strlen(line); isspace((int)((unsigned char)*(line + *plen - 1))); (*plen)--) { /* cast to int 2/22/00 */
    }
	
    return line;
}


int equivalentserver(SWISH *sw, char *url, char *baseurl)
{
    char *method;
    int methodlen;
    char *serverport;
    int serverportlen;
    char *basemethod;
    int basemethodlen;
    char *baseserverport;
    int baseserverportlen;
    struct multiswline *walk=NULL;
    struct MOD_HTTP *http = sw->HTTP;
	
    method = url_method(url, &methodlen);
    serverport = url_serverport(url, &serverportlen);
    basemethod = url_method(baseurl, &basemethodlen);
    baseserverport = url_serverport(baseurl, &baseserverportlen);
	
    if (!method || !serverport || !basemethod || !baseserverport) {
		return 0;
    }
	
    /* If this is the same server, we just go for it
    **/
    if ((methodlen == basemethodlen) && (serverportlen == baseserverportlen) &&
		(strncasecmp(method, basemethod, methodlen) == 0) &&
		(strncasecmp(serverport, baseserverport, serverportlen) == 0)) {
		return 1;
    }
	
    /* Do we find the method/server info for this and the base url
    ** in the same equivalence list?
    **/
    for (walk = http->equivalentservers; walk; walk = walk->next ) {
		if (serverinlist(url, walk->list) &&
			serverinlist(baseurl, walk->list)) {
			return 1;
		}
    }
	
    return 0;
}


static int serverinlist(char *url, struct swline *list)
{
    char *method;
    int methodlen;
    char *serverport;
    int serverportlen;
    char *listmethod;
    int listmethodlen;
    char *listserverport;
    int listserverportlen;
    
    method = url_method(url, &methodlen);
    serverport = url_serverport(url, &serverportlen);
    if (!method || !serverport) {
		return 0;
    }
	
    for ( ; list; list = list->next) {
		listmethod = url_method(list->line, &listmethodlen);
		listserverport = url_serverport(list->line, &listserverportlen);
		if (listmethod && listserverport) {
			if ((methodlen == listmethodlen) && (serverportlen == listserverportlen) &&
				(strncasecmp(method, listmethod, methodlen) == 0) &&
				(strncasecmp(serverport, listserverport, serverportlen) == 0)) {
				return 1;
			}
		}
    }
    return 0;
}

swish-e-2.4.7/src/mem.h
/*
$Id: mem.h 1736 2005-05-12 15:41:22Z karman $
**

    This file is part of Swish-e.

    Swish-e is free software; you can redistribute it and/or modify
    it under the terms of the GNU General Public License as published by
    the Free Software Foundation; either version 2 of the License, or
    (at your option) any later version.

    Swish-e is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
    GNU General Public License for more details.

    You should have received a copy of the GNU General Public License
    along  with Swish-e; if not, write to the Free Software
    Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
    
    See the COPYING file that accompanies the Swish-e distribution for details
    of the GNU GPL and the special exception available for linking against
    the Swish-e library.
    
** Mon May  9 18:19:34 CDT 2005
** added GPL


** Author: Bill Meier, June 2001
**
*/

#ifndef MEM_H
#define MEM_H 1

/*
** The following settings control the memory allocator. Each setting is independent.
** They also affect the actual memory usage of the program, because (currently)
** turning on any of these settings increases the size of each allocation.
** MEM_STATISTICS allocates the least extra and MEM_DEBUG allocates the most extra per call. 
**
** In addition (currently) turning on any of these settings will map all
** realloc calls into an alloc and free for a simpler implementation. However, this
** should be transparent to all programs!
*/

/*
** Normal settings (but not required):
**		If you turn on MEM_DEBUG, turn on MEM_STATISTICS
**		If you turn on MEM_TRACE, turn on MEM_STATISTICS
*/

#include <memory.h>

#ifdef __cplusplus
extern "C" {
#endif

/* MEM_DEBUG checks for memory consistency on alloc/free */
/* #define MEM_DEBUG 0 -- enable with --enable-memdebug */

/* MEM_TRACE checks for unfreed memory, and where it is allocated */
/* #define MEM_TRACE 0 -- use --enable-memtrace */

/* MEM_STATISTICS gives memory statistics (bytes allocated, calls, etc.) */
/* #define MEM_STATISTICS 0 -- use --enable-memstats */


typedef struct _mem_zone {
	struct _zone	*next;		/* link to free chunk */
	char			*name;		/* name of zone */
	size_t			size;		/* size to grow zone by */
	int				attributes;	/* attributes of zone (not used yet) */
	unsigned int	allocs;		/* count of allocations (for statistics) */
} MEM_ZONE;


/* The following are the basic malloc/realloc/free replacements */
#if MEM_TRACE
extern size_t memory_trace_counter;
void Mem_bp(int n);
#endif

void *ecalloc(size_t nelem, size_t size);

#if MEM_DEBUG | MEM_TRACE | MEM_STATISTICS

#define emalloc(size) Mem_Alloc(size, __FILE__, __LINE__)
#define erealloc(ptr, size) Mem_Realloc(ptr, size, __FILE__, __LINE__)
#define efree(ptr) Mem_Free(ptr, __FILE__, __LINE__)

void *Mem_Alloc(size_t size, char *file, int line);
void *Mem_Realloc(void *ptr, size_t size, char *file, int line);
void Mem_Free(void *ptr, char *file, int line);

#else

void *emalloc(size_t size);
void *erealloc(void *ptr, size_t size);
void efree(void *ptr);

#endif

/* Hook to print out statistics if enabled */
void Mem_Summary(char *title, int final);

/* Memory zone routines */

/* create a zone -- size should be some reasonable number */
MEM_ZONE *Mem_ZoneCreate(char *name, size_t size, int attributes);

/* allocate memory from a zone (can use like malloc if you aren't going to realloc) */
void *Mem_ZoneAlloc(MEM_ZONE *head, size_t size);

/* free all memory in a zone */
void Mem_ZoneFree(MEM_ZONE **head); 

/* memory zone statistics */
#if MEM_STATISTICS
void Mem_ZoneStatistics(MEM_ZONE *head); 
#endif

/* make all memory in a zone reusable */
void Mem_ZoneReset(MEM_ZONE *head); 

/* Returns the allocated memory owned by a zone */
int Mem_ZoneSize(MEM_ZONE *head);

/* Don't let people use the regular C calls */
#define malloc $Please_use_emalloc
#define realloc $Please_use_erealloc
#define free $Please_use_efree


#ifdef __cplusplus
}
#endif /* __cplusplus */


#endif /* !MEM_H */
swish-e-2.4.7/src/result_sort.c
/*
$Id: result_sort.c 2271 2009-03-16 13:43:15Z karpet $
**

    This file is part of Swish-e.

    Swish-e is free software; you can redistribute it and/or modify
    it under the terms of the GNU General Public License as published by
    the Free Software Foundation; either version 2 of the License, or
    (at your option) any later version.

    Swish-e is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
    GNU General Public License for more details.

    You should have received a copy of the GNU General Public License
    along  with Swish-e; if not, write to the Free Software
    Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
    
    See the COPYING file that accompanies the Swish-e distribution for details
    of the GNU GPL and the special exception available for linking against
    the Swish-e library.
    
    
** Mon May  9 12:52:52 CDT 2005 -- added GPL


** Move index presorting code to pre_sort.c  - Sept 2002 - moseley
**
*/

#include "config.h"
#include "swish.h"
#include "swstring.h"
#include "mem.h"
#include "merge.h"
#include "list.h"
#include "search.h"
#include "docprop.h"
#include "metanames.h"
#include "compress.h"
#include "error.h"
#include "db.h"
#include "parse_conffile.h"
#include "swish_qsort.h"
#include "result_sort.h"
#include "array.h"
#include "rank.h"

// #define DEBUGSORT 1


/*****************************************************************************
* compare_results_single_index - qsort compare function
*
*  compares properties for a single index which means the results
*  share the same parent db_results structure and associated SortData.
*
*  Will lazy load properties as needed when primary key matches
*
*  Just to keep things confusing, +1 = desc sort, -1 = ascending
*  qsort's output is then reversed when rebuilding the linked list of results.
*
*
*****************************************************************************/
static int compare_results_single_index(const void *s1, const void *s2)
{
    RESULT    *r1 = *(RESULT * const *) s1;
    RESULT    *r2 = *(RESULT * const *) s2;
    int       i;
    int       rc;
    int       num_fields      = r1->db_results->num_sort_props;
    SortData  *sort_data;
    int       *presorted;

    for (i = 0; i < num_fields; i++)
    {
        sort_data = &r1->db_results->sort_data[i];


        /* special case of sorting by (raw) rank */
        if ( sort_data->is_rank_sort )
        {
            if ( (rc = r1->rank - r2->rank) )
                return ( rc * sort_data->direction );
            else
                continue;
        }


        /* If we haven't checked this property for a pre-sorted table then try to load the table */
        if ( !sort_data->property->sorted_loaded )
            LoadSortedProps( r1->db_results->indexf, sort_data->property );


        /* can we compare with pre-sorted numbers? (i.e. is the presorted data (sorted_data) loaded into the metaEntry?) */


        if ( (presorted = sort_data->property->sorted_data) )
        {
            if ( (rc = DB_ReadSortedData( presorted, r1->filenum - 1) -  DB_ReadSortedData( presorted, r2->filenum - 1)) )
                return ( rc * sort_data->direction );  /* is the multiplication slow? */
        }


        else /* must compare properties directly */
        {
            /* First, does an array exist to hold the pointers to the properties? */
            if ( !sort_data->key )
            {
                sort_data->key = (propEntry **)emalloc( r1->db_results->result_count * sizeof( propEntry *) );
                memset( sort_data->key, -1, r1->db_results->result_count * sizeof( propEntry *) );
            }


            /* Now load the properties if they do not already exist (a -1 pointer indicates undefined) */
            /* tfrequency is used to store the index into the key (property key) array */

            if ( sort_data->key[ r1->tfrequency ] == (propEntry *)-1 )
                sort_data->key[ r1->tfrequency ] = getDocProperty( r1, &sort_data->property, 0, sort_data->property->sort_len );

            if ( sort_data->key[ r2->tfrequency ] == (propEntry *)-1 )
                sort_data->key[ r2->tfrequency ] = getDocProperty( r2, &sort_data->property, 0, sort_data->property->sort_len );

            /* finally compare the properties */
            if ( (rc = Compare_Properties(  sort_data->property, sort_data->key[ r1->tfrequency ], sort_data->key[ r2->tfrequency ]) ) )
                return ( rc * sort_data->direction );
        }

    }
    return 0;
}
/*****************************************************************************
* compare_results
*
*  This code is used when tape-merging multiple indexes.  Almost the same
*  as above, but because it may be comparing results from different indexes it
*  must associate data with each result and cannot use the presorted index tables.
*
*  Must make sure that the sort keys are the same for each index
*  (e.g. same prop type, type of case compare)
*
*
*****************************************************************************/
int compare_results(const void *s1, const void *s2)
{
    RESULT    *r1 = *(RESULT * const *) s1;
    RESULT    *r2 = *(RESULT * const *) s2;
    int       i;
    int       rc;
    int       num_fields      = r1->db_results->num_sort_props;
    SortData  *sort_data1;
    SortData  *sort_data2;

    for (i = 0; i < num_fields; i++)
    {
        sort_data1 = &r1->db_results->sort_data[i];
        sort_data2 = &r2->db_results->sort_data[i];


        /* special case of sorting by (raw) rank */
        /* this is a bit more questionable because comparing raw ranks between two */
        /* indexes may not work well -- probably ok now with simple word-count based ranking */

        /* very unlikely that there is a property "swishrank" that was not the rank, so don't check both */
        if ( sort_data1 ->is_rank_sort )
        {
            if ( (rc = r1->rank - r2->rank) )
                return ( rc * sort_data1->direction );
            else
                continue;
        }



        /* First, does an array exist to hold the pointers to the properties for each result? */
        if ( !sort_data1->key )
        {
            sort_data1->key = (propEntry **)emalloc( r1->db_results->result_count * sizeof( propEntry *) );
            memset( sort_data1->key, -1, r1->db_results->result_count * sizeof( propEntry *) );
        }

        if ( !sort_data2->key )
        {
            sort_data2->key = (propEntry **)emalloc( r2->db_results->result_count * sizeof( propEntry *) );
            memset( sort_data2->key, -1, r2->db_results->result_count * sizeof( propEntry *) );
        }



        /* Now load the properties if they do not already exist (a -1 pointer indicates undefined) */

        if ( sort_data1->key[ r1->tfrequency ] == (propEntry *)-1 )
            sort_data1->key[ r1->tfrequency ] = getDocProperty( r1, &sort_data1->property, 0, sort_data1->property->sort_len );

        if ( sort_data2->key[ r2->tfrequency ] == (propEntry *)-1 )
            sort_data2->key[ r2->tfrequency ] = getDocProperty( r2, &sort_data2->property, 0, sort_data2->property->sort_len );

        /* finally compare the properties */
        if ( (rc = Compare_Properties(  sort_data1->property, sort_data1->key[ r1->tfrequency ], sort_data2->key[ r2->tfrequency ]) ) )
           return ( rc * sort_data1->direction );

    }
    return 0;
}



/*******************************************************************
*   Loads metaentry->sorted_data with sorted array for the given metaEntry
*
*   Call with:
*       *indexf
*       *m (metaEntry) - meta entry in question
*
*   Returns:
*       pointer to an array of int (metaentry->sorted_data)
*
*   Notes:
*       This is also called by proplimit and merge code
*       And its name sucks.
*
********************************************************************/
int    *LoadSortedProps(IndexFILE * indexf, struct metaEntry *m)
{
    unsigned char *buffer;
    int     sz_buffer;

    if ( m->sorted_loaded )
        return m->sorted_data;

    m->sorted_loaded = 1;  /* flag that we tried to load the data */

    DB_InitReadSortedIndex(indexf->sw, indexf->DB);

    /* Get the sorted index of the property */

    /* Convert to a property index */
    DB_ReadSortedIndex(indexf->sw, m->metaID, &buffer, &sz_buffer, indexf->DB);

#ifdef USE_PRESORT_ARRAY
    m->sorted_data = (int *)buffer;
    DB_EndReadSortedIndex(indexf->sw, indexf->DB);
    return m->sorted_data;

#endif


    /* If a table was found, then uncompress */
    /* FIX $$$ This should be in db_native.c */

    if (sz_buffer)
    {
        unsigned char *s = buffer;
        int j;

        m->sorted_data = (int *) emalloc(indexf->header.totalfiles * sizeof(int));

        /* Unpack / decompress the numbers */
        for (j = 0; j < indexf->header.totalfiles; j++)
            m->sorted_data[j] = uncompress2(&s);

        efree(buffer);
    }


    DB_EndReadSortedIndex(indexf->sw, indexf->DB);
    return m->sorted_data;
}



/***************************************************************************************
* sort_single_index_results
*
*   Call with
*       DB_RESULTS
*
*   Returns:
*       total results
*
*   This does all the work.
*       - initializes an array of arrays to hold properties
*       - pre-load the array[0] elements with properties, if needed
*         (i.e. when the first sort key is non-presorted)
*
*   Todo:
*       This runs through the result list a number of times (plus qsort)
*
****************************************************************************************/
static int sort_single_index_results( DB_RESULTS *db_results )
{
    int lookup_props = 0;  /* flag if we need to load props initially */
    int results_in_index = 0;
    RESULT  *cur_result;
    RESULT **sort_array;
    SortData *sort_data;

    /* Any results to process? */
    if( !db_results->resultlist )
        return 0;


    /* sanity checks */

    /* Should check for too big, too? */
    if ( db_results->num_sort_props < 1 )
        progerr("called sort_single_index_results with invalid number of sort keys");

    if ( ! db_results->sort_data )
        progerr("called sort_single_index_results without a valid sort_data struct");



    /* Need to tally up the number of results in this set */
    /* $$$$ can search.c do this when creating results? It's an extra loop */
    /* perhaps it can't be done in search.c -- need to set an index number on the result */

    cur_result = db_results->resultlist->head;

    while ( cur_result )
    {
        /* Set an index number which can be used to point into the sort_data->key array. */
        /* tfrequency is not used after rank calculations */

        cur_result->tfrequency = results_in_index++;

        cur_result = cur_result->next;
    }

    db_results->result_count = results_in_index;  /* needed so we know how big to create arrays */





    /* Do we need to lookup properties for the first key? */
    /* Not if sorting by rank (that's in the result) or if there's presorted data available */

    sort_data = &db_results->sort_data[0];



    if ( !sort_data->is_rank_sort && !sort_data->property->sorted_data )
        if ( !LoadSortedProps( db_results->indexf, sort_data->property ) )    /* can we load the array? */
        {
            /* otherwise, we must read all the properties off disk */

            lookup_props = 1;


            /* Create the array the size of the number of results to hold *propEntry's */
            /* This array sticks around until freeing the results object */

            sort_data->key = (propEntry **)emalloc( db_results->result_count * sizeof( propEntry *) );
            memset( sort_data->key, -1, db_results->result_count * sizeof( propEntry *) );
        }



    /* Now build an array to hold the results for sorting */

    sort_array = (RESULT **) emalloc(db_results->result_count * sizeof(RESULT *));


    /* Fill the array for qsort -- and load the properties, if they are needed */
    /* This could be optimized for rank, since that's in the result structure */
    /* but would add complexity to the sort functions -- need to benchmark or profile */

    cur_result = db_results->resultlist->head;


    while ( cur_result )
    {
        sort_array[ cur_result->tfrequency ] = cur_result;

        /* If can't use the presorted numbers ("meta->sorted_data") then load the properties */
        if ( lookup_props )
            sort_data->key[ cur_result->tfrequency ] =
                getDocProperty( cur_result, &sort_data->property, 0, sort_data->property->sort_len );

        cur_result = cur_result->next;
    }

    /* Sort them */
    swish_qsort(sort_array, db_results->result_count, sizeof(RESULT *), compare_results_single_index);




    /* Build the list -- the list is in reverse order, so build the list backwards */
    {
        RESULT *head = NULL;
        int j;

        for (j = 0; j < db_results->result_count; j++)
        {
            RESULT *r = sort_array[j];


            /* Now's a good time to normalize the rank as we are processing each result */


            /* Find the largest rank for scaling */
            if (r->rank > db_results->results->bigrank)
                db_results->results->bigrank = r->rank;


            if ( !head )             // first time
            {
                head = r;
                r->next = NULL;
            }
            else                    // otherwise, place this at the head of the list
            {
                r->next = head;
                head = r;
            }

        }
        db_results->sortresultlist = head;
        db_results->resultlist->head = head;
        db_results->currentresult = head;


    }


    /* Free the memory of the array */
    efree( sort_array );


    return db_results->result_count;
}


/***************************************************************************************
* sortresults - sorts the results for one or more indexes searched
*
*   Call with
*       A RESULTS_OBJECT which contains a list of DB_RESULTS
*
*   Returns:
*       total results
*
*   This simply loops through the indexes, sorts each one, and then sets the rank scaling factor
*
*   rewritten Sept 2002 - moseley / again July 2003
*
****************************************************************************************/

int  sortresults(RESULTS_OBJECT *results)
{
    int         TotalResults = 0;
    DB_RESULTS *db_results = results->db_results;

    while ( db_results )
    {
        TotalResults += sort_single_index_results( db_results );
        db_results = db_results->next;
    }

    /* set rank scaling factor based on the largest rank found of all results */

    if (results->bigrank)
    {
        if ( DEBUG_RANK ) {
            fprintf(stderr, "bigrank found: %d\n", results->bigrank );
        }
        results->rank_scale_factor = 10000000 / results->bigrank;
    }
    else
        results->rank_scale_factor = 10000;


    return TotalResults;
}
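
/*
 * A minimal sketch of the rank scaling set up in sortresults() above.
 * Only the two assignments to rank_scale_factor come from that function.
 * The helper names (rank_scale_factor_for, scale_rank) and the final
 * division by 10000 are assumptions made for illustration: they show one
 * way a caller could map the largest raw rank to roughly 1000.
 */

```c
#include <assert.h>

/* Same choice as sortresults(): 10000000 / bigrank, or 10000 if no rank was seen. */
static int rank_scale_factor_for(int bigrank)
{
    assert(bigrank >= 0);
    return bigrank ? 10000000 / bigrank : 10000;
}

/* Hypothetical consumer: scale a raw rank with the factor computed above. */
static int scale_rank(int rank, int factor)
{
    /* widen before multiplying so large ranks do not overflow an int */
    return (int)(((long long)rank * factor) / 10000);
}
```

/*
 * Because the factor is computed with integer division, a bigrank above
 * 10000000 would give a factor of 0 in this sketch.
 */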


swish-e-2.4.7/src/error.h
/*
** Copyright (C) 1995, 1996, 1997, 1998 Hewlett-Packard Company
** Originally by Kevin Hughes, kev@kevcom.com, 3/11/94
**


    This file is part of Swish-e.

    Swish-e is free software; you can redistribute it and/or modify
    it under the terms of the GNU General Public License as published by
    the Free Software Foundation; either version 2 of the License, or
    (at your option) any later version.

    Swish-e is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
    GNU General Public License for more details.

    You should have received a copy of the GNU General Public License
    along  with Swish-e; if not, write to the Free Software
    Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
    
    See the COPYING file that accompanies the Swish-e distribution for details
    of the GNU GPL and the special exception available for linking against
    the Swish-e library.
    
** Mon May  9 18:19:34 CDT 2005
** added GPL



**
**
** 2001-02-12 rasc   some parts rewritten (progerr uses vargs, now)
*/


#ifndef __HasSeenModule_Error
#define __HasSeenModule_Error		1


void set_error_handle( FILE *where );
void SwishErrorsToStderr( void );

void progerr (char *msgfmt, ...);
void progerrno (char *msgfmt, ...);

void set_progerr(int errornum, SWISH *sw, char *msgfmt,...);
void set_progerrno(int errornum, SWISH *sw, char *msgfmt,...);


void progwarn (char *msgfmt, ...);
void progwarnno (char *msgfmt, ...);


char *getErrorString(int);
int  SwishError(SWISH * sw);
char *SwishErrorString(SWISH *sw);
char *SwishLastErrorMsg(SWISH *sw);
void SwishAbortLastError(SWISH *sw);
int SwishCriticalError(SWISH *sw);
void reset_lasterror(SWISH *sw);



#define RC_OK 0

enum {
    INDEX_FILE_NOT_FOUND = -255,
    UNKNOWN_INDEX_FILE_FORMAT,
    NO_WORDS_IN_SEARCH,
    WORDS_TOO_COMMON,
    INDEX_FILE_IS_EMPTY,
    INDEX_FILE_ERROR,
    UNKNOWN_PROPERTY_NAME_IN_SEARCH_DISPLAY,
    UNKNOWN_PROPERTY_NAME_IN_SEARCH_SORT,
    INVALID_PROPERTY_TYPE,
    UNKNOWN_METANAME,
    UNIQUE_WILDCARD_NOT_ALLOWED_IN_WORD,
    WILDCARD_NOT_ALLOWED_AT_WORD_START,
    WORD_NOT_FOUND,
    SWISH_LISTRESULTS_EOF,
    HEADER_READ_ERROR,
    INVALID_SWISH_HANDLE,
    INVALID_RESULTS_HANDLE,
    SEARCH_WORD_TOO_BIG,
    QUERY_SYNTAX_ERROR,
    PROP_LIMIT_ERROR,
    WILDCARD_NOT_ALLOWED_WITHIN_WORD
};    
#endif

swish-e-2.4.7/src/btree.c
/*

$Id: btree.c 1946 2007-10-22 14:56:35Z karpet $

    This file is part of Swish-e.

    Swish-e is free software; you can redistribute it and/or modify
    it under the terms of the GNU General Public License as published by
    the Free Software Foundation; either version 2 of the License, or
    (at your option) any later version.

    Swish-e is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
    GNU General Public License for more details.

    You should have received a copy of the GNU General Public License
    along  with Swish-e; if not, write to the Free Software
    Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
    
    See the COPYING file that accompanies the Swish-e distribution for details
    of the GNU GPL and the special exception available for linking against
    the Swish-e library.
    
** Mon May  9 15:51:39 CDT 2005
** added GPL

*/


#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#include "swish.h"
#include "swstring.h"
#include "compress.h"
#include "mem.h"
#include "btree.h"
#include "error.h"

/* A BTREE page cannot be smaller than BTREE_MinPageSize */
#define BTREE_MinPageSize 4096
/* BTREE max key length */
#define BTREE_MaxKeySize (BTREE_MinPageSize >> 2)

/* A BTREE page cannot be greater than BTREE_MaxPageSize */
#define BTREE_MaxPageSize 65536

#define SizeInt16 2


/* Round up to the next multiple of BTREE_MinPageSize */
#define BTREE_RoundPageSize(n) (sw_off_t)(((sw_off_t)(n) + (sw_off_t)(BTREE_MinPageSize - 1)) & (~(sw_off_t)(BTREE_MinPageSize - 1)))

#define BTREE_PageHeaderSize (2 * sizeof(sw_off_t) + 4 * sizeof(unsigned int)) 

#define BTREE_PageData(pg) ((pg)->data + BTREE_PageHeaderSize)
#define BTREE_EndData(pg) ((pg)->data + (pg)->data_end)
#define BTREE_KeyIndexOffset(data,i) (data + BTREE_PageHeaderSize + (i) * SizeInt16)
#define BTREE_KeyDataOffset(pg,i) ((*(BTREE_KeyIndexOffset((pg->data),(i))) <<8) + *(BTREE_KeyIndexOffset((pg->data),(i)) + 1))
#define BTREE_KeyData(pg,i) ((pg)->data + BTREE_KeyDataOffset((pg),(i)))

/* Page links are file offsets, so they are stored as sw_off_t rather than int */
#define BTREE_SetNextPage(pg,num) ( *(sw_off_t *)((pg)->data + 0 * sizeof(sw_off_t)) = PACKFILEOFFSET(num))
#define BTREE_SetPrevPage(pg,num) ( *(sw_off_t *)((pg)->data + 1 * sizeof(sw_off_t)) = PACKFILEOFFSET(num))
#define BTREE_SetSize(pg,num)     ( *(int *)((pg)->data + 2 * sizeof(sw_off_t) + 0 * sizeof(unsigned int)) = PACKLONG(num))
#define BTREE_SetNumKeys(pg,num)  ( *(int *)((pg)->data + 2 * sizeof(sw_off_t) + 1 * sizeof(unsigned int)) = PACKLONG(num))
#define BTREE_SetFlags(pg,num)    ( *(int *)((pg)->data + 2 * sizeof(sw_off_t) + 2 * sizeof(unsigned int)) = PACKLONG(num))
#define BTREE_SetDataEnd(pg,num)  ( *(int *)((pg)->data + 2 * sizeof(sw_off_t) + 3 * sizeof(unsigned int)) = PACKLONG(num))

#define BTREE_GetNextPage(pg,num) ( (num) = UNPACKFILEOFFSET(*(sw_off_t *)((pg)->data + 0 * sizeof(sw_off_t))))
#define BTREE_GetPrevPage(pg,num) ( (num) = UNPACKFILEOFFSET(*(sw_off_t *)((pg)->data + 1 * sizeof(sw_off_t))))
#define BTREE_GetSize(pg,num)     ( (num) = UNPACKLONG(*(int *)((pg)->data + 2 * sizeof(sw_off_t) + 0 * sizeof(unsigned int))))
#define BTREE_GetNumKeys(pg,num)  ( (num) = UNPACKLONG(*(int *)((pg)->data + 2 * sizeof(sw_off_t) + 1 * sizeof(unsigned int))))
#define BTREE_GetFlags(pg,num)    ( (num) = UNPACKLONG(*(int *)((pg)->data + 2 * sizeof(sw_off_t) + 2 * sizeof(unsigned int))))
#define BTREE_GetDataEnd(pg,num)  ( (num) = UNPACKLONG(*(int *)((pg)->data + 2 * sizeof(sw_off_t) + 3 * sizeof(unsigned int))))

/* Flags */
#define BTREE_ROOT_NODE 0x1
#define BTREE_LEAF_NODE 0x2
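
/*
 * A minimal sketch of the rounding done by BTREE_RoundPageSize above,
 * written as a function for readability. PAGE_MIN and round_page_size are
 * hypothetical names used only here, and long stands in for sw_off_t.
 * The mask trick relies on the page size being a power of two.
 */

```c
#include <assert.h>

#define PAGE_MIN 4096L   /* same value as BTREE_MinPageSize */

/* Round n up to the next multiple of PAGE_MIN (n itself if already aligned). */
static long round_page_size(long n)
{
    assert(n >= 0);
    return (n + (PAGE_MIN - 1)) & ~(PAGE_MIN - 1);
}
```

/*
 * Under this rule a requested size of 15, as passed to BTREE_Create in the
 * DEBUG test at the end of this file, becomes a 4096-byte page.
 */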


int BTREE_WritePageToDisk(FILE *fp, BTREE_Page *pg)
{
    BTREE_SetNextPage(pg,pg->next);
    BTREE_SetPrevPage(pg,pg->prev);
    BTREE_SetSize(pg,pg->size);
    BTREE_SetNumKeys(pg,pg->n);
    BTREE_SetFlags(pg,pg->flags);
    BTREE_SetDataEnd(pg,pg->data_end);
    sw_fseek(fp,(sw_off_t)((sw_off_t)pg->page_number * (sw_off_t)BTREE_MinPageSize),SEEK_SET);
    return sw_fwrite(pg->data,pg->size,1,fp);
}

int BTREE_WritePage(BTREE *b, BTREE_Page *pg)
{
int hash = (int)(pg->page_number % (sw_off_t)BTREE_CACHE_SIZE);
BTREE_Page *tmp;
    pg->modified =1;
    if((tmp = b->cache[hash]))
    {
        while(tmp)
        {
            if(tmp->page_number == pg->page_number)
            {
                return 0;
            }
            tmp = tmp->next_cache;
        }
    }
    pg->next_cache = b->cache[hash];
    b->cache[hash] = pg;
    return 0;
}

int BTREE_FlushCache(BTREE *b)
{
int i;
BTREE_Page *tmp, *next;
    for(i = 0; i < BTREE_CACHE_SIZE; i++)
    {
        if((tmp = b->cache[i]))
        {
            while(tmp)
            {
#ifdef DEBUG
                if(tmp->in_use)
                {
                    printf("DEBUG Error in FlushCache: Page in use\n");
                    exit(0);
                }
#endif
                next = tmp->next_cache;
                if(tmp->modified)
                {
                    BTREE_WritePageToDisk(b->fp, tmp);
                    tmp->modified = 0;
                }
                if(tmp != b->cache[i])
                {
                    efree(tmp);
                }
                tmp = next;
            }
            b->cache[i]->next_cache = NULL;
        }
    }
    return 0;
}

int BTREE_CleanCache(BTREE *b)
{
int i;
BTREE_Page *tmp,*next;
    for(i = 0; i < BTREE_CACHE_SIZE; i++)
    {
        if((tmp = b->cache[i]))
        {
            while(tmp)
            {
                next = tmp->next_cache;
                efree(tmp);
                tmp = next;
            }
            b->cache[i] = NULL;
        }
    }
    return 0;
}

BTREE_Page *BTREE_ReadPageFromDisk(FILE *fp, sw_off_t page_number)
{
BTREE_Page *pg = (BTREE_Page *)emalloc(sizeof(BTREE_Page) + BTREE_MinPageSize);

    sw_fseek(fp,(sw_off_t)((sw_off_t)page_number * (sw_off_t)BTREE_MinPageSize),SEEK_SET);
    sw_fread(pg->data,BTREE_MinPageSize, 1, fp);

    BTREE_GetNextPage(pg,pg->next);
    BTREE_GetPrevPage(pg,pg->prev);
    BTREE_GetSize(pg,pg->size);
    BTREE_GetNumKeys(pg,pg->n);
    BTREE_GetFlags(pg,pg->flags);
    BTREE_GetDataEnd(pg,pg->data_end);

    pg->page_number = page_number;
    pg->modified = 0;
    return pg;
}

BTREE_Page *BTREE_ReadPage(BTREE *b, sw_off_t page_number)
{
int hash = (int)(page_number % (sw_off_t)BTREE_CACHE_SIZE);
BTREE_Page *tmp;
    if((tmp = b->cache[hash]))
    {
        while(tmp)
        {
            if(tmp->page_number == page_number)
            {
                return tmp;
            }
            tmp = tmp->next_cache;
        }
    }

    tmp = BTREE_ReadPageFromDisk(b->fp, page_number);
    tmp->modified = 0;
    tmp->in_use = 1;
    tmp->next_cache = b->cache[hash];
    b->cache[hash] = tmp;
    return tmp;
}

BTREE_Page *BTREE_NewPage(BTREE *b, unsigned int size, unsigned int flags)
{
BTREE_Page *pg;
sw_off_t offset;
FILE *fp = b->fp;
int hash;
    /* Round up size */
    size = BTREE_RoundPageSize(size);

    if(size > BTREE_MaxPageSize)
        return NULL;

    /* Get file pointer */
    if(sw_fseek(fp,(sw_off_t)0,SEEK_END) !=0)
    {
        printf("BTREE_NewPage: seek to end of file failed\n");
    }
    offset = sw_ftell(fp);
    /* Round up file pointer */
    offset = BTREE_RoundPageSize(offset);

    /* Set new file pointer - data will be aligned */
    if(sw_fseek(fp,offset, SEEK_SET)!=0 || offset != sw_ftell(fp))
    {
        printf("BTREE_NewPage: seek to aligned offset failed\n");
    }

    pg = (BTREE_Page *)emalloc(sizeof(BTREE_Page) + size);
    memset(pg,0,sizeof(BTREE_Page) + size);
    /* Reserve space in file */
    if(sw_fwrite(pg->data,1,size,fp)!=size || ((sw_off_t)size + offset) != sw_ftell(fp))
    {
        printf("BTREE_NewPage: could not reserve page space in file\n");
    }

    pg->next = 0;
    pg->prev = 0;
    pg->size = size;
    pg->flags = flags;
    pg->data_end = BTREE_PageHeaderSize;
    pg->n = 0;

    pg->page_number = offset / (sw_off_t)BTREE_MinPageSize;

    /* add to cache */
    pg->modified = 1;
    pg->in_use = 1;
    hash = (int) (pg->page_number % (sw_off_t)BTREE_CACHE_SIZE);
    pg->next_cache = b->cache[hash];
    b->cache[hash] = pg;
    return pg;
}

void BTREE_FreePage(BTREE *b, BTREE_Page *pg)
{
int hash = (int)(pg->page_number % (sw_off_t)BTREE_CACHE_SIZE);
BTREE_Page *tmp;

    tmp = b->cache[hash];

#ifdef DEBUG
    if(!(tmp = b->cache[hash]))
    {
        /* This should never happen!!!! */
        printf("Error in FreePage\n");
        exit(0);
    }
#endif

    while(tmp)
    {
        if (tmp->page_number != pg->page_number)
            tmp = tmp->next_cache;
        else
        {
            tmp->in_use = 0;
            break;
        }
    }
}

int BTREE_CompareKeys(unsigned char *key1, int key_len1, unsigned char *key2, int key_len2)
{
int rc;

    if(key_len1 > key_len2)
        rc = memcmp(key1,key2,key_len2);
    else 
        rc = memcmp(key1,key2,key_len1);

    if(!rc)
        rc = key_len1 - key_len2;

    return rc;
}
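
/*
 * A minimal, self-contained sketch of the ordering that BTREE_CompareKeys
 * defines: compare the common prefix bytewise, and let the shorter key
 * sort first when the prefixes are equal. compare_keys is a hypothetical
 * stand-in so the sketch compiles on its own.
 */

```c
#include <assert.h>
#include <string.h>

/* Returns <0, 0 or >0 as key k1 sorts before, equal to, or after key k2. */
static int compare_keys(const unsigned char *k1, int len1,
                        const unsigned char *k2, int len2)
{
    int rc;

    assert(len1 >= 0 && len2 >= 0);
    rc = memcmp(k1, k2, (size_t)(len1 < len2 ? len1 : len2));
    return rc ? rc : len1 - len2;
}
```

/*
 * The comparison is bytewise, so decimal strings do not sort numerically:
 * "12" sorts before "9". Zero-padded keys (the commented-out "%.12d"
 * format in the DEBUG test below) avoid that.
 */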


int BTREE_GetPositionForKey(BTREE_Page *pg, unsigned char *key, int key_len, int *comp)
{
int i,j,k,isbigger=-1;
int key_len_k;
unsigned char *key_k;
    /* Binary search for the key. *comp receives the result of the last */
    /* comparison made (it stays -1 when the page is empty) */
    i = pg->n - 1;
    j = k = 0;
    while (i >= j)
    {
        k = j + (i - j) / 2;
        key_k = BTREE_KeyData(pg,k);
        key_len_k = uncompress2(&key_k);
        isbigger = BTREE_CompareKeys(key,key_len,key_k,key_len_k);
        if (!isbigger)
            break;
        else if (isbigger > 0)
            j = k + 1;
        else
            i = k - 1;
    }

    *comp = isbigger;

    return k;
}
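
/*
 * A minimal sketch of how callers use the (position, comp) pair returned by
 * BTREE_GetPositionForKey, shown on a plain sorted int array instead of a
 * page. position_for_key and insert_slot are hypothetical names. The
 * "comp > 0 means insert one slot to the right" rule mirrors what
 * BTREE_InsertInPage does with key_pos.
 */

```c
#include <assert.h>

/* Returns the last probed index; *comp is <0, 0 or >0 as key compares to keys[index]. */
static int position_for_key(const int *keys, int n, int key, int *comp)
{
    int lo = 0, hi = n - 1, k = 0, c = -1;

    assert(n >= 0);
    while (hi >= lo)
    {
        k = lo + (hi - lo) / 2;
        c = (key > keys[k]) - (key < keys[k]);
        if (c == 0)
            break;
        else if (c > 0)
            lo = k + 1;
        else
            hi = k - 1;
    }
    *comp = c;
    return k;
}

/* Slot at which key would be inserted to keep the array sorted. */
static int insert_slot(const int *keys, int n, int key)
{
    int comp;
    int k = position_for_key(keys, n, key, &comp);

    return comp > 0 ? k + 1 : k;
}
```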

sw_off_t BTREE_GetKeyFromPage(BTREE *b, BTREE_Page *pg, unsigned char *key, int key_len, unsigned char **found, int *found_len)
{
int k,comp = 0;
sw_off_t data_pointer;

    k = BTREE_GetPositionForKey(pg, key, key_len, &comp);

    if(comp > 0 && k == (int) pg->n && k)
        k--;

    if(comp < 0 && k) 
        k--;
    
    b->current_page = pg->page_number;
    b->current_position = k;

    /* Check for empty page */
    /* This can only be true after creation with no insertions */
    if((pg->n==0) && (k==0))
        return 0;

    *found = BTREE_KeyData(pg,k);
    *found_len = uncompress2(found);

    /* Solaris does not like this. Use memcpy instead
    data_pointer = *(sw_off_t *) (*found + *found_len);
    */
    memcpy((unsigned char *)&data_pointer,((*found) + (*found_len)),sizeof(data_pointer));
    data_pointer = UNPACKFILEOFFSET(data_pointer);
    return data_pointer;

}

int BTREE_AddKeyToPage(BTREE_Page *pg, int position, unsigned char *key, int key_len, sw_off_t data_pointer)
{
unsigned char buffer[BTREE_MaxPageSize];
int j,k;
unsigned char *p;
unsigned char *new_key_start, *new_key_end;
int new_entry_len , tmp;

    data_pointer = PACKFILEOFFSET(data_pointer);

    k = position;

    /* Write key */
    /* Write from Page header up to the key being inserted */
    p = buffer;
    memcpy(p,pg->data,BTREE_KeyIndexOffset(pg->data,k) - pg->data);
    p += BTREE_KeyIndexOffset(pg->data,k) - pg->data;

    p += (pg->n - k + 1) * SizeInt16;

    if(k == (int)pg->n)
    {
        if(k)
        {
            memcpy(p,BTREE_KeyData(pg,0),BTREE_EndData(pg) - BTREE_KeyData(pg,0));
            p += BTREE_EndData(pg) - BTREE_KeyData(pg,0);
        }
    }
    else
    {
        if(k)
        {
            memcpy(p,BTREE_KeyData(pg,0),BTREE_KeyData(pg,k) - BTREE_KeyData(pg,0));
            p += BTREE_KeyData(pg,k) - BTREE_KeyData(pg,0);
        }
    }

    new_key_start = p;
    p = compress3(key_len, p);
    memcpy(p , key, key_len);
    p += key_len;
    memcpy(p, &data_pointer, sizeof(data_pointer));
    p += sizeof(data_pointer);
    new_key_end = p;
    new_entry_len = new_key_end - new_key_start;

    if(k < (int) pg->n)
    {
        memcpy(p,BTREE_KeyData(pg,k), BTREE_EndData(pg) - BTREE_KeyData(pg,k));
        p += BTREE_EndData(pg) - BTREE_KeyData(pg,k);
    }

    for(j = 0; j < k; j++)
    {
        tmp = BTREE_KeyDataOffset(pg,j) + SizeInt16;
        *(BTREE_KeyIndexOffset(buffer,j)) =  (unsigned char)((tmp & 0xff00) >>8);
        *(BTREE_KeyIndexOffset(buffer,j) + 1) = (unsigned char) (tmp & 0xff);
    }

    for(j = (int)(pg->n - 1) ; j >=k; j--)
    {
        tmp = BTREE_KeyDataOffset(pg,j) + new_entry_len + SizeInt16;
        *(BTREE_KeyIndexOffset(buffer,j + 1)) =  (unsigned char)((tmp & 0xff00) >>8);
        *(BTREE_KeyIndexOffset(buffer,j + 1) + 1) = (unsigned char) (tmp & 0xff);
    }

    tmp = new_key_start - buffer;
    *(BTREE_KeyIndexOffset(buffer,k)) =  (unsigned char)((tmp & 0xff00) >>8);
    *(BTREE_KeyIndexOffset(buffer,k) + 1) = (unsigned char) (tmp & 0xff);

    memcpy(pg->data, buffer, p - buffer);
    pg->n++;
    pg->data_end = p - buffer;

    return k;
}
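
/*
 * A minimal sketch of the key index encoding used by BTREE_KeyDataOffset
 * and by the index updates in BTREE_AddKeyToPage: each key's byte offset
 * within the page is stored as two bytes, high byte first. put_u16_be and
 * get_u16_be are hypothetical helper names; the file itself does this
 * inline with shifts and masks.
 */

```c
#include <assert.h>

/* Store v as two bytes, most significant byte first. */
static void put_u16_be(unsigned char *p, unsigned int v)
{
    assert(v <= 0xffffu);
    p[0] = (unsigned char)((v & 0xff00u) >> 8);
    p[1] = (unsigned char)(v & 0xffu);
}

/* Read back a value written by put_u16_be. */
static unsigned int get_u16_be(const unsigned char *p)
{
    return ((unsigned int)p[0] << 8) + p[1];
}
```

/*
 * Two bytes are enough because a page is at most BTREE_MaxPageSize (65536)
 * bytes, so the offset of a key's first byte is at most 65535.
 */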

int BTREE_DelKeyInPage(BTREE_Page *pg, int pos)
{
unsigned char buffer[BTREE_MaxPageSize];
unsigned char *p, *q;
unsigned char *del_key_start, *del_key_end;
int del_entry_len , tmp;
int j, k = pos;



    /* Write key */
    /* Write from Page header up to the key being deleted */
    p = buffer;

    memcpy(p,pg->data,BTREE_KeyIndexOffset(pg->data,k) - pg->data);
    p += BTREE_KeyIndexOffset(pg->data,k) - pg->data;

    if((k + 1) < (int)pg->n)
    {
        memcpy(p,BTREE_KeyIndexOffset(pg->data,k + 1), BTREE_KeyIndexOffset(pg->data,pg->n) - BTREE_KeyIndexOffset(pg->data,k + 1));
        p += BTREE_KeyIndexOffset(pg->data,pg->n) - BTREE_KeyIndexOffset(pg->data,k + 1);
    }


    if(k)
    {
        memcpy(p,BTREE_KeyData(pg,0),BTREE_KeyData(pg,k) - BTREE_KeyData(pg,0));
        p += BTREE_KeyData(pg,k) - BTREE_KeyData(pg,0);
    }
    if((k + 1) < (int)pg->n)
    {
        memcpy(p,BTREE_KeyData(pg,k + 1),BTREE_EndData(pg) - BTREE_KeyData(pg,k + 1));
        p += BTREE_EndData(pg) - BTREE_KeyData(pg,k + 1);
    }

    /* Compute length of deleted key */
    del_key_start = q = BTREE_KeyData(pg,k);
    q += uncompress2(&q);
    q += sizeof(sw_off_t);
    del_key_end = q;
    del_entry_len = del_key_end - del_key_start;

    for(j = 0; j < k; j++)
    {
        tmp = BTREE_KeyDataOffset(pg,j) - SizeInt16;
        *(BTREE_KeyIndexOffset(buffer,j)) =  (unsigned char)((tmp & 0xff00) >>8);
        *(BTREE_KeyIndexOffset(buffer,j) + 1) = (unsigned char) (tmp & 0xff);
    }

    for(j = (int)(pg->n - 1) ; j >k; j--)
    {
        tmp = BTREE_KeyDataOffset(pg,j) - del_entry_len - SizeInt16;
        *(BTREE_KeyIndexOffset(buffer,j - 1)) =  (unsigned char)((tmp & 0xff00) >>8);
        *(BTREE_KeyIndexOffset(buffer,j - 1) + 1) = (unsigned char) (tmp & 0xff);
    }

    memcpy(pg->data, buffer, p - buffer);
    pg->n--;
    pg->data_end = p - buffer;
 
    return k;
}

BTREE *BTREE_New(FILE *fp, unsigned int size)
{
BTREE *b;
int i;
    b = (BTREE *) emalloc(sizeof(BTREE));
    b->page_size = size;
    b->fp = fp;
    for(i = 0; i < BTREE_CACHE_SIZE; i++)
        b->cache[i] = NULL;

    return b;
}

BTREE *BTREE_Create(FILE *fp, unsigned int size)
{
BTREE *b;
BTREE_Page *root;
    /* Round up size */
    size = BTREE_RoundPageSize(size);

    if(size > BTREE_MaxPageSize)
        return NULL;

    b = BTREE_New(fp , size);
    root = BTREE_NewPage(b, size, BTREE_ROOT_NODE | BTREE_LEAF_NODE);

    b->root_page = root->page_number;

    BTREE_WritePage(b, root);
    BTREE_FreePage(b, root);

    return b;
}


BTREE *BTREE_Open(FILE *fp, int size, sw_off_t root_page)
{
BTREE *b;
    /* Round up size */
    size = BTREE_RoundPageSize(size);

    if(size > BTREE_MaxPageSize)
        return NULL;

    b = BTREE_New(fp , size);

    b->root_page = root_page;

    return b;
}

sw_off_t BTREE_Close(BTREE *bt)
{
sw_off_t root_page = bt->root_page;
    BTREE_FlushCache(bt);
    BTREE_CleanCache(bt);
    efree(bt);
    return root_page;
}


BTREE_Page *BTREE_Walk(BTREE *b, unsigned char *key, int key_len)
{
BTREE_Page *pg = BTREE_ReadPage(b, b->root_page);
unsigned int i = 0;
sw_off_t next_page;
unsigned char *found;
int found_len;   /* BTREE_GetKeyFromPage takes an int * */
sw_off_t father_page;

    b->tree[i++] = 0;  /* No father for root */    
    
    father_page = pg->page_number;
    while(!(pg->flags & BTREE_LEAF_NODE))
    {
        next_page = BTREE_GetKeyFromPage(b, pg, key, key_len, &found, &found_len);
        BTREE_FreePage(b, pg);
        pg = BTREE_ReadPage(b, next_page);
        b->tree[i++] = father_page;
        father_page = pg->page_number;
    }
    b->levels = i;
    return pg;
}

BTREE_Page *BTREE_SplitPage(BTREE *b, BTREE_Page *pg)
{
BTREE_Page *new_pg = BTREE_NewPage(b, pg->size, pg->flags);
int         i,n;
unsigned char *key_data, *p, *q, *start;
int key_len;
int tmp;

    n=pg->n / 2;

    /* Key data of new page starts here */
    p = q = BTREE_KeyIndexOffset(new_pg->data, n);

    for(i = 0; i < n; i++)
    {
        key_data = start = BTREE_KeyData(pg, pg->n - n + i);
        key_len = uncompress2(&key_data);

        memcpy(p, start, (key_data - start) + key_len + sizeof(sw_off_t));
        tmp = p - new_pg->data;
        p += (key_data - start) + key_len + sizeof(sw_off_t);

        *(BTREE_KeyIndexOffset(new_pg->data,i)) =  (unsigned char)((tmp & 0xff00) >>8);
        *(BTREE_KeyIndexOffset(new_pg->data,i) + 1) = (unsigned char) (tmp & 0xff);
    }

    new_pg->n = n;
    new_pg->data_end = p - new_pg->data;

    pg->n -= n;
    p = BTREE_KeyIndexOffset(pg->data, pg->n);
    for(i = 0; i < (int)pg->n ; i++)
    {
        key_data = start = BTREE_KeyData(pg,i);
        key_len = uncompress2(&key_data);

        memmove(p, start, (key_data - start) + key_len + sizeof(sw_off_t));
        tmp = p - pg->data;
        p += (key_data - start) + key_len + sizeof(sw_off_t);

        *(BTREE_KeyIndexOffset(pg->data,i)) =  (unsigned char)((tmp & 0xff00) >>8);
        *(BTREE_KeyIndexOffset(pg->data,i) + 1) = (unsigned char) (tmp & 0xff);
    }
    pg->data_end = p - pg->data;

    return new_pg;
}


int BTREE_InsertInPage(BTREE *b, BTREE_Page *pg, unsigned char *key, int key_len, sw_off_t data_pointer, int level, int update)
{
BTREE_Page *new_pg, *next_pg, *root_page, *father_pg, *tmp_pg;
unsigned int free_space, required_space;
int key_pos, key_len0;
unsigned char *key_data0;
int comp;

    required_space = MAXINTCOMPSIZE + key_len + sizeof(sw_off_t);


    /* Check for Duplicate key if we are in a leaf page */
    key_pos = BTREE_GetPositionForKey(pg, key, key_len, &comp);

    if(comp == 0 && (pg->flags & BTREE_LEAF_NODE))
    {
        BTREE_FreePage(b, pg); /* Dup Key */
        return -1;
    }

    free_space = pg->size - pg->data_end;
    if(required_space <= free_space)
    {
        if (comp > 0)
            key_pos++;
        else if(comp == 0 && (pg->flags & BTREE_LEAF_NODE))
        {
            BTREE_FreePage(b, pg); /* Dup Key */
            return -1;
        }

        if(!(pg->flags & BTREE_LEAF_NODE) && update)
        {
            BTREE_DelKeyInPage(pg, key_pos);
        }

        BTREE_AddKeyToPage(pg, key_pos, key,  key_len, data_pointer);

        if(key_pos == 0)
        {

            if(!(pg->flags & BTREE_ROOT_NODE))
            {
                key_data0 = BTREE_KeyData(pg,0);
                key_len0 = uncompress2(&key_data0);
                father_pg = BTREE_ReadPage(b,b->tree[level]);
                BTREE_InsertInPage(b,father_pg, key_data0, key_len0, pg->page_number, level - 1, 1);
            }
        }
        BTREE_WritePage(b, pg);
        BTREE_FreePage(b, pg);
        return 0;
    }

    /* There is not enough free space - Split page */
    new_pg = BTREE_SplitPage(b, pg);
    if(pg->next)
    {
        next_pg = BTREE_ReadPage(b, pg->next);
        next_pg->prev = new_pg->page_number;
        BTREE_WritePage(b, next_pg);
        BTREE_FreePage(b, next_pg);
    }
    new_pg->next = pg->next;
    new_pg->prev = pg->page_number;
    pg->next = new_pg->page_number;

    key_data0 = BTREE_KeyData(new_pg,0);
    key_len0 = uncompress2(&key_data0);

            /* Let's see where to put the key */
    if(BTREE_CompareKeys(key, key_len, key_data0, key_len0) > 0)
    {
        tmp_pg = new_pg;
    }
    else
    {
        tmp_pg = pg;
    }

    key_pos = BTREE_GetPositionForKey(tmp_pg, key, key_len, &comp);
    if(comp>0)
        key_pos++;

    if(!(tmp_pg->flags & BTREE_LEAF_NODE) && update)
    {
        BTREE_DelKeyInPage(tmp_pg, key_pos);
    }
    BTREE_AddKeyToPage(tmp_pg, key_pos, key, key_len, data_pointer);

    if(pg->flags & BTREE_ROOT_NODE)
    {
        pg->flags &= ~BTREE_ROOT_NODE;
        new_pg->flags &= ~BTREE_ROOT_NODE;
        root_page = BTREE_NewPage(b,b->page_size, BTREE_ROOT_NODE);

        key_data0 = BTREE_KeyData(pg,0);
        key_len0 = uncompress2(&key_data0);
        BTREE_AddKeyToPage(root_page, 0, key_data0, key_len0 , pg->page_number);
        key_data0 = BTREE_KeyData(new_pg,0);
        key_len0 = uncompress2(&key_data0);
        BTREE_AddKeyToPage(root_page, 1, key_data0, key_len0, new_pg->page_number);

        b->root_page = root_page->page_number;
        BTREE_WritePage(b, pg);
        BTREE_FreePage(b, pg);
        BTREE_WritePage(b, new_pg);
        BTREE_FreePage(b, new_pg);
        BTREE_WritePage(b, root_page);
        BTREE_FreePage(b, root_page);

        return 0;
    } 

    if(key_pos == 0 && tmp_pg == pg)
    {
        if(!(pg->flags & BTREE_ROOT_NODE))
        {
            father_pg = BTREE_ReadPage(b,b->tree[level]);
            BTREE_InsertInPage(b,father_pg, key, key_len, pg->page_number, level - 1, 1);
        }
    
        BTREE_WritePage(b, pg);
        BTREE_FreePage(b, pg);

        key_data0 = BTREE_KeyData(new_pg,0);
        key_len0 = uncompress2(&key_data0);
        BTREE_FreePage(b, BTREE_Walk(b,key_data0,key_len0));
    }
    else
    {
        BTREE_WritePage(b, pg);
        BTREE_FreePage(b, pg);

        key_data0 = BTREE_KeyData(new_pg,0);
        key_len0 = uncompress2(&key_data0);
    }

    if(!(new_pg->flags & BTREE_ROOT_NODE))
    {
        father_pg = BTREE_ReadPage(b,b->tree[level]);
        BTREE_InsertInPage(b,father_pg, key_data0, key_len0, new_pg->page_number, level - 1, 0);
    }

    BTREE_WritePage(b, new_pg);
    BTREE_FreePage(b, new_pg);

    return 0;
}


int BTREE_Insert(BTREE *b, unsigned char *key, int key_len, sw_off_t data_pointer)
{
BTREE_Page *pg;

    /* Check the key length before walking the tree so no page is left marked in use */
    if(key_len>BTREE_MaxKeySize)
    {
        progwarn("BTREE: key_len exceeds BTREE_MaxKeySize");
        return -1;
    }

    pg = BTREE_Walk(b,key,key_len);

    return BTREE_InsertInPage(b, pg, key, key_len, data_pointer, b->levels - 1, 0);
}

int BTREE_Update(BTREE *b, unsigned char *key, int key_len, sw_off_t new_data_pointer)
{
int comp, k, key_len_k;
unsigned char *key_k;
BTREE_Page *pg = BTREE_Walk(b,key,key_len);


    /* Pack pointer */
    new_data_pointer = PACKFILEOFFSET(new_data_pointer);

    /* Get key position */
    k = BTREE_GetPositionForKey(pg, key, key_len, &comp);

    if(comp)
    {
        BTREE_FreePage(b, pg);
        return -1;  /* Key not found */
    }

    key_k = BTREE_KeyData(pg,k);

    key_len_k = uncompress2(&key_k);

    if ( key_len_k != key_len)
    {
        BTREE_FreePage(b, pg);
        return -1;   /* Error - Should never happen */
    }

    key_k += key_len_k;

    memcpy(key_k, &new_data_pointer, sizeof(new_data_pointer));

    BTREE_WritePage(b, pg);
    BTREE_FreePage(b, pg);

    return 0;
}


sw_off_t BTREE_Search(BTREE *b, unsigned char *key, int key_len, unsigned char **found, int *found_len, int exact_match)
{
BTREE_Page *pg = BTREE_ReadPage(b, b->root_page);
unsigned int i = 0;
sw_off_t next_page;
unsigned char *key_k;
int key_len_k;   /* BTREE_GetKeyFromPage takes an int * */
sw_off_t father_page;
sw_off_t data_pointer;

    b->tree[i++] = 0;  /* No father for root */    
    
    father_page = pg->page_number;
    while(!(pg->flags & BTREE_LEAF_NODE))
    {
        next_page = BTREE_GetKeyFromPage(b, pg, key, key_len, &key_k, &key_len_k);
        BTREE_FreePage(b, pg);
        pg = BTREE_ReadPage(b, next_page);
        b->tree[i++] = father_page;
        father_page = pg->page_number;
    }
    b->levels = i;
    data_pointer = BTREE_GetKeyFromPage(b, pg, key, key_len, &key_k, &key_len_k);

    if(!data_pointer)
    {
        BTREE_FreePage(b,pg);
        return -1L;
    }

    if(exact_match)
    {
        if(BTREE_CompareKeys(key,key_len,key_k,key_len_k)!=0)
        {
            BTREE_FreePage(b,pg);
            return -1L;
        }
    }

    *found_len = key_len_k;
    *found = emalloc(key_len_k);
    memcpy(*found,key_k,key_len_k);

    BTREE_FreePage(b,pg);
    return data_pointer;
}

sw_off_t BTREE_Next(BTREE *b, unsigned char **found, int *found_len)
{
BTREE_Page *pg = BTREE_ReadPage(b, b->current_page);
sw_off_t next_page;
sw_off_t data_pointer;
unsigned char *key_k;
int key_len_k;
    b->current_position++;
    if(pg->n == b->current_position)
    {
        next_page = pg->next;
        BTREE_FreePage(b,pg);
        if(!next_page)
            return -1;
        pg = BTREE_ReadPage(b, next_page);
        b->current_page = next_page;
        b->current_position = 0;
    }
    key_k = BTREE_KeyData(pg,b->current_position);
    *found_len = key_len_k = uncompress2(&key_k);
    *found = emalloc(key_len_k);
    memcpy(*found,key_k,key_len_k);
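    /* Copy with memcpy, as in BTREE_GetKeyFromPage: the pointer is stored as an unaligned sw_off_t */
    memcpy((unsigned char *)&data_pointer, key_k + key_len_k, sizeof(data_pointer));
    data_pointer = UNPACKFILEOFFSET(data_pointer);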

    BTREE_FreePage(b,pg);

    return data_pointer;
}

#ifdef DEBUG

#include <time.h>

#define N_TEST 300000

#define F_READ_BINARY           "rb"
#define F_WRITE_BINARY          "wb"
#define F_READWRITE_BINARY      "rb+"

#define F_READ_TEXT             "r"
#define F_WRITE_TEXT            "w"
#define F_READWRITE_TEXT        "r+"

int main()
{
FILE *fp;
BTREE *bt;
unsigned char buffer[20];
int i;
static int nums[N_TEST];
sw_off_t root_page;
unsigned char *found;
int found_len;
    srand(time(NULL));

    goto test2;

    fp = sw_fopen("kkkkk",F_WRITE_BINARY);
    sw_fwrite("asjhd",1,5,fp);
    sw_fclose(fp);
    fp = sw_fopen("kkkkk",F_READWRITE_BINARY);

printf("\n\nIndexing\n\n");

    bt = BTREE_Create(fp, 15);
    for(i=N_TEST - 1;i>=0;i--)
    {
//        nums[i] = rand();
        nums[i]=i;


        sprintf(buffer,"%d",nums[i]);
//        sprintf(buffer,"%.12d",nums[i]);
        BTREE_Insert(bt,buffer,strlen(buffer),nums[i]);
        if(nums[i]!= BTREE_Search(bt,buffer,strlen(buffer),&found,&found_len,1))
            printf("\n\nmismatch for key %s\n\n",buffer);
        if(!(i%1000))
        {
            BTREE_FlushCache(bt);
            printf("%d             \r",i);
        }
    }

    root_page = BTREE_Close(bt);
    sw_fclose(fp);

search:;
printf("\n\nSearching\n\n");

    fp = sw_fopen("kkkkk",F_READ_BINARY);
    bt = BTREE_Open(fp,15,root_page);

    for(i=0;i<N_TEST;i++)
    {
        sprintf(buffer,"%d",nums[i]);
//        sprintf(buffer,"%.12d",nums[i]);

        if(nums[i] != BTREE_Search(bt,buffer,strlen(buffer),&found,&found_len,1))
            printf("\n\nmismatch for key %s\n\n",buffer);
        if(!(i%1000))
            printf("%d             \r",i);
    }

    sw_fclose(fp);

test2:;


    fp = sw_fopen("kkkkk",F_WRITE_BINARY);
    sw_fclose(fp);
    fp = sw_fopen("kkkkk",F_READWRITE_BINARY);

    sw_fwrite("aaa",1,3,fp);

printf("\n\nIndexing\n\n");

    bt = BTREE_Create(fp, 15);
    for(i=0;i<N_TEST;i++)
    {
//        nums[i] = rand();
        nums[i]=i;
        sprintf(buffer,"%d",nums[i]);
//        sprintf(buffer,"%.12d",nums[i]);
        BTREE_Insert(bt,buffer,strlen(buffer),nums[i]);
        if(nums[i]!= BTREE_Search(bt,buffer,strlen(buffer),&found,&found_len,1))
            printf("\n\nmal %s\n\n",buffer);
        if(!(i%1000))
        {
            BTREE_FlushCache(bt);
            printf("%d            \r",i);
        }
    }

    root_page = BTREE_Close(bt);
    sw_fclose(fp);

printf("\n\nSearching\n\n");

    fp = sw_fopen("kkkkk",F_READ_BINARY);
    bt = BTREE_Open(fp,15,root_page);

    for(i=0;i<N_TEST;i++)
    {
        sprintf(buffer,"%d",nums[i]);
        if(nums[i] != BTREE_Search(bt,buffer,strlen(buffer),&found,&found_len,1))
            printf("\n\nmal %s\n\n",buffer);
        if(!(i%1000))
            printf("%d            \r",i);
    }


    sw_fclose(fp);

    return 0;
}

#endif
swish-e-2.4.7/src/fhash.h
/*

$Id: fhash.h 1946 2007-10-22 14:56:35Z karpet $

    This file is part of Swish-e.

    Swish-e is free software; you can redistribute it and/or modify
    it under the terms of the GNU General Public License as published by
    the Free Software Foundation; either version 2 of the License, or
    (at your option) any later version.

    Swish-e is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
    GNU General Public License for more details.

    You should have received a copy of the GNU General Public License
    along  with Swish-e; if not, write to the Free Software
    Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
    
    See the COPYING file that accompanies the Swish-e distribution for details
    of the GNU GPL and the special exception available for linking against
    the Swish-e library.
    
** Mon May  9 18:19:34 CDT 2005
** added GPL


*/

#define FHASH_SIZE 10001

typedef struct FHASH
{
    sw_off_t hash_offsets[FHASH_SIZE];  /* Hash table */
    sw_off_t start;  /* Pointer to start of hash table in file */
    FILE *fp;
} FHASH;

FHASH *FHASH_Create(FILE *fp);
FHASH *FHASH_Open(FILE *fp, sw_off_t start);
sw_off_t FHASH_Close(FHASH *f);
int FHASH_Insert(FHASH *f, unsigned char *key, int key_len, unsigned char *data, int data_len);
int FHASH_Search(FHASH *f, unsigned char *key, int key_len, unsigned char *data, int data_len);
int FHASH_Update(FHASH *f, unsigned char *key, int key_len, unsigned char *data, int data_len);
int FHASH_Delete(FHASH *f, unsigned char *key, int key_len);





swish-e-2.4.7/src/sys.h
/*


    This file is part of Swish-e.

    Swish-e is free software; you can redistribute it and/or modify
    it under the terms of the GNU General Public License as published by
    the Free Software Foundation; either version 2 of the License, or
    (at your option) any later version.

    Swish-e is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
    GNU General Public License for more details.

    You should have received a copy of the GNU General Public License
    along  with Swish-e; if not, write to the Free Software
    Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
    
    See the COPYING file that accompanies the Swish-e distribution for details
    of the GNU GPL and the special exception available for linking against
    the Swish-e library.
    
** Mon May  9 18:15:43 CDT 2005
** added GPL

*/


#include "acconfig.h"

#ifdef STDC_HEADERS
# include <string.h>
#else
# ifndef HAVE_STRCHR
#  define strchr index
#  define strrchr rindex
# endif
char *strchr(), *strrchr();
# ifndef HAVE_MEMCPY
#  define memcpy(d, s, n) bcopy ((s), (d), (n))
#  define memmove(d, s, n) bcopy ((s), (d), (n))
# endif
#endif

#ifdef HAVE_UNISTD_H
# include <sys/types.h>
# include <unistd.h>
#endif

#ifdef STAT_MACROS_BROKEN
# include "posixstat.h"
#endif

#if STDC_HEADERS
# include <stdlib.h>
#endif


#ifndef NULL
# ifdef __STDC__
#   define NULL ((void *)0)
# else
#   define NULL (0x0)
# endif
#endif

swish-e-2.4.7/src/swish.h
/*
** $Id: swish.h 2291 2009-03-31 01:56:00Z karpet $
**
** Copyright (C) 1995, 1996, 1997, 1998 Hewlett-Packard Company
** Originally by Kevin Hughes, kev@kevcom.com, 3/11/94
**

    This file is part of Swish-e.

    Swish-e is free software; you can redistribute it and/or modify
    it under the terms of the GNU General Public License as published by
    the Free Software Foundation; either version 2 of the License, or
    (at your option) any later version.

    Swish-e is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
    GNU General Public License for more details.

    You should have received a copy of the GNU General Public License
    along  with Swish-e; if not, write to the Free Software
    Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
    
    See the COPYING file that accompanies the Swish-e distribution for details
    of the GNU GPL and the special exception available for linking against
    the Swish-e library.
    
** Mon May  9 18:18:48 CDT 2005
** added GPL
***************************************************************************************
** Added support for METADATA
** G. Hill  ghill@library.berkeley.edu   3/18/97
**
** Added Document Properties support
** Mark Gaulin gaulin@designinfo.com  11/24/98
**
** Added safestrcpy() macro to avoid corruption from strcpy overflow
** SRE 11/17/99
**
** Added Document Filter support (e.g. PDF, Winword)
** Rainer.Scherg@t-online.de   (rasc)  1998-08-07, 1999-05-05, 1999-05-28
**
** Added some definitions for phrase search
** Structure location modified to add frequency and word positions
** Structure entry modified to add link hash values for direct search
**
** Jose Ruiz jmruiz@boe.es 04/04/00
**
** 2000-11-15 Rainer Scherg (rasc)  FileProp type and routines
**
** 2001-01-01 Jose Ruiz Added ISOTime
**
** 2001-01-xx Rainer Scherg (rasc) Added property type structures, etc.
** 2001-01-xx Rainer Scherg (rasc) cmd-opt should be own structure in SWISH * (started)
**
** 2001-02-xx rasc   replaced ISOTime by binary value
**                   removed SWISH.errorstr, etc.
**                   ResultExtFmtStrList & var
**
** 2001-02-28 rasc   some cleanup, ANSI compliant
** 2001-03-12 rasc   logical search operators via config changeable
**                   moved some parts to config.h
**
** 2001-03-16 rasc   truncateDocSize
** 2001-03-17 rasc   fprop enhanced by real_filename
** 2001-04-09 rasc   filters changed and enhanced
** 2001-06-08 wsm    Add word to end of ENTRY and propValue to end of docPropertyEntry
**                     to save memory and less malloc/free
**
** 2001-08-12 jmruiz ENTRY struct modified to index in chunks
**
*/


#ifndef SWISH_H
#define SWISH_H 1



#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <math.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <locale.h>
#include <ctype.h>
#include <errno.h>
#include <time.h>
#include <setjmp.h>
#include "stemmer.h"  /* for fuzzy_object */

#ifdef HAVE_CONFIG_H
#include "acconfig.h"           /* These are defines created by autoconf */
#endif

#ifdef HAVE_WINDOWS_H
#include <windows.h>
#endif
#ifdef HAVE_PROCESS_H
#include <process.h>
#endif

/*  Include swish defaults (that's not autoconf's config.h) */
#include "config.h"


#ifdef NEXTSTEP
#include <sys/dir.h>
#endif

#ifndef PATH_SEPARATOR
#define PATH_SEPARATOR ":"
#endif




#if defined(__VMS)
# include "vms/regex.h"
# include <dirent.h>
# include <stdarg.h>
  extern int ssnprintf(char *, size_t, const char *, /*args */ ...);
  extern int vsnprintf(char *, size_t, const char *, va_list);

#else

#include <dirent.h>

#ifdef HAVE_PCRE
#include <pcreposix.h>
#else
#include <regex.h>
#endif

#ifndef HAVE_MKSTEMP
# include <mkstemp.h>
#endif

#endif



#ifdef __cplusplus
extern "C" {
#endif


/* $$$ THESE NEED TO BE UPGRADED WHEN THE INDEX FORMAT CHANGES

the numerical value is not important; it just needs to differ
from the last version. This is to prevent mismatches between the swish-e
binary and the index.

checked in db_native.c (DB_CheckHeader routine) */

#ifdef USE_BTREE
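/* NB: the leading zero makes this an octal constant; the value only has to differ between formats */
#define SWISH_MAGIC 05052004L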
#else
#define SWISH_MAGIC 11282006L
#endif
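
/*
 * Illustration only, not part of swish.h: a minimal sketch of how the two
 * magic values compare. It assumes nothing about DB_CheckHeader beyond what
 * the comment above states (the values only need to differ). The SKETCH_*
 * names and both helper functions are hypothetical.
 */
```c
#include <assert.h>

/* Local copies of the two values above, renamed so the sketch compiles alone. */
#define SKETCH_MAGIC_BTREE 05052004L    /* leading 0: octal, equals 1332228 decimal */
#define SKETCH_MAGIC_PLAIN 11282006L    /* no leading 0: decimal */

/* Hypothetical check: 1 if the magic stored in an index matches the expected one. */
static int sketch_magic_ok(long stored, long expected)
{
    return stored == expected;
}

/* Returns 1 when the two index formats are told apart as intended. */
static int sketch_magic_demo(void)
{
    assert(SKETCH_MAGIC_BTREE == 1332228L);
    return sketch_magic_ok(SKETCH_MAGIC_PLAIN, SKETCH_MAGIC_PLAIN)
        && !sketch_magic_ok(SKETCH_MAGIC_BTREE, SKETCH_MAGIC_PLAIN);
}
```
/* Usage: sketch_magic_demo() returns 1; an index built with USE_BTREE would be rejected by a non-BTREE binary under this assumed check. */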

#define INDEXFILE "index.swish-e"


#define BASEHEADER 1
#define INDEXHEADER "# SWISH format: " VERSION
#define INDEXHEADER_ID (BASEHEADER + 1)
#define INDEXVERSION "# Swish-e format: " VERSION
#define INDEXVERSION_ID (BASEHEADER + 2)

/* Admin header */
#define NAMEHEADERPARAMNAME "IndexName"
#define DESCRIPTIONPARAMNAME "IndexDescription"
#define POINTERPARAMNAME "IndexPointer"
#define MAINTAINEDBYPARAMNAME "IndexAdmin"


/* Other headers that can be looked up via the swish-e library */
#define INDEXEDONPARAMNAME "IndexedOn"
#define WORDCHARSPARAMNAME "WordCharacters"
#define BEGINCHARSPARAMNAME "BeginCharacters"
#define ENDCHARSPARAMNAME "EndCharacters"
#define IGNOREFIRSTCHARPARAMNAME "IgnoreFirstChar"
#define IGNORELASTCHARPARAMNAME "IgnoreLastChar"
#define STEMMINGPARAMNAME "UseStemming"
#define SOUNDEXPARAMNAME "UseSoundex"
#define FUZZYMODEPARAMNAME "FuzzyIndexingMode"

#define FILECOUNTPARAMNAME "FileCount"


/* Headers for output, and their offsets */
#define NAMEHEADER "# Name:"
#define NAMEHEADER_ID (BASEHEADER + 3)

#define SAVEDASHEADER "# Saved as:"
#define SAVEDASHEADER_ID (BASEHEADER + 4)

#define COUNTSHEADER "# Counts:"
#define COUNTSHEADER_ID (BASEHEADER + 5)

#define INDEXEDONHEADER "# Indexed on:"
#define INDEXEDONHEADER_ID (BASEHEADER + 6)

#define DESCRIPTIONHEADER "# Description:"
#define DESCRIPTIONHEADER_ID (BASEHEADER + 7)

#define POINTERHEADER "# Pointer:"
#define POINTERHEADER_ID (BASEHEADER + 8)

#define MAINTAINEDBYHEADER "# Maintained by:"
#define MAINTAINEDBYHEADER_ID (BASEHEADER + 9)

#define WORDCHARSHEADER "# WordCharacters:"
#define WORDCHARSHEADER_ID (BASEHEADER + 10)

#define MINWORDLIMHEADER "# MinWordLimit:"
#define MINWORDLIMHEADER_ID (BASEHEADER + 11)

#define MAXWORDLIMHEADER "# MaxWordLimit:"
#define MAXWORDLIMHEADER_ID (BASEHEADER + 12)

#define BEGINCHARSHEADER "# BeginCharacters:"
#define BEGINCHARSHEADER_ID (BASEHEADER + 13)

#define ENDCHARSHEADER "# EndCharacters:"
#define ENDCHARSHEADER_ID (BASEHEADER + 14)

#define IGNOREFIRSTCHARHEADER "# IgnoreFirstChar:"
#define IGNOREFIRSTCHARHEADER_ID (BASEHEADER + 15)

#define IGNORELASTCHARHEADER "# IgnoreLastChar:"
#define IGNORELASTCHARHEADER_ID (BASEHEADER + 16)

#define STEMMINGHEADER  "# Stemming Applied:"
//#define STEMMINGHEADER_ID (BASEHEADER + 17)

#define SOUNDEXHEADER "# Soundex Applied:"
//#define SOUNDEXHEADER_ID (BASEHEADER + 18)

#define FUZZYMODE_HEADER "# Fuzzy Indexing Mode:"
#define FUZZYMODEHEADER_ID (BASEHEADER + 18)


#define MERGED_ID (BASEHEADER + 19)

/* vv not used vv */
#define DOCPROPHEADER "# DocProperty"
#define DOCPROPHEADER_ID (BASEHEADER + 20)
/* ^^ not used ^^ */

#define DOCPROPENHEADER "# DocumentProperties:"
#define DOCPROPENHEADER_ID (BASEHEADER + 21)

#define SORTDOCPROPHEADER_ID (BASEHEADER + 22)

#define IGNORETOTALWORDCOUNTWHENRANKING "# IgnoreTotalWordCountWhenRanking:"
#define IGNORETOTALWORDCOUNTWHENRANKINGPARAMNAME "IgnoreTotalWordCountWhenRanking"
#define IGNORETOTALWORDCOUNTWHENRANKING_ID (BASEHEADER + 23)

#define TRANSLATECHARTABLEHEADER "# TranslateCharacterTable:"
#define TRANSLATECHARTABLEPARAMNAME "TranslateCharacterTable"
#define TRANSLATECHARTABLE_ID (BASEHEADER + 25)

#define STOPWORDS_ID (BASEHEADER + 26)
#define METANAMES_ID (BASEHEADER + 27)
#define LOCATIONLOOKUPTABLE_ID (BASEHEADER + 28)
#define BUZZWORDS_ID (BASEHEADER + 29) /* 2001-04-24 moseley */

#ifndef USE_BTREE
#define TOTALWORDSPERFILE_ID (BASEHEADER + 30)  /* total words per file array */
#endif

#define TOTALWORDS_REMOVED_ID (BASEHEADER + 31) /* 2005-01-14 for tracking total words removed */

/* -- end of headers */

#define MAXFILELEN 1000
#define MAXSTRLEN 2000
#define MAXWORDLEN 1000
#define MAXTITLELEN 300

// #define HASHSIZE 101
// #define BIGSIZE 1009
// #define VERYBIGHASHSIZE 10001

// Change as suggested by Jean-François PIÉRONNE <jfp@altavista.net>
// on Fri, 28 Dec 2001 07:37:26 -0800 (PST)
#define HASHSIZE 1009
#define BIGHASHSIZE 10001
#define VERYBIGHASHSIZE 100003


#define MAXPAR 10
#define MAXCHARDEFINED 256
#define RD_BUFFER_SIZE  65356   /* initial size; large, to avoid frequent reallocs  (2001-03-16 rasc) */

#define NOWORD "thisisnotaword"
#define SECSPERMIN 60

#define IN_FILE_BIT     0
#define IN_TITLE_BIT    1
#define IN_HEAD_BIT     2
#define IN_BODY_BIT     3
#define IN_COMMENTS_BIT 4
#define IN_HEADER_BIT   5
#define IN_EMPHASIZED_BIT   6
#define IN_META_BIT     7
#define STRUCTURE_END 7


#define IN_FILE         (1<<IN_FILE_BIT)
#define IN_TITLE        (1<<IN_TITLE_BIT)
#define IN_HEAD         (1<<IN_HEAD_BIT)
#define IN_BODY         (1<<IN_BODY_BIT)
#define IN_COMMENTS     (1<<IN_COMMENTS_BIT)
#define IN_HEADER       (1<<IN_HEADER_BIT)
#define IN_EMPHASIZED (1<<IN_EMPHASIZED_BIT)
#define IN_META         (1<<IN_META_BIT)
#define IN_ALL (IN_FILE|IN_TITLE|IN_HEAD|IN_BODY|IN_COMMENTS|IN_HEADER|IN_EMPHASIZED|IN_META)
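
/*
 * Illustration only, not part of swish.h: a minimal sketch of testing the
 * structure bits defined above. The helper names and the example bit
 * combination are assumptions; how swish-e itself sets these bits while
 * parsing is not shown in this header.
 */
```c
#include <assert.h>

/* Local copies of a few bit definitions from above (guarded to avoid redefinition). */
#ifndef IN_TITLE
#define IN_FILE   (1<<0)
#define IN_TITLE  (1<<1)
#define IN_BODY   (1<<3)
#define IN_META   (1<<7)
#endif

/* 1 if the structure value carries at least one of the bits in mask. */
static int sketch_in_any(int structure, int mask)
{
    return (structure & mask) != 0;
}

/* Returns 1 if the hypothetical title-only occurrence is classified correctly. */
static int sketch_structure_demo(void)
{
    int structure = IN_FILE | IN_TITLE;   /* hypothetical: a word seen in a title */

    return sketch_in_any(structure, IN_TITLE)
        && !sketch_in_any(structure, IN_BODY | IN_META);
}
```
/* Usage: because each area is a single bit, several areas can be tested at once by OR-ing their masks. */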


/* Document Types */
enum {
        BASEDOCTYPE = 0, TXT, HTML, XML, WML, XML2, HTML2, TXT2
};


/* Possible run modes */
typedef enum {
    MODE_SEARCH,
    MODE_INDEX,
    MODE_DUMP,
    MODE_WORDS,
    MODE_MERGE,
    MODE_UPDATE,
    MODE_REMOVE
}
CMD_MODE;


#define NODOCTYPE BASEDOCTYPE

// This is used to build the property to read/write to disk
// It's here so the buffer can live between writes

typedef struct propEntry
{
    unsigned int propLen;       /* Length of buffer */
    unsigned char propValue[1]; /* Actual property value starts here */
}
propEntry;
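
/*
 * Illustration only, not part of swish.h: propEntry ends in a one-element
 * array, so the value is meant to be stored in the same allocation as the
 * length. The sketch below shows one way to allocate such an entry. The
 * sketch_* names are hypothetical, and swish-e's own allocation routines may
 * differ. In C99 and later a flexible array member (propValue[]) is the
 * standard way to express the same layout.
 */
```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Same layout as propEntry above, renamed so the sketch compiles alone. */
typedef struct sketch_propEntry
{
    unsigned int  propLen;       /* Length of buffer */
    unsigned char propValue[1];  /* Actual property value starts here */
} sketch_propEntry;

/* Hypothetical helper: one malloc for header plus value. The caller frees it. */
static sketch_propEntry *sketch_new_prop(const unsigned char *value, unsigned int len)
{
    /* sizeof already counts one value byte, so this over-allocates slightly. */
    sketch_propEntry *p = (sketch_propEntry *)malloc(sizeof(sketch_propEntry) + len);

    if (p == NULL)
        return NULL;
    p->propLen = len;
    memcpy(p->propValue, value, len);
    return p;
}
```
/* Usage: p = sketch_new_prop(bytes, n); read p->propLen bytes from p->propValue; free(p) when done. */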



typedef struct docProperties
{
    int n;  /* to be removed - can just use count of properties */
    struct propEntry *propEntry[1];  /* Array to hold properties */
}
docProperties;

#define RANK_BIAS_RANGE 10 /* max/min range ( -10 -> 10, with zero being no bias ) */

/* This structure is for storing both properties and metanames -- probably should be two lists */
struct metaEntry
{
    /* Stored in index */
    char       *metaName;           /* MetaName string */
    int         metaID;             /* Meta ID */
    int         metaType;           /* See metanames.h for values */
    int         alias;              /* if non-zero, this is an alias to the listed metaID */
    int         sort_len;           /* sort length used when sorting a property */
    int         rank_bias;          /* An integer used to bias hits on this metaname 0 = no bias */

    /* Fields used while indexing or searching */
    int         max_len;            /* If non-zero, limits properties to this length (for storedescription) */
    char       *extractpath_default; /* String to index under this metaname if none found with ExtractPath */
    int        *sorted_data;        /* Sorted data; NULL if not read/done */
    int         sorted_loaded;      /* true if have attempted to load sorted data (doesn't mean it exists) */
    int         in_tag;             /* Flag to indicate that we are within this tag while indexing (parsing) */
};

/* These are used to build the table of seek pointers in the main index. */
typedef struct
{
    sw_off_t    seek;
} PROP_LOCATION;


typedef struct   // there used to be more in this structure ;)
{
    PROP_LOCATION   prop_position[1];  // one for each property in the index.
} PROP_INDEX;


typedef struct
{
    int             filenum;
    docProperties  *docProperties;  /* list of document props in memory */
    void     *prop_index;     /* pointers to properties on disk */
} FileRec;


/*
 -- FileProperties
 -- store for information about a file to be indexed...
 -- Unused items may be NULL (e.g. if File is not opened, fp == NULL)
 -- (2000-11 rasc)

 -- (2000-12 Jose Ruiz)
 -- Added StoreDescription

*/

typedef struct
{
    FILE   *fp;                 /* may also be a filter stream, or NULL if not opened */
    pid_t  filter_pid;          /* process id of filter program, if forked */
    char   *real_path;          /* path/URL to indexed file - may be modified by ReplaceRules */
    char   *orig_path;          /* original path provided to swish */
    char   *work_path;          /* path to file to index (may be tmpfile or real_path) */
    char   *real_filename;      /* basename() of real_path  */
    long    source_size;        /* size reported by fstat() before filtering, if read from a file */
    long    fsize;              /* size of orig file, but once read into buffer is size of buffer */
    long    bytes_read;         /* Number of bytes read from the stream - important for sw->truncateDocSize and -S prog */
    int     done;               /* flag to read no more from this stream (truncate) */
    int     external_program;   /* Flag to only read fsize bytes from stream */
    time_t  mtime;              /* Date of last modification of the original file */
    int     doctype;            /* Type of document HTML, TXT, XML, ... */
    int     index_no_content;   /* Flag, index "filename/real_path" only! */
    struct StoreDescription *stordesc;   /* Null if no description/summary */
    struct FilterList *hasfilter;       /* NULL if no filter for this file */
}
FileProp;


typedef struct LOCATION
{
    struct LOCATION *next;
    int     metaID;
    int     filenum;
    int     frequency;
    unsigned int     posdata[1];
}
LOCATION;


/* 2002/01 jmruiz macros for accessing POSITION and structure */
#define SET_POSDATA(pos,str)  ((unsigned int)((unsigned int)(pos) << (unsigned int)8 | (unsigned int)(str)))
#define GET_POSITION(pos)      ((int)((unsigned int)(pos) >> (unsigned int)8))
#define GET_STRUCTURE(pos)     ((int)((unsigned int)(pos) & (unsigned int)0xff))
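
/*
 * Illustration only, not part of swish.h: a minimal sketch of the packing
 * done by the three macros above. The word position goes in the upper bits
 * and the structure byte in the low 8 bits. The helper name is hypothetical;
 * it assumes 0 <= structure <= 0xff and a non-negative position small enough
 * to fit in the remaining upper bits of an unsigned int.
 */
```c
#include <assert.h>

/* Copies of the macros above, guarded so the sketch also compiles alone. */
#ifndef SET_POSDATA
#define SET_POSDATA(pos,str)  ((unsigned int)((unsigned int)(pos) << (unsigned int)8 | (unsigned int)(str)))
#define GET_POSITION(pos)      ((int)((unsigned int)(pos) >> (unsigned int)8))
#define GET_STRUCTURE(pos)     ((int)((unsigned int)(pos) & (unsigned int)0xff))
#endif

/* 1 if position and structure survive a pack/unpack round trip. */
static int sketch_posdata_roundtrip(int position, int structure)
{
    unsigned int packed = SET_POSDATA(position, structure);

    return GET_POSITION(packed) == position
        && GET_STRUCTURE(packed) == structure;
}
```
/* Usage: SET_POSDATA(300, 0x09) yields (300 << 8) | 9; the two GET_ macros recover 300 and 0x09 from that value. */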

typedef struct ENTRY
{
    struct ENTRY *next;
    int     tfrequency;
       /* Chunk's LOCATIONs go here */
    LOCATION *currentChunkLocationList;
    LOCATION *currentlocation;
       /* All locations go here */
    LOCATION *allLocationList;

    /* NOTE: u1 is declared as a struct (not a union), so both members are stored */
    struct
    {
        sw_off_t    wordID;
        int     last_filenum;
    }
    u1;
    char    word[1];    /* actual word starts here */
}
ENTRY;

typedef union
{
    struct swline *nodep;
    char *data;
} swline_other;

struct swline
{
    struct swline *next;
    swline_other other;
    char   line[1];
};


/* For word hash tables */


typedef struct {
    struct swline  **hash_array;
    int              hash_size;
    int              count;
    void            *mem_zone;
}  WORD_HASH_TABLE;



typedef struct
{
    /* vars for WordCharacters */
    int     lenwordchars;
    char   *wordchars;

    /* vars for BeginCharacters */
    int     lenbeginchars;
    char   *beginchars;

    /* vars for EndCharacters */
    int     lenendchars;
    char   *endchars;

    /* vars for IgnoreLastChar */
    int     lenignorelastchar;
    char   *ignorelastchar;

    /* vars for IgnoreFirstChar */
    int     lenignorefirstchar;
    char   *ignorefirstchar;

    /* vars for bump position chars */
    int     lenbumpposchars;
    char   *bumpposchars;

    /* vars for header values */
    char   *savedasheader;
    int     lensavedasheader;

    /* vars for numberchars */  /* Not yet stored in the header. */
    int     lennumberchars;     /* Probably don't need it for searching */
    char   *numberchars;
    int     numberchars_used_flag;


    int     lenindexedon;
    char   *indexedon;

    int     lenindexn;
    char   *indexn;

    int     lenindexd;
    char   *indexd;

    int     lenindexp;
    char   *indexp;

    int     lenindexa;
    char   *indexa;

    int     minwordlimit;
    int     maxwordlimit;

    FUZZY_OBJECT *fuzzy_data;

    /* Total files and words in index file */
    int     totalwords;    /* Total *unique* words */
    int     totalfiles;
    int     removedfiles;

    /* var to specify how to rank while indexing */
    int     ignoreTotalWordCountWhenRanking; /* added 11/24/98 - MG */

    int     *TotalWordsPerFile;
    int     TotalWordsPerFileMax;  /* max size of array - this isn't saved in the header */


    /* Lookup tables for fast access */
    int     wordcharslookuptable[256];
    int     begincharslookuptable[256];
    int     endcharslookuptable[256];
    int     ignorefirstcharlookuptable[256];
    int     ignorelastcharlookuptable[256];
    int     bumpposcharslookuptable[256];
    int     translatecharslookuptable[256]; /* $$$ rasc 2001-02-21 */
    int     numbercharslookuptable[256];    /* Dec 12, 2001 - moseley -- mostly for ignoring numbers */

    /* values for handling stopwords */
    WORD_HASH_TABLE hashstoplist;


    /* Buzzwords hash */
    WORD_HASH_TABLE hashbuzzwordlist;

    /* values for handling "use" words -> unused in the search process */
    WORD_HASH_TABLE hashuselist;


    /* This is an array of properties that are used */
    /* These should not be in the header, rather in indexf as they are not written to disk */
    int     *propIDX_to_metaID;
    int     *metaID_to_PropIDX;
    int     property_count;



    /* Values for fields (metanames) */
    struct metaEntry **metaEntryArray;
    int     metaCounter;        /* Number of metanames */

    int     total_word_positions;       /* IDF ranking */
    int     removed_word_positions;     /* total words (not just unique words) */
}
INDEXDATAHEADER;
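
/*
 * Illustration only, not part of swish.h: the header keeps each character
 * set twice, as a string (e.g. wordchars) and as a 256-entry lookup table
 * (e.g. wordcharslookuptable). The sketch below shows one way such a table
 * can be derived from the string. Both helper names are hypothetical and the
 * code that actually fills these tables in swish-e is not shown here.
 */
```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: mark every byte of chars as a member of the set. */
static void sketch_build_lookup(int table[256], const char *chars)
{
    memset(table, 0, 256 * sizeof(int));
    for (; *chars; chars++)
        table[(unsigned char)*chars] = 1;
}

/* 1 if c is in the set: a single array read instead of a strchr() scan. */
static int sketch_in_set(const int table[256], int c)
{
    return table[(unsigned char)c];
}
```
/* Usage: build the table once per character set, then call sketch_in_set() for every input byte while tokenizing. */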

typedef struct IndexFILE
{
    struct IndexFILE *next;
    struct IndexFILE *nodep;    /* last */

    struct SWISH *sw;           /* Parent object */

    char   *line;               /* Name of the index file */

    unsigned long total_bytes;  /* Just to show total size when indexing */
    unsigned long total_word_positions_cur_run;  /* count *while* indexing */



    /* DB handle */
    void   *DB;

    /* Header Info */
    INDEXDATAHEADER header;

    /* Pointer to cache the keywords */
    char   *keywords[256];

    /* Support for merge */
    int     *meta_map;              // maps metas from this index to the output index
    int     *path_order;            // lists files in order of pathname
    int     current_file;           // current file pointer, used for merged reading
    struct  metaEntry *path_meta;   // meta entry for the path name
    struct  metaEntry *modified_meta;
    propEntry *cur_prop;            // last read pathname
    int     filenum;                // current filenumber to use


    /* Used by merge.c */
    int    *merge_file_num_map;

    /* Cache for stemming */
    WORD_HASH_TABLE hashstemcache;

    /* Cached meta and property lists */
    struct metaEntry **meta_list;
    struct metaEntry **prop_list;
}
IndexFILE;










struct multiswline
{
    struct multiswline *next;
    struct swline *list;
};


typedef struct
{
    int     numWords;
    ENTRY **elist;     /* Sorted by word */
}
ENTRYARRAY;



struct url_info
{
    struct url_info *next;
    char   *url;
};

struct IndexContents
{
    struct IndexContents *next;
    int     DocType;
    struct swline *patt;
};

struct StoreDescription
{
    struct StoreDescription *next;
    int     DocType;
    char   *field;
    int     size;
};

/* These two structs are used for lookup tables in order to save memory. */
/* Metaname, frequency and structure values normally repeat in the same combinations */
/* and usually have low values as well. */
/* In this way the three values can be packed into a single one using a lookup table. */
/* Structure itself can use its own lookup table. */
struct int_st
{
    struct int_st *next;
    int     index;
    int     val[1];
};

struct int_lookup_st
{
    int     n_entries;
    struct int_st *hash_entries[HASHSIZE];
    struct int_st *all_entries[1];
};

/* These two structs are used for lookup tables in order to save memory. */
/* Normally parts of the path/URL are repeated */
/* and usually also have low values. */
struct char_st
{
    struct char_st *next;
    int     index;
    char   *val;
};

struct char_lookup_st
{
    int     n_entries;
    struct char_st *hash_entries[HASHSIZE];
    struct char_st *all_entries[1];
};


/* Place to store compiled regular expressions */

typedef struct regex_list
{
    struct regex_list *next;
    regex_t     re;
    char       *replace;
    int         replace_count;  /* number of pattern replacements - to estimate size of replacement string */
    int         replace_length; /* newstr_max = replace_length + ( replace_count * search_str_len ) */
    int         global;         /* /g flag to repeat sub */
    int         negate;         /* Flag for matches if the match should be negated */
    char       *pattern;        /* keep string pattern around for debugging */
} regex_list;

typedef struct path_extract_list
{
    struct path_extract_list    *next;
    struct metaEntry            *meta_entry;
    regex_list                  *regex;
} path_extract_list;



/* -- Property data types
   -- Result handling structures, (types storage, values)
   -- Warning! Changing types affects the output routines, etc.
   -- 2001-01  rasc

   $$$ ToDo: data types are not yet fully supported by swish
   $$$ Future: to be part of module data_types.c/h
*/


typedef enum
{                               /* Property Datatypes */
    PROP_UNDEFINED = -1,
    PROP_UNKNOWN = 0,
    PROP_STRING,
    PROP_INTEGER,
    PROP_FLOAT,
    PROP_DATE,
    PROP_ULONG
}
PropType;

/* For undefined meta names */
typedef enum
{
    UNDEF_META_DISABLE = 0, // Only for XMLAttributes - don't even try with attributes
    UNDEF_META_INDEX,       // index as plain text
    UNDEF_META_AUTO,        // create metaname if doesn't exist
    UNDEF_META_ERROR,       // throw a nasty error
    UNDEF_META_IGNORE       // don't index
}
UndefMetaFlag;


typedef union
{                               /* storage of the PropertyValue */
    char   *v_str;              /* strings */
    int     v_int;              /* Integer */
    time_t  v_date;             /* Date    */
    double  v_float;            /* Double Float */
    unsigned long v_ulong;      /* Unsigned long */
}
u_PropValue1;

typedef struct
{                               /* Propvalue with type info */
    PropType datatype;
    u_PropValue1 value;
    int      destroy;           /* flag to destroy (free) any pointer type */
}
PropValue;
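
/*
 * Illustration only, not part of swish.h: PropValue is a tagged union, with
 * datatype selecting the valid member of value and destroy recording whether
 * a pointer member is owned. The sketch below uses a reduced copy of the
 * types (string and integer only). All sketch_* and SK_* names are
 * hypothetical, and swish-e's own routines for creating and freeing
 * PropValue are not shown here.
 */
```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef enum { SK_UNDEFINED = -1, SK_UNKNOWN = 0, SK_STRING, SK_INTEGER } sketch_PropType;

typedef struct
{
    sketch_PropType datatype;
    union { char *v_str; int v_int; } value;
    int destroy;                /* flag to free any pointer type */
} sketch_PropValue;

/* Build an owned string value; datatype is SK_UNDEFINED if malloc fails. */
static sketch_PropValue sketch_string_prop(const char *s)
{
    sketch_PropValue pv;
    size_t n = strlen(s) + 1;

    pv.datatype = SK_STRING;
    pv.value.v_str = (char *)malloc(n);
    pv.destroy = (pv.value.v_str != NULL);
    if (pv.value.v_str != NULL)
        memcpy(pv.value.v_str, s, n);
    else
        pv.datatype = SK_UNDEFINED;
    return pv;
}

/* Free the string only when this value owns it, then mark the value unusable. */
static void sketch_release_prop(sketch_PropValue *pv)
{
    if (pv->datatype == SK_STRING && pv->destroy)
        free(pv->value.v_str);
    pv->value.v_str = NULL;
    pv->destroy = 0;
    pv->datatype = SK_UNDEFINED;
}
```
/* Usage: check datatype before reading a member of value, and call sketch_release_prop() exactly once per value. */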



/* --------------------------------------- */




#define MAX_ERROR_STRING_LEN 500

typedef struct SWISH
{
    /* New module design structure data */
    // struct MOD_SearchAlt     *SearchAlt;      /* search_alt module data */
    struct MOD_ResultOutput  *ResultOutput;   /* result_output module data */
    struct MOD_Filter        *Filter;         /* filter module data */
    struct MOD_ResultSort    *ResultSort;     /* result_sort module data */
    struct MOD_Entities      *Entities;       /* html entities module data */
    struct MOD_DB            *Db;             /* DB module data */
    struct MOD_Index         *Index;          /* Index module data */
    struct MOD_FS            *FS;             /* FileSystem Index module data */
    struct MOD_HTTP          *HTTP;           /* HTTP Index module data */
    struct MOD_Swish_Words   *SwishWords;     /* For parsing into "swish words" */
    struct MOD_Prog          *Prog;           /* For extprog.c */


    /** General Purpose **/

    /* list of associated index files  */
    IndexFILE *indexlist;


    unsigned char            *Prop_IO_Buf;      /* For compressing and uncompressing properties (static-like buffer) */
    unsigned long             PropIO_allocated; // total size of the structure
    int                       PropCompressionLevel;


    /* Total words and files in all index files */
    /* int     TotalWords;  Total *unique words*  $$$ doesn't seem to be used */
    int     TotalFiles;

    /* verbose flag */
    int     verbose;

    int     headerOutVerbose;   /* -H <n> print extended header info */


    /* Error vars */
    int     lasterror;
    char    lasterrorstr[MAX_ERROR_STRING_LEN+1];


    /* 06/00 Jose Ruiz */
    int     isvowellookuptable[256];  /* used in check.c */


    /********* Document Source info **********/

    /* structure for handling all the directories/files (IndexDir) while indexing  */
    struct swline *dirlist;

    /* structure for handling IndexOnly config data while indexing */
    struct swline *suffixlist;




    /******** Structures for parsers **********/


    /* Limit indexing by a file date */
    time_t  mtime_limit;

    long    truncateDocSize;    /* size of doc, at which it will be truncated (2001-03-16 rasc) */


    /* structure for handling replace config data while searching */
    regex_list     *replaceRegexps;


    /* It's common to want to limit searches to areas of a file or web space. */
    /* This allows a substring to be extracted from a file path and indexed as a metaname. */
    path_extract_list   *pathExtractList;



    /* structure for handling NoContents config data while searching */
    struct swline *nocontentslist;

    /* 08/00 Jose Ruiz Values for document type support */
    int     DefaultDocType;

    /* maps file endings to document types */
    struct IndexContents *indexcontents;


    /* Should comments be indexed */
    int     indexComments;

    /* Should positions be compressed */
    int     compressPositions;


    /******** Variables used by the parsers *********/

    /* 12/00 Jose Ruiz Values for summary support */
    struct StoreDescription *storedescription;


    /* structure to handle Ignoremeta metanames */
    struct swline *ignoremetalist;


    /* Structure for handling metatags from DontBumpPositionOnMetaTags */
    struct swline *dontbumpstarttagslist;
    struct swline *dontbumpendtagslist;


    /* Undefined MetaName indexing options */
    UndefMetaFlag   UndefinedMetaTags;
    UndefMetaFlag   UndefinedXMLAttributes;  // What to do with attributes  libxml2 only



    /*** libxml2 additions ***/

    /* parser error warning level */
    int     parser_warn_level;

    int     obeyRobotsNoIndex;

    /* for extracting links into a metaEntry */
    struct metaEntry *links_meta;

    /* for extracting image hrefs into a metaEntry */
    struct metaEntry *images_meta;


    /* if allocated the meta name to store alt tags as */
    int               IndexAltTag;
    char             *IndexAltTagMeta;   // use this meta-tag, if set

    /* for converting relative links in href and img src attributes to absolute links */
    int               AbsoluteLinks;


    /* structure to handle XMLClassAttributes - list of attributes to use content to make a metaname*/
    /* <foo class="bar"> => generates a metaname foo.bar */
    struct swline *XMLClassAttributes;


    const char **header_names;  /* list of available header names */
    const char **index_names;   /* list of current in-use header names */

    /* Temporary place to store return string lists */
    const char **temp_string_buffer;
    int        temp_string_buffer_len;


    /* Temporary place to store a stemmed word -- so library user doesn't need to free memory */
    char * stemmed_word;
    int    stemmed_word_len;


    /* array to map the various possible HTML structure bits for rank */
    int     structure_map_set; /* flag */
    int     structure_map[256];


    /* karman Mon Aug 30 07:54:10 CDT 2004 */
    int     RankScheme;         /* Ranking Scheme */
    int     TotalWordPos;

    int     ReturnRawRank;


    void *ref_count_ptr;  /* pointer for use with SWISH::API */


} SWISH;


/* 06/00 Jose Ruiz
** Structure  StringList. Stores words up to a number of n
*/
typedef struct  {
        int n;
        char **word;
} StringList;

/*
 * This structure defines all of the functions that need to
 * be implemented by an Indexing Data Source.
 * Right now there are two Indexing Data Source types:
 *  file-system based and an HTTP web crawler.
 * Any Data Source can be created as long as all of the
 * functions below are properly initialized.
 */
struct _indexing_data_source_def
{
    const char *IndexingDataSourceName; /* long name for data source */
    const char *IndexingDataSourceId; /* short name for data source */
    void    (*indexpath_fn) (SWISH * sw, char *path); /* routine to index a "path" */
    int     (*parseconfline_fn) (SWISH * sw, StringList *l); /* parse config file lines */
};


extern struct _indexing_data_source_def *IndexingDataSource;
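
/*
 * Illustration only, not part of swish.h: a minimal sketch of how a data
 * source might fill in the function table above and how an indexer could
 * dispatch through the IndexingDataSource pointer. The SWISH stand-in type,
 * the "demo" source, index_all() and the return convention of the config
 * handler (non-zero when the line was recognized) are all assumptions made
 * for this example.
 */

```c
#include <assert.h>
#include <string.h>

/* Stand-in types: the real SWISH and StringList are defined in swish.h. */
typedef struct { int paths_seen; } SWISH;
typedef struct { int n; char **word; } StringList;

struct _indexing_data_source_def
{
    const char *IndexingDataSourceName; /* long name for data source */
    const char *IndexingDataSourceId;   /* short name for data source */
    void    (*indexpath_fn) (SWISH * sw, char *path);
    int     (*parseconfline_fn) (SWISH * sw, StringList *l);
};

/* Hypothetical "demo" source: just counts the paths it is asked to index. */
static void demo_indexpath(SWISH *sw, char *path)
{
    if (path != NULL && path[0] != '\0')
        sw->paths_seen++;
}

/* Hypothetical config handler: recognizes lines starting with "DemoOption". */
static int demo_parseconfline(SWISH *sw, StringList *l)
{
    (void) sw;
    return l->n > 0 && strcmp(l->word[0], "DemoOption") == 0;
}

static struct _indexing_data_source_def DemoIndexingDataSource = {
    "Demo-Source", "demo", demo_indexpath, demo_parseconfline
};

/* The indexer reaches the selected source only through this pointer. */
static struct _indexing_data_source_def *IndexingDataSource = &DemoIndexingDataSource;

/* Hypothetical driver: hand every path to the selected source. */
static int index_all(SWISH *sw, char **paths, int count)
{
    int i;

    for (i = 0; i < count; i++)
        IndexingDataSource->indexpath_fn(sw, paths[i]);
    return sw->paths_seen;
}

/* Exercises the table end to end; returns 1 when every check passed. */
static int demo_source_check(void)
{
    SWISH sw = { 0 };
    char p1[] = "docs";
    char p2[] = "/home/otherdocs";
    char *paths[2];
    char w[] = "DemoOption";
    char *words[1];
    StringList sl;
    int indexed, handled;

    paths[0] = p1;
    paths[1] = p2;
    words[0] = w;
    sl.n = 1;
    sl.word = words;

    indexed = index_all(&sw, paths, 2);
    handled = IndexingDataSource->parseconfline_fn(&sw, &sl);

    assert(indexed == 2);
    assert(handled == 1);
    assert(strcmp(IndexingDataSource->IndexingDataSourceId, "demo") == 0);
    return 1;
}
```

/*
 * Under these assumptions, switching to another source would only mean
 * pointing IndexingDataSource at a different table.
 */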



void    allocatedefaults(void);

int SwishAttach(SWISH *);
int open_single_index( SWISH *sw, IndexFILE *indexf, int db_mode );
SWISH  *SwishNew(void);
void    SwishFree(SWISH *);

/* strcpy doesn't check for overflow in the 'to' string */
/* strncpy doesn't guarantee null byte termination */
/* can't check strlen of 'from' arg since it is sometimes a function call */
#define safestrcpy(n,to,from)  { strncpy(to,from,n); (to)[(n)-1]='\0'; }

/* Jose Ruiz 04/00
** Macro for copying positions between arrays of integers
** copy num integers on dest (starting at posdest) from
** orig (starting at posorig)
*/
/*
#define CopyPositions(dest,posdest,orig,posorig,num) \
{int i;for(i=0;i<(num);i++) (dest)[i+(posdest)]=(orig)[i+(posorig)];}
*/
#define CopyPositions(dest,posdest,orig,posorig,num) \
 memcpy((char *)((int *)(dest)+(posdest)),(char *)((int *)(orig)+(posorig)),(num)*sizeof(int))


/* Min macro */
#define Min(a,b) ((a) < (b) ? (a) : (b))



/* C library prototypes */
SWISH  *SwishInit(char *);
void    SwishClose(SWISH *);
void    SwishResetSearch(SWISH *);
void free_swish_memory( SWISH *sw );  /* in swish2.c */




/* These are only checked in dump.c */
#define DEBUG_INDEX_HEADER      (1<<0)
#define DEBUG_INDEX_WORDS       (1<<1)
#define DEBUG_INDEX_WORDS_FULL  (1<<2)
#define DEBUG_INDEX_STOPWORDS   (1<<3)
#define DEBUG_INDEX_FILES       (1<<4)
#define DEBUG_INDEX_METANAMES   (1<<5)
#define DEBUG_INDEX_ALL         (1<<6)
#define DEBUG_INDEX_WORDS_ONLY  (1<<7)
#define DEBUG_INDEX_WORDS_META  (1<<8)
#define DEBUG_LIST_FUZZY        (1<<9)
#define DEBUG_INDEX_WORD_COUNT  (1<<10)


/* These are only checked while indexing */
#define DEBUG_WORDS             (1<<0)
#define DEBUG_PARSED_WORDS      (1<<1)
#define DEBUG_PROPERTIES        (1<<2)
#define DEBUG_REGEX             (1<<3)
#define DEBUG_PARSED_TAGS       (1<<4)
#define DEBUG_PARSED_TEXT       (1<<5)

/* These are only checked while searching */

/* These are checked everywhere (can't share bits) */


extern unsigned int DEBUG_MASK;
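
/*
 * Illustration only, not part of swish.h: a small sketch of how the bit
 * flags above can be combined in and tested against DEBUG_MASK. The three
 * flag values are copied from the dump.c group; the local DEBUG_MASK
 * variable and the debug_enabled() helper are assumptions made so the
 * example stands alone.
 */

```c
#include <assert.h>

#define DEBUG_INDEX_HEADER   (1<<0)
#define DEBUG_INDEX_WORDS    (1<<1)
#define DEBUG_INDEX_FILES    (1<<4)

/* Local stand-in for the global declared in swish.h. */
static unsigned int DEBUG_MASK = 0;

/* Hypothetical helper: non-zero when every bit in `flags` is set. */
static int debug_enabled(unsigned int flags)
{
    return (DEBUG_MASK & flags) == flags;
}

/* Sets, tests and clears flags; returns 1 when every check passed. */
static int debug_mask_check(void)
{
    DEBUG_MASK = 0;
    DEBUG_MASK |= DEBUG_INDEX_HEADER | DEBUG_INDEX_FILES;  /* switch two options on */
    assert(DEBUG_MASK == 17u);                             /* bit 0 plus bit 4 */
    assert(debug_enabled(DEBUG_INDEX_FILES));
    assert(!debug_enabled(DEBUG_INDEX_WORDS));

    DEBUG_MASK &= ~(unsigned int) DEBUG_INDEX_FILES;       /* switch one off again */
    assert(!debug_enabled(DEBUG_INDEX_FILES));
    assert(debug_enabled(DEBUG_INDEX_HEADER));
    return 1;
}
```

/*
 * Because the dump.c flags and the indexing flags reuse the same bit
 * positions, a given mask is only meaningful for one of the two groups
 * at a time.
 */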



#ifdef __cplusplus
}
#endif /* __cplusplus */


#endif /* !SWISH_H */
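
/*
 * Illustration only, not part of swish.h: a short sketch of the three
 * helper macros defined above (safestrcpy, CopyPositions, Min). The macro
 * bodies are repeated verbatim so the example compiles on its own; the
 * copy_truncated() and copy_some_positions() wrappers and the sample data
 * are hypothetical.
 */

```c
#include <assert.h>
#include <string.h>

/* Copies of the macros from swish.h. */
#define safestrcpy(n,to,from)  { strncpy(to,from,n); (to)[(n)-1]='\0'; }
#define CopyPositions(dest,posdest,orig,posorig,num) \
 memcpy((char *)((int *)(dest)+(posdest)),(char *)((int *)(orig)+(posorig)),(num)*sizeof(int))
#define Min(a,b) ((a) < (b) ? (a) : (b))

/* Hypothetical helper: copy src into an 8-byte buffer, return the stored length. */
static size_t copy_truncated(const char *src, char *out8)
{
    safestrcpy(8, out8, src);   /* at most 7 characters plus the forced '\0' */
    return strlen(out8);
}

/* Hypothetical helper: copy num ints from orig[posorig..] to dest[posdest..]. */
static void copy_some_positions(int *dest, int posdest, int *orig, int posorig, int num)
{
    CopyPositions(dest, posdest, orig, posorig, num);
}

/* Exercises all three macros; returns 1 when every check passed. */
static int macro_self_check(void)
{
    char buf[8];
    int src[5] = { 10, 20, 30, 40, 50 };
    int dst[5] = { 0, 0, 0, 0, 0 };
    size_t stored = copy_truncated("indexing-data-source", buf);

    assert(stored == 7);                      /* long input is cut, never overflows */
    assert(strcmp(buf, "indexin") == 0);

    copy_some_positions(dst, 1, src, 2, 3);   /* dst becomes 0,30,40,50,0 */
    assert(dst[0] == 0 && dst[1] == 30 && dst[2] == 40 && dst[3] == 50 && dst[4] == 0);

    assert(Min(3, 7) == 3);
    return 1;
}
```

/*
 * Note that safestrcpy expands to a brace block rather than a single
 * expression, and that Min evaluates one of its arguments twice, so
 * arguments with side effects are best avoided.
 */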

swish-e-2.4.7/src/getruntime.h
/*
**
$Id: getruntime.h 1736 2005-05-12 15:41:22Z karman $

    This file is part of Swish-e.

    Swish-e is free software; you can redistribute it and/or modify
    it under the terms of the GNU General Public License as published by
    the Free Software Foundation; either version 2 of the License, or
    (at your option) any later version.

    Swish-e is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
    GNU General Public License for more details.

    You should have received a copy of the GNU General Public License
    along  with Swish-e; if not, write to the Free Software
    Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
    
    See the COPYING file that accompanies the Swish-e distribution for details
    of the GNU GPL and the special exception available for linking against
    the Swish-e library.
    
** Mon May  9 18:19:34 CDT 2005
** added GPL


**-------------------------------------------------------
**
**
*/


#ifndef GETRUNTIME_H
#define GETRUNTIME_H 1

#ifdef __cplusplus
extern "C" {
#endif

typedef double cpu_seconds;
cpu_seconds get_cpu_secs (void);

#ifdef __cplusplus
}
#endif /* __cplusplus */

#endif /* GETRUNTIME_H */

swish-e-2.4.7/src/entities.h
/*
$Id: entities.h 1736 2005-05-12 15:41:22Z karman $
**


    This file is part of Swish-e.

    Swish-e is free software; you can redistribute it and/or modify
    it under the terms of the GNU General Public License as published by
    the Free Software Foundation; either version 2 of the License, or
    (at your option) any later version.

    Swish-e is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
    GNU General Public License for more details.

    You should have received a copy of the GNU General Public License
    along  with Swish-e; if not, write to the Free Software
    Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
    
    See the COPYING file that accompanies the Swish-e distribution for details
    of the GNU GPL and the special exception available for linking against
    the Swish-e library.
    
** Mon May  9 18:19:34 CDT 2005
** added GPL

**
** (c) Rainer.Scherg
**
**
** 2001-05-05 rasc    initial coding
**
*/


#ifndef __HasSeenModule_Entities
#define __HasSeenModule_Entities	1



/* Global module data */

struct MOD_Entities {
   /* public:  */
   /* none */
   
   /* private: don't use outside this module! */
   int   convertEntities;
};





void initModule_Entities (SWISH *sw);
void freeModule_Entities (SWISH *sw);
int  configModule_Entities (SWISH *sw, StringList *sl);

unsigned char *sw_ConvHTMLEntities2ISO(SWISH *sw, unsigned char *s);
unsigned char *strConvHTMLEntities2ISO (unsigned char *buf);
int charEntityDecode (unsigned char *buf, unsigned char **end);


#endif


swish-e-2.4.7/src/replace/Makefile.am
noinst_LTLIBRARIES = libreplace.la
libreplace_la_SOURCES = dummy.c 
libreplace_la_LIBADD = @LTLIBOBJS@
EXTRA_DIST = mkstemp.h 
swish-e-2.4.7/src/replace/mkstemp.c
/* Copyright (C) 1991, 1992, 1996, 1998, 2001 Free Software Foundation, Inc.
   This file is derived from mkstemps.c from the GNU Libiberty Library
   which in turn is derived from the GNU C Library.

   The GNU C Library is free software; you can redistribute it and/or
   modify it under the terms of the GNU Library General Public License as
   published by the Free Software Foundation; either version 2 of the
   License, or (at your option) any later version.

   The GNU C Library is distributed in the hope that it will be useful,
   but WITHOUT ANY WARRANTY; without even the implied warranty of
   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
   Library General Public License for more details.

   You should have received a copy of the GNU Library General Public
   License along with the GNU C Library; see the file COPYING.LIB.  If not,
   write to the Free Software Foundation, Inc., 59 Temple Place - Suite 330,
   Boston, MA 02111-1307, USA. 

*/
#ifdef HAVE_CONFIG_H
#include "acconfig.h"
#endif
#ifdef HAVE_STDLIB_H
#include <stdlib.h>
#endif
#ifdef HAVE_STRING_H
#include <string.h>
#endif
#include <errno.h>
#include <stdio.h>
#include <fcntl.h>
#ifdef HAVE_UNISTD_H
#include <unistd.h>
#endif
#ifdef HAVE_SYS_TIME_H
#include <sys/time.h>
#endif
#ifdef HAVE_PROCESS_H
#include <process.h>
#endif

/* We need to provide a type for gcc_uint64_t.  */
#ifdef __GNUC__
typedef unsigned long long gcc_uint64_t;
#else
typedef unsigned long gcc_uint64_t;
#endif

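#ifndef TMP_MAX
#define TMP_MAX 16384
#endif

/* O_BINARY only exists on some platforms; make it a no-op elsewhere.  */
#ifndef O_BINARY
#define O_BINARY 0
#endif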

/* Generate a unique temporary file name from TEMPLATE.

   TEMPLATE has the form:

   <path>/ccXXXXXX

   The last six characters of TEMPLATE must be "XXXXXX"; they are
   replaced with a string that makes the filename unique.

   Returns a file descriptor open on the file for reading and writing.  */
int
mkstemp (char *template)
{
  static const char letters[]
    = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";
  static gcc_uint64_t value;
#ifdef HAVE_GETTIMEOFDAY
  struct timeval tv;
#endif
  char *XXXXXX;
  size_t len;
  int count;

  len = strlen (template);

  if ((int) len < 6
      || strncmp (&template[len - 6], "XXXXXX", 6))
    {
      return -1;
    }

  XXXXXX = &template[len - 6];

#ifdef HAVE_GETTIMEOFDAY
  /* Get some more or less random data.  */
  gettimeofday (&tv, NULL);
  value += ((gcc_uint64_t) tv.tv_usec << 16) ^ tv.tv_sec ^ getpid ();
#else
  value += getpid ();
#endif

  for (count = 0; count < TMP_MAX; ++count)
    {
      gcc_uint64_t v = value;
      int fd;

      /* Fill in the random bits.  */
      XXXXXX[0] = letters[v % 62];
      v /= 62;
      XXXXXX[1] = letters[v % 62];
      v /= 62;
      XXXXXX[2] = letters[v % 62];
      v /= 62;
      XXXXXX[3] = letters[v % 62];
      v /= 62;
      XXXXXX[4] = letters[v % 62];
      v /= 62;
      XXXXXX[5] = letters[v % 62];

      fd = open (template, O_RDWR|O_CREAT|O_EXCL|O_BINARY, 0600);
      if (fd >= 0)
	/* The name was unused, so open() created the file.  */
	return fd;

      /* The increment is arbitrary.  It is only necessary that the next
	 TMP_MAX values generated by adding 7777 to VALUE are distinct
	 modulo 2^32.  */
      value += 7777;
    }

  /* We return the null string if we can't find a unique file name.  */
  template[0] = '\0';
  return -1;
}
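
/*
 * Illustration only, not part of mkstemp.c: a sketch of the naming step
 * used above, isolated so it can be checked deterministically. It rewrites
 * the trailing "XXXXXX" with the base-62 digits of a value, least
 * significant digit first, exactly as the unrolled assignments in
 * mkstemp() do. The fill_suffix() helper and the sample values are
 * hypothetical; the real function also calls open() and retries.
 */

```c
#include <assert.h>
#include <string.h>

/* Same 62-character alphabet as the replacement mkstemp(). */
static const char letters[] =
    "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";

/* Hypothetical helper: overwrite the trailing "XXXXXX" of tmpl with the
   base-62 digits of value.  Returns 0 on success, -1 if the template does
   not end in "XXXXXX". */
static int fill_suffix(char *tmpl, unsigned long value)
{
    size_t len = strlen(tmpl);
    char *suffix;
    int i;

    if (len < 6 || strcmp(tmpl + len - 6, "XXXXXX") != 0)
        return -1;

    suffix = tmpl + len - 6;
    for (i = 0; i < 6; i++) {
        suffix[i] = letters[value % 62];
        value /= 62;
    }
    return 0;
}

/* Checks a few known values; returns 1 when every check passed. */
static int fill_suffix_check(void)
{
    char a[] = "/tmp/ccXXXXXX";
    char b[] = "/tmp/ccXXXXXX";
    char bad[] = "/tmp/cc";
    int ra = fill_suffix(a, 0UL);
    int rb = fill_suffix(b, 63UL);      /* 63 = 1 + 1*62 */
    int rbad = fill_suffix(bad, 0UL);

    assert(ra == 0 && strcmp(a, "/tmp/ccaaaaaa") == 0);
    assert(rb == 0 && strcmp(b, "/tmp/ccbbaaaa") == 0);
    assert(rbad == -1);
    return 1;
}
```

/*
 * Six base-62 digits give 62^6 possible names, which is why the retry loop
 * only needs successive values to differ.
 */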
swish-e-2.4.7/src/replace/vsnprintf.c
/*
 * Revision 12: http://theos.com/~deraadt/snprintf.c
 *
 * Copyright (c) 1997 Theo de Raadt
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions
 * are met:
 * 1. Redistributions of source code must retain the above copyright
 *    notice, this list of conditions and the following disclaimer.
 * 2. Redistributions in binary form must reproduce the above copyright
 *    notice, this list of conditions and the following disclaimer in the
 *    documentation and/or other materials provided with the distribution.
 *
 * THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR
 * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
 * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.
 * IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT,
 * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT
 * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
 * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
 * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
 * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF
 * THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
 */

#ifndef __VMS
#include <sys/param.h>
#endif
#include <string.h>	/* memset, strncpy */
#include <unistd.h>	/* getpagesize */

#include <sys/types.h>
#include <sys/mman.h>
#include <signal.h>
#include <stdio.h>
#if __STDC__
#include <stdarg.h>
#include <stdlib.h>
#else
#include <varargs.h>
#endif
#include <setjmp.h>

#ifndef roundup
#define roundup(x, y) ((((x)+((y)-1))/(y))*(y))
#endif

#ifdef __sgi
#define size_t ssize_t
#endif

static int pgsize;
static char *curobj;
static int caught;
static sigjmp_buf bail;

#define EXTRABYTES	2	/* XXX: why 2? you don't want to know */

static char *
msetup(str, n)
	char *str;
	size_t n;
{
	char *e;

	if (n == 0)
		return NULL;
	if (pgsize == 0)
		pgsize = getpagesize();
	curobj = (char *)malloc(n + EXTRABYTES + pgsize * 2);
	if (curobj == NULL)
		return NULL;
	e = curobj + n + EXTRABYTES;
	e = (char *)roundup((unsigned long)e, pgsize);
	if (mprotect(e, pgsize, PROT_NONE) == -1) {
		free(curobj);
		curobj = NULL;
		return NULL;
	}
	e = e - n - EXTRABYTES;
	*e = '\0';
	return (e);
}

static void
mcatch()
{
	siglongjmp(bail, 1);
}

static void
mcleanup(str, n, p)
	char *str;
	size_t n;
	char *p;
{
	strncpy(str, p, n-1);
	str[n-1] = '\0';
	if (mprotect((caddr_t)(p + n + EXTRABYTES), pgsize,
	    PROT_READ|PROT_WRITE|PROT_EXEC) == -1)
		mprotect((caddr_t)(p + n + EXTRABYTES), pgsize,
		    PROT_READ|PROT_WRITE);
	free(curobj);
}

int
#if __STDC__
vsnprintf(char *str, size_t n, char const *fmt, va_list ap)
#else
vsnprintf(str, n, fmt, ap)
	char *str;
	size_t n;
	char *fmt;
	char *ap;
#endif
{
	struct sigaction osa, nsa;
	char *p;
	int ret = n + 1;	/* if we bail, indicate we overflowed */

	memset(&nsa, 0, sizeof nsa);
#ifdef __VMS
	nsa.sa_handler = (void (*)(int))mcatch;
#else
	nsa.sa_handler = mcatch;
#endif
	sigemptyset(&nsa.sa_mask);

	p = msetup(str, n);
	if (p == NULL) {
		*str = '\0';
		return 0;
	}
	if (sigsetjmp(bail, 1) == 0) {
		if (sigaction(SIGSEGV, &nsa, &osa) == -1) {
			mcleanup(str, n, p);
			return (0);
		}
		ret = vsprintf(p, fmt, ap);
	}
	mcleanup(str, n, p);
	(void) sigaction(SIGSEGV, &osa, NULL);
	return (ret);
}

int
#if __STDC__
snprintf(char *str, size_t n, char const *fmt, ...)
#else
snprintf(str, n, fmt, va_alist)
	char *str;
	size_t n;
	char *fmt;
	va_dcl
#endif
{
	va_list ap;
	int ret;
#if __STDC__
	va_start(ap, fmt);
#else
	va_start(ap);
#endif

	ret = vsnprintf(str, n, fmt, ap);
	va_end(ap);	/* must run before returning */
	return (ret);
}
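
/*
 * Illustration only, not part of vsnprintf.c: a sketch of how a caller can
 * detect truncation from the return value. It calls the C library's own
 * snprintf() (C99 semantics assumed). The same "ret >= size" test should
 * also work with the replacement above, which returns n + 1 when the guard
 * page is hit. The format_label() helper and the sample strings are
 * hypothetical.
 */

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: format "<name>-<num>" into buf.
   Returns 1 if the output had to be cut short (or an error occurred),
   0 if it fitted completely. */
static int format_label(char *buf, size_t size, const char *name, int num)
{
    int ret = snprintf(buf, size, "%s-%d", name, num);

    return ret < 0 || (size_t) ret >= size;
}

/* Checks one truncated and one complete result; returns 1 on success. */
static int format_label_check(void)
{
    char small[8];
    char big[16];
    int cut_small = format_label(small, sizeof small, "swish", 247);
    int cut_big = format_label(big, sizeof big, "swish", 247);

    assert(cut_small == 1);                  /* "swish-247" needs 10 bytes */
    assert(strcmp(small, "swish-2") == 0);   /* 7 characters plus '\0' */
    assert(cut_big == 0);
    assert(strcmp(big, "swish-247") == 0);
    return 1;
}
```

/*
 * In both cases the destination stays NUL-terminated; only the return
 * value tells the caller that data was lost.
 */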



swish-e-2.4.7/src/replace/Makefile.in
# Makefile.in generated by automake 1.9.6 from Makefile.am.
# @configure_input@

# Copyright (C) 1994, 1995, 1996, 1997, 1998, 1999, 2000, 2001, 2002,
# 2003, 2004, 2005  Free Software Foundation, Inc.
# This Makefile.in is free software; the Free Software Foundation
# gives unlimited permission to copy and/or distribute it,
# with or without modifications, as long as this notice is preserved.

# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY, to the extent permitted by law; without
# even the implied warranty of MERCHANTABILITY or FITNESS FOR A
# PARTICULAR PURPOSE.

@SET_MAKE@

srcdir = @srcdir@
top_srcdir = @top_srcdir@
VPATH = @srcdir@
pkgdatadir = $(datadir)/@PACKAGE@
pkglibdir = $(libdir)/@PACKAGE@
pkgincludedir = $(includedir)/@PACKAGE@
top_builddir = ../..
am__cd = CDPATH="$${ZSH_VERSION+.}$(PATH_SEPARATOR)" && cd
INSTALL = @INSTALL@
install_sh_DATA = $(install_sh) -c -m 644
install_sh_PROGRAM = $(install_sh) -c
install_sh_SCRIPT = $(install_sh) -c
INSTALL_HEADER = $(INSTALL_DATA)
transform = $(program_transform_name)
NORMAL_INSTALL = :
PRE_INSTALL = :
POST_INSTALL = :
NORMAL_UNINSTALL = :
PRE_UNINSTALL = :
POST_UNINSTALL = :
build_triplet = @build@
host_triplet = @host@
subdir = src/replace
DIST_COMMON = $(srcdir)/Makefile.am $(srcdir)/Makefile.in mkstemp.c \
	vsnprintf.c
ACLOCAL_M4 = $(top_srcdir)/aclocal.m4
am__aclocal_m4_deps = $(top_srcdir)/config/acinclude.m4 \
	$(top_srcdir)/configure.in
am__configure_deps = $(am__aclocal_m4_deps) $(CONFIGURE_DEPENDENCIES) \
	$(ACLOCAL_M4)
mkinstalldirs = $(SHELL) $(top_srcdir)/config/mkinstalldirs
CONFIG_HEADER = $(top_builddir)/src/acconfig.h
CONFIG_CLEAN_FILES =
LTLIBRARIES = $(noinst_LTLIBRARIES)
libreplace_la_DEPENDENCIES = @LTLIBOBJS@
am_libreplace_la_OBJECTS = dummy.lo
libreplace_la_OBJECTS = $(am_libreplace_la_OBJECTS)
DEFAULT_INCLUDES = -I. -I$(srcdir) -I$(top_builddir)/src
depcomp = $(SHELL) $(top_srcdir)/config/depcomp
am__depfiles_maybe = depfiles
COMPILE = $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) \
	$(CPPFLAGS) $(AM_CFLAGS) $(CFLAGS)
LTCOMPILE = $(LIBTOOL) --tag=CC --mode=compile $(CC) $(DEFS) \
	$(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) \
	$(AM_CFLAGS) $(CFLAGS)
CCLD = $(CC)
LINK = $(LIBTOOL) --tag=CC --mode=link $(CCLD) $(AM_CFLAGS) $(CFLAGS) \
	$(AM_LDFLAGS) $(LDFLAGS) -o $@
SOURCES = $(libreplace_la_SOURCES)
DIST_SOURCES = $(libreplace_la_SOURCES)
ETAGS = etags
CTAGS = ctags
DISTFILES = $(DIST_COMMON) $(DIST_SOURCES) $(TEXINFOS) $(EXTRA_DIST)
ACLOCAL = @ACLOCAL@
ALLOCA = @ALLOCA@
AMDEP_FALSE = @AMDEP_FALSE@
AMDEP_TRUE = @AMDEP_TRUE@
AMTAR = @AMTAR@
AR = @AR@
AS = @AS@
AUTOCONF = @AUTOCONF@
AUTOHEADER = @AUTOHEADER@
AUTOMAKE = @AUTOMAKE@
AWK = @AWK@
BTREE_OBJS = @BTREE_OBJS@
BUILDDOCS_FALSE = @BUILDDOCS_FALSE@
BUILDDOCS_TRUE = @BUILDDOCS_TRUE@
CC = @CC@
CCDEPMODE = @CCDEPMODE@
CFLAGS = @CFLAGS@
CPP = @CPP@
CPPFLAGS = @CPPFLAGS@
CXX = @CXX@
CXXCPP = @CXXCPP@
CXXDEPMODE = @CXXDEPMODE@
CXXFLAGS = @CXXFLAGS@
CYGPATH_W = @CYGPATH_W@
DEFS = @DEFS@
DEPDIR = @DEPDIR@
DLLTOOL = @DLLTOOL@
ECHO = @ECHO@
ECHO_C = @ECHO_C@
ECHO_N = @ECHO_N@
ECHO_T = @ECHO_T@
EGREP = @EGREP@
EXEEXT = @EXEEXT@
F77 = @F77@
FFLAGS = @FFLAGS@
INSTALLDOCS_FALSE = @INSTALLDOCS_FALSE@
INSTALLDOCS_TRUE = @INSTALLDOCS_TRUE@
INSTALL_DATA = @INSTALL_DATA@
INSTALL_PROGRAM = @INSTALL_PROGRAM@
INSTALL_SCRIPT = @INSTALL_SCRIPT@
INSTALL_STRIP_PROGRAM = @INSTALL_STRIP_PROGRAM@
LARGEFILES_MACROS = @LARGEFILES_MACROS@
LDFLAGS = @LDFLAGS@
LIBOBJS = @LIBOBJS@
LIBS = @LIBS@
LIBTOOL = @LIBTOOL@
LIBXML2_CFLAGS = @LIBXML2_CFLAGS@
LIBXML2_LIB = @LIBXML2_LIB@
LIBXML2_OBJS = @LIBXML2_OBJS@
LIBXML_REQUIRED_VERSION = @LIBXML_REQUIRED_VERSION@
LN_S = @LN_S@
LTLIBOBJS = @LTLIBOBJS@
MAINT = @MAINT@
MAINTAINER_MODE_FALSE = @MAINTAINER_MODE_FALSE@
MAINTAINER_MODE_TRUE = @MAINTAINER_MODE_TRUE@
MAKEINFO = @MAKEINFO@
OBJDUMP = @OBJDUMP@
OBJEXT = @OBJEXT@
PACKAGE = @PACKAGE@
PACKAGE_BUGREPORT = @PACKAGE_BUGREPORT@
PACKAGE_NAME = @PACKAGE_NAME@
PACKAGE_STRING = @PACKAGE_STRING@
PACKAGE_TARNAME = @PACKAGE_TARNAME@
PACKAGE_VERSION = @PACKAGE_VERSION@
PATH_SEPARATOR = @PATH_SEPARATOR@
PCRE_CFLAGS = @PCRE_CFLAGS@
PCRE_CONFIG = @PCRE_CONFIG@
PCRE_LIBS = @PCRE_LIBS@
PCRE_REQUIRED_VERSION = @PCRE_REQUIRED_VERSION@
PERL = @PERL@
POD2MAN = @POD2MAN@
RANLIB = @RANLIB@
SET_MAKE = @SET_MAKE@
SHELL = @SHELL@
STRIP = @STRIP@
SWISH_WEB = @SWISH_WEB@
VERSION = @VERSION@
XML2_CONFIG = @XML2_CONFIG@
Z_CFLAGS = @Z_CFLAGS@
Z_LIBS = @Z_LIBS@
ac_ct_AR = @ac_ct_AR@
ac_ct_AS = @ac_ct_AS@
ac_ct_CC = @ac_ct_CC@
ac_ct_CXX = @ac_ct_CXX@
ac_ct_DLLTOOL = @ac_ct_DLLTOOL@
ac_ct_F77 = @ac_ct_F77@
ac_ct_OBJDUMP = @ac_ct_OBJDUMP@
ac_ct_RANLIB = @ac_ct_RANLIB@
ac_ct_STRIP = @ac_ct_STRIP@
am__fastdepCC_FALSE = @am__fastdepCC_FALSE@
am__fastdepCC_TRUE = @am__fastdepCC_TRUE@
am__fastdepCXX_FALSE = @am__fastdepCXX_FALSE@
am__fastdepCXX_TRUE = @am__fastdepCXX_TRUE@
am__include = @am__include@
am__leading_dot = @am__leading_dot@
am__quote = @am__quote@
am__tar = @am__tar@
am__untar = @am__untar@
bindir = @bindir@
build = @build@
build_alias = @build_alias@
build_cpu = @build_cpu@
build_os = @build_os@
build_vendor = @build_vendor@
datadir = @datadir@
exec_prefix = @exec_prefix@
host = @host@
host_alias = @host_alias@
host_cpu = @host_cpu@
host_os = @host_os@
host_vendor = @host_vendor@
includedir = @includedir@
infodir = @infodir@
install_sh = @install_sh@
libdir = @libdir@
libexecdir = @libexecdir@
localstatedir = @localstatedir@
mandir = @mandir@
mkdir_p = @mkdir_p@
oldincludedir = @oldincludedir@
prefix = @prefix@
program_transform_name = @program_transform_name@
sbindir = @sbindir@
sharedstatedir = @sharedstatedir@
sysconfdir = @sysconfdir@
target_alias = @target_alias@
noinst_LTLIBRARIES = libreplace.la
libreplace_la_SOURCES = dummy.c 
libreplace_la_LIBADD = @LTLIBOBJS@
EXTRA_DIST = mkstemp.h 
all: all-am

.SUFFIXES:
.SUFFIXES: .c .lo .o .obj
$(srcdir)/Makefile.in: @MAINTAINER_MODE_TRUE@ $(srcdir)/Makefile.am  $(am__configure_deps)
	@for dep in $?; do \
	  case '$(am__configure_deps)' in \
	    *$$dep*) \
	      cd $(top_builddir) && $(MAKE) $(AM_MAKEFLAGS) am--refresh \
		&& exit 0; \
	      exit 1;; \
	  esac; \
	done; \
	echo ' cd $(top_srcdir) && $(AUTOMAKE) --foreign  src/replace/Makefile'; \
	cd $(top_srcdir) && \
	  $(AUTOMAKE) --foreign  src/replace/Makefile
.PRECIOUS: Makefile
Makefile: $(srcdir)/Makefile.in $(top_builddir)/config.status
	@case '$?' in \
	  *config.status*) \
	    cd $(top_builddir) && $(MAKE) $(AM_MAKEFLAGS) am--refresh;; \
	  *) \
	    echo ' cd $(top_builddir) && $(SHELL) ./config.status $(subdir)/$@ $(am__depfiles_maybe)'; \
	    cd $(top_builddir) && $(SHELL) ./config.status $(subdir)/$@ $(am__depfiles_maybe);; \
	esac;

$(top_builddir)/config.status: $(top_srcdir)/configure $(CONFIG_STATUS_DEPENDENCIES)
	cd $(top_builddir) && $(MAKE) $(AM_MAKEFLAGS) am--refresh

$(top_srcdir)/configure: @MAINTAINER_MODE_TRUE@ $(am__configure_deps)
	cd $(top_builddir) && $(MAKE) $(AM_MAKEFLAGS) am--refresh
$(ACLOCAL_M4): @MAINTAINER_MODE_TRUE@ $(am__aclocal_m4_deps)
	cd $(top_builddir) && $(MAKE) $(AM_MAKEFLAGS) am--refresh

clean-noinstLTLIBRARIES:
	-test -z "$(noinst_LTLIBRARIES)" || rm -f $(noinst_LTLIBRARIES)
	@list='$(noinst_LTLIBRARIES)'; for p in $$list; do \
	  dir="`echo $$p | sed -e 's|/[^/]*$$||'`"; \
	  test "$$dir" != "$$p" || dir=.; \
	  echo "rm -f \"$${dir}/so_locations\""; \
	  rm -f "$${dir}/so_locations"; \
	done
libreplace.la: $(libreplace_la_OBJECTS) $(libreplace_la_DEPENDENCIES) 
	$(LINK)  $(libreplace_la_LDFLAGS) $(libreplace_la_OBJECTS) $(libreplace_la_LIBADD) $(LIBS)

mostlyclean-compile:
	-rm -f *.$(OBJEXT)

distclean-compile:
	-rm -f *.tab.c

@AMDEP_TRUE@@am__include@ @am__quote@$(DEPDIR)/mkstemp.Plo@am__quote@
@AMDEP_TRUE@@am__include@ @am__quote@$(DEPDIR)/vsnprintf.Plo@am__quote@
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/dummy.Plo@am__quote@

.c.o:
@am__fastdepCC_TRUE@	if $(COMPILE) -MT $@ -MD -MP -MF "$(DEPDIR)/$*.Tpo" -c -o $@ $<; \
@am__fastdepCC_TRUE@	then mv -f "$(DEPDIR)/$*.Tpo" "$(DEPDIR)/$*.Po"; else rm -f "$(DEPDIR)/$*.Tpo"; exit 1; fi
@AMDEP_TRUE@@am__fastdepCC_FALSE@	source='$<' object='$@' libtool=no @AMDEPBACKSLASH@
@AMDEP_TRUE@@am__fastdepCC_FALSE@	DEPDIR=$(DEPDIR) $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@
@am__fastdepCC_FALSE@	$(COMPILE) -c $<

.c.obj:
@am__fastdepCC_TRUE@	if $(COMPILE) -MT $@ -MD -MP -MF "$(DEPDIR)/$*.Tpo" -c -o $@ `$(CYGPATH_W) '$<'`; \
@am__fastdepCC_TRUE@	then mv -f "$(DEPDIR)/$*.Tpo" "$(DEPDIR)/$*.Po"; else rm -f "$(DEPDIR)/$*.Tpo"; exit 1; fi
@AMDEP_TRUE@@am__fastdepCC_FALSE@	source='$<' object='$@' libtool=no @AMDEPBACKSLASH@
@AMDEP_TRUE@@am__fastdepCC_FALSE@	DEPDIR=$(DEPDIR) $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@
@am__fastdepCC_FALSE@	$(COMPILE) -c `$(CYGPATH_W) '$<'`

.c.lo:
@am__fastdepCC_TRUE@	if $(LTCOMPILE) -MT $@ -MD -MP -MF "$(DEPDIR)/$*.Tpo" -c -o $@ $<; \
@am__fastdepCC_TRUE@	then mv -f "$(DEPDIR)/$*.Tpo" "$(DEPDIR)/$*.Plo"; else rm -f "$(DEPDIR)/$*.Tpo"; exit 1; fi
@AMDEP_TRUE@@am__fastdepCC_FALSE@	source='$<' object='$@' libtool=yes @AMDEPBACKSLASH@
@AMDEP_TRUE@@am__fastdepCC_FALSE@	DEPDIR=$(DEPDIR) $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@
@am__fastdepCC_FALSE@	$(LTCOMPILE) -c -o $@ $<

mostlyclean-libtool:
	-rm -f *.lo

clean-libtool:
	-rm -rf .libs _libs

distclean-libtool:
	-rm -f libtool
uninstall-info-am:

ID: $(HEADERS) $(SOURCES) $(LISP) $(TAGS_FILES)
	list='$(SOURCES) $(HEADERS) $(LISP) $(TAGS_FILES)'; \
	unique=`for i in $$list; do \
	    if test -f "$$i"; then echo $$i; else echo $(srcdir)/$$i; fi; \
	  done | \
	  $(AWK) '    { files[$$0] = 1; } \
	       END { for (i in files) print i; }'`; \
	mkid -fID $$unique
tags: TAGS

TAGS:  $(HEADERS) $(SOURCES)  $(TAGS_DEPENDENCIES) \
		$(TAGS_FILES) $(LISP)
	tags=; \
	here=`pwd`; \
	list='$(SOURCES) $(HEADERS)  $(LISP) $(TAGS_FILES)'; \
	unique=`for i in $$list; do \
	    if test -f "$$i"; then echo $$i; else echo $(srcdir)/$$i; fi; \
	  done | \
	  $(AWK) '    { files[$$0] = 1; } \
	       END { for (i in files) print i; }'`; \
	if test -z "$(ETAGS_ARGS)$$tags$$unique"; then :; else \
	  test -n "$$unique" || unique=$$empty_fix; \
	  $(ETAGS) $(ETAGSFLAGS) $(AM_ETAGSFLAGS) $(ETAGS_ARGS) \
	    $$tags $$unique; \
	fi
ctags: CTAGS
CTAGS:  $(HEADERS) $(SOURCES)  $(TAGS_DEPENDENCIES) \
		$(TAGS_FILES) $(LISP)
	tags=; \
	here=`pwd`; \
	list='$(SOURCES) $(HEADERS)  $(LISP) $(TAGS_FILES)'; \
	unique=`for i in $$list; do \
	    if test -f "$$i"; then echo $$i; else echo $(srcdir)/$$i; fi; \
	  done | \
	  $(AWK) '    { files[$$0] = 1; } \
	       END { for (i in files) print i; }'`; \
	test -z "$(CTAGS_ARGS)$$tags$$unique" \
	  || $(CTAGS) $(CTAGSFLAGS) $(AM_CTAGSFLAGS) $(CTAGS_ARGS) \
	     $$tags $$unique

GTAGS:
	here=`$(am__cd) $(top_builddir) && pwd` \
	  && cd $(top_srcdir) \
	  && gtags -i $(GTAGS_ARGS) $$here

distclean-tags:
	-rm -f TAGS ID GTAGS GRTAGS GSYMS GPATH tags

distdir: $(DISTFILES)
	@srcdirstrip=`echo "$(srcdir)" | sed 's|.|.|g'`; \
	topsrcdirstrip=`echo "$(top_srcdir)" | sed 's|.|.|g'`; \
	list='$(DISTFILES)'; for file in $$list; do \
	  case $$file in \
	    $(srcdir)/*) file=`echo "$$file" | sed "s|^$$srcdirstrip/||"`;; \
	    $(top_srcdir)/*) file=`echo "$$file" | sed "s|^$$topsrcdirstrip/|$(top_builddir)/|"`;; \
	  esac; \
	  if test -f $$file || test -d $$file; then d=.; else d=$(srcdir); fi; \
	  dir=`echo "$$file" | sed -e 's,/[^/]*$$,,'`; \
	  if test "$$dir" != "$$file" && test "$$dir" != "."; then \
	    dir="/$$dir"; \
	    $(mkdir_p) "$(distdir)$$dir"; \
	  else \
	    dir=''; \
	  fi; \
	  if test -d $$d/$$file; then \
	    if test -d $(srcdir)/$$file && test $$d != $(srcdir); then \
	      cp -pR $(srcdir)/$$file $(distdir)$$dir || exit 1; \
	    fi; \
	    cp -pR $$d/$$file $(distdir)$$dir || exit 1; \
	  else \
	    test -f $(distdir)/$$file \
	    || cp -p $$d/$$file $(distdir)/$$file \
	    || exit 1; \
	  fi; \
	done
check-am: all-am
check: check-am
all-am: Makefile $(LTLIBRARIES)
installdirs:
install: install-am
install-exec: install-exec-am
install-data: install-data-am
uninstall: uninstall-am

install-am: all-am
	@$(MAKE) $(AM_MAKEFLAGS) install-exec-am install-data-am

installcheck: installcheck-am
install-strip:
	$(MAKE) $(AM_MAKEFLAGS) INSTALL_PROGRAM="$(INSTALL_STRIP_PROGRAM)" \
	  install_sh_PROGRAM="$(INSTALL_STRIP_PROGRAM)" INSTALL_STRIP_FLAG=-s \
	  `test -z '$(STRIP)' || \
	    echo "INSTALL_PROGRAM_ENV=STRIPPROG='$(STRIP)'"` install
mostlyclean-generic:

clean-generic:

distclean-generic:
	-test -z "$(CONFIG_CLEAN_FILES)" || rm -f $(CONFIG_CLEAN_FILES)

maintainer-clean-generic:
	@echo "This command is intended for maintainers to use"
	@echo "it deletes files that may require special tools to rebuild."
clean: clean-am

clean-am: clean-generic clean-libtool clean-noinstLTLIBRARIES \
	mostlyclean-am

distclean: distclean-am
	-rm -rf $(DEPDIR) ./$(DEPDIR)
	-rm -f Makefile
distclean-am: clean-am distclean-compile distclean-generic \
	distclean-libtool distclean-tags

dvi: dvi-am

dvi-am:

html: html-am

info: info-am

info-am:

install-data-am:

install-exec-am:

install-info: install-info-am

install-man:

installcheck-am:

maintainer-clean: maintainer-clean-am
	-rm -rf $(DEPDIR) ./$(DEPDIR)
	-rm -f Makefile
maintainer-clean-am: distclean-am maintainer-clean-generic

mostlyclean: mostlyclean-am

mostlyclean-am: mostlyclean-compile mostlyclean-generic \
	mostlyclean-libtool

pdf: pdf-am

pdf-am:

ps: ps-am

ps-am:

uninstall-am: uninstall-info-am

.PHONY: CTAGS GTAGS all all-am check check-am clean clean-generic \
	clean-libtool clean-noinstLTLIBRARIES ctags distclean \
	distclean-compile distclean-generic distclean-libtool \
	distclean-tags distdir dvi dvi-am html html-am info info-am \
	install install-am install-data install-data-am install-exec \
	install-exec-am install-info install-info-am install-man \
	install-strip installcheck installcheck-am installdirs \
	maintainer-clean maintainer-clean-generic mostlyclean \
	mostlyclean-compile mostlyclean-generic mostlyclean-libtool \
	pdf pdf-am ps ps-am tags uninstall uninstall-am \
	uninstall-info-am

# Tell versions [3.59,3.63) of GNU make to not export all variables.
# Otherwise a system limit (for SysV at least) may be exceeded.
.NOEXPORT:
���������������������������������������������������������������������������������������������������������������������������������������swish-e-2.4.7/src/replace/dummy.c�������������������������������������������������������������������0000664�0000771�0001750�00000000060�11166010105�013514� �����������������������������������������������������������������������������������������������������������������������������������������������������������������������������0000000�0000000������������������������������������������������������������������������������������������������������������������������������������������������������������������������void
dummy_solaris_fix( void )
{
    return;
}

��������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������swish-e-2.4.7/src/replace/mkstemp.h�����������������������������������������������������������������0000664�0000771�0001750�00000002241�11166010105�014051� �����������������������������������������������������������������������������������������������������������������������������������������������������������������������������0000000�0000000������������������������������������������������������������������������������������������������������������������������������������������������������������������������/* Copyright (C) 1991, 1992, 1996, 1998 Free Software Foundation, Inc.
   This file is part of the GNU C Library.

   The GNU C Library is free software; you can redistribute it and/or
   modify it under the terms of the GNU Library General Public License as
   published by the Free Software Foundation; either version 2 of the
   License, or (at your option) any later version.

   The GNU C Library is distributed in the hope that it will be useful,
   but WITHOUT ANY WARRANTY; without even the implied warranty of
   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
   Library General Public License for more details.

   You should have received a copy of the GNU Library General Public
   License along with the GNU C Library; see the file COPYING.LIB.  If not,
   write to the Free Software Foundation, Inc., 59 Temple Place - Suite 330,
   Boston, MA 02111-1307, USA.  */


/* Generate a unique temporary file name from TEMPLATE.
   The last six characters of TEMPLATE must be "XXXXXX";
   they are replaced with a string that makes the filename unique.
   Returns a file descriptor open on the file for reading and writing.  */
int mkstemp (char *Template);
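
/* Usage sketch (not part of the original header).  It assumes the
   platform's own mkstemp() from <stdlib.h>, which follows the contract
   described above.  The function name write_scratch_file and the /tmp
   path are hypothetical, chosen only for illustration.  The template
   has to live in a writable array, because mkstemp() overwrites the
   trailing XXXXXX in place.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

// Writes text to a fresh temporary file and removes the file again.
// Returns the number of bytes written, or -1 on failure.
long write_scratch_file(const char *text)
{
    char    path[] = "/tmp/swish-sketch-XXXXXX";  // must end in XXXXXX
    size_t  len;
    ssize_t written;
    int     fd;

    assert(text != NULL);
    len = strlen(text);

    fd = mkstemp(path);     // fills in the XXXXXX and opens the file
    if (fd < 0)
        return -1;

    written = write(fd, text, len);
    close(fd);
    unlink(path);           // path now holds the generated name

    return (written == (ssize_t)len) ? (long)written : -1;
}
```

   With a writable /tmp, write_scratch_file("hello") should return 5.  */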
���������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������swish-e-2.4.7/src/search.h��������������������������������������������������������������������������0000664�0000771�0001750�00000017145�11166010110�012230� �����������������������������������������������������������������������������������������������������������������������������������������������������������������������������0000000�0000000������������������������������������������������������������������������������������������������������������������������������������������������������������������������/*
**
** Originally by Kevin Hughes, kev@kevcom.com, 3/11/94
**


    This file is part of Swish-e.

    Swish-e is free software; you can redistribute it and/or modify
    it under the terms of the GNU General Public License as published by
    the Free Software Foundation; either version 2 of the License, or
    (at your option) any later version.

    Swish-e is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
    GNU General Public License for more details.

    You should have received a copy of the GNU General Public License
    along  with Swish-e; if not, write to the Free Software
    Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
    
    See the COPYING file that accompanies the Swish-e distribution for details
    of the GNU GPL and the special exception available for linking against
    the Swish-e library.
    
** Mon May  9 18:19:34 CDT 2005
** added GPL



**
**
**  Sept 2002 - isolate the search and results into more separate "objects".
**              Still misses the mark.  -L stores the lookup tables in indexf instead
**              of in a "search" object.
*/


/* set to 1 and recompile to see debugging */
#define DEBUG_RANK 0

#ifndef __HasSeenModule_Search
#define __HasSeenModule_Search 1

#ifdef __cplusplus
extern "C" {
#endif


/*
   -- module data
*/

/* -------- Search Object Structures ------------ */


/* Forward declarations for the search and results objects */

typedef struct s_LIMIT_PARAMS LIMIT_PARAMS;
typedef struct s_RESULT RESULT;
typedef struct s_SEARCH_OBJECT SEARCH_OBJECT;
typedef struct s_RESULTS_OBJECT RESULTS_OBJECT;
typedef struct s_DB_RESULTS DB_RESULTS;


/* These are the -L input parameters (one list node per limited property) */

struct s_LIMIT_PARAMS
{
    LIMIT_PARAMS     *next;
    unsigned char    *propname;
    unsigned char    *lowrange;
    unsigned char    *highrange;
};

/* These are the processed parameters ready for searching */


typedef struct
{
    unsigned char   *inPropRange;  /* indexed by file number -- should be a vector to save room, but what is fastest?  int? */
    propEntry       *loPropRange;
    propEntry       *hiPropRange;
} PROP_LIMITS;
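
/* Sketch of the "vector to save room" idea mentioned for inPropRange.
 * This is NOT how Swish-e stores the flags (it uses one unsigned char
 * per file); it only shows what a one-bit-per-file layout might look
 * like.  The bitvec_* names are hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

// Allocates a zeroed vector with room for nbits flags; NULL on failure.
unsigned char *bitvec_new(size_t nbits)
{
    return calloc(nbits / CHAR_BIT + 1, 1);
}

// Sets flag i to 1.
void bitvec_set(unsigned char *v, size_t i)
{
    assert(v != NULL);
    v[i / CHAR_BIT] |= (unsigned char)(1u << (i % CHAR_BIT));
}

// Returns flag i as 0 or 1.
int bitvec_get(const unsigned char *v, size_t i)
{
    assert(v != NULL);
    return (v[i / CHAR_BIT] >> (i % CHAR_BIT)) & 1;
}
```

 * The trade-off is the one the comment above asks about: the bit
 * vector is roughly eight times smaller, but every lookup costs a
 * shift and a mask instead of a plain byte load.
 */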



struct s_SEARCH_OBJECT
{
    SWISH          *sw;                /* Parent object */
    char           *query;             /* Query string */
    int             PhraseDelimiter;   /* Phrase delimiter char */
    int             structure;         /* Structure for limiting to HTML tags */
    struct swline  *sort_params;       /* List of sort parameter strings */

    int             limits_prepared;   /* Flag that the parameters have been prepared */
    LIMIT_PARAMS   *limit_params;      /* linked list of -L limit settings */
    PROP_LIMITS   **prop_limits;       /* flags to detect if file should be limited -L for each index, and for each metaname*/
};


    /* == Results Structures == */ 



/* A single result */

struct s_RESULT
{
    RESULT     *next;
    DB_RESULTS *db_results;     /* parent object */

//    int         count;          /* result Entry-Counter */
    int         filenum;        /* there's an extra four bytes we don't need */
    FileRec     fi;             /* This is used to cache the properties and the seek index */
    int         rank;
    int         frequency;
    int         tfrequency;     /* Total frequency of result OR result index */
                                /* during result sorting tfrequency is used as an index number */
    /* proximity */
    int         bArea;     
    int         *pArea;

    unsigned int         posdata[1];     /* used for phrase searches */
};




/* This handles a list of results for a single index file */
/* This is probably not needed: since results are always sorted, a pointer to the first result would do */
/* no real need to add results to the tail */

typedef struct RESULT_LIST
{
    RESULT *head;
    RESULT *tail;
    RESULTS_OBJECT *results;
    // DB_RESULTS *db_results;  /* parent object */
}
RESULT_LIST;


typedef struct
{
    int              direction;  /* -1 for asc and 1 for desc */
    propEntry        **key;      /* pointer to an array of PropEntry's indexed by result */
    struct metaEntry *property;  /* pointer to the metaEntry for this key - need for sorting propEntry */
    int              is_rank_sort; /* flag for faster sorting by rank */
} SortData;

/* Structure to hold all results per index */

struct s_DB_RESULTS
{
    DB_RESULTS   *next;

    RESULTS_OBJECT *results;            /* parent */
    SEARCH_OBJECT  *srch;               /* make life easy (only valid during search) */


    IndexFILE   *indexf;                /* the associated index file */
    int          index_num;             /* index into params indexed by index number */

    RESULT_LIST *resultlist;            /* pointer to list of results (indirectly) */
    RESULT      *sortresultlist;        /* linked list of RESULTs in sort order (actually just points to resultlist->head) */
    RESULT      *currentresult;         /* pointer to the current seek position */

    struct swline *parsed_words;        /* parsed search query */
    struct swline *removed_stopwords;   /* stopwords that were removed from the query */

    int          num_sort_props;        /* number of sort properties */
    SortData     *sort_data;            /* an array of num_sort_props SortData entries */

    char         **prop_string_cache;   /* place to cache a result's string properties  $$$ I think this may be a mistake */

    int          result_count;          /* number of results in set */


};

struct s_RESULTS_OBJECT
{
    SWISH          *sw;                 /* parent */
    char           *query;              /* in case user forgot what they searched for */

    void           *ref_count_ptr;      /* for SWISH::API */

    DB_RESULTS     *db_results;         /* Linked list of results - one for each index file */

    int             cur_rec_number;     /* current record number in list */
    
    int             total_results;      /* total number of results */
    int             total_files;        /* total number of files in all combined indexes */
    int             search_words_found; /* flag that some search words were found in some index after parsing -- for error message */
    int             lasterror;          /* used to save errors while processing more than one index file */
    int             bigrank;            /* Largest rank found, for scaling */
    int             rank_scale_factor;  /* for scaling each result's rank when fetching with SwishNextResult */
    MEM_ZONE       *resultSearchZone;   /* pool for allocating results */

    MEM_ZONE       *resultSortZone;     /* pool for allocating sort keys for each result */

    RESULT         *resulthashlist[BIGHASHSIZE];    /* Hash array for merging results */

};

void SwishRankScheme( SWISH *sw, int scheme );	/* set ranking scheme */

SEARCH_OBJECT *New_Search_Object( SWISH *sw, char *query );
void SwishSetStructure( SEARCH_OBJECT *srch, int structure );
void SwishPhraseDelimiter( SEARCH_OBJECT *srch, char delimiter );
void SwishSetSort( SEARCH_OBJECT *srch, char *sort );
void SwishSetQuery( SEARCH_OBJECT *srch, char *query );
void Free_Search_Object( SEARCH_OBJECT *srch );


RESULTS_OBJECT *SwishQuery(SWISH *sw, char *words );
RESULTS_OBJECT *SwishExecute(SEARCH_OBJECT *srch, char *words);

int SwishHits( RESULTS_OBJECT *results );


void Free_Results_Object( RESULTS_OBJECT *results );

RESULT *SwishNextResult(RESULTS_OBJECT *results);
int     SwishSeekResult(RESULTS_OBJECT *results, int pos);
int isMetaNameOpNext(struct swline *);

#ifdef __cplusplus
}
#endif /* __cplusplus */



#endif /* __HasSeenModule_Search */

���������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������swish-e-2.4.7/src/proplimit.c�����������������������������������������������������������������������0000775�0000771�0001750�00000063756�11166010110�013011� �����������������������������������������������������������������������������������������������������������������������������������������������������������������������������0000000�0000000������������������������������������������������������������������������������������������������������������������������������������������������������������������������/*
$Id: proplimit.c 1946 2007-10-22 14:56:35Z karpet $
**
    This file is part of Swish-e.

    Swish-e is free software; you can redistribute it and/or modify
    it under the terms of the GNU General Public License as published by
    the Free Software Foundation; either version 2 of the License, or
    (at your option) any later version.

    Swish-e is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
    GNU General Public License for more details.

    You should have received a copy of the GNU General Public License
    along  with Swish-e; if not, write to the Free Software
    Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
    
    See the COPYING file that accompanies the Swish-e distribution for details
    of the GNU GPL and the special exception available for linking against
    the Swish-e library.
    
** Mon May  9 14:53:28 CDT 2005
** added GPL
**
** module to limit within a range of properties
** Created June 10, 2001 - moseley
**
*/

#include "swish.h"
#include "swstring.h"
#include "mem.h"
#include "merge.h"      // why is this needed for docprop.h???
#include "search.h"
#include "docprop.h"
#include "index.h"
#include "metanames.h"
#include "compress.h"
#include "error.h"
#include "db.h"
#include "result_sort.h"
#include "swish_qsort.h"
#include "proplimit.h"
#include "array.h"


// #define DEBUGLIMIT

/*==================== These should be in other modules ================*/

/* Should be in docprop.c */

/*******************************************************************
*   Fetch a single property of a doc by file number and meta entry
*
*   Call with:
*       *indexf
*       filenum
*       *metaEntry (supplies the metaID and sort_len)
*
*   Returns:
*       pointer to a propEntry or NULL if not found
*       (the caller releases it with freeProperty)
*
********************************************************************/

static propEntry *GetPropertyByFile( IndexFILE *indexf, int filenum, struct metaEntry *m )
{
    propEntry *d;
    FileRec fi;
    memset(&fi, 0, sizeof( FileRec ));
    fi.filenum = filenum;
    

    d = ReadSingleDocPropertiesFromDisk(indexf, &fi, m->metaID, m->sort_len );
    freefileinfo(&fi);

    return d;
}

#ifdef DEBUGLIMIT
static void printdocprop( propEntry *d )
{
    char str[1000];
    int  j;
    int  len = d->propLen;

    /* Truncate so that an oversized property cannot overflow str */
    if ( len > (int)sizeof( str ) - 1 )
        len = (int)sizeof( str ) - 1;

    for (j=0; j < len; j++)
        str[j] = (d->propValue)[j];

    str[ len ] = '\0';

    printf("%s (%d)", str, d->propLen );
}

static void printfileprop( SWISH *sw, IndexFILE *indexf, int filenum, struct metaEntry *m )
{
    propEntry *d;

    if ( (d = GetPropertyByFile( indexf, filenum, m )))
    {
        printdocprop( d );
        freeProperty( d );
    }
    else
        printf("File %d does not have a property for metaID %d", filenum, m->metaID );
}
#endif
    



/*==============================================================*/
/*                 typedefs and structures                      */
/*==============================================================*/

/* This is used for inverting the metaEntry->sorted_data array */
typedef struct LOOKUP_TABLE
{
    int filenum;
    unsigned long   sort;
} LOOKUP_TABLE;




/*==============================================================*/
/*                  Code                                        */
/*==============================================================*/


/*******************************************************************
* SwishResetSearchLimit  -- clears memory used by -L 
*
*   Call with:
*       SEARCH_OBJECT
*
*   Returns:
*       void
*
*   This clears up the input params, and the stored data created
*   by Prepare_PropLookup(), but does not clear the tables created
*   in the search object to hold the data.
*
********************************************************************/


void SwishResetSearchLimit( SEARCH_OBJECT *srch )
{
    IndexFILE  *indexf = srch->sw->indexlist;
    int         index_count = 0;
    int         metaID;
    


    /* Free up the input parameters */
    ClearLimitParams( srch->limit_params );
    srch->limit_params = NULL;


    /* Free up the stored limits for each meta entry */

    if ( !srch->limits_prepared )
        return;


    while ( indexf )
    {
        PROP_LIMITS *index_limits = srch->prop_limits[index_count++];

        for ( metaID = 0; metaID <= indexf->header.metaCounter; metaID++ )
        {
            if ( index_limits[metaID].inPropRange )
            {
                efree ( index_limits[metaID].inPropRange );
                index_limits[metaID].inPropRange = NULL;
            }

            if ( index_limits[metaID].loPropRange )
            {
                efree ( index_limits[metaID].loPropRange );
                index_limits[metaID].loPropRange = NULL;
            }

            if ( index_limits[metaID].hiPropRange )
            {
                efree ( index_limits[metaID].hiPropRange );
                index_limits[metaID].hiPropRange = NULL;
            }
        }


        indexf = indexf->next;
    }

    srch->limits_prepared = 0;

}

/*  ClearLimitParams -- used internally */


void ClearLimitParams( LIMIT_PARAMS *params )
{
    LIMIT_PARAMS  *tmp;


    while ( params ) {
        efree( params->propname );
        efree( params->lowrange );
        efree( params->highrange );
        tmp = (LIMIT_PARAMS *)params->next;
        efree( params );
        params = tmp;
    }
}





/*******************************************************************
*  SwishSetSearchLimit - add a limit parameter
*
*   Stores the strings away for later processing
*
*   Call with:
*       SEARCH_OBJECT
*       Three strings: the property name, then the low and high range
*
*   Returns:
*       true (1) on success
*       false (0) on failure, with the error set on the SWISH object
*       (limits already prepared, or the property is already limited)
*
*   ToDo:
*       Error checking, and maybe pass in a StringList
*
********************************************************************/
int SwishSetSearchLimit(SEARCH_OBJECT *srch, char *propertyname, char *low, char *hi)
{
    LIMIT_PARAMS *params;
    reset_lasterror( srch->sw );
    
    if ( srch->limits_prepared )
    {
        set_progerr( PROP_LIMIT_ERROR, srch->sw, "Limits have been prepared (and executed) -- call SwishResetSearchLimit() first" );
        return 0;
    }

    /* Add new limit parameter to list */
    params  = setlimit_params( srch->sw, srch->limit_params, propertyname, low, hi );

    /* Only reset list if no error */
    if ( params )
        srch->limit_params = params;


    return ( srch->sw->lasterror == 0 );
}

/* This just sets the LIMIT_PARAMS struct -- useful when don't have a SEARCH_OBJECT */

LIMIT_PARAMS *setlimit_params( SWISH *sw, LIMIT_PARAMS *params, char *propertyname, char *low, char *hi )
{
    LIMIT_PARAMS *newparam;
    LIMIT_PARAMS *head = params;


    /* Currently, each property can only be limited once -- so check that this one hasn't already been used */
    while ( params )
    {
        if (strcmp( (char *)params->propname, propertyname ) == 0)
            break;
        params = (LIMIT_PARAMS *)params->next;
    }


    if ( params )
    {
        set_progerr( PROP_LIMIT_ERROR, sw, "Property '%s' is already limited", propertyname );
        return NULL;
    }
        


    newparam = emalloc( sizeof( LIMIT_PARAMS ) );
    
    newparam->propname = (unsigned char *)estrdup( propertyname );
    newparam->lowrange = (unsigned char *)estrdup( low );
    newparam->highrange = (unsigned char *)estrdup( hi );

    /* put at head of list */
    newparam->next = head;

    return newparam;
}
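
/*******************************************************************
*   Sketch only: the "reject duplicates, then push on the head"
*   pattern used by setlimit_params() above, reduced to plain
*   malloc/free so it can be compiled on its own.  limit_node,
*   limit_push and limit_free are hypothetical names, not part of
*   Swish-e.  As above, a NULL return leaves the list untouched, so
*   the caller only replaces its head pointer on success.
*
```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct limit_node
{
    struct limit_node *next;
    char              *name;
} limit_node;

// Returns the new head of the list, or NULL (list unchanged) if name
// is already present or memory runs out.
limit_node *limit_push(limit_node *head, const char *name)
{
    limit_node *p;
    limit_node *n;

    assert(name != NULL);

    for (p = head; p != NULL; p = p->next)
        if (strcmp(p->name, name) == 0)
            return NULL;            // already limited

    n = malloc(sizeof *n);
    if (n == NULL)
        return NULL;

    n->name = malloc(strlen(name) + 1);
    if (n->name == NULL)
    {
        free(n);
        return NULL;
    }
    strcpy(n->name, name);

    n->next = head;                 // put at head of list
    return n;
}

// Frees every node, reading next before the node is released.
void limit_free(limit_node *head)
{
    while (head != NULL)
    {
        limit_node *tmp = head->next;

        free(head->name);
        free(head);
        head = tmp;
    }
}
```
*
********************************************************************/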



/*******************************************************************
*   This compares the user supplied value with a file's property
*   The file's property is looked up and then Compare_Properties is called
*
*   Call with:
*       *indexf
*       *metaEntry
*       *propEntry key - compare key
*       *LOOKUP_TABLE - element containing file number
*
*   Returns:
*       the value of Compare_Properties( key, file's property )
*       +1 if the file has no such property (treated as very small)
*
********************************************************************/
static int test_prop( IndexFILE *indexf, struct metaEntry *meta_entry, propEntry *key, LOOKUP_TABLE *sort_array)
{
    propEntry *fileprop;
    int        cmp_value;

#ifdef DEBUGLIMIT
    {
        char *p = DecodeDocProperty( meta_entry, key );
        printf("test_prop comparing '%s' cmp '%s' with ", meta_entry->metaName, p);
        efree( p );
    }
#endif    
        
        

    if ( !(fileprop = GetPropertyByFile( indexf, sort_array->filenum, meta_entry )) )
    {
#ifdef DEBUGLIMIT
        printf("(no prop found for filenum %d) - return +1\n", sort_array->filenum );
#endif        

        /* No property found, assume it's very, very, small */
        return +1;
    }

#ifdef DEBUGLIMIT
    {
        char *p = DecodeDocProperty( meta_entry, fileprop );
        int i = Compare_Properties( meta_entry, key, fileprop  );
        printf("'%s' returning %d\n", p, i );
        efree( p );
    }
#endif    


    cmp_value = Compare_Properties( meta_entry, key, fileprop  );
    freeProperty( fileprop );
    return cmp_value;
    
}

    


/************************************************************************
* Adapted from: msdn, I believe...
*
*    Call with:
*       See below
*
*   Returns:
*       Exact match, true (but there could be more than one match location)
*       Between two, returns false and the lower position
*       Below list, returns false and -1
*       Above list, return false and numelements (one past end of array)
*
*   ToDo:
*       Check for out of bounds on entry as that may be reasonably common
*
***************************************************************************/

static int binary_search(
    IndexFILE *indexf,              // 
    LOOKUP_TABLE *sort_array,       // table to search through
    int numelements,                // size of table
    propEntry *key,                 // property to compare against
    struct metaEntry *meta_entry,   // associated meta entry (for metaType)
    int *result,                    // result is stored here
    int direction,                  // looking up (positive) looking down (negative)
    int *exact_match)               // last exact match found
{
    int low = 0;
    int high = numelements - 1;
    int num  = numelements;
    int mid;
    int cmp;
    unsigned int half;

    *exact_match = -1;

#ifdef DEBUGLIMIT
    printf("\nbinary_search looking for %s entry\n", ( direction > 0 ? "high" : "low" ) );
#endif    

    while ( low <= high )
    {
        if ( (half = num / 2) )
        {
            mid = low + (num & 1 ? half : half - 1);


            if ( (cmp = test_prop( indexf, meta_entry, key, &sort_array[mid] )) == 0 )
            {
                *exact_match = mid;  // exact match
                cmp = direction;     // but still look for the lowest/highest exact match.
            }

            if ( cmp < 0 )
            {
                high = mid - 1;
                 num = (num & 1 ? half : half - 1);
            }

            else // cmp > 0
            {
                low = mid + 1;
                num = half;
            }
         }
         else if (num)
         {
            if( (cmp = test_prop( indexf, meta_entry, key, &sort_array[low] )) ==0)
            {
                *result = low;
                return 1;
            }
            if ( cmp < 0 ) // this breaks need another compare
            {
                /* less than current -- is it also less than the previous entry? */
                if ( low > 0 && (test_prop( indexf, meta_entry, key, &sort_array[low-1] ) < 0))
                    *result = low - 1;
                else
                    *result = low;
                return 0;
            }
            else
            {
                *result = low + 1;
                return 0;
            }
         }
         else // if !num
         {
            /* I can't think of a case for this to match?? */
            progwarn("Binary Sort issue - please report to swish-e list");
            *result = -1;
            return 0;
         }
     }
     *result = low;  // was high, but wasn't returning expected results
     return 0;
}
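
/************************************************************************
*   Sketch only, and a different technique from binary_search() above:
*   the textbook lower-bound / upper-bound pair on a sorted int array.
*   It illustrates what the "direction" argument is after, namely
*   finding the lowest or the highest position when a key occurs more
*   than once.  The function names are hypothetical, and plain ints
*   stand in for the on-disk properties that test_prop() compares.
*
```c
#include <assert.h>
#include <stddef.h>

// Index of the first element that is >= key, or n if all are smaller.
size_t first_at_least(const int *a, size_t n, int key)
{
    size_t lo = 0, hi = n;

    assert(a != NULL || n == 0);
    while (lo < hi)
    {
        size_t mid = lo + (hi - lo) / 2;

        if (a[mid] < key)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo;
}

// Index one past the last element that is <= key, or 0 if all are larger.
size_t first_greater(const int *a, size_t n, int key)
{
    size_t lo = 0, hi = n;

    assert(a != NULL || n == 0);
    while (lo < hi)
    {
        size_t mid = lo + (hi - lo) / 2;

        if (a[mid] <= key)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo;
}

// Number of elements with low <= a[i] <= high.
size_t count_in_range(const int *a, size_t n, int low, int high)
{
    size_t first = first_at_least(a, n, low);
    size_t past  = first_greater(a, n, high);

    return past > first ? past - first : 0;
}
```
*
*   Working with a half-open range [first, past) removes the need for
*   a separate exact-match flag: an empty range simply has
*   first == past.
*
***************************************************************************/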
         

/*******************************************************************
*   This takes a *sort_array and the low/hi range of limits and marks
*   which files are in that range
*
*   Call with:
*       pointer to the IndexFile
*       pointer to the LOOKUP_TABLE
*       number of entries in the LOOKUP_TABLE
*       PROP_LIMITS (low/hi range)
*       *metaEntry
*
*   Returns:
*       number of entries marked as in range (zero, i.e. false, if none)
*
********************************************************************/
static int find_prop(IndexFILE *indexf,  LOOKUP_TABLE *sort_array, int num, PROP_LIMITS *prop_limits, struct metaEntry *meta_entry )
{
    int low, high, j;
    int foundLo, foundHi;
    int some_selected = 0;
    int exact_match;
    

    if ( !prop_limits->loPropRange )
    {
        foundLo = 1;    /* signal exact match */
        low = 0;        /* and start at beginning */
    }
    else
    {
        foundLo = binary_search(indexf, sort_array, num, prop_limits->loPropRange, meta_entry, &low, -1, &exact_match);

        if ( !foundLo && exact_match >= 0 )
        {
            low = exact_match;
            foundLo = 1;  /* mark as an exact match */
        }
    }



    if ( !prop_limits->hiPropRange )
    {
        foundHi = 1;    /* signal exact match */
        high = num -1;  /* and end at the very end */
    }
    else
    {
        foundHi = binary_search(indexf, sort_array, num, prop_limits->hiPropRange, meta_entry, &high, +1, &exact_match);

        if ( !foundHi && exact_match >= 0 )
        {
            high = exact_match;
            foundHi = 1;
        }
    }

#ifdef DEBUGLIMIT
    printf("Returned range %d - %d (exact: %d %d) cnt: %d\n", low, high, foundLo, foundHi, num );
#endif    

    /* both limits fall between the same two adjacent entries, so nothing is in range */
    if ( !foundLo && !foundHi && low == high )
    {
        for ( j = 0; j < num; j++ )
            sort_array[j].sort = 0;

        return 0;
    }


    /* now, if not an exact match for the high range, decrease high by one
     * because high is pointing to the *next* higher element, which is TOO high
     */
     
    if ( !foundHi && low < high )
        high--;


    /* Now mark by file number if it's within the range or not */
    for ( j = 0; j < num; j++ )
    {
         if ( j >= low && j <= high )
         {
            sort_array[j].sort = 1;
            some_selected++;
         }
         else
            sort_array[j].sort = 0;
    }

    return some_selected;        

}

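/* These sort the LOOKUP_TABLE */
int sortbysort(const void *s1, const void *s2)
{
    LOOKUP_TABLE *a = (LOOKUP_TABLE *)s1;
    LOOKUP_TABLE *b = (LOOKUP_TABLE *)s2;

    /* sort is an unsigned long: compare explicitly, because the
     * difference is computed in unsigned arithmetic and converting
     * it to int is implementation-defined */
    return ( a->sort > b->sort ) - ( a->sort < b->sort );
}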

int sortbyfile(const void *s1, const void *s2)
{
    LOOKUP_TABLE *a = (LOOKUP_TABLE *)s1;
    LOOKUP_TABLE *b = (LOOKUP_TABLE *)s2;

    return a->filenum - b->filenum;
}
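
/*******************************************************************
*   Sketch only: a three-way qsort comparator for unsigned long keys
*   that avoids subtraction.  With unsigned operands, 0 - ULONG_MAX
*   wraps around to 1, so a subtracting comparator would report the
*   smaller key as the larger one.  sort_pair, compare_sort_keys and
*   sort_pairs are hypothetical names; the standard qsort stands in
*   for swish_qsort.
*
```c
#include <assert.h>
#include <stdlib.h>

typedef struct
{
    int           filenum;
    unsigned long sort;
} sort_pair;

// Three-way compare on the unsigned key: negative, zero or positive.
int compare_sort_keys(const void *p1, const void *p2)
{
    const sort_pair *a = p1;
    const sort_pair *b = p2;

    assert(a != NULL && b != NULL);
    return (a->sort > b->sort) - (a->sort < b->sort);
}

// Sorts n pairs in place by ascending key.
void sort_pairs(sort_pair *v, size_t n)
{
    if (n > 1)
        qsort(v, n, sizeof *v, compare_sort_keys);
}
```
*
********************************************************************/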


/*******************************************************************
*   This creates the lookup table for the range of values selected
*   and stores it in the PROP_LIMITS (inPropRange)
*
*   Call with:
*       pointer to the IndexFile
*       PROP_LIMITS (low/hi range)
*       *metaEntry
*
*   Returns:
*       true if any were marked as found
*       false means no match
*
********************************************************************/

static int create_lookup_array( IndexFILE *indexf, PROP_LIMITS *prop_limits, struct metaEntry *meta_entry )
{
    LOOKUP_TABLE *sort_array;
    int      i;
    int     size = indexf->header.totalfiles;
    int     some_found;

    /* Now do the work of creating the lookup table */

    /* Create memory  -- probably could do this once and use it over and over */
    sort_array = (LOOKUP_TABLE *) emalloc( size * sizeof(LOOKUP_TABLE) );

    /* copy in the data to the sort array */
    for (i = 0; i < size; i++)
    {
        sort_array[i].filenum = i+1;
        sort_array[i].sort = DB_ReadSortedData( meta_entry->sorted_data, i );
    }


    /* now sort by its sort value */
    swish_qsort(sort_array, size, sizeof(LOOKUP_TABLE), &sortbysort);

    /* This marks in the new array which ones are in range */
    some_found = find_prop( indexf, sort_array, size, prop_limits, meta_entry );


#ifdef DEBUGLIMIT
    for (i = 0; i < size; i++)
    {
        printf("%d File: %d Sort: %lu : ", i, sort_array[i].filenum, sort_array[i].sort );
        printfileprop( indexf->sw, indexf, sort_array[i].filenum, meta_entry );
        printf("\n");
    }
#endif

    /* If everything is in range (first and last entries of the sorted array are both flagged), */
    /* then don't even bother creating the lookup array */
    if ( some_found && sort_array[0].sort && sort_array[size-1].sort )
    {
        efree( sort_array );
        return 1;
    }


    /* sort back by file number */
    swish_qsort(sort_array, size, sizeof(LOOKUP_TABLE), &sortbyfile);


    /* allocate a place to save the lookup table */
    /* What size is best for speed?  bitvector for size would be good */

    prop_limits->inPropRange = (unsigned char *) emalloc( size * sizeof(char) );

    /* populate the array in the PROP_LIMITS entry */
    for (i = 0; i < size; i++)
        prop_limits->inPropRange[i] = (unsigned char)sort_array[i].sort;

    efree( sort_array );

    return some_found;        

}

/*******************************************************************
*   Encode parameters specified on -L command line into two propEntry's
*   which can be used to compare with a file's property
*
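*   Call with:
*       *indexf           -> current index file (used for error reporting)
*       *prop_limits      -> where the encoded low/high properties are stored
*       *metaEntry        -> current meta entry
*       *LIMIT_PARAMS     -> associated parameters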
*
*   Returns:
*       True if a range was found, otherwise false.
*       sets sw->lasterror on failure
*
*
********************************************************************/
static int params_to_props( IndexFILE *indexf, PROP_LIMITS *prop_limits, struct metaEntry *meta_entry, LIMIT_PARAMS *param )
{
    int error_flag;
    unsigned char *lowrange  = param->lowrange;
    unsigned char *highrange = param->highrange;
    SWISH *sw = indexf->sw;

    /* properties do not have leading white space */


    /* Allow <= and >= in limits.  A NULL property means very low/very high */

    if ( (strcmp( "<=", (char *)lowrange ) == 0)   )
    {
        prop_limits->loPropRange = NULL; /* indicates very small */
        prop_limits->hiPropRange = CreateProperty( meta_entry, highrange, strlen( (char *)highrange ), 0, &error_flag );
    }

    else if ( (strcmp( ">=", (char *)lowrange ) == 0)   )
    {
        prop_limits->loPropRange = CreateProperty( meta_entry, highrange, strlen( (char *)highrange ), 0, &error_flag );
        prop_limits->hiPropRange = NULL; /* indicates very big */
    }

    else
    {
        prop_limits->loPropRange = CreateProperty( meta_entry, lowrange, strlen( (char *)lowrange ), 0, &error_flag );
        prop_limits->hiPropRange = CreateProperty( meta_entry, highrange, strlen( (char *)highrange ), 0, &error_flag );


        if ( !(prop_limits->loPropRange && prop_limits->hiPropRange) )
        {
            set_progerr(PROP_LIMIT_ERROR, sw, "Failed to set range for property '%s' values '%s' and '%s'", meta_entry->metaName, lowrange, highrange );
            return 0;
        }

        /* Validate range */
    
        if ( Compare_Properties( meta_entry, prop_limits->loPropRange, prop_limits->hiPropRange ) > 0 )
        {
            set_progerr(PROP_LIMIT_ERROR, sw, "Property '%s' value '%s' must be <= '%s'", meta_entry->metaName, lowrange, highrange );
            return 0;
        }
    }


    return (  prop_limits->loPropRange || prop_limits->hiPropRange );
}        


/*******************************************************************
*   Scans all the meta entries to see if any are limited, and if so, creates the lookup array
*
*   Call with:
*       pointer to an IndexFile
*       PROP_LIMITS array for this index (indexed by metaID)
*       user supplied limit parameters
*
*   Returns:
*       false if any arrays are all zero
*       no point in even searching.
*       (meaning that no possible matches exist)
*       but also return false on errors, caller must check sw->lasterror
*
*   ToDo:
*       This ONLY works if the limits are absolute -- that is
*       that you can't OR limits.  Will need fixing at some point
*
********************************************************************/
static int load_index( IndexFILE *indexf, PROP_LIMITS *prop_limits, LIMIT_PARAMS *params )
{
    struct metaEntry *meta_entry;
    LIMIT_PARAMS     *curp;
    PROP_LIMITS       *cur_prop_limits;
    int               found;
    SWISH            *sw = indexf->sw;
    
    

    curp = params;

    /* Look at each parameter */
    for (curp = params; curp; curp = curp->next )
    {
        found = 0;

        if ( !(meta_entry = getPropNameByName( &indexf->header, (char *)curp->propname )))
        {
            set_progerr( PROP_LIMIT_ERROR, sw, "Specified limit name '%s' is not a PropertyName", curp->propname );
            return 0;
        }


        /* Internal properties cannot be limited.  Strictly speaking they could be, but only */
        /* filenum would be even slightly useful: the index file can be specified on the */
        /* command line, and rank and reccount are not really known at this point */
       
        if ( is_meta_internal( meta_entry ) )
        {
            set_progerr( PROP_LIMIT_ERROR, sw, "Cannot limit by swish result property '%s'", curp->propname );
            return 0;
        }

        cur_prop_limits = &prop_limits[meta_entry->metaID];


        /* see if array has already been allocated (cached) */
        /* probably should catch this earlier */

        if ( cur_prop_limits->inPropRange )
            continue;



        /* Encode the parameters into properties for comparing, and store */

        if ( !params_to_props( indexf, cur_prop_limits, meta_entry, curp ) )
        {
            if ( sw->lasterror )  // check for failure
                return 0;
                
            continue;  /* This means that it failed to set a range */
        }
            

        /* load the sorted_data array, if not already done */

        if ( !meta_entry->sorted_data )
            if( !LoadSortedProps( indexf, meta_entry ) )
                continue;  /* thus it will sort manually without pre-sorted index */


        /* Now create the lookup table in the metaEntry */
        /* A false return means that an array was built but it was all zero */
        /* No need to check anything else at this time, since can only AND -L options */
        /* i.e. = return No Results right away */
        /* This allows search.c to bail out early */

        if ( !create_lookup_array( indexf, cur_prop_limits, meta_entry ) )
            return 0;

    }

    return 1;  // ** flag that it's ok to continue the search.
        
}

/*******************************************************************
*   Prepares the lookup tables for every index
*
*   Call with:
*       pointer to SEARCH_OBJECT
*
*   Returns:
*       true if ok to continue search
*       false indicates that a lookup array was created, but it is all zero
*       indicating there will never be a match
*       ( this falls apart if OR limits are ever allowed )
*
*   Notes:
*       See note in search.c limit_result_list()
*       processed limits are stored in the search object
*
*
********************************************************************/

int Prepare_PropLookup(SEARCH_OBJECT *srch )
{
    int             total_indexes = 0;
    int             total_no_docs = 0;
    LIMIT_PARAMS   *params = srch->limit_params;
    SWISH          *sw = srch->sw;
    IndexFILE      *indexf = sw->indexlist;


    /* nothing to limit by */
    if ( !params )
        return 1;

    /* already prepared? */
    if ( srch->limits_prepared++ )
        return 1;


    /* process each index file */
    while ( indexf )
    {
        /* Prop limits is where to store the tables */
        PROP_LIMITS *prop_limits = srch->prop_limits[total_indexes++];

        if ( !load_index( indexf, prop_limits, params ) )
        {
            if ( sw->lasterror )    // check for error
                return 0;
                
            total_no_docs++;
        }
        indexf = indexf->next;
    }

    /* if all indexes are all no docs within limits, then return false */
    return total_indexes != total_no_docs;            

}

/*******************************************************************
*   Removes results that don't fit within the limit
*
*   Call with:
*       IndexFILE = current index file (used to read a property if pre-sorted data not available)
*       PROP_LIMITS array for this index (indexed by metaID)
*       File number
*
*   Returns
*       true if file should NOT be included in results
*
*
********************************************************************/
int LimitByProperty( IndexFILE *indexf, PROP_LIMITS *prop_limits, int filenum )
{
    int j;
    struct metaEntry  *meta_entry;
    for ( j = 0; j < indexf->header.metaCounter; j++)
    {
         PROP_LIMITS *cur_limit;

        /* Look at all the properties */

        /* Should cache this in the index file, or is this fast enough? */
        if ( !(meta_entry = getPropNameByID( &indexf->header, indexf->header.metaEntryArray[j]->metaID )))
            continue;  /* continue if it's not a property */

        cur_limit = &prop_limits[meta_entry->metaID];            

        /* anything to check? */
        if ( !cur_limit->loPropRange && !cur_limit->hiPropRange )
            continue;
            



        /* If inPropRange is allocated then there is an array for limiting already created from the presorted data */

        if ( cur_limit->inPropRange )
        {
            if ( !cur_limit->inPropRange[filenum-1] )
                return 1;
            continue;
        }




        /* Otherwise, if either range is set, then use a manual lookup of the property */
        
        {
            int limit = 0;
            propEntry *prop = GetPropertyByFile( indexf, filenum, meta_entry );

            /* Return true (i.e. limit) if the file's prop is less than the low range */
            /* or if its property is greater than the high range */
            if (
                (Compare_Properties( meta_entry, prop, cur_limit->loPropRange ) < 0 ) ||
                (cur_limit->hiPropRange && (Compare_Properties( meta_entry, prop, cur_limit->hiPropRange ) > 0 ))
               )
                limit = 1;

            freeProperty( prop );

            /* If limit by this property, then return to limit right away */
            if ( limit )
                return 1;
        }
    }

    return 0;  /* don't limit by default */
}    


/* swish-e-2.4.7/src/dump.c */
/*

    This file is part of Swish-e.

    Swish-e is free software; you can redistribute it and/or modify
    it under the terms of the GNU General Public License as published by
    the Free Software Foundation; either version 2 of the License, or
    (at your option) any later version.

    Swish-e is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
    GNU General Public License for more details.

    You should have received a copy of the GNU General Public License
    along  with Swish-e; if not, write to the Free Software
    Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
    
    See the COPYING file that accompanies the Swish-e distribution for details
    of the GNU GPL and the special exception available for linking against
    the Swish-e library.
    
** Mon May  9 15:51:39 CDT 2005
** added GPL

**
**
** 2001-05-07 jmruiz init coding
**
*/


#include "swish.h"
#include "mem.h"
#include "merge.h"
#include "search.h"
#include "docprop.h"
#include "hash.h"
#include "swstring.h"
#include "db.h"
#include "compress.h"
#include "index.h"
#include "result_output.h"
#include "metanames.h"
#include "dump.h"
#include "headers.h"
#include "error.h"


void dump_index_file_list( SWISH *sw, IndexFILE *indexf, int filenum, int maxhits )
{
    /* if maxhits is 0 then show all */
    if ( !maxhits )
        maxhits = indexf->header.totalfiles;

    if ( !filenum )
        filenum = 1;

    printf("\n\n-----> FILES in index %s <-----\n", indexf->line );


    while ( filenum <= indexf->header.totalfiles && maxhits )
    {
        FileRec fi;
        int     words = 0;


        /* See if file was deleted */
        if (indexf->header.removedfiles)
        {
            /* Assumes that zero words means that the file was deleted -- should likely be a macro */
            words = getTotalWordsInFile( indexf, filenum );
            if ( !words )
            {
                filenum++;
                continue;  /* skip this file -- it has been deleted */
            }
        }

        memset( &fi, 0, sizeof( FileRec ) );
        fi.filenum = filenum;

        fflush(stdout);
        printf("Dumping File Properties for File Number: %d\n", filenum);


        dump_file_properties( indexf, &fi );
        printf("\n");


        printf("ReadAllDocProperties:\n");
        fi.docProperties =  ReadAllDocPropertiesFromDisk( indexf, filenum );
        dump_file_properties( indexf, &fi );

        if ( words )
            printf("Filenum and words in this file: %d %d\n", filenum, words );

        freefileinfo( &fi );

        printf("\n");


        /* dump one at a time */
        {
            propEntry *p;
            int j;
            struct metaEntry *meta_entry;
            INDEXDATAHEADER *header = &indexf->header;
            int count = header->property_count;

            printf("ReadSingleDocPropertiesFromDisk:\n");

            for (j=0; j< count; j++) // just for testing
            {
                int metaID = header->propIDX_to_metaID[j];

                if ( !(p = ReadSingleDocPropertiesFromDisk(indexf, &fi, metaID, 0 )) )
                    continue;

                meta_entry = getPropNameByID( &indexf->header, metaID );
                dump_single_property( p, meta_entry );

                { // show compression
                    char    *buffer;
                    int     uncompressed_len;
                    int     buf_len;

                    if ( (buffer = DB_ReadProperty( sw, indexf, &fi, meta_entry->metaID, &buf_len, &uncompressed_len, indexf->DB )))
                    {
                        if ( uncompressed_len )
                            printf("  %20s: %d -> %d (%4.2f%%)\n", "**Compressed**", uncompressed_len , buf_len, (float)buf_len/(float)uncompressed_len * 100.00f );

                        efree(buffer);
                    }
                }



                freeProperty( p );
            }
        }
        printf("\n");


        freefileinfo(&fi);
        maxhits--;
        filenum++;
    }
    fflush(stdout);
}


/* prints out the number of words in every file in the index */
/* This will generate an error if not indexed with totalwords and not btree */

void    dump_word_count( SWISH *sw, IndexFILE *indexf, int filenum, int maxhits )
{
    /* if maxhits is 0 then show all */
    if ( !maxhits )
        maxhits = indexf->header.totalfiles;

    if ( !filenum )
        filenum = 1;

    while ( filenum <= indexf->header.totalfiles && maxhits )
    {
        int words = getTotalWordsInFile( indexf, filenum );
        if ( words )
        {
            printf("%d %d\n", filenum, words );
            maxhits--;   /* count down so that maxhits limits the number of files listed */
        }

        filenum++;
    }
}

/* Prints out the data in an index DB */
void    DB_decompress(SWISH * sw, IndexFILE * indexf, int begin, int maxhits)
{
    int     i,
            j,
            c,
            fieldnum,
            frequency,
            metaID,
            tmpval,
            printedword,
            filenum;
    unsigned int       *posdata;
    int     metadata_length;
    char    word[2];
    char   *resultword;
    unsigned char   *worddata, *s, *start, flag;
    int     sz_worddata, saved_bytes;
    sw_off_t    wordID;



    indexf->DB = DB_Open(sw, indexf->line,DB_READ);
    if ( sw->lasterror )
        SwishAbortLastError( sw );

    metaID = 0;

    metadata_length = 0;

    c = 0;

    frequency = 0;

        /* Read header */
    read_header(sw, &indexf->header, indexf->DB);


    if (DEBUG_MASK & (DEBUG_INDEX_ALL | DEBUG_INDEX_HEADER) )
    {
        sw->headerOutVerbose = 255;
        print_index_headers( indexf );
    }

    fieldnum = 0;


    /* Do metanames first as that will be helpful for decoding next */
    if (DEBUG_MASK & (DEBUG_INDEX_ALL | DEBUG_INDEX_METANAMES)  )
        dump_metanames( sw, indexf, 1 );

    if (DEBUG_MASK & DEBUG_INDEX_WORDS_ONLY)
    {
        DB_InitReadWords(sw, indexf->DB);

        for( j = 0; j < 256; j++ )
        {
            word[0] = (unsigned char) j;
            word[1] = '\0';
            DB_ReadFirstWordInvertedIndex(sw, word,&resultword,&wordID,indexf->DB);

            while(wordID && (((int)((unsigned char)resultword[0]))== j))
            {
              if(indexf->header.removedfiles)
              {
                /* We need to Read Word's data to check that there is
                ** at least one file that has not been removed */
                DB_ReadWordData(sw, wordID, &worddata, &sz_worddata, &saved_bytes, indexf->DB);
                uncompress_worddata(&worddata, &sz_worddata, saved_bytes);

                /* parse and print word's data */
                s = worddata;

                tmpval = uncompress2(&s);     /* tfrequency */
                metaID = uncompress2(&s);     /* metaID */
                metadata_length = uncompress2(&s);

                filenum = 0;
                start = s;
                while(1)
                {                   /* Read on all items */
                    uncompress_location_values(&s,&flag,&tmpval,&frequency);
                    filenum += tmpval;
                    posdata = (unsigned int *) emalloc(frequency * sizeof(int));
                    uncompress_location_positions(&s,flag,frequency,posdata);

                    /* Positions are only decoded to advance s; free them on every pass */
                    efree(posdata);

                    /* 2004/09 jmruiz. Need to check for one file not being marked as deleted */
                    if (DB_CheckFileNum(sw,filenum,indexf->DB))
                    {
                        printf("%s\n",resultword);
                        break;
                    }
                    /* Check for end of worddata */
                    if ((s - worddata) == sz_worddata)
                        break;   /* End of worddata */

                    /* Check for end of current metaID data */
                    if ( metadata_length == (s - start))
                    {
                        filenum = 0;
                        metaID = uncompress2(&s);
                        metadata_length = uncompress2(&s);
                        start = s;
                    }

                }
                efree(worddata);
              }
              else
                printf("%s\n",resultword);

              efree(resultword);
              DB_ReadNextWordInvertedIndex(sw, word,&resultword,&wordID,indexf->DB);
              if (wordID && ((int)((unsigned char)resultword[0]))!= j)
                efree(resultword);

            }
        }
        DB_EndReadWords(sw, indexf->DB);
    }


    else if (DEBUG_MASK & (DEBUG_INDEX_ALL | DEBUG_INDEX_WORDS | DEBUG_INDEX_WORDS_FULL | DEBUG_INDEX_WORDS_META)  )
    {
        int     *meta_used;
        int     end_meta = 0;

        printf("\n-----> WORD INFO in index %s <-----\n", indexf->line);

        for(i = 0; i < indexf->header.metaCounter; i++)
            if ( indexf->header.metaEntryArray[i]->metaID > end_meta )
                end_meta = indexf->header.metaEntryArray[i]->metaID;

        meta_used = emalloc( sizeof(int) * ( end_meta + 1) );

        /* _META only reports which tags the words are found in */
        for(i = 0; i <= end_meta; i++)
            meta_used[i] = 0;


        DB_InitReadWords(sw, indexf->DB);

        for(j=1;j<256;j++)
        {
            word[0] = (unsigned char) j; word[1] = '\0';
            DB_ReadFirstWordInvertedIndex(sw, word,&resultword,&wordID,indexf->DB);

            while(wordID && (((int)((unsigned char)resultword[0]))== j))
            {
                /* Flag to know if we must print a word or not */
                /* Words with all the files marked as deleted should not be
                ** printed */
                printedword = 0;
                /* Read Word's data */
                DB_ReadWordData(sw, wordID, &worddata, &sz_worddata, &saved_bytes, indexf->DB);
                uncompress_worddata(&worddata, &sz_worddata, saved_bytes);

                /* parse and print word's data */
                s = worddata;

                tmpval = uncompress2(&s);     /* tfrequency */
                metaID = uncompress2(&s);     /* metaID */
                metadata_length = uncompress2(&s);

                filenum = 0;
                start = s;
                while(1)
                {                   /* Read on all items */
                    uncompress_location_values(&s,&flag,&tmpval,&frequency);
                    filenum += tmpval;
                    posdata = (unsigned int *) emalloc(frequency * sizeof(int));
                    uncompress_location_positions(&s,flag,frequency,posdata);

                    /* 2004/09 jmruiz. Need to check for files marked as deleted */
                    if ((!indexf->header.removedfiles) || DB_CheckFileNum(sw,filenum,indexf->DB))
                    {
                    if(!printedword)
                    {
                        printf("\n%s",resultword);
                        printedword = 1;
                    }

                    // if (sw->verbose >= 4)
                    if (DEBUG_MASK & (DEBUG_INDEX_ALL|DEBUG_INDEX_WORDS_FULL))
                    {
                        struct metaEntry    *m;

                        printf("\n Meta:%d", metaID);


                        /* Get path from property list */
                        if ( (m = getPropNameByName( &indexf->header, AUTOPROPERTY_DOCPATH )) )
                        {
                            RESULT r;
                            DB_RESULTS db_results;
                            char  *path;

                            memset( &r, 0, sizeof( RESULT ) );
                            memset( &db_results, 0, sizeof( DB_RESULTS ) );
                            db_results.indexf = indexf;

                            r.db_results = &db_results;
                            r.filenum = filenum;
                            r.fi.filenum = filenum;

                            path = getResultPropAsString( &r, m->metaID);

                            printf(" %s", path );
                            efree( path );

                        }
                        else
                            printf(" Failed to lookup meta entry");


                        printf(" Freq:%d", frequency);
                        printf(" Pos/Struct:");
                    }
                    else if ( DEBUG_MASK & DEBUG_INDEX_WORDS_META)
                        meta_used[ metaID ]++;
                    else
                    {
                        printf(" [%d", metaID);
                        printf(" %d", filenum);
                        printf(" %d (", frequency);
                    }

                    for (i = 0; i < frequency; i++)
                    {
                        if (DEBUG_MASK & (DEBUG_INDEX_ALL | DEBUG_INDEX_WORDS_FULL))
                        //if (sw->verbose >= 4)
                        {
                            if (i)
                                printf(",%d/%x", GET_POSITION(posdata[i]),GET_STRUCTURE(posdata[i]));
                            else
                                printf("%d/%x", GET_POSITION(posdata[i]), GET_STRUCTURE(posdata[i]));
                        }
                        else if ( DEBUG_MASK & DEBUG_INDEX_WORDS)
                        {
                            if (i)
                                 printf(" %d/%x", GET_POSITION(posdata[i]),GET_STRUCTURE(posdata[i]));
                            else
                                 printf("%d/%x", GET_POSITION(posdata[i]),GET_STRUCTURE(posdata[i]));
                        }
                    }
                    if ( DEBUG_MASK & DEBUG_INDEX_WORDS )
                        printf(")]");

                    }  /* End of DB_CheckFileNum */

                    efree(posdata);

                    /* Check for end of worddata */
                    if ((s - worddata) == sz_worddata)
                        break;   /* End of worddata */

                    /* Check for end of current metaID data */
                    if ( metadata_length == (s - start))
                    {
                        filenum = 0;
                        metaID = uncompress2(&s);
                        metadata_length = uncompress2(&s);
                        start = s;
                    }
                }

                if ( DEBUG_MASK & DEBUG_INDEX_WORDS_META)
                {
                    for(i = 0; i <= end_meta; i++)
                    {
                        if ( meta_used[i] )
                            printf( "\t%d", i );
                        meta_used[i] = 0;
                    }
                }


                if ( !( DEBUG_MASK & DEBUG_INDEX_WORDS_META ))
                    printf("\n");

                efree(worddata);
                efree(resultword);
                DB_ReadNextWordInvertedIndex(sw, word,&resultword,&wordID,indexf->DB);
                if (wordID && ((int)((unsigned char)resultword[0]))!= j)
                  efree(resultword);
            }
        }
        DB_EndReadWords(sw, indexf->DB);

        efree( meta_used );
    }






    /* Decode File Info */
    if (DEBUG_MASK & (DEBUG_INDEX_ALL | DEBUG_INDEX_FILES)  )
        dump_index_file_list( sw, indexf, begin, maxhits );


    /* just print filenums and number of words per file for word ranking */
    if (DEBUG_MASK & DEBUG_INDEX_WORD_COUNT )
        dump_word_count( sw, indexf, begin, maxhits );

    DB_Close(sw, indexf->DB);
}


int check_sorted_index( SWISH *sw, IndexFILE *indexf, struct metaEntry *m )
{
    unsigned char *buffer = NULL;
    int     sz_buffer = 0;

    DB_InitReadSortedIndex(sw, indexf->DB);

    /* Get the sorted index of the property */
    DB_ReadSortedIndex(sw, m->metaID, &buffer, &sz_buffer, indexf->DB);

    /* With USE_BTREE (incremental index) the buffer is released when */
    /* indexf->DB is closed, so only free it here otherwise */
#ifndef USE_BTREE
    if ( sz_buffer )
        efree( buffer );
#endif

    /* Size of the sorted index; zero means the table doesn't exist */
    return sz_buffer;
}


void dump_metanames( SWISH *sw, IndexFILE *indexf, int check_presorted )
{
    struct metaEntry *meta_entry;
    int i;

    printf("\n\n-----> METANAMES for %s <-----\n", indexf->line );
    for(i = 0; i < indexf->header.metaCounter; i++)
    {
        meta_entry = indexf->header.metaEntryArray[i];

        printf("%20s : id=%2d type=%2d ",meta_entry->metaName, meta_entry->metaID, meta_entry->metaType);

        if ( is_meta_index( meta_entry ) )
            printf(" META_INDEX  Rank Bias=%3d", meta_entry->rank_bias );



        if ( is_meta_internal( meta_entry ) )
            printf(" META_INTERNAL");


        if ( is_meta_property( meta_entry ) )
        {
            printf(" META_PROP:");

            if  ( is_meta_string(meta_entry) )
            {
                printf("STRING(case:%s) ",
                    is_meta_use_strcoll(meta_entry)
                        ? "strcoll"
                        : is_meta_ignore_case(meta_entry)
                            ? "ignore"
                            : "compare");
                printf("SortKeyLen: %d ", meta_entry->sort_len );
            }

            else if ( is_meta_date(meta_entry) )
                printf("DATE");

            else if ( is_meta_number(meta_entry) )
                printf("NUMBER");

            else
                printf("unknown!");
        }



        if ( check_presorted && check_sorted_index( sw, indexf, meta_entry)  )
            printf(" *presorted*");


        if ( meta_entry->alias )
        {
            struct metaEntry *m = is_meta_index( meta_entry )
                                  ? getMetaNameByID( &indexf->header, meta_entry->alias )
                                  : getPropNameByID( &indexf->header, meta_entry->alias );

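            /* the alias target may be missing from a damaged index */
            if ( m )
                printf(" [Alias for %s (%d)]", m->metaName, m->metaID );
            else
                printf(" [Alias for unknown id %d]", meta_entry->alias );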
        }


        printf("\n");

    }
    printf("\n");
}

/***************************************************************
* Dumps what's currently in the fi->docProperties structure
*
**************************************************************/

void dump_file_properties(IndexFILE * indexf, FileRec *fi )
{
    int j;
    propEntry *prop;
    struct metaEntry *meta_entry;

    if ( !fi->docProperties )  /* may not be any properties */
    {
        printf(" (No Properties)\n");
        return;
    }

    for (j = 0; j < fi->docProperties->n; j++)
    {
        if ( !fi->docProperties->propEntry[j] )
            continue;

        meta_entry = getPropNameByID( &indexf->header, j );
        prop = fi->docProperties->propEntry[j];

        dump_single_property( prop, meta_entry );
    }
}


void dump_single_property( propEntry *prop, struct metaEntry *meta_entry )
{
    char *propstr;
    char proptype = '?';
    int  i;


    if  ( is_meta_string(meta_entry) )
        proptype = 'S';

    else if ( is_meta_date(meta_entry) )
        proptype = 'D';

    else if ( is_meta_number(meta_entry) )
        proptype = 'N';


    i = prop ? prop->propLen : 0;

    printf("  %20s:%2d (%3d) %c:", meta_entry->metaName, meta_entry->metaID, i, proptype );


    if ( !prop )
    {
        printf(" propEntry=NULL\n");
        return;
    }

    propstr = DecodeDocProperty( meta_entry, prop );
    i = 0;
    printf(" \"");

    while ( i < (int)strlen( propstr ) )
    {
        if ( 1 ) // ( isprint( (int)propstr[i] ))
            printf("%c", propstr[i] );

        else if ( propstr[i] == '\n' )
            printf("\n");

        else
            printf("..");

        i++;
        if ( i > 300 )
        {
            printf(" ...");
            break;
        }
    }
    printf("\"\n");

    efree( propstr );
}


/* swish-e-2.4.7/src/swregex.c */
/*
$Id: swregex.c 1736 2005-05-12 15:41:22Z karman $
**


    This file is part of Swish-e.

    Swish-e is free software; you can redistribute it and/or modify
    it under the terms of the GNU General Public License as published by
    the Free Software Foundation; either version 2 of the License, or
    (at your option) any later version.

    Swish-e is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
    GNU General Public License for more details.

    You should have received a copy of the GNU General Public License
    along  with Swish-e; if not, write to the Free Software
    Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
    
    See the COPYING file that accompanies the Swish-e distribution for details
    of the GNU GPL and the special exception available for linking against
    the Swish-e library.
    
Mon May  9 10:57:22 CDT 2005 -- added GPL notice


**
**
** March 16, 2002 - Bill Moseley: moved regex routines out of string.c
**
** This is a collection of routines for building and testing regular expressions
** for use with swish-e.
**
*/

//#include <ctype.h>
#include "swish.h"
#include "mem.h"
//#include "index.h"
//#include "swish_qsort.h"
#include "swstring.h"
#include "error.h"
#include "swregex.h"

static char *regex_replace( char *str, regex_list *regex, int offset, int *matched );


/*********************************************************************
*   Adds a list of patterns to a reg_list.  Calls progerr on failure.
*   Call With:
*       name        = Descriptive name for errors - e.g. the name of the directive currently being processed
*       reg_list    = pointer to the list of regular expressions
*       params      = null-terminated list of pointers to strings
*       regex_pattern = flag to indicate that it's a delimited pattern (instead of just the pattern)
*
*   Returns:
*       void
*
*   ToDo:
*       Really should get passed in *SWISH so it can set an error string and return
*
*   Notes:
*       An expression can be preceded by the word "not" to negate the matching of the pattern.
*   
*
**********************************************************************/
void add_regex_patterns( char *name, regex_list **reg_list, char **params, int regex_pattern )
{
    int     negate;
    char    *word;
    char    *pos;
    char    *ptr;
    int     delimiter;
    int     cflags;
    int     global;
    

    while ( *params )
    {
        negate = 0;
        global = 0;
        cflags = REG_EXTENDED;

        
        if ( (strcasecmp( *params, "not" ) == 0) && *(params+1) )
        {
            negate = 1;
            params++;
        }

        /* Simple case of a string pattern */
        if ( !regex_pattern )
        {
            add_regular_expression( reg_list, *params, NULL, cflags, global, negate );
            params++;
            continue;
        }

        word = *params;       
        delimiter = (int)*word;

        word++; /* past the first delimiter */

        if ( !(pos = strchr( word, delimiter )))
            progerr("%s regex: failed to find search pattern delimiter '%c' in pattern '%s'", name, (char)delimiter, *params );

        *pos = '\0';            


        /* now check for flags */
        for ( ptr = pos + 1; *ptr; ptr++ )
        {
            if ( *ptr == 'i' )
                cflags |= REG_ICASE;
            else if ( *ptr == 'm' )
                cflags |= REG_NEWLINE;
            else
                progerr("%s regexp %s: unknown flag '%c'", name, *params, *ptr );
        }

        add_regular_expression( reg_list, word, NULL, cflags, global, negate );

        *pos = delimiter;  /* put it back */
        params++;
    }
}
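
/*
 * Illustration only, not part of swish-e: a minimal sketch of the delimiter
 * and flag handling used above, assuming a spec such as "/\.html?$/i".
 * The helper name split_delimited_pattern() is hypothetical, and it reports
 * errors with a return code where the routine above calls progerr().
 */

```c
#include <regex.h>
#include <string.h>

/* Hypothetical helper: split "<d>pattern<d>flags" in place, where <d> is
 * whatever character the spec starts with.  On success *pattern points at
 * the NUL-terminated pattern inside spec and *cflags holds the regcomp flags.
 * Returns 0 on success, -1 on a missing delimiter or an unknown flag. */
static int split_delimited_pattern( char *spec, char **pattern, int *cflags )
{
    char  delimiter = *spec;
    char *end;
    char *flag;

    if ( !delimiter )               /* empty spec: nothing to split */
        return -1;

    end = strchr( spec + 1, delimiter );
    if ( !end )                     /* no closing delimiter */
        return -1;

    *end = '\0';                    /* terminate the pattern */
    *pattern = spec + 1;
    *cflags = REG_EXTENDED;

    for ( flag = end + 1; *flag; flag++ )
    {
        if ( *flag == 'i' )
            *cflags |= REG_ICASE;
        else if ( *flag == 'm' )
            *cflags |= REG_NEWLINE;
        else
            return -1;              /* unknown flag */
    }

    return 0;
}
```

/*
 * Unlike add_regex_patterns(), this sketch leaves the closing delimiter
 * overwritten, so the caller should pass a writable copy of the spec.
 */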

/*********************************************************************
*   Adds a single regex replacement pattern
*
*   Call With:
*       name        = Descriptive name for errors - e.g. the name of the directive currently being processed
*       reg_list    = pointer to the list of regular expressions
*       expression  = delimited regex pattern, e.g. /search/replace/flags (flags: i, m, g)
*
*   Returns:
*       void
*
*
*
**********************************************************************/

void  add_replace_expression( char *name, regex_list **reg_list, char *expression )
   
{
    char    *word = estrdup( expression );
    char    *save = word;
    int     delimiter = (int)*word;
    char    *pos;
    char    *pattern = NULL;
    char    *replace = NULL;
    int     cflags = REG_EXTENDED;
    int     global = 0;
    char    *ptr;
    

    word++; /* past the first delimiter */

    if ( !(pos = strchr( word, delimiter )))
        progerr("%s regex: failed to find search pattern delimiter '%c' in pattern '%s'", name, (char)delimiter, word );

    *pos = '\0';            
    pattern = estrdup(word);

    word = pos + 1;  /* now at replace pattern */

    if ( !(pos = strchr( word, delimiter )))
        progerr("%s regex: failed to find replace pattern delimiter '%c' in pattern '%s'", name, (char)delimiter, word );

    *pos = '\0';            
    replace = estrdup(word);


    /* now check for flags */
    for ( ptr = pos + 1; *ptr; ptr++ )
    {
        if ( *ptr == 'i' )
            cflags |= REG_ICASE;

        else if ( *ptr == 'm' )
            cflags |= REG_NEWLINE;

        else if ( *ptr == 'g' )
            global++;
        else
            progerr("%s regexp %s: unknown flag '%c'", name, expression, *ptr );
    }

    add_regular_expression( reg_list, pattern, replace, cflags, global, 0 );

    efree( pattern );
    efree( replace );
    efree( save );
}



/*********************************************************************
*   Match regular expressions
*   Works on a list of expressions, and returns true if *ANY* match
*   
*
**********************************************************************/
int match_regex_list( char *str, regex_list *regex, char *comment )
{
    regmatch_t pmatch[1];
    int        matched;

    while ( regex )
    {
        matched = regex->negate
            ? regexec(&regex->re, str, (size_t) 1, pmatch, 0) != 0
            : regexec(&regex->re, str, (size_t) 1, pmatch, 0) == 0;

        if ( DEBUG_MASK & DEBUG_REGEX )
            printf("%s match %s %c~ m[%s] : %s\n", comment, str, (int)(regex->negate ? '!' : '='), regex->pattern, matched ? "matched" : "nope" );            

        if ( matched )
            return 1;

        regex = regex->next;            
    }

    return 0;
}
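
/*
 * Illustration only, not part of swish-e: a minimal sketch of the
 * "any pattern matches, with optional negation" rule implemented above.
 * struct pattern_node and any_pattern_matches() are hypothetical stand-ins
 * for regex_list and match_regex_list(); only POSIX <regex.h> is assumed.
 */

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical, reduced version of a regex_list node. */
struct pattern_node
{
    regex_t              re;      /* compiled expression */
    int                  negate;  /* non-zero: accept when the regex does NOT match */
    struct pattern_node *next;
};

/* Returns 1 as soon as one node accepts the string, otherwise 0. */
static int any_pattern_matches( const char *str, struct pattern_node *node )
{
    for ( ; node; node = node->next )
    {
        /* regexec() returns 0 on a match; no capture groups are needed here */
        int hit = regexec( &node->re, str, 0, NULL, 0 ) == 0;

        if ( node->negate ? !hit : hit )
            return 1;
    }

    return 0;
}
```

/*
 * Note the consequence of the "any" rule: a negated node accepts every
 * string its pattern does not match, so a list containing one usually
 * accepts most inputs.
 */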


/*********************************************************************
*   Process all the regular expressions in a regex_list
*
*
**********************************************************************/
char *process_regex_list( char *str, regex_list *regex, int *matched )
{
    if ( DEBUG_MASK & DEBUG_REGEX && regex )
        printf("\nOriginal String: '%s'\n", str );

    while ( regex )
    {
        str = regex_replace( str, regex, 0, matched );
        regex = regex->next;

        if ( DEBUG_MASK & DEBUG_REGEX )
            printf("  Result String: '%s'\n", str );

    }

    return str;
}

/*********************************************************************
*  Regular Expression Substitution
*
*   Rewritten 7/31/2001 - general purpose regexp
*
*   Pass in a string and a regex_list pointer
*
*   Returns:
*       a string.  Either the original, or a replacement string
*       Frees passed in string if return is different.
*
*   Notes:
*       Clearly, there must be a library to do this already.  For /g I'm
*       recursively calling this.
*
*
**********************************************************************/
static char *regex_replace( char *str, regex_list *regex, int offset, int *matched )
{
    regmatch_t pmatch[MAXPAR];
    char   *c;
    char   *newstr;
    int     escape = 0;
    int     pos = 0;
    int     j;
    int     last_offset = 0;

    if ( DEBUG_MASK & DEBUG_REGEX )
        printf("replace %s =~ m[%s][%s]: %s\n", str + offset, regex->pattern, regex->replace,
                regexec(&regex->re, str + offset, (size_t) MAXPAR, pmatch, 0) ? "No Match" : "Matched" );
    
    /* Run regex - return original string if no match (might be nice to print error msg?) */
    if ( regexec(&regex->re, str + offset, (size_t) MAXPAR, pmatch, 0) )
        return str;


    /* Flag that a pattern matched */
    (*matched)++;        


    /* allocate a string long enough */
    newstr = (char *) emalloc( offset + strlen( str ) + regex->replace_length + (regex->replace_count * strlen( str )) + 1 );

    /* Copy everything before string */
    for ( j=0; j < offset; j++ )
        newstr[pos++] = str[j];


    /* Copy everything before the match */
    if ( pmatch[0].rm_so > 0 )
        for ( j = offset; j < pmatch[0].rm_so + offset; j++ )
            newstr[pos++] = str[j];


    /* ugly section */
    for ( c = regex->replace; *c; c++ )
    {
        if ( escape )
        {
            newstr[pos++] = *c;
            last_offset = pos;
            escape = 0;
            continue;
        }
        
        if ( *c == '\\' && *(c+1) )
        {
            escape = 1;
            continue;
        }

        if ( '$' == *c && *(c+1) )
        {
            char   *start = NULL;
            char   *end = NULL;

            c++;

            /* chars before match */
            if ( '`' == *c  ) 
            {
                if ( pmatch[0].rm_so + offset > 0 )
                {
                    start = str;
                    end   = str + pmatch[0].rm_so + offset;
                }
            }

            /* chars after match */
            else if ( '\'' == *c )
            {
                start = str + pmatch[0].rm_eo + offset;
                end = str + strlen( str );
            }

            else if ( *c >= '0' && *c <= '9' )
            {
                int i = (int)( *c ) - (int)'0';

                if ( pmatch[i].rm_so != -1 )
                {
                    start = str + pmatch[i].rm_so + offset;
                    end   = str + pmatch[i].rm_eo + offset;
                }
            }

            else  /* just copy the pattern */
            {
                start = c - 1;
                end   = c + 1;
            }

            if ( start )
                for ( ; start < end; start++ )
                    newstr[pos++] = *start;
        }

        /* not a replace pattern, just copy the char */
        else
            newstr[pos++] = *c;

        last_offset = pos;
    }

    newstr[pos] = '\0';

    /* Append any pattern after the string */
    strcat( newstr, str+pmatch[0].rm_eo + offset );
    

    efree( str );
    

    /* This allows /g processing to match repeatedly */
    /* I'm sure there's a way to mess this up and end up with a regex loop... */
    
    if ( regex->global && last_offset < (int)strlen( newstr ) )
        newstr = regex_replace( newstr, regex, last_offset, matched );

    return newstr;
}
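
/*
 * Illustration only, not part of swish-e: a reduced sketch of the
 * substitution step above, assuming POSIX <regex.h>.  The hypothetical
 * replace_first() handles a single match and "$0".."$9" only; it omits the
 * backslash escapes, the $` and $' forms, and the /g recursion, and it uses
 * malloc() rather than emalloc().
 */

```c
#include <regex.h>
#include <stdlib.h>
#include <string.h>

#define NGROUPS 10   /* assumption: groups $0..$9 */

/* Replace the first match of `pattern` in `str` with `replace`.
 * Returns a malloc'd string, or NULL on compile failure, no match,
 * or allocation failure.  The caller frees the result. */
static char *replace_first( const char *str, const char *pattern, const char *replace )
{
    regex_t     re;
    regmatch_t  m[NGROUPS];
    const char *c;
    char       *out;
    size_t      len = strlen( str );
    size_t      pos;

    if ( regcomp( &re, pattern, REG_EXTENDED ) )
        return NULL;

    if ( regexec( &re, str, NGROUPS, m, 0 ) )
    {
        regfree( &re );
        return NULL;
    }
    regfree( &re );   /* the offsets in m[] stay valid */

    /* Worst case: every replacement char expands to the whole input. */
    out = malloc( len + strlen( replace ) * ( len + 1 ) + 1 );
    if ( !out )
        return NULL;

    /* Copy everything before the match */
    pos = (size_t) m[0].rm_so;
    memcpy( out, str, pos );

    for ( c = replace; *c; c++ )
    {
        if ( *c == '$' && c[1] >= '0' && c[1] <= '9' )
        {
            int g = *++c - '0';

            if ( m[g].rm_so != -1 )   /* group took part in the match */
            {
                size_t n = (size_t)( m[g].rm_eo - m[g].rm_so );
                memcpy( out + pos, str + m[g].rm_so, n );
                pos += n;
            }
        }
        else
            out[pos++] = *c;
    }

    /* Append everything after the match, including the terminator */
    strcpy( out + pos, str + m[0].rm_eo );
    return out;
}
```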

/*********************************************************
*  Free a regular expression list
*
*********************************************************/

void free_regex_list( regex_list **reg_list )
{
    regex_list *list = *reg_list;
    regex_list *next;
    while ( list )
    {
        if ( list->replace )
            efree( list->replace );

        if ( list->pattern )
            efree( list->pattern );

        regfree(&list->re);

        next = list->next;
        efree( list );
        list = next;
    }
    *reg_list = NULL;
}
        
/****************************************************************************
*  Create or Add a regular expression to a list
*  pre-compiles expression to check for errors and for speed
*
*  Pattern and replace string passed in are duplicated
*
*
*****************************************************************************/

void add_regular_expression( regex_list **reg_list, char *pattern, char *replace, int cflags, int global, int negate )
{
    regex_list *new_node = emalloc( sizeof( regex_list ) );
    regex_list *last;
    char       *c;
    int         status;
    int         escape = 0;

    if ( (status = regcomp( &new_node->re, pattern, cflags )))
        progerr("Failed to compile regular expression pattern '%s'. Error: %d", pattern, status );



    new_node->pattern = pattern ? estrdup(pattern) : estrdup("");  /* only used for -T debugging */
    new_node->replace = replace ? estrdup(replace) : estrdup("");
    new_node->negate  = negate;

    new_node->global = global;  /* repeat flag */

    new_node->replace_length = strlen( new_node->replace );

    new_node->replace_count = 0;
    for ( c = new_node->replace; *c; c++ )
    {
        if ( escape )
        {
            escape = 0;
            continue;
        }
        
        if ( *c == '\\' )
        {
            escape = 1;
            continue;
        }

        if ( *c == '$' && *(c+1) )
            new_node->replace_count++;
    }
         
            
    new_node->next = NULL;


    if ( *reg_list == NULL )
        *reg_list = new_node;
    else
    {
        /* get end of list */
        for ( last = *reg_list; last->next; last = last->next );

        last->next = new_node;
    }

}

swish-e-2.4.7/src/metanames.h
/*
$Id: metanames.h 1736 2005-05-12 15:41:22Z karman $

    This file is part of Swish-e.

    Swish-e is free software; you can redistribute it and/or modify
    it under the terms of the GNU General Public License as published by
    the Free Software Foundation; either version 2 of the License, or
    (at your option) any later version.

    Swish-e is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
    GNU General Public License for more details.

    You should have received a copy of the GNU General Public License
    along  with Swish-e; if not, write to the Free Software
    Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
    
    See the COPYING file that accompanies the Swish-e distribution for details
    of the GNU GPL and the special exception available for linking against
    the Swish-e library.
    
** Mon May  9 18:19:34 CDT 2005
** added GPL

*/

/* Jose Ruiz 2000/01 Definitions for MetaNames/Fields */

/* META_INDEX and META_PROP could now share the same bit, since props and metas are separated entries */
#define META_INDEX    (1<<0)      /* binary 00000001 */  /* Meta is indexed */
#define META_PROP     (1<<1)      /* binary 00000010 */ /* Also stored as property */
#define META_STRING   (1<<2)      /* String type of property */
#define META_NUMBER   (1<<3)      /* Data is binary number */
#define META_DATE     (1<<4)      /* Data is binary date */
#define META_INTERNAL (1<<5)      /* flag saying this is an internal metaname */
#define META_IGNORE_CASE (1<<6)   /* flag to say ignore case when comparing/sorting */
#define META_NOSTRIP  (1<<7)      /* Do not strip low ascii chars when indexing */
#define META_USE_STRCOLL (1<<8)  /* Use strcoll for sorting string properties */

/* Macros to test the type of a MetaName */
#define is_meta_internal(x)     ((x)->metaType & META_INTERNAL)
#define is_meta_index(x)        ((x)->metaType & META_INDEX)
#define is_meta_property(x)     ((x)->metaType & META_PROP)
#define is_meta_number(x)       ((x)->metaType & META_NUMBER)
#define is_meta_date(x)         ((x)->metaType & META_DATE)
#define is_meta_string(x)       ((x)->metaType & META_STRING)
#define is_meta_ignore_case(x)  ((x)->metaType & META_IGNORE_CASE)
#define is_meta_nostrip(x)      ((x)->metaType & META_NOSTRIP)
#define is_meta_use_strcoll(x)  ((x)->metaType & META_USE_STRCOLL)
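
/*
 * Illustration only, not part of swish-e: a small sketch of how bit-flag
 * test macros like the ones above combine.  Every demo_ / DEMO_ name is
 * hypothetical; the bit positions are copied from the definitions above.
 */

```c
#define DEMO_META_PROP        (1<<1)
#define DEMO_META_STRING      (1<<2)
#define DEMO_META_IGNORE_CASE (1<<6)

/* Hypothetical, reduced stand-in for struct metaEntry. */
struct demo_meta
{
    int metaType;
};

#define demo_is_string(x)       ((x)->metaType & DEMO_META_STRING)
#define demo_is_ignore_case(x)  ((x)->metaType & DEMO_META_IGNORE_CASE)

/* Returns 1 when a string property should be compared case-insensitively. */
static int demo_compare_nocase( const struct demo_meta *m )
{
    return demo_is_string( m ) && demo_is_ignore_case( m );
}
```

/*
 * The test macros yield the masked bit rather than 1, so they are meant
 * for truth tests and should not be compared against 1.
 */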

int properties_compatible( struct metaEntry *m1, struct metaEntry *m2 );

void add_default_metanames(IndexFILE *);

struct metaEntry * getMetaNameByNameNoAlias(INDEXDATAHEADER * header, char *word);
struct metaEntry * getMetaNameByName(INDEXDATAHEADER *, char *);
struct metaEntry * getMetaNameByID(INDEXDATAHEADER *, int);

struct metaEntry * getPropNameByNameNoAlias(INDEXDATAHEADER * header, char *word);
struct metaEntry * getPropNameByName(INDEXDATAHEADER *, char *);
struct metaEntry * getPropNameByID(INDEXDATAHEADER *, int);


struct metaEntry * addMetaEntry(INDEXDATAHEADER *header, char *metaname, int metaType, int metaID);
struct metaEntry * addNewMetaEntry(INDEXDATAHEADER *header, char *metaWord, int metaType, int metaID);
struct metaEntry * cloneMetaEntry(INDEXDATAHEADER *header, struct metaEntry *meta );

void freeMetaEntries( INDEXDATAHEADER * );
int isDontBumpMetaName(struct swline *,char *tag);
int is_meta_entry( struct metaEntry *meta_entry, char *name );
void ClearInMetaFlags(INDEXDATAHEADER * header);

void init_property_list(INDEXDATAHEADER *header);
swish-e-2.4.7/TODO
A rough to-do list.

Feel free to add/update items here (and feel freer to write code so that
we can remove them).

NOTE: This is not the same as the 3.0 planning document. See the pod/ dir for that.

=====================================================================


** Test USE_BTREE and the ARRAY code for both speed and accuracy.
Wed Feb  9 09:58:28 PST 2005

** Go over config.h

Seems like the intent was to create more user-visible config options, rather than "hiding" them in config.h.

** Configfile also for "read/search" operations


** enhance search (regex, wildmatch, exact match)
[ from the comments: ]
- New config directive SearchMode [exact|partial|wildmatch|regexpr]
- New cmdline option (to be defined)

Allow specifying how the search will be done:
- exact string (no wildcards) (100% length and partial)
- wildcard search (?, *): e.g.: wor?d*to*tch
- regex search: e.g.: [a-z][0-9]*abc$
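
A possible starting point for the wildcard mode, as a sketch only: the
function name wildmatch() and its semantics ('?' matches exactly one
character, '*' matches any run including none) are assumptions, and nothing
like this exists in the swish-e sources yet.

```c
/* Returns 1 if `text` matches `pattern`, otherwise 0.
 * '?' matches exactly one character; '*' matches any run of characters. */
static int wildmatch( const char *pattern, const char *text )
{
    const char *star = 0;    /* position of the most recent '*' in pattern */
    const char *retry = 0;   /* text position that '*' currently extends to */

    while ( *text )
    {
        if ( *pattern == '*' )
        {
            star = pattern++;    /* remember the star, try matching zero chars */
            retry = text;
        }
        else if ( *pattern == '?' || *pattern == *text )
        {
            pattern++;
            text++;
        }
        else if ( star )
        {
            pattern = star + 1;  /* backtrack: let the star take one more char */
            text = ++retry;
        }
        else
            return 0;
    }

    /* Only trailing stars may remain once the text is used up */
    while ( *pattern == '*' )
        pattern++;

    return *pattern == '\0';
}
```

With the example pattern above, wildmatch("wor?d*to*tch", "worldXtoYYtch")
succeeds while wildmatch("wor?d*to*tch", "word") does not.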


** Code cleanup...

Sisyphus would be proud...

** Dealing with "and" "or" "not" "*" in searches

seems like part of your quest for a better query parser? 
read any non-sleep-inducing Bison lately?


** Altavista like search

This is hard to track. Looks like rasc put it in there, 
but now it looks like it has been removed. Any ideas? It's marked as 95% complete.



** licensing issues
------------------

* files marked with karman note from 5.09.05 re: how original they are
* is httpserver.c still used?

swish-e-2.4.7/aclocal.m4
# generated automatically by aclocal 1.9.6 -*- Autoconf -*-

# Copyright (C) 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004,
# 2005  Free Software Foundation, Inc.
# This file is free software; the Free Software Foundation
# gives unlimited permission to copy and/or distribute it,
# with or without modifications, as long as this notice is preserved.

# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY, to the extent permitted by law; without
# even the implied warranty of MERCHANTABILITY or FITNESS FOR A
# PARTICULAR PURPOSE.

# libtool.m4 - Configure libtool for the host system. -*-Autoconf-*-

# serial 48 AC_PROG_LIBTOOL


# AC_PROVIDE_IFELSE(MACRO-NAME, IF-PROVIDED, IF-NOT-PROVIDED)
# -----------------------------------------------------------
# If this macro is not defined by Autoconf, define it here.
m4_ifdef([AC_PROVIDE_IFELSE],
         [],
         [m4_define([AC_PROVIDE_IFELSE],
	         [m4_ifdef([AC_PROVIDE_$1],
		           [$2], [$3])])])


# AC_PROG_LIBTOOL
# ---------------
AC_DEFUN([AC_PROG_LIBTOOL],
[AC_REQUIRE([_AC_PROG_LIBTOOL])dnl
dnl If AC_PROG_CXX has already been expanded, run AC_LIBTOOL_CXX
dnl immediately, otherwise, hook it in at the end of AC_PROG_CXX.
  AC_PROVIDE_IFELSE([AC_PROG_CXX],
    [AC_LIBTOOL_CXX],
    [define([AC_PROG_CXX], defn([AC_PROG_CXX])[AC_LIBTOOL_CXX
  ])])
dnl And a similar setup for Fortran 77 support
  AC_PROVIDE_IFELSE([AC_PROG_F77],
    [AC_LIBTOOL_F77],
    [define([AC_PROG_F77], defn([AC_PROG_F77])[AC_LIBTOOL_F77
])])

dnl Quote A][M_PROG_GCJ so that aclocal doesn't bring it in needlessly.
dnl If either AC_PROG_GCJ or A][M_PROG_GCJ have already been expanded, run
dnl AC_LIBTOOL_GCJ immediately, otherwise, hook it in at the end of both.
  AC_PROVIDE_IFELSE([AC_PROG_GCJ],
    [AC_LIBTOOL_GCJ],
    [AC_PROVIDE_IFELSE([A][M_PROG_GCJ],
      [AC_LIBTOOL_GCJ],
      [AC_PROVIDE_IFELSE([LT_AC_PROG_GCJ],
	[AC_LIBTOOL_GCJ],
      [ifdef([AC_PROG_GCJ],
	     [define([AC_PROG_GCJ], defn([AC_PROG_GCJ])[AC_LIBTOOL_GCJ])])
       ifdef([A][M_PROG_GCJ],
	     [define([A][M_PROG_GCJ], defn([A][M_PROG_GCJ])[AC_LIBTOOL_GCJ])])
       ifdef([LT_AC_PROG_GCJ],
	     [define([LT_AC_PROG_GCJ],
		defn([LT_AC_PROG_GCJ])[AC_LIBTOOL_GCJ])])])])
])])# AC_PROG_LIBTOOL


# _AC_PROG_LIBTOOL
# ----------------
AC_DEFUN([_AC_PROG_LIBTOOL],
[AC_REQUIRE([AC_LIBTOOL_SETUP])dnl
AC_BEFORE([$0],[AC_LIBTOOL_CXX])dnl
AC_BEFORE([$0],[AC_LIBTOOL_F77])dnl
AC_BEFORE([$0],[AC_LIBTOOL_GCJ])dnl

# This can be used to rebuild libtool when needed
LIBTOOL_DEPS="$ac_aux_dir/ltmain.sh"

# Always use our own libtool.
LIBTOOL='$(SHELL) $(top_builddir)/libtool'
AC_SUBST(LIBTOOL)dnl

# Prevent multiple expansion
define([AC_PROG_LIBTOOL], [])
])# _AC_PROG_LIBTOOL


# AC_LIBTOOL_SETUP
# ----------------
AC_DEFUN([AC_LIBTOOL_SETUP],
[AC_PREREQ(2.50)dnl
AC_REQUIRE([AC_ENABLE_SHARED])dnl
AC_REQUIRE([AC_ENABLE_STATIC])dnl
AC_REQUIRE([AC_ENABLE_FAST_INSTALL])dnl
AC_REQUIRE([AC_CANONICAL_HOST])dnl
AC_REQUIRE([AC_CANONICAL_BUILD])dnl
AC_REQUIRE([AC_PROG_CC])dnl
AC_REQUIRE([AC_PROG_LD])dnl
AC_REQUIRE([AC_PROG_LD_RELOAD_FLAG])dnl
AC_REQUIRE([AC_PROG_NM])dnl

AC_REQUIRE([AC_PROG_LN_S])dnl
AC_REQUIRE([AC_DEPLIBS_CHECK_METHOD])dnl
# Autoconf 2.13's AC_OBJEXT and AC_EXEEXT macros only work for C compilers!
AC_REQUIRE([AC_OBJEXT])dnl
AC_REQUIRE([AC_EXEEXT])dnl
dnl

AC_LIBTOOL_SYS_MAX_CMD_LEN
AC_LIBTOOL_SYS_GLOBAL_SYMBOL_PIPE
AC_LIBTOOL_OBJDIR

AC_REQUIRE([_LT_AC_SYS_COMPILER])dnl
_LT_AC_PROG_ECHO_BACKSLASH

case $host_os in
aix3*)
  # AIX sometimes has problems with the GCC collect2 program.  For some
  # reason, if we set the COLLECT_NAMES environment variable, the problems
  # vanish in a puff of smoke.
  if test "X${COLLECT_NAMES+set}" != Xset; then
    COLLECT_NAMES=
    export COLLECT_NAMES
  fi
  ;;
esac

# Sed substitution that helps us do robust quoting.  It backslashifies
# metacharacters that are still active within double-quoted strings.
Xsed='sed -e 1s/^X//'
[sed_quote_subst='s/\([\\"\\`$\\\\]\)/\\\1/g']

# Same as above, but do not quote variable references.
[double_quote_subst='s/\([\\"\\`\\\\]\)/\\\1/g']

# Sed substitution to delay expansion of an escaped shell variable in a
# double_quote_subst'ed string.
delay_variable_subst='s/\\\\\\\\\\\$/\\\\\\$/g'

# Sed substitution to avoid accidental globbing in evaled expressions
no_glob_subst='s/\*/\\\*/g'

# Constants:
rm="rm -f"

# Global variables:
default_ofile=libtool
can_build_shared=yes

# All known linkers require a `.a' archive for static linking (except MSVC,
# which needs '.lib').
libext=a
ltmain="$ac_aux_dir/ltmain.sh"
ofile="$default_ofile"
with_gnu_ld="$lt_cv_prog_gnu_ld"

AC_CHECK_TOOL(AR, ar, false)
AC_CHECK_TOOL(RANLIB, ranlib, :)
AC_CHECK_TOOL(STRIP, strip, :)

old_CC="$CC"
old_CFLAGS="$CFLAGS"

# Set sane defaults for various variables
test -z "$AR" && AR=ar
test -z "$AR_FLAGS" && AR_FLAGS=cru
test -z "$AS" && AS=as
test -z "$CC" && CC=cc
test -z "$LTCC" && LTCC=$CC
test -z "$LTCFLAGS" && LTCFLAGS=$CFLAGS
test -z "$DLLTOOL" && DLLTOOL=dlltool
test -z "$LD" && LD=ld
test -z "$LN_S" && LN_S="ln -s"
test -z "$MAGIC_CMD" && MAGIC_CMD=file
test -z "$NM" && NM=nm
test -z "$SED" && SED=sed
test -z "$OBJDUMP" && OBJDUMP=objdump
test -z "$RANLIB" && RANLIB=:
test -z "$STRIP" && STRIP=:
test -z "$ac_objext" && ac_objext=o

# Determine commands to create old-style static archives.
old_archive_cmds='$AR $AR_FLAGS $oldlib$oldobjs$old_deplibs'
old_postinstall_cmds='chmod 644 $oldlib'
old_postuninstall_cmds=

if test -n "$RANLIB"; then
  case $host_os in
  openbsd*)
    old_postinstall_cmds="$old_postinstall_cmds~\$RANLIB -t \$oldlib"
    ;;
  *)
    old_postinstall_cmds="$old_postinstall_cmds~\$RANLIB \$oldlib"
    ;;
  esac
  old_archive_cmds="$old_archive_cmds~\$RANLIB \$oldlib"
fi

_LT_CC_BASENAME([$compiler])

# Only perform the check for file, if the check method requires it
case $deplibs_check_method in
file_magic*)
  if test "$file_magic_cmd" = '$MAGIC_CMD'; then
    AC_PATH_MAGIC
  fi
  ;;
esac

AC_PROVIDE_IFELSE([AC_LIBTOOL_DLOPEN], enable_dlopen=yes, enable_dlopen=no)
AC_PROVIDE_IFELSE([AC_LIBTOOL_WIN32_DLL],
enable_win32_dll=yes, enable_win32_dll=no)

AC_ARG_ENABLE([libtool-lock],
    [AC_HELP_STRING([--disable-libtool-lock],
	[avoid locking (might break parallel builds)])])
test "x$enable_libtool_lock" != xno && enable_libtool_lock=yes

AC_ARG_WITH([pic],
    [AC_HELP_STRING([--with-pic],
	[try to use only PIC/non-PIC objects @<:@default=use both@:>@])],
    [pic_mode="$withval"],
    [pic_mode=default])
test -z "$pic_mode" && pic_mode=default

# Use C for the default configuration in the libtool script
tagname=
AC_LIBTOOL_LANG_C_CONFIG
_LT_AC_TAGCONFIG
])# AC_LIBTOOL_SETUP


# _LT_AC_SYS_COMPILER
# -------------------
AC_DEFUN([_LT_AC_SYS_COMPILER],
[AC_REQUIRE([AC_PROG_CC])dnl

# If no C compiler was specified, use CC.
LTCC=${LTCC-"$CC"}

# If no C compiler flags were specified, use CFLAGS.
LTCFLAGS=${LTCFLAGS-"$CFLAGS"}

# Allow CC to be a program name with arguments.
compiler=$CC
])# _LT_AC_SYS_COMPILER


# _LT_CC_BASENAME(CC)
# -------------------
# Calculate cc_basename.  Skip known compiler wrappers and cross-prefix.
AC_DEFUN([_LT_CC_BASENAME],
[for cc_temp in $1""; do
  case $cc_temp in
    compile | *[[\\/]]compile | ccache | *[[\\/]]ccache ) ;;
    distcc | *[[\\/]]distcc | purify | *[[\\/]]purify ) ;;
    \-*) ;;
    *) break;;
  esac
done
cc_basename=`$echo "X$cc_temp" | $Xsed -e 's%.*/%%' -e "s%^$host_alias-%%"`
])


# _LT_COMPILER_BOILERPLATE
# ------------------------
# Check for compiler boilerplate output or warnings with
# the simple compiler test code.
AC_DEFUN([_LT_COMPILER_BOILERPLATE],
[ac_outfile=conftest.$ac_objext
printf "$lt_simple_compile_test_code" >conftest.$ac_ext
eval "$ac_compile" 2>&1 >/dev/null | $SED '/^$/d; /^ *+/d' >conftest.err
_lt_compiler_boilerplate=`cat conftest.err`
$rm conftest*
])# _LT_COMPILER_BOILERPLATE


# _LT_LINKER_BOILERPLATE
# ----------------------
# Check for linker boilerplate output or warnings with
# the simple link test code.
AC_DEFUN([_LT_LINKER_BOILERPLATE],
[ac_outfile=conftest.$ac_objext
printf "$lt_simple_link_test_code" >conftest.$ac_ext
eval "$ac_link" 2>&1 >/dev/null | $SED '/^$/d; /^ *+/d' >conftest.err
_lt_linker_boilerplate=`cat conftest.err`
$rm conftest*
])# _LT_LINKER_BOILERPLATE


# _LT_AC_SYS_LIBPATH_AIX
# ----------------------
# Links a minimal program and checks the executable
# for the system default hardcoded library path. In most cases,
# this is /usr/lib:/lib, but when the MPI compilers are used
# the location of the communication and MPI libs are included too.
# If we don't find anything, use the default library path according
# to the aix ld manual.
AC_DEFUN([_LT_AC_SYS_LIBPATH_AIX],
[AC_LINK_IFELSE(AC_LANG_PROGRAM,[
aix_libpath=`dump -H conftest$ac_exeext 2>/dev/null | $SED -n -e '/Import File Strings/,/^$/ { /^0/ { s/^0  *\(.*\)$/\1/; p; }
}'`
# Check for a 64-bit object if we didn't find anything.
if test -z "$aix_libpath"; then aix_libpath=`dump -HX64 conftest$ac_exeext 2>/dev/null | $SED -n -e '/Import File Strings/,/^$/ { /^0/ { s/^0  *\(.*\)$/\1/; p; }
}'`; fi],[])
if test -z "$aix_libpath"; then aix_libpath="/usr/lib:/lib"; fi
])# _LT_AC_SYS_LIBPATH_AIX


# _LT_AC_SHELL_INIT(ARG)
# ----------------------
AC_DEFUN([_LT_AC_SHELL_INIT],
[ifdef([AC_DIVERSION_NOTICE],
	     [AC_DIVERT_PUSH(AC_DIVERSION_NOTICE)],
	 [AC_DIVERT_PUSH(NOTICE)])
$1
AC_DIVERT_POP
])# _LT_AC_SHELL_INIT


# _LT_AC_PROG_ECHO_BACKSLASH
# --------------------------
# Add some code to the start of the generated configure script which
# will find an echo command which doesn't interpret backslashes.
AC_DEFUN([_LT_AC_PROG_ECHO_BACKSLASH],
[_LT_AC_SHELL_INIT([
# Check that we are running under the correct shell.
SHELL=${CONFIG_SHELL-/bin/sh}

case X$ECHO in
X*--fallback-echo)
  # Remove one level of quotation (which was required for Make).
  ECHO=`echo "$ECHO" | sed 's,\\\\\[$]\\[$]0,'[$]0','`
  ;;
esac

echo=${ECHO-echo}
if test "X[$]1" = X--no-reexec; then
  # Discard the --no-reexec flag, and continue.
  shift
elif test "X[$]1" = X--fallback-echo; then
  # Avoid inline document here, it may be left over
  :
elif test "X`($echo '\t') 2>/dev/null`" = 'X\t' ; then
  # Yippee, $echo works!
  :
else
  # Restart under the correct shell.
  exec $SHELL "[$]0" --no-reexec ${1+"[$]@"}
fi

if test "X[$]1" = X--fallback-echo; then
  # used as fallback echo
  shift
  cat <<EOF
[$]*
EOF
  exit 0
fi

# The HP-UX ksh and POSIX shell print the target directory to stdout
# if CDPATH is set.
(unset CDPATH) >/dev/null 2>&1 && unset CDPATH

if test -z "$ECHO"; then
if test "X${echo_test_string+set}" != Xset; then
# find a string as large as possible, as long as the shell can cope with it
  for cmd in 'sed 50q "[$]0"' 'sed 20q "[$]0"' 'sed 10q "[$]0"' 'sed 2q "[$]0"' 'echo test'; do
    # expected sizes: less than 2Kb, 1Kb, 512 bytes, 16 bytes, ...
    if (echo_test_string=`eval $cmd`) 2>/dev/null &&
       echo_test_string=`eval $cmd` &&
       (test "X$echo_test_string" = "X$echo_test_string") 2>/dev/null
    then
      break
    fi
  done
fi

if test "X`($echo '\t') 2>/dev/null`" = 'X\t' &&
   echo_testing_string=`($echo "$echo_test_string") 2>/dev/null` &&
   test "X$echo_testing_string" = "X$echo_test_string"; then
  :
else
  # The Solaris, AIX, and Digital Unix default echo programs unquote
  # backslashes.  This makes it impossible to quote backslashes using
  #   echo "$something" | sed 's/\\/\\\\/g'
  #
  # So, first we look for a working echo in the user's PATH.

  lt_save_ifs="$IFS"; IFS=$PATH_SEPARATOR
  for dir in $PATH /usr/ucb; do
    IFS="$lt_save_ifs"
    if (test -f $dir/echo || test -f $dir/echo$ac_exeext) &&
       test "X`($dir/echo '\t') 2>/dev/null`" = 'X\t' &&
       echo_testing_string=`($dir/echo "$echo_test_string") 2>/dev/null` &&
       test "X$echo_testing_string" = "X$echo_test_string"; then
      echo="$dir/echo"
      break
    fi
  done
  IFS="$lt_save_ifs"

  if test "X$echo" = Xecho; then
    # We didn't find a better echo, so look for alternatives.
    if test "X`(print -r '\t') 2>/dev/null`" = 'X\t' &&
       echo_testing_string=`(print -r "$echo_test_string") 2>/dev/null` &&
       test "X$echo_testing_string" = "X$echo_test_string"; then
      # This shell has a builtin print -r that does the trick.
      echo='print -r'
    elif (test -f /bin/ksh || test -f /bin/ksh$ac_exeext) &&
	 test "X$CONFIG_SHELL" != X/bin/ksh; then
      # If we have ksh, try running configure again with it.
      ORIGINAL_CONFIG_SHELL=${CONFIG_SHELL-/bin/sh}
      export ORIGINAL_CONFIG_SHELL
      CONFIG_SHELL=/bin/ksh
      export CONFIG_SHELL
      exec $CONFIG_SHELL "[$]0" --no-reexec ${1+"[$]@"}
    else
      # Try using printf.
      echo='printf %s\n'
      if test "X`($echo '\t') 2>/dev/null`" = 'X\t' &&
	 echo_testing_string=`($echo "$echo_test_string") 2>/dev/null` &&
	 test "X$echo_testing_string" = "X$echo_test_string"; then
	# Cool, printf works
	:
      elif echo_testing_string=`($ORIGINAL_CONFIG_SHELL "[$]0" --fallback-echo '\t') 2>/dev/null` &&
	   test "X$echo_testing_string" = 'X\t' &&
	   echo_testing_string=`($ORIGINAL_CONFIG_SHELL "[$]0" --fallback-echo "$echo_test_string") 2>/dev/null` &&
	   test "X$echo_testing_string" = "X$echo_test_string"; then
	CONFIG_SHELL=$ORIGINAL_CONFIG_SHELL
	export CONFIG_SHELL
	SHELL="$CONFIG_SHELL"
	export SHELL
	echo="$CONFIG_SHELL [$]0 --fallback-echo"
      elif echo_testing_string=`($CONFIG_SHELL "[$]0" --fallback-echo '\t') 2>/dev/null` &&
	   test "X$echo_testing_string" = 'X\t' &&
	   echo_testing_string=`($CONFIG_SHELL "[$]0" --fallback-echo "$echo_test_string") 2>/dev/null` &&
	   test "X$echo_testing_string" = "X$echo_test_string"; then
	echo="$CONFIG_SHELL [$]0 --fallback-echo"
      else
	# maybe with a smaller string...
	prev=:

	for cmd in 'echo test' 'sed 2q "[$]0"' 'sed 10q "[$]0"' 'sed 20q "[$]0"' 'sed 50q "[$]0"'; do
	  if (test "X$echo_test_string" = "X`eval $cmd`") 2>/dev/null
	  then
	    break
	  fi
	  prev="$cmd"
	done

	if test "$prev" != 'sed 50q "[$]0"'; then
	  echo_test_string=`eval $prev`
	  export echo_test_string
	  exec ${ORIGINAL_CONFIG_SHELL-${CONFIG_SHELL-/bin/sh}} "[$]0" ${1+"[$]@"}
	else
	  # Oops.  We lost completely, so just stick with echo.
	  echo=echo
	fi
      fi
    fi
  fi
fi
fi

# Copy echo and quote the copy suitably for passing to libtool from
# the Makefile, instead of quoting the original, which is used later.
ECHO=$echo
if test "X$ECHO" = "X$CONFIG_SHELL [$]0 --fallback-echo"; then
   ECHO="$CONFIG_SHELL \\\$\[$]0 --fallback-echo"
fi

AC_SUBST(ECHO)
])])# _LT_AC_PROG_ECHO_BACKSLASH


# _LT_AC_LOCK
# -----------
AC_DEFUN([_LT_AC_LOCK],
[AC_ARG_ENABLE([libtool-lock],
    [AC_HELP_STRING([--disable-libtool-lock],
	[avoid locking (might break parallel builds)])])
test "x$enable_libtool_lock" != xno && enable_libtool_lock=yes

# Some flags need to be propagated to the compiler or linker for good
# libtool support.
case $host in
ia64-*-hpux*)
  # Find out which ABI we are using.
  echo 'int i;' > conftest.$ac_ext
  if AC_TRY_EVAL(ac_compile); then
    case `/usr/bin/file conftest.$ac_objext` in
    *ELF-32*)
      HPUX_IA64_MODE="32"
      ;;
    *ELF-64*)
      HPUX_IA64_MODE="64"
      ;;
    esac
  fi
  rm -rf conftest*
  ;;
*-*-irix6*)
  # Find out which ABI we are using.
  echo '[#]line __oline__ "configure"' > conftest.$ac_ext
  if AC_TRY_EVAL(ac_compile); then
   if test "$lt_cv_prog_gnu_ld" = yes; then
    case `/usr/bin/file conftest.$ac_objext` in
    *32-bit*)
      LD="${LD-ld} -melf32bsmip"
      ;;
    *N32*)
      LD="${LD-ld} -melf32bmipn32"
      ;;
    *64-bit*)
      LD="${LD-ld} -melf64bmip"
      ;;
    esac
   else
    case `/usr/bin/file conftest.$ac_objext` in
    *32-bit*)
      LD="${LD-ld} -32"
      ;;
    *N32*)
      LD="${LD-ld} -n32"
      ;;
    *64-bit*)
      LD="${LD-ld} -64"
      ;;
    esac
   fi
  fi
  rm -rf conftest*
  ;;

x86_64-*linux*|ppc*-*linux*|powerpc*-*linux*|s390*-*linux*|sparc*-*linux*)
  # Find out which ABI we are using.
  echo 'int i;' > conftest.$ac_ext
  if AC_TRY_EVAL(ac_compile); then
    case `/usr/bin/file conftest.o` in
    *32-bit*)
      case $host in
        x86_64-*linux*)
          LD="${LD-ld} -m elf_i386"
          ;;
        ppc64-*linux*|powerpc64-*linux*)
          LD="${LD-ld} -m elf32ppclinux"
          ;;
        s390x-*linux*)
          LD="${LD-ld} -m elf_s390"
          ;;
        sparc64-*linux*)
          LD="${LD-ld} -m elf32_sparc"
          ;;
      esac
      ;;
    *64-bit*)
      case $host in
        x86_64-*linux*)
          LD="${LD-ld} -m elf_x86_64"
          ;;
        ppc*-*linux*|powerpc*-*linux*)
          LD="${LD-ld} -m elf64ppc"
          ;;
        s390*-*linux*)
          LD="${LD-ld} -m elf64_s390"
          ;;
        sparc*-*linux*)
          LD="${LD-ld} -m elf64_sparc"
          ;;
      esac
      ;;
    esac
  fi
  rm -rf conftest*
  ;;

*-*-sco3.2v5*)
  # On SCO OpenServer 5, we need -belf to get full-featured binaries.
  SAVE_CFLAGS="$CFLAGS"
  CFLAGS="$CFLAGS -belf"
  AC_CACHE_CHECK([whether the C compiler needs -belf], lt_cv_cc_needs_belf,
    [AC_LANG_PUSH(C)
     AC_TRY_LINK([],[],[lt_cv_cc_needs_belf=yes],[lt_cv_cc_needs_belf=no])
     AC_LANG_POP])
  if test x"$lt_cv_cc_needs_belf" != x"yes"; then
    # this is probably gcc 2.8.0, egcs 1.0 or newer; no need for -belf
    CFLAGS="$SAVE_CFLAGS"
  fi
  ;;
sparc*-*solaris*)
  # Find out which ABI we are using.
  echo 'int i;' > conftest.$ac_ext
  if AC_TRY_EVAL(ac_compile); then
    case `/usr/bin/file conftest.o` in
    *64-bit*)
      case $lt_cv_prog_gnu_ld in
      yes*) LD="${LD-ld} -m elf64_sparc" ;;
      *)    LD="${LD-ld} -64" ;;
      esac
      ;;
    esac
  fi
  rm -rf conftest*
  ;;

AC_PROVIDE_IFELSE([AC_LIBTOOL_WIN32_DLL],
[*-*-cygwin* | *-*-mingw* | *-*-pw32*)
  AC_CHECK_TOOL(DLLTOOL, dlltool, false)
  AC_CHECK_TOOL(AS, as, false)
  AC_CHECK_TOOL(OBJDUMP, objdump, false)
  ;;
  ])
esac

need_locks="$enable_libtool_lock"

])# _LT_AC_LOCK


# AC_LIBTOOL_COMPILER_OPTION(MESSAGE, VARIABLE-NAME, FLAGS,
#		[OUTPUT-FILE], [ACTION-SUCCESS], [ACTION-FAILURE])
# ----------------------------------------------------------------
# Check whether the given compiler option works
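#
# A minimal usage sketch; the cache variable and shell variable names
# below are hypothetical, chosen only for illustration:
#
#   AC_LIBTOOL_COMPILER_OPTION([if $compiler supports -fno-rtti],
#     [lt_cv_example_no_rtti], [-fno-rtti], [],
#     [example_flags="-fno-rtti"], [example_flags=])
#
# The check answers "yes" only when the compile succeeds, produces the
# output file, and emits no diagnostics beyond the usual boilerplate.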
AC_DEFUN([AC_LIBTOOL_COMPILER_OPTION],
[AC_REQUIRE([LT_AC_PROG_SED])
AC_CACHE_CHECK([$1], [$2],
  [$2=no
  ifelse([$4], , [ac_outfile=conftest.$ac_objext], [ac_outfile=$4])
   printf "$lt_simple_compile_test_code" > conftest.$ac_ext
   lt_compiler_flag="$3"
   # Insert the option either (1) after the last *FLAGS variable, or
   # (2) before a word containing "conftest.", or (3) at the end.
   # Note that $ac_compile itself does not contain backslashes and begins
   # with a dollar sign (not a hyphen), so the echo should work correctly.
   # The option is referenced via a variable to avoid confusing sed.
   lt_compile=`echo "$ac_compile" | $SED \
   -e 's:.*FLAGS}\{0,1\} :&$lt_compiler_flag :; t' \
   -e 's: [[^ ]]*conftest\.: $lt_compiler_flag&:; t' \
   -e 's:$: $lt_compiler_flag:'`
   (eval echo "\"\$as_me:__oline__: $lt_compile\"" >&AS_MESSAGE_LOG_FD)
   (eval "$lt_compile" 2>conftest.err)
   ac_status=$?
   cat conftest.err >&AS_MESSAGE_LOG_FD
   echo "$as_me:__oline__: \$? = $ac_status" >&AS_MESSAGE_LOG_FD
   if (exit $ac_status) && test -s "$ac_outfile"; then
     # The compiler can only warn and ignore the option if not recognized
     # So say no if there are warnings other than the usual output.
     $echo "X$_lt_compiler_boilerplate" | $Xsed -e '/^$/d' >conftest.exp
     $SED '/^$/d; /^ *+/d' conftest.err >conftest.er2
     if test ! -s conftest.er2 || diff conftest.exp conftest.er2 >/dev/null; then
       $2=yes
     fi
   fi
   $rm conftest*
])

if test x"[$]$2" = xyes; then
    ifelse([$5], , :, [$5])
else
    ifelse([$6], , :, [$6])
fi
])# AC_LIBTOOL_COMPILER_OPTION


# AC_LIBTOOL_LINKER_OPTION(MESSAGE, VARIABLE-NAME, FLAGS,
#                          [ACTION-SUCCESS], [ACTION-FAILURE])
# ------------------------------------------------------------
# Check whether the given linker option works
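#
# A minimal usage sketch; the cache variable name, the flag, and the
# shell variable are hypothetical examples, not taken from this file:
#
#   AC_LIBTOOL_LINKER_OPTION([if $CC accepts -static],
#     [lt_cv_example_static_works], [-static],
#     [example_static=yes], [example_static=no])
#
# FLAGS are appended to LDFLAGS only for the duration of the test link;
# the previous LDFLAGS value is restored afterwards.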
AC_DEFUN([AC_LIBTOOL_LINKER_OPTION],
[AC_CACHE_CHECK([$1], [$2],
  [$2=no
   save_LDFLAGS="$LDFLAGS"
   LDFLAGS="$LDFLAGS $3"
   printf "$lt_simple_link_test_code" > conftest.$ac_ext
   if (eval $ac_link 2>conftest.err) && test -s conftest$ac_exeext; then
     # The linker can only warn and ignore the option if not recognized
     # So say no if there are warnings
     if test -s conftest.err; then
       # Append any errors to the config.log.
       cat conftest.err 1>&AS_MESSAGE_LOG_FD
       $echo "X$_lt_linker_boilerplate" | $Xsed -e '/^$/d' > conftest.exp
       $SED '/^$/d; /^ *+/d' conftest.err >conftest.er2
       if diff conftest.exp conftest.er2 >/dev/null; then
         $2=yes
       fi
     else
       $2=yes
     fi
   fi
   $rm conftest*
   LDFLAGS="$save_LDFLAGS"
])

if test x"[$]$2" = xyes; then
    ifelse([$4], , :, [$4])
else
    ifelse([$5], , :, [$5])
fi
])# AC_LIBTOOL_LINKER_OPTION


# AC_LIBTOOL_SYS_MAX_CMD_LEN
# --------------------------
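# Sets lt_cv_sys_max_cmd_len to a conservative byte count, or to -1 where
# the platform imposes no limit.  In the generic branch the probe string
# starts at 4 bytes and doubles for as long as it can still be passed to a
# command, at most 17 times (4 * 2^17 = 524288 bytes, i.e. 1/2 MB); the
# last length that worked is then halved as a safety margin.
#
# A sketch of how a caller might consume the result ($cmd_len is a
# hypothetical variable holding the length of a candidate command):
#
#   if test "$lt_cv_sys_max_cmd_len" = -1 ||
#      test "$cmd_len" -le "$lt_cv_sys_max_cmd_len"; then
#     : # the command fits on one command line
#   fi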
AC_DEFUN([AC_LIBTOOL_SYS_MAX_CMD_LEN],
[# find the maximum length of command line arguments
AC_MSG_CHECKING([the maximum length of command line arguments])
AC_CACHE_VAL([lt_cv_sys_max_cmd_len], [dnl
  i=0
  teststring="ABCD"

  case $build_os in
  msdosdjgpp*)
    # On DJGPP, this test can blow up pretty badly due to problems in libc
    # (any single argument exceeding 2000 bytes causes a buffer overrun
    # during glob expansion).  Even if it were fixed, the result of this
    # check would be larger than it should be.
    lt_cv_sys_max_cmd_len=12288;    # 12K is about right
    ;;

  gnu*)
    # Under GNU Hurd, this test is not required because there is
    # no limit to the length of command line arguments.
    # Libtool will interpret -1 as no limit whatsoever
    lt_cv_sys_max_cmd_len=-1;
    ;;

  cygwin* | mingw*)
    # On Win9x/ME, this test blows up -- it succeeds, but takes
    # about 5 minutes as the teststring grows exponentially.
    # Worse, since 9x/ME are not pre-emptively multitasking,
    # you end up with a "frozen" computer, even though with patience
    # the test eventually succeeds (with a max line length of 256k).
    # Instead, let's just punt: use the minimum line length reported by
    # all of the supported platforms: 8192 (on NT/2K/XP).
    lt_cv_sys_max_cmd_len=8192;
    ;;

  amigaos*)
    # On AmigaOS with pdksh, this test takes hours, literally.
    # So we just punt and use a minimum line length of 8192.
    lt_cv_sys_max_cmd_len=8192;
    ;;

  netbsd* | freebsd* | openbsd* | darwin* | dragonfly*)
    # This has been around since 386BSD, at least.  Likely further.
    if test -x /sbin/sysctl; then
      lt_cv_sys_max_cmd_len=`/sbin/sysctl -n kern.argmax`
    elif test -x /usr/sbin/sysctl; then
      lt_cv_sys_max_cmd_len=`/usr/sbin/sysctl -n kern.argmax`
    else
      lt_cv_sys_max_cmd_len=65536	# usable default for all BSDs
    fi
    # And add a safety zone
    lt_cv_sys_max_cmd_len=`expr $lt_cv_sys_max_cmd_len \/ 4`
    lt_cv_sys_max_cmd_len=`expr $lt_cv_sys_max_cmd_len \* 3`
    ;;

  interix*)
    # We know the value 262144 and hardcode it with a safety zone (like BSD)
    lt_cv_sys_max_cmd_len=196608
    ;;

  osf*)
    # Dr. Hans Ekkehard Plesser reports seeing a kernel panic running configure
    # due to this test when exec_disable_arg_limit is 1 on Tru64. It is not
    # nice to cause kernel panics so let's avoid the loop below.
    # First set a reasonable default.
    lt_cv_sys_max_cmd_len=16384
    #
    if test -x /sbin/sysconfig; then
      case `/sbin/sysconfig -q proc exec_disable_arg_limit` in
        *1*) lt_cv_sys_max_cmd_len=-1 ;;
      esac
    fi
    ;;
  sco3.2v5*)
    lt_cv_sys_max_cmd_len=102400
    ;;
  sysv5* | sco5v6* | sysv4.2uw2*)
    kargmax=`grep ARG_MAX /etc/conf/cf.d/stune 2>/dev/null`
    if test -n "$kargmax"; then
      lt_cv_sys_max_cmd_len=`echo $kargmax | sed 's/.*[[ 	]]//'`
    else
      lt_cv_sys_max_cmd_len=32768
    fi
    ;;
  *)
    # If test is not a shell built-in, we'll probably end up computing a
    # maximum length that is only half of the actual maximum length, but
    # we can't tell.
    SHELL=${SHELL-${CONFIG_SHELL-/bin/sh}}
    while (test "X"`$SHELL [$]0 --fallback-echo "X$teststring" 2>/dev/null` \
	       = "XX$teststring") >/dev/null 2>&1 &&
	    new_result=`expr "X$teststring" : ".*" 2>&1` &&
	    lt_cv_sys_max_cmd_len=$new_result &&
	    test $i != 17 # 1/2 MB should be enough
    do
      i=`expr $i + 1`
      teststring=$teststring$teststring
    done
    teststring=
    # Add a significant safety factor because C++ compilers can tack on massive
    # amounts of additional arguments before passing them to the linker.
    # It appears as though 1/2 is a usable value.
    lt_cv_sys_max_cmd_len=`expr $lt_cv_sys_max_cmd_len \/ 2`
    ;;
  esac
])
if test -n "$lt_cv_sys_max_cmd_len"; then
  AC_MSG_RESULT($lt_cv_sys_max_cmd_len)
else
  AC_MSG_RESULT(none)
fi
])# AC_LIBTOOL_SYS_MAX_CMD_LEN


# _LT_AC_CHECK_DLFCN
# ------------------
AC_DEFUN([_LT_AC_CHECK_DLFCN],
[AC_CHECK_HEADERS(dlfcn.h)dnl
])# _LT_AC_CHECK_DLFCN


# _LT_AC_TRY_DLOPEN_SELF (ACTION-IF-TRUE, ACTION-IF-TRUE-W-USCORE,
#                           ACTION-IF-FALSE, ACTION-IF-CROSS-COMPILING)
# ---------------------------------------------------------------------
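# The test program reports its finding through its exit status:
#   0 (lt_dlunknown)     - dlopen failed or neither symbol was found
#   1 (lt_dlno_uscore)   - dlsym found "fnord" as is
#   2 (lt_dlneed_uscore) - dlsym found "_fnord" only
# Any other status, or a failure to link, runs ACTION-IF-FALSE.
#
# Usage, modelled on the call made by AC_LIBTOOL_DLOPEN_SELF:
#
#   _LT_AC_TRY_DLOPEN_SELF(
#     lt_cv_dlopen_self=yes, lt_cv_dlopen_self=yes,
#     lt_cv_dlopen_self=no, lt_cv_dlopen_self=cross)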
AC_DEFUN([_LT_AC_TRY_DLOPEN_SELF],
[AC_REQUIRE([_LT_AC_CHECK_DLFCN])dnl
if test "$cross_compiling" = yes; then :
  [$4]
else
  lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
  lt_status=$lt_dlunknown
  cat > conftest.$ac_ext <<EOF
[#line __oline__ "configure"
#include "confdefs.h"

#if HAVE_DLFCN_H
#include <dlfcn.h>
#endif

#include <stdio.h>

#ifdef RTLD_GLOBAL
#  define LT_DLGLOBAL		RTLD_GLOBAL
#else
#  ifdef DL_GLOBAL
#    define LT_DLGLOBAL		DL_GLOBAL
#  else
#    define LT_DLGLOBAL		0
#  endif
#endif

/* We may have to define LT_DLLAZY_OR_NOW on the command line if we
   find out it does not work on some platform. */
#ifndef LT_DLLAZY_OR_NOW
#  ifdef RTLD_LAZY
#    define LT_DLLAZY_OR_NOW		RTLD_LAZY
#  else
#    ifdef DL_LAZY
#      define LT_DLLAZY_OR_NOW		DL_LAZY
#    else
#      ifdef RTLD_NOW
#        define LT_DLLAZY_OR_NOW	RTLD_NOW
#      else
#        ifdef DL_NOW
#          define LT_DLLAZY_OR_NOW	DL_NOW
#        else
#          define LT_DLLAZY_OR_NOW	0
#        endif
#      endif
#    endif
#  endif
#endif

#ifdef __cplusplus
extern "C" void exit (int);
#endif

void fnord() { int i=42;}
int main ()
{
  void *self = dlopen (0, LT_DLGLOBAL|LT_DLLAZY_OR_NOW);
  int status = $lt_dlunknown;

  if (self)
    {
      if (dlsym (self,"fnord"))       status = $lt_dlno_uscore;
      else if (dlsym( self,"_fnord")) status = $lt_dlneed_uscore;
      /* dlclose (self); */
    }
  else
    puts (dlerror ());

    exit (status);
}]
EOF
  if AC_TRY_EVAL(ac_link) && test -s conftest${ac_exeext} 2>/dev/null; then
    (./conftest; exit; ) >&AS_MESSAGE_LOG_FD 2>/dev/null
    lt_status=$?
    case x$lt_status in
      x$lt_dlno_uscore) $1 ;;
      x$lt_dlneed_uscore) $2 ;;
      x$lt_dlunknown|x*) $3 ;;
    esac
  else :
    # compilation failed
    $3
  fi
fi
rm -fr conftest*
])# _LT_AC_TRY_DLOPEN_SELF


# AC_LIBTOOL_DLOPEN_SELF
# ----------------------
AC_DEFUN([AC_LIBTOOL_DLOPEN_SELF],
[AC_REQUIRE([_LT_AC_CHECK_DLFCN])dnl
if test "x$enable_dlopen" != xyes; then
  enable_dlopen=unknown
  enable_dlopen_self=unknown
  enable_dlopen_self_static=unknown
else
  lt_cv_dlopen=no
  lt_cv_dlopen_libs=

  case $host_os in
  beos*)
    lt_cv_dlopen="load_add_on"
    lt_cv_dlopen_libs=
    lt_cv_dlopen_self=yes
    ;;

  mingw* | pw32*)
    lt_cv_dlopen="LoadLibrary"
    lt_cv_dlopen_libs=
   ;;

  cygwin*)
    lt_cv_dlopen="dlopen"
    lt_cv_dlopen_libs=
   ;;

  darwin*)
  # if libdl is installed we need to link against it
    AC_CHECK_LIB([dl], [dlopen],
		[lt_cv_dlopen="dlopen" lt_cv_dlopen_libs="-ldl"],[
    lt_cv_dlopen="dyld"
    lt_cv_dlopen_libs=
    lt_cv_dlopen_self=yes
    ])
   ;;

  *)
    AC_CHECK_FUNC([shl_load],
	  [lt_cv_dlopen="shl_load"],
      [AC_CHECK_LIB([dld], [shl_load],
	    [lt_cv_dlopen="shl_load" lt_cv_dlopen_libs="-ldld"],
	[AC_CHECK_FUNC([dlopen],
	      [lt_cv_dlopen="dlopen"],
	  [AC_CHECK_LIB([dl], [dlopen],
		[lt_cv_dlopen="dlopen" lt_cv_dlopen_libs="-ldl"],
	    [AC_CHECK_LIB([svld], [dlopen],
		  [lt_cv_dlopen="dlopen" lt_cv_dlopen_libs="-lsvld"],
	      [AC_CHECK_LIB([dld], [dld_link],
		    [lt_cv_dlopen="dld_link" lt_cv_dlopen_libs="-ldld"])
	      ])
	    ])
	  ])
	])
      ])
    ;;
  esac

  if test "x$lt_cv_dlopen" != xno; then
    enable_dlopen=yes
  else
    enable_dlopen=no
  fi

  case $lt_cv_dlopen in
  dlopen)
    save_CPPFLAGS="$CPPFLAGS"
    test "x$ac_cv_header_dlfcn_h" = xyes && CPPFLAGS="$CPPFLAGS -DHAVE_DLFCN_H"

    save_LDFLAGS="$LDFLAGS"
    wl=$lt_prog_compiler_wl eval LDFLAGS=\"\$LDFLAGS $export_dynamic_flag_spec\"

    save_LIBS="$LIBS"
    LIBS="$lt_cv_dlopen_libs $LIBS"

    AC_CACHE_CHECK([whether a program can dlopen itself],
	  lt_cv_dlopen_self, [dnl
	  _LT_AC_TRY_DLOPEN_SELF(
	    lt_cv_dlopen_self=yes, lt_cv_dlopen_self=yes,
	    lt_cv_dlopen_self=no, lt_cv_dlopen_self=cross)
    ])

    if test "x$lt_cv_dlopen_self" = xyes; then
      wl=$lt_prog_compiler_wl eval LDFLAGS=\"\$LDFLAGS $lt_prog_compiler_static\"
      AC_CACHE_CHECK([whether a statically linked program can dlopen itself],
    	  lt_cv_dlopen_self_static, [dnl
	  _LT_AC_TRY_DLOPEN_SELF(
	    lt_cv_dlopen_self_static=yes, lt_cv_dlopen_self_static=yes,
	    lt_cv_dlopen_self_static=no,  lt_cv_dlopen_self_static=cross)
      ])
    fi

    CPPFLAGS="$save_CPPFLAGS"
    LDFLAGS="$save_LDFLAGS"
    LIBS="$save_LIBS"
    ;;
  esac

  case $lt_cv_dlopen_self in
  yes|no) enable_dlopen_self=$lt_cv_dlopen_self ;;
  *) enable_dlopen_self=unknown ;;
  esac

  case $lt_cv_dlopen_self_static in
  yes|no) enable_dlopen_self_static=$lt_cv_dlopen_self_static ;;
  *) enable_dlopen_self_static=unknown ;;
  esac
fi
])# AC_LIBTOOL_DLOPEN_SELF


# AC_LIBTOOL_PROG_CC_C_O([TAGNAME])
# ---------------------------------
# Check to see if options -c and -o are simultaneously supported by compiler
AC_DEFUN([AC_LIBTOOL_PROG_CC_C_O],
[AC_REQUIRE([_LT_AC_SYS_COMPILER])dnl
AC_CACHE_CHECK([if $compiler supports -c -o file.$ac_objext],
  [_LT_AC_TAGVAR(lt_cv_prog_compiler_c_o, $1)],
  [_LT_AC_TAGVAR(lt_cv_prog_compiler_c_o, $1)=no
   $rm -r conftest 2>/dev/null
   mkdir conftest
   cd conftest
   mkdir out
   printf "$lt_simple_compile_test_code" > conftest.$ac_ext

   lt_compiler_flag="-o out/conftest2.$ac_objext"
   # Insert the option either (1) after the last *FLAGS variable, or
   # (2) before a word containing "conftest.", or (3) at the end.
   # Note that $ac_compile itself does not contain backslashes and begins
   # with a dollar sign (not a hyphen), so the echo should work correctly.
   lt_compile=`echo "$ac_compile" | $SED \
   -e 's:.*FLAGS}\{0,1\} :&$lt_compiler_flag :; t' \
   -e 's: [[^ ]]*conftest\.: $lt_compiler_flag&:; t' \
   -e 's:$: $lt_compiler_flag:'`
   (eval echo "\"\$as_me:__oline__: $lt_compile\"" >&AS_MESSAGE_LOG_FD)
   (eval "$lt_compile" 2>out/conftest.err)
   ac_status=$?
   cat out/conftest.err >&AS_MESSAGE_LOG_FD
   echo "$as_me:__oline__: \$? = $ac_status" >&AS_MESSAGE_LOG_FD
   if (exit $ac_status) && test -s out/conftest2.$ac_objext
   then
     # The compiler can only warn and ignore the option if not recognized
     # So say no if there are warnings
     $echo "X$_lt_compiler_boilerplate" | $Xsed -e '/^$/d' > out/conftest.exp
     $SED '/^$/d; /^ *+/d' out/conftest.err >out/conftest.er2
     if test ! -s out/conftest.er2 || diff out/conftest.exp out/conftest.er2 >/dev/null; then
       _LT_AC_TAGVAR(lt_cv_prog_compiler_c_o, $1)=yes
     fi
   fi
   chmod u+w . 2>&AS_MESSAGE_LOG_FD
   $rm conftest*
   # SGI C++ compiler will create directory out/ii_files/ for
   # template instantiation
   test -d out/ii_files && $rm out/ii_files/* && rmdir out/ii_files
   $rm out/* && rmdir out
   cd ..
   rmdir conftest
   $rm conftest*
])
])# AC_LIBTOOL_PROG_CC_C_O


# AC_LIBTOOL_SYS_HARD_LINK_LOCKS([TAGNAME])
# -----------------------------------------
# Check to see if we can do hard links to lock some files if needed
AC_DEFUN([AC_LIBTOOL_SYS_HARD_LINK_LOCKS],
[AC_REQUIRE([_LT_AC_LOCK])dnl

hard_links="nottested"
if test "$_LT_AC_TAGVAR(lt_cv_prog_compiler_c_o, $1)" = no && test "$need_locks" != no; then
  # do not overwrite the value of need_locks provided by the user
  AC_MSG_CHECKING([if we can lock with hard links])
  hard_links=yes
  $rm conftest*
  ln conftest.a conftest.b 2>/dev/null && hard_links=no
  touch conftest.a
  ln conftest.a conftest.b 2>&5 || hard_links=no
  ln conftest.a conftest.b 2>/dev/null && hard_links=no
  AC_MSG_RESULT([$hard_links])
  if test "$hard_links" = no; then
    AC_MSG_WARN([`$CC' does not support `-c -o', so `make -j' may be unsafe])
    need_locks=warn
  fi
else
  need_locks=no
fi
])# AC_LIBTOOL_SYS_HARD_LINK_LOCKS


# AC_LIBTOOL_OBJDIR
# -----------------
AC_DEFUN([AC_LIBTOOL_OBJDIR],
[AC_CACHE_CHECK([for objdir], [lt_cv_objdir],
[rm -f .libs 2>/dev/null
mkdir .libs 2>/dev/null
if test -d .libs; then
  lt_cv_objdir=.libs
else
  # MS-DOS does not allow filenames that begin with a dot.
  lt_cv_objdir=_libs
fi
rmdir .libs 2>/dev/null])
objdir=$lt_cv_objdir
])# AC_LIBTOOL_OBJDIR


# AC_LIBTOOL_PROG_LD_HARDCODE_LIBPATH([TAGNAME])
# ----------------------------------------------
# Check hardcoding attributes.
AC_DEFUN([AC_LIBTOOL_PROG_LD_HARDCODE_LIBPATH],
[AC_MSG_CHECKING([how to hardcode library paths into programs])
_LT_AC_TAGVAR(hardcode_action, $1)=
if test -n "$_LT_AC_TAGVAR(hardcode_libdir_flag_spec, $1)" || \
   test -n "$_LT_AC_TAGVAR(runpath_var, $1)" || \
   test "X$_LT_AC_TAGVAR(hardcode_automatic, $1)" = "Xyes" ; then

  # We can hardcode nonexistent directories.
  if test "$_LT_AC_TAGVAR(hardcode_direct, $1)" != no &&
     # If the only mechanism to avoid hardcoding is shlibpath_var, we
     # have to relink, otherwise we might link with an installed library
     # when we should be linking with a yet-to-be-installed one
     ## test "$_LT_AC_TAGVAR(hardcode_shlibpath_var, $1)" != no &&
     test "$_LT_AC_TAGVAR(hardcode_minus_L, $1)" != no; then
    # Linking always hardcodes the temporary library directory.
    _LT_AC_TAGVAR(hardcode_action, $1)=relink
  else
    # We can link without hardcoding, and we can hardcode nonexisting dirs.
    _LT_AC_TAGVAR(hardcode_action, $1)=immediate
  fi
else
  # We cannot hardcode anything, or else we can only hardcode existing
  # directories.
  _LT_AC_TAGVAR(hardcode_action, $1)=unsupported
fi
AC_MSG_RESULT([$_LT_AC_TAGVAR(hardcode_action, $1)])

if test "$_LT_AC_TAGVAR(hardcode_action, $1)" = relink; then
  # Fast installation is not supported
  enable_fast_install=no
elif test "$shlibpath_overrides_runpath" = yes ||
     test "$enable_shared" = no; then
  # Fast installation is not necessary
  enable_fast_install=needless
fi
])# AC_LIBTOOL_PROG_LD_HARDCODE_LIBPATH


# AC_LIBTOOL_SYS_LIB_STRIP
# ------------------------
AC_DEFUN([AC_LIBTOOL_SYS_LIB_STRIP],
[striplib=
old_striplib=
AC_MSG_CHECKING([whether stripping libraries is possible])
if test -n "$STRIP" && $STRIP -V 2>&1 | grep "GNU strip" >/dev/null; then
  test -z "$old_striplib" && old_striplib="$STRIP --strip-debug"
  test -z "$striplib" && striplib="$STRIP --strip-unneeded"
  AC_MSG_RESULT([yes])
else
  # FIXME - insert some real tests, host_os isn't really good enough
  case $host_os in
  darwin*)
    if test -n "$STRIP" ; then
      striplib="$STRIP -x"
      AC_MSG_RESULT([yes])
    else
      AC_MSG_RESULT([no])
    fi
    ;;
  *)
    AC_MSG_RESULT([no])
    ;;
  esac
fi
])# AC_LIBTOOL_SYS_LIB_STRIP


# AC_LIBTOOL_SYS_DYNAMIC_LINKER
# -----------------------------
# PORTME Fill in your ld.so characteristics
AC_DEFUN([AC_LIBTOOL_SYS_DYNAMIC_LINKER],
[AC_MSG_CHECKING([dynamic linker characteristics])
library_names_spec=
libname_spec='lib$name'
soname_spec=
shrext_cmds=".so"
postinstall_cmds=
postuninstall_cmds=
finish_cmds=
finish_eval=
shlibpath_var=
shlibpath_overrides_runpath=unknown
version_type=none
dynamic_linker="$host_os ld.so"
sys_lib_dlsearch_path_spec="/lib /usr/lib"
if test "$GCC" = yes; then
  sys_lib_search_path_spec=`$CC -print-search-dirs | grep "^libraries:" | $SED -e "s/^libraries://" -e "s,=/,/,g"`
  if echo "$sys_lib_search_path_spec" | grep ';' >/dev/null ; then
    # if the path contains ";" then we assume it to be the separator
    # otherwise default to the standard path separator (i.e. ":") - it is
    # assumed that no part of a normal pathname contains ";" but that should
    # be okay in the real world where ";" in dirpaths is itself problematic.
    sys_lib_search_path_spec=`echo "$sys_lib_search_path_spec" | $SED -e 's/;/ /g'`
  else
    sys_lib_search_path_spec=`echo "$sys_lib_search_path_spec" | $SED  -e "s/$PATH_SEPARATOR/ /g"`
  fi
else
  sys_lib_search_path_spec="/lib /usr/lib /usr/local/lib"
fi
need_lib_prefix=unknown
hardcode_into_libs=no

# when you set need_version to no, make sure it does not cause -set_version
# flags to be left without arguments
need_version=unknown

case $host_os in
aix3*)
  version_type=linux
  library_names_spec='${libname}${release}${shared_ext}$versuffix $libname.a'
  shlibpath_var=LIBPATH

  # AIX 3 has no versioning support, so we append a major version to the name.
  soname_spec='${libname}${release}${shared_ext}$major'
  ;;

aix4* | aix5*)
  version_type=linux
  need_lib_prefix=no
  need_version=no
  hardcode_into_libs=yes
  if test "$host_cpu" = ia64; then
    # AIX 5 supports IA64
    library_names_spec='${libname}${release}${shared_ext}$major ${libname}${release}${shared_ext}$versuffix $libname${shared_ext}'
    shlibpath_var=LD_LIBRARY_PATH
  else
    # With GCC up to 2.95.x, collect2 would create an import file
    # for dependence libraries.  The import file would start with
    # the line `#! .'.  This would cause the generated library to
    # depend on `.', always an invalid library.  This was fixed in
    # development snapshots of GCC prior to 3.0.
    case $host_os in
      aix4 | aix4.[[01]] | aix4.[[01]].*)
      if { echo '#if __GNUC__ > 2 || (__GNUC__ == 2 && __GNUC_MINOR__ >= 97)'
	   echo ' yes '
	   echo '#endif'; } | ${CC} -E - | grep yes > /dev/null; then
	:
      else
	can_build_shared=no
      fi
      ;;
    esac
    # AIX (on Power*) has no versioning support, so currently we cannot
    # hardcode the correct soname into the executable. Versioning support
    # could probably be added to collect2, so additional links may be
    # useful in the future.
    if test "$aix_use_runtimelinking" = yes; then
      # If using run time linking (on AIX 4.2 or later) use lib<name>.so
      # instead of lib<name>.a to let people know that these are not
      # typical AIX shared libraries.
      library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${release}${shared_ext}$major $libname${shared_ext}'
    else
      # We preserve .a as extension for shared libraries through AIX4.2
      # and later when we are not doing run time linking.
      library_names_spec='${libname}${release}.a $libname.a'
      soname_spec='${libname}${release}${shared_ext}$major'
    fi
    shlibpath_var=LIBPATH
  fi
  ;;

amigaos*)
  library_names_spec='$libname.ixlibrary $libname.a'
  # Create ${libname}_ixlibrary.a entries in /sys/libs.
  finish_eval='for lib in `ls $libdir/*.ixlibrary 2>/dev/null`; do libname=`$echo "X$lib" | $Xsed -e '\''s%^.*/\([[^/]]*\)\.ixlibrary$%\1%'\''`; test $rm /sys/libs/${libname}_ixlibrary.a; $show "cd /sys/libs && $LN_S $lib ${libname}_ixlibrary.a"; cd /sys/libs && $LN_S $lib ${libname}_ixlibrary.a || exit 1; done'
  ;;

beos*)
  library_names_spec='${libname}${shared_ext}'
  dynamic_linker="$host_os ld.so"
  shlibpath_var=LIBRARY_PATH
  ;;

bsdi[[45]]*)
  version_type=linux
  need_version=no
  library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${release}${shared_ext}$major $libname${shared_ext}'
  soname_spec='${libname}${release}${shared_ext}$major'
  finish_cmds='PATH="\$PATH:/sbin" ldconfig $libdir'
  shlibpath_var=LD_LIBRARY_PATH
  sys_lib_search_path_spec="/shlib /usr/lib /usr/X11/lib /usr/contrib/lib /lib /usr/local/lib"
  sys_lib_dlsearch_path_spec="/shlib /usr/lib /usr/local/lib"
  # the default ld.so.conf also contains /usr/contrib/lib and
  # /usr/X11R6/lib (/usr/X11 is a link to /usr/X11R6), but let us allow
  # libtool to hard-code these into programs
  ;;

cygwin* | mingw* | pw32*)
  version_type=windows
  shrext_cmds=".dll"
  need_version=no
  need_lib_prefix=no

  case $GCC,$host_os in
  yes,cygwin* | yes,mingw* | yes,pw32*)
    library_names_spec='$libname.dll.a'
    # DLL is installed to $(libdir)/../bin by postinstall_cmds
    postinstall_cmds='base_file=`basename \${file}`~
      dlpath=`$SHELL 2>&1 -c '\''. $dir/'\''\${base_file}'\''i;echo \$dlname'\''`~
      dldir=$destdir/`dirname \$dlpath`~
      test -d \$dldir || mkdir -p \$dldir~
      $install_prog $dir/$dlname \$dldir/$dlname~
      chmod a+x \$dldir/$dlname'
    postuninstall_cmds='dldll=`$SHELL 2>&1 -c '\''. $file; echo \$dlname'\''`~
      dlpath=$dir/\$dldll~
       $rm \$dlpath'
    shlibpath_overrides_runpath=yes

    case $host_os in
    cygwin*)
      # Cygwin DLLs use 'cyg' prefix rather than 'lib'
      soname_spec='`echo ${libname} | sed -e 's/^lib/cyg/'``echo ${release} | $SED -e 's/[[.]]/-/g'`${versuffix}${shared_ext}'
      sys_lib_search_path_spec="/usr/lib /lib/w32api /lib /usr/local/lib"
      ;;
    mingw*)
      # MinGW DLLs use traditional 'lib' prefix
      soname_spec='${libname}`echo ${release} | $SED -e 's/[[.]]/-/g'`${versuffix}${shared_ext}'
      sys_lib_search_path_spec=`$CC -print-search-dirs | grep "^libraries:" | $SED -e "s/^libraries://" -e "s,=/,/,g"`
      if echo "$sys_lib_search_path_spec" | [grep ';[c-zC-Z]:/' >/dev/null]; then
        # It is most probably a Windows format PATH printed by
        # mingw gcc, but we are running on Cygwin. Gcc prints its search
        # path with ; separators, and with drive letters. We can handle the
        # drive letters (cygwin fileutils understands them), so leave them,
        # especially as we might pass files found there to a mingw objdump,
        # which wouldn't understand a cygwinified path. Ahh.
        sys_lib_search_path_spec=`echo "$sys_lib_search_path_spec" | $SED -e 's/;/ /g'`
      else
        sys_lib_search_path_spec=`echo "$sys_lib_search_path_spec" | $SED  -e "s/$PATH_SEPARATOR/ /g"`
      fi
      ;;
    pw32*)
      # pw32 DLLs use 'pw' prefix rather than 'lib'
      library_names_spec='`echo ${libname} | sed -e 's/^lib/pw/'``echo ${release} | $SED -e 's/[[.]]/-/g'`${versuffix}${shared_ext}'
      ;;
    esac
    ;;

  *)
    library_names_spec='${libname}`echo ${release} | $SED -e 's/[[.]]/-/g'`${versuffix}${shared_ext} $libname.lib'
    ;;
  esac
  dynamic_linker='Win32 ld.exe'
  # FIXME: first we should search . and the directory the executable is in
  shlibpath_var=PATH
  ;;

darwin* | rhapsody*)
  dynamic_linker="$host_os dyld"
  version_type=darwin
  need_lib_prefix=no
  need_version=no
  library_names_spec='${libname}${release}${versuffix}$shared_ext ${libname}${release}${major}$shared_ext ${libname}$shared_ext'
  soname_spec='${libname}${release}${major}$shared_ext'
  shlibpath_overrides_runpath=yes
  shlibpath_var=DYLD_LIBRARY_PATH
  shrext_cmds='`test .$module = .yes && echo .so || echo .dylib`'
  # Apple's gcc does not format its -print-search-dirs output the same way.
  if test "$GCC" = yes; then
    sys_lib_search_path_spec=`$CC -print-search-dirs | tr "\n" "$PATH_SEPARATOR" | sed -e 's/libraries:/@libraries:/' | tr "@" "\n" | grep "^libraries:" | sed -e "s/^libraries://" -e "s,=/,/,g" -e "s,$PATH_SEPARATOR, ,g" -e "s,.*,& /lib /usr/lib /usr/local/lib,g"`
  else
    sys_lib_search_path_spec='/lib /usr/lib /usr/local/lib'
  fi
  sys_lib_dlsearch_path_spec='/usr/local/lib /lib /usr/lib'
  ;;

dgux*)
  version_type=linux
  need_lib_prefix=no
  need_version=no
  library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${release}${shared_ext}$major $libname$shared_ext'
  soname_spec='${libname}${release}${shared_ext}$major'
  shlibpath_var=LD_LIBRARY_PATH
  ;;

freebsd1*)
  dynamic_linker=no
  ;;

kfreebsd*-gnu)
  version_type=linux
  need_lib_prefix=no
  need_version=no
  library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${release}${shared_ext}$major ${libname}${shared_ext}'
  soname_spec='${libname}${release}${shared_ext}$major'
  shlibpath_var=LD_LIBRARY_PATH
  shlibpath_overrides_runpath=no
  hardcode_into_libs=yes
  dynamic_linker='GNU ld.so'
  ;;

freebsd* | dragonfly*)
  # DragonFly does not have aout.  When/if they implement a new
  # versioning mechanism, adjust this.
  if test -x /usr/bin/objformat; then
    objformat=`/usr/bin/objformat`
  else
    case $host_os in
    freebsd[[123]]*) objformat=aout ;;
    *) objformat=elf ;;
    esac
  fi
  version_type=freebsd-$objformat
  case $version_type in
    freebsd-elf*)
      library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${release}${shared_ext} $libname${shared_ext}'
      need_version=no
      need_lib_prefix=no
      ;;
    freebsd-*)
      library_names_spec='${libname}${release}${shared_ext}$versuffix $libname${shared_ext}$versuffix'
      need_version=yes
      ;;
  esac
  shlibpath_var=LD_LIBRARY_PATH
  case $host_os in
  freebsd2*)
    shlibpath_overrides_runpath=yes
    ;;
  freebsd3.[[01]]* | freebsdelf3.[[01]]*)
    shlibpath_overrides_runpath=yes
    hardcode_into_libs=yes
    ;;
  freebsd3.[[2-9]]* | freebsdelf3.[[2-9]]* | \
  freebsd4.[[0-5]] | freebsdelf4.[[0-5]] | freebsd4.1.1 | freebsdelf4.1.1)
    shlibpath_overrides_runpath=no
    hardcode_into_libs=yes
    ;;
  freebsd*) # from 4.6 on
    shlibpath_overrides_runpath=yes
    hardcode_into_libs=yes
    ;;
  esac
  ;;

gnu*)
  version_type=linux
  need_lib_prefix=no
  need_version=no
  library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${release}${shared_ext}${major} ${libname}${shared_ext}'
  soname_spec='${libname}${release}${shared_ext}$major'
  shlibpath_var=LD_LIBRARY_PATH
  hardcode_into_libs=yes
  ;;

hpux9* | hpux10* | hpux11*)
  # Give a soname corresponding to the major version so that dld.sl refuses to
  # link against other versions.
  version_type=sunos
  need_lib_prefix=no
  need_version=no
  case $host_cpu in
  ia64*)
    shrext_cmds='.so'
    hardcode_into_libs=yes
    dynamic_linker="$host_os dld.so"
    shlibpath_var=LD_LIBRARY_PATH
    shlibpath_overrides_runpath=yes # Unless +noenvvar is specified.
    library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${release}${shared_ext}$major $libname${shared_ext}'
    soname_spec='${libname}${release}${shared_ext}$major'
    if test "X$HPUX_IA64_MODE" = X32; then
      sys_lib_search_path_spec="/usr/lib/hpux32 /usr/local/lib/hpux32 /usr/local/lib"
    else
      sys_lib_search_path_spec="/usr/lib/hpux64 /usr/local/lib/hpux64"
    fi
    sys_lib_dlsearch_path_spec=$sys_lib_search_path_spec
    ;;
  hppa*64*)
    shrext_cmds='.sl'
    hardcode_into_libs=yes
    dynamic_linker="$host_os dld.sl"
    shlibpath_var=LD_LIBRARY_PATH # How should we handle SHLIB_PATH?
    shlibpath_overrides_runpath=yes # Unless +noenvvar is specified.
    library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${release}${shared_ext}$major $libname${shared_ext}'
    soname_spec='${libname}${release}${shared_ext}$major'
    sys_lib_search_path_spec="/usr/lib/pa20_64 /usr/ccs/lib/pa20_64"
    sys_lib_dlsearch_path_spec=$sys_lib_search_path_spec
    ;;
  *)
    shrext_cmds='.sl'
    dynamic_linker="$host_os dld.sl"
    shlibpath_var=SHLIB_PATH
    shlibpath_overrides_runpath=no # +s is required to enable SHLIB_PATH
    library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${release}${shared_ext}$major $libname${shared_ext}'
    soname_spec='${libname}${release}${shared_ext}$major'
    ;;
  esac
  # HP-UX runs *really* slowly unless shared libraries are mode 555.
  postinstall_cmds='chmod 555 $lib'
  ;;

interix3*)
  version_type=linux
  need_lib_prefix=no
  need_version=no
  library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${release}${shared_ext}$major ${libname}${shared_ext}'
  soname_spec='${libname}${release}${shared_ext}$major'
  dynamic_linker='Interix 3.x ld.so.1 (PE, like ELF)'
  shlibpath_var=LD_LIBRARY_PATH
  shlibpath_overrides_runpath=no
  hardcode_into_libs=yes
  ;;

irix5* | irix6* | nonstopux*)
  case $host_os in
    nonstopux*) version_type=nonstopux ;;
    *)
	if test "$lt_cv_prog_gnu_ld" = yes; then
		version_type=linux
	else
		version_type=irix
	fi ;;
  esac
  need_lib_prefix=no
  need_version=no
  soname_spec='${libname}${release}${shared_ext}$major'
  library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${release}${shared_ext}$major ${libname}${release}${shared_ext} $libname${shared_ext}'
  case $host_os in
  irix5* | nonstopux*)
    libsuff= shlibsuff=
    ;;
  *)
    case $LD in # libtool.m4 will add one of these switches to LD
    *-32|*"-32 "|*-melf32bsmip|*"-melf32bsmip ")
      libsuff= shlibsuff= libmagic=32-bit;;
    *-n32|*"-n32 "|*-melf32bmipn32|*"-melf32bmipn32 ")
      libsuff=32 shlibsuff=N32 libmagic=N32;;
    *-64|*"-64 "|*-melf64bmip|*"-melf64bmip ")
      libsuff=64 shlibsuff=64 libmagic=64-bit;;
    *) libsuff= shlibsuff= libmagic=never-match;;
    esac
    ;;
  esac
  shlibpath_var=LD_LIBRARY${shlibsuff}_PATH
  shlibpath_overrides_runpath=no
  sys_lib_search_path_spec="/usr/lib${libsuff} /lib${libsuff} /usr/local/lib${libsuff}"
  sys_lib_dlsearch_path_spec="/usr/lib${libsuff} /lib${libsuff}"
  hardcode_into_libs=yes
  ;;

# No shared lib support for Linux oldld, aout, or coff.
linux*oldld* | linux*aout* | linux*coff*)
  dynamic_linker=no
  ;;

# This must be Linux ELF.
linux*)
  version_type=linux
  need_lib_prefix=no
  need_version=no
  library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${release}${shared_ext}$major $libname${shared_ext}'
  soname_spec='${libname}${release}${shared_ext}$major'
  finish_cmds='PATH="\$PATH:/sbin" ldconfig -n $libdir'
  shlibpath_var=LD_LIBRARY_PATH
  shlibpath_overrides_runpath=no
  # This implies no fast_install, which is unacceptable.
  # Some rework will be needed to allow for fast_install
  # before this can be enabled.
  hardcode_into_libs=yes

  # Append ld.so.conf contents to the search path
  if test -f /etc/ld.so.conf; then
    lt_ld_extra=`awk '/^include / { system(sprintf("cd /etc; cat %s", \[$]2)); skip = 1; } { if (!skip) print \[$]0; skip = 0; }' < /etc/ld.so.conf | $SED -e 's/#.*//;s/[:,	]/ /g;s/=[^=]*$//;s/=[^= ]* / /g;/^$/d' | tr '\n' ' '`
    sys_lib_dlsearch_path_spec="/lib /usr/lib $lt_ld_extra"
  fi

  # We used to test for /lib/ld.so.1 and disable shared libraries on
  # powerpc, because MkLinux only supported shared libraries with the
  # GNU dynamic linker.  Since this was broken with cross compilers,
  # most powerpc-linux boxes support dynamic linking these days and
  # people can always --disable-shared, the test was removed, and we
  # assume the GNU/Linux dynamic linker is in use.
  dynamic_linker='GNU/Linux ld.so'
  ;;

knetbsd*-gnu)
  version_type=linux
  need_lib_prefix=no
  need_version=no
  library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${release}${shared_ext}$major ${libname}${shared_ext}'
  soname_spec='${libname}${release}${shared_ext}$major'
  shlibpath_var=LD_LIBRARY_PATH
  shlibpath_overrides_runpath=no
  hardcode_into_libs=yes
  dynamic_linker='GNU ld.so'
  ;;

netbsd*)
  version_type=sunos
  need_lib_prefix=no
  need_version=no
  if echo __ELF__ | $CC -E - | grep __ELF__ >/dev/null; then
    library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${shared_ext}$versuffix'
    finish_cmds='PATH="\$PATH:/sbin" ldconfig -m $libdir'
    dynamic_linker='NetBSD (a.out) ld.so'
  else
    library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${release}${shared_ext}$major ${libname}${shared_ext}'
    soname_spec='${libname}${release}${shared_ext}$major'
    dynamic_linker='NetBSD ld.elf_so'
  fi
  shlibpath_var=LD_LIBRARY_PATH
  shlibpath_overrides_runpath=yes
  hardcode_into_libs=yes
  ;;

newsos6)
  version_type=linux
  library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${release}${shared_ext}$major $libname${shared_ext}'
  shlibpath_var=LD_LIBRARY_PATH
  shlibpath_overrides_runpath=yes
  ;;

nto-qnx*)
  version_type=linux
  need_lib_prefix=no
  need_version=no
  library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${release}${shared_ext}$major $libname${shared_ext}'
  soname_spec='${libname}${release}${shared_ext}$major'
  shlibpath_var=LD_LIBRARY_PATH
  shlibpath_overrides_runpath=yes
  ;;

openbsd*)
  version_type=sunos
  sys_lib_dlsearch_path_spec="/usr/lib"
  need_lib_prefix=no
  # Some older versions of OpenBSD (3.3 at least) *do* need versioned libs.
  case $host_os in
    openbsd3.3 | openbsd3.3.*) need_version=yes ;;
    *)                         need_version=no  ;;
  esac
  library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${shared_ext}$versuffix'
  finish_cmds='PATH="\$PATH:/sbin" ldconfig -m $libdir'
  shlibpath_var=LD_LIBRARY_PATH
  if test -z "`echo __ELF__ | $CC -E - | grep __ELF__`" || test "$host_os-$host_cpu" = "openbsd2.8-powerpc"; then
    case $host_os in
      openbsd2.[[89]] | openbsd2.[[89]].*)
	shlibpath_overrides_runpath=no
	;;
      *)
	shlibpath_overrides_runpath=yes
	;;
    esac
  else
    shlibpath_overrides_runpath=yes
  fi
  ;;

os2*)
  libname_spec='$name'
  shrext_cmds=".dll"
  need_lib_prefix=no
  library_names_spec='$libname${shared_ext} $libname.a'
  dynamic_linker='OS/2 ld.exe'
  shlibpath_var=LIBPATH
  ;;

osf3* | osf4* | osf5*)
  version_type=osf
  need_lib_prefix=no
  need_version=no
  soname_spec='${libname}${release}${shared_ext}$major'
  library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${release}${shared_ext}$major $libname${shared_ext}'
  shlibpath_var=LD_LIBRARY_PATH
  sys_lib_search_path_spec="/usr/shlib /usr/ccs/lib /usr/lib/cmplrs/cc /usr/lib /usr/local/lib /var/shlib"
  sys_lib_dlsearch_path_spec="$sys_lib_search_path_spec"
  ;;

solaris*)
  version_type=linux
  need_lib_prefix=no
  need_version=no
  library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${release}${shared_ext}$major $libname${shared_ext}'
  soname_spec='${libname}${release}${shared_ext}$major'
  shlibpath_var=LD_LIBRARY_PATH
  shlibpath_overrides_runpath=yes
  hardcode_into_libs=yes
  # ldd complains unless libraries are executable
  postinstall_cmds='chmod +x $lib'
  ;;

sunos4*)
  version_type=sunos
  library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${shared_ext}$versuffix'
  finish_cmds='PATH="\$PATH:/usr/etc" ldconfig $libdir'
  shlibpath_var=LD_LIBRARY_PATH
  shlibpath_overrides_runpath=yes
  if test "$with_gnu_ld" = yes; then
    need_lib_prefix=no
  fi
  need_version=yes
  ;;

sysv4 | sysv4.3*)
  version_type=linux
  library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${release}${shared_ext}$major $libname${shared_ext}'
  soname_spec='${libname}${release}${shared_ext}$major'
  shlibpath_var=LD_LIBRARY_PATH
  case $host_vendor in
    sni)
      shlibpath_overrides_runpath=no
      need_lib_prefix=no
      export_dynamic_flag_spec='${wl}-Blargedynsym'
      runpath_var=LD_RUN_PATH
      ;;
    siemens)
      need_lib_prefix=no
      ;;
    motorola)
      need_lib_prefix=no
      need_version=no
      shlibpath_overrides_runpath=no
      sys_lib_search_path_spec='/lib /usr/lib /usr/ccs/lib'
      ;;
  esac
  ;;

sysv4*MP*)
  if test -d /usr/nec ;then
    version_type=linux
    library_names_spec='$libname${shared_ext}.$versuffix $libname${shared_ext}.$major $libname${shared_ext}'
    soname_spec='$libname${shared_ext}.$major'
    shlibpath_var=LD_LIBRARY_PATH
  fi
  ;;

sysv5* | sco3.2v5* | sco5v6* | unixware* | OpenUNIX* | sysv4*uw2*)
  version_type=freebsd-elf
  need_lib_prefix=no
  need_version=no
  library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${release}${shared_ext} $libname${shared_ext}'
  soname_spec='${libname}${release}${shared_ext}$major'
  shlibpath_var=LD_LIBRARY_PATH
  hardcode_into_libs=yes
  if test "$with_gnu_ld" = yes; then
    sys_lib_search_path_spec='/usr/local/lib /usr/gnu/lib /usr/ccs/lib /usr/lib /lib'
    shlibpath_overrides_runpath=no
  else
    sys_lib_search_path_spec='/usr/ccs/lib /usr/lib'
    shlibpath_overrides_runpath=yes
    case $host_os in
      sco3.2v5*)
        sys_lib_search_path_spec="$sys_lib_search_path_spec /lib"
	;;
    esac
  fi
  sys_lib_dlsearch_path_spec='/usr/lib'
  ;;

uts4*)
  version_type=linux
  library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${release}${shared_ext}$major $libname${shared_ext}'
  soname_spec='${libname}${release}${shared_ext}$major'
  shlibpath_var=LD_LIBRARY_PATH
  ;;

*)
  dynamic_linker=no
  ;;
esac
AC_MSG_RESULT([$dynamic_linker])
test "$dynamic_linker" = no && can_build_shared=no

variables_saved_for_relink="PATH $shlibpath_var $runpath_var"
if test "$GCC" = yes; then
  variables_saved_for_relink="$variables_saved_for_relink GCC_EXEC_PREFIX COMPILER_PATH LIBRARY_PATH"
fi
])# AC_LIBTOOL_SYS_DYNAMIC_LINKER
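
# Worked example (illustrative; the library name and version numbers are
# hypothetical).  Assuming libtool's usual `linux' versioning, where
# -version-info CURRENT:REVISION:AGE gives major=.(CURRENT-AGE) and
# versuffix=$major.AGE.REVISION, a library built with
#
#   libname=libfoo  release=  shared_ext=.so  -version-info 3:2:1
#
# has major=.2 and versuffix=.2.1.2, so the Linux specs above expand to
#
#   library_names_spec -> libfoo.so.2.1.2 libfoo.so.2 libfoo.so
#   soname_spec        -> libfoo.so.2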


# _LT_AC_TAGCONFIG
# ----------------
AC_DEFUN([_LT_AC_TAGCONFIG],
[AC_ARG_WITH([tags],
    [AC_HELP_STRING([--with-tags@<:@=TAGS@:>@],
        [include additional configurations @<:@automatic@:>@])],
    [tagnames="$withval"])

if test -f "$ltmain" && test -n "$tagnames"; then
  if test ! -f "${ofile}"; then
    AC_MSG_WARN([output file `$ofile' does not exist])
  fi

  if test -z "$LTCC"; then
    eval "`$SHELL ${ofile} --config | grep '^LTCC='`"
    if test -z "$LTCC"; then
      AC_MSG_WARN([output file `$ofile' does not look like a libtool script])
    else
      AC_MSG_WARN([using `LTCC=$LTCC', extracted from `$ofile'])
    fi
  fi
  if test -z "$LTCFLAGS"; then
    eval "`$SHELL ${ofile} --config | grep '^LTCFLAGS='`"
  fi

  # Extract list of available tagged configurations in $ofile.
  # Note that this assumes the entire list is on one line.
  available_tags=`grep "^available_tags=" "${ofile}" | $SED -e 's/available_tags=\(.*$\)/\1/' -e 's/\"//g'`

  lt_save_ifs="$IFS"; IFS="${IFS}$PATH_SEPARATOR,"
  for tagname in $tagnames; do
    IFS="$lt_save_ifs"
    # Check whether tagname contains only valid characters
    case `$echo "X$tagname" | $Xsed -e 's:[[-_ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz1234567890,/]]::g'` in
    "") ;;
    *)  AC_MSG_ERROR([invalid tag name: $tagname])
	;;
    esac

    if grep "^# ### BEGIN LIBTOOL TAG CONFIG: $tagname$" < "${ofile}" > /dev/null
    then
      AC_MSG_ERROR([tag name \"$tagname\" already exists])
    fi

    # Update the list of available tags.
    if test -n "$tagname"; then
      echo appending configuration tag \"$tagname\" to $ofile

      case $tagname in
      CXX)
	if test -n "$CXX" && ( test "X$CXX" != "Xno" &&
	    ( (test "X$CXX" = "Xg++" && `g++ -v >/dev/null 2>&1` ) ||
	    (test "X$CXX" != "Xg++"))) ; then
	  AC_LIBTOOL_LANG_CXX_CONFIG
	else
	  tagname=""
	fi
	;;

      F77)
	if test -n "$F77" && test "X$F77" != "Xno"; then
	  AC_LIBTOOL_LANG_F77_CONFIG
	else
	  tagname=""
	fi
	;;

      GCJ)
	if test -n "$GCJ" && test "X$GCJ" != "Xno"; then
	  AC_LIBTOOL_LANG_GCJ_CONFIG
	else
	  tagname=""
	fi
	;;

      RC)
	AC_LIBTOOL_LANG_RC_CONFIG
	;;

      *)
	AC_MSG_ERROR([Unsupported tag name: $tagname])
	;;
      esac

      # Append the new tag name to the list of available tags.
      if test -n "$tagname" ; then
	available_tags="$available_tags $tagname"
      fi
    fi
  done
  IFS="$lt_save_ifs"

  # Now substitute the updated list of available tags.
  if eval "sed -e 's/^available_tags=.*\$/available_tags=\"$available_tags\"/' \"$ofile\" > \"${ofile}T\""; then
    mv "${ofile}T" "$ofile"
    chmod +x "$ofile"
  else
    rm -f "${ofile}T"
    AC_MSG_ERROR([unable to update list of available tagged configurations.])
  fi
fi
])# _LT_AC_TAGCONFIG
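
# Usage sketch (illustrative, not part of the original file).  The tag
# list is split on whitespace, $PATH_SEPARATOR and commas, and only the
# CXX, F77, GCJ and RC tags are recognized above, so a typical call is
#
#   ./configure --with-tags=CXX,F77
#
# A tag whose compiler is unset or `no' (for example F77 without $F77)
# is skipped rather than appended to available_tags.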


# AC_LIBTOOL_DLOPEN
# -----------------
# enable checks for dlopen support
AC_DEFUN([AC_LIBTOOL_DLOPEN],
 [AC_BEFORE([$0],[AC_LIBTOOL_SETUP])
])# AC_LIBTOOL_DLOPEN


# AC_LIBTOOL_WIN32_DLL
# --------------------
# declare package support for building win32 DLLs
AC_DEFUN([AC_LIBTOOL_WIN32_DLL],
[AC_BEFORE([$0], [AC_LIBTOOL_SETUP])
])# AC_LIBTOOL_WIN32_DLL


# AC_ENABLE_SHARED([DEFAULT])
# ---------------------------
# implement the --enable-shared flag
# DEFAULT is either `yes' or `no'.  If omitted, it defaults to `yes'.
AC_DEFUN([AC_ENABLE_SHARED],
[define([AC_ENABLE_SHARED_DEFAULT], ifelse($1, no, no, yes))dnl
AC_ARG_ENABLE([shared],
    [AC_HELP_STRING([--enable-shared@<:@=PKGS@:>@],
	[build shared libraries @<:@default=]AC_ENABLE_SHARED_DEFAULT[@:>@])],
    [p=${PACKAGE-default}
    case $enableval in
    yes) enable_shared=yes ;;
    no) enable_shared=no ;;
    *)
      enable_shared=no
      # Look at the argument we got.  We use all the common list separators.
      lt_save_ifs="$IFS"; IFS="${IFS}$PATH_SEPARATOR,"
      for pkg in $enableval; do
	IFS="$lt_save_ifs"
	if test "X$pkg" = "X$p"; then
	  enable_shared=yes
	fi
      done
      IFS="$lt_save_ifs"
      ;;
    esac],
    [enable_shared=]AC_ENABLE_SHARED_DEFAULT)
])# AC_ENABLE_SHARED


# AC_DISABLE_SHARED
# -----------------
# set the default shared flag to --disable-shared
AC_DEFUN([AC_DISABLE_SHARED],
[AC_BEFORE([$0],[AC_LIBTOOL_SETUP])dnl
AC_ENABLE_SHARED(no)
])# AC_DISABLE_SHARED
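
# Usage sketch (illustrative; assumes the package's configure.in uses the
# standard AC_PROG_LIBTOOL entry point, which is not shown in this part
# of the file; the package names below are hypothetical).  To build
# static libraries only by default:
#
#   AC_DISABLE_SHARED
#   AC_PROG_LIBTOOL
#
# The user can still override the default at configure time, either for
# everything or for the packages named in a list (matched against $PACKAGE):
#
#   ./configure --enable-shared
#   ./configure --enable-shared=swish-e,otherpkg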


# AC_ENABLE_STATIC([DEFAULT])
# ---------------------------
# implement the --enable-static flag
# DEFAULT is either `yes' or `no'.  If omitted, it defaults to `yes'.
AC_DEFUN([AC_ENABLE_STATIC],
[define([AC_ENABLE_STATIC_DEFAULT], ifelse($1, no, no, yes))dnl
AC_ARG_ENABLE([static],
    [AC_HELP_STRING([--enable-static@<:@=PKGS@:>@],
	[build static libraries @<:@default=]AC_ENABLE_STATIC_DEFAULT[@:>@])],
    [p=${PACKAGE-default}
    case $enableval in
    yes) enable_static=yes ;;
    no) enable_static=no ;;
    *)
      enable_static=no
      # Look at the argument we got.  We use all the common list separators.
      lt_save_ifs="$IFS"; IFS="${IFS}$PATH_SEPARATOR,"
      for pkg in $enableval; do
	IFS="$lt_save_ifs"
	if test "X$pkg" = "X$p"; then
	  enable_static=yes
	fi
      done
      IFS="$lt_save_ifs"
      ;;
    esac],
    [enable_static=]AC_ENABLE_STATIC_DEFAULT)
])# AC_ENABLE_STATIC


# AC_DISABLE_STATIC
# -----------------
# set the default static flag to --disable-static
AC_DEFUN([AC_DISABLE_STATIC],
[AC_BEFORE([$0],[AC_LIBTOOL_SETUP])dnl
AC_ENABLE_STATIC(no)
])# AC_DISABLE_STATIC


# AC_ENABLE_FAST_INSTALL([DEFAULT])
# ---------------------------------
# implement the --enable-fast-install flag
# DEFAULT is either `yes' or `no'.  If omitted, it defaults to `yes'.
AC_DEFUN([AC_ENABLE_FAST_INSTALL],
[define([AC_ENABLE_FAST_INSTALL_DEFAULT], ifelse($1, no, no, yes))dnl
AC_ARG_ENABLE([fast-install],
    [AC_HELP_STRING([--enable-fast-install@<:@=PKGS@:>@],
    [optimize for fast installation @<:@default=]AC_ENABLE_FAST_INSTALL_DEFAULT[@:>@])],
    [p=${PACKAGE-default}
    case $enableval in
    yes) enable_fast_install=yes ;;
    no) enable_fast_install=no ;;
    *)
      enable_fast_install=no
      # Look at the argument we got.  We use all the common list separators.
      lt_save_ifs="$IFS"; IFS="${IFS}$PATH_SEPARATOR,"
      for pkg in $enableval; do
	IFS="$lt_save_ifs"
	if test "X$pkg" = "X$p"; then
	  enable_fast_install=yes
	fi
      done
      IFS="$lt_save_ifs"
      ;;
    esac],
    [enable_fast_install=]AC_ENABLE_FAST_INSTALL_DEFAULT)
])# AC_ENABLE_FAST_INSTALL


# AC_DISABLE_FAST_INSTALL
# -----------------------
# set the default to --disable-fast-install
AC_DEFUN([AC_DISABLE_FAST_INSTALL],
[AC_BEFORE([$0],[AC_LIBTOOL_SETUP])dnl
AC_ENABLE_FAST_INSTALL(no)
])# AC_DISABLE_FAST_INSTALL


# AC_LIBTOOL_PICMODE([MODE])
# --------------------------
# implement the --with-pic flag
# MODE is either `yes' or `no'.  If omitted, pic_mode is set to `default',
# which means both PIC and non-PIC objects are used.
AC_DEFUN([AC_LIBTOOL_PICMODE],
[AC_BEFORE([$0],[AC_LIBTOOL_SETUP])dnl
pic_mode=ifelse($#,1,$1,default)
])# AC_LIBTOOL_PICMODE


# AC_PROG_EGREP
# -------------
# This is predefined starting with Autoconf 2.54, so this conditional
# definition can be removed once we require Autoconf 2.54 or later.
m4_ifndef([AC_PROG_EGREP], [AC_DEFUN([AC_PROG_EGREP],
[AC_CACHE_CHECK([for egrep], [ac_cv_prog_egrep],
   [if echo a | (grep -E '(a|b)') >/dev/null 2>&1
    then ac_cv_prog_egrep='grep -E'
    else ac_cv_prog_egrep='egrep'
    fi])
 EGREP=$ac_cv_prog_egrep
 AC_SUBST([EGREP])
])])


# AC_PATH_TOOL_PREFIX
# -------------------
# find a file program which can recognise a shared library
AC_DEFUN([AC_PATH_TOOL_PREFIX],
[AC_REQUIRE([AC_PROG_EGREP])dnl
AC_MSG_CHECKING([for $1])
AC_CACHE_VAL(lt_cv_path_MAGIC_CMD,
[case $MAGIC_CMD in
[[\\/*] |  ?:[\\/]*])
  lt_cv_path_MAGIC_CMD="$MAGIC_CMD" # Let the user override the test with a path.
  ;;
*)
  lt_save_MAGIC_CMD="$MAGIC_CMD"
  lt_save_ifs="$IFS"; IFS=$PATH_SEPARATOR
dnl $ac_dummy forces splitting on constant user-supplied paths.
dnl POSIX.2 word splitting is done only on the output of word expansions,
dnl not every word.  This closes a longstanding sh security hole.
  ac_dummy="ifelse([$2], , $PATH, [$2])"
  for ac_dir in $ac_dummy; do
    IFS="$lt_save_ifs"
    test -z "$ac_dir" && ac_dir=.
    if test -f $ac_dir/$1; then
      lt_cv_path_MAGIC_CMD="$ac_dir/$1"
      if test -n "$file_magic_test_file"; then
	case $deplibs_check_method in
	"file_magic "*)
	  file_magic_regex=`expr "$deplibs_check_method" : "file_magic \(.*\)"`
	  MAGIC_CMD="$lt_cv_path_MAGIC_CMD"
	  if eval $file_magic_cmd \$file_magic_test_file 2> /dev/null |
	    $EGREP "$file_magic_regex" > /dev/null; then
	    :
	  else
	    cat <<EOF 1>&2

*** Warning: the command libtool uses to detect shared libraries,
*** $file_magic_cmd, produces output that libtool cannot recognize.
*** The result is that libtool may fail to recognize shared libraries
*** as such.  This will affect the creation of libtool libraries that
*** depend on shared libraries, but programs linked with such libtool
*** libraries will work regardless of this problem.  Nevertheless, you
*** may want to report the problem to your system manager and/or to
*** bug-libtool@gnu.org

EOF
	  fi ;;
	esac
      fi
      break
    fi
  done
  IFS="$lt_save_ifs"
  MAGIC_CMD="$lt_save_MAGIC_CMD"
  ;;
esac])
MAGIC_CMD="$lt_cv_path_MAGIC_CMD"
if test -n "$MAGIC_CMD"; then
  AC_MSG_RESULT($MAGIC_CMD)
else
  AC_MSG_RESULT(no)
fi
])# AC_PATH_TOOL_PREFIX


# AC_PATH_MAGIC
# -------------
# find a file program which can recognise a shared library
AC_DEFUN([AC_PATH_MAGIC],
[AC_PATH_TOOL_PREFIX(${ac_tool_prefix}file, /usr/bin$PATH_SEPARATOR$PATH)
if test -z "$lt_cv_path_MAGIC_CMD"; then
  if test -n "$ac_tool_prefix"; then
    AC_PATH_TOOL_PREFIX(file, /usr/bin$PATH_SEPARATOR$PATH)
  else
    MAGIC_CMD=:
  fi
fi
])# AC_PATH_MAGIC


# AC_PROG_LD
# ----------
# find the pathname to the GNU or non-GNU linker
AC_DEFUN([AC_PROG_LD],
[AC_ARG_WITH([gnu-ld],
    [AC_HELP_STRING([--with-gnu-ld],
	[assume the C compiler uses GNU ld @<:@default=no@:>@])],
    [test "$withval" = no || with_gnu_ld=yes],
    [with_gnu_ld=no])
AC_REQUIRE([LT_AC_PROG_SED])dnl
AC_REQUIRE([AC_PROG_CC])dnl
AC_REQUIRE([AC_CANONICAL_HOST])dnl
AC_REQUIRE([AC_CANONICAL_BUILD])dnl
ac_prog=ld
if test "$GCC" = yes; then
  # Check if gcc -print-prog-name=ld gives a path.
  AC_MSG_CHECKING([for ld used by $CC])
  case $host in
  *-*-mingw*)
    # gcc leaves a trailing carriage return which upsets mingw
    ac_prog=`($CC -print-prog-name=ld) 2>&5 | tr -d '\015'` ;;
  *)
    ac_prog=`($CC -print-prog-name=ld) 2>&5` ;;
  esac
  case $ac_prog in
    # Accept absolute paths.
    [[\\/]]* | ?:[[\\/]]*)
      re_direlt='/[[^/]][[^/]]*/\.\./'
      # Canonicalize the pathname of ld
      ac_prog=`echo $ac_prog| $SED 's%\\\\%/%g'`
      while echo $ac_prog | grep "$re_direlt" > /dev/null 2>&1; do
	ac_prog=`echo $ac_prog| $SED "s%$re_direlt%/%"`
      done
      test -z "$LD" && LD="$ac_prog"
      ;;
  "")
    # If it fails, then pretend we aren't using GCC.
    ac_prog=ld
    ;;
  *)
    # If it is relative, then search for the first ld in PATH.
    with_gnu_ld=unknown
    ;;
  esac
elif test "$with_gnu_ld" = yes; then
  AC_MSG_CHECKING([for GNU ld])
else
  AC_MSG_CHECKING([for non-GNU ld])
fi
AC_CACHE_VAL(lt_cv_path_LD,
[if test -z "$LD"; then
  lt_save_ifs="$IFS"; IFS=$PATH_SEPARATOR
  for ac_dir in $PATH; do
    IFS="$lt_save_ifs"
    test -z "$ac_dir" && ac_dir=.
    if test -f "$ac_dir/$ac_prog" || test -f "$ac_dir/$ac_prog$ac_exeext"; then
      lt_cv_path_LD="$ac_dir/$ac_prog"
      # Check to see if the program is GNU ld.  I'd rather use --version,
      # but apparently some variants of GNU ld only accept -v.
      # Break only if it was the GNU/non-GNU ld that we prefer.
      case `"$lt_cv_path_LD" -v 2>&1 </dev/null` in
      *GNU* | *'with BFD'*)
	test "$with_gnu_ld" != no && break
	;;
      *)
	test "$with_gnu_ld" != yes && break
	;;
      esac
    fi
  done
  IFS="$lt_save_ifs"
else
  lt_cv_path_LD="$LD" # Let the user override the test with a path.
fi])
LD="$lt_cv_path_LD"
if test -n "$LD"; then
  AC_MSG_RESULT($LD)
else
  AC_MSG_RESULT(no)
fi
test -z "$LD" && AC_MSG_ERROR([no acceptable ld found in \$PATH])
AC_PROG_LD_GNU
])# AC_PROG_LD
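
# Usage sketch (illustrative; the linker path is hypothetical).  Because
# a preset $LD is taken as is, the search can be bypassed:
#
#   LD=/usr/local/bin/ld ./configure
#
# while --with-gnu-ld only states which kind of ld to prefer in $PATH:
#
#   ./configure --with-gnu-ld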


# AC_PROG_LD_GNU
# --------------
AC_DEFUN([AC_PROG_LD_GNU],
[AC_REQUIRE([AC_PROG_EGREP])dnl
AC_CACHE_CHECK([if the linker ($LD) is GNU ld], lt_cv_prog_gnu_ld,
[# I'd rather use --version here, but apparently some GNU lds only accept -v.
case `$LD -v 2>&1 </dev/null` in
*GNU* | *'with BFD'*)
  lt_cv_prog_gnu_ld=yes
  ;;
*)
  lt_cv_prog_gnu_ld=no
  ;;
esac])
with_gnu_ld=$lt_cv_prog_gnu_ld
])# AC_PROG_LD_GNU


# AC_PROG_LD_RELOAD_FLAG
# ----------------------
# find reload flag for linker
#   -- PORTME Some linkers may need a different reload flag.
AC_DEFUN([AC_PROG_LD_RELOAD_FLAG],
[AC_CACHE_CHECK([for $LD option to reload object files],
  lt_cv_ld_reload_flag,
  [lt_cv_ld_reload_flag='-r'])
reload_flag=$lt_cv_ld_reload_flag
case $reload_flag in
"" | " "*) ;;
*) reload_flag=" $reload_flag" ;;
esac
reload_cmds='$LD$reload_flag -o $output$reload_objs'
case $host_os in
  darwin*)
    if test "$GCC" = yes; then
      reload_cmds='$LTCC $LTCFLAGS -nostdlib ${wl}-r -o $output$reload_objs'
    else
      reload_cmds='$LD$reload_flag -o $output$reload_objs'
    fi
    ;;
esac
])# AC_PROG_LD_RELOAD_FLAG


# AC_DEPLIBS_CHECK_METHOD
# -----------------------
# how to check for library dependencies
#  -- PORTME fill in with the dynamic library characteristics
AC_DEFUN([AC_DEPLIBS_CHECK_METHOD],
[AC_CACHE_CHECK([how to recognise dependent libraries],
lt_cv_deplibs_check_method,
[lt_cv_file_magic_cmd='$MAGIC_CMD'
lt_cv_file_magic_test_file=
lt_cv_deplibs_check_method='unknown'
# Need to set the preceding variable on all platforms that support
# interlibrary dependencies.
# 'none' -- dependencies not supported.
# `unknown' -- same as none, but documents that we really don't know.
# 'pass_all' -- all dependencies passed with no checks.
# 'test_compile' -- check by making test program.
# 'file_magic [[regex]]' -- check by looking for files in the library path
# that respond to the $file_magic_cmd with output matching a given extended regex.
# If you have `file' or equivalent on your system and you're not sure
# whether `pass_all' will *always* work, you probably want this one.

case $host_os in
aix4* | aix5*)
  lt_cv_deplibs_check_method=pass_all
  ;;

beos*)
  lt_cv_deplibs_check_method=pass_all
  ;;

bsdi[[45]]*)
  lt_cv_deplibs_check_method='file_magic ELF [[0-9]][[0-9]]*-bit [[ML]]SB (shared object|dynamic lib)'
  lt_cv_file_magic_cmd='/usr/bin/file -L'
  lt_cv_file_magic_test_file=/shlib/libc.so
  ;;

cygwin*)
  # func_win32_libid is a shell function defined in ltmain.sh
  lt_cv_deplibs_check_method='file_magic ^x86 archive import|^x86 DLL'
  lt_cv_file_magic_cmd='func_win32_libid'
  ;;

mingw* | pw32*)
  # Base MSYS/MinGW do not provide the 'file' command needed by
  # func_win32_libid shell function, so use a weaker test based on 'objdump'.
  lt_cv_deplibs_check_method='file_magic file format pei*-i386(.*architecture: i386)?'
  lt_cv_file_magic_cmd='$OBJDUMP -f'
  ;;

darwin* | rhapsody*)
  lt_cv_deplibs_check_method=pass_all
  ;;

freebsd* | kfreebsd*-gnu | dragonfly*)
  if echo __ELF__ | $CC -E - | grep __ELF__ > /dev/null; then
    case $host_cpu in
    i*86 )
      # Not sure whether the presence of OpenBSD here was a mistake.
      # Let's accept both of them until this is cleared up.
      lt_cv_deplibs_check_method='file_magic (FreeBSD|OpenBSD|DragonFly)/i[[3-9]]86 (compact )?demand paged shared library'
      lt_cv_file_magic_cmd=/usr/bin/file
      lt_cv_file_magic_test_file=`echo /usr/lib/libc.so.*`
      ;;
    esac
  else
    lt_cv_deplibs_check_method=pass_all
  fi
  ;;

gnu*)
  lt_cv_deplibs_check_method=pass_all
  ;;

hpux10.20* | hpux11*)
  lt_cv_file_magic_cmd=/usr/bin/file
  case $host_cpu in
  ia64*)
    lt_cv_deplibs_check_method='file_magic (s[[0-9]][[0-9]][[0-9]]|ELF-[[0-9]][[0-9]]) shared object file - IA64'
    lt_cv_file_magic_test_file=/usr/lib/hpux32/libc.so
    ;;
  hppa*64*)
    [lt_cv_deplibs_check_method='file_magic (s[0-9][0-9][0-9]|ELF-[0-9][0-9]) shared object file - PA-RISC [0-9].[0-9]']
    lt_cv_file_magic_test_file=/usr/lib/pa20_64/libc.sl
    ;;
  *)
    lt_cv_deplibs_check_method='file_magic (s[[0-9]][[0-9]][[0-9]]|PA-RISC[[0-9]].[[0-9]]) shared library'
    lt_cv_file_magic_test_file=/usr/lib/libc.sl
    ;;
  esac
  ;;

interix3*)
  # PIC code is broken on Interix 3.x, that's why |\.a not |_pic\.a here
  lt_cv_deplibs_check_method='match_pattern /lib[[^/]]+(\.so|\.a)$'
  ;;

irix5* | irix6* | nonstopux*)
  case $LD in
  *-32|*"-32 ") libmagic=32-bit;;
  *-n32|*"-n32 ") libmagic=N32;;
  *-64|*"-64 ") libmagic=64-bit;;
  *) libmagic=never-match;;
  esac
  lt_cv_deplibs_check_method=pass_all
  ;;

# This must be Linux ELF.
linux*)
  lt_cv_deplibs_check_method=pass_all
  ;;

netbsd*)
  if echo __ELF__ | $CC -E - | grep __ELF__ > /dev/null; then
    lt_cv_deplibs_check_method='match_pattern /lib[[^/]]+(\.so\.[[0-9]]+\.[[0-9]]+|_pic\.a)$'
  else
    lt_cv_deplibs_check_method='match_pattern /lib[[^/]]+(\.so|_pic\.a)$'
  fi
  ;;

newsos6*)
  lt_cv_deplibs_check_method='file_magic ELF [[0-9]][[0-9]]*-bit [[ML]]SB (executable|dynamic lib)'
  lt_cv_file_magic_cmd=/usr/bin/file
  lt_cv_file_magic_test_file=/usr/lib/libnls.so
  ;;

nto-qnx*)
  lt_cv_deplibs_check_method=unknown
  ;;

openbsd*)
  if test -z "`echo __ELF__ | $CC -E - | grep __ELF__`" || test "$host_os-$host_cpu" = "openbsd2.8-powerpc"; then
    lt_cv_deplibs_check_method='match_pattern /lib[[^/]]+(\.so\.[[0-9]]+\.[[0-9]]+|\.so|_pic\.a)$'
  else
    lt_cv_deplibs_check_method='match_pattern /lib[[^/]]+(\.so\.[[0-9]]+\.[[0-9]]+|_pic\.a)$'
  fi
  ;;

osf3* | osf4* | osf5*)
  lt_cv_deplibs_check_method=pass_all
  ;;

solaris*)
  lt_cv_deplibs_check_method=pass_all
  ;;

sysv4 | sysv4.3*)
  case $host_vendor in
  motorola)
    lt_cv_deplibs_check_method='file_magic ELF [[0-9]][[0-9]]*-bit [[ML]]SB (shared object|dynamic lib) M[[0-9]][[0-9]]* Version [[0-9]]'
    lt_cv_file_magic_test_file=`echo /usr/lib/libc.so*`
    ;;
  ncr)
    lt_cv_deplibs_check_method=pass_all
    ;;
  sequent)
    lt_cv_file_magic_cmd='/bin/file'
    lt_cv_deplibs_check_method='file_magic ELF [[0-9]][[0-9]]*-bit [[LM]]SB (shared object|dynamic lib )'
    ;;
  sni)
    lt_cv_file_magic_cmd='/bin/file'
    lt_cv_deplibs_check_method="file_magic ELF [[0-9]][[0-9]]*-bit [[LM]]SB dynamic lib"
    lt_cv_file_magic_test_file=/lib/libc.so
    ;;
  siemens)
    lt_cv_deplibs_check_method=pass_all
    ;;
  pc)
    lt_cv_deplibs_check_method=pass_all
    ;;
  esac
  ;;

sysv5* | sco3.2v5* | sco5v6* | unixware* | OpenUNIX* | sysv4*uw2*)
  lt_cv_deplibs_check_method=pass_all
  ;;
esac
])
file_magic_cmd=$lt_cv_file_magic_cmd
deplibs_check_method=$lt_cv_deplibs_check_method
test -z "$deplibs_check_method" && deplibs_check_method=unknown
])# AC_DEPLIBS_CHECK_METHOD


# AC_PROG_NM
# ----------
# find the pathname to a BSD-compatible name lister
AC_DEFUN([AC_PROG_NM],
[AC_CACHE_CHECK([for BSD-compatible nm], lt_cv_path_NM,
[if test -n "$NM"; then
  # Let the user override the test.
  lt_cv_path_NM="$NM"
else
  lt_nm_to_check="${ac_tool_prefix}nm"
  if test -n "$ac_tool_prefix" && test "$build" = "$host"; then
    lt_nm_to_check="$lt_nm_to_check nm"
  fi
  for lt_tmp_nm in $lt_nm_to_check; do
    lt_save_ifs="$IFS"; IFS=$PATH_SEPARATOR
    for ac_dir in $PATH /usr/ccs/bin/elf /usr/ccs/bin /usr/ucb /bin; do
      IFS="$lt_save_ifs"
      test -z "$ac_dir" && ac_dir=.
      tmp_nm="$ac_dir/$lt_tmp_nm"
      if test -f "$tmp_nm" || test -f "$tmp_nm$ac_exeext" ; then
	# Check to see if the nm accepts a BSD-compat flag.
	# Adding the `sed 1q' prevents false positives on HP-UX, which says:
	#   nm: unknown option "B" ignored
	# Tru64's nm complains that /dev/null is an invalid object file
	case `"$tmp_nm" -B /dev/null 2>&1 | sed '1q'` in
	*/dev/null* | *'Invalid file or object type'*)
	  lt_cv_path_NM="$tmp_nm -B"
	  break
	  ;;
	*)
	  case `"$tmp_nm" -p /dev/null 2>&1 | sed '1q'` in
	  */dev/null*)
	    lt_cv_path_NM="$tmp_nm -p"
	    break
	    ;;
	  *)
	    lt_cv_path_NM=${lt_cv_path_NM="$tmp_nm"} # keep the first match, but
	    continue # so that we can try to find one that supports BSD flags
	    ;;
	  esac
	  ;;
	esac
      fi
    done
    IFS="$lt_save_ifs"
  done
  test -z "$lt_cv_path_NM" && lt_cv_path_NM=nm
fi])
NM="$lt_cv_path_NM"
])# AC_PROG_NM


# AC_CHECK_LIBM
# -------------
# check for math library
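#
# Usage sketch, assuming a package with a program named myprog (a
# hypothetical name): call the macro and export the result in
# configure.ac,
#   AC_CHECK_LIBM
#   AC_SUBST([LIBM])
# then link against it in Makefile.am:
#   myprog_LDADD = $(LIBM)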
AC_DEFUN([AC_CHECK_LIBM],
[AC_REQUIRE([AC_CANONICAL_HOST])dnl
LIBM=
case $host in
*-*-beos* | *-*-cygwin* | *-*-pw32* | *-*-darwin*)
  # These systems don't have libm, or don't need it
  ;;
*-ncr-sysv4.3*)
  AC_CHECK_LIB(mw, _mwvalidcheckl, LIBM="-lmw")
  AC_CHECK_LIB(m, cos, LIBM="$LIBM -lm")
  ;;
*)
  AC_CHECK_LIB(m, cos, LIBM="-lm")
  ;;
esac
])# AC_CHECK_LIBM


# AC_LIBLTDL_CONVENIENCE([DIRECTORY])
# -----------------------------------
# sets LIBLTDL to the link flags for the libltdl convenience library and
# LTDLINCL to the include flags for the libltdl header and adds
# --enable-ltdl-convenience to the configure arguments.  Note that
# AC_CONFIG_SUBDIRS is not called here.  If DIRECTORY is not provided,
# it is assumed to be `libltdl'.  LIBLTDL will be prefixed with
# '${top_builddir}/' and LTDLINCL will be prefixed with '${top_srcdir}/'
# (note the single quotes!).  If your package is not flat and you're not
# using automake, define top_builddir and top_srcdir appropriately in
# the Makefiles.
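#
# Usage sketch for configure.ac, assuming libltdl is shipped in the
# default `libltdl' subdirectory:
#   AC_CONFIG_SUBDIRS([libltdl])
#   AC_LIBLTDL_CONVENIENCE
#   AC_SUBST([LTDLINCL])
#   AC_SUBST([LIBLTDL])
#   AC_PROG_LIBTOOL
# and, in Makefile.am, for a hypothetical program myprog:
#   INCLUDES = $(LTDLINCL)
#   myprog_LDADD = $(LIBLTDL)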
AC_DEFUN([AC_LIBLTDL_CONVENIENCE],
[AC_BEFORE([$0],[AC_LIBTOOL_SETUP])dnl
  case $enable_ltdl_convenience in
  no) AC_MSG_ERROR([this package needs a convenience libltdl]) ;;
  "") enable_ltdl_convenience=yes
      ac_configure_args="$ac_configure_args --enable-ltdl-convenience" ;;
  esac
  LIBLTDL='${top_builddir}/'ifelse($#,1,[$1],['libltdl'])/libltdlc.la
  LTDLINCL='-I${top_srcdir}/'ifelse($#,1,[$1],['libltdl'])
  # Keep the older, non-gettext-consistent name for backwards compatibility.
  INCLTDL="$LTDLINCL"
])# AC_LIBLTDL_CONVENIENCE


# AC_LIBLTDL_INSTALLABLE([DIRECTORY])
# -----------------------------------
# sets LIBLTDL to the link flags for the libltdl installable library and
# LTDLINCL to the include flags for the libltdl header and adds
# --enable-ltdl-install to the configure arguments.  Note that
# AC_CONFIG_SUBDIRS is not called here.  If DIRECTORY is not provided,
# and an installed libltdl is not found, it is assumed to be `libltdl'.
# LIBLTDL will be prefixed with '${top_builddir}/' and LTDLINCL with
# '${top_srcdir}/' (note the single quotes!).  If your package is not
# flat and you're not using automake, define top_builddir and top_srcdir
# appropriately in the Makefiles.
# In the future, this macro may have to be called after AC_PROG_LIBTOOL.
AC_DEFUN([AC_LIBLTDL_INSTALLABLE],
[AC_BEFORE([$0],[AC_LIBTOOL_SETUP])dnl
  AC_CHECK_LIB(ltdl, lt_dlinit,
  [test x"$enable_ltdl_install" != xyes && enable_ltdl_install=no],
  [if test x"$enable_ltdl_install" = xno; then
     AC_MSG_WARN([libltdl not installed, but installation disabled])
   else
     enable_ltdl_install=yes
   fi
  ])
  if test x"$enable_ltdl_install" = x"yes"; then
    ac_configure_args="$ac_configure_args --enable-ltdl-install"
    LIBLTDL='${top_builddir}/'ifelse($#,1,[$1],['libltdl'])/libltdl.la
    LTDLINCL='-I${top_srcdir}/'ifelse($#,1,[$1],['libltdl'])
  else
    ac_configure_args="$ac_configure_args --enable-ltdl-install=no"
    LIBLTDL="-lltdl"
    LTDLINCL=
  fi
  # Keep the older, non-gettext-consistent name for backwards compatibility.
  INCLTDL="$LTDLINCL"
])# AC_LIBLTDL_INSTALLABLE


# AC_LIBTOOL_CXX
# --------------
# enable support for C++ libraries
AC_DEFUN([AC_LIBTOOL_CXX],
[AC_REQUIRE([_LT_AC_LANG_CXX])
])# AC_LIBTOOL_CXX


# _LT_AC_LANG_CXX
# ---------------
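# The tagnames assignment below appends to a comma-separated list.
# A sketch of how the shell expansion behaves (the starting values are
# chosen only for illustration):
#   unset tagnames
#   tagnames=${tagnames+${tagnames},}CXX    # tagnames is now CXX
#   tagnames=${tagnames+${tagnames},}F77    # tagnames is now CXX,F77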
AC_DEFUN([_LT_AC_LANG_CXX],
[AC_REQUIRE([AC_PROG_CXX])
AC_REQUIRE([_LT_AC_PROG_CXXCPP])
_LT_AC_SHELL_INIT([tagnames=${tagnames+${tagnames},}CXX])
])# _LT_AC_LANG_CXX

# _LT_AC_PROG_CXXCPP
# ------------------
AC_DEFUN([_LT_AC_PROG_CXXCPP],
[
AC_REQUIRE([AC_PROG_CXX])
if test -n "$CXX" && ( test "X$CXX" != "Xno" &&
    ( (test "X$CXX" = "Xg++" && `g++ -v >/dev/null 2>&1` ) ||
    (test "X$CXX" != "Xg++"))) ; then
  AC_PROG_CXXCPP
fi
])# _LT_AC_PROG_CXXCPP

# AC_LIBTOOL_F77
# --------------
# enable support for Fortran 77 libraries
AC_DEFUN([AC_LIBTOOL_F77],
[AC_REQUIRE([_LT_AC_LANG_F77])
])# AC_LIBTOOL_F77


# _LT_AC_LANG_F77
# ---------------
AC_DEFUN([_LT_AC_LANG_F77],
[AC_REQUIRE([AC_PROG_F77])
_LT_AC_SHELL_INIT([tagnames=${tagnames+${tagnames},}F77])
])# _LT_AC_LANG_F77


# AC_LIBTOOL_GCJ
# --------------
# enable support for GCJ libraries
AC_DEFUN([AC_LIBTOOL_GCJ],
[AC_REQUIRE([_LT_AC_LANG_GCJ])
])# AC_LIBTOOL_GCJ


# _LT_AC_LANG_GCJ
# ---------------
AC_DEFUN([_LT_AC_LANG_GCJ],
[AC_PROVIDE_IFELSE([AC_PROG_GCJ],[],
  [AC_PROVIDE_IFELSE([A][M_PROG_GCJ],[],
    [AC_PROVIDE_IFELSE([LT_AC_PROG_GCJ],[],
      [ifdef([AC_PROG_GCJ],[AC_REQUIRE([AC_PROG_GCJ])],
	 [ifdef([A][M_PROG_GCJ],[AC_REQUIRE([A][M_PROG_GCJ])],
	   [AC_REQUIRE([A][C_PROG_GCJ_OR_A][M_PROG_GCJ])])])])])])
_LT_AC_SHELL_INIT([tagnames=${tagnames+${tagnames},}GCJ])
])# _LT_AC_LANG_GCJ


# AC_LIBTOOL_RC
# -------------
# enable support for Windows resource files
AC_DEFUN([AC_LIBTOOL_RC],
[AC_REQUIRE([LT_AC_PROG_RC])
_LT_AC_SHELL_INIT([tagnames=${tagnames+${tagnames},}RC])
])# AC_LIBTOOL_RC


# AC_LIBTOOL_LANG_C_CONFIG
# ------------------------
# Ensure that the configuration vars for the C compiler are
# suitably defined.  Those variables are subsequently used by
# AC_LIBTOOL_CONFIG to write the compiler configuration to `libtool'.
AC_DEFUN([AC_LIBTOOL_LANG_C_CONFIG], [_LT_AC_LANG_C_CONFIG])
AC_DEFUN([_LT_AC_LANG_C_CONFIG],
[lt_save_CC="$CC"
AC_LANG_PUSH(C)

# Source file extension for C test sources.
ac_ext=c

# Object file extension for compiled C test sources.
objext=o
_LT_AC_TAGVAR(objext, $1)=$objext

# Code to be used in simple compile tests
lt_simple_compile_test_code="int some_variable = 0;\n"

# Code to be used in simple link tests
lt_simple_link_test_code='int main(){return(0);}\n'

_LT_AC_SYS_COMPILER

# save warnings/boilerplate of simple test code
_LT_COMPILER_BOILERPLATE
_LT_LINKER_BOILERPLATE

AC_LIBTOOL_PROG_COMPILER_NO_RTTI($1)
AC_LIBTOOL_PROG_COMPILER_PIC($1)
AC_LIBTOOL_PROG_CC_C_O($1)
AC_LIBTOOL_SYS_HARD_LINK_LOCKS($1)
AC_LIBTOOL_PROG_LD_SHLIBS($1)
AC_LIBTOOL_SYS_DYNAMIC_LINKER($1)
AC_LIBTOOL_PROG_LD_HARDCODE_LIBPATH($1)
AC_LIBTOOL_SYS_LIB_STRIP
AC_LIBTOOL_DLOPEN_SELF

# Report which library types will actually be built
AC_MSG_CHECKING([if libtool supports shared libraries])
AC_MSG_RESULT([$can_build_shared])

AC_MSG_CHECKING([whether to build shared libraries])
test "$can_build_shared" = "no" && enable_shared=no

# On AIX, shared libraries and static libraries use the same namespace, and
# are all built from PIC.
case $host_os in
aix3*)
  test "$enable_shared" = yes && enable_static=no
  if test -n "$RANLIB"; then
    archive_cmds="$archive_cmds~\$RANLIB \$lib"
    postinstall_cmds='$RANLIB $lib'
  fi
  ;;

aix4* | aix5*)
  if test "$host_cpu" != ia64 && test "$aix_use_runtimelinking" = no ; then
    test "$enable_shared" = yes && enable_static=no
  fi
  ;;
esac
AC_MSG_RESULT([$enable_shared])

AC_MSG_CHECKING([whether to build static libraries])
# Make sure either enable_shared or enable_static is yes.
test "$enable_shared" = yes || enable_static=yes
AC_MSG_RESULT([$enable_static])

AC_LIBTOOL_CONFIG($1)

AC_LANG_POP
CC="$lt_save_CC"
])# AC_LIBTOOL_LANG_C_CONFIG


# AC_LIBTOOL_LANG_CXX_CONFIG
# --------------------------
# Ensure that the configuration vars for the C++ compiler are
# suitably defined.  Those variables are subsequently used by
# AC_LIBTOOL_CONFIG to write the compiler configuration to `libtool'.
AC_DEFUN([AC_LIBTOOL_LANG_CXX_CONFIG], [_LT_AC_LANG_CXX_CONFIG(CXX)])
AC_DEFUN([_LT_AC_LANG_CXX_CONFIG],
[AC_LANG_PUSH(C++)
AC_REQUIRE([AC_PROG_CXX])
AC_REQUIRE([_LT_AC_PROG_CXXCPP])

_LT_AC_TAGVAR(archive_cmds_need_lc, $1)=no
_LT_AC_TAGVAR(allow_undefined_flag, $1)=
_LT_AC_TAGVAR(always_export_symbols, $1)=no
_LT_AC_TAGVAR(archive_expsym_cmds, $1)=
_LT_AC_TAGVAR(export_dynamic_flag_spec, $1)=
_LT_AC_TAGVAR(hardcode_direct, $1)=no
_LT_AC_TAGVAR(hardcode_libdir_flag_spec, $1)=
_LT_AC_TAGVAR(hardcode_libdir_flag_spec_ld, $1)=
_LT_AC_TAGVAR(hardcode_libdir_separator, $1)=
_LT_AC_TAGVAR(hardcode_minus_L, $1)=no
_LT_AC_TAGVAR(hardcode_shlibpath_var, $1)=unsupported
_LT_AC_TAGVAR(hardcode_automatic, $1)=no
_LT_AC_TAGVAR(module_cmds, $1)=
_LT_AC_TAGVAR(module_expsym_cmds, $1)=
_LT_AC_TAGVAR(link_all_deplibs, $1)=unknown
_LT_AC_TAGVAR(old_archive_cmds, $1)=$old_archive_cmds
_LT_AC_TAGVAR(no_undefined_flag, $1)=
_LT_AC_TAGVAR(whole_archive_flag_spec, $1)=
_LT_AC_TAGVAR(enable_shared_with_static_runtimes, $1)=no

# Dependencies to place before and after the object being linked:
_LT_AC_TAGVAR(predep_objects, $1)=
_LT_AC_TAGVAR(postdep_objects, $1)=
_LT_AC_TAGVAR(predeps, $1)=
_LT_AC_TAGVAR(postdeps, $1)=
_LT_AC_TAGVAR(compiler_lib_search_path, $1)=

# Source file extension for C++ test sources.
ac_ext=cpp

# Object file extension for compiled C++ test sources.
objext=o
_LT_AC_TAGVAR(objext, $1)=$objext

# Code to be used in simple compile tests
lt_simple_compile_test_code="int some_variable = 0;\n"

# Code to be used in simple link tests
lt_simple_link_test_code='int main(int, char *[[]]) { return(0); }\n'

# ltmain only uses $CC for tagged configurations so make sure $CC is set.
_LT_AC_SYS_COMPILER

# save warnings/boilerplate of simple test code
_LT_COMPILER_BOILERPLATE
_LT_LINKER_BOILERPLATE

# Allow CC to be a program name with arguments.
lt_save_CC=$CC
lt_save_LD=$LD
lt_save_GCC=$GCC
GCC=$GXX
lt_save_with_gnu_ld=$with_gnu_ld
lt_save_path_LD=$lt_cv_path_LD
if test -n "${lt_cv_prog_gnu_ldcxx+set}"; then
  lt_cv_prog_gnu_ld=$lt_cv_prog_gnu_ldcxx
else
  $as_unset lt_cv_prog_gnu_ld
fi
if test -n "${lt_cv_path_LDCXX+set}"; then
  lt_cv_path_LD=$lt_cv_path_LDCXX
else
  $as_unset lt_cv_path_LD
fi
test -z "${LDCXX+set}" || LD=$LDCXX
CC=${CXX-"c++"}
compiler=$CC
_LT_AC_TAGVAR(compiler, $1)=$CC
_LT_CC_BASENAME([$compiler])

# We don't want -fno-exceptions when compiling C++ code, so set the
# no_builtin_flag separately
if test "$GXX" = yes; then
  _LT_AC_TAGVAR(lt_prog_compiler_no_builtin_flag, $1)=' -fno-builtin'
else
  _LT_AC_TAGVAR(lt_prog_compiler_no_builtin_flag, $1)=
fi

if test "$GXX" = yes; then
  # Set up default GNU C++ configuration

  AC_PROG_LD

  # Check if GNU C++ uses GNU ld as the underlying linker, since the
  # archiving commands below assume that GNU ld is being used.
  if test "$with_gnu_ld" = yes; then
    _LT_AC_TAGVAR(archive_cmds, $1)='$CC -shared -nostdlib $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags ${wl}-soname $wl$soname -o $lib'
    _LT_AC_TAGVAR(archive_expsym_cmds, $1)='$CC -shared -nostdlib $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags ${wl}-soname $wl$soname ${wl}-retain-symbols-file $wl$export_symbols -o $lib'

    _LT_AC_TAGVAR(hardcode_libdir_flag_spec, $1)='${wl}--rpath ${wl}$libdir'
    _LT_AC_TAGVAR(export_dynamic_flag_spec, $1)='${wl}--export-dynamic'

    # If archive_cmds runs LD, not CC, wlarc should be empty
    # XXX I think wlarc can be eliminated in ltcf-cxx, but I need to
    #     investigate it a little bit more. (MM)
    wlarc='${wl}'

    # ancient GNU ld didn't support --whole-archive et al.
    if eval "`$CC -print-prog-name=ld` --help 2>&1" | \
	grep 'no-whole-archive' > /dev/null; then
      _LT_AC_TAGVAR(whole_archive_flag_spec, $1)="$wlarc"'--whole-archive$convenience '"$wlarc"'--no-whole-archive'
    else
      _LT_AC_TAGVAR(whole_archive_flag_spec, $1)=
    fi
  else
    with_gnu_ld=no
    wlarc=

    # A generic and very simple default shared library creation
    # command for GNU C++ for the case where it uses the native
    # linker, instead of GNU ld.  If possible, this setting should be
    # overridden to take advantage of the native linker features on
    # the platform it is being used on.
    _LT_AC_TAGVAR(archive_cmds, $1)='$CC -shared -nostdlib $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags -o $lib'
  fi

  # Commands to make compiler produce verbose output that lists
  # what "hidden" libraries, object files and flags are used when
  # linking a shared library.
  output_verbose_link_cmd='$CC -shared $CFLAGS -v conftest.$objext 2>&1 | grep "\-L"'

else
  GXX=no
  with_gnu_ld=no
  wlarc=
fi

# PORTME: fill in a description of your system's C++ link characteristics
AC_MSG_CHECKING([whether the $compiler linker ($LD) supports shared libraries])
_LT_AC_TAGVAR(ld_shlibs, $1)=yes
case $host_os in
  aix3*)
    # FIXME: insert proper C++ library support
    _LT_AC_TAGVAR(ld_shlibs, $1)=no
    ;;
  aix4* | aix5*)
    if test "$host_cpu" = ia64; then
      # On IA64, the linker does run time linking by default, so we don't
      # have to do anything special.
      aix_use_runtimelinking=no
      exp_sym_flag='-Bexport'
      no_entry_flag=""
    else
      aix_use_runtimelinking=no

      # Test if we are trying to use run time linking or normal
      # AIX style linking. If -brtl is somewhere in LDFLAGS, we
      # need to do runtime linking.
      case $host_os in aix4.[[23]]|aix4.[[23]].*|aix5*)
	for ld_flag in $LDFLAGS; do
	  case $ld_flag in
	  *-brtl*)
	    aix_use_runtimelinking=yes
	    break
	    ;;
	  esac
	done
	;;
      esac

      exp_sym_flag='-bexport'
      no_entry_flag='-bnoentry'
    fi

    # When large executables or shared objects are built, AIX ld can
    # have problems creating the table of contents.  If linking a library
    # or program results in "error TOC overflow" add -mminimal-toc to
    # CXXFLAGS/CFLAGS for g++/gcc.  In the cases where that is not
    # enough to fix the problem, add -Wl,-bbigtoc to LDFLAGS.

    _LT_AC_TAGVAR(archive_cmds, $1)=''
    _LT_AC_TAGVAR(hardcode_direct, $1)=yes
    _LT_AC_TAGVAR(hardcode_libdir_separator, $1)=':'
    _LT_AC_TAGVAR(link_all_deplibs, $1)=yes

    if test "$GXX" = yes; then
      case $host_os in aix4.[[012]]|aix4.[[012]].*)
      # We only want to do this on AIX 4.2 and lower; the check
      # below for broken collect2 doesn't work under 4.3+.
	collect2name=`${CC} -print-prog-name=collect2`
	if test -f "$collect2name" && \
	   strings "$collect2name" | grep resolve_lib_name >/dev/null
	then
	  # We have reworked collect2
	  _LT_AC_TAGVAR(hardcode_direct, $1)=yes
	else
	  # We have old collect2
	  _LT_AC_TAGVAR(hardcode_direct, $1)=unsupported
	  # It fails to find uninstalled libraries when the uninstalled
	  # path is not listed in the libpath.  Setting hardcode_minus_L
	  # to unsupported forces relinking
	  _LT_AC_TAGVAR(hardcode_minus_L, $1)=yes
	  _LT_AC_TAGVAR(hardcode_libdir_flag_spec, $1)='-L$libdir'
	  _LT_AC_TAGVAR(hardcode_libdir_separator, $1)=
	fi
	;;
      esac
      shared_flag='-shared'
      if test "$aix_use_runtimelinking" = yes; then
	shared_flag="$shared_flag "'${wl}-G'
      fi
    else
      # not using gcc
      if test "$host_cpu" = ia64; then
	# VisualAge C++, Version 5.5 for AIX 5L for IA-64, Beta 3 Release
	# chokes on -Wl,-G. The following line is correct:
	shared_flag='-G'
      else
	if test "$aix_use_runtimelinking" = yes; then
	  shared_flag='${wl}-G'
	else
	  shared_flag='${wl}-bM:SRE'
	fi
      fi
    fi

    # It seems that -bexpall does not export symbols beginning with
    # underscore (_), so it is better to generate a list of symbols to export.
    _LT_AC_TAGVAR(always_export_symbols, $1)=yes
    if test "$aix_use_runtimelinking" = yes; then
      # Warning - without using the other runtime loading flags (-brtl),
      # -berok will link without error, but may produce a broken library.
      _LT_AC_TAGVAR(allow_undefined_flag, $1)='-berok'
      # Determine the default libpath from the value encoded in an empty executable.
      _LT_AC_SYS_LIBPATH_AIX
      _LT_AC_TAGVAR(hardcode_libdir_flag_spec, $1)='${wl}-blibpath:$libdir:'"$aix_libpath"

      _LT_AC_TAGVAR(archive_expsym_cmds, $1)="\$CC"' -o $output_objdir/$soname $libobjs $deplibs '"\${wl}$no_entry_flag"' $compiler_flags `if test "x${allow_undefined_flag}" != "x"; then echo "${wl}${allow_undefined_flag}"; else :; fi` '"\${wl}$exp_sym_flag:\$export_symbols $shared_flag"
    else
      if test "$host_cpu" = ia64; then
	_LT_AC_TAGVAR(hardcode_libdir_flag_spec, $1)='${wl}-R $libdir:/usr/lib:/lib'
	_LT_AC_TAGVAR(allow_undefined_flag, $1)="-z nodefs"
	_LT_AC_TAGVAR(archive_expsym_cmds, $1)="\$CC $shared_flag"' -o $output_objdir/$soname $libobjs $deplibs '"\${wl}$no_entry_flag"' $compiler_flags ${wl}${allow_undefined_flag} '"\${wl}$exp_sym_flag:\$export_symbols"
      else
	# Determine the default libpath from the value encoded in an empty executable.
	_LT_AC_SYS_LIBPATH_AIX
	_LT_AC_TAGVAR(hardcode_libdir_flag_spec, $1)='${wl}-blibpath:$libdir:'"$aix_libpath"
	# Warning - without using the other run time loading flags,
	# -berok will link without error, but may produce a broken library.
	_LT_AC_TAGVAR(no_undefined_flag, $1)=' ${wl}-bernotok'
	_LT_AC_TAGVAR(allow_undefined_flag, $1)=' ${wl}-berok'
	# Exported symbols can be pulled into shared objects from archives
	_LT_AC_TAGVAR(whole_archive_flag_spec, $1)='$convenience'
	_LT_AC_TAGVAR(archive_cmds_need_lc, $1)=yes
	# This is similar to how AIX traditionally builds its shared libraries.
	_LT_AC_TAGVAR(archive_expsym_cmds, $1)="\$CC $shared_flag"' -o $output_objdir/$soname $libobjs $deplibs ${wl}-bnoentry $compiler_flags ${wl}-bE:$export_symbols${allow_undefined_flag}~$AR $AR_FLAGS $output_objdir/$libname$release.a $output_objdir/$soname'
      fi
    fi
    ;;

  beos*)
    if $LD --help 2>&1 | grep ': supported targets:.* elf' > /dev/null; then
      _LT_AC_TAGVAR(allow_undefined_flag, $1)=unsupported
      # Joseph Beckenbach <jrb3@best.com> says some releases of gcc
      # support --undefined.  This deserves some investigation.  FIXME
      _LT_AC_TAGVAR(archive_cmds, $1)='$CC -nostart $libobjs $deplibs $compiler_flags ${wl}-soname $wl$soname -o $lib'
    else
      _LT_AC_TAGVAR(ld_shlibs, $1)=no
    fi
    ;;

  chorus*)
    case $cc_basename in
      *)
	# FIXME: insert proper C++ library support
	_LT_AC_TAGVAR(ld_shlibs, $1)=no
	;;
    esac
    ;;

  cygwin* | mingw* | pw32*)
    # _LT_AC_TAGVAR(hardcode_libdir_flag_spec, $1) is actually meaningless,
    # as there is no search path for DLLs.
    _LT_AC_TAGVAR(hardcode_libdir_flag_spec, $1)='-L$libdir'
    _LT_AC_TAGVAR(allow_undefined_flag, $1)=unsupported
    _LT_AC_TAGVAR(always_export_symbols, $1)=no
    _LT_AC_TAGVAR(enable_shared_with_static_runtimes, $1)=yes

    if $LD --help 2>&1 | grep 'auto-import' > /dev/null; then
      _LT_AC_TAGVAR(archive_cmds, $1)='$CC -shared -nostdlib $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags -o $output_objdir/$soname ${wl}--enable-auto-image-base -Xlinker --out-implib -Xlinker $lib'
      # If the export-symbols file already is a .def file (1st line
      # is EXPORTS), use it as is; otherwise, prepend...
      _LT_AC_TAGVAR(archive_expsym_cmds, $1)='if test "x`$SED 1q $export_symbols`" = xEXPORTS; then
	cp $export_symbols $output_objdir/$soname.def;
      else
	echo EXPORTS > $output_objdir/$soname.def;
	cat $export_symbols >> $output_objdir/$soname.def;
      fi~
      $CC -shared -nostdlib $output_objdir/$soname.def $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags -o $output_objdir/$soname ${wl}--enable-auto-image-base -Xlinker --out-implib -Xlinker $lib'
    else
      _LT_AC_TAGVAR(ld_shlibs, $1)=no
    fi
    ;;
      darwin* | rhapsody*)
        case $host_os in
        rhapsody* | darwin1.[[012]])
         _LT_AC_TAGVAR(allow_undefined_flag, $1)='${wl}-undefined ${wl}suppress'
         ;;
       *) # Darwin 1.3 on
         if test -z ${MACOSX_DEPLOYMENT_TARGET} ; then
           _LT_AC_TAGVAR(allow_undefined_flag, $1)='${wl}-flat_namespace ${wl}-undefined ${wl}suppress'
         else
           case ${MACOSX_DEPLOYMENT_TARGET} in
             10.[[012]])
               _LT_AC_TAGVAR(allow_undefined_flag, $1)='${wl}-flat_namespace ${wl}-undefined ${wl}suppress'
               ;;
             10.*)
               _LT_AC_TAGVAR(allow_undefined_flag, $1)='${wl}-undefined ${wl}dynamic_lookup'
               ;;
           esac
         fi
         ;;
        esac
      _LT_AC_TAGVAR(archive_cmds_need_lc, $1)=no
      _LT_AC_TAGVAR(hardcode_direct, $1)=no
      _LT_AC_TAGVAR(hardcode_automatic, $1)=yes
      _LT_AC_TAGVAR(hardcode_shlibpath_var, $1)=unsupported
      _LT_AC_TAGVAR(whole_archive_flag_spec, $1)=''
      _LT_AC_TAGVAR(link_all_deplibs, $1)=yes

    if test "$GXX" = yes ; then
      lt_int_apple_cc_single_mod=no
      output_verbose_link_cmd='echo'
      if $CC -dumpspecs 2>&1 | $EGREP 'single_module' >/dev/null ; then
       lt_int_apple_cc_single_mod=yes
      fi
      if test "X$lt_int_apple_cc_single_mod" = Xyes ; then
       _LT_AC_TAGVAR(archive_cmds, $1)='$CC -dynamiclib -single_module $allow_undefined_flag -o $lib $libobjs $deplibs $compiler_flags -install_name $rpath/$soname $verstring'
      else
          _LT_AC_TAGVAR(archive_cmds, $1)='$CC -r -keep_private_externs -nostdlib -o ${lib}-master.o $libobjs~$CC -dynamiclib $allow_undefined_flag -o $lib ${lib}-master.o $deplibs $compiler_flags -install_name $rpath/$soname $verstring'
        fi
        _LT_AC_TAGVAR(module_cmds, $1)='$CC $allow_undefined_flag -o $lib -bundle $libobjs $deplibs$compiler_flags'
        # Don't fix this by using the ld -exported_symbols_list flag; it doesn't exist in older darwin lds
          if test "X$lt_int_apple_cc_single_mod" = Xyes ; then
            _LT_AC_TAGVAR(archive_expsym_cmds, $1)='sed -e "s,#.*,," -e "s,^[    ]*,," -e "s,^\(..*\),_&," < $export_symbols > $output_objdir/${libname}-symbols.expsym~$CC -dynamiclib -single_module $allow_undefined_flag -o $lib $libobjs $deplibs $compiler_flags -install_name $rpath/$soname $verstring~nmedit -s $output_objdir/${libname}-symbols.expsym ${lib}'
          else
            _LT_AC_TAGVAR(archive_expsym_cmds, $1)='sed -e "s,#.*,," -e "s,^[    ]*,," -e "s,^\(..*\),_&," < $export_symbols > $output_objdir/${libname}-symbols.expsym~$CC -r -keep_private_externs -nostdlib -o ${lib}-master.o $libobjs~$CC -dynamiclib $allow_undefined_flag -o $lib ${lib}-master.o $deplibs $compiler_flags -install_name $rpath/$soname $verstring~nmedit -s $output_objdir/${libname}-symbols.expsym ${lib}'
          fi
            _LT_AC_TAGVAR(module_expsym_cmds, $1)='sed -e "s,#.*,," -e "s,^[    ]*,," -e "s,^\(..*\),_&," < $export_symbols > $output_objdir/${libname}-symbols.expsym~$CC $allow_undefined_flag  -o $lib -bundle $libobjs $deplibs$compiler_flags~nmedit -s $output_objdir/${libname}-symbols.expsym ${lib}'
      else
      case $cc_basename in
        xlc*)
         output_verbose_link_cmd='echo'
          _LT_AC_TAGVAR(archive_cmds, $1)='$CC -qmkshrobj ${wl}-single_module $allow_undefined_flag -o $lib $libobjs $deplibs $compiler_flags ${wl}-install_name ${wl}`echo $rpath/$soname` $verstring'
          _LT_AC_TAGVAR(module_cmds, $1)='$CC $allow_undefined_flag -o $lib -bundle $libobjs $deplibs$compiler_flags'
          # Don't fix this by using the ld -exported_symbols_list flag; it doesn't exist in older darwin lds
          _LT_AC_TAGVAR(archive_expsym_cmds, $1)='sed -e "s,#.*,," -e "s,^[    ]*,," -e "s,^\(..*\),_&," < $export_symbols > $output_objdir/${libname}-symbols.expsym~$CC -qmkshrobj ${wl}-single_module $allow_undefined_flag -o $lib $libobjs $deplibs $compiler_flags ${wl}-install_name ${wl}$rpath/$soname $verstring~nmedit -s $output_objdir/${libname}-symbols.expsym ${lib}'
          _LT_AC_TAGVAR(module_expsym_cmds, $1)='sed -e "s,#.*,," -e "s,^[    ]*,," -e "s,^\(..*\),_&," < $export_symbols > $output_objdir/${libname}-symbols.expsym~$CC $allow_undefined_flag  -o $lib -bundle $libobjs $deplibs$compiler_flags~nmedit -s $output_objdir/${libname}-symbols.expsym ${lib}'
          ;;
       *)
         _LT_AC_TAGVAR(ld_shlibs, $1)=no
          ;;
      esac
      fi
        ;;

  dgux*)
    case $cc_basename in
      ec++*)
	# FIXME: insert proper C++ library support
	_LT_AC_TAGVAR(ld_shlibs, $1)=no
	;;
      ghcx*)
	# Green Hills C++ Compiler
	# FIXME: insert proper C++ library support
	_LT_AC_TAGVAR(ld_shlibs, $1)=no
	;;
      *)
	# FIXME: insert proper C++ library support
	_LT_AC_TAGVAR(ld_shlibs, $1)=no
	;;
    esac
    ;;
  freebsd[[12]]*)
    # C++ shared libraries reported to be fairly broken before switch to ELF
    _LT_AC_TAGVAR(ld_shlibs, $1)=no
    ;;
  freebsd-elf*)
    _LT_AC_TAGVAR(archive_cmds_need_lc, $1)=no
    ;;
  freebsd* | kfreebsd*-gnu | dragonfly*)
    # FreeBSD 3 and later use GNU C++ and GNU ld with standard ELF
    # conventions
    _LT_AC_TAGVAR(ld_shlibs, $1)=yes
    ;;
  gnu*)
    ;;
  hpux9*)
    _LT_AC_TAGVAR(hardcode_libdir_flag_spec, $1)='${wl}+b ${wl}$libdir'
    _LT_AC_TAGVAR(hardcode_libdir_separator, $1)=:
    _LT_AC_TAGVAR(export_dynamic_flag_spec, $1)='${wl}-E'
    _LT_AC_TAGVAR(hardcode_direct, $1)=yes
    _LT_AC_TAGVAR(hardcode_minus_L, $1)=yes # Not in the search PATH,
				# but as the default
				# location of the library.

    case $cc_basename in
    CC*)
      # FIXME: insert proper C++ library support
      _LT_AC_TAGVAR(ld_shlibs, $1)=no
      ;;
    aCC*)
      _LT_AC_TAGVAR(archive_cmds, $1)='$rm $output_objdir/$soname~$CC -b ${wl}+b ${wl}$install_libdir -o $output_objdir/$soname $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags~test $output_objdir/$soname = $lib || mv $output_objdir/$soname $lib'
      # Commands to make compiler produce verbose output that lists
      # what "hidden" libraries, object files and flags are used when
      # linking a shared library.
      #
      # There doesn't appear to be a way to prevent this compiler from
      # explicitly linking system object files so we need to strip them
      # from the output so that they don't get included in the library
      # dependencies.
      output_verbose_link_cmd='templist=`($CC -b $CFLAGS -v conftest.$objext 2>&1) | grep "[[-]]L"`; list=""; for z in $templist; do case $z in conftest.$objext) list="$list $z";; *.$objext);; *) list="$list $z";;esac; done; echo $list'
      ;;
    *)
      if test "$GXX" = yes; then
        _LT_AC_TAGVAR(archive_cmds, $1)='$rm $output_objdir/$soname~$CC -shared -nostdlib -fPIC ${wl}+b ${wl}$install_libdir -o $output_objdir/$soname $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags~test $output_objdir/$soname = $lib || mv $output_objdir/$soname $lib'
      else
        # FIXME: insert proper C++ library support
        _LT_AC_TAGVAR(ld_shlibs, $1)=no
      fi
      ;;
    esac
    ;;
  hpux10*|hpux11*)
    if test $with_gnu_ld = no; then
      _LT_AC_TAGVAR(hardcode_libdir_flag_spec, $1)='${wl}+b ${wl}$libdir'
      _LT_AC_TAGVAR(hardcode_libdir_separator, $1)=:

      case $host_cpu in
      hppa*64*|ia64*)
	_LT_AC_TAGVAR(hardcode_libdir_flag_spec_ld, $1)='+b $libdir'
        ;;
      *)
	_LT_AC_TAGVAR(export_dynamic_flag_spec, $1)='${wl}-E'
        ;;
      esac
    fi
    case $host_cpu in
    hppa*64*|ia64*)
      _LT_AC_TAGVAR(hardcode_direct, $1)=no
      _LT_AC_TAGVAR(hardcode_shlibpath_var, $1)=no
      ;;
    *)
      _LT_AC_TAGVAR(hardcode_direct, $1)=yes
      _LT_AC_TAGVAR(hardcode_minus_L, $1)=yes # Not in the search PATH,
					      # but as the default
					      # location of the library.
      ;;
    esac

    case $cc_basename in
      CC*)
	# FIXME: insert proper C++ library support
	_LT_AC_TAGVAR(ld_shlibs, $1)=no
	;;
      aCC*)
	case $host_cpu in
	hppa*64*)
	  _LT_AC_TAGVAR(archive_cmds, $1)='$CC -b ${wl}+h ${wl}$soname -o $lib $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags'
	  ;;
	ia64*)
	  _LT_AC_TAGVAR(archive_cmds, $1)='$CC -b ${wl}+h ${wl}$soname ${wl}+nodefaultrpath -o $lib $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags'
	  ;;
	*)
	  _LT_AC_TAGVAR(archive_cmds, $1)='$CC -b ${wl}+h ${wl}$soname ${wl}+b ${wl}$install_libdir -o $lib $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags'
	  ;;
	esac
	# Commands to make compiler produce verbose output that lists
	# what "hidden" libraries, object files and flags are used when
	# linking a shared library.
	#
	# There doesn't appear to be a way to prevent this compiler from
	# explicitly linking system object files so we need to strip them
	# from the output so that they don't get included in the library
	# dependencies.
	output_verbose_link_cmd='templist=`($CC -b $CFLAGS -v conftest.$objext 2>&1) | grep "\-L"`; list=""; for z in $templist; do case $z in conftest.$objext) list="$list $z";; *.$objext);; *) list="$list $z";;esac; done; echo $list'
	;;
      *)
	if test "$GXX" = yes; then
	  if test $with_gnu_ld = no; then
	    case $host_cpu in
	    hppa*64*)
	      _LT_AC_TAGVAR(archive_cmds, $1)='$CC -shared -nostdlib -fPIC ${wl}+h ${wl}$soname -o $lib $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags'
	      ;;
	    ia64*)
	      _LT_AC_TAGVAR(archive_cmds, $1)='$CC -shared -nostdlib -fPIC ${wl}+h ${wl}$soname ${wl}+nodefaultrpath -o $lib $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags'
	      ;;
	    *)
	      _LT_AC_TAGVAR(archive_cmds, $1)='$CC -shared -nostdlib -fPIC ${wl}+h ${wl}$soname ${wl}+b ${wl}$install_libdir -o $lib $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags'
	      ;;
	    esac
	  fi
	else
	  # FIXME: insert proper C++ library support
	  _LT_AC_TAGVAR(ld_shlibs, $1)=no
	fi
	;;
    esac
    ;;
  interix3*)
    _LT_AC_TAGVAR(hardcode_direct, $1)=no
    _LT_AC_TAGVAR(hardcode_shlibpath_var, $1)=no
    _LT_AC_TAGVAR(hardcode_libdir_flag_spec, $1)='${wl}-rpath,$libdir'
    _LT_AC_TAGVAR(export_dynamic_flag_spec, $1)='${wl}-E'
    # Hack: On Interix 3.x, we cannot compile PIC because of a broken gcc.
    # Instead, shared libraries are loaded at an image base (0x10000000 by
    # default) and relocated if they conflict, which is a slow, very
    # memory-consuming and fragmenting process.  To avoid this, we pick a random,
    # 256 KiB-aligned image base between 0x50000000 and 0x6FFC0000 at link
    # time.  Moving up from 0x10000000 also allows more sbrk(2) space.
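    # Worked example of the expr arithmetic below, as a sketch: the
    # remainder modulo 4096 lies in 0..4095, and dividing by 2 gives a
    # slot number in 0..2047.  Each slot is 262144 bytes, i.e. 256 KiB,
    # and 1342177280 is 0x50000000.  Slot 0 therefore yields 0x50000000
    # and slot 2047 yields 2047 * 262144 + 1342177280 = 1878786048,
    # which is 0x6FFC0000, matching the range stated above.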
    _LT_AC_TAGVAR(archive_cmds, $1)='$CC -shared $pic_flag $libobjs $deplibs $compiler_flags ${wl}-h,$soname ${wl}--image-base,`expr ${RANDOM-$$} % 4096 / 2 \* 262144 + 1342177280` -o $lib'
    _LT_AC_TAGVAR(archive_expsym_cmds, $1)='sed "s,^,_," $export_symbols >$output_objdir/$soname.expsym~$CC -shared $pic_flag $libobjs $deplibs $compiler_flags ${wl}-h,$soname ${wl}--retain-symbols-file,$output_objdir/$soname.expsym ${wl}--image-base,`expr ${RANDOM-$$} % 4096 / 2 \* 262144 + 1342177280` -o $lib'
    ;;
  irix5* | irix6*)
    case $cc_basename in
      CC*)
	# SGI C++
	_LT_AC_TAGVAR(archive_cmds, $1)='$CC -shared -all -multigot $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags -soname $soname `test -n "$verstring" && echo -set_version $verstring` -update_registry ${output_objdir}/so_locations -o $lib'

	# Archives containing C++ object files must be created using
	# "CC -ar", where "CC" is the IRIX C++ compiler.  This is
	# necessary to make sure instantiated templates are included
	# in the archive.
	_LT_AC_TAGVAR(old_archive_cmds, $1)='$CC -ar -WR,-u -o $oldlib $oldobjs'
	;;
      *)
	if test "$GXX" = yes; then
	  if test "$with_gnu_ld" = no; then
	    _LT_AC_TAGVAR(archive_cmds, $1)='$CC -shared -nostdlib $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags ${wl}-soname ${wl}$soname `test -n "$verstring" && echo ${wl}-set_version ${wl}$verstring` ${wl}-update_registry ${wl}${output_objdir}/so_locations -o $lib'
	  else
	    _LT_AC_TAGVAR(archive_cmds, $1)='$CC -shared -nostdlib $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags ${wl}-soname ${wl}$soname `test -n "$verstring" && echo ${wl}-set_version ${wl}$verstring` -o $lib'
	  fi
	fi
	_LT_AC_TAGVAR(link_all_deplibs, $1)=yes
	;;
    esac
    _LT_AC_TAGVAR(hardcode_libdir_flag_spec, $1)='${wl}-rpath ${wl}$libdir'
    _LT_AC_TAGVAR(hardcode_libdir_separator, $1)=:
    ;;
  linux*)
    case $cc_basename in
      KCC*)
	# Kuck and Associates, Inc. (KAI) C++ Compiler

	# KCC will only create a shared library if the output file
	# ends with ".so" (or ".sl" for HP-UX), so rename the library
	# to its proper name (with version) after linking.
	_LT_AC_TAGVAR(archive_cmds, $1)='tempext=`echo $shared_ext | $SED -e '\''s/\([[^()0-9A-Za-z{}]]\)/\\\\\1/g'\''`; templib=`echo $lib | $SED -e "s/\${tempext}\..*/.so/"`; $CC $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags --soname $soname -o \$templib; mv \$templib $lib'
	_LT_AC_TAGVAR(archive_expsym_cmds, $1)='tempext=`echo $shared_ext | $SED -e '\''s/\([[^()0-9A-Za-z{}]]\)/\\\\\1/g'\''`; templib=`echo $lib | $SED -e "s/\${tempext}\..*/.so/"`; $CC $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags --soname $soname -o \$templib ${wl}-retain-symbols-file,$export_symbols; mv \$templib $lib'
	# Commands to make compiler produce verbose output that lists
	# what "hidden" libraries, object files and flags are used when
	# linking a shared library.
	#
	# There doesn't appear to be a way to prevent this compiler from
	# explicitly linking system object files so we need to strip them
	# from the output so that they don't get included in the library
	# dependencies.
	output_verbose_link_cmd='templist=`$CC $CFLAGS -v conftest.$objext -o libconftest$shared_ext 2>&1 | grep "ld"`; rm -f libconftest$shared_ext; list=""; for z in $templist; do case $z in conftest.$objext) list="$list $z";; *.$objext);; *) list="$list $z";;esac; done; echo $list'

	_LT_AC_TAGVAR(hardcode_libdir_flag_spec, $1)='${wl}--rpath,$libdir'
	_LT_AC_TAGVAR(export_dynamic_flag_spec, $1)='${wl}--export-dynamic'

	# Archives containing C++ object files must be created using
	# "CC -Bstatic", where "CC" is the KAI C++ compiler.
	_LT_AC_TAGVAR(old_archive_cmds, $1)='$CC -Bstatic -o $oldlib $oldobjs'
	;;
      icpc*)
	# Intel C++
	with_gnu_ld=yes
	# version 8.0 and above of icpc choke on multiply defined symbols
	# if we add $predep_objects and $postdep_objects, however 7.1 and
	# earlier do not add the objects themselves.
	case `$CC -V 2>&1` in
	*"Version 7."*)
  	  _LT_AC_TAGVAR(archive_cmds, $1)='$CC -shared $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags ${wl}-soname $wl$soname -o $lib'
  	  _LT_AC_TAGVAR(archive_expsym_cmds, $1)='$CC -shared $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags ${wl}-soname $wl$soname ${wl}-retain-symbols-file $wl$export_symbols -o $lib'
	  ;;
	*)  # Version 8.0 or newer
	  tmp_idyn=
	  case $host_cpu in
	    ia64*) tmp_idyn=' -i_dynamic';;
	  esac
  	  _LT_AC_TAGVAR(archive_cmds, $1)='$CC -shared'"$tmp_idyn"' $libobjs $deplibs $compiler_flags ${wl}-soname $wl$soname -o $lib'
	  _LT_AC_TAGVAR(archive_expsym_cmds, $1)='$CC -shared'"$tmp_idyn"' $libobjs $deplibs $compiler_flags ${wl}-soname $wl$soname ${wl}-retain-symbols-file $wl$export_symbols -o $lib'
	  ;;
	esac
	_LT_AC_TAGVAR(archive_cmds_need_lc, $1)=no
	_LT_AC_TAGVAR(hardcode_libdir_flag_spec, $1)='${wl}-rpath,$libdir'
	_LT_AC_TAGVAR(export_dynamic_flag_spec, $1)='${wl}--export-dynamic'
	_LT_AC_TAGVAR(whole_archive_flag_spec, $1)='${wl}--whole-archive$convenience ${wl}--no-whole-archive'
	;;
      pgCC*)
        # Portland Group C++ compiler
	_LT_AC_TAGVAR(archive_cmds, $1)='$CC -shared $pic_flag $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags ${wl}-soname ${wl}$soname -o $lib'
  	_LT_AC_TAGVAR(archive_expsym_cmds, $1)='$CC -shared $pic_flag $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags ${wl}-soname ${wl}$soname ${wl}-retain-symbols-file ${wl}$export_symbols -o $lib'

	_LT_AC_TAGVAR(hardcode_libdir_flag_spec, $1)='${wl}--rpath ${wl}$libdir'
	_LT_AC_TAGVAR(export_dynamic_flag_spec, $1)='${wl}--export-dynamic'
	_LT_AC_TAGVAR(whole_archive_flag_spec, $1)='${wl}--whole-archive`for conv in $convenience\"\"; do test  -n \"$conv\" && new_convenience=\"$new_convenience,$conv\"; done; $echo \"$new_convenience\"` ${wl}--no-whole-archive'
        ;;
      cxx*)
	# Compaq C++
	_LT_AC_TAGVAR(archive_cmds, $1)='$CC -shared $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags ${wl}-soname $wl$soname -o $lib'
	_LT_AC_TAGVAR(archive_expsym_cmds, $1)='$CC -shared $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags ${wl}-soname $wl$soname  -o $lib ${wl}-retain-symbols-file $wl$export_symbols'

	runpath_var=LD_RUN_PATH
	_LT_AC_TAGVAR(hardcode_libdir_flag_spec, $1)='-rpath $libdir'
	_LT_AC_TAGVAR(hardcode_libdir_separator, $1)=:

	# Commands to make compiler produce verbose output that lists
	# what "hidden" libraries, object files and flags are used when
	# linking a shared library.
	#
	# There doesn't appear to be a way to prevent this compiler from
	# explicitly linking system object files so we need to strip them
	# from the output so that they don't get included in the library
	# dependencies.
	output_verbose_link_cmd='templist=`$CC -shared $CFLAGS -v conftest.$objext 2>&1 | grep "ld"`; templist=`echo $templist | $SED "s/\(^.*ld.*\)\( .*ld .*$\)/\1/"`; list=""; for z in $templist; do case $z in conftest.$objext) list="$list $z";; *.$objext);; *) list="$list $z";;esac; done; echo $list'
	;;
    esac
    ;;
  lynxos*)
    # FIXME: insert proper C++ library support
    _LT_AC_TAGVAR(ld_shlibs, $1)=no
    ;;
  m88k*)
    # FIXME: insert proper C++ library support
    _LT_AC_TAGVAR(ld_shlibs, $1)=no
    ;;
  mvs*)
    case $cc_basename in
      cxx*)
	# FIXME: insert proper C++ library support
	_LT_AC_TAGVAR(ld_shlibs, $1)=no
	;;
      *)
	# FIXME: insert proper C++ library support
	_LT_AC_TAGVAR(ld_shlibs, $1)=no
	;;
    esac
    ;;
  netbsd*)
    if echo __ELF__ | $CC -E - | grep __ELF__ >/dev/null; then
      _LT_AC_TAGVAR(archive_cmds, $1)='$LD -Bshareable  -o $lib $predep_objects $libobjs $deplibs $postdep_objects $linker_flags'
      wlarc=
      _LT_AC_TAGVAR(hardcode_libdir_flag_spec, $1)='-R$libdir'
      _LT_AC_TAGVAR(hardcode_direct, $1)=yes
      _LT_AC_TAGVAR(hardcode_shlibpath_var, $1)=no
    fi
    # Work around some broken pre-1.5 toolchains
    output_verbose_link_cmd='$CC -shared $CFLAGS -v conftest.$objext 2>&1 | grep conftest.$objext | $SED -e "s:-lgcc -lc -lgcc::"'
    ;;
  openbsd2*)
    # C++ shared libraries are fairly broken
    _LT_AC_TAGVAR(ld_shlibs, $1)=no
    ;;
  openbsd*)
    _LT_AC_TAGVAR(hardcode_direct, $1)=yes
    _LT_AC_TAGVAR(hardcode_shlibpath_var, $1)=no
    _LT_AC_TAGVAR(archive_cmds, $1)='$CC -shared $pic_flag $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags -o $lib'
    _LT_AC_TAGVAR(hardcode_libdir_flag_spec, $1)='${wl}-rpath,$libdir'
    if test -z "`echo __ELF__ | $CC -E - | grep __ELF__`" || test "$host_os-$host_cpu" = "openbsd2.8-powerpc"; then
      _LT_AC_TAGVAR(archive_expsym_cmds, $1)='$CC -shared $pic_flag $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags ${wl}-retain-symbols-file,$export_symbols -o $lib'
      _LT_AC_TAGVAR(export_dynamic_flag_spec, $1)='${wl}-E'
      _LT_AC_TAGVAR(whole_archive_flag_spec, $1)="$wlarc"'--whole-archive$convenience '"$wlarc"'--no-whole-archive'
    fi
    output_verbose_link_cmd='echo'
    ;;
  osf3*)
    case $cc_basename in
      KCC*)
	# Kuck and Associates, Inc. (KAI) C++ Compiler

	# KCC will only create a shared library if the output file
	# ends with ".so" (or ".sl" for HP-UX), so rename the library
	# to its proper name (with version) after linking.
	_LT_AC_TAGVAR(archive_cmds, $1)='tempext=`echo $shared_ext | $SED -e '\''s/\([[^()0-9A-Za-z{}]]\)/\\\\\1/g'\''`; templib=`echo $lib | $SED -e "s/\${tempext}\..*/.so/"`; $CC $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags --soname $soname -o \$templib; mv \$templib $lib'

	_LT_AC_TAGVAR(hardcode_libdir_flag_spec, $1)='${wl}-rpath,$libdir'
	_LT_AC_TAGVAR(hardcode_libdir_separator, $1)=:

	# Archives containing C++ object files must be created using
	# "CC -Bstatic", where "CC" is the KAI C++ compiler.
	_LT_AC_TAGVAR(old_archive_cmds, $1)='$CC -Bstatic -o $oldlib $oldobjs'

	;;
      RCC*)
	# Rational C++ 2.4.1
	# FIXME: insert proper C++ library support
	_LT_AC_TAGVAR(ld_shlibs, $1)=no
	;;
      cxx*)
	_LT_AC_TAGVAR(allow_undefined_flag, $1)=' ${wl}-expect_unresolved ${wl}\*'
	_LT_AC_TAGVAR(archive_cmds, $1)='$CC -shared${allow_undefined_flag} $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags ${wl}-soname $soname `test -n "$verstring" && echo ${wl}-set_version $verstring` -update_registry ${output_objdir}/so_locations -o $lib'

	_LT_AC_TAGVAR(hardcode_libdir_flag_spec, $1)='${wl}-rpath ${wl}$libdir'
	_LT_AC_TAGVAR(hardcode_libdir_separator, $1)=:

	# Commands to make compiler produce verbose output that lists
	# what "hidden" libraries, object files and flags are used when
	# linking a shared library.
	#
	# There doesn't appear to be a way to prevent this compiler from
	# explicitly linking system object files so we need to strip them
	# from the output so that they don't get included in the library
	# dependencies.
	output_verbose_link_cmd='templist=`$CC -shared $CFLAGS -v conftest.$objext 2>&1 | grep "ld" | grep -v "ld:"`; templist=`echo $templist | $SED "s/\(^.*ld.*\)\( .*ld.*$\)/\1/"`; list=""; for z in $templist; do case $z in conftest.$objext) list="$list $z";; *.$objext);; *) list="$list $z";;esac; done; echo $list'
	;;
      *)
	if test "$GXX" = yes && test "$with_gnu_ld" = no; then
	  _LT_AC_TAGVAR(allow_undefined_flag, $1)=' ${wl}-expect_unresolved ${wl}\*'
	  _LT_AC_TAGVAR(archive_cmds, $1)='$CC -shared -nostdlib ${allow_undefined_flag} $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags ${wl}-soname ${wl}$soname `test -n "$verstring" && echo ${wl}-set_version ${wl}$verstring` ${wl}-update_registry ${wl}${output_objdir}/so_locations -o $lib'

	  _LT_AC_TAGVAR(hardcode_libdir_flag_spec, $1)='${wl}-rpath ${wl}$libdir'
	  _LT_AC_TAGVAR(hardcode_libdir_separator, $1)=:

	  # Commands to make compiler produce verbose output that lists
	  # what "hidden" libraries, object files and flags are used when
	  # linking a shared library.
	  output_verbose_link_cmd='$CC -shared $CFLAGS -v conftest.$objext 2>&1 | grep "\-L"'

	else
	  # FIXME: insert proper C++ library support
	  _LT_AC_TAGVAR(ld_shlibs, $1)=no
	fi
	;;
    esac
    ;;
  osf4* | osf5*)
    case $cc_basename in
      KCC*)
	# Kuck and Associates, Inc. (KAI) C++ Compiler

	# KCC will only create a shared library if the output file
	# ends with ".so" (or ".sl" for HP-UX), so rename the library
	# to its proper name (with version) after linking.
	_LT_AC_TAGVAR(archive_cmds, $1)='tempext=`echo $shared_ext | $SED -e '\''s/\([[^()0-9A-Za-z{}]]\)/\\\\\1/g'\''`; templib=`echo $lib | $SED -e "s/\${tempext}\..*/.so/"`; $CC $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags --soname $soname -o \$templib; mv \$templib $lib'

	_LT_AC_TAGVAR(hardcode_libdir_flag_spec, $1)='${wl}-rpath,$libdir'
	_LT_AC_TAGVAR(hardcode_libdir_separator, $1)=:

	# Archives containing C++ object files must be created using
	# the KAI C++ compiler.
	_LT_AC_TAGVAR(old_archive_cmds, $1)='$CC -o $oldlib $oldobjs'
	;;
      RCC*)
	# Rational C++ 2.4.1
	# FIXME: insert proper C++ library support
	_LT_AC_TAGVAR(ld_shlibs, $1)=no
	;;
      cxx*)
	_LT_AC_TAGVAR(allow_undefined_flag, $1)=' -expect_unresolved \*'
	_LT_AC_TAGVAR(archive_cmds, $1)='$CC -shared${allow_undefined_flag} $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags -msym -soname $soname `test -n "$verstring" && echo -set_version $verstring` -update_registry ${output_objdir}/so_locations -o $lib'
	_LT_AC_TAGVAR(archive_expsym_cmds, $1)='for i in `cat $export_symbols`; do printf "%s %s\\n" -exported_symbol "\$i" >> $lib.exp; done~
	  echo "-hidden">> $lib.exp~
	  $CC -shared$allow_undefined_flag $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags -msym -soname $soname -Wl,-input -Wl,$lib.exp  `test -n "$verstring" && echo -set_version	$verstring` -update_registry ${output_objdir}/so_locations -o $lib~
	  $rm $lib.exp'

	_LT_AC_TAGVAR(hardcode_libdir_flag_spec, $1)='-rpath $libdir'
	_LT_AC_TAGVAR(hardcode_libdir_separator, $1)=:

	# Commands to make compiler produce verbose output that lists
	# what "hidden" libraries, object files and flags are used when
	# linking a shared library.
	#
	# There doesn't appear to be a way to prevent this compiler from
	# explicitly linking system object files so we need to strip them
	# from the output so that they don't get included in the library
	# dependencies.
	output_verbose_link_cmd='templist=`$CC -shared $CFLAGS -v conftest.$objext 2>&1 | grep "ld" | grep -v "ld:"`; templist=`echo $templist | $SED "s/\(^.*ld.*\)\( .*ld.*$\)/\1/"`; list=""; for z in $templist; do case $z in conftest.$objext) list="$list $z";; *.$objext);; *) list="$list $z";;esac; done; echo $list'
	;;
      *)
	if test "$GXX" = yes && test "$with_gnu_ld" = no; then
	  _LT_AC_TAGVAR(allow_undefined_flag, $1)=' ${wl}-expect_unresolved ${wl}\*'
	 _LT_AC_TAGVAR(archive_cmds, $1)='$CC -shared -nostdlib ${allow_undefined_flag} $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags ${wl}-msym ${wl}-soname ${wl}$soname `test -n "$verstring" && echo ${wl}-set_version ${wl}$verstring` ${wl}-update_registry ${wl}${output_objdir}/so_locations -o $lib'

	  _LT_AC_TAGVAR(hardcode_libdir_flag_spec, $1)='${wl}-rpath ${wl}$libdir'
	  _LT_AC_TAGVAR(hardcode_libdir_separator, $1)=:

	  # Commands to make compiler produce verbose output that lists
	  # what "hidden" libraries, object files and flags are used when
	  # linking a shared library.
	  output_verbose_link_cmd='$CC -shared $CFLAGS -v conftest.$objext 2>&1 | grep "\-L"'

	else
	  # FIXME: insert proper C++ library support
	  _LT_AC_TAGVAR(ld_shlibs, $1)=no
	fi
	;;
    esac
    ;;
  psos*)
    # FIXME: insert proper C++ library support
    _LT_AC_TAGVAR(ld_shlibs, $1)=no
    ;;
  sunos4*)
    case $cc_basename in
      CC*)
	# Sun C++ 4.x
	# FIXME: insert proper C++ library support
	_LT_AC_TAGVAR(ld_shlibs, $1)=no
	;;
      lcc*)
	# Lucid
	# FIXME: insert proper C++ library support
	_LT_AC_TAGVAR(ld_shlibs, $1)=no
	;;
      *)
	# FIXME: insert proper C++ library support
	_LT_AC_TAGVAR(ld_shlibs, $1)=no
	;;
    esac
    ;;
  solaris*)
    case $cc_basename in
      CC*)
	# Sun C++ 4.2, 5.x and Centerline C++
        _LT_AC_TAGVAR(archive_cmds_need_lc,$1)=yes
	_LT_AC_TAGVAR(no_undefined_flag, $1)=' -zdefs'
	_LT_AC_TAGVAR(archive_cmds, $1)='$CC -G${allow_undefined_flag}  -h$soname -o $lib $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags'
	_LT_AC_TAGVAR(archive_expsym_cmds, $1)='$echo "{ global:" > $lib.exp~cat $export_symbols | $SED -e "s/\(.*\)/\1;/" >> $lib.exp~$echo "local: *; };" >> $lib.exp~
	$CC -G${allow_undefined_flag}  ${wl}-M ${wl}$lib.exp -h$soname -o $lib $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags~$rm $lib.exp'

	_LT_AC_TAGVAR(hardcode_libdir_flag_spec, $1)='-R$libdir'
	_LT_AC_TAGVAR(hardcode_shlibpath_var, $1)=no
	case $host_os in
	  solaris2.[[0-5]] | solaris2.[[0-5]].*) ;;
	  *)
	    # The C++ compiler is used as linker so we must use $wl
	    # flag to pass the commands to the underlying system
	    # linker. We must also pass each convenience library through
	    # to the system linker between allextract/defaultextract.
	    # The C++ compiler will combine linker options so we
	    # cannot just pass the convenience library names through
	    # without $wl.
	    # Supported since Solaris 2.6 (maybe 2.5.1?)
	    _LT_AC_TAGVAR(whole_archive_flag_spec, $1)='${wl}-z ${wl}allextract`for conv in $convenience\"\"; do test -n \"$conv\" && new_convenience=\"$new_convenience,$conv\"; done; $echo \"$new_convenience\"` ${wl}-z ${wl}defaultextract'
	    ;;
	esac
	_LT_AC_TAGVAR(link_all_deplibs, $1)=yes

	output_verbose_link_cmd='echo'

	# Archives containing C++ object files must be created using
	# "CC -xar", where "CC" is the Sun C++ compiler.  This is
	# necessary to make sure instantiated templates are included
	# in the archive.
	_LT_AC_TAGVAR(old_archive_cmds, $1)='$CC -xar -o $oldlib $oldobjs'
	;;
      gcx*)
	# Green Hills C++ Compiler
	_LT_AC_TAGVAR(archive_cmds, $1)='$CC -shared $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags ${wl}-h $wl$soname -o $lib'

	# The C++ compiler must be used to create the archive.
	_LT_AC_TAGVAR(old_archive_cmds, $1)='$CC $LDFLAGS -archive -o $oldlib $oldobjs'
	;;
      *)
	# GNU C++ compiler with Solaris linker
	if test "$GXX" = yes && test "$with_gnu_ld" = no; then
	  _LT_AC_TAGVAR(no_undefined_flag, $1)=' ${wl}-z ${wl}defs'
	  if $CC --version | grep -v '^2\.7' > /dev/null; then
	    _LT_AC_TAGVAR(archive_cmds, $1)='$CC -shared -nostdlib $LDFLAGS $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags ${wl}-h $wl$soname -o $lib'
	    _LT_AC_TAGVAR(archive_expsym_cmds, $1)='$echo "{ global:" > $lib.exp~cat $export_symbols | $SED -e "s/\(.*\)/\1;/" >> $lib.exp~$echo "local: *; };" >> $lib.exp~
		$CC -shared -nostdlib ${wl}-M $wl$lib.exp -o $lib $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags~$rm $lib.exp'

	    # Commands to make compiler produce verbose output that lists
	    # what "hidden" libraries, object files and flags are used when
	    # linking a shared library.
	    output_verbose_link_cmd="$CC -shared $CFLAGS -v conftest.$objext 2>&1 | grep \"\-L\""
	  else
	    # g++ 2.7 appears to require `-G' NOT `-shared' on this
	    # platform.
	    _LT_AC_TAGVAR(archive_cmds, $1)='$CC -G -nostdlib $LDFLAGS $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags ${wl}-h $wl$soname -o $lib'
	    _LT_AC_TAGVAR(archive_expsym_cmds, $1)='$echo "{ global:" > $lib.exp~cat $export_symbols | $SED -e "s/\(.*\)/\1;/" >> $lib.exp~$echo "local: *; };" >> $lib.exp~
		$CC -G -nostdlib ${wl}-M $wl$lib.exp -o $lib $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags~$rm $lib.exp'

	    # Commands to make compiler produce verbose output that lists
	    # what "hidden" libraries, object files and flags are used when
	    # linking a shared library.
	    output_verbose_link_cmd="$CC -G $CFLAGS -v conftest.$objext 2>&1 | grep \"\-L\""
	  fi

	  _LT_AC_TAGVAR(hardcode_libdir_flag_spec, $1)='${wl}-R $wl$libdir'
	fi
	;;
    esac
    ;;
  sysv4*uw2* | sysv5OpenUNIX* | sysv5UnixWare7.[[01]].[[10]]* | unixware7* | sco3.2v5.0.[[024]]*)
    _LT_AC_TAGVAR(no_undefined_flag, $1)='${wl}-z,text'
    _LT_AC_TAGVAR(archive_cmds_need_lc, $1)=no
    _LT_AC_TAGVAR(hardcode_shlibpath_var, $1)=no
    runpath_var='LD_RUN_PATH'

    case $cc_basename in
      CC*)
	_LT_AC_TAGVAR(archive_cmds, $1)='$CC -G ${wl}-h,$soname -o $lib $libobjs $deplibs $compiler_flags'
	_LT_AC_TAGVAR(archive_expsym_cmds, $1)='$CC -G ${wl}-Bexport:$export_symbols ${wl}-h,$soname -o $lib $libobjs $deplibs $compiler_flags'
	;;
      *)
	_LT_AC_TAGVAR(archive_cmds, $1)='$CC -shared ${wl}-h,$soname -o $lib $libobjs $deplibs $compiler_flags'
	_LT_AC_TAGVAR(archive_expsym_cmds, $1)='$CC -shared ${wl}-Bexport:$export_symbols ${wl}-h,$soname -o $lib $libobjs $deplibs $compiler_flags'
	;;
    esac
    ;;
  sysv5* | sco3.2v5* | sco5v6*)
    # Note: We can NOT use -z defs as we might desire, because we do not
    # link with -lc, and that would cause any symbols used from libc to
    # always be unresolved, which means just about no library would
    # ever link correctly.  If we're not using GNU ld we use -z text
    # though, which does catch some bad symbols but isn't as heavy-handed
    # as -z defs.
    # For security reasons, it is highly recommended that you always
    # use absolute paths for naming shared libraries, and exclude the
    # DT_RUNPATH tag from executables and libraries.  But doing so
    # requires that you compile everything twice, which is a pain.
    # So that behaviour is only enabled if SCOABSPATH is set to a
    # non-empty value in the environment.  Most likely only useful for
    # creating official distributions of packages.
    # This is a hack until libtool officially supports absolute path
    # names for shared libraries.
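    #
    # Illustration of the SCOABSPATH switch as used in the settings below
    # (the library name and install directory are hypothetical):
    #   SCOABSPATH unset or empty:  soname flag is  -h,libfoo.so.1
    #                               and the -R,libdir run path flag is emitted
    #   SCOABSPATH=1:               soname flag is  -h,/usr/lib/libfoo.so.1
    #                               and no -R run path flag is emitted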
    _LT_AC_TAGVAR(no_undefined_flag, $1)='${wl}-z,text'
    _LT_AC_TAGVAR(allow_undefined_flag, $1)='${wl}-z,nodefs'
    _LT_AC_TAGVAR(archive_cmds_need_lc, $1)=no
    _LT_AC_TAGVAR(hardcode_shlibpath_var, $1)=no
    _LT_AC_TAGVAR(hardcode_libdir_flag_spec, $1)='`test -z "$SCOABSPATH" && echo ${wl}-R,$libdir`'
    _LT_AC_TAGVAR(hardcode_libdir_separator, $1)=':'
    _LT_AC_TAGVAR(link_all_deplibs, $1)=yes
    _LT_AC_TAGVAR(export_dynamic_flag_spec, $1)='${wl}-Bexport'
    runpath_var='LD_RUN_PATH'

    case $cc_basename in
      CC*)
	_LT_AC_TAGVAR(archive_cmds, $1)='$CC -G ${wl}-h,\${SCOABSPATH:+${install_libdir}/}$soname -o $lib $libobjs $deplibs $compiler_flags'
	_LT_AC_TAGVAR(archive_expsym_cmds, $1)='$CC -G ${wl}-Bexport:$export_symbols ${wl}-h,\${SCOABSPATH:+${install_libdir}/}$soname -o $lib $libobjs $deplibs $compiler_flags'
	;;
      *)
	_LT_AC_TAGVAR(archive_cmds, $1)='$CC -shared ${wl}-h,\${SCOABSPATH:+${install_libdir}/}$soname -o $lib $libobjs $deplibs $compiler_flags'
	_LT_AC_TAGVAR(archive_expsym_cmds, $1)='$CC -shared ${wl}-Bexport:$export_symbols ${wl}-h,\${SCOABSPATH:+${install_libdir}/}$soname -o $lib $libobjs $deplibs $compiler_flags'
	;;
    esac
    ;;
  tandem*)
    case $cc_basename in
      NCC*)
	# NonStop-UX NCC 3.20
	# FIXME: insert proper C++ library support
	_LT_AC_TAGVAR(ld_shlibs, $1)=no
	;;
      *)
	# FIXME: insert proper C++ library support
	_LT_AC_TAGVAR(ld_shlibs, $1)=no
	;;
    esac
    ;;
  vxworks*)
    # FIXME: insert proper C++ library support
    _LT_AC_TAGVAR(ld_shlibs, $1)=no
    ;;
  *)
    # FIXME: insert proper C++ library support
    _LT_AC_TAGVAR(ld_shlibs, $1)=no
    ;;
esac
AC_MSG_RESULT([$_LT_AC_TAGVAR(ld_shlibs, $1)])
test "$_LT_AC_TAGVAR(ld_shlibs, $1)" = no && can_build_shared=no

_LT_AC_TAGVAR(GCC, $1)="$GXX"
_LT_AC_TAGVAR(LD, $1)="$LD"

AC_LIBTOOL_POSTDEP_PREDEP($1)
AC_LIBTOOL_PROG_COMPILER_PIC($1)
AC_LIBTOOL_PROG_CC_C_O($1)
AC_LIBTOOL_SYS_HARD_LINK_LOCKS($1)
AC_LIBTOOL_PROG_LD_SHLIBS($1)
AC_LIBTOOL_SYS_DYNAMIC_LINKER($1)
AC_LIBTOOL_PROG_LD_HARDCODE_LIBPATH($1)

AC_LIBTOOL_CONFIG($1)

AC_LANG_POP
CC=$lt_save_CC
LDCXX=$LD
LD=$lt_save_LD
GCC=$lt_save_GCC
with_gnu_ldcxx=$with_gnu_ld
with_gnu_ld=$lt_save_with_gnu_ld
lt_cv_path_LDCXX=$lt_cv_path_LD
lt_cv_path_LD=$lt_save_path_LD
lt_cv_prog_gnu_ldcxx=$lt_cv_prog_gnu_ld
lt_cv_prog_gnu_ld=$lt_save_with_gnu_ld
])# AC_LIBTOOL_LANG_CXX_CONFIG

# AC_LIBTOOL_POSTDEP_PREDEP([TAGNAME])
# ------------------------------------
# Figure out "hidden" library dependencies from verbose
# compiler output when linking a shared library.
# Parse the compiler output and extract the necessary
# objects, libraries and library flags.
AC_DEFUN([AC_LIBTOOL_POSTDEP_PREDEP],[
dnl we can't use the lt_simple_compile_test_code here,
dnl because it contains code intended for an executable,
dnl not a library.  It's possible we should let each
dnl tag define a new lt_????_link_test_code variable,
dnl but it's only used here...
ifelse([$1],[],[cat > conftest.$ac_ext <<EOF
int a;
void foo (void) { a = 0; }
EOF
],[$1],[CXX],[cat > conftest.$ac_ext <<EOF
class Foo
{
public:
  Foo (void) { a = 0; }
private:
  int a;
};
EOF
],[$1],[F77],[cat > conftest.$ac_ext <<EOF
      subroutine foo
      implicit none
      integer*4 a
      a=0
      return
      end
EOF
],[$1],[GCJ],[cat > conftest.$ac_ext <<EOF
public class foo {
  private int a;
  public void bar (void) {
    a = 0;
  }
};
EOF
])
dnl Parse the compiler output and extract the necessary
dnl objects, libraries and library flags.
if AC_TRY_EVAL(ac_compile); then
  # Parse the compiler output and extract the necessary
  # objects, libraries and library flags.

  # Sentinel used to keep track of whether or not we are before
  # the conftest object file.
  pre_test_object_deps_done=no

  # The `*' in the case matches for architectures that use `case' in
  # $output_verbose_link_cmd can trigger glob expansion during the loop
  # eval without this substitution.
  output_verbose_link_cmd=`$echo "X$output_verbose_link_cmd" | $Xsed -e "$no_glob_subst"`

  for p in `eval $output_verbose_link_cmd`; do
    case $p in

    -L* | -R* | -l*)
       # Some compilers place space between "-{L,R}" and the path.
       # Remove the space.
       if test $p = "-L" \
	  || test $p = "-R"; then
	 prev=$p
	 continue
       else
	 prev=
       fi

       if test "$pre_test_object_deps_done" = no; then
	 case $p in
	 -L* | -R*)
	   # Internal compiler library paths should come after those
	   # provided by the user.  The postdeps already come after the
	   # user-supplied libs so there is no need to process them.
	   if test -z "$_LT_AC_TAGVAR(compiler_lib_search_path, $1)"; then
	     _LT_AC_TAGVAR(compiler_lib_search_path, $1)="${prev}${p}"
	   else
	     _LT_AC_TAGVAR(compiler_lib_search_path, $1)="${_LT_AC_TAGVAR(compiler_lib_search_path, $1)} ${prev}${p}"
	   fi
	   ;;
	 # The "-l" case would never come before the object being
	 # linked, so don't bother handling this case.
	 esac
       else
	 if test -z "$_LT_AC_TAGVAR(postdeps, $1)"; then
	   _LT_AC_TAGVAR(postdeps, $1)="${prev}${p}"
	 else
	   _LT_AC_TAGVAR(postdeps, $1)="${_LT_AC_TAGVAR(postdeps, $1)} ${prev}${p}"
	 fi
       fi
       ;;

    *.$objext)
       # This assumes that the test object file only shows up
       # once in the compiler output.
       if test "$p" = "conftest.$objext"; then
	 pre_test_object_deps_done=yes
	 continue
       fi

       if test "$pre_test_object_deps_done" = no; then
	 if test -z "$_LT_AC_TAGVAR(predep_objects, $1)"; then
	   _LT_AC_TAGVAR(predep_objects, $1)="$p"
	 else
	   _LT_AC_TAGVAR(predep_objects, $1)="$_LT_AC_TAGVAR(predep_objects, $1) $p"
	 fi
       else
	 if test -z "$_LT_AC_TAGVAR(postdep_objects, $1)"; then
	   _LT_AC_TAGVAR(postdep_objects, $1)="$p"
	 else
	   _LT_AC_TAGVAR(postdep_objects, $1)="$_LT_AC_TAGVAR(postdep_objects, $1) $p"
	 fi
       fi
       ;;

    *) ;; # Ignore the rest.

    esac
  done

  # Clean up.
  rm -f a.out a.exe
else
  echo "libtool.m4: error: problem compiling $1 test program"
fi

$rm -f conftest.$objext

# PORTME: override above test on systems where it is broken
ifelse([$1],[CXX],
[case $host_os in
interix3*)
  # Interix 3.5 installs completely hosed .la files for C++, so rather than
  # hack all around it, let's just trust "g++" to DTRT.
  _LT_AC_TAGVAR(predep_objects,$1)=
  _LT_AC_TAGVAR(postdep_objects,$1)=
  _LT_AC_TAGVAR(postdeps,$1)=
  ;;

solaris*)
  case $cc_basename in
  CC*)
    # Adding this requires a known-good setup of shared libraries for
    # Sun compiler versions before 5.6, else PIC objects from an old
    # archive will be linked into the output, leading to subtle bugs.
    _LT_AC_TAGVAR(postdeps,$1)='-lCstd -lCrun'
    ;;
  esac
  ;;
esac
])

case " $_LT_AC_TAGVAR(postdeps, $1) " in
*" -lc "*) _LT_AC_TAGVAR(archive_cmds_need_lc, $1)=no ;;
esac
])# AC_LIBTOOL_POSTDEP_PREDEP
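
# Illustration of how the loop in AC_LIBTOOL_POSTDEP_PREDEP classifies the
# words of a verbose link line.  The line below is hypothetical and is not
# taken from a real compiler run:
#
#   crti.o crtbeginS.o -L/usr/lib/gcc conftest.o -lstdc++ -lm -lc crtendS.o crtn.o
#
# Words seen before conftest.o:
#   predep_objects             crti.o crtbeginS.o
#   compiler_lib_search_path   -L/usr/lib/gcc
# Words seen after conftest.o:
#   postdeps                   -lstdc++ -lm -lc
#   postdep_objects            crtendS.o crtn.o
#
# Because postdeps contains -lc in this sketch, the final case statement
# would set archive_cmds_need_lc to no.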

# AC_LIBTOOL_LANG_F77_CONFIG
# --------------------------
# Ensure that the configuration vars for the Fortran 77 compiler are
# suitably defined.  Those variables are subsequently used by
# AC_LIBTOOL_CONFIG to write the compiler configuration to `libtool'.
AC_DEFUN([AC_LIBTOOL_LANG_F77_CONFIG], [_LT_AC_LANG_F77_CONFIG(F77)])
AC_DEFUN([_LT_AC_LANG_F77_CONFIG],
[AC_REQUIRE([AC_PROG_F77])
AC_LANG_PUSH(Fortran 77)

_LT_AC_TAGVAR(archive_cmds_need_lc, $1)=no
_LT_AC_TAGVAR(allow_undefined_flag, $1)=
_LT_AC_TAGVAR(always_export_symbols, $1)=no
_LT_AC_TAGVAR(archive_expsym_cmds, $1)=
_LT_AC_TAGVAR(export_dynamic_flag_spec, $1)=
_LT_AC_TAGVAR(hardcode_direct, $1)=no
_LT_AC_TAGVAR(hardcode_libdir_flag_spec, $1)=
_LT_AC_TAGVAR(hardcode_libdir_flag_spec_ld, $1)=
_LT_AC_TAGVAR(hardcode_libdir_separator, $1)=
_LT_AC_TAGVAR(hardcode_minus_L, $1)=no
_LT_AC_TAGVAR(hardcode_automatic, $1)=no
_LT_AC_TAGVAR(module_cmds, $1)=
_LT_AC_TAGVAR(module_expsym_cmds, $1)=
_LT_AC_TAGVAR(link_all_deplibs, $1)=unknown
_LT_AC_TAGVAR(old_archive_cmds, $1)=$old_archive_cmds
_LT_AC_TAGVAR(no_undefined_flag, $1)=
_LT_AC_TAGVAR(whole_archive_flag_spec, $1)=
_LT_AC_TAGVAR(enable_shared_with_static_runtimes, $1)=no

# Source file extension for f77 test sources.
ac_ext=f

# Object file extension for compiled f77 test sources.
objext=o
_LT_AC_TAGVAR(objext, $1)=$objext

# Code to be used in simple compile tests
lt_simple_compile_test_code="      subroutine t\n      return\n      end\n"

# Code to be used in simple link tests
lt_simple_link_test_code="      program t\n      end\n"

# ltmain only uses $CC for tagged configurations so make sure $CC is set.
_LT_AC_SYS_COMPILER

# save warnings/boilerplate of simple test code
_LT_COMPILER_BOILERPLATE
_LT_LINKER_BOILERPLATE

# Allow CC to be a program name with arguments.
lt_save_CC="$CC"
CC=${F77-"f77"}
compiler=$CC
_LT_AC_TAGVAR(compiler, $1)=$CC
_LT_CC_BASENAME([$compiler])

AC_MSG_CHECKING([if libtool supports shared libraries])
AC_MSG_RESULT([$can_build_shared])

AC_MSG_CHECKING([whether to build shared libraries])
test "$can_build_shared" = "no" && enable_shared=no

# On AIX, shared libraries and static libraries use the same namespace, and
# are all built from PIC.
case $host_os in
aix3*)
  test "$enable_shared" = yes && enable_static=no
  if test -n "$RANLIB"; then
    archive_cmds="$archive_cmds~\$RANLIB \$lib"
    postinstall_cmds='$RANLIB $lib'
  fi
  ;;
aix4* | aix5*)
  if test "$host_cpu" != ia64 && test "$aix_use_runtimelinking" = no ; then
    test "$enable_shared" = yes && enable_static=no
  fi
  ;;
esac
AC_MSG_RESULT([$enable_shared])

AC_MSG_CHECKING([whether to build static libraries])
# Make sure either enable_shared or enable_static is yes.
test "$enable_shared" = yes || enable_static=yes
AC_MSG_RESULT([$enable_static])

_LT_AC_TAGVAR(GCC, $1)="$G77"
_LT_AC_TAGVAR(LD, $1)="$LD"

AC_LIBTOOL_PROG_COMPILER_PIC($1)
AC_LIBTOOL_PROG_CC_C_O($1)
AC_LIBTOOL_SYS_HARD_LINK_LOCKS($1)
AC_LIBTOOL_PROG_LD_SHLIBS($1)
AC_LIBTOOL_SYS_DYNAMIC_LINKER($1)
AC_LIBTOOL_PROG_LD_HARDCODE_LIBPATH($1)

AC_LIBTOOL_CONFIG($1)

AC_LANG_POP
CC="$lt_save_CC"
])# AC_LIBTOOL_LANG_F77_CONFIG


# AC_LIBTOOL_LANG_GCJ_CONFIG
# --------------------------
# Ensure that the configuration vars for the GNU Java compiler are
# suitably defined.  Those variables are subsequently used by
# AC_LIBTOOL_CONFIG to write the compiler configuration to `libtool'.
AC_DEFUN([AC_LIBTOOL_LANG_GCJ_CONFIG], [_LT_AC_LANG_GCJ_CONFIG(GCJ)])
AC_DEFUN([_LT_AC_LANG_GCJ_CONFIG],
[AC_LANG_SAVE

# Source file extension for Java test sources.
ac_ext=java

# Object file extension for compiled Java test sources.
objext=o
_LT_AC_TAGVAR(objext, $1)=$objext

# Code to be used in simple compile tests
lt_simple_compile_test_code="class foo {}\n"

# Code to be used in simple link tests
lt_simple_link_test_code='public class conftest { public static void main(String[[]] argv) {}; }\n'

# ltmain only uses $CC for tagged configurations so make sure $CC is set.
_LT_AC_SYS_COMPILER

# save warnings/boilerplate of simple test code
_LT_COMPILER_BOILERPLATE
_LT_LINKER_BOILERPLATE

# Allow CC to be a program name with arguments.
lt_save_CC="$CC"
CC=${GCJ-"gcj"}
compiler=$CC
_LT_AC_TAGVAR(compiler, $1)=$CC
_LT_CC_BASENAME([$compiler])

# GCJ did not exist back when GCC did not implicitly link libc in.
_LT_AC_TAGVAR(archive_cmds_need_lc, $1)=no

_LT_AC_TAGVAR(old_archive_cmds, $1)=$old_archive_cmds

AC_LIBTOOL_PROG_COMPILER_NO_RTTI($1)
AC_LIBTOOL_PROG_COMPILER_PIC($1)
AC_LIBTOOL_PROG_CC_C_O($1)
AC_LIBTOOL_SYS_HARD_LINK_LOCKS($1)
AC_LIBTOOL_PROG_LD_SHLIBS($1)
AC_LIBTOOL_SYS_DYNAMIC_LINKER($1)
AC_LIBTOOL_PROG_LD_HARDCODE_LIBPATH($1)

AC_LIBTOOL_CONFIG($1)

AC_LANG_RESTORE
CC="$lt_save_CC"
])# AC_LIBTOOL_LANG_GCJ_CONFIG


# AC_LIBTOOL_LANG_RC_CONFIG
# -------------------------
# Ensure that the configuration vars for the Windows resource compiler are
# suitably defined.  Those variables are subsequently used by
# AC_LIBTOOL_CONFIG to write the compiler configuration to `libtool'.
AC_DEFUN([AC_LIBTOOL_LANG_RC_CONFIG], [_LT_AC_LANG_RC_CONFIG(RC)])
AC_DEFUN([_LT_AC_LANG_RC_CONFIG],
[AC_LANG_SAVE

# Source file extension for RC test sources.
ac_ext=rc

# Object file extension for compiled RC test sources.
objext=o
_LT_AC_TAGVAR(objext, $1)=$objext

# Code to be used in simple compile tests
lt_simple_compile_test_code='sample MENU { MENUITEM "&Soup", 100, CHECKED }\n'

# Code to be used in simple link tests
lt_simple_link_test_code="$lt_simple_compile_test_code"

# ltmain only uses $CC for tagged configurations so make sure $CC is set.
_LT_AC_SYS_COMPILER

# save warnings/boilerplate of simple test code
_LT_COMPILER_BOILERPLATE
_LT_LINKER_BOILERPLATE

# Allow CC to be a program name with arguments.
lt_save_CC="$CC"
CC=${RC-"windres"}
compiler=$CC
_LT_AC_TAGVAR(compiler, $1)=$CC
_LT_CC_BASENAME([$compiler])
_LT_AC_TAGVAR(lt_cv_prog_compiler_c_o, $1)=yes

AC_LIBTOOL_CONFIG($1)

AC_LANG_RESTORE
CC="$lt_save_CC"
])# AC_LIBTOOL_LANG_RC_CONFIG


# AC_LIBTOOL_CONFIG([TAGNAME])
# ----------------------------
# If TAGNAME is not passed, then create an initial libtool script
# with a default configuration from the untagged config vars.  Otherwise
# add code to config.status for appending the configuration named by
# TAGNAME from the matching tagged config vars.
AC_DEFUN([AC_LIBTOOL_CONFIG],
[# The else clause should only fire when bootstrapping the
# libtool distribution, otherwise you forgot to ship ltmain.sh
# with your package, and you will get complaints that there are
# no rules to generate ltmain.sh.
if test -f "$ltmain"; then
  # See if we are running on zsh, and set the options which allow our commands through
  # without removal of \ escapes.
  if test -n "${ZSH_VERSION+set}" ; then
    setopt NO_GLOB_SUBST
  fi
  # Now quote all the things that may contain metacharacters while being
  # careful not to overquote the AC_SUBSTed values.  We take copies of the
  # variables and quote the copies for generation of the libtool script.
  for var in echo old_CC old_CFLAGS AR AR_FLAGS EGREP RANLIB LN_S LTCC LTCFLAGS NM \
    SED SHELL STRIP \
    libname_spec library_names_spec soname_spec extract_expsyms_cmds \
    old_striplib striplib file_magic_cmd finish_cmds finish_eval \
    deplibs_check_method reload_flag reload_cmds need_locks \
    lt_cv_sys_global_symbol_pipe lt_cv_sys_global_symbol_to_cdecl \
    lt_cv_sys_global_symbol_to_c_name_address \
    sys_lib_search_path_spec sys_lib_dlsearch_path_spec \
    old_postinstall_cmds old_postuninstall_cmds \
    _LT_AC_TAGVAR(compiler, $1) \
    _LT_AC_TAGVAR(CC, $1) \
    _LT_AC_TAGVAR(LD, $1) \
    _LT_AC_TAGVAR(lt_prog_compiler_wl, $1) \
    _LT_AC_TAGVAR(lt_prog_compiler_pic, $1) \
    _LT_AC_TAGVAR(lt_prog_compiler_static, $1) \
    _LT_AC_TAGVAR(lt_prog_compiler_no_builtin_flag, $1) \
    _LT_AC_TAGVAR(export_dynamic_flag_spec, $1) \
    _LT_AC_TAGVAR(thread_safe_flag_spec, $1) \
    _LT_AC_TAGVAR(whole_archive_flag_spec, $1) \
    _LT_AC_TAGVAR(enable_shared_with_static_runtimes, $1) \
    _LT_AC_TAGVAR(old_archive_cmds, $1) \
    _LT_AC_TAGVAR(old_archive_from_new_cmds, $1) \
    _LT_AC_TAGVAR(predep_objects, $1) \
    _LT_AC_TAGVAR(postdep_objects, $1) \
    _LT_AC_TAGVAR(predeps, $1) \
    _LT_AC_TAGVAR(postdeps, $1) \
    _LT_AC_TAGVAR(compiler_lib_search_path, $1) \
    _LT_AC_TAGVAR(archive_cmds, $1) \
    _LT_AC_TAGVAR(archive_expsym_cmds, $1) \
    _LT_AC_TAGVAR(postinstall_cmds, $1) \
    _LT_AC_TAGVAR(postuninstall_cmds, $1) \
    _LT_AC_TAGVAR(old_archive_from_expsyms_cmds, $1) \
    _LT_AC_TAGVAR(allow_undefined_flag, $1) \
    _LT_AC_TAGVAR(no_undefined_flag, $1) \
    _LT_AC_TAGVAR(export_symbols_cmds, $1) \
    _LT_AC_TAGVAR(hardcode_libdir_flag_spec, $1) \
    _LT_AC_TAGVAR(hardcode_libdir_flag_spec_ld, $1) \
    _LT_AC_TAGVAR(hardcode_libdir_separator, $1) \
    _LT_AC_TAGVAR(hardcode_automatic, $1) \
    _LT_AC_TAGVAR(module_cmds, $1) \
    _LT_AC_TAGVAR(module_expsym_cmds, $1) \
    _LT_AC_TAGVAR(lt_cv_prog_compiler_c_o, $1) \
    _LT_AC_TAGVAR(exclude_expsyms, $1) \
    _LT_AC_TAGVAR(include_expsyms, $1); do

    case $var in
    _LT_AC_TAGVAR(old_archive_cmds, $1) | \
    _LT_AC_TAGVAR(old_archive_from_new_cmds, $1) | \
    _LT_AC_TAGVAR(archive_cmds, $1) | \
    _LT_AC_TAGVAR(archive_expsym_cmds, $1) | \
    _LT_AC_TAGVAR(module_cmds, $1) | \
    _LT_AC_TAGVAR(module_expsym_cmds, $1) | \
    _LT_AC_TAGVAR(old_archive_from_expsyms_cmds, $1) | \
    _LT_AC_TAGVAR(export_symbols_cmds, $1) | \
    extract_expsyms_cmds | reload_cmds | finish_cmds | \
    postinstall_cmds | postuninstall_cmds | \
    old_postinstall_cmds | old_postuninstall_cmds | \
    sys_lib_search_path_spec | sys_lib_dlsearch_path_spec)
      # Double-quote double-evaled strings.
      eval "lt_$var=\\\"\`\$echo \"X\$$var\" | \$Xsed -e \"\$double_quote_subst\" -e \"\$sed_quote_subst\" -e \"\$delay_variable_subst\"\`\\\""
      ;;
    *)
      eval "lt_$var=\\\"\`\$echo \"X\$$var\" | \$Xsed -e \"\$sed_quote_subst\"\`\\\""
      ;;
    esac
  done

  case $lt_echo in
  *'\[$]0 --fallback-echo"')
    lt_echo=`$echo "X$lt_echo" | $Xsed -e 's/\\\\\\\[$]0 --fallback-echo"[$]/[$]0 --fallback-echo"/'`
    ;;
  esac

ifelse([$1], [],
  [cfgfile="${ofile}T"
  trap "$rm \"$cfgfile\"; exit 1" 1 2 15
  $rm -f "$cfgfile"
  AC_MSG_NOTICE([creating $ofile])],
  [cfgfile="$ofile"])

  cat <<__EOF__ >> "$cfgfile"
ifelse([$1], [],
[#! $SHELL

# `$echo "$cfgfile" | sed 's%^.*/%%'` - Provide generalized library-building support services.
# Generated automatically by $PROGRAM (GNU $PACKAGE $VERSION$TIMESTAMP)
# NOTE: Changes made to this file will be lost: look at ltmain.sh.
#
# Copyright (C) 1996, 1997, 1998, 1999, 2000, 2001
# Free Software Foundation, Inc.
#
# This file is part of GNU Libtool:
# Originally by Gordon Matzigkeit <gord@gnu.ai.mit.edu>, 1996
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful, but
# WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
# General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
#
# As a special exception to the GNU General Public License, if you
# distribute this file as part of a program that contains a
# configuration script generated by Autoconf, you may include it under
# the same distribution terms that you use for the rest of that program.

# A sed program that does not truncate output.
SED=$lt_SED

# Sed that helps us avoid accidentally triggering echo(1) options like -n.
Xsed="$SED -e 1s/^X//"

# The HP-UX ksh and POSIX shell print the target directory to stdout
# if CDPATH is set.
(unset CDPATH) >/dev/null 2>&1 && unset CDPATH

# The names of the tagged configurations supported by this script.
available_tags=

# ### BEGIN LIBTOOL CONFIG],
[# ### BEGIN LIBTOOL TAG CONFIG: $tagname])

# Libtool was configured on host `(hostname || uname -n) 2>/dev/null | sed 1q`:

# Shell to use when invoking shell scripts.
SHELL=$lt_SHELL

# Whether or not to build shared libraries.
build_libtool_libs=$enable_shared

# Whether or not to build static libraries.
build_old_libs=$enable_static

# Whether or not to add -lc for building shared libraries.
build_libtool_need_lc=$_LT_AC_TAGVAR(archive_cmds_need_lc, $1)

# Whether or not to allow shared libs when runtime libs are static.
allow_libtool_libs_with_static_runtimes=$_LT_AC_TAGVAR(enable_shared_with_static_runtimes, $1)

# Whether or not to optimize for fast installation.
fast_install=$enable_fast_install

# The host system.
host_alias=$host_alias
host=$host
host_os=$host_os

# The build system.
build_alias=$build_alias
build=$build
build_os=$build_os

# An echo program that does not interpret backslashes.
echo=$lt_echo

# The archiver.
AR=$lt_AR
AR_FLAGS=$lt_AR_FLAGS

# A C compiler.
LTCC=$lt_LTCC

# LTCC compiler flags.
LTCFLAGS=$lt_LTCFLAGS

# A language-specific compiler.
CC=$lt_[]_LT_AC_TAGVAR(compiler, $1)

# Is the compiler the GNU C compiler?
with_gcc=$_LT_AC_TAGVAR(GCC, $1)

# An ERE matcher.
EGREP=$lt_EGREP

# The linker used to build libraries.
LD=$lt_[]_LT_AC_TAGVAR(LD, $1)

# Whether we need hard or soft links.
LN_S=$lt_LN_S

# A BSD-compatible nm program.
NM=$lt_NM

# A symbol stripping program
STRIP=$lt_STRIP

# Used to examine libraries when file_magic_cmd begins "file"
MAGIC_CMD=$MAGIC_CMD

# Used on cygwin: DLL creation program.
DLLTOOL="$DLLTOOL"

# Used on cygwin: object dumper.
OBJDUMP="$OBJDUMP"

# Used on cygwin: assembler.
AS="$AS"

# The name of the directory that contains temporary libtool files.
objdir=$objdir

# How to create reloadable object files.
reload_flag=$lt_reload_flag
reload_cmds=$lt_reload_cmds

# How to pass a linker flag through the compiler.
wl=$lt_[]_LT_AC_TAGVAR(lt_prog_compiler_wl, $1)

# Object file suffix (normally "o").
objext="$ac_objext"

# Old archive suffix (normally "a").
libext="$libext"

# Shared library suffix (normally ".so").
shrext_cmds='$shrext_cmds'

# Executable file suffix (normally "").
exeext="$exeext"

# Additional compiler flags for building library objects.
pic_flag=$lt_[]_LT_AC_TAGVAR(lt_prog_compiler_pic, $1)
pic_mode=$pic_mode

# What is the maximum length of a command?
max_cmd_len=$lt_cv_sys_max_cmd_len

# Does compiler simultaneously support -c and -o options?
compiler_c_o=$lt_[]_LT_AC_TAGVAR(lt_cv_prog_compiler_c_o, $1)

# Must we lock files when doing compilation?
need_locks=$lt_need_locks

# Do we need the lib prefix for modules?
need_lib_prefix=$need_lib_prefix

# Do we need a version for libraries?
need_version=$need_version

# Whether dlopen is supported.
dlopen_support=$enable_dlopen

# Whether dlopen of programs is supported.
dlopen_self=$enable_dlopen_self

# Whether dlopen of statically linked programs is supported.
dlopen_self_static=$enable_dlopen_self_static

# Compiler flag to prevent dynamic linking.
link_static_flag=$lt_[]_LT_AC_TAGVAR(lt_prog_compiler_static, $1)

# Compiler flag to turn off builtin functions.
no_builtin_flag=$lt_[]_LT_AC_TAGVAR(lt_prog_compiler_no_builtin_flag, $1)

# Compiler flag to allow reflexive dlopens.
export_dynamic_flag_spec=$lt_[]_LT_AC_TAGVAR(export_dynamic_flag_spec, $1)

# Compiler flag to generate shared objects directly from archives.
whole_archive_flag_spec=$lt_[]_LT_AC_TAGVAR(whole_archive_flag_spec, $1)

# Compiler flag to generate thread-safe objects.
thread_safe_flag_spec=$lt_[]_LT_AC_TAGVAR(thread_safe_flag_spec, $1)

# Library versioning type.
version_type=$version_type

# Format of library name prefix.
libname_spec=$lt_libname_spec

# List of archive names.  First name is the real one, the rest are links.
# The last name is the one that the linker finds with -lNAME.
library_names_spec=$lt_library_names_spec

# The coded name of the library, if different from the real name.
soname_spec=$lt_soname_spec

# Commands used to build and install an old-style archive.
RANLIB=$lt_RANLIB
old_archive_cmds=$lt_[]_LT_AC_TAGVAR(old_archive_cmds, $1)
old_postinstall_cmds=$lt_old_postinstall_cmds
old_postuninstall_cmds=$lt_old_postuninstall_cmds

# Create an old-style archive from a shared archive.
old_archive_from_new_cmds=$lt_[]_LT_AC_TAGVAR(old_archive_from_new_cmds, $1)

# Create a temporary old-style archive to link instead of a shared archive.
old_archive_from_expsyms_cmds=$lt_[]_LT_AC_TAGVAR(old_archive_from_expsyms_cmds, $1)

# Commands used to build and install a shared archive.
archive_cmds=$lt_[]_LT_AC_TAGVAR(archive_cmds, $1)
archive_expsym_cmds=$lt_[]_LT_AC_TAGVAR(archive_expsym_cmds, $1)
postinstall_cmds=$lt_postinstall_cmds
postuninstall_cmds=$lt_postuninstall_cmds

# Commands used to build a loadable module (assumed same as above if empty)
module_cmds=$lt_[]_LT_AC_TAGVAR(module_cmds, $1)
module_expsym_cmds=$lt_[]_LT_AC_TAGVAR(module_expsym_cmds, $1)

# Commands to strip libraries.
old_striplib=$lt_old_striplib
striplib=$lt_striplib

# Dependencies to place before the objects being linked to create a
# shared library.
predep_objects=$lt_[]_LT_AC_TAGVAR(predep_objects, $1)

# Dependencies to place after the objects being linked to create a
# shared library.
postdep_objects=$lt_[]_LT_AC_TAGVAR(postdep_objects, $1)

# Dependencies to place before the objects being linked to create a
# shared library.
predeps=$lt_[]_LT_AC_TAGVAR(predeps, $1)

# Dependencies to place after the objects being linked to create a
# shared library.
postdeps=$lt_[]_LT_AC_TAGVAR(postdeps, $1)

# The library search path used internally by the compiler when linking
# a shared library.
compiler_lib_search_path=$lt_[]_LT_AC_TAGVAR(compiler_lib_search_path, $1)

# Method to check whether dependent libraries are shared objects.
deplibs_check_method=$lt_deplibs_check_method

# Command to use when deplibs_check_method == file_magic.
file_magic_cmd=$lt_file_magic_cmd

# Flag that allows shared libraries with undefined symbols to be built.
allow_undefined_flag=$lt_[]_LT_AC_TAGVAR(allow_undefined_flag, $1)

# Flag that forces no undefined symbols.
no_undefined_flag=$lt_[]_LT_AC_TAGVAR(no_undefined_flag, $1)

# Commands used to finish a libtool library installation in a directory.
finish_cmds=$lt_finish_cmds

# Same as above, but a single script fragment to be evaled but not shown.
finish_eval=$lt_finish_eval

# Take the output of nm and produce a listing of raw symbols and C names.
global_symbol_pipe=$lt_lt_cv_sys_global_symbol_pipe

# Transform the output of nm into a proper C declaration.
global_symbol_to_cdecl=$lt_lt_cv_sys_global_symbol_to_cdecl

# Transform the output of nm into a C name/address pair.
global_symbol_to_c_name_address=$lt_lt_cv_sys_global_symbol_to_c_name_address

# This is the shared library runtime path variable.
runpath_var=$runpath_var

# This is the shared library path variable.
shlibpath_var=$shlibpath_var

# Is shlibpath searched before the hard-coded library search path?
shlibpath_overrides_runpath=$shlibpath_overrides_runpath

# How to hardcode a shared library path into an executable.
hardcode_action=$_LT_AC_TAGVAR(hardcode_action, $1)

# Whether we should hardcode library paths into libraries.
hardcode_into_libs=$hardcode_into_libs

# Flag to hardcode \$libdir into a binary during linking.
# This must work even if \$libdir does not exist.
hardcode_libdir_flag_spec=$lt_[]_LT_AC_TAGVAR(hardcode_libdir_flag_spec, $1)

# If ld is used when linking, flag to hardcode \$libdir into
# a binary during linking. This must work even if \$libdir does
# not exist.
hardcode_libdir_flag_spec_ld=$lt_[]_LT_AC_TAGVAR(hardcode_libdir_flag_spec_ld, $1)

# Whether we need a single -rpath flag with a separated argument.
hardcode_libdir_separator=$lt_[]_LT_AC_TAGVAR(hardcode_libdir_separator, $1)

# Set to yes if using DIR/libNAME\${shared_ext} during linking hardcodes DIR into the
# resulting binary.
hardcode_direct=$_LT_AC_TAGVAR(hardcode_direct, $1)

# Set to yes if using the -LDIR flag during linking hardcodes DIR into the
# resulting binary.
hardcode_minus_L=$_LT_AC_TAGVAR(hardcode_minus_L, $1)

# Set to yes if using SHLIBPATH_VAR=DIR during linking hardcodes DIR into
# the resulting binary.
hardcode_shlibpath_var=$_LT_AC_TAGVAR(hardcode_shlibpath_var, $1)

# Set to yes if building a shared library automatically hardcodes DIR into the library
# and all subsequent libraries and executables linked against it.
hardcode_automatic=$_LT_AC_TAGVAR(hardcode_automatic, $1)

# Variables whose values should be saved in libtool wrapper scripts and
# restored at relink time.
variables_saved_for_relink="$variables_saved_for_relink"

# Whether libtool must link a program against all its dependency libraries.
link_all_deplibs=$_LT_AC_TAGVAR(link_all_deplibs, $1)

# Compile-time system search path for libraries
sys_lib_search_path_spec=$lt_sys_lib_search_path_spec

# Run-time system search path for libraries
sys_lib_dlsearch_path_spec=$lt_sys_lib_dlsearch_path_spec

# Fix the shell variable \$srcfile for the compiler.
fix_srcfile_path="$_LT_AC_TAGVAR(fix_srcfile_path, $1)"

# Set to yes if exported symbols are required.
always_export_symbols=$_LT_AC_TAGVAR(always_export_symbols, $1)

# The commands to list exported symbols.
export_symbols_cmds=$lt_[]_LT_AC_TAGVAR(export_symbols_cmds, $1)

# The commands to extract the exported symbol list from a shared archive.
extract_expsyms_cmds=$lt_extract_expsyms_cmds

# Symbols that should not be listed in the preloaded symbols.
exclude_expsyms=$lt_[]_LT_AC_TAGVAR(exclude_expsyms, $1)

# Symbols that must always be exported.
include_expsyms=$lt_[]_LT_AC_TAGVAR(include_expsyms, $1)

ifelse([$1],[],
[# ### END LIBTOOL CONFIG],
[# ### END LIBTOOL TAG CONFIG: $tagname])

__EOF__

ifelse([$1],[], [
  case $host_os in
  aix3*)
    cat <<\EOF >> "$cfgfile"

# AIX sometimes has problems with the GCC collect2 program.  For some
# reason, if we set the COLLECT_NAMES environment variable, the problems
# vanish in a puff of smoke.
if test "X${COLLECT_NAMES+set}" != Xset; then
  COLLECT_NAMES=
  export COLLECT_NAMES
fi
EOF
    ;;
  esac

  # We use sed instead of cat because bash on DJGPP gets confused if
  # it finds mixed CR/LF and LF-only lines.  Since sed operates in
  # text mode, it properly converts lines to CR/LF.  This bash problem
  # is reportedly fixed, but why not run on old versions too?
  sed '$q' "$ltmain" >> "$cfgfile" || (rm -f "$cfgfile"; exit 1)

  mv -f "$cfgfile" "$ofile" || \
    (rm -f "$ofile" && cp "$cfgfile" "$ofile" && rm -f "$cfgfile")
  chmod +x "$ofile"
])
else
  # If there is no Makefile yet, we rely on a make rule to execute
  # `config.status --recheck' to rerun these tests and create the
  # libtool script then.
  ltmain_in=`echo $ltmain | sed -e 's/\.sh$/.in/'`
  if test -f "$ltmain_in"; then
    test -f Makefile && make "$ltmain"
  fi
fi
])# AC_LIBTOOL_CONFIG


# AC_LIBTOOL_PROG_COMPILER_NO_RTTI([TAGNAME])
# -------------------------------------------
AC_DEFUN([AC_LIBTOOL_PROG_COMPILER_NO_RTTI],
[AC_REQUIRE([_LT_AC_SYS_COMPILER])dnl

_LT_AC_TAGVAR(lt_prog_compiler_no_builtin_flag, $1)=

if test "$GCC" = yes; then
  _LT_AC_TAGVAR(lt_prog_compiler_no_builtin_flag, $1)=' -fno-builtin'

  AC_LIBTOOL_COMPILER_OPTION([if $compiler supports -fno-rtti -fno-exceptions],
    lt_cv_prog_compiler_rtti_exceptions,
    [-fno-rtti -fno-exceptions], [],
    [_LT_AC_TAGVAR(lt_prog_compiler_no_builtin_flag, $1)="$_LT_AC_TAGVAR(lt_prog_compiler_no_builtin_flag, $1) -fno-rtti -fno-exceptions"])
fi
])# AC_LIBTOOL_PROG_COMPILER_NO_RTTI


# AC_LIBTOOL_SYS_GLOBAL_SYMBOL_PIPE
# ---------------------------------
AC_DEFUN([AC_LIBTOOL_SYS_GLOBAL_SYMBOL_PIPE],
[AC_REQUIRE([AC_CANONICAL_HOST])
AC_REQUIRE([AC_PROG_NM])
AC_REQUIRE([AC_OBJEXT])
# Check for command to grab the raw symbol name followed by C symbol from nm.
AC_MSG_CHECKING([command to parse $NM output from $compiler object])
AC_CACHE_VAL([lt_cv_sys_global_symbol_pipe],
[
# These are sane defaults that work on at least a few old systems.
# [They come from Ultrix.  What could be older than Ultrix?!! ;)]

# Character class describing NM global symbol codes.
symcode='[[BCDEGRST]]'

# Regexp to match symbols that can be accessed directly from C.
sympat='\([[_A-Za-z]][[_A-Za-z0-9]]*\)'

# Transform an extracted symbol line into a proper C declaration
lt_cv_sys_global_symbol_to_cdecl="sed -n -e 's/^. .* \(.*\)$/extern int \1;/p'"

# Transform an extracted symbol line into symbol name and symbol address
lt_cv_sys_global_symbol_to_c_name_address="sed -n -e 's/^: \([[^ ]]*\) $/  {\\\"\1\\\", (lt_ptr) 0},/p' -e 's/^$symcode \([[^ ]]*\) \([[^ ]]*\)$/  {\"\2\", (lt_ptr) \&\2},/p'"

# Define system-specific variables.
case $host_os in
aix*)
  symcode='[[BCDT]]'
  ;;
cygwin* | mingw* | pw32*)
  symcode='[[ABCDGISTW]]'
  ;;
hpux*) # Its linker distinguishes data from code symbols
  if test "$host_cpu" = ia64; then
    symcode='[[ABCDEGRST]]'
  fi
  lt_cv_sys_global_symbol_to_cdecl="sed -n -e 's/^T .* \(.*\)$/extern int \1();/p' -e 's/^$symcode* .* \(.*\)$/extern char \1;/p'"
  lt_cv_sys_global_symbol_to_c_name_address="sed -n -e 's/^: \([[^ ]]*\) $/  {\\\"\1\\\", (lt_ptr) 0},/p' -e 's/^$symcode* \([[^ ]]*\) \([[^ ]]*\)$/  {\"\2\", (lt_ptr) \&\2},/p'"
  ;;
linux*)
  if test "$host_cpu" = ia64; then
    symcode='[[ABCDGIRSTW]]'
    lt_cv_sys_global_symbol_to_cdecl="sed -n -e 's/^T .* \(.*\)$/extern int \1();/p' -e 's/^$symcode* .* \(.*\)$/extern char \1;/p'"
    lt_cv_sys_global_symbol_to_c_name_address="sed -n -e 's/^: \([[^ ]]*\) $/  {\\\"\1\\\", (lt_ptr) 0},/p' -e 's/^$symcode* \([[^ ]]*\) \([[^ ]]*\)$/  {\"\2\", (lt_ptr) \&\2},/p'"
  fi
  ;;
irix* | nonstopux*)
  symcode='[[BCDEGRST]]'
  ;;
osf*)
  symcode='[[BCDEGQRST]]'
  ;;
solaris*)
  symcode='[[BDRT]]'
  ;;
sco3.2v5*)
  symcode='[[DT]]'
  ;;
sysv4.2uw2*)
  symcode='[[DT]]'
  ;;
sysv5* | sco5v6* | unixware* | OpenUNIX*)
  symcode='[[ABDT]]'
  ;;
sysv4)
  symcode='[[DFNSTU]]'
  ;;
esac

# Handle CRLF in mingw tool chain
opt_cr=
case $build_os in
mingw*)
  opt_cr=`echo 'x\{0,1\}' | tr x '\015'` # option cr in regexp
  ;;
esac

# If we're using GNU nm, then use its standard symbol codes.
case `$NM -V 2>&1` in
*GNU* | *'with BFD'*)
  symcode='[[ABCDGIRSTW]]' ;;
esac

# Try without a prefix underscore, then with it.
for ac_symprfx in "" "_"; do

  # Transform symcode, sympat, and symprfx into a raw symbol and a C symbol.
  symxfrm="\\1 $ac_symprfx\\2 \\2"

  # Write the raw and C identifiers.
  lt_cv_sys_global_symbol_pipe="sed -n -e 's/^.*[[ 	]]\($symcode$symcode*\)[[ 	]][[ 	]]*$ac_symprfx$sympat$opt_cr$/$symxfrm/p'"

  # Check to see that the pipe works correctly.
  pipe_works=no

  rm -f conftest*
  cat > conftest.$ac_ext <<EOF
#ifdef __cplusplus
extern "C" {
#endif
char nm_test_var;
void nm_test_func(){}
#ifdef __cplusplus
}
#endif
int main(){nm_test_var='a';nm_test_func();return(0);}
EOF

  if AC_TRY_EVAL(ac_compile); then
    # Now try to grab the symbols.
    nlist=conftest.nm
    if AC_TRY_EVAL(NM conftest.$ac_objext \| $lt_cv_sys_global_symbol_pipe \> $nlist) && test -s "$nlist"; then
      # Try sorting and uniquifying the output.
      if sort "$nlist" | uniq > "$nlist"T; then
	mv -f "$nlist"T "$nlist"
      else
	rm -f "$nlist"T
      fi

      # Make sure that we snagged all the symbols we need.
      if grep ' nm_test_var$' "$nlist" >/dev/null; then
	if grep ' nm_test_func$' "$nlist" >/dev/null; then
	  cat <<EOF > conftest.$ac_ext
#ifdef __cplusplus
extern "C" {
#endif

EOF
	  # Now generate the symbol file.
	  eval "$lt_cv_sys_global_symbol_to_cdecl"' < "$nlist" | grep -v main >> conftest.$ac_ext'

	  cat <<EOF >> conftest.$ac_ext
#if defined (__STDC__) && __STDC__
# define lt_ptr_t void *
#else
# define lt_ptr_t char *
# define const
#endif

/* The mapping between symbol names and symbols. */
const struct {
  const char *name;
  lt_ptr_t address;
}
lt_preloaded_symbols[[]] =
{
EOF
	  $SED "s/^$symcode$symcode* \(.*\) \(.*\)$/  {\"\2\", (lt_ptr_t) \&\2},/" < "$nlist" | grep -v main >> conftest.$ac_ext
	  cat <<\EOF >> conftest.$ac_ext
  {0, (lt_ptr_t) 0}
};

#ifdef __cplusplus
}
#endif
EOF
	  # Now try linking the two files.
	  mv conftest.$ac_objext conftstm.$ac_objext
	  lt_save_LIBS="$LIBS"
	  lt_save_CFLAGS="$CFLAGS"
	  LIBS="conftstm.$ac_objext"
	  CFLAGS="$CFLAGS$_LT_AC_TAGVAR(lt_prog_compiler_no_builtin_flag, $1)"
	  if AC_TRY_EVAL(ac_link) && test -s conftest${ac_exeext}; then
	    pipe_works=yes
	  fi
	  LIBS="$lt_save_LIBS"
	  CFLAGS="$lt_save_CFLAGS"
	else
	  echo "cannot find nm_test_func in $nlist" >&AS_MESSAGE_LOG_FD
	fi
      else
	echo "cannot find nm_test_var in $nlist" >&AS_MESSAGE_LOG_FD
      fi
    else
      echo "cannot run $lt_cv_sys_global_symbol_pipe" >&AS_MESSAGE_LOG_FD
    fi
  else
    echo "$progname: failed program was:" >&AS_MESSAGE_LOG_FD
    cat conftest.$ac_ext >&AS_MESSAGE_LOG_FD
  fi
  rm -f conftest* conftst*

  # Do not use the global_symbol_pipe unless it works.
  if test "$pipe_works" = yes; then
    break
  else
    lt_cv_sys_global_symbol_pipe=
  fi
done
])
if test -z "$lt_cv_sys_global_symbol_pipe"; then
  lt_cv_sys_global_symbol_to_cdecl=
fi
if test -z "$lt_cv_sys_global_symbol_pipe$lt_cv_sys_global_symbol_to_cdecl"; then
  AC_MSG_RESULT(failed)
else
  AC_MSG_RESULT(ok)
fi
]) # AC_LIBTOOL_SYS_GLOBAL_SYMBOL_PIPE


# AC_LIBTOOL_PROG_COMPILER_PIC([TAGNAME])
# ---------------------------------------
AC_DEFUN([AC_LIBTOOL_PROG_COMPILER_PIC],
[_LT_AC_TAGVAR(lt_prog_compiler_wl, $1)=
_LT_AC_TAGVAR(lt_prog_compiler_pic, $1)=
_LT_AC_TAGVAR(lt_prog_compiler_static, $1)=

AC_MSG_CHECKING([for $compiler option to produce PIC])
 ifelse([$1],[CXX],[
  # C++ specific cases for pic, static, wl, etc.
  if test "$GXX" = yes; then
    _LT_AC_TAGVAR(lt_prog_compiler_wl, $1)='-Wl,'
    _LT_AC_TAGVAR(lt_prog_compiler_static, $1)='-static'

    case $host_os in
    aix*)
      # All AIX code is PIC.
      if test "$host_cpu" = ia64; then
	# AIX 5 now supports IA64 processor
	_LT_AC_TAGVAR(lt_prog_compiler_static, $1)='-Bstatic'
      fi
      ;;
    amigaos*)
      # FIXME: we need at least 68020 code to build shared libraries, but
      # adding the `-m68020' flag to GCC prevents building anything better,
      # like `-m68040'.
      _LT_AC_TAGVAR(lt_prog_compiler_pic, $1)='-m68020 -resident32 -malways-restore-a4'
      ;;
    beos* | cygwin* | irix5* | irix6* | nonstopux* | osf3* | osf4* | osf5*)
      # PIC is the default for these OSes.
      ;;
    mingw* | os2* | pw32*)
      # This hack is so that the source file can tell whether it is being
      # built for inclusion in a dll (and should export symbols for example).
      _LT_AC_TAGVAR(lt_prog_compiler_pic, $1)='-DDLL_EXPORT'
      ;;
    darwin* | rhapsody*)
      # PIC is the default on this platform
      # Common symbols not allowed in MH_DYLIB files
      _LT_AC_TAGVAR(lt_prog_compiler_pic, $1)='-fno-common'
      ;;
    *djgpp*)
      # DJGPP does not support shared libraries at all
      _LT_AC_TAGVAR(lt_prog_compiler_pic, $1)=
      ;;
    interix3*)
      # Interix 3.x gcc -fpic/-fPIC options generate broken code.
      # Instead, we relocate shared libraries at runtime.
      ;;
    sysv4*MP*)
      if test -d /usr/nec; then
	_LT_AC_TAGVAR(lt_prog_compiler_pic, $1)=-Kconform_pic
      fi
      ;;
    hpux*)
      # PIC is the default for IA64 HP-UX and 64-bit HP-UX, but
      # not for PA HP-UX.
      case $host_cpu in
      hppa*64*|ia64*)
	;;
      *)
	_LT_AC_TAGVAR(lt_prog_compiler_pic, $1)='-fPIC'
	;;
      esac
      ;;
    *)
      _LT_AC_TAGVAR(lt_prog_compiler_pic, $1)='-fPIC'
      ;;
    esac
  else
    case $host_os in
      aix4* | aix5*)
	# All AIX code is PIC.
	if test "$host_cpu" = ia64; then
	  # AIX 5 now supports IA64 processor
	  _LT_AC_TAGVAR(lt_prog_compiler_static, $1)='-Bstatic'
	else
	  _LT_AC_TAGVAR(lt_prog_compiler_static, $1)='-bnso -bI:/lib/syscalls.exp'
	fi
	;;
      chorus*)
	case $cc_basename in
	cxch68*)
	  # Green Hills C++ Compiler
	  # _LT_AC_TAGVAR(lt_prog_compiler_static, $1)="--no_auto_instantiation -u __main -u __premain -u _abort -r $COOL_DIR/lib/libOrb.a $MVME_DIR/lib/CC/libC.a $MVME_DIR/lib/classix/libcx.s.a"
	  ;;
	esac
	;;
       darwin*)
         # PIC is the default on this platform
         # Common symbols not allowed in MH_DYLIB files
         case $cc_basename in
           xlc*)
           _LT_AC_TAGVAR(lt_prog_compiler_pic, $1)='-qnocommon'
           _LT_AC_TAGVAR(lt_prog_compiler_wl, $1)='-Wl,'
           ;;
         esac
       ;;
      dgux*)
	case $cc_basename in
	  ec++*)
	    _LT_AC_TAGVAR(lt_prog_compiler_pic, $1)='-KPIC'
	    ;;
	  ghcx*)
	    # Green Hills C++ Compiler
	    _LT_AC_TAGVAR(lt_prog_compiler_pic, $1)='-pic'
	    ;;
	  *)
	    ;;
	esac
	;;
      freebsd* | kfreebsd*-gnu | dragonfly*)
	# FreeBSD uses GNU C++
	;;
      hpux9* | hpux10* | hpux11*)
	case $cc_basename in
	  CC*)
	    _LT_AC_TAGVAR(lt_prog_compiler_wl, $1)='-Wl,'
	    _LT_AC_TAGVAR(lt_prog_compiler_static, $1)='${wl}-a ${wl}archive'
	    if test "$host_cpu" != ia64; then
	      _LT_AC_TAGVAR(lt_prog_compiler_pic, $1)='+Z'
	    fi
	    ;;
	  aCC*)
	    _LT_AC_TAGVAR(lt_prog_compiler_wl, $1)='-Wl,'
	    _LT_AC_TAGVAR(lt_prog_compiler_static, $1)='${wl}-a ${wl}archive'
	    case $host_cpu in
	    hppa*64*|ia64*)
	      # +Z is the default
	      ;;
	    *)
	      _LT_AC_TAGVAR(lt_prog_compiler_pic, $1)='+Z'
	      ;;
	    esac
	    ;;
	  *)
	    ;;
	esac
	;;
      interix*)
	# This is c89, which is MS Visual C++ (no shared libs).
	# Does anyone want to do a port?
	;;
      irix5* | irix6* | nonstopux*)
	case $cc_basename in
	  CC*)
	    _LT_AC_TAGVAR(lt_prog_compiler_wl, $1)='-Wl,'
	    _LT_AC_TAGVAR(lt_prog_compiler_static, $1)='-non_shared'
	    # CC pic flag -KPIC is the default.
	    ;;
	  *)
	    ;;
	esac
	;;
      linux*)
	case $cc_basename in
	  KCC*)
	    # KAI C++ Compiler
	    _LT_AC_TAGVAR(lt_prog_compiler_wl, $1)='--backend -Wl,'
	    _LT_AC_TAGVAR(lt_prog_compiler_pic, $1)='-fPIC'
	    ;;
	  icpc* | ecpc*)
	    # Intel C++
	    _LT_AC_TAGVAR(lt_prog_compiler_wl, $1)='-Wl,'
	    _LT_AC_TAGVAR(lt_prog_compiler_pic, $1)='-KPIC'
	    _LT_AC_TAGVAR(lt_prog_compiler_static, $1)='-static'
	    ;;
	  pgCC*)
	    # Portland Group C++ compiler.
	    _LT_AC_TAGVAR(lt_prog_compiler_wl, $1)='-Wl,'
	    _LT_AC_TAGVAR(lt_prog_compiler_pic, $1)='-fpic'
	    _LT_AC_TAGVAR(lt_prog_compiler_static, $1)='-Bstatic'
	    ;;
	  cxx*)
	    # Compaq C++
	    # Make sure the PIC flag is empty.  It appears that all Alpha
	    # Linux and Compaq Tru64 Unix objects are PIC.
	    _LT_AC_TAGVAR(lt_prog_compiler_pic, $1)=
	    _LT_AC_TAGVAR(lt_prog_compiler_static, $1)='-non_shared'
	    ;;
	  *)
	    ;;
	esac
	;;
      lynxos*)
	;;
      m88k*)
	;;
      mvs*)
	case $cc_basename in
	  cxx*)
	    _LT_AC_TAGVAR(lt_prog_compiler_pic, $1)='-W c,exportall'
	    ;;
	  *)
	    ;;
	esac
	;;
      netbsd*)
	;;
      osf3* | osf4* | osf5*)
	case $cc_basename in
	  KCC*)
	    _LT_AC_TAGVAR(lt_prog_compiler_wl, $1)='--backend -Wl,'
	    ;;
	  RCC*)
	    # Rational C++ 2.4.1
	    _LT_AC_TAGVAR(lt_prog_compiler_pic, $1)='-pic'
	    ;;
	  cxx*)
	    # Digital/Compaq C++
	    _LT_AC_TAGVAR(lt_prog_compiler_wl, $1)='-Wl,'
	    # Make sure the PIC flag is empty.  It appears that all Alpha
	    # Linux and Compaq Tru64 Unix objects are PIC.
	    _LT_AC_TAGVAR(lt_prog_compiler_pic, $1)=
	    _LT_AC_TAGVAR(lt_prog_compiler_static, $1)='-non_shared'
	    ;;
	  *)
	    ;;
	esac
	;;
      psos*)
	;;
      solaris*)
	case $cc_basename in
	  CC*)
	    # Sun C++ 4.2, 5.x and Centerline C++
	    _LT_AC_TAGVAR(lt_prog_compiler_pic, $1)='-KPIC'
	    _LT_AC_TAGVAR(lt_prog_compiler_static, $1)='-Bstatic'
	    _LT_AC_TAGVAR(lt_prog_compiler_wl, $1)='-Qoption ld '
	    ;;
	  gcx*)
	    # Green Hills C++ Compiler
	    _LT_AC_TAGVAR(lt_prog_compiler_pic, $1)='-PIC'
	    ;;
	  *)
	    ;;
	esac
	;;
      sunos4*)
	case $cc_basename in
	  CC*)
	    # Sun C++ 4.x
	    _LT_AC_TAGVAR(lt_prog_compiler_pic, $1)='-pic'
	    _LT_AC_TAGVAR(lt_prog_compiler_static, $1)='-Bstatic'
	    ;;
	  lcc*)
	    # Lucid
	    _LT_AC_TAGVAR(lt_prog_compiler_pic, $1)='-pic'
	    ;;
	  *)
	    ;;
	esac
	;;
      tandem*)
	case $cc_basename in
	  NCC*)
	    # NonStop-UX NCC 3.20
	    _LT_AC_TAGVAR(lt_prog_compiler_pic, $1)='-KPIC'
	    ;;
	  *)
	    ;;
	esac
	;;
      sysv5* | unixware* | sco3.2v5* | sco5v6* | OpenUNIX*)
	case $cc_basename in
	  CC*)
	    _LT_AC_TAGVAR(lt_prog_compiler_wl, $1)='-Wl,'
	    _LT_AC_TAGVAR(lt_prog_compiler_pic, $1)='-KPIC'
	    _LT_AC_TAGVAR(lt_prog_compiler_static, $1)='-Bstatic'
	    ;;
	esac
	;;
      vxworks*)
	;;
      *)
	_LT_AC_TAGVAR(lt_prog_compiler_can_build_shared, $1)=no
	;;
    esac
  fi
],
[
  if test "$GCC" = yes; then
    _LT_AC_TAGVAR(lt_prog_compiler_wl, $1)='-Wl,'
    _LT_AC_TAGVAR(lt_prog_compiler_static, $1)='-static'

    case $host_os in
      aix*)
      # All AIX code is PIC.
      if test "$host_cpu" = ia64; then
	# AIX 5 now supports the IA64 processor.
	_LT_AC_TAGVAR(lt_prog_compiler_static, $1)='-Bstatic'
      fi
      ;;

    amigaos*)
      # FIXME: we need at least 68020 code to build shared libraries, but
      # adding the `-m68020' flag to GCC prevents building anything better,
      # like `-m68040'.
      _LT_AC_TAGVAR(lt_prog_compiler_pic, $1)='-m68020 -resident32 -malways-restore-a4'
      ;;

    beos* | cygwin* | irix5* | irix6* | nonstopux* | osf3* | osf4* | osf5*)
      # PIC is the default for these OSes.
      ;;

    mingw* | pw32* | os2*)
      # This hack is so that the source file can tell whether it is being
      # built for inclusion in a dll (and should export symbols for example).
      _LT_AC_TAGVAR(lt_prog_compiler_pic, $1)='-DDLL_EXPORT'
      ;;

    darwin* | rhapsody*)
      # PIC is the default on this platform
      # Common symbols not allowed in MH_DYLIB files
      _LT_AC_TAGVAR(lt_prog_compiler_pic, $1)='-fno-common'
      ;;

    interix3*)
      # Interix 3.x gcc -fpic/-fPIC options generate broken code.
      # Instead, we relocate shared libraries at runtime.
      ;;

    msdosdjgpp*)
      # Just because we use GCC doesn't mean we suddenly get shared libraries
      # on systems that don't support them.
      _LT_AC_TAGVAR(lt_prog_compiler_can_build_shared, $1)=no
      enable_shared=no
      ;;

    sysv4*MP*)
      if test -d /usr/nec; then
	_LT_AC_TAGVAR(lt_prog_compiler_pic, $1)=-Kconform_pic
      fi
      ;;

    hpux*)
      # PIC is the default for IA64 HP-UX and 64-bit HP-UX, but
      # not for PA HP-UX.
      case $host_cpu in
      hppa*64*|ia64*)
	# +Z the default
	;;
      *)
	_LT_AC_TAGVAR(lt_prog_compiler_pic, $1)='-fPIC'
	;;
      esac
      ;;

    *)
      _LT_AC_TAGVAR(lt_prog_compiler_pic, $1)='-fPIC'
      ;;
    esac
  else
    # PORTME Check for flag to pass linker flags through the system compiler.
    case $host_os in
    aix*)
      _LT_AC_TAGVAR(lt_prog_compiler_wl, $1)='-Wl,'
      if test "$host_cpu" = ia64; then
	# AIX 5 now supports the IA64 processor.
	_LT_AC_TAGVAR(lt_prog_compiler_static, $1)='-Bstatic'
      else
	_LT_AC_TAGVAR(lt_prog_compiler_static, $1)='-bnso -bI:/lib/syscalls.exp'
      fi
      ;;
      darwin*)
        # PIC is the default on this platform
        # Common symbols not allowed in MH_DYLIB files
       case $cc_basename in
         xlc*)
         _LT_AC_TAGVAR(lt_prog_compiler_pic, $1)='-qnocommon'
         _LT_AC_TAGVAR(lt_prog_compiler_wl, $1)='-Wl,'
         ;;
       esac
       ;;

    mingw* | pw32* | os2*)
      # This hack is so that the source file can tell whether it is being
      # built for inclusion in a dll (and should export symbols for example).
      _LT_AC_TAGVAR(lt_prog_compiler_pic, $1)='-DDLL_EXPORT'
      ;;

    hpux9* | hpux10* | hpux11*)
      _LT_AC_TAGVAR(lt_prog_compiler_wl, $1)='-Wl,'
      # PIC is the default for IA64 HP-UX and 64-bit HP-UX, but
      # not for PA HP-UX.
      case $host_cpu in
      hppa*64*|ia64*)
	# +Z the default
	;;
      *)
	_LT_AC_TAGVAR(lt_prog_compiler_pic, $1)='+Z'
	;;
      esac
      # Is there a better lt_prog_compiler_static that works with the bundled CC?
      _LT_AC_TAGVAR(lt_prog_compiler_static, $1)='${wl}-a ${wl}archive'
      ;;

    irix5* | irix6* | nonstopux*)
      _LT_AC_TAGVAR(lt_prog_compiler_wl, $1)='-Wl,'
      # PIC (with -KPIC) is the default.
      _LT_AC_TAGVAR(lt_prog_compiler_static, $1)='-non_shared'
      ;;

    newsos6)
      _LT_AC_TAGVAR(lt_prog_compiler_pic, $1)='-KPIC'
      _LT_AC_TAGVAR(lt_prog_compiler_static, $1)='-Bstatic'
      ;;

    linux*)
      case $cc_basename in
      icc* | ecc*)
	_LT_AC_TAGVAR(lt_prog_compiler_wl, $1)='-Wl,'
	_LT_AC_TAGVAR(lt_prog_compiler_pic, $1)='-KPIC'
	_LT_AC_TAGVAR(lt_prog_compiler_static, $1)='-static'
        ;;
      pgcc* | pgf77* | pgf90* | pgf95*)
        # Portland Group compilers (*not* the Pentium gcc compiler,
	# which looks to be a dead project)
	_LT_AC_TAGVAR(lt_prog_compiler_wl, $1)='-Wl,'
	_LT_AC_TAGVAR(lt_prog_compiler_pic, $1)='-fpic'
	_LT_AC_TAGVAR(lt_prog_compiler_static, $1)='-Bstatic'
        ;;
      ccc*)
        _LT_AC_TAGVAR(lt_prog_compiler_wl, $1)='-Wl,'
        # All Alpha code is PIC.
        _LT_AC_TAGVAR(lt_prog_compiler_static, $1)='-non_shared'
        ;;
      esac
      ;;

    osf3* | osf4* | osf5*)
      _LT_AC_TAGVAR(lt_prog_compiler_wl, $1)='-Wl,'
      # All OSF/1 code is PIC.
      _LT_AC_TAGVAR(lt_prog_compiler_static, $1)='-non_shared'
      ;;

    solaris*)
      _LT_AC_TAGVAR(lt_prog_compiler_pic, $1)='-KPIC'
      _LT_AC_TAGVAR(lt_prog_compiler_static, $1)='-Bstatic'
      case $cc_basename in
      f77* | f90* | f95*)
	_LT_AC_TAGVAR(lt_prog_compiler_wl, $1)='-Qoption ld ';;
      *)
	_LT_AC_TAGVAR(lt_prog_compiler_wl, $1)='-Wl,';;
      esac
      ;;

    sunos4*)
      _LT_AC_TAGVAR(lt_prog_compiler_wl, $1)='-Qoption ld '
      _LT_AC_TAGVAR(lt_prog_compiler_pic, $1)='-PIC'
      _LT_AC_TAGVAR(lt_prog_compiler_static, $1)='-Bstatic'
      ;;

    sysv4 | sysv4.2uw2* | sysv4.3*)
      _LT_AC_TAGVAR(lt_prog_compiler_wl, $1)='-Wl,'
      _LT_AC_TAGVAR(lt_prog_compiler_pic, $1)='-KPIC'
      _LT_AC_TAGVAR(lt_prog_compiler_static, $1)='-Bstatic'
      ;;

    sysv4*MP*)
      if test -d /usr/nec ;then
	_LT_AC_TAGVAR(lt_prog_compiler_pic, $1)='-Kconform_pic'
	_LT_AC_TAGVAR(lt_prog_compiler_static, $1)='-Bstatic'
      fi
      ;;

    sysv5* | unixware* | sco3.2v5* | sco5v6* | OpenUNIX*)
      _LT_AC_TAGVAR(lt_prog_compiler_wl, $1)='-Wl,'
      _LT_AC_TAGVAR(lt_prog_compiler_pic, $1)='-KPIC'
      _LT_AC_TAGVAR(lt_prog_compiler_static, $1)='-Bstatic'
      ;;

    unicos*)
      _LT_AC_TAGVAR(lt_prog_compiler_wl, $1)='-Wl,'
      _LT_AC_TAGVAR(lt_prog_compiler_can_build_shared, $1)=no
      ;;

    uts4*)
      _LT_AC_TAGVAR(lt_prog_compiler_pic, $1)='-pic'
      _LT_AC_TAGVAR(lt_prog_compiler_static, $1)='-Bstatic'
      ;;

    *)
      _LT_AC_TAGVAR(lt_prog_compiler_can_build_shared, $1)=no
      ;;
    esac
  fi
])
AC_MSG_RESULT([$_LT_AC_TAGVAR(lt_prog_compiler_pic, $1)])

#
# Check to make sure the PIC flag actually works.
#
if test -n "$_LT_AC_TAGVAR(lt_prog_compiler_pic, $1)"; then
  AC_LIBTOOL_COMPILER_OPTION([if $compiler PIC flag $_LT_AC_TAGVAR(lt_prog_compiler_pic, $1) works],
    _LT_AC_TAGVAR(lt_prog_compiler_pic_works, $1),
    [$_LT_AC_TAGVAR(lt_prog_compiler_pic, $1)ifelse([$1],[],[ -DPIC],[ifelse([$1],[CXX],[ -DPIC],[])])], [],
    [case $_LT_AC_TAGVAR(lt_prog_compiler_pic, $1) in
     "" | " "*) ;;
     *) _LT_AC_TAGVAR(lt_prog_compiler_pic, $1)=" $_LT_AC_TAGVAR(lt_prog_compiler_pic, $1)" ;;
     esac],
    [_LT_AC_TAGVAR(lt_prog_compiler_pic, $1)=
     _LT_AC_TAGVAR(lt_prog_compiler_can_build_shared, $1)=no])
fi
case $host_os in
  # For platforms which do not support PIC, -DPIC is meaningless:
  *djgpp*)
    _LT_AC_TAGVAR(lt_prog_compiler_pic, $1)=
    ;;
  *)
    _LT_AC_TAGVAR(lt_prog_compiler_pic, $1)="$_LT_AC_TAGVAR(lt_prog_compiler_pic, $1)ifelse([$1],[],[ -DPIC],[ifelse([$1],[CXX],[ -DPIC],[])])"
    ;;
esac

#
# Check to make sure the static flag actually works.
#
wl=$_LT_AC_TAGVAR(lt_prog_compiler_wl, $1) eval lt_tmp_static_flag=\"$_LT_AC_TAGVAR(lt_prog_compiler_static, $1)\"
AC_LIBTOOL_LINKER_OPTION([if $compiler static flag $lt_tmp_static_flag works],
  _LT_AC_TAGVAR(lt_prog_compiler_static_works, $1),
  $lt_tmp_static_flag,
  [],
  [_LT_AC_TAGVAR(lt_prog_compiler_static, $1)=])
])


# AC_LIBTOOL_PROG_LD_SHLIBS([TAGNAME])
# ------------------------------------
# See if the linker supports building shared libraries.
AC_DEFUN([AC_LIBTOOL_PROG_LD_SHLIBS],
[AC_MSG_CHECKING([whether the $compiler linker ($LD) supports shared libraries])
ifelse([$1],[CXX],[
  _LT_AC_TAGVAR(export_symbols_cmds, $1)='$NM $libobjs $convenience | $global_symbol_pipe | $SED '\''s/.* //'\'' | sort | uniq > $export_symbols'
  case $host_os in
  aix4* | aix5*)
    # If we're using GNU nm, then we don't want the "-C" option.
    # -C means demangle to AIX nm, but means don't demangle with GNU nm
    if $NM -V 2>&1 | grep 'GNU' > /dev/null; then
      _LT_AC_TAGVAR(export_symbols_cmds, $1)='$NM -Bpg $libobjs $convenience | awk '\''{ if (((\[$]2 == "T") || (\[$]2 == "D") || (\[$]2 == "B")) && ([substr](\[$]3,1,1) != ".")) { print \[$]3 } }'\'' | sort -u > $export_symbols'
    else
      _LT_AC_TAGVAR(export_symbols_cmds, $1)='$NM -BCpg $libobjs $convenience | awk '\''{ if (((\[$]2 == "T") || (\[$]2 == "D") || (\[$]2 == "B")) && ([substr](\[$]3,1,1) != ".")) { print \[$]3 } }'\'' | sort -u > $export_symbols'
    fi
    ;;
  pw32*)
    _LT_AC_TAGVAR(export_symbols_cmds, $1)="$ltdll_cmds"
  ;;
  cygwin* | mingw*)
    _LT_AC_TAGVAR(export_symbols_cmds, $1)='$NM $libobjs $convenience | $global_symbol_pipe | $SED -e '\''/^[[BCDGRS]] /s/.* \([[^ ]]*\)/\1 DATA/;/^.* __nm__/s/^.* __nm__\([[^ ]]*\) [[^ ]]*/\1 DATA/;/^I /d;/^[[AITW]] /s/.* //'\'' | sort | uniq > $export_symbols'
  ;;
  *)
    _LT_AC_TAGVAR(export_symbols_cmds, $1)='$NM $libobjs $convenience | $global_symbol_pipe | $SED '\''s/.* //'\'' | sort | uniq > $export_symbols'
  ;;
  esac
],[
  runpath_var=
  _LT_AC_TAGVAR(allow_undefined_flag, $1)=
  _LT_AC_TAGVAR(enable_shared_with_static_runtimes, $1)=no
  _LT_AC_TAGVAR(archive_cmds, $1)=
  _LT_AC_TAGVAR(archive_expsym_cmds, $1)=
  _LT_AC_TAGVAR(old_archive_From_new_cmds, $1)=
  _LT_AC_TAGVAR(old_archive_from_expsyms_cmds, $1)=
  _LT_AC_TAGVAR(export_dynamic_flag_spec, $1)=
  _LT_AC_TAGVAR(whole_archive_flag_spec, $1)=
  _LT_AC_TAGVAR(thread_safe_flag_spec, $1)=
  _LT_AC_TAGVAR(hardcode_libdir_flag_spec, $1)=
  _LT_AC_TAGVAR(hardcode_libdir_flag_spec_ld, $1)=
  _LT_AC_TAGVAR(hardcode_libdir_separator, $1)=
  _LT_AC_TAGVAR(hardcode_direct, $1)=no
  _LT_AC_TAGVAR(hardcode_minus_L, $1)=no
  _LT_AC_TAGVAR(hardcode_shlibpath_var, $1)=unsupported
  _LT_AC_TAGVAR(link_all_deplibs, $1)=unknown
  _LT_AC_TAGVAR(hardcode_automatic, $1)=no
  _LT_AC_TAGVAR(module_cmds, $1)=
  _LT_AC_TAGVAR(module_expsym_cmds, $1)=
  _LT_AC_TAGVAR(always_export_symbols, $1)=no
  _LT_AC_TAGVAR(export_symbols_cmds, $1)='$NM $libobjs $convenience | $global_symbol_pipe | $SED '\''s/.* //'\'' | sort | uniq > $export_symbols'
  # include_expsyms should be a list of space-separated symbols to be *always*
  # included in the symbol list
  _LT_AC_TAGVAR(include_expsyms, $1)=
  # exclude_expsyms can be an extended regexp of symbols to exclude
  # it will be wrapped by ` (' and `)$', so one must not match beginning or
  # end of line.  Example: `a|bc|.*d.*' will exclude the symbols `a' and `bc',
  # as well as any symbol that contains `d'.
  _LT_AC_TAGVAR(exclude_expsyms, $1)="_GLOBAL_OFFSET_TABLE_"
  # Although _GLOBAL_OFFSET_TABLE_ is a valid C symbol name, most a.out
  # platforms (ab)use it in PIC code, but their linkers get confused if
  # the symbol is explicitly referenced.  Since portable code cannot
  # rely on this symbol name, it's probably fine to never include it in
  # preloaded symbol tables.
  extract_expsyms_cmds=
  # Just being paranoid about ensuring that cc_basename is set.
  _LT_CC_BASENAME([$compiler])
  case $host_os in
  cygwin* | mingw* | pw32*)
    # FIXME: the MSVC++ port hasn't been tested in a loooong time
    # When not using gcc, we currently assume that we are using
    # Microsoft Visual C++.
    if test "$GCC" != yes; then
      with_gnu_ld=no
    fi
    ;;
  interix*)
    # we just hope/assume this is gcc and not c89 (= MSVC++)
    with_gnu_ld=yes
    ;;
  openbsd*)
    with_gnu_ld=no
    ;;
  esac

  _LT_AC_TAGVAR(ld_shlibs, $1)=yes
  if test "$with_gnu_ld" = yes; then
    # If archive_cmds runs LD, not CC, wlarc should be empty
    wlarc='${wl}'

    # Set some defaults for GNU ld with shared library support. These
    # are reset later if shared libraries are not supported. Putting them
    # here allows them to be overridden if necessary.
    runpath_var=LD_RUN_PATH
    _LT_AC_TAGVAR(hardcode_libdir_flag_spec, $1)='${wl}--rpath ${wl}$libdir'
    _LT_AC_TAGVAR(export_dynamic_flag_spec, $1)='${wl}--export-dynamic'
    # Ancient GNU ld didn't support --whole-archive et al.
    if $LD --help 2>&1 | grep 'no-whole-archive' > /dev/null; then
	_LT_AC_TAGVAR(whole_archive_flag_spec, $1)="$wlarc"'--whole-archive$convenience '"$wlarc"'--no-whole-archive'
      else
  	_LT_AC_TAGVAR(whole_archive_flag_spec, $1)=
    fi
    supports_anon_versioning=no
    case `$LD -v 2>/dev/null` in
      *\ [[01]].* | *\ 2.[[0-9]].* | *\ 2.10.*) ;; # catch versions < 2.11
      *\ 2.11.93.0.2\ *) supports_anon_versioning=yes ;; # RH7.3 ...
      *\ 2.11.92.0.12\ *) supports_anon_versioning=yes ;; # Mandrake 8.2 ...
      *\ 2.11.*) ;; # other 2.11 versions
      *) supports_anon_versioning=yes ;;
    esac
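    # Illustration of the version match above, using hypothetical banner
    # strings (the exact text printed by LD -v is an assumption here):
    #   "GNU ld version 2.9.1" matches the "2.<one digit>." pattern, so
    #       supports_anon_versioning stays "no";
    #   "GNU ld version 2.17" matches none of the earlier patterns and
    #       falls through to the default, which sets it to "yes".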

    # See if GNU ld supports shared libraries.
    case $host_os in
    aix3* | aix4* | aix5*)
      # On AIX/PPC, the GNU linker is very broken
      if test "$host_cpu" != ia64; then
	_LT_AC_TAGVAR(ld_shlibs, $1)=no
	cat <<EOF 1>&2

*** Warning: the GNU linker, at least up to release 2.9.1, is reported
*** to be unable to reliably create shared libraries on AIX.
*** Therefore, libtool is disabling shared libraries support.  If you
*** really care for shared libraries, you may want to modify your PATH
*** so that a non-GNU linker is found, and then restart.

EOF
      fi
      ;;

    amigaos*)
      _LT_AC_TAGVAR(archive_cmds, $1)='$rm $output_objdir/a2ixlibrary.data~$echo "#define NAME $libname" > $output_objdir/a2ixlibrary.data~$echo "#define LIBRARY_ID 1" >> $output_objdir/a2ixlibrary.data~$echo "#define VERSION $major" >> $output_objdir/a2ixlibrary.data~$echo "#define REVISION $revision" >> $output_objdir/a2ixlibrary.data~$AR $AR_FLAGS $lib $libobjs~$RANLIB $lib~(cd $output_objdir && a2ixlibrary -32)'
      _LT_AC_TAGVAR(hardcode_libdir_flag_spec, $1)='-L$libdir'
      _LT_AC_TAGVAR(hardcode_minus_L, $1)=yes

      # Samuel A. Falvo II <kc5tja@dolphin.openprojects.net> reports
      # that the semantics of dynamic libraries on AmigaOS, at least up
      # to version 4, is to share data among multiple programs linked
      # with the same dynamic library.  Since this doesn't match the
      # behavior of shared libraries on other platforms, we can't use
      # them.
      _LT_AC_TAGVAR(ld_shlibs, $1)=no
      ;;

    beos*)
      if $LD --help 2>&1 | grep ': supported targets:.* elf' > /dev/null; then
	_LT_AC_TAGVAR(allow_undefined_flag, $1)=unsupported
	# Joseph Beckenbach <jrb3@best.com> says some releases of gcc
	# support --undefined.  This deserves some investigation.  FIXME
	_LT_AC_TAGVAR(archive_cmds, $1)='$CC -nostart $libobjs $deplibs $compiler_flags ${wl}-soname $wl$soname -o $lib'
      else
	_LT_AC_TAGVAR(ld_shlibs, $1)=no
      fi
      ;;

    cygwin* | mingw* | pw32*)
      # _LT_AC_TAGVAR(hardcode_libdir_flag_spec, $1) is actually meaningless,
      # as there is no search path for DLLs.
      _LT_AC_TAGVAR(hardcode_libdir_flag_spec, $1)='-L$libdir'
      _LT_AC_TAGVAR(allow_undefined_flag, $1)=unsupported
      _LT_AC_TAGVAR(always_export_symbols, $1)=no
      _LT_AC_TAGVAR(enable_shared_with_static_runtimes, $1)=yes
      _LT_AC_TAGVAR(export_symbols_cmds, $1)='$NM $libobjs $convenience | $global_symbol_pipe | $SED -e '\''/^[[BCDGRS]] /s/.* \([[^ ]]*\)/\1 DATA/'\'' | $SED -e '\''/^[[AITW]] /s/.* //'\'' | sort | uniq > $export_symbols'

      if $LD --help 2>&1 | grep 'auto-import' > /dev/null; then
        _LT_AC_TAGVAR(archive_cmds, $1)='$CC -shared $libobjs $deplibs $compiler_flags -o $output_objdir/$soname ${wl}--enable-auto-image-base -Xlinker --out-implib -Xlinker $lib'
	# If the export-symbols file already is a .def file (1st line
	# is EXPORTS), use it as is; otherwise, prepend...
	_LT_AC_TAGVAR(archive_expsym_cmds, $1)='if test "x`$SED 1q $export_symbols`" = xEXPORTS; then
	  cp $export_symbols $output_objdir/$soname.def;
	else
	  echo EXPORTS > $output_objdir/$soname.def;
	  cat $export_symbols >> $output_objdir/$soname.def;
	fi~
	$CC -shared $output_objdir/$soname.def $libobjs $deplibs $compiler_flags -o $output_objdir/$soname ${wl}--enable-auto-image-base -Xlinker --out-implib -Xlinker $lib'
      else
	_LT_AC_TAGVAR(ld_shlibs, $1)=no
      fi
      ;;

    interix3*)
      _LT_AC_TAGVAR(hardcode_direct, $1)=no
      _LT_AC_TAGVAR(hardcode_shlibpath_var, $1)=no
      _LT_AC_TAGVAR(hardcode_libdir_flag_spec, $1)='${wl}-rpath,$libdir'
      _LT_AC_TAGVAR(export_dynamic_flag_spec, $1)='${wl}-E'
      # Hack: On Interix 3.x, we cannot compile PIC because of a broken gcc.
      # Instead, shared libraries are loaded at an image base (0x10000000 by
      # default) and relocated if they conflict, which is a slow, very
      # memory-consuming and fragmenting process.  To avoid this, we pick a random,
      # 256 KiB-aligned image base between 0x50000000 and 0x6FFC0000 at link
      # time.  Moving up from 0x10000000 also allows more sbrk(2) space.
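      # Worked example of the expr arithmetic used in the commands below
      # (the input values are illustrative only).  expr applies %, / and *
      # left to right, 262144 is 256 KiB and 1342177280 is 0x50000000:
      #   input 0:    0 % 4096 = 0;       0 / 2 = 0;       0 * 262144 = 0;
      #               0 + 1342177280 = 1342177280 (0x50000000, lower bound)
      #   input 4095: 4095 % 4096 = 4095; 4095 / 2 = 2047;
      #               2047 * 262144 = 536608768;
      #               536608768 + 1342177280 = 1878786048 (0x6FFC0000,
      #               upper bound)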
      _LT_AC_TAGVAR(archive_cmds, $1)='$CC -shared $pic_flag $libobjs $deplibs $compiler_flags ${wl}-h,$soname ${wl}--image-base,`expr ${RANDOM-$$} % 4096 / 2 \* 262144 + 1342177280` -o $lib'
      _LT_AC_TAGVAR(archive_expsym_cmds, $1)='sed "s,^,_," $export_symbols >$output_objdir/$soname.expsym~$CC -shared $pic_flag $libobjs $deplibs $compiler_flags ${wl}-h,$soname ${wl}--retain-symbols-file,$output_objdir/$soname.expsym ${wl}--image-base,`expr ${RANDOM-$$} % 4096 / 2 \* 262144 + 1342177280` -o $lib'
      ;;

    linux*)
      if $LD --help 2>&1 | grep ': supported targets:.* elf' > /dev/null; then
	tmp_addflag=
	case $cc_basename,$host_cpu in
	pgcc*)				# Portland Group C compiler
	  _LT_AC_TAGVAR(whole_archive_flag_spec, $1)='${wl}--whole-archive`for conv in $convenience\"\"; do test  -n \"$conv\" && new_convenience=\"$new_convenience,$conv\"; done; $echo \"$new_convenience\"` ${wl}--no-whole-archive'
	  tmp_addflag=' $pic_flag'
	  ;;
	pgf77* | pgf90* | pgf95*)	# Portland Group f77 and f90 compilers
	  _LT_AC_TAGVAR(whole_archive_flag_spec, $1)='${wl}--whole-archive`for conv in $convenience\"\"; do test  -n \"$conv\" && new_convenience=\"$new_convenience,$conv\"; done; $echo \"$new_convenience\"` ${wl}--no-whole-archive'
	  tmp_addflag=' $pic_flag -Mnomain' ;;
	ecc*,ia64* | icc*,ia64*)		# Intel C compiler on ia64
	  tmp_addflag=' -i_dynamic' ;;
	efc*,ia64* | ifort*,ia64*)	# Intel Fortran compiler on ia64
	  tmp_addflag=' -i_dynamic -nofor_main' ;;
	ifc* | ifort*)			# Intel Fortran compiler
	  tmp_addflag=' -nofor_main' ;;
	esac
	_LT_AC_TAGVAR(archive_cmds, $1)='$CC -shared'"$tmp_addflag"' $libobjs $deplibs $compiler_flags ${wl}-soname $wl$soname -o $lib'

	if test $supports_anon_versioning = yes; then
	  _LT_AC_TAGVAR(archive_expsym_cmds, $1)='$echo "{ global:" > $output_objdir/$libname.ver~
  cat $export_symbols | sed -e "s/\(.*\)/\1;/" >> $output_objdir/$libname.ver~
  $echo "local: *; };" >> $output_objdir/$libname.ver~
	  $CC -shared'"$tmp_addflag"' $libobjs $deplibs $compiler_flags ${wl}-soname $wl$soname ${wl}-version-script ${wl}$output_objdir/$libname.ver -o $lib'
	fi
      else
	_LT_AC_TAGVAR(ld_shlibs, $1)=no
      fi
      ;;

    netbsd*)
      if echo __ELF__ | $CC -E - | grep __ELF__ >/dev/null; then
	_LT_AC_TAGVAR(archive_cmds, $1)='$LD -Bshareable $libobjs $deplibs $linker_flags -o $lib'
	wlarc=
      else
	_LT_AC_TAGVAR(archive_cmds, $1)='$CC -shared $libobjs $deplibs $compiler_flags ${wl}-soname $wl$soname -o $lib'
	_LT_AC_TAGVAR(archive_expsym_cmds, $1)='$CC -shared $libobjs $deplibs $compiler_flags ${wl}-soname $wl$soname ${wl}-retain-symbols-file $wl$export_symbols -o $lib'
      fi
      ;;

    solaris*)
      if $LD -v 2>&1 | grep 'BFD 2\.8' > /dev/null; then
	_LT_AC_TAGVAR(ld_shlibs, $1)=no
	cat <<EOF 1>&2

*** Warning: The releases 2.8.* of the GNU linker cannot reliably
*** create shared libraries on Solaris systems.  Therefore, libtool
*** is disabling shared libraries support.  We urge you to upgrade GNU
*** binutils to release 2.9.1 or newer.  Another option is to modify
*** your PATH or compiler configuration so that the native linker is
*** used, and then restart.

EOF
      elif $LD --help 2>&1 | grep ': supported targets:.* elf' > /dev/null; then
	_LT_AC_TAGVAR(archive_cmds, $1)='$CC -shared $libobjs $deplibs $compiler_flags ${wl}-soname $wl$soname -o $lib'
	_LT_AC_TAGVAR(archive_expsym_cmds, $1)='$CC -shared $libobjs $deplibs $compiler_flags ${wl}-soname $wl$soname ${wl}-retain-symbols-file $wl$export_symbols -o $lib'
      else
	_LT_AC_TAGVAR(ld_shlibs, $1)=no
      fi
      ;;

    sysv5* | sco3.2v5* | sco5v6* | unixware* | OpenUNIX*)
      case `$LD -v 2>&1` in
        *\ [[01]].* | *\ 2.[[0-9]].* | *\ 2.1[[0-5]].*) 
	_LT_AC_TAGVAR(ld_shlibs, $1)=no
	cat <<_LT_EOF 1>&2

*** Warning: Releases of the GNU linker prior to 2.16.91.0.3 cannot
*** reliably create shared libraries on SCO systems.  Therefore, libtool
*** is disabling shared libraries support.  We urge you to upgrade GNU
*** binutils to release 2.16.91.0.3 or newer.  Another option is to modify
*** your PATH or compiler configuration so that the native linker is
*** used, and then restart.

_LT_EOF
	;;
	*)
	  if $LD --help 2>&1 | grep ': supported targets:.* elf' > /dev/null; then
	    _LT_AC_TAGVAR(hardcode_libdir_flag_spec, $1)='`test -z "$SCOABSPATH" && echo ${wl}-rpath,$libdir`'
	    _LT_AC_TAGVAR(archive_cmds, $1)='$CC -shared $libobjs $deplibs $compiler_flags ${wl}-soname,\${SCOABSPATH:+${install_libdir}/}$soname -o $lib'
	    _LT_AC_TAGVAR(archive_expsym_cmds, $1)='$CC -shared $libobjs $deplibs $compiler_flags ${wl}-soname,\${SCOABSPATH:+${install_libdir}/}$soname,-retain-symbols-file,$export_symbols -o $lib'
	  else
	    _LT_AC_TAGVAR(ld_shlibs, $1)=no
	  fi
	;;
      esac
      ;;

    sunos4*)
      _LT_AC_TAGVAR(archive_cmds, $1)='$LD -assert pure-text -Bshareable -o $lib $libobjs $deplibs $linker_flags'
      wlarc=
      _LT_AC_TAGVAR(hardcode_direct, $1)=yes
      _LT_AC_TAGVAR(hardcode_shlibpath_var, $1)=no
      ;;

    *)
      if $LD --help 2>&1 | grep ': supported targets:.* elf' > /dev/null; then
	_LT_AC_TAGVAR(archive_cmds, $1)='$CC -shared $libobjs $deplibs $compiler_flags ${wl}-soname $wl$soname -o $lib'
	_LT_AC_TAGVAR(archive_expsym_cmds, $1)='$CC -shared $libobjs $deplibs $compiler_flags ${wl}-soname $wl$soname ${wl}-retain-symbols-file $wl$export_symbols -o $lib'
      else
	_LT_AC_TAGVAR(ld_shlibs, $1)=no
      fi
      ;;
    esac

    if test "$_LT_AC_TAGVAR(ld_shlibs, $1)" = no; then
      runpath_var=
      _LT_AC_TAGVAR(hardcode_libdir_flag_spec, $1)=
      _LT_AC_TAGVAR(export_dynamic_flag_spec, $1)=
      _LT_AC_TAGVAR(whole_archive_flag_spec, $1)=
    fi
  else
    # PORTME fill in a description of your system's linker (not GNU ld)
    case $host_os in
    aix3*)
      _LT_AC_TAGVAR(allow_undefined_flag, $1)=unsupported
      _LT_AC_TAGVAR(always_export_symbols, $1)=yes
      _LT_AC_TAGVAR(archive_expsym_cmds, $1)='$LD -o $output_objdir/$soname $libobjs $deplibs $linker_flags -bE:$export_symbols -T512 -H512 -bM:SRE~$AR $AR_FLAGS $lib $output_objdir/$soname'
      # Note: this linker hardcodes the directories in LIBPATH if there
      # are no directories specified by -L.
      _LT_AC_TAGVAR(hardcode_minus_L, $1)=yes
      if test "$GCC" = yes && test -z "$lt_prog_compiler_static"; then
	# Neither direct hardcoding nor static linking is supported with a
	# broken collect2.
	_LT_AC_TAGVAR(hardcode_direct, $1)=unsupported
      fi
      ;;

    aix4* | aix5*)
      if test "$host_cpu" = ia64; then
	# On IA64, the linker does run time linking by default, so we don't
	# have to do anything special.
	aix_use_runtimelinking=no
	exp_sym_flag='-Bexport'
	no_entry_flag=""
      else
	# If we're using GNU nm, then we don't want the "-C" option.
	# -C means demangle to AIX nm, but means don't demangle with GNU nm
	if $NM -V 2>&1 | grep 'GNU' > /dev/null; then
	  _LT_AC_TAGVAR(export_symbols_cmds, $1)='$NM -Bpg $libobjs $convenience | awk '\''{ if (((\[$]2 == "T") || (\[$]2 == "D") || (\[$]2 == "B")) && ([substr](\[$]3,1,1) != ".")) { print \[$]3 } }'\'' | sort -u > $export_symbols'
	else
	  _LT_AC_TAGVAR(export_symbols_cmds, $1)='$NM -BCpg $libobjs $convenience | awk '\''{ if (((\[$]2 == "T") || (\[$]2 == "D") || (\[$]2 == "B")) && ([substr](\[$]3,1,1) != ".")) { print \[$]3 } }'\'' | sort -u > $export_symbols'
	fi
	aix_use_runtimelinking=no

	# Test if we are trying to use run time linking or normal
	# AIX style linking. If -brtl is somewhere in LDFLAGS, we
	# need to do runtime linking.
	case $host_os in aix4.[[23]]|aix4.[[23]].*|aix5*)
	  for ld_flag in $LDFLAGS; do
  	  if (test $ld_flag = "-brtl" || test $ld_flag = "-Wl,-brtl"); then
  	    aix_use_runtimelinking=yes
  	    break
  	  fi
	  done
	  ;;
	esac

	exp_sym_flag='-bexport'
	no_entry_flag='-bnoentry'
      fi

      # When large executables or shared objects are built, AIX ld can
      # have problems creating the table of contents.  If linking a library
      # or program results in "error TOC overflow" add -mminimal-toc to
      # CXXFLAGS/CFLAGS for g++/gcc.  In the cases where that is not
      # enough to fix the problem, add -Wl,-bbigtoc to LDFLAGS.

      _LT_AC_TAGVAR(archive_cmds, $1)=''
      _LT_AC_TAGVAR(hardcode_direct, $1)=yes
      _LT_AC_TAGVAR(hardcode_libdir_separator, $1)=':'
      _LT_AC_TAGVAR(link_all_deplibs, $1)=yes

      if test "$GCC" = yes; then
	case $host_os in aix4.[[012]]|aix4.[[012]].*)
	# We only want to do this on AIX 4.2 and lower, because the check
	# below for a broken collect2 doesn't work under 4.3+.
	  collect2name=`${CC} -print-prog-name=collect2`
	  if test -f "$collect2name" && \
  	   strings "$collect2name" | grep resolve_lib_name >/dev/null
	  then
  	  # We have reworked collect2
  	  _LT_AC_TAGVAR(hardcode_direct, $1)=yes
	  else
  	  # We have old collect2
  	  _LT_AC_TAGVAR(hardcode_direct, $1)=unsupported
  	  # It fails to find uninstalled libraries when the uninstalled
  	  # path is not listed in the libpath.  Setting hardcode_minus_L
  	  # to unsupported forces relinking
  	  _LT_AC_TAGVAR(hardcode_minus_L, $1)=yes
  	  _LT_AC_TAGVAR(hardcode_libdir_flag_spec, $1)='-L$libdir'
  	  _LT_AC_TAGVAR(hardcode_libdir_separator, $1)=
	  fi
	  ;;
	esac
	shared_flag='-shared'
	if test "$aix_use_runtimelinking" = yes; then
	  shared_flag="$shared_flag "'${wl}-G'
	fi
      else
	# not using gcc
	if test "$host_cpu" = ia64; then
  	# VisualAge C++, Version 5.5 for AIX 5L for IA-64, Beta 3 Release
  	# chokes on -Wl,-G. The following line is correct:
	  shared_flag='-G'
	else
	  if test "$aix_use_runtimelinking" = yes; then
	    shared_flag='${wl}-G'
	  else
	    shared_flag='${wl}-bM:SRE'
	  fi
	fi
      fi

      # It seems that -bexpall does not export symbols beginning with
      # underscore (_), so it is better to generate a list of symbols to export.
      _LT_AC_TAGVAR(always_export_symbols, $1)=yes
      if test "$aix_use_runtimelinking" = yes; then
	# Warning - without using the other runtime loading flags (-brtl),
	# -berok will link without error, but may produce a broken library.
	_LT_AC_TAGVAR(allow_undefined_flag, $1)='-berok'
       # Determine the default libpath from the value encoded in an empty executable.
       _LT_AC_SYS_LIBPATH_AIX
       _LT_AC_TAGVAR(hardcode_libdir_flag_spec, $1)='${wl}-blibpath:$libdir:'"$aix_libpath"
	_LT_AC_TAGVAR(archive_expsym_cmds, $1)="\$CC"' -o $output_objdir/$soname $libobjs $deplibs '"\${wl}$no_entry_flag"' $compiler_flags `if test "x${allow_undefined_flag}" != "x"; then echo "${wl}${allow_undefined_flag}"; else :; fi` '"\${wl}$exp_sym_flag:\$export_symbols $shared_flag"
       else
	if test "$host_cpu" = ia64; then
	  _LT_AC_TAGVAR(hardcode_libdir_flag_spec, $1)='${wl}-R $libdir:/usr/lib:/lib'
	  _LT_AC_TAGVAR(allow_undefined_flag, $1)="-z nodefs"
	  _LT_AC_TAGVAR(archive_expsym_cmds, $1)="\$CC $shared_flag"' -o $output_objdir/$soname $libobjs $deplibs '"\${wl}$no_entry_flag"' $compiler_flags ${wl}${allow_undefined_flag} '"\${wl}$exp_sym_flag:\$export_symbols"
	else
	 # Determine the default libpath from the value encoded in an empty executable.
	 _LT_AC_SYS_LIBPATH_AIX
	 _LT_AC_TAGVAR(hardcode_libdir_flag_spec, $1)='${wl}-blibpath:$libdir:'"$aix_libpath"
	  # Warning - without using the other run time loading flags,
	  # -berok will link without error, but may produce a broken library.
	  _LT_AC_TAGVAR(no_undefined_flag, $1)=' ${wl}-bernotok'
	  _LT_AC_TAGVAR(allow_undefined_flag, $1)=' ${wl}-berok'
	  # Exported symbols can be pulled into shared objects from archives
	  _LT_AC_TAGVAR(whole_archive_flag_spec, $1)='$convenience'
	  _LT_AC_TAGVAR(archive_cmds_need_lc, $1)=yes
	  # This is similar to how AIX traditionally builds its shared libraries.
	  _LT_AC_TAGVAR(archive_expsym_cmds, $1)="\$CC $shared_flag"' -o $output_objdir/$soname $libobjs $deplibs ${wl}-bnoentry $compiler_flags ${wl}-bE:$export_symbols${allow_undefined_flag}~$AR $AR_FLAGS $output_objdir/$libname$release.a $output_objdir/$soname'
	fi
      fi
      ;;

    amigaos*)
      _LT_AC_TAGVAR(archive_cmds, $1)='$rm $output_objdir/a2ixlibrary.data~$echo "#define NAME $libname" > $output_objdir/a2ixlibrary.data~$echo "#define LIBRARY_ID 1" >> $output_objdir/a2ixlibrary.data~$echo "#define VERSION $major" >> $output_objdir/a2ixlibrary.data~$echo "#define REVISION $revision" >> $output_objdir/a2ixlibrary.data~$AR $AR_FLAGS $lib $libobjs~$RANLIB $lib~(cd $output_objdir && a2ixlibrary -32)'
      _LT_AC_TAGVAR(hardcode_libdir_flag_spec, $1)='-L$libdir'
      _LT_AC_TAGVAR(hardcode_minus_L, $1)=yes
      # See the comment about the different semantics in the GNU ld section.
      _LT_AC_TAGVAR(ld_shlibs, $1)=no
      ;;

    bsdi[[45]]*)
      _LT_AC_TAGVAR(export_dynamic_flag_spec, $1)=-rdynamic
      ;;

    cygwin* | mingw* | pw32*)
      # When not using gcc, we currently assume that we are using
      # Microsoft Visual C++.
      # hardcode_libdir_flag_spec is actually meaningless, as there is
      # no search path for DLLs.
      _LT_AC_TAGVAR(hardcode_libdir_flag_spec, $1)=' '
      _LT_AC_TAGVAR(allow_undefined_flag, $1)=unsupported
      # Tell ltmain to make .lib files, not .a files.
      libext=lib
      # Tell ltmain to make .dll files, not .so files.
      shrext_cmds=".dll"
      # FIXME: Setting linknames here is a bad hack.
      _LT_AC_TAGVAR(archive_cmds, $1)='$CC -o $lib $libobjs $compiler_flags `echo "$deplibs" | $SED -e '\''s/ -lc$//'\''` -link -dll~linknames='
      # The linker will automatically build a .lib file if we build a DLL.
      _LT_AC_TAGVAR(old_archive_From_new_cmds, $1)='true'
      # FIXME: Should let the user specify the lib program.
      _LT_AC_TAGVAR(old_archive_cmds, $1)='lib /OUT:$oldlib$oldobjs$old_deplibs'
      _LT_AC_TAGVAR(fix_srcfile_path, $1)='`cygpath -w "$srcfile"`'
      _LT_AC_TAGVAR(enable_shared_with_static_runtimes, $1)=yes
      ;;

    darwin* | rhapsody*)
      case $host_os in
        rhapsody* | darwin1.[[012]])
         _LT_AC_TAGVAR(allow_undefined_flag, $1)='${wl}-undefined ${wl}suppress'
         ;;
       *) # Darwin 1.3 and later
         if test -z ${MACOSX_DEPLOYMENT_TARGET} ; then
           _LT_AC_TAGVAR(allow_undefined_flag, $1)='${wl}-flat_namespace ${wl}-undefined ${wl}suppress'
         else
           case ${MACOSX_DEPLOYMENT_TARGET} in
             10.[[012]])
               _LT_AC_TAGVAR(allow_undefined_flag, $1)='${wl}-flat_namespace ${wl}-undefined ${wl}suppress'
               ;;
             10.*)
               _LT_AC_TAGVAR(allow_undefined_flag, $1)='${wl}-undefined ${wl}dynamic_lookup'
               ;;
           esac
         fi
         ;;
      esac
      _LT_AC_TAGVAR(archive_cmds_need_lc, $1)=no
      _LT_AC_TAGVAR(hardcode_direct, $1)=no
      _LT_AC_TAGVAR(hardcode_automatic, $1)=yes
      _LT_AC_TAGVAR(hardcode_shlibpath_var, $1)=unsupported
      _LT_AC_TAGVAR(whole_archive_flag_spec, $1)=''
      _LT_AC_TAGVAR(link_all_deplibs, $1)=yes
    if test "$GCC" = yes ; then
    	output_verbose_link_cmd='echo'
        _LT_AC_TAGVAR(archive_cmds, $1)='$CC -dynamiclib $allow_undefined_flag -o $lib $libobjs $deplibs $compiler_flags -install_name $rpath/$soname $verstring'
      _LT_AC_TAGVAR(module_cmds, $1)='$CC $allow_undefined_flag -o $lib -bundle $libobjs $deplibs$compiler_flags'
      # Don't fix this by using the ld -exported_symbols_list flag; it doesn't exist in older darwin lds
      _LT_AC_TAGVAR(archive_expsym_cmds, $1)='sed -e "s,#.*,," -e "s,^[    ]*,," -e "s,^\(..*\),_&," < $export_symbols > $output_objdir/${libname}-symbols.expsym~$CC -dynamiclib $allow_undefined_flag -o $lib $libobjs $deplibs $compiler_flags -install_name $rpath/$soname $verstring~nmedit -s $output_objdir/${libname}-symbols.expsym ${lib}'
      _LT_AC_TAGVAR(module_expsym_cmds, $1)='sed -e "s,#.*,," -e "s,^[    ]*,," -e "s,^\(..*\),_&," < $export_symbols > $output_objdir/${libname}-symbols.expsym~$CC $allow_undefined_flag  -o $lib -bundle $libobjs $deplibs$compiler_flags~nmedit -s $output_objdir/${libname}-symbols.expsym ${lib}'
    else
      case $cc_basename in
        xlc*)
         output_verbose_link_cmd='echo'
         _LT_AC_TAGVAR(archive_cmds, $1)='$CC -qmkshrobj $allow_undefined_flag -o $lib $libobjs $deplibs $compiler_flags ${wl}-install_name ${wl}`echo $rpath/$soname` $verstring'
         _LT_AC_TAGVAR(module_cmds, $1)='$CC $allow_undefined_flag -o $lib -bundle $libobjs $deplibs$compiler_flags'
          # Don't fix this by using the ld -exported_symbols_list flag; it doesn't exist in older darwin lds
         _LT_AC_TAGVAR(archive_expsym_cmds, $1)='sed -e "s,#.*,," -e "s,^[    ]*,," -e "s,^\(..*\),_&," < $export_symbols > $output_objdir/${libname}-symbols.expsym~$CC -qmkshrobj $allow_undefined_flag -o $lib $libobjs $deplibs $compiler_flags ${wl}-install_name ${wl}$rpath/$soname $verstring~nmedit -s $output_objdir/${libname}-symbols.expsym ${lib}'
          _LT_AC_TAGVAR(module_expsym_cmds, $1)='sed -e "s,#.*,," -e "s,^[    ]*,," -e "s,^\(..*\),_&," < $export_symbols > $output_objdir/${libname}-symbols.expsym~$CC $allow_undefined_flag  -o $lib -bundle $libobjs $deplibs$compiler_flags~nmedit -s $output_objdir/${libname}-symbols.expsym ${lib}'
          ;;
       *)
         _LT_AC_TAGVAR(ld_shlibs, $1)=no
          ;;
      esac
    fi
      ;;

    dgux*)
      _LT_AC_TAGVAR(archive_cmds, $1)='$LD -G -h $soname -o $lib $libobjs $deplibs $linker_flags'
      _LT_AC_TAGVAR(hardcode_libdir_flag_spec, $1)='-L$libdir'
      _LT_AC_TAGVAR(hardcode_shlibpath_var, $1)=no
      ;;

    freebsd1*)
      _LT_AC_TAGVAR(ld_shlibs, $1)=no
      ;;

    # FreeBSD 2.2.[012] allows us to include c++rt0.o to get C++ constructor
    # support.  Future versions do this automatically, but an explicit c++rt0.o
    # does not break anything, and helps significantly (at the cost of a little
    # extra space).
    freebsd2.2*)
      _LT_AC_TAGVAR(archive_cmds, $1)='$LD -Bshareable -o $lib $libobjs $deplibs $linker_flags /usr/lib/c++rt0.o'
      _LT_AC_TAGVAR(hardcode_libdir_flag_spec, $1)='-R$libdir'
      _LT_AC_TAGVAR(hardcode_direct, $1)=yes
      _LT_AC_TAGVAR(hardcode_shlibpath_var, $1)=no
      ;;

    # Unfortunately, older versions of FreeBSD 2 do not have this feature.
    freebsd2*)
      _LT_AC_TAGVAR(archive_cmds, $1)='$LD -Bshareable -o $lib $libobjs $deplibs $linker_flags'
      _LT_AC_TAGVAR(hardcode_direct, $1)=yes
      _LT_AC_TAGVAR(hardcode_minus_L, $1)=yes
      _LT_AC_TAGVAR(hardcode_shlibpath_var, $1)=no
      ;;

    # FreeBSD 3 and greater uses gcc -shared to do shared libraries.
    freebsd* | kfreebsd*-gnu | dragonfly*)
      _LT_AC_TAGVAR(archive_cmds, $1)='$CC -shared -o $lib $libobjs $deplibs $compiler_flags'
      _LT_AC_TAGVAR(hardcode_libdir_flag_spec, $1)='-R$libdir'
      _LT_AC_TAGVAR(hardcode_direct, $1)=yes
      _LT_AC_TAGVAR(hardcode_shlibpath_var, $1)=no
      ;;

    hpux9*)
      if test "$GCC" = yes; then
	_LT_AC_TAGVAR(archive_cmds, $1)='$rm $output_objdir/$soname~$CC -shared -fPIC ${wl}+b ${wl}$install_libdir -o $output_objdir/$soname $libobjs $deplibs $compiler_flags~test $output_objdir/$soname = $lib || mv $output_objdir/$soname $lib'
      else
	_LT_AC_TAGVAR(archive_cmds, $1)='$rm $output_objdir/$soname~$LD -b +b $install_libdir -o $output_objdir/$soname $libobjs $deplibs $linker_flags~test $output_objdir/$soname = $lib || mv $output_objdir/$soname $lib'
      fi
      _LT_AC_TAGVAR(hardcode_libdir_flag_spec, $1)='${wl}+b ${wl}$libdir'
      _LT_AC_TAGVAR(hardcode_libdir_separator, $1)=:
      _LT_AC_TAGVAR(hardcode_direct, $1)=yes

      # hardcode_minus_L: Not really in the search PATH,
      # but as the default location of the library.
      _LT_AC_TAGVAR(hardcode_minus_L, $1)=yes
      _LT_AC_TAGVAR(export_dynamic_flag_spec, $1)='${wl}-E'
      ;;

    hpux10*)
      if test "$GCC" = yes -a "$with_gnu_ld" = no; then
	_LT_AC_TAGVAR(archive_cmds, $1)='$CC -shared -fPIC ${wl}+h ${wl}$soname ${wl}+b ${wl}$install_libdir -o $lib $libobjs $deplibs $compiler_flags'
      else
	_LT_AC_TAGVAR(archive_cmds, $1)='$LD -b +h $soname +b $install_libdir -o $lib $libobjs $deplibs $linker_flags'
      fi
      if test "$with_gnu_ld" = no; then
	_LT_AC_TAGVAR(hardcode_libdir_flag_spec, $1)='${wl}+b ${wl}$libdir'
	_LT_AC_TAGVAR(hardcode_libdir_separator, $1)=:

	_LT_AC_TAGVAR(hardcode_direct, $1)=yes
	_LT_AC_TAGVAR(export_dynamic_flag_spec, $1)='${wl}-E'

	# hardcode_minus_L: Not really in the search PATH,
	# but as the default location of the library.
	_LT_AC_TAGVAR(hardcode_minus_L, $1)=yes
      fi
      ;;

    hpux11*)
      if test "$GCC" = yes -a "$with_gnu_ld" = no; then
	case $host_cpu in
	hppa*64*)
	  _LT_AC_TAGVAR(archive_cmds, $1)='$CC -shared ${wl}+h ${wl}$soname -o $lib $libobjs $deplibs $compiler_flags'
	  ;;
	ia64*)
	  _LT_AC_TAGVAR(archive_cmds, $1)='$CC -shared ${wl}+h ${wl}$soname ${wl}+nodefaultrpath -o $lib $libobjs $deplibs $compiler_flags'
	  ;;
	*)
	  _LT_AC_TAGVAR(archive_cmds, $1)='$CC -shared -fPIC ${wl}+h ${wl}$soname ${wl}+b ${wl}$install_libdir -o $lib $libobjs $deplibs $compiler_flags'
	  ;;
	esac
      else
	case $host_cpu in
	hppa*64*)
	  _LT_AC_TAGVAR(archive_cmds, $1)='$CC -b ${wl}+h ${wl}$soname -o $lib $libobjs $deplibs $compiler_flags'
	  ;;
	ia64*)
	  _LT_AC_TAGVAR(archive_cmds, $1)='$CC -b ${wl}+h ${wl}$soname ${wl}+nodefaultrpath -o $lib $libobjs $deplibs $compiler_flags'
	  ;;
	*)
	  _LT_AC_TAGVAR(archive_cmds, $1)='$CC -b ${wl}+h ${wl}$soname ${wl}+b ${wl}$install_libdir -o $lib $libobjs $deplibs $compiler_flags'
	  ;;
	esac
      fi
      if test "$with_gnu_ld" = no; then
	_LT_AC_TAGVAR(hardcode_libdir_flag_spec, $1)='${wl}+b ${wl}$libdir'
	_LT_AC_TAGVAR(hardcode_libdir_separator, $1)=:

	case $host_cpu in
	hppa*64*|ia64*)
	  _LT_AC_TAGVAR(hardcode_libdir_flag_spec_ld, $1)='+b $libdir'
	  _LT_AC_TAGVAR(hardcode_direct, $1)=no
	  _LT_AC_TAGVAR(hardcode_shlibpath_var, $1)=no
	  ;;
	*)
	  _LT_AC_TAGVAR(hardcode_direct, $1)=yes
	  _LT_AC_TAGVAR(export_dynamic_flag_spec, $1)='${wl}-E'

	  # hardcode_minus_L: Not really in the search PATH,
	  # but as the default location of the library.
	  _LT_AC_TAGVAR(hardcode_minus_L, $1)=yes
	  ;;
	esac
      fi
      ;;

    irix5* | irix6* | nonstopux*)
      if test "$GCC" = yes; then
	_LT_AC_TAGVAR(archive_cmds, $1)='$CC -shared $libobjs $deplibs $compiler_flags ${wl}-soname ${wl}$soname `test -n "$verstring" && echo ${wl}-set_version ${wl}$verstring` ${wl}-update_registry ${wl}${output_objdir}/so_locations -o $lib'
      else
	_LT_AC_TAGVAR(archive_cmds, $1)='$LD -shared $libobjs $deplibs $linker_flags -soname $soname `test -n "$verstring" && echo -set_version $verstring` -update_registry ${output_objdir}/so_locations -o $lib'
	_LT_AC_TAGVAR(hardcode_libdir_flag_spec_ld, $1)='-rpath $libdir'
      fi
      _LT_AC_TAGVAR(hardcode_libdir_flag_spec, $1)='${wl}-rpath ${wl}$libdir'
      _LT_AC_TAGVAR(hardcode_libdir_separator, $1)=:
      _LT_AC_TAGVAR(link_all_deplibs, $1)=yes
      ;;

    netbsd*)
      if echo __ELF__ | $CC -E - | grep __ELF__ >/dev/null; then
	_LT_AC_TAGVAR(archive_cmds, $1)='$LD -Bshareable -o $lib $libobjs $deplibs $linker_flags'  # a.out
      else
	_LT_AC_TAGVAR(archive_cmds, $1)='$LD -shared -o $lib $libobjs $deplibs $linker_flags'      # ELF
      fi
      _LT_AC_TAGVAR(hardcode_libdir_flag_spec, $1)='-R$libdir'
      _LT_AC_TAGVAR(hardcode_direct, $1)=yes
      _LT_AC_TAGVAR(hardcode_shlibpath_var, $1)=no
      ;;

    newsos6)
      _LT_AC_TAGVAR(archive_cmds, $1)='$LD -G -h $soname -o $lib $libobjs $deplibs $linker_flags'
      _LT_AC_TAGVAR(hardcode_direct, $1)=yes
      _LT_AC_TAGVAR(hardcode_libdir_flag_spec, $1)='${wl}-rpath ${wl}$libdir'
      _LT_AC_TAGVAR(hardcode_libdir_separator, $1)=:
      _LT_AC_TAGVAR(hardcode_shlibpath_var, $1)=no
      ;;

    openbsd*)
      _LT_AC_TAGVAR(hardcode_direct, $1)=yes
      _LT_AC_TAGVAR(hardcode_shlibpath_var, $1)=no
      if test -z "`echo __ELF__ | $CC -E - | grep __ELF__`" || test "$host_os-$host_cpu" = "openbsd2.8-powerpc"; then
	_LT_AC_TAGVAR(archive_cmds, $1)='$CC -shared $pic_flag -o $lib $libobjs $deplibs $compiler_flags'
	_LT_AC_TAGVAR(archive_expsym_cmds, $1)='$CC -shared $pic_flag -o $lib $libobjs $deplibs $compiler_flags ${wl}-retain-symbols-file,$export_symbols'
	_LT_AC_TAGVAR(hardcode_libdir_flag_spec, $1)='${wl}-rpath,$libdir'
	_LT_AC_TAGVAR(export_dynamic_flag_spec, $1)='${wl}-E'
      else
       case $host_os in
	 openbsd[[01]].* | openbsd2.[[0-7]] | openbsd2.[[0-7]].*)
	   _LT_AC_TAGVAR(archive_cmds, $1)='$LD -Bshareable -o $lib $libobjs $deplibs $linker_flags'
	   _LT_AC_TAGVAR(hardcode_libdir_flag_spec, $1)='-R$libdir'
	   ;;
	 *)
	   _LT_AC_TAGVAR(archive_cmds, $1)='$CC -shared $pic_flag -o $lib $libobjs $deplibs $compiler_flags'
	   _LT_AC_TAGVAR(hardcode_libdir_flag_spec, $1)='${wl}-rpath,$libdir'
	   ;;
       esac
      fi
      ;;

    os2*)
      _LT_AC_TAGVAR(hardcode_libdir_flag_spec, $1)='-L$libdir'
      _LT_AC_TAGVAR(hardcode_minus_L, $1)=yes
      _LT_AC_TAGVAR(allow_undefined_flag, $1)=unsupported
      _LT_AC_TAGVAR(archive_cmds, $1)='$echo "LIBRARY $libname INITINSTANCE" > $output_objdir/$libname.def~$echo "DESCRIPTION \"$libname\"" >> $output_objdir/$libname.def~$echo DATA >> $output_objdir/$libname.def~$echo " SINGLE NONSHARED" >> $output_objdir/$libname.def~$echo EXPORTS >> $output_objdir/$libname.def~emxexp $libobjs >> $output_objdir/$libname.def~$CC -Zdll -Zcrtdll -o $lib $libobjs $deplibs $compiler_flags $output_objdir/$libname.def'
      _LT_AC_TAGVAR(old_archive_From_new_cmds, $1)='emximp -o $output_objdir/$libname.a $output_objdir/$libname.def'
      ;;

    osf3*)
      if test "$GCC" = yes; then
	_LT_AC_TAGVAR(allow_undefined_flag, $1)=' ${wl}-expect_unresolved ${wl}\*'
	_LT_AC_TAGVAR(archive_cmds, $1)='$CC -shared${allow_undefined_flag} $libobjs $deplibs $compiler_flags ${wl}-soname ${wl}$soname `test -n "$verstring" && echo ${wl}-set_version ${wl}$verstring` ${wl}-update_registry ${wl}${output_objdir}/so_locations -o $lib'
      else
	_LT_AC_TAGVAR(allow_undefined_flag, $1)=' -expect_unresolved \*'
	_LT_AC_TAGVAR(archive_cmds, $1)='$LD -shared${allow_undefined_flag} $libobjs $deplibs $linker_flags -soname $soname `test -n "$verstring" && echo -set_version $verstring` -update_registry ${output_objdir}/so_locations -o $lib'
      fi
      _LT_AC_TAGVAR(hardcode_libdir_flag_spec, $1)='${wl}-rpath ${wl}$libdir'
      _LT_AC_TAGVAR(hardcode_libdir_separator, $1)=:
      ;;

    osf4* | osf5*)	# as osf3* with the addition of -msym flag
      if test "$GCC" = yes; then
	_LT_AC_TAGVAR(allow_undefined_flag, $1)=' ${wl}-expect_unresolved ${wl}\*'
	_LT_AC_TAGVAR(archive_cmds, $1)='$CC -shared${allow_undefined_flag} $libobjs $deplibs $compiler_flags ${wl}-msym ${wl}-soname ${wl}$soname `test -n "$verstring" && echo ${wl}-set_version ${wl}$verstring` ${wl}-update_registry ${wl}${output_objdir}/so_locations -o $lib'
	_LT_AC_TAGVAR(hardcode_libdir_flag_spec, $1)='${wl}-rpath ${wl}$libdir'
      else
	_LT_AC_TAGVAR(allow_undefined_flag, $1)=' -expect_unresolved \*'
	_LT_AC_TAGVAR(archive_cmds, $1)='$LD -shared${allow_undefined_flag} $libobjs $deplibs $linker_flags -msym -soname $soname `test -n "$verstring" && echo -set_version $verstring` -update_registry ${output_objdir}/so_locations -o $lib'
	_LT_AC_TAGVAR(archive_expsym_cmds, $1)='for i in `cat $export_symbols`; do printf "%s %s\\n" -exported_symbol "\$i" >> $lib.exp; done; echo "-hidden">> $lib.exp~
	$LD -shared${allow_undefined_flag} -input $lib.exp $linker_flags $libobjs $deplibs -soname $soname `test -n "$verstring" && echo -set_version $verstring` -update_registry ${output_objdir}/so_locations -o $lib~$rm $lib.exp'

	# Both the C and C++ compilers support -rpath directly
	_LT_AC_TAGVAR(hardcode_libdir_flag_spec, $1)='-rpath $libdir'
      fi
      _LT_AC_TAGVAR(hardcode_libdir_separator, $1)=:
      ;;

    solaris*)
      _LT_AC_TAGVAR(no_undefined_flag, $1)=' -z text'
      if test "$GCC" = yes; then
	wlarc='${wl}'
	_LT_AC_TAGVAR(archive_cmds, $1)='$CC -shared ${wl}-h ${wl}$soname -o $lib $libobjs $deplibs $compiler_flags'
	_LT_AC_TAGVAR(archive_expsym_cmds, $1)='$echo "{ global:" > $lib.exp~cat $export_symbols | $SED -e "s/\(.*\)/\1;/" >> $lib.exp~$echo "local: *; };" >> $lib.exp~
	  $CC -shared ${wl}-M ${wl}$lib.exp ${wl}-h ${wl}$soname -o $lib $libobjs $deplibs $compiler_flags~$rm $lib.exp'
      else
	wlarc=''
	_LT_AC_TAGVAR(archive_cmds, $1)='$LD -G${allow_undefined_flag} -h $soname -o $lib $libobjs $deplibs $linker_flags'
	_LT_AC_TAGVAR(archive_expsym_cmds, $1)='$echo "{ global:" > $lib.exp~cat $export_symbols | $SED -e "s/\(.*\)/\1;/" >> $lib.exp~$echo "local: *; };" >> $lib.exp~
  	$LD -G${allow_undefined_flag} -M $lib.exp -h $soname -o $lib $libobjs $deplibs $linker_flags~$rm $lib.exp'
      fi
      _LT_AC_TAGVAR(hardcode_libdir_flag_spec, $1)='-R$libdir'
      _LT_AC_TAGVAR(hardcode_shlibpath_var, $1)=no
      case $host_os in
      solaris2.[[0-5]] | solaris2.[[0-5]].*) ;;
      *)
 	# The compiler driver will combine linker options so we
 	# cannot just pass the convenience library names through
 	# without $wl, iff we do not link with $LD.
 	# Luckily, gcc supports the same syntax we need for Sun Studio.
 	# Supported since Solaris 2.6 (maybe 2.5.1?)
 	case $wlarc in
 	'')
 	  _LT_AC_TAGVAR(whole_archive_flag_spec, $1)='-z allextract$convenience -z defaultextract' ;;
 	*)
 	  _LT_AC_TAGVAR(whole_archive_flag_spec, $1)='${wl}-z ${wl}allextract`for conv in $convenience\"\"; do test -n \"$conv\" && new_convenience=\"$new_convenience,$conv\"; done; $echo \"$new_convenience\"` ${wl}-z ${wl}defaultextract' ;;
 	esac ;;
      esac
      _LT_AC_TAGVAR(link_all_deplibs, $1)=yes
      ;;

    sunos4*)
      if test "x$host_vendor" = xsequent; then
	# Use $CC to link under sequent, because it throws in some extra .o
	# files that make .init and .fini sections work.
	_LT_AC_TAGVAR(archive_cmds, $1)='$CC -G ${wl}-h $soname -o $lib $libobjs $deplibs $compiler_flags'
      else
	_LT_AC_TAGVAR(archive_cmds, $1)='$LD -assert pure-text -Bstatic -o $lib $libobjs $deplibs $linker_flags'
      fi
      _LT_AC_TAGVAR(hardcode_libdir_flag_spec, $1)='-L$libdir'
      _LT_AC_TAGVAR(hardcode_direct, $1)=yes
      _LT_AC_TAGVAR(hardcode_minus_L, $1)=yes
      _LT_AC_TAGVAR(hardcode_shlibpath_var, $1)=no
      ;;

    sysv4)
      case $host_vendor in
	sni)
	  _LT_AC_TAGVAR(archive_cmds, $1)='$LD -G -h $soname -o $lib $libobjs $deplibs $linker_flags'
	  _LT_AC_TAGVAR(hardcode_direct, $1)=yes # is this really true???
	;;
	siemens)
	  ## LD is ld it makes a PLAMLIB
	  ## CC just makes a GrossModule.
	  _LT_AC_TAGVAR(archive_cmds, $1)='$LD -G -o $lib $libobjs $deplibs $linker_flags'
	  _LT_AC_TAGVAR(reload_cmds, $1)='$CC -r -o $output$reload_objs'
	  _LT_AC_TAGVAR(hardcode_direct, $1)=no
        ;;
	motorola)
	  _LT_AC_TAGVAR(archive_cmds, $1)='$LD -G -h $soname -o $lib $libobjs $deplibs $linker_flags'
	  _LT_AC_TAGVAR(hardcode_direct, $1)=no #Motorola manual says yes, but my tests say they lie
	;;
      esac
      runpath_var='LD_RUN_PATH'
      _LT_AC_TAGVAR(hardcode_shlibpath_var, $1)=no
      ;;

    sysv4.3*)
      _LT_AC_TAGVAR(archive_cmds, $1)='$LD -G -h $soname -o $lib $libobjs $deplibs $linker_flags'
      _LT_AC_TAGVAR(hardcode_shlibpath_var, $1)=no
      _LT_AC_TAGVAR(export_dynamic_flag_spec, $1)='-Bexport'
      ;;

    sysv4*MP*)
      if test -d /usr/nec; then
	_LT_AC_TAGVAR(archive_cmds, $1)='$LD -G -h $soname -o $lib $libobjs $deplibs $linker_flags'
	_LT_AC_TAGVAR(hardcode_shlibpath_var, $1)=no
	runpath_var=LD_RUN_PATH
	hardcode_runpath_var=yes
	_LT_AC_TAGVAR(ld_shlibs, $1)=yes
      fi
      ;;

    sysv4*uw2* | sysv5OpenUNIX* | sysv5UnixWare7.[[01]].[[10]]* | unixware7*)
      _LT_AC_TAGVAR(no_undefined_flag, $1)='${wl}-z,text'
      _LT_AC_TAGVAR(archive_cmds_need_lc, $1)=no
      _LT_AC_TAGVAR(hardcode_shlibpath_var, $1)=no
      runpath_var='LD_RUN_PATH'

      if test "$GCC" = yes; then
	_LT_AC_TAGVAR(archive_cmds, $1)='$CC -shared ${wl}-h,$soname -o $lib $libobjs $deplibs $compiler_flags'
	_LT_AC_TAGVAR(archive_expsym_cmds, $1)='$CC -shared ${wl}-Bexport:$export_symbols ${wl}-h,$soname -o $lib $libobjs $deplibs $compiler_flags'
      else
	_LT_AC_TAGVAR(archive_cmds, $1)='$CC -G ${wl}-h,$soname -o $lib $libobjs $deplibs $compiler_flags'
	_LT_AC_TAGVAR(archive_expsym_cmds, $1)='$CC -G ${wl}-Bexport:$export_symbols ${wl}-h,$soname -o $lib $libobjs $deplibs $compiler_flags'
      fi
      ;;

    sysv5* | sco3.2v5* | sco5v6*)
      # Note: We can NOT use -z defs as we might desire, because we do not
      # link with -lc, and that would cause any symbols used from libc to
      # always be unresolved, which means just about no library would
      # ever link correctly.  If we're not using GNU ld we use -z text
      # though, which does catch some bad symbols but isn't as heavy-handed
      # as -z defs.
      _LT_AC_TAGVAR(no_undefined_flag, $1)='${wl}-z,text'
      _LT_AC_TAGVAR(allow_undefined_flag, $1)='${wl}-z,nodefs'
      _LT_AC_TAGVAR(archive_cmds_need_lc, $1)=no
      _LT_AC_TAGVAR(hardcode_shlibpath_var, $1)=no
      _LT_AC_TAGVAR(hardcode_libdir_flag_spec, $1)='`test -z "$SCOABSPATH" && echo ${wl}-R,$libdir`'
      _LT_AC_TAGVAR(hardcode_libdir_separator, $1)=':'
      _LT_AC_TAGVAR(link_all_deplibs, $1)=yes
      _LT_AC_TAGVAR(export_dynamic_flag_spec, $1)='${wl}-Bexport'
      runpath_var='LD_RUN_PATH'

      if test "$GCC" = yes; then
	_LT_AC_TAGVAR(archive_cmds, $1)='$CC -shared ${wl}-h,\${SCOABSPATH:+${install_libdir}/}$soname -o $lib $libobjs $deplibs $compiler_flags'
	_LT_AC_TAGVAR(archive_expsym_cmds, $1)='$CC -shared ${wl}-Bexport:$export_symbols ${wl}-h,\${SCOABSPATH:+${install_libdir}/}$soname -o $lib $libobjs $deplibs $compiler_flags'
      else
	_LT_AC_TAGVAR(archive_cmds, $1)='$CC -G ${wl}-h,\${SCOABSPATH:+${install_libdir}/}$soname -o $lib $libobjs $deplibs $compiler_flags'
	_LT_AC_TAGVAR(archive_expsym_cmds, $1)='$CC -G ${wl}-Bexport:$export_symbols ${wl}-h,\${SCOABSPATH:+${install_libdir}/}$soname -o $lib $libobjs $deplibs $compiler_flags'
      fi
      ;;

    uts4*)
      _LT_AC_TAGVAR(archive_cmds, $1)='$LD -G -h $soname -o $lib $libobjs $deplibs $linker_flags'
      _LT_AC_TAGVAR(hardcode_libdir_flag_spec, $1)='-L$libdir'
      _LT_AC_TAGVAR(hardcode_shlibpath_var, $1)=no
      ;;

    *)
      _LT_AC_TAGVAR(ld_shlibs, $1)=no
      ;;
    esac
  fi
])
AC_MSG_RESULT([$_LT_AC_TAGVAR(ld_shlibs, $1)])
test "$_LT_AC_TAGVAR(ld_shlibs, $1)" = no && can_build_shared=no

#
# Do we need to explicitly link libc?
#
case "x$_LT_AC_TAGVAR(archive_cmds_need_lc, $1)" in
x|xyes)
  # Assume -lc should be added
  _LT_AC_TAGVAR(archive_cmds_need_lc, $1)=yes

  if test "$enable_shared" = yes && test "$GCC" = yes; then
    case $_LT_AC_TAGVAR(archive_cmds, $1) in
    *'~'*)
      # FIXME: we may have to deal with multi-command sequences.
      ;;
    '$CC '*)
      # Test whether the compiler implicitly links with -lc since on some
      # systems, -lgcc has to come before -lc. If gcc already passes -lc
      # to ld, don't add -lc before -lgcc.
      AC_MSG_CHECKING([whether -lc should be explicitly linked in])
      $rm conftest*
      printf "$lt_simple_compile_test_code" > conftest.$ac_ext

      if AC_TRY_EVAL(ac_compile) 2>conftest.err; then
        soname=conftest
        lib=conftest
        libobjs=conftest.$ac_objext
        deplibs=
        wl=$_LT_AC_TAGVAR(lt_prog_compiler_wl, $1)
	pic_flag=$_LT_AC_TAGVAR(lt_prog_compiler_pic, $1)
        compiler_flags=-v
        linker_flags=-v
        verstring=
        output_objdir=.
        libname=conftest
        lt_save_allow_undefined_flag=$_LT_AC_TAGVAR(allow_undefined_flag, $1)
        _LT_AC_TAGVAR(allow_undefined_flag, $1)=
        if AC_TRY_EVAL(_LT_AC_TAGVAR(archive_cmds, $1) 2\>\&1 \| grep \" -lc \" \>/dev/null 2\>\&1)
        then
	  _LT_AC_TAGVAR(archive_cmds_need_lc, $1)=no
        else
	  _LT_AC_TAGVAR(archive_cmds_need_lc, $1)=yes
        fi
        _LT_AC_TAGVAR(allow_undefined_flag, $1)=$lt_save_allow_undefined_flag
      else
        cat conftest.err 1>&5
      fi
      $rm conftest*
      AC_MSG_RESULT([$_LT_AC_TAGVAR(archive_cmds_need_lc, $1)])
      ;;
    esac
  fi
  ;;
esac
])# AC_LIBTOOL_PROG_LD_SHLIBS


# _LT_AC_FILE_LTDLL_C
# -------------------
# Be careful that the start marker always follows a newline.
AC_DEFUN([_LT_AC_FILE_LTDLL_C], [
# /* ltdll.c starts here */
# #define WIN32_LEAN_AND_MEAN
# #include <windows.h>
# #undef WIN32_LEAN_AND_MEAN
# #include <stdio.h>
#
# #ifndef __CYGWIN__
# #  ifdef __CYGWIN32__
# #    define __CYGWIN__ __CYGWIN32__
# #  endif
# #endif
#
# #ifdef __cplusplus
# extern "C" {
# #endif
# BOOL APIENTRY DllMain (HINSTANCE hInst, DWORD reason, LPVOID reserved);
# #ifdef __cplusplus
# }
# #endif
#
# #ifdef __CYGWIN__
# #include <cygwin/cygwin_dll.h>
# DECLARE_CYGWIN_DLL( DllMain );
# #endif
# HINSTANCE __hDllInstance_base;
#
# BOOL APIENTRY
# DllMain (HINSTANCE hInst, DWORD reason, LPVOID reserved)
# {
#   __hDllInstance_base = hInst;
#   return TRUE;
# }
# /* ltdll.c ends here */
])# _LT_AC_FILE_LTDLL_C


# _LT_AC_TAGVAR(VARNAME, [TAGNAME])
# ---------------------------------
AC_DEFUN([_LT_AC_TAGVAR], [ifelse([$2], [], [$1], [$1_$2])])
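#
# Illustration only (not part of the original macro documentation): with the
# definition above, a non-empty TAGNAME is appended to VARNAME after an
# underscore, and an empty TAGNAME leaves VARNAME unchanged.  For example:
#   _LT_AC_TAGVAR(archive_cmds, CXX)   expands to   archive_cmds_CXX
#   _LT_AC_TAGVAR(archive_cmds, )      expands to   archive_cmds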


# old names
AC_DEFUN([AM_PROG_LIBTOOL],   [AC_PROG_LIBTOOL])
AC_DEFUN([AM_ENABLE_SHARED],  [AC_ENABLE_SHARED($@)])
AC_DEFUN([AM_ENABLE_STATIC],  [AC_ENABLE_STATIC($@)])
AC_DEFUN([AM_DISABLE_SHARED], [AC_DISABLE_SHARED($@)])
AC_DEFUN([AM_DISABLE_STATIC], [AC_DISABLE_STATIC($@)])
AC_DEFUN([AM_PROG_LD],        [AC_PROG_LD])
AC_DEFUN([AM_PROG_NM],        [AC_PROG_NM])

# This is just to silence aclocal about the macro not being used
ifelse([AC_DISABLE_FAST_INSTALL])

AC_DEFUN([LT_AC_PROG_GCJ],
[AC_CHECK_TOOL(GCJ, gcj, no)
  test "x${GCJFLAGS+set}" = xset || GCJFLAGS="-g -O2"
  AC_SUBST(GCJFLAGS)
])

AC_DEFUN([LT_AC_PROG_RC],
[AC_CHECK_TOOL(RC, windres, no)
])

# NOTE: This macro has been submitted for inclusion into   #
#  GNU Autoconf as AC_PROG_SED.  When it is available in   #
#  a released version of Autoconf we should remove this    #
#  macro and use it instead.                               #
# LT_AC_PROG_SED
# --------------
# Check for a fully-functional sed program, that truncates
# as few characters as possible.  Prefer GNU sed if found.
AC_DEFUN([LT_AC_PROG_SED],
[AC_MSG_CHECKING([for a sed that does not truncate output])
AC_CACHE_VAL(lt_cv_path_SED,
[# Loop through the user's path and test for sed and gsed.
# Then use that list of seds as the candidates to test for truncation.
as_save_IFS=$IFS; IFS=$PATH_SEPARATOR
for as_dir in $PATH
do
  IFS=$as_save_IFS
  test -z "$as_dir" && as_dir=.
  for lt_ac_prog in sed gsed; do
    for ac_exec_ext in '' $ac_executable_extensions; do
      if $as_executable_p "$as_dir/$lt_ac_prog$ac_exec_ext"; then
        lt_ac_sed_list="$lt_ac_sed_list $as_dir/$lt_ac_prog$ac_exec_ext"
      fi
    done
  done
done
lt_ac_max=0
lt_ac_count=0
# Add /usr/xpg4/bin/sed as it is typically found on Solaris
# along with /bin/sed that truncates output.
for lt_ac_sed in $lt_ac_sed_list /usr/xpg4/bin/sed; do
  test ! -f $lt_ac_sed && continue
  cat /dev/null > conftest.in
  lt_ac_count=0
  echo $ECHO_N "0123456789$ECHO_C" >conftest.in
  # Check for GNU sed and select it if it is found.
  if "$lt_ac_sed" --version 2>&1 < /dev/null | grep 'GNU' > /dev/null; then
    lt_cv_path_SED=$lt_ac_sed
    break
  fi
  while true; do
    cat conftest.in conftest.in >conftest.tmp
    mv conftest.tmp conftest.in
    cp conftest.in conftest.nl
    echo >>conftest.nl
    $lt_ac_sed -e 's/a$//' < conftest.nl >conftest.out || break
    cmp -s conftest.out conftest.nl || break
    # 10000 chars as input seems more than enough
    test $lt_ac_count -gt 10 && break
    lt_ac_count=`expr $lt_ac_count + 1`
    if test $lt_ac_count -gt $lt_ac_max; then
      lt_ac_max=$lt_ac_count
      lt_cv_path_SED=$lt_ac_sed
    fi
  done
done
])
SED=$lt_cv_path_SED
AC_MSG_RESULT([$SED])
])
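#
# A rough manual equivalent of one round of the truncation test above,
# given only as an illustration (the file names are the ones the macro
# itself uses; plain `sed' stands in for the candidate being tested):
#   cat conftest.in conftest.in > conftest.tmp && mv conftest.tmp conftest.in
#   cp conftest.in conftest.nl && echo >> conftest.nl
#   sed -e 's/a$//' < conftest.nl > conftest.out
#   cmp -s conftest.out conftest.nl && echo "no truncation at this length"
# Each round doubles the input, so a sed that survives more rounds
# handles longer lines.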

# Copyright (C) 2002, 2003, 2005  Free Software Foundation, Inc.
#
# This file is free software; the Free Software Foundation
# gives unlimited permission to copy and/or distribute it,
# with or without modifications, as long as this notice is preserved.

# AM_AUTOMAKE_VERSION(VERSION)
# ----------------------------
# Automake X.Y traces this macro to ensure aclocal.m4 has been
# generated from the m4 files accompanying Automake X.Y.
AC_DEFUN([AM_AUTOMAKE_VERSION], [am__api_version="1.9"])

# AM_SET_CURRENT_AUTOMAKE_VERSION
# -------------------------------
# Call AM_AUTOMAKE_VERSION so it can be traced.
# This macro is AC_REQUIREd by AM_INIT_AUTOMAKE.
AC_DEFUN([AM_SET_CURRENT_AUTOMAKE_VERSION],
	 [AM_AUTOMAKE_VERSION([1.9.6])])

# AM_AUX_DIR_EXPAND                                         -*- Autoconf -*-

# Copyright (C) 2001, 2003, 2005  Free Software Foundation, Inc.
#
# This file is free software; the Free Software Foundation
# gives unlimited permission to copy and/or distribute it,
# with or without modifications, as long as this notice is preserved.

# For projects using AC_CONFIG_AUX_DIR([foo]), Autoconf sets
# $ac_aux_dir to `$srcdir/foo'.  In other projects, it is set to
# `$srcdir', `$srcdir/..', or `$srcdir/../..'.
#
# Of course, Automake must honor this variable whenever it calls a
# tool from the auxiliary directory.  The problem is that $srcdir (and
# therefore $ac_aux_dir as well) can be either absolute or relative,
# depending on how configure is run.  This is pretty annoying, since
# it makes $ac_aux_dir quite unusable in subdirectories: in the top
# source directory, any form will work fine, but in subdirectories a
# relative path needs to be adjusted first.
#
# $ac_aux_dir/missing
#    fails when called from a subdirectory if $ac_aux_dir is relative
# $top_srcdir/$ac_aux_dir/missing
#    fails if $ac_aux_dir is absolute,
#    fails when called from a subdirectory in a VPATH build with
#          a relative $ac_aux_dir
#
# The reason for the latter failure is that $top_srcdir and $ac_aux_dir
# are both prefixed by $srcdir.  In an in-source build this is usually
# harmless because $srcdir is `.', but things will break when you
# start a VPATH build or use an absolute $srcdir.
#
# So we could use something similar to $top_srcdir/$ac_aux_dir/missing,
# iff we strip the leading $srcdir from $ac_aux_dir.  That would be:
#   am_aux_dir='\$(top_srcdir)/'`expr "$ac_aux_dir" : "$srcdir//*\(.*\)"`
# and then we would define $MISSING as
#   MISSING="\${SHELL} $am_aux_dir/missing"
# This will work as long as MISSING is not called from configure, because
# unfortunately $(top_srcdir) has no meaning in configure.
# However there are other variables, like CC, which are often used in
# configure, and could therefore not use this "fixed" $ac_aux_dir.
#
# Another solution, used here, is to always expand $ac_aux_dir to an
# absolute path.  The drawback is that using absolute paths prevents a
# configured tree from being moved without reconfiguration.

AC_DEFUN([AM_AUX_DIR_EXPAND],
[dnl Rely on autoconf to set up CDPATH properly.
AC_PREREQ([2.50])dnl
# expand $ac_aux_dir to an absolute path
am_aux_dir=`cd $ac_aux_dir && pwd`
])
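#
# Illustration only, under assumed paths: suppose configure is started as
# `../src/configure' from a separate build directory and the (hypothetical)
# auxiliary directory is named `config'.  $ac_aux_dir is then the relative
# path `../src/config', and the command substitution above turns it into
# an absolute one, for instance
#   am_aux_dir=/home/user/src/config
# so that $am_aux_dir/missing still resolves after a `cd' into a
# subdirectory of the build tree.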


# Copyright (C) 1996, 1997, 1999, 2000, 2001, 2002, 2003, 2005
# Free Software Foundation, Inc.
#
# This file is free software; the Free Software Foundation
# gives unlimited permission to copy and/or distribute it,
# with or without modifications, as long as this notice is preserved.

# serial 4

# This was merged into AC_PROG_CC in Autoconf.

AU_DEFUN([AM_PROG_CC_STDC],
[AC_PROG_CC
AC_DIAGNOSE([obsolete], [$0:
	your code should no longer depend upon `am_cv_prog_cc_stdc', but upon
	`ac_cv_prog_cc_stdc'.  Remove this warning and the assignment when
	you adjust the code.  You can also remove the above call to
	AC_PROG_CC if you already called it elsewhere.])
am_cv_prog_cc_stdc=$ac_cv_prog_cc_stdc
])
AU_DEFUN([fp_PROG_CC_STDC])

# AM_CONDITIONAL                                            -*- Autoconf -*-

# Copyright (C) 1997, 2000, 2001, 2003, 2004, 2005
# Free Software Foundation, Inc.
#
# This file is free software; the Free Software Foundation
# gives unlimited permission to copy and/or distribute it,
# with or without modifications, as long as this notice is preserved.

# serial 7

# AM_CONDITIONAL(NAME, SHELL-CONDITION)
# -------------------------------------
# Define a conditional.
AC_DEFUN([AM_CONDITIONAL],
[AC_PREREQ(2.52)dnl
 ifelse([$1], [TRUE],  [AC_FATAL([$0: invalid condition: $1])],
	[$1], [FALSE], [AC_FATAL([$0: invalid condition: $1])])dnl
AC_SUBST([$1_TRUE])
AC_SUBST([$1_FALSE])
if $2; then
  $1_TRUE=
  $1_FALSE='#'
else
  $1_TRUE='#'
  $1_FALSE=
fi
AC_CONFIG_COMMANDS_PRE(
[if test -z "${$1_TRUE}" && test -z "${$1_FALSE}"; then
  AC_MSG_ERROR([[conditional "$1" was never defined.
Usually this means the macro was only invoked conditionally.]])
fi])])
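#
# Usage sketch (the conditional name WITH_EXAMPLES and the shell variable
# $enable_examples are hypothetical, chosen only for illustration).
# In configure.ac, invoke the macro unconditionally, never inside a
# shell `if', so that the check above does not fail:
#   AM_CONDITIONAL([WITH_EXAMPLES], [test "x$enable_examples" = xyes])
# In Makefile.am, guard definitions with the conditional:
#   if WITH_EXAMPLES
#   EXAMPLE_DIR = examples
#   else
#   EXAMPLE_DIR =
#   endif
# After configure runs, exactly one of WITH_EXAMPLES_TRUE and
# WITH_EXAMPLES_FALSE is empty; the other is set to `#'.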


# Copyright (C) 1999, 2000, 2001, 2002, 2003, 2004, 2005
# Free Software Foundation, Inc.
#
# This file is free software; the Free Software Foundation
# gives unlimited permission to copy and/or distribute it,
# with or without modifications, as long as this notice is preserved.

# serial 8

# There are a few dirty hacks below to avoid letting `AC_PROG_CC' be
# written in clear, in which case automake, when reading aclocal.m4,
# will think it sees a *use*, and therefore will trigger all its
# C support machinery.  Also note that it means that autoscan, seeing
# CC etc. in the Makefile, will ask for an AC_PROG_CC use...


# _AM_DEPENDENCIES(NAME)
# ----------------------
# See how the compiler implements dependency checking.
# NAME is "CC", "CXX", "GCJ", or "OBJC".
# We try a few techniques and use that to set a single cache variable.
#
# We don't AC_REQUIRE the corresponding AC_PROG_CC since the latter was
# modified to invoke _AM_DEPENDENCIES(CC); we would have a circular
# dependency, and given that the user is not expected to run this macro,
# just rely on AC_PROG_CC.
AC_DEFUN([_AM_DEPENDENCIES],
[AC_REQUIRE([AM_SET_DEPDIR])dnl
AC_REQUIRE([AM_OUTPUT_DEPENDENCY_COMMANDS])dnl
AC_REQUIRE([AM_MAKE_INCLUDE])dnl
AC_REQUIRE([AM_DEP_TRACK])dnl

ifelse([$1], CC,   [depcc="$CC"   am_compiler_list=],
       [$1], CXX,  [depcc="$CXX"  am_compiler_list=],
       [$1], OBJC, [depcc="$OBJC" am_compiler_list='gcc3 gcc'],
       [$1], GCJ,  [depcc="$GCJ"  am_compiler_list='gcc3 gcc'],
                   [depcc="$$1"   am_compiler_list=])

AC_CACHE_CHECK([dependency style of $depcc],
               [am_cv_$1_dependencies_compiler_type],
[if test -z "$AMDEP_TRUE" && test -f "$am_depcomp"; then
  # We make a subdir and do the tests there.  Otherwise we can end up
  # making bogus files that we don't know about and never remove.  For
  # instance it was reported that on HP-UX the gcc test will end up
  # making a dummy file named `D' -- because `-MD' means `put the output
  # in D'.
  mkdir conftest.dir
  # Copy depcomp to subdir because otherwise we won't find it if we're
  # using a relative directory.
  cp "$am_depcomp" conftest.dir
  cd conftest.dir
  # We will build objects and dependencies in a subdirectory because
  # it helps to detect inapplicable dependency modes.  For instance
  # both Tru64's cc and ICC support -MD to output dependencies as a
  # side effect of compilation, but ICC will put the dependencies in
  # the current directory while Tru64 will put them in the object
  # directory.
  mkdir sub

  am_cv_$1_dependencies_compiler_type=none
  if test "$am_compiler_list" = ""; then
     am_compiler_list=`sed -n ['s/^#*\([a-zA-Z0-9]*\))$/\1/p'] < ./depcomp`
  fi
  for depmode in $am_compiler_list; do
    # Setup a source with many dependencies, because some compilers
    # like to wrap large dependency lists on column 80 (with \), and
    # we should not choose a depcomp mode which is confused by this.
    #
    # We need to recreate these files for each test, as the compiler may
    # overwrite some of them when testing with obscure command lines.
    # This happens at least with the AIX C compiler.
    : > sub/conftest.c
    for i in 1 2 3 4 5 6; do
      echo '#include "conftst'$i'.h"' >> sub/conftest.c
      # Using `: > sub/conftst$i.h' creates only sub/conftst1.h with
      # Solaris 8's {/usr,}/bin/sh.
      touch sub/conftst$i.h
    done
    echo "${am__include} ${am__quote}sub/conftest.Po${am__quote}" > confmf

    case $depmode in
    nosideeffect)
      # after this tag, mechanisms are not by side-effect, so they'll
      # only be used when explicitly requested
      if test "x$enable_dependency_tracking" = xyes; then
	continue
      else
	break
      fi
      ;;
    none) break ;;
    esac
    # We check with `-c' and `-o' for the sake of the "dashmstdout"
    # mode.  It turns out that the SunPro C++ compiler does not properly
    # handle `-M -o', and we need to detect this.
    if depmode=$depmode \
       source=sub/conftest.c object=sub/conftest.${OBJEXT-o} \
       depfile=sub/conftest.Po tmpdepfile=sub/conftest.TPo \
       $SHELL ./depcomp $depcc -c -o sub/conftest.${OBJEXT-o} sub/conftest.c \
         >/dev/null 2>conftest.err &&
       grep sub/conftst6.h sub/conftest.Po > /dev/null 2>&1 &&
       grep sub/conftest.${OBJEXT-o} sub/conftest.Po > /dev/null 2>&1 &&
       ${MAKE-make} -s -f confmf > /dev/null 2>&1; then
      # icc doesn't choke on unknown options, it will just issue warnings
      # or remarks (even with -Werror).  So we grep stderr for any message
      # that says an option was ignored or not supported.
      # When given -MP, icc 7.0 and 7.1 complain thusly:
      #   icc: Command line warning: ignoring option '-M'; no argument required
      # The diagnostic changed in icc 8.0:
      #   icc: Command line remark: option '-MP' not supported
      if (grep 'ignoring option' conftest.err ||
          grep 'not supported' conftest.err) >/dev/null 2>&1; then :; else
        am_cv_$1_dependencies_compiler_type=$depmode
        break
      fi
    fi
  done

  cd ..
  rm -rf conftest.dir
else
  am_cv_$1_dependencies_compiler_type=none
fi
])
AC_SUBST([$1DEPMODE], [depmode=$am_cv_$1_dependencies_compiler_type])
AM_CONDITIONAL([am__fastdep$1], [
  test "x$enable_dependency_tracking" != xno \
  && test "$am_cv_$1_dependencies_compiler_type" = gcc3])
])
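
# Illustration only, not part of Automake: a minimal shell sketch of the
# stderr check described above, where a compiler that merely warns about
# an unknown option must still be treated as rejecting the depmode.  The
# helper name `option_rejected' and the sample file are assumptions made
# for this example.

```shell
# option_rejected ERRFILE: succeed when the captured stderr in ERRFILE
# contains one of the two icc diagnostics quoted in the comments above.
option_rejected() {
  (grep 'ignoring option' "$1" ||
   grep 'not supported' "$1") >/dev/null 2>&1
}

# Sample stderr capture (hypothetical path).
errfile=${TMPDIR-/tmp}/conftest.err.$$
echo "icc: Command line remark: option '-MP' not supported" > "$errfile"
if option_rejected "$errfile"; then
  echo "depmode rejected"
else
  echo "depmode accepted"
fi
rm -f "$errfile"
```

# With the sample diagnostic above the sketch reports the depmode as
# rejected; an empty stderr file would be accepted.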


# AM_SET_DEPDIR
# -------------
# Choose a directory name for dependency files.
# This macro is AC_REQUIREd in _AM_DEPENDENCIES
AC_DEFUN([AM_SET_DEPDIR],
[AC_REQUIRE([AM_SET_LEADING_DOT])dnl
AC_SUBST([DEPDIR], ["${am__leading_dot}deps"])dnl
])


# AM_DEP_TRACK
# ------------
AC_DEFUN([AM_DEP_TRACK],
[AC_ARG_ENABLE(dependency-tracking,
[  --disable-dependency-tracking  speeds up one-time build
  --enable-dependency-tracking   do not reject slow dependency extractors])
if test "x$enable_dependency_tracking" != xno; then
  am_depcomp="$ac_aux_dir/depcomp"
  AMDEPBACKSLASH='\'
fi
AM_CONDITIONAL([AMDEP], [test "x$enable_dependency_tracking" != xno])
AC_SUBST([AMDEPBACKSLASH])
])

# Generate code to set up dependency tracking.              -*- Autoconf -*-

# Copyright (C) 1999, 2000, 2001, 2002, 2003, 2004, 2005
# Free Software Foundation, Inc.
#
# This file is free software; the Free Software Foundation
# gives unlimited permission to copy and/or distribute it,
# with or without modifications, as long as this notice is preserved.

#serial 3

# _AM_OUTPUT_DEPENDENCY_COMMANDS
# ------------------------------
AC_DEFUN([_AM_OUTPUT_DEPENDENCY_COMMANDS],
[for mf in $CONFIG_FILES; do
  # Strip MF so we end up with the name of the file.
  mf=`echo "$mf" | sed -e 's/:.*$//'`
  # Check whether this is an Automake generated Makefile or not.
  # We used to match only the files named `Makefile.in', but
  # some people rename them; so instead we look at the file content.
  # Grep'ing the first line is not enough: some people post-process
  # each Makefile.in and add a new line on top of each file to say so.
  # So let's grep the whole file.
  if grep '^#.*generated by automake' $mf > /dev/null 2>&1; then
    dirpart=`AS_DIRNAME("$mf")`
  else
    continue
  fi
  # Extract the definition of DEPDIR, am__include, and am__quote
  # from the Makefile without running `make'.
  DEPDIR=`sed -n 's/^DEPDIR = //p' < "$mf"`
  test -z "$DEPDIR" && continue
  am__include=`sed -n 's/^am__include = //p' < "$mf"`
  test -z "$am__include" && continue
  am__quote=`sed -n 's/^am__quote = //p' < "$mf"`
  # When using ansi2knr, U may be empty or an underscore; expand it
  U=`sed -n 's/^U = //p' < "$mf"`
  # Find all dependency output files; they are included files with
  # $(DEPDIR) in their names.  We invoke sed twice because it is the
  # simplest approach to changing $(DEPDIR) to its actual value in the
  # expansion.
  for file in `sed -n "
    s/^$am__include $am__quote\(.*(DEPDIR).*\)$am__quote"'$/\1/p' <"$mf" | \
       sed -e 's/\$(DEPDIR)/'"$DEPDIR"'/g' -e 's/\$U/'"$U"'/g'`; do
    # Make sure the directory exists.
    test -f "$dirpart/$file" && continue
    fdir=`AS_DIRNAME(["$file"])`
    AS_MKDIR_P([$dirpart/$fdir])
    # echo "creating $dirpart/$file"
    echo '# dummy' > "$dirpart/$file"
  done
done
])# _AM_OUTPUT_DEPENDENCY_COMMANDS
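
# Illustration only, not part of Automake: a standalone shell sketch of
# the two-pass sed extraction used above.  The helper name
# `list_depfiles' and the sample Makefile contents are assumptions made
# for this example.

```shell
# list_depfiles MAKEFILE: print the dependency files that MAKEFILE
# includes, with $(DEPDIR) replaced by its value.  Hypothetical helper.
list_depfiles() {
  mf=$1
  DEPDIR=`sed -n 's/^DEPDIR = //p' < "$mf"`
  test -z "$DEPDIR" && return 0
  am__include=`sed -n 's/^am__include = //p' < "$mf"`
  test -z "$am__include" && return 0
  am__quote=`sed -n 's/^am__quote = //p' < "$mf"`
  # First pass keeps the include lines that mention $(DEPDIR); the second
  # pass substitutes the real directory name.
  sed -n "s/^$am__include $am__quote\(.*(DEPDIR).*\)$am__quote"'$/\1/p' < "$mf" |
    sed -e 's/\$(DEPDIR)/'"$DEPDIR"'/g'
}

# A tiny sample Makefile (hypothetical contents, GNU-style include).
sample=${TMPDIR-/tmp}/Makefile.sample.$$
cat > "$sample" << 'END'
DEPDIR = .deps
am__include = include
am__quote =
include ./$(DEPDIR)/foo.Po
include ./$(DEPDIR)/bar.Po
END
list_depfiles "$sample"
rm -f "$sample"
```

# For the sample file this prints ./.deps/foo.Po and ./.deps/bar.Po,
# the files the real macro would then create as `# dummy' stubs.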


# AM_OUTPUT_DEPENDENCY_COMMANDS
# -----------------------------
# This macro should only be invoked once -- use via AC_REQUIRE.
#
# This code is only required when automatic dependency tracking
# is enabled.  FIXME.  This creates each `.P' file that we will
# need in order to bootstrap the dependency handling code.
AC_DEFUN([AM_OUTPUT_DEPENDENCY_COMMANDS],
[AC_CONFIG_COMMANDS([depfiles],
     [test x"$AMDEP_TRUE" != x"" || _AM_OUTPUT_DEPENDENCY_COMMANDS],
     [AMDEP_TRUE="$AMDEP_TRUE" ac_aux_dir="$ac_aux_dir"])
])

# Copyright (C) 1996, 1997, 2000, 2001, 2003, 2005
# Free Software Foundation, Inc.
#
# This file is free software; the Free Software Foundation
# gives unlimited permission to copy and/or distribute it,
# with or without modifications, as long as this notice is preserved.

# serial 8

# AM_CONFIG_HEADER is obsolete.  It has been replaced by AC_CONFIG_HEADERS.
AU_DEFUN([AM_CONFIG_HEADER], [AC_CONFIG_HEADERS($@)])

# Do all the work for Automake.                             -*- Autoconf -*-

# Copyright (C) 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005
# Free Software Foundation, Inc.
#
# This file is free software; the Free Software Foundation
# gives unlimited permission to copy and/or distribute it,
# with or without modifications, as long as this notice is preserved.

# serial 12

# This macro actually does too much.  Some checks are only needed if
# your package does certain things.  But this isn't really a big deal.

# AM_INIT_AUTOMAKE(PACKAGE, VERSION, [NO-DEFINE])
# AM_INIT_AUTOMAKE([OPTIONS])
# -----------------------------------------------
# The call with PACKAGE and VERSION arguments is the old style
# call (pre autoconf-2.50), which is being phased out.  PACKAGE
# and VERSION should now be passed to AC_INIT and removed from
# the call to AM_INIT_AUTOMAKE.
# We support both call styles for the transition.  After
# the next Automake release, Autoconf can make the AC_INIT
# arguments mandatory, and then we can depend on a new Autoconf
# release and drop the old call support.
AC_DEFUN([AM_INIT_AUTOMAKE],
[AC_PREREQ([2.58])dnl
dnl Autoconf wants to disallow AM_ names.  We explicitly allow
dnl the ones we care about.
m4_pattern_allow([^AM_[A-Z]+FLAGS$])dnl
AC_REQUIRE([AM_SET_CURRENT_AUTOMAKE_VERSION])dnl
AC_REQUIRE([AC_PROG_INSTALL])dnl
# test to see if srcdir already configured
if test "`cd $srcdir && pwd`" != "`pwd`" &&
   test -f $srcdir/config.status; then
  AC_MSG_ERROR([source directory already configured; run "make distclean" there first])
fi

# test whether we have cygpath
if test -z "$CYGPATH_W"; then
  if (cygpath --version) >/dev/null 2>/dev/null; then
    CYGPATH_W='cygpath -w'
  else
    CYGPATH_W=echo
  fi
fi
AC_SUBST([CYGPATH_W])

# Define the identity of the package.
dnl Distinguish between old-style and new-style calls.
m4_ifval([$2],
[m4_ifval([$3], [_AM_SET_OPTION([no-define])])dnl
 AC_SUBST([PACKAGE], [$1])dnl
 AC_SUBST([VERSION], [$2])],
[_AM_SET_OPTIONS([$1])dnl
 AC_SUBST([PACKAGE], ['AC_PACKAGE_TARNAME'])dnl
 AC_SUBST([VERSION], ['AC_PACKAGE_VERSION'])])dnl

_AM_IF_OPTION([no-define],,
[AC_DEFINE_UNQUOTED(PACKAGE, "$PACKAGE", [Name of package])
 AC_DEFINE_UNQUOTED(VERSION, "$VERSION", [Version number of package])])dnl

# Some tools Automake needs.
AC_REQUIRE([AM_SANITY_CHECK])dnl
AC_REQUIRE([AC_ARG_PROGRAM])dnl
AM_MISSING_PROG(ACLOCAL, aclocal-${am__api_version})
AM_MISSING_PROG(AUTOCONF, autoconf)
AM_MISSING_PROG(AUTOMAKE, automake-${am__api_version})
AM_MISSING_PROG(AUTOHEADER, autoheader)
AM_MISSING_PROG(MAKEINFO, makeinfo)
AM_PROG_INSTALL_SH
AM_PROG_INSTALL_STRIP
AC_REQUIRE([AM_PROG_MKDIR_P])dnl
# We need awk for the "check" target.  The system "awk" is bad on
# some platforms.
AC_REQUIRE([AC_PROG_AWK])dnl
AC_REQUIRE([AC_PROG_MAKE_SET])dnl
AC_REQUIRE([AM_SET_LEADING_DOT])dnl
_AM_IF_OPTION([tar-ustar], [_AM_PROG_TAR([ustar])],
              [_AM_IF_OPTION([tar-pax], [_AM_PROG_TAR([pax])],
	      		     [_AM_PROG_TAR([v7])])])
_AM_IF_OPTION([no-dependencies],,
[AC_PROVIDE_IFELSE([AC_PROG_CC],
                  [_AM_DEPENDENCIES(CC)],
                  [define([AC_PROG_CC],
                          defn([AC_PROG_CC])[_AM_DEPENDENCIES(CC)])])dnl
AC_PROVIDE_IFELSE([AC_PROG_CXX],
                  [_AM_DEPENDENCIES(CXX)],
                  [define([AC_PROG_CXX],
                          defn([AC_PROG_CXX])[_AM_DEPENDENCIES(CXX)])])dnl
])
])


# When config.status generates a header, we must update the stamp-h file.
# This file resides in the same directory as the config header
# that is generated.  The stamp files are numbered to have different names.

# Autoconf calls _AC_AM_CONFIG_HEADER_HOOK (when defined) in the
# loop where config.status creates the headers, so we can generate
# our stamp files there.
AC_DEFUN([_AC_AM_CONFIG_HEADER_HOOK],
[# Compute $1's index in $config_headers.
_am_stamp_count=1
for _am_header in $config_headers :; do
  case $_am_header in
    $1 | $1:* )
      break ;;
    * )
      _am_stamp_count=`expr $_am_stamp_count + 1` ;;
  esac
done
echo "timestamp for $1" >`AS_DIRNAME([$1])`/stamp-h[]$_am_stamp_count])
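
# Illustration only, not part of Automake: the index computation in the
# hook above, written as a plain shell function.  The name `stamp_index'
# and the sample header list are assumptions made for this example.

```shell
# stamp_index HEADER LIST: print the 1-based position of HEADER in the
# space-separated LIST, matching either "HEADER" or "HEADER:inputs".
stamp_index() {
  header=$1
  count=1
  # The trailing `:' keeps the list non-empty, as in the hook itself.
  for h in $2 :; do
    case $h in
      "$header" | "$header":* )
        break ;;
      * )
        count=`expr $count + 1` ;;
    esac
  done
  echo "$count"
}

stamp_index config.h "src/acconfig.h config.h:config.hin"
```

# Here config.h is the second entry, so the sketch prints 2 and the
# matching stamp file would be named stamp-h2.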

# Copyright (C) 2001, 2003, 2005  Free Software Foundation, Inc.
#
# This file is free software; the Free Software Foundation
# gives unlimited permission to copy and/or distribute it,
# with or without modifications, as long as this notice is preserved.

# AM_PROG_INSTALL_SH
# ------------------
# Define $install_sh.
AC_DEFUN([AM_PROG_INSTALL_SH],
[AC_REQUIRE([AM_AUX_DIR_EXPAND])dnl
install_sh=${install_sh-"$am_aux_dir/install-sh"}
AC_SUBST(install_sh)])

# Copyright (C) 2003, 2005  Free Software Foundation, Inc.
#
# This file is free software; the Free Software Foundation
# gives unlimited permission to copy and/or distribute it,
# with or without modifications, as long as this notice is preserved.

# serial 2

# Check whether the underlying file-system supports filenames
# with a leading dot.  For instance MS-DOS doesn't.
AC_DEFUN([AM_SET_LEADING_DOT],
[rm -rf .tst 2>/dev/null
mkdir .tst 2>/dev/null
if test -d .tst; then
  am__leading_dot=.
else
  am__leading_dot=_
fi
rmdir .tst 2>/dev/null
AC_SUBST([am__leading_dot])])

# Add --enable-maintainer-mode option to configure.         -*- Autoconf -*-
# From Jim Meyering

# Copyright (C) 1996, 1998, 2000, 2001, 2002, 2003, 2004, 2005
# Free Software Foundation, Inc.
#
# This file is free software; the Free Software Foundation
# gives unlimited permission to copy and/or distribute it,
# with or without modifications, as long as this notice is preserved.

# serial 4

AC_DEFUN([AM_MAINTAINER_MODE],
[AC_MSG_CHECKING([whether to enable maintainer-specific portions of Makefiles])
  dnl maintainer-mode is disabled by default
  AC_ARG_ENABLE(maintainer-mode,
[  --enable-maintainer-mode  enable make rules and dependencies not useful
			  (and sometimes confusing) to the casual installer],
      USE_MAINTAINER_MODE=$enableval,
      USE_MAINTAINER_MODE=no)
  AC_MSG_RESULT([$USE_MAINTAINER_MODE])
  AM_CONDITIONAL(MAINTAINER_MODE, [test $USE_MAINTAINER_MODE = yes])
  MAINT=$MAINTAINER_MODE_TRUE
  AC_SUBST(MAINT)dnl
]
)

AU_DEFUN([jm_MAINTAINER_MODE], [AM_MAINTAINER_MODE])

# Check to see how 'make' treats includes.	            -*- Autoconf -*-

# Copyright (C) 2001, 2002, 2003, 2005  Free Software Foundation, Inc.
#
# This file is free software; the Free Software Foundation
# gives unlimited permission to copy and/or distribute it,
# with or without modifications, as long as this notice is preserved.

# serial 3

# AM_MAKE_INCLUDE()
# -----------------
# Check to see how make treats includes.
AC_DEFUN([AM_MAKE_INCLUDE],
[am_make=${MAKE-make}
cat > confinc << 'END'
am__doit:
	@echo done
.PHONY: am__doit
END
# If we don't find an include directive, just comment out the code.
AC_MSG_CHECKING([for style of include used by $am_make])
am__include="#"
am__quote=
_am_result=none
# First try GNU make style include.
echo "include confinc" > confmf
# We grep out `Entering directory' and `Leaving directory'
# messages which can occur if `w' ends up in MAKEFLAGS.
# In particular we don't look at `^make:' because GNU make might
# be invoked under some other name (usually "gmake"), in which
# case it prints its new name instead of `make'.
if test "`$am_make -s -f confmf 2> /dev/null | grep -v 'ing directory'`" = "done"; then
   am__include=include
   am__quote=
   _am_result=GNU
fi
# Now try BSD make style include.
if test "$am__include" = "#"; then
   echo '.include "confinc"' > confmf
   if test "`$am_make -s -f confmf 2> /dev/null`" = "done"; then
      am__include=.include
      am__quote="\""
      _am_result=BSD
   fi
fi
AC_SUBST([am__include])
AC_SUBST([am__quote])
AC_MSG_RESULT([$_am_result])
rm -f confinc confmf
])

# Fake the existence of programs that GNU maintainers use.  -*- Autoconf -*-

# Copyright (C) 1997, 1999, 2000, 2001, 2003, 2005
# Free Software Foundation, Inc.
#
# This file is free software; the Free Software Foundation
# gives unlimited permission to copy and/or distribute it,
# with or without modifications, as long as this notice is preserved.

# serial 4

# AM_MISSING_PROG(NAME, PROGRAM)
# ------------------------------
AC_DEFUN([AM_MISSING_PROG],
[AC_REQUIRE([AM_MISSING_HAS_RUN])
$1=${$1-"${am_missing_run}$2"}
AC_SUBST($1)])


# AM_MISSING_HAS_RUN
# ------------------
# Define MISSING if not defined so far and test if it supports --run.
# If it does, set am_missing_run to use it, otherwise, to nothing.
AC_DEFUN([AM_MISSING_HAS_RUN],
[AC_REQUIRE([AM_AUX_DIR_EXPAND])dnl
test x"${MISSING+set}" = xset || MISSING="\${SHELL} $am_aux_dir/missing"
# Use eval to expand $SHELL
if eval "$MISSING --run true"; then
  am_missing_run="$MISSING --run "
else
  am_missing_run=
  AC_MSG_WARN([`missing' script is too old or missing])
fi
])
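
# Illustration only, not part of Automake: how the `${VAR-default}'
# expansion in AM_MISSING_PROG behaves.  A value preset by the user wins;
# otherwise the program is wrapped by `missing --run'.  The sample values
# below are assumptions made for this example.

```shell
am_missing_run='${SHELL} ./missing --run '   # sample value, an assumption
unset ACLOCAL                                # not set by the user
AUTOCONF=/opt/bin/autoconf                   # pretend the user preset this

# Same shape as:  $1=${$1-"${am_missing_run}$2"}
ACLOCAL=${ACLOCAL-"${am_missing_run}aclocal"}
AUTOCONF=${AUTOCONF-"${am_missing_run}autoconf"}

echo "ACLOCAL=$ACLOCAL"
echo "AUTOCONF=$AUTOCONF"
```

# Only the unset variable picks up the wrapper; note that `-' (unlike
# `:-') also leaves a variable alone when it is set but empty.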

# Copyright (C) 2003, 2004, 2005  Free Software Foundation, Inc.
#
# This file is free software; the Free Software Foundation
# gives unlimited permission to copy and/or distribute it,
# with or without modifications, as long as this notice is preserved.

# AM_PROG_MKDIR_P
# ---------------
# Check whether `mkdir -p' is supported, fallback to mkinstalldirs otherwise.
#
# Automake 1.8 used `mkdir -m 0755 -p --' to ensure that directories
# created by `make install' are always world readable, even if the
# installer happens to have an overly restrictive umask (e.g. 077).
# This was a mistake.  There are at least two reasons why we must not
# use `-m 0755':
#   - it causes special bits like SGID to be ignored,
#   - it may be too restrictive (some setups expect 775 directories).
#
# Do not use -m 0755 and let people choose whatever they expect by
# setting umask.
#
# We cannot accept just any implementation of `mkdir' that recognizes `-p'.
# Some implementations (such as Solaris 8's) are not thread-safe: if a
# parallel make tries to run `mkdir -p a/b' and `mkdir -p a/c'
# concurrently, both versions can detect that a/ is missing, but only
# one can create it and the other will error out.  Consequently we
# restrict ourselves to GNU mkdir (using the --version option ensures
# this.)
AC_DEFUN([AM_PROG_MKDIR_P],
[if mkdir -p --version . >/dev/null 2>&1 && test ! -d ./--version; then
  # We used to keep the `.' as first argument, in order to
  # allow $(mkdir_p) to be used without argument.  As in
  #   $(mkdir_p) $(somedir)
  # where $(somedir) is conditionally defined.  However this is wrong
  # for two reasons:
  #  1. if the package is installed by a user who cannot write `.'
  #     make install will fail,
  #  2. the above comment should most certainly read
  #     $(mkdir_p) $(DESTDIR)$(somedir)
  #     so it does not work when $(somedir) is undefined and
  #     $(DESTDIR) is not.
  #  To support the latter case, we have to write
  #     test -z "$(somedir)" || $(mkdir_p) $(DESTDIR)$(somedir),
  #  so the `.' trick is pointless.
  mkdir_p='mkdir -p --'
else
  # On NextStep and OpenStep, the `mkdir' command does not
  # recognize any option.  It will interpret all options as
  # directories to create, and then abort because `.' already
  # exists.
  for d in ./-p ./--version;
  do
    test -d $d && rmdir $d
  done
  # $(mkinstalldirs) is defined by Automake if mkinstalldirs exists.
  if test -f "$ac_aux_dir/mkinstalldirs"; then
    mkdir_p='$(mkinstalldirs)'
  else
    mkdir_p='$(install_sh) -d'
  fi
fi
AC_SUBST([mkdir_p])])
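
# Illustration only, not part of Automake: the guarded form recommended
# in the comment above,
#   test -z "$(somedir)" || $(mkdir_p) $(DESTDIR)$(somedir)
# written as a shell function.  The name `install_dir' and the staging
# paths are assumptions made for this example.

```shell
# install_dir DESTDIR SOMEDIR: create DESTDIR/SOMEDIR, but do nothing at
# all when SOMEDIR is empty (the "conditionally defined" case).
install_dir() {
  DESTDIR=$1
  somedir=$2
  test -z "$somedir" || mkdir -p -- "$DESTDIR$somedir"
}

stage=`mktemp -d` || exit 1
install_dir "$stage" ""                 # somedir undefined: nothing created
install_dir "$stage" /share/doc/demo    # creates $stage/share/doc/demo
test -d "$stage/share/doc/demo" && echo created
rm -rf "$stage"
```

# The guard is what makes the old `.' argument unnecessary: an empty
# directory name never reaches mkdir.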

# Helper functions for option handling.                     -*- Autoconf -*-

# Copyright (C) 2001, 2002, 2003, 2005  Free Software Foundation, Inc.
#
# This file is free software; the Free Software Foundation
# gives unlimited permission to copy and/or distribute it,
# with or without modifications, as long as this notice is preserved.

# serial 3

# _AM_MANGLE_OPTION(NAME)
# -----------------------
AC_DEFUN([_AM_MANGLE_OPTION],
[[_AM_OPTION_]m4_bpatsubst($1, [[^a-zA-Z0-9_]], [_])])

# _AM_SET_OPTION(NAME)
# ------------------------------
# Set option NAME.  Presently that only means defining a flag for this option.
AC_DEFUN([_AM_SET_OPTION],
[m4_define(_AM_MANGLE_OPTION([$1]), 1)])

# _AM_SET_OPTIONS(OPTIONS)
# ----------------------------------
# OPTIONS is a space-separated list of Automake options.
AC_DEFUN([_AM_SET_OPTIONS],
[AC_FOREACH([_AM_Option], [$1], [_AM_SET_OPTION(_AM_Option)])])

# _AM_IF_OPTION(OPTION, IF-SET, [IF-NOT-SET])
# -------------------------------------------
# Execute IF-SET if OPTION is set, IF-NOT-SET otherwise.
AC_DEFUN([_AM_IF_OPTION],
[m4_ifset(_AM_MANGLE_OPTION([$1]), [$2], [$3])])
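
# Illustration only, not part of Automake: the name mangling done by
# _AM_MANGLE_OPTION, imitated with sed so the result can be seen from a
# shell prompt.  The function name `mangle_option' is an assumption made
# for this example; the real work happens in m4 at autoconf time.

```shell
# mangle_option NAME: print the m4 flag name that would record option
# NAME, replacing every character outside [a-zA-Z0-9_] with `_'.
mangle_option() {
  mangled=`printf '%s\n' "$1" | sed 's/[^a-zA-Z0-9_]/_/g'`
  echo "_AM_OPTION_$mangled"
}

mangle_option tar-ustar
mangle_option no-define
```

# This prints _AM_OPTION_tar_ustar and _AM_OPTION_no_define, the flags
# that _AM_IF_OPTION later tests with m4_ifset.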

# Check to make sure that the build environment is sane.    -*- Autoconf -*-

# Copyright (C) 1996, 1997, 2000, 2001, 2003, 2005
# Free Software Foundation, Inc.
#
# This file is free software; the Free Software Foundation
# gives unlimited permission to copy and/or distribute it,
# with or without modifications, as long as this notice is preserved.

# serial 4

# AM_SANITY_CHECK
# ---------------
AC_DEFUN([AM_SANITY_CHECK],
[AC_MSG_CHECKING([whether build environment is sane])
# Just in case
sleep 1
echo timestamp > conftest.file
# Do `set' in a subshell so we don't clobber the current shell's
# arguments.  Must try -L first in case configure is actually a
# symlink; some systems play weird games with the mod time of symlinks
# (e.g. FreeBSD returns the mod time of the symlink's containing
# directory).
if (
   set X `ls -Lt $srcdir/configure conftest.file 2> /dev/null`
   if test "$[*]" = "X"; then
      # -L didn't work.
      set X `ls -t $srcdir/configure conftest.file`
   fi
   rm -f conftest.file
   if test "$[*]" != "X $srcdir/configure conftest.file" \
      && test "$[*]" != "X conftest.file $srcdir/configure"; then

      # If neither matched, then we have a broken ls.  This can happen
      # if, for instance, CONFIG_SHELL is bash and it inherits a
      # broken ls alias from the environment.  This has actually
      # happened.  Such a system could not be considered "sane".
      AC_MSG_ERROR([ls -t appears to fail.  Make sure there is not a broken
alias in your environment])
   fi

   test "$[2]" = conftest.file
   )
then
   # Ok.
   :
else
   AC_MSG_ERROR([newly created file is older than distributed files!
Check your system clock])
fi
AC_MSG_RESULT(yes)])
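
# Illustration only, not part of Automake: the `ls -t' comparison used by
# the sanity check, reduced to a helper that reports which of two files
# is listed first (newest).  The name `newest_of' and the scratch files
# are assumptions made for this example; file names must not contain
# whitespace.

```shell
# newest_of A B: print whichever of the two files `ls -t' lists first.
# The body runs in a subshell so `set' does not clobber the caller's
# arguments.
newest_of() (
  a=$1 b=$2
  set X `ls -Lt "$a" "$b" 2> /dev/null`
  if test "$*" = "X"; then
    # -L didn't work.
    set X `ls -t "$a" "$b"`
  fi
  echo "$2"
)

dir=`mktemp -d` || exit 1
touch -t 200001010000 "$dir/configure"   # pretend this shipped long ago
echo timestamp > "$dir/conftest.file"
newest=`newest_of "$dir/configure" "$dir/conftest.file"`
if test "$newest" = "$dir/conftest.file"; then
  echo sane
else
  echo "check your system clock"
fi
rm -rf "$dir"
```

# With a correctly set clock the freshly written file sorts first and
# the sketch prints "sane".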

# Copyright (C) 2001, 2003, 2005  Free Software Foundation, Inc.
#
# This file is free software; the Free Software Foundation
# gives unlimited permission to copy and/or distribute it,
# with or without modifications, as long as this notice is preserved.

# AM_PROG_INSTALL_STRIP
# ---------------------
# One issue with vendor `install' (even GNU) is that you can't
# specify the program used to strip binaries.  This is especially
# annoying in cross-compiling environments, where the build's strip
# is unlikely to handle the host's binaries.
# Fortunately install-sh will honor a STRIPPROG variable, so we
# always use install-sh in `make install-strip', and initialize
# STRIPPROG with the value of the STRIP variable (set by the user).
AC_DEFUN([AM_PROG_INSTALL_STRIP],
[AC_REQUIRE([AM_PROG_INSTALL_SH])dnl
# Installed binaries are usually stripped using `strip' when the user
# runs `make install-strip'.  However `strip' might not be the right
# tool to use in cross-compilation environments, therefore Automake
# will honor the `STRIP' environment variable to overrule this program.
dnl Don't test for $cross_compiling = yes, because it might be `maybe'.
if test "$cross_compiling" != no; then
  AC_CHECK_TOOL([STRIP], [strip], :)
fi
INSTALL_STRIP_PROGRAM="\${SHELL} \$(install_sh) -c -s"
AC_SUBST([INSTALL_STRIP_PROGRAM])])

# Check how to create a tarball.                            -*- Autoconf -*-

# Copyright (C) 2004, 2005  Free Software Foundation, Inc.
#
# This file is free software; the Free Software Foundation
# gives unlimited permission to copy and/or distribute it,
# with or without modifications, as long as this notice is preserved.

# serial 2

# _AM_PROG_TAR(FORMAT)
# --------------------
# Check how to create a tarball in format FORMAT.
# FORMAT should be one of `v7', `ustar', or `pax'.
#
# Substitute a variable $(am__tar) that is a command
# writing to stdout a FORMAT-tarball containing the directory
# $tardir.
#     tardir=directory && $(am__tar) > result.tar
#
# Substitute a variable $(am__untar) that extracts such
# a tarball read from stdin.
#     $(am__untar) < result.tar
AC_DEFUN([_AM_PROG_TAR],
[# Always define AMTAR for backward compatibility.
AM_MISSING_PROG([AMTAR], [tar])
m4_if([$1], [v7],
     [am__tar='${AMTAR} chof - "$$tardir"'; am__untar='${AMTAR} xf -'],
     [m4_case([$1], [ustar],, [pax],,
              [m4_fatal([Unknown tar format])])
AC_MSG_CHECKING([how to create a $1 tar archive])
# Loop over all known methods to create a tar archive until one works.
_am_tools='gnutar m4_if([$1], [ustar], [plaintar]) pax cpio none'
_am_tools=${am_cv_prog_tar_$1-$_am_tools}
# Do not fold the above two lines into one, because Tru64 sh and
# Solaris sh will not grok spaces in the rhs of `-'.
for _am_tool in $_am_tools
do
  case $_am_tool in
  gnutar)
    for _am_tar in tar gnutar gtar;
    do
      AM_RUN_LOG([$_am_tar --version]) && break
    done
    am__tar="$_am_tar --format=m4_if([$1], [pax], [posix], [$1]) -chf - "'"$$tardir"'
    am__tar_="$_am_tar --format=m4_if([$1], [pax], [posix], [$1]) -chf - "'"$tardir"'
    am__untar="$_am_tar -xf -"
    ;;
  plaintar)
    # Must skip GNU tar: if it does not support --format= it doesn't create
    # a ustar tarball either.
    (tar --version) >/dev/null 2>&1 && continue
    am__tar='tar chf - "$$tardir"'
    am__tar_='tar chf - "$tardir"'
    am__untar='tar xf -'
    ;;
  pax)
    am__tar='pax -L -x $1 -w "$$tardir"'
    am__tar_='pax -L -x $1 -w "$tardir"'
    am__untar='pax -r'
    ;;
  cpio)
    am__tar='find "$$tardir" -print | cpio -o -H $1 -L'
    am__tar_='find "$tardir" -print | cpio -o -H $1 -L'
    am__untar='cpio -i -H $1 -d'
    ;;
  none)
    am__tar=false
    am__tar_=false
    am__untar=false
    ;;
  esac

  # If the value was cached, stop now.  We just wanted to have am__tar
  # and am__untar set.
  test -n "${am_cv_prog_tar_$1}" && break

  # tar/untar a dummy directory, and stop if the command works
  rm -rf conftest.dir
  mkdir conftest.dir
  echo GrepMe > conftest.dir/file
  AM_RUN_LOG([tardir=conftest.dir && eval $am__tar_ >conftest.tar])
  rm -rf conftest.dir
  if test -s conftest.tar; then
    AM_RUN_LOG([$am__untar <conftest.tar])
    grep GrepMe conftest.dir/file >/dev/null 2>&1 && break
  fi
done
rm -rf conftest.dir

AC_CACHE_VAL([am_cv_prog_tar_$1], [am_cv_prog_tar_$1=$_am_tool])
AC_MSG_RESULT([$am_cv_prog_tar_$1])])
AC_SUBST([am__tar])
AC_SUBST([am__untar])
]) # _AM_PROG_TAR
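
# Illustration only, not part of Automake: the round trip that the probe
# above performs, using the v7 commands with a plain `tar' standing in
# for ${AMTAR}.  The function name `tar_roundtrip' is an assumption made
# for this example, and it assumes a `tar' on PATH that accepts `chof'.

```shell
# Shell-level equivalents of the v7 definitions (one `$' instead of the
# Makefile's `$$').
am__tar='tar chof - "$tardir"'
am__untar='tar xf -'

# tar_roundtrip: archive a dummy directory, delete it, unpack the
# archive, and check that the marker file came back.
tar_roundtrip() (
  work=`mktemp -d` && cd "$work" || exit 1
  mkdir conftest.dir
  echo GrepMe > conftest.dir/file
  tardir=conftest.dir && eval $am__tar > conftest.tar
  rm -rf conftest.dir
  test -s conftest.tar && $am__untar < conftest.tar
  grep GrepMe conftest.dir/file
  status=$?
  cd / && rm -rf "$work"
  exit $status
)

tar_roundtrip && echo "v7 round trip works"
```

# In a Makefile the same pair is used as
#     tardir=directory && $(am__tar) > result.tar
#     $(am__untar) < result.tar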

m4_include([config/acinclude.m4])
swish-e-2.4.7/config/0000777000077100017500000000000011166013167011361 500000000000000swish-e-2.4.7/config/missing0000775000077100017500000002123111166010112012660 00000000000000
#! /bin/sh
# Common stub for a few missing GNU programs while installing.
# Copyright 1996, 1997, 1999, 2000 Free Software Foundation, Inc.
# Originally by Fran,cois Pinard <pinard@iro.umontreal.ca>, 1996.

# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2, or (at your option)
# any later version.

# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.

# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA
# 02111-1307, USA.

# As a special exception to the GNU General Public License, if you
# distribute this file as part of a program that contains a
# configuration script generated by Autoconf, you may include it under
# the same distribution terms that you use for the rest of that program.

if test $# -eq 0; then
  echo 1>&2 "Try \`$0 --help' for more information"
  exit 1
fi

run=:

# In the cases where this matters, `missing' is being run in the
# srcdir already.
if test -f configure.ac; then
  configure_ac=configure.ac
else
  configure_ac=configure.in
fi

case "$1" in
--run)
  # Try to run requested program, and just exit if it succeeds.
  run=
  shift
  "$@" && exit 0
  ;;
esac

# If it does not exist, or fails to run (possibly an outdated version),
# try to emulate it.
case "$1" in

  -h|--h|--he|--hel|--help)
    echo "\
$0 [OPTION]... PROGRAM [ARGUMENT]...

Handle \`PROGRAM [ARGUMENT]...' for when PROGRAM is missing, or return an
error status if there is no known handling for PROGRAM.

Options:
  -h, --help      display this help and exit
  -v, --version   output version information and exit
  --run           try to run the given command, and emulate it if it fails

Supported PROGRAM values:
  aclocal      touch file \`aclocal.m4'
  autoconf     touch file \`configure'
  autoheader   touch file \`config.h.in'
  automake     touch all \`Makefile.in' files
  bison        create \`y.tab.[ch]', if possible, from existing .[ch]
  flex         create \`lex.yy.c', if possible, from existing .c
  help2man     touch the output file
  lex          create \`lex.yy.c', if possible, from existing .c
  makeinfo     touch the output file
  tar          try tar, gnutar, gtar, then tar without non-portable flags
  yacc         create \`y.tab.[ch]', if possible, from existing .[ch]"
    ;;

  -v|--v|--ve|--ver|--vers|--versi|--versio|--version)
    echo "missing 0.3 - GNU automake"
    ;;

  -*)
    echo 1>&2 "$0: Unknown \`$1' option"
    echo 1>&2 "Try \`$0 --help' for more information"
    exit 1
    ;;

  aclocal)
    echo 1>&2 "\
WARNING: \`$1' is missing on your system.  You should only need it if
         you modified \`acinclude.m4' or \`${configure_ac}'.  You might want
         to install the \`Automake' and \`Perl' packages.  Grab them from
         any GNU archive site."
    touch aclocal.m4
    ;;

  autoconf)
    echo 1>&2 "\
WARNING: \`$1' is missing on your system.  You should only need it if
         you modified \`${configure_ac}'.  You might want to install the
         \`Autoconf' and \`GNU m4' packages.  Grab them from any GNU
         archive site."
    touch configure
    ;;

  autoheader)
    echo 1>&2 "\
WARNING: \`$1' is missing on your system.  You should only need it if
         you modified \`acconfig.h' or \`${configure_ac}'.  You might want
         to install the \`Autoconf' and \`GNU m4' packages.  Grab them
         from any GNU archive site."
    files=`sed -n 's/^[ ]*A[CM]_CONFIG_HEADER(\([^)]*\)).*/\1/p' ${configure_ac}`
    test -z "$files" && files="config.h"
    touch_files=
    for f in $files; do
      case "$f" in
      *:*) touch_files="$touch_files "`echo "$f" |
				       sed -e 's/^[^:]*://' -e 's/:.*//'`;;
      *) touch_files="$touch_files $f.in";;
      esac
    done
    touch $touch_files
    ;;

  automake)
    echo 1>&2 "\
WARNING: \`$1' is missing on your system.  You should only need it if
         you modified \`Makefile.am', \`acinclude.m4' or \`${configure_ac}'.
         You might want to install the \`Automake' and \`Perl' packages.
         Grab them from any GNU archive site."
    find . -type f -name Makefile.am -print |
	   sed 's/\.am$/.in/' |
	   while read f; do touch "$f"; done
    ;;

  bison|yacc)
    echo 1>&2 "\
WARNING: \`$1' is missing on your system.  You should only need it if
         you modified a \`.y' file.  You may need the \`Bison' package
         in order for those modifications to take effect.  You can get
         \`Bison' from any GNU archive site."
    rm -f y.tab.c y.tab.h
    if [ $# -ne 1 ]; then
        eval LASTARG="\${$#}"
	case "$LASTARG" in
	*.y)
	    SRCFILE=`echo "$LASTARG" | sed 's/y$/c/'`
	    if [ -f "$SRCFILE" ]; then
	         cp "$SRCFILE" y.tab.c
	    fi
	    SRCFILE=`echo "$LASTARG" | sed 's/y$/h/'`
	    if [ -f "$SRCFILE" ]; then
	         cp "$SRCFILE" y.tab.h
	    fi
	  ;;
	esac
    fi
    if [ ! -f y.tab.h ]; then
	echo >y.tab.h
    fi
    if [ ! -f y.tab.c ]; then
	echo 'main() { return 0; }' >y.tab.c
    fi
    ;;

  lex|flex)
    echo 1>&2 "\
WARNING: \`$1' is missing on your system.  You should only need it if
         you modified a \`.l' file.  You may need the \`Flex' package
         in order for those modifications to take effect.  You can get
         \`Flex' from any GNU archive site."
    rm -f lex.yy.c
    if [ $# -ne 1 ]; then
        eval LASTARG="\${$#}"
	case "$LASTARG" in
	*.l)
	    SRCFILE=`echo "$LASTARG" | sed 's/l$/c/'`
	    if [ -f "$SRCFILE" ]; then
	         cp "$SRCFILE" lex.yy.c
	    fi
	  ;;
	esac
    fi
    if [ ! -f lex.yy.c ]; then
	echo 'main() { return 0; }' >lex.yy.c
    fi
    ;;

  help2man)
    echo 1>&2 "\
WARNING: \`$1' is missing on your system.  You should only need it if
	 you modified a dependency of a manual page.  You may need the
	 \`Help2man' package in order for those modifications to take
	 effect.  You can get \`Help2man' from any GNU archive site."

    file=`echo "$*" | sed -n 's/.*-o \([^ ]*\).*/\1/p'`
    if test -z "$file"; then
	file=`echo "$*" | sed -n 's/.*--output=\([^ ]*\).*/\1/p'`
    fi
    if [ -f "$file" ]; then
	touch $file
    else
	test -z "$file" || exec >$file
	echo ".ab help2man is required to generate this page"
	exit 1
    fi
    ;;

  makeinfo)
    if test -z "$run" && (makeinfo --version) > /dev/null 2>&1; then
       # We have makeinfo, but it failed.
       exit 1
    fi

    echo 1>&2 "\
WARNING: \`$1' is missing on your system.  You should only need it if
         you modified a \`.texi' or \`.texinfo' file, or any other file
         indirectly affecting the aspect of the manual.  The spurious
         call might also be the consequence of using a buggy \`make' (AIX,
         DU, IRIX).  You might want to install the \`Texinfo' package or
         the \`GNU make' package.  Grab either from any GNU archive site."
    file=`echo "$*" | sed -n 's/.*-o \([^ ]*\).*/\1/p'`
    if test -z "$file"; then
      file=`echo "$*" | sed 's/.* \([^ ]*\) *$/\1/'`
      file=`sed -n '/^@setfilename/ { s/.* \([^ ]*\) *$/\1/; p; q; }' $file`
    fi
    touch $file
    ;;

  tar)
    shift
    if test -n "$run"; then
      echo 1>&2 "ERROR: \`tar' requires --run"
      exit 1
    fi

    # We have already tried tar in the generic part.
    # Look for gnutar/gtar before invocation to avoid ugly error
    # messages.
    if (gnutar --version > /dev/null 2>&1); then
       gnutar ${1+"$@"} && exit 0
    fi
    if (gtar --version > /dev/null 2>&1); then
       gtar ${1+"$@"} && exit 0
    fi
    firstarg="$1"
    if shift; then
	case "$firstarg" in
	*o*)
	    firstarg=`echo "$firstarg" | sed s/o//`
	    tar "$firstarg" ${1+"$@"} && exit 0
	    ;;
	esac
	case "$firstarg" in
	*h*)
	    firstarg=`echo "$firstarg" | sed s/h//`
	    tar "$firstarg" ${1+"$@"} && exit 0
	    ;;
	esac
    fi

    echo 1>&2 "\
WARNING: I can't seem to be able to run \`tar' with the given arguments.
         You may want to install GNU tar or Free paxutils, or check the
         command line arguments."
    exit 1
    ;;

  *)
    echo 1>&2 "\
WARNING: \`$1' is needed, and you do not seem to have it handy on your
         system.  You might have modified some files without having the
         proper tools for further handling them.  Check the \`README' file,
         it often tells you about the needed prerequisites for installing
         this package.  You may also peek at any GNU archive site, in case
         some other package would contain this missing \`$1' program."
    exit 1
    ;;
esac

exit 0
swish-e-2.4.7/config/acinclude.m4
# ENABLE_DEFINE( variable, define variable, help message)
# Provides an --enable-foo option and a way to set an acconfig.h define switch
# e.g. ENABLE_DEFINE([memdebug], [MEM_DEBUG], [Memory Debugging])
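#
# Usage sketch (illustrative only; assumes the memdebug example above is
# invoked from configure.ac and that a config header is generated):
#   ./configure --enable-memdebug    # header gets: #define MEM_DEBUG 1
#   ./configure --disable-memdebug   # enableval is "no"; MEM_DEBUG stays undefined
#   ./configure                      # option not given; MEM_DEBUG stays undefined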
#----------------------------------------------------------------

AC_DEFUN([ENABLE_DEFINE],
        [ AC_MSG_CHECKING([config option $1 for setting $2])
        AC_ARG_ENABLE([$1],
                AC_HELP_STRING([--enable-$1], [$3]),
                ,
                enableval=no)
        AC_MSG_RESULT($enableval)
        if test x"$enableval" != xno ; then
                AC_DEFINE([$2], 1, [$3])
        fi])


swish-e-2.4.7/config/config.guess
#! /bin/sh
# Attempt to guess a canonical system name.
#   Copyright (C) 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999,
#   2000, 2001, 2002, 2003, 2004, 2005 Free Software Foundation, Inc.

timestamp='2005-07-08'

# This file is free software; you can redistribute it and/or modify it
# under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful, but
# WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
# General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 51 Franklin Street - Fifth Floor, Boston, MA
# 02110-1301, USA.
#
# As a special exception to the GNU General Public License, if you
# distribute this file as part of a program that contains a
# configuration script generated by Autoconf, you may include it under
# the same distribution terms that you use for the rest of that program.


# Originally written by Per Bothner <per@bothner.com>.
# Please send patches to <config-patches@gnu.org>.  Submit a context
# diff and a properly formatted ChangeLog entry.
#
# This script attempts to guess a canonical system name similar to
# config.sub.  If it succeeds, it prints the system name on stdout, and
# exits with 0.  Otherwise, it exits with 1.
#
# The plan is that this can be called by configure scripts if you
# don't specify an explicit build system type.
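#
# Usage sketch (illustrative only, not part of the original script; the
# variable name `build' is an assumption): a caller can capture the
# guessed name and fall back when the script exits with 1, for example:
#
#   build=`sh ./config.guess` || build=unknown
#   echo "build system type: $build"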

me=`echo "$0" | sed -e 's,.*/,,'`

usage="\
Usage: $0 [OPTION]

Output the configuration name of the system \`$me' is run on.

Operation modes:
  -h, --help         print this help, then exit
  -t, --time-stamp   print date of last modification, then exit
  -v, --version      print version number, then exit

Report bugs and patches to <config-patches@gnu.org>."

version="\
GNU config.guess ($timestamp)

Originally written by Per Bothner.
Copyright (C) 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005
Free Software Foundation, Inc.

This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE."

help="
Try \`$me --help' for more information."

# Parse command line
while test $# -gt 0 ; do
  case $1 in
    --time-stamp | --time* | -t )
       echo "$timestamp" ; exit ;;
    --version | -v )
       echo "$version" ; exit ;;
    --help | --h* | -h )
       echo "$usage"; exit ;;
    -- )     # Stop option processing
       shift; break ;;
    - )	# Use stdin as input.
       break ;;
    -* )
       echo "$me: invalid option $1$help" >&2
       exit 1 ;;
    * )
       break ;;
  esac
done

if test $# != 0; then
  echo "$me: too many arguments$help" >&2
  exit 1
fi

trap 'exit 1' 1 2 15

# CC_FOR_BUILD -- compiler used by this script. Note that the use of a
# compiler to aid in system detection is discouraged as it requires
# temporary files to be created and, as you can see below, it is a
# headache to deal with in a portable fashion.

# Historically, `CC_FOR_BUILD' used to be named `HOST_CC'. We still
# use `HOST_CC' if defined, but it is deprecated.

# Portable tmp directory creation inspired by the Autoconf team.

set_cc_for_build='
trap "exitcode=\$?; (rm -f \$tmpfiles 2>/dev/null; rmdir \$tmp 2>/dev/null) && exit \$exitcode" 0 ;
trap "rm -f \$tmpfiles 2>/dev/null; rmdir \$tmp 2>/dev/null; exit 1" 1 2 13 15 ;
: ${TMPDIR=/tmp} ;
 { tmp=`(umask 077 && mktemp -d -q "$TMPDIR/cgXXXXXX") 2>/dev/null` && test -n "$tmp" && test -d "$tmp" ; } ||
 { test -n "$RANDOM" && tmp=$TMPDIR/cg$$-$RANDOM && (umask 077 && mkdir $tmp) ; } ||
 { tmp=$TMPDIR/cg-$$ && (umask 077 && mkdir $tmp) && echo "Warning: creating insecure temp directory" >&2 ; } ||
 { echo "$me: cannot create a temporary directory in $TMPDIR" >&2 ; exit 1 ; } ;
dummy=$tmp/dummy ;
tmpfiles="$dummy.c $dummy.o $dummy.rel $dummy" ;
case $CC_FOR_BUILD,$HOST_CC,$CC in
 ,,)    echo "int x;" > $dummy.c ;
	for c in cc gcc c89 c99 ; do
	  if ($c -c -o $dummy.o $dummy.c) >/dev/null 2>&1 ; then
	     CC_FOR_BUILD="$c"; break ;
	  fi ;
	done ;
	if test x"$CC_FOR_BUILD" = x ; then
	  CC_FOR_BUILD=no_compiler_found ;
	fi
	;;
 ,,*)   CC_FOR_BUILD=$CC ;;
 ,*,*)  CC_FOR_BUILD=$HOST_CC ;;
esac ; set_cc_for_build= ;'

# This is needed to find uname on a Pyramid OSx when run in the BSD universe.
# (ghazi@noc.rutgers.edu 1994-08-24)
if (test -f /.attbin/uname) >/dev/null 2>&1 ; then
	PATH=$PATH:/.attbin ; export PATH
fi

UNAME_MACHINE=`(uname -m) 2>/dev/null` || UNAME_MACHINE=unknown
UNAME_RELEASE=`(uname -r) 2>/dev/null` || UNAME_RELEASE=unknown
UNAME_SYSTEM=`(uname -s) 2>/dev/null`  || UNAME_SYSTEM=unknown
UNAME_VERSION=`(uname -v) 2>/dev/null` || UNAME_VERSION=unknown

case "${UNAME_MACHINE}" in
    i?86)
	test -z "$VENDOR" && VENDOR=pc
	;;
    *)
	test -z "$VENDOR" && VENDOR=unknown
	;;
esac
test -f /etc/SuSE-release -o -f /.buildenv && VENDOR=suse

# Note: order is significant - the case branches are not exclusive.

case "${UNAME_MACHINE}:${UNAME_SYSTEM}:${UNAME_RELEASE}:${UNAME_VERSION}" in
    *:NetBSD:*:*)
	# NetBSD (nbsd) targets should (where applicable) match one or
	# more of the tuples: *-*-netbsdelf*, *-*-netbsdaout*,
	# *-*-netbsdecoff* and *-*-netbsd*.  For targets that recently
	# switched to ELF, *-*-netbsd* would select the old
	# object file format.  This provides both forward
	# compatibility and a consistent mechanism for selecting the
	# object file format.
	#
	# Note: NetBSD doesn't particularly care about the vendor
	# portion of the name.  We always set it to "unknown".
	sysctl="sysctl -n hw.machine_arch"
	UNAME_MACHINE_ARCH=`(/sbin/$sysctl 2>/dev/null || \
	    /usr/sbin/$sysctl 2>/dev/null || echo unknown)`
	case "${UNAME_MACHINE_ARCH}" in
	    armeb) machine=armeb-unknown ;;
	    arm*) machine=arm-unknown ;;
	    sh3el) machine=shl-unknown ;;
	    sh3eb) machine=sh-unknown ;;
	    *) machine=${UNAME_MACHINE_ARCH}-unknown ;;
	esac
	# The Operating System including object format, if it has switched
	# to ELF recently, or will in the future.
	case "${UNAME_MACHINE_ARCH}" in
	    arm*|i386|m68k|ns32k|sh3*|sparc|vax)
		eval $set_cc_for_build
		if echo __ELF__ | $CC_FOR_BUILD -E - 2>/dev/null \
			| grep __ELF__ >/dev/null
		then
		    # Once all utilities can be ECOFF (netbsdecoff) or a.out (netbsdaout).
		    # Return netbsd for either.  FIX?
		    os=netbsd
		else
		    os=netbsdelf
		fi
		;;
	    *)
	        os=netbsd
		;;
	esac
	# The OS release
	# Debian GNU/NetBSD machines have a different userland, and
	# thus, need a distinct triplet. However, they do not need
	# kernel version information, so it can be replaced with a
	# suitable tag, in the style of linux-gnu.
	case "${UNAME_VERSION}" in
	    Debian*)
		release='-gnu'
		;;
	    *)
		release=`echo ${UNAME_RELEASE}|sed -e 's/[-_].*/\./'`
		;;
	esac
	# Since CPU_TYPE-MANUFACTURER-KERNEL-OPERATING_SYSTEM:
	# contains redundant information, the shorter form:
	# CPU_TYPE-MANUFACTURER-OPERATING_SYSTEM is used.
	echo "${machine}-${os}${release}"
	exit ;;
    *:OpenBSD:*:*)
	UNAME_MACHINE_ARCH=`arch | sed 's/OpenBSD.//'`
	echo ${UNAME_MACHINE_ARCH}-unknown-openbsd${UNAME_RELEASE}
	exit ;;
    *:ekkoBSD:*:*)
	echo ${UNAME_MACHINE}-unknown-ekkobsd${UNAME_RELEASE}
	exit ;;
    macppc:MirBSD:*:*)
	echo powerpc-unknown-mirbsd${UNAME_RELEASE}
	exit ;;
    *:MirBSD:*:*)
	echo ${UNAME_MACHINE}-unknown-mirbsd${UNAME_RELEASE}
	exit ;;
    alpha:OSF1:*:*)
	case $UNAME_RELEASE in
	*4.0)
		UNAME_RELEASE=`/usr/sbin/sizer -v | awk '{print $3}'`
		;;
	*5.*)
	        UNAME_RELEASE=`/usr/sbin/sizer -v | awk '{print $4}'`
		;;
	esac
	# According to Compaq, /usr/sbin/psrinfo has been available on
	# OSF/1 and Tru64 systems produced since 1995.  I hope that
	# covers most systems running today.  This code pipes the CPU
	# types through head -n 1, so we only detect the type of CPU 0.
	ALPHA_CPU_TYPE=`/usr/sbin/psrinfo -v | sed -n -e 's/^  The alpha \(.*\) processor.*$/\1/p' | head -n 1`
	case "$ALPHA_CPU_TYPE" in
	    "EV4 (21064)")
		UNAME_MACHINE="alpha" ;;
	    "EV4.5 (21064)")
		UNAME_MACHINE="alpha" ;;
	    "LCA4 (21066/21068)")
		UNAME_MACHINE="alpha" ;;
	    "EV5 (21164)")
		UNAME_MACHINE="alphaev5" ;;
	    "EV5.6 (21164A)")
		UNAME_MACHINE="alphaev56" ;;
	    "EV5.6 (21164PC)")
		UNAME_MACHINE="alphapca56" ;;
	    "EV5.7 (21164PC)")
		UNAME_MACHINE="alphapca57" ;;
	    "EV6 (21264)")
		UNAME_MACHINE="alphaev6" ;;
	    "EV6.7 (21264A)")
		UNAME_MACHINE="alphaev67" ;;
	    "EV6.8CB (21264C)")
		UNAME_MACHINE="alphaev68" ;;
	    "EV6.8AL (21264B)")
		UNAME_MACHINE="alphaev68" ;;
	    "EV6.8CX (21264D)")
		UNAME_MACHINE="alphaev68" ;;
	    "EV6.9A (21264/EV69A)")
		UNAME_MACHINE="alphaev69" ;;
	    "EV7 (21364)")
		UNAME_MACHINE="alphaev7" ;;
	    "EV7.9 (21364A)")
		UNAME_MACHINE="alphaev79" ;;
	esac
	# A Pn.n version is a patched version.
	# A Vn.n version is a released version.
	# A Tn.n version is a released field test version.
	# A Xn.n version is an unreleased experimental baselevel.
	# 1.2 uses "1.2" for uname -r.
	echo ${UNAME_MACHINE}-dec-osf`echo ${UNAME_RELEASE} | sed -e 's/^[PVTX]//' | tr 'ABCDEFGHIJKLMNOPQRSTUVWXYZ' 'abcdefghijklmnopqrstuvwxyz'`
	exit ;;
    Alpha\ *:Windows_NT*:*)
	# How do we know it's Interix rather than the generic POSIX subsystem?
	# Should we change UNAME_MACHINE based on the output of uname instead
	# of the specific Alpha model?
	echo alpha-pc-interix
	exit ;;
    21064:Windows_NT:50:3)
	echo alpha-dec-winnt3.5
	exit ;;
    Amiga*:UNIX_System_V:4.0:*)
	echo m68k-unknown-sysv4
	exit ;;
    *:[Aa]miga[Oo][Ss]:*:*)
	echo ${UNAME_MACHINE}-unknown-amigaos
	exit ;;
    *:[Mm]orph[Oo][Ss]:*:*)
	echo ${UNAME_MACHINE}-unknown-morphos
	exit ;;
    *:OS/390:*:*)
	echo i370-ibm-openedition
	exit ;;
    *:z/VM:*:*)
	echo s390-ibm-zvmoe
	exit ;;
    *:OS400:*:*)
        echo powerpc-ibm-os400
	exit ;;
    arm:RISC*:1.[012]*:*|arm:riscix:1.[012]*:*)
	echo arm-acorn-riscix${UNAME_RELEASE}
	exit ;;
    arm:riscos:*:*|arm:RISCOS:*:*)
	echo arm-unknown-riscos
	exit ;;
    SR2?01:HI-UX/MPP:*:* | SR8000:HI-UX/MPP:*:*)
	echo hppa1.1-hitachi-hiuxmpp
	exit ;;
    Pyramid*:OSx*:*:* | MIS*:OSx*:*:* | MIS*:SMP_DC-OSx*:*:*)
	# akee@wpdis03.wpafb.af.mil (Earle F. Ake) contributed MIS and NILE.
	if test "`(/bin/universe) 2>/dev/null`" = att ; then
		echo pyramid-pyramid-sysv3
	else
		echo pyramid-pyramid-bsd
	fi
	exit ;;
    NILE*:*:*:dcosx)
	echo pyramid-pyramid-svr4
	exit ;;
    DRS?6000:unix:4.0:6*)
	echo sparc-icl-nx6
	exit ;;
    DRS?6000:UNIX_SV:4.2*:7* | DRS?6000:isis:4.2*:7*)
	case `/usr/bin/uname -p` in
	    sparc) echo sparc-icl-nx7; exit ;;
	esac ;;
    sun4H:SunOS:5.*:*)
	echo sparc-hal-solaris2`echo ${UNAME_RELEASE}|sed -e 's/[^.]*//'`
	exit ;;
    sun4*:SunOS:5.*:* | tadpole*:SunOS:5.*:*)
	echo sparc-sun-solaris2`echo ${UNAME_RELEASE}|sed -e 's/[^.]*//'`
	exit ;;
    i86pc:SunOS:5.*:*)
	echo i386-pc-solaris2`echo ${UNAME_RELEASE}|sed -e 's/[^.]*//'`
	exit ;;
    sun4*:SunOS:6*:*)
	# According to config.sub, this is the proper way to canonicalize
	# SunOS6.  Hard to guess exactly what SunOS6 will be like, but
	# it's likely to be more like Solaris than SunOS4.
	echo sparc-sun-solaris3`echo ${UNAME_RELEASE}|sed -e 's/[^.]*//'`
	exit ;;
    sun4*:SunOS:*:*)
	case "`/usr/bin/arch -k`" in
	    Series*|S4*)
		UNAME_RELEASE=`uname -v`
		;;
	esac
	# Japanese Language versions have a version number like `4.1.3-JL'.
	echo sparc-sun-sunos`echo ${UNAME_RELEASE}|sed -e 's/-/_/'`
	exit ;;
    sun3*:SunOS:*:*)
	echo m68k-sun-sunos${UNAME_RELEASE}
	exit ;;
    sun*:*:4.2BSD:*)
	UNAME_RELEASE=`(sed 1q /etc/motd | awk '{print substr($5,1,3)}') 2>/dev/null`
	test "x${UNAME_RELEASE}" = "x" && UNAME_RELEASE=3
	case "`/bin/arch`" in
	    sun3)
		echo m68k-sun-sunos${UNAME_RELEASE}
		;;
	    sun4)
		echo sparc-sun-sunos${UNAME_RELEASE}
		;;
	esac
	exit ;;
    aushp:SunOS:*:*)
	echo sparc-auspex-sunos${UNAME_RELEASE}
	exit ;;
    # The situation for MiNT is a little confusing.  The machine name
    # can be virtually everything (everything which is not
    # "atarist" or "atariste" at least should have a processor
    # > m68000).  The system name ranges from "MiNT" over "FreeMiNT"
    # to the lowercase version "mint" (or "freemint").  Finally
    # the system name "TOS" denotes a system which is actually not
    # MiNT.  But MiNT is downward compatible to TOS, so this should
    # be no problem.
    atarist[e]:*MiNT:*:* | atarist[e]:*mint:*:* | atarist[e]:*TOS:*:*)
        echo m68k-atari-mint${UNAME_RELEASE}
	exit ;;
    atari*:*MiNT:*:* | atari*:*mint:*:* | atarist[e]:*TOS:*:*)
	echo m68k-atari-mint${UNAME_RELEASE}
        exit ;;
    *falcon*:*MiNT:*:* | *falcon*:*mint:*:* | *falcon*:*TOS:*:*)
        echo m68k-atari-mint${UNAME_RELEASE}
	exit ;;
    milan*:*MiNT:*:* | milan*:*mint:*:* | *milan*:*TOS:*:*)
        echo m68k-milan-mint${UNAME_RELEASE}
        exit ;;
    hades*:*MiNT:*:* | hades*:*mint:*:* | *hades*:*TOS:*:*)
        echo m68k-hades-mint${UNAME_RELEASE}
        exit ;;
    *:*MiNT:*:* | *:*mint:*:* | *:*TOS:*:*)
        echo m68k-unknown-mint${UNAME_RELEASE}
        exit ;;
    m68k:machten:*:*)
	echo m68k-apple-machten${UNAME_RELEASE}
	exit ;;
    powerpc:machten:*:*)
	echo powerpc-apple-machten${UNAME_RELEASE}
	exit ;;
    RISC*:Mach:*:*)
	echo mips-dec-mach_bsd4.3
	exit ;;
    RISC*:ULTRIX:*:*)
	echo mips-dec-ultrix${UNAME_RELEASE}
	exit ;;
    VAX*:ULTRIX*:*:*)
	echo vax-dec-ultrix${UNAME_RELEASE}
	exit ;;
    2020:CLIX:*:* | 2430:CLIX:*:*)
	echo clipper-intergraph-clix${UNAME_RELEASE}
	exit ;;
    mips:*:*:UMIPS | mips:*:*:RISCos)
	eval $set_cc_for_build
	sed 's/^	//' << EOF >$dummy.c
#ifdef __cplusplus
#include <stdio.h>  /* for printf() prototype */
	int main (int argc, char *argv[]) {
#else
	int main (argc, argv) int argc; char *argv[]; {
#endif
	#if defined (host_mips) && defined (MIPSEB)
	#if defined (SYSTYPE_SYSV)
	  printf ("mips-mips-riscos%ssysv\n", argv[1]); exit (0);
	#endif
	#if defined (SYSTYPE_SVR4)
	  printf ("mips-mips-riscos%ssvr4\n", argv[1]); exit (0);
	#endif
	#if defined (SYSTYPE_BSD43) || defined(SYSTYPE_BSD)
	  printf ("mips-mips-riscos%sbsd\n", argv[1]); exit (0);
	#endif
	#endif
	  exit (-1);
	}
EOF
	$CC_FOR_BUILD -o $dummy $dummy.c &&
	  dummyarg=`echo "${UNAME_RELEASE}" | sed -n 's/\([0-9]*\).*/\1/p'` &&
	  SYSTEM_NAME=`$dummy $dummyarg` &&
	    { echo "$SYSTEM_NAME"; exit; }
	echo mips-mips-riscos${UNAME_RELEASE}
	exit ;;
    Motorola:PowerMAX_OS:*:*)
	echo powerpc-motorola-powermax
	exit ;;
    Motorola:*:4.3:PL8-*)
	echo powerpc-harris-powermax
	exit ;;
    Night_Hawk:*:*:PowerMAX_OS | Synergy:PowerMAX_OS:*:*)
	echo powerpc-harris-powermax
	exit ;;
    Night_Hawk:Power_UNIX:*:*)
	echo powerpc-harris-powerunix
	exit ;;
    m88k:CX/UX:7*:*)
	echo m88k-harris-cxux7
	exit ;;
    m88k:*:4*:R4*)
	echo m88k-motorola-sysv4
	exit ;;
    m88k:*:3*:R3*)
	echo m88k-motorola-sysv3
	exit ;;
    AViiON:dgux:*:*)
        # DG/UX returns AViiON for all architectures
        UNAME_PROCESSOR=`/usr/bin/uname -p`
	if [ $UNAME_PROCESSOR = mc88100 ] || [ $UNAME_PROCESSOR = mc88110 ]
	then
	    if [ ${TARGET_BINARY_INTERFACE}x = m88kdguxelfx ] || \
	       [ ${TARGET_BINARY_INTERFACE}x = x ]
	    then
		echo m88k-dg-dgux${UNAME_RELEASE}
	    else
		echo m88k-dg-dguxbcs${UNAME_RELEASE}
	    fi
	else
	    echo i586-dg-dgux${UNAME_RELEASE}
	fi
 	exit ;;
    M88*:DolphinOS:*:*)	# DolphinOS (SVR3)
	echo m88k-dolphin-sysv3
	exit ;;
    M88*:*:R3*:*)
	# Delta 88k system running SVR3
	echo m88k-motorola-sysv3
	exit ;;
    XD88*:*:*:*) # Tektronix XD88 system running UTekV (SVR3)
	echo m88k-tektronix-sysv3
	exit ;;
    Tek43[0-9][0-9]:UTek:*:*) # Tektronix 4300 system running UTek (BSD)
	echo m68k-tektronix-bsd
	exit ;;
    *:IRIX*:*:*)
	echo mips-sgi-irix`echo ${UNAME_RELEASE}|sed -e 's/-/_/g'`
	exit ;;
    ????????:AIX?:[12].1:2)   # AIX 2.2.1 or AIX 2.1.1 is RT/PC AIX.
	echo romp-ibm-aix     # uname -m gives an 8 hex-code CPU id
	exit ;;               # Note that: echo "'`uname -s`'" gives 'AIX '
    i*86:AIX:*:*)
	echo i386-ibm-aix
	exit ;;
    ia64:AIX:*:*)
	if [ -x /usr/bin/oslevel ] ; then
		IBM_REV=`/usr/bin/oslevel`
	else
		IBM_REV=${UNAME_VERSION}.${UNAME_RELEASE}
	fi
	echo ${UNAME_MACHINE}-ibm-aix${IBM_REV}
	exit ;;
    *:AIX:2:3)
	if grep bos325 /usr/include/stdio.h >/dev/null 2>&1; then
		eval $set_cc_for_build
		sed 's/^		//' << EOF >$dummy.c
		#include <sys/systemcfg.h>

		main()
			{
			if (!__power_pc())
				exit(1);
			puts("powerpc-ibm-aix3.2.5");
			exit(0);
			}
EOF
		if $CC_FOR_BUILD -o $dummy $dummy.c && SYSTEM_NAME=`$dummy`
		then
			echo "$SYSTEM_NAME"
		else
			echo rs6000-ibm-aix3.2.5
		fi
	elif grep bos324 /usr/include/stdio.h >/dev/null 2>&1; then
		echo rs6000-ibm-aix3.2.4
	else
		echo rs6000-ibm-aix3.2
	fi
	exit ;;
    *:AIX:*:[45])
	IBM_CPU_ID=`/usr/sbin/lsdev -C -c processor -S available | sed 1q | awk '{ print $1 }'`
	if /usr/sbin/lsattr -El ${IBM_CPU_ID} | grep ' POWER' >/dev/null 2>&1; then
		IBM_ARCH=rs6000
	else
		IBM_ARCH=powerpc
	fi
	if [ -x /usr/bin/oslevel ] ; then
		IBM_REV=`/usr/bin/oslevel`
	else
		IBM_REV=${UNAME_VERSION}.${UNAME_RELEASE}
	fi
	echo ${IBM_ARCH}-ibm-aix${IBM_REV}
	exit ;;
    *:AIX:*:*)
	echo rs6000-ibm-aix
	exit ;;
    ibmrt:4.4BSD:*|romp-ibm:BSD:*)
	echo romp-ibm-bsd4.4
	exit ;;
    ibmrt:*BSD:*|romp-ibm:BSD:*)            # covers RT/PC BSD and
	echo romp-ibm-bsd${UNAME_RELEASE}   # 4.3 with uname added to
	exit ;;                             # report: romp-ibm BSD 4.3
    *:BOSX:*:*)
	echo rs6000-bull-bosx
	exit ;;
    DPX/2?00:B.O.S.:*:*)
	echo m68k-bull-sysv3
	exit ;;
    9000/[34]??:4.3bsd:1.*:*)
	echo m68k-hp-bsd
	exit ;;
    hp300:4.4BSD:*:* | 9000/[34]??:4.3bsd:2.*:*)
	echo m68k-hp-bsd4.4
	exit ;;
    9000/[34678]??:HP-UX:*:*)
	HPUX_REV=`echo ${UNAME_RELEASE}|sed -e 's/[^.]*.[0B]*//'`
	case "${UNAME_MACHINE}" in
	    9000/31? )            HP_ARCH=m68000 ;;
	    9000/[34]?? )         HP_ARCH=m68k ;;
	    9000/[678][0-9][0-9])
		if [ -x /usr/bin/getconf ]; then
		    sc_cpu_version=`/usr/bin/getconf SC_CPU_VERSION 2>/dev/null`
                    sc_kernel_bits=`/usr/bin/getconf SC_KERNEL_BITS 2>/dev/null`
                    case "${sc_cpu_version}" in
                      523) HP_ARCH="hppa1.0" ;; # CPU_PA_RISC1_0
                      528) HP_ARCH="hppa1.1" ;; # CPU_PA_RISC1_1
                      532)                      # CPU_PA_RISC2_0
                        case "${sc_kernel_bits}" in
                          32) HP_ARCH="hppa2.0n" ;;
                          64) HP_ARCH="hppa2.0w" ;;
			  '') HP_ARCH="hppa2.0" ;;   # HP-UX 10.20
                        esac ;;
                    esac
		fi
		if [ "${HP_ARCH}" = "" ]; then
		    eval $set_cc_for_build
		    sed 's/^              //' << EOF >$dummy.c

              #define _HPUX_SOURCE
              #include <stdlib.h>
              #include <unistd.h>

              int main ()
              {
              #if defined(_SC_KERNEL_BITS)
                  long bits = sysconf(_SC_KERNEL_BITS);
              #endif
                  long cpu  = sysconf (_SC_CPU_VERSION);

                  switch (cpu)
              	{
              	case CPU_PA_RISC1_0: puts ("hppa1.0"); break;
              	case CPU_PA_RISC1_1: puts ("hppa1.1"); break;
              	case CPU_PA_RISC2_0:
              #if defined(_SC_KERNEL_BITS)
              	    switch (bits)
              		{
              		case 64: puts ("hppa2.0w"); break;
              		case 32: puts ("hppa2.0n"); break;
              		default: puts ("hppa2.0"); break;
              		} break;
              #else  /* !defined(_SC_KERNEL_BITS) */
              	    puts ("hppa2.0"); break;
              #endif
              	default: puts ("hppa1.0"); break;
              	}
                  exit (0);
              }
EOF
		    (CCOPTS= $CC_FOR_BUILD -o $dummy $dummy.c 2>/dev/null) && HP_ARCH=`$dummy`
		    test -z "$HP_ARCH" && HP_ARCH=hppa
		fi ;;
	esac
	if [ ${HP_ARCH} = "hppa2.0w" ]
	then
	    eval $set_cc_for_build

	    # hppa2.0w-hp-hpux* has a 64-bit kernel and a compiler generating
	    # 32-bit code.  hppa64-hp-hpux* has the same kernel and a compiler
	    # generating 64-bit code.  GNU and HP use different nomenclature:
	    #
	    # $ CC_FOR_BUILD=cc ./config.guess
	    # => hppa2.0w-hp-hpux11.23
	    # $ CC_FOR_BUILD="cc +DA2.0w" ./config.guess
	    # => hppa64-hp-hpux11.23

	    if echo __LP64__ | (CCOPTS= $CC_FOR_BUILD -E - 2>/dev/null) |
		grep __LP64__ >/dev/null
	    then
		HP_ARCH="hppa2.0w"
	    else
		HP_ARCH="hppa64"
	    fi
	fi
	echo ${HP_ARCH}-hp-hpux${HPUX_REV}
	exit ;;
    ia64:HP-UX:*:*)
	HPUX_REV=`echo ${UNAME_RELEASE}|sed -e 's/[^.]*.[0B]*//'`
	echo ia64-hp-hpux${HPUX_REV}
	exit ;;
    3050*:HI-UX:*:*)
	eval $set_cc_for_build
	sed 's/^	//' << EOF >$dummy.c
	#include <unistd.h>
	int
	main ()
	{
	  long cpu = sysconf (_SC_CPU_VERSION);
	  /* The order matters, because CPU_IS_HP_MC68K erroneously returns
	     true for CPU_PA_RISC1_0.  CPU_IS_PA_RISC returns correct
	     results, however.  */
	  if (CPU_IS_PA_RISC (cpu))
	    {
	      switch (cpu)
		{
		  case CPU_PA_RISC1_0: puts ("hppa1.0-hitachi-hiuxwe2"); break;
		  case CPU_PA_RISC1_1: puts ("hppa1.1-hitachi-hiuxwe2"); break;
		  case CPU_PA_RISC2_0: puts ("hppa2.0-hitachi-hiuxwe2"); break;
		  default: puts ("hppa-hitachi-hiuxwe2"); break;
		}
	    }
	  else if (CPU_IS_HP_MC68K (cpu))
	    puts ("m68k-hitachi-hiuxwe2");
	  else puts ("unknown-hitachi-hiuxwe2");
	  exit (0);
	}
EOF
	$CC_FOR_BUILD -o $dummy $dummy.c && SYSTEM_NAME=`$dummy` &&
		{ echo "$SYSTEM_NAME"; exit; }
	echo unknown-hitachi-hiuxwe2
	exit ;;
    9000/7??:4.3bsd:*:* | 9000/8?[79]:4.3bsd:*:* )
	echo hppa1.1-hp-bsd
	exit ;;
    9000/8??:4.3bsd:*:*)
	echo hppa1.0-hp-bsd
	exit ;;
    *9??*:MPE/iX:*:* | *3000*:MPE/iX:*:*)
	echo hppa1.0-hp-mpeix
	exit ;;
    hp7??:OSF1:*:* | hp8?[79]:OSF1:*:* )
	echo hppa1.1-hp-osf
	exit ;;
    hp8??:OSF1:*:*)
	echo hppa1.0-hp-osf
	exit ;;
    i*86:OSF1:*:*)
	if [ -x /usr/sbin/sysversion ] ; then
	    echo ${UNAME_MACHINE}-unknown-osf1mk
	else
	    echo ${UNAME_MACHINE}-unknown-osf1
	fi
	exit ;;
    parisc*:Lites*:*:*)
	echo hppa1.1-hp-lites
	exit ;;
    C1*:ConvexOS:*:* | convex:ConvexOS:C1*:*)
	echo c1-convex-bsd
        exit ;;
    C2*:ConvexOS:*:* | convex:ConvexOS:C2*:*)
	if getsysinfo -f scalar_acc
	then echo c32-convex-bsd
	else echo c2-convex-bsd
	fi
        exit ;;
    C34*:ConvexOS:*:* | convex:ConvexOS:C34*:*)
	echo c34-convex-bsd
        exit ;;
    C38*:ConvexOS:*:* | convex:ConvexOS:C38*:*)
	echo c38-convex-bsd
        exit ;;
    C4*:ConvexOS:*:* | convex:ConvexOS:C4*:*)
	echo c4-convex-bsd
        exit ;;
    CRAY*Y-MP:*:*:*)
	echo ymp-cray-unicos${UNAME_RELEASE} | sed -e 's/\.[^.]*$/.X/'
	exit ;;
    CRAY*[A-Z]90:*:*:*)
	echo ${UNAME_MACHINE}-cray-unicos${UNAME_RELEASE} \
	| sed -e 's/CRAY.*\([A-Z]90\)/\1/' \
	      -e y/ABCDEFGHIJKLMNOPQRSTUVWXYZ/abcdefghijklmnopqrstuvwxyz/ \
	      -e 's/\.[^.]*$/.X/'
	exit ;;
    CRAY*TS:*:*:*)
	echo t90-cray-unicos${UNAME_RELEASE} | sed -e 's/\.[^.]*$/.X/'
	exit ;;
    CRAY*T3E:*:*:*)
	echo alphaev5-cray-unicosmk${UNAME_RELEASE} | sed -e 's/\.[^.]*$/.X/'
	exit ;;
    CRAY*SV1:*:*:*)
	echo sv1-cray-unicos${UNAME_RELEASE} | sed -e 's/\.[^.]*$/.X/'
	exit ;;
    *:UNICOS/mp:*:*)
	echo craynv-cray-unicosmp${UNAME_RELEASE} | sed -e 's/\.[^.]*$/.X/'
	exit ;;
    F30[01]:UNIX_System_V:*:* | F700:UNIX_System_V:*:*)
	FUJITSU_PROC=`uname -m | tr 'ABCDEFGHIJKLMNOPQRSTUVWXYZ' 'abcdefghijklmnopqrstuvwxyz'`
        FUJITSU_SYS=`uname -p | tr 'ABCDEFGHIJKLMNOPQRSTUVWXYZ' 'abcdefghijklmnopqrstuvwxyz' | sed -e 's/\///'`
        FUJITSU_REL=`echo ${UNAME_RELEASE} | sed -e 's/ /_/'`
        echo "${FUJITSU_PROC}-fujitsu-${FUJITSU_SYS}${FUJITSU_REL}"
        exit ;;
    5000:UNIX_System_V:4.*:*)
        FUJITSU_SYS=`uname -p | tr 'ABCDEFGHIJKLMNOPQRSTUVWXYZ' 'abcdefghijklmnopqrstuvwxyz' | sed -e 's/\///'`
        FUJITSU_REL=`echo ${UNAME_RELEASE} | tr 'ABCDEFGHIJKLMNOPQRSTUVWXYZ' 'abcdefghijklmnopqrstuvwxyz' | sed -e 's/ /_/'`
        echo "sparc-fujitsu-${FUJITSU_SYS}${FUJITSU_REL}"
	exit ;;
    i*86:BSD/386:*:* | i*86:BSD/OS:*:* | *:Ascend\ Embedded/OS:*:*)
	echo ${UNAME_MACHINE}-pc-bsdi${UNAME_RELEASE}
	exit ;;
    sparc*:BSD/OS:*:*)
	echo sparc-unknown-bsdi${UNAME_RELEASE}
	exit ;;
    *:BSD/OS:*:*)
	echo ${UNAME_MACHINE}-unknown-bsdi${UNAME_RELEASE}
	exit ;;
    *:FreeBSD:*:*)
	echo ${UNAME_MACHINE}-unknown-freebsd`echo ${UNAME_RELEASE}|sed -e 's/[-(].*//'`
	exit ;;
    i*:CYGWIN*:*)
	echo ${UNAME_MACHINE}-pc-cygwin
	exit ;;
    i*:MINGW*:*)
	echo ${UNAME_MACHINE}-pc-mingw32
	exit ;;
    i*:windows32*:*)
    	# uname -m includes "-pc" on this system.
    	echo ${UNAME_MACHINE}-mingw32
	exit ;;
    i*:PW*:*)
	echo ${UNAME_MACHINE}-pc-pw32
	exit ;;
    x86:Interix*:[34]*)
	echo i586-pc-interix${UNAME_RELEASE}|sed -e 's/\..*//'
	exit ;;
    [345]86:Windows_95:* | [345]86:Windows_98:* | [345]86:Windows_NT:*)
	echo i${UNAME_MACHINE}-pc-mks
	exit ;;
    i*:Windows_NT*:* | Pentium*:Windows_NT*:*)
	# How do we know it's Interix rather than the generic POSIX subsystem?
	# It also conflicts with pre-2.0 versions of AT&T UWIN. Should we change
	# UNAME_MACHINE based on the output of uname instead of i386?
	echo i586-pc-interix
	exit ;;
    i*:UWIN*:*)
	echo ${UNAME_MACHINE}-pc-uwin
	exit ;;
    amd64:CYGWIN*:*:*)
	echo x86_64-unknown-cygwin
	exit ;;
    p*:CYGWIN*:*)
	echo powerpcle-unknown-cygwin
	exit ;;
    prep*:SunOS:5.*:*)
	echo powerpcle-unknown-solaris2`echo ${UNAME_RELEASE}|sed -e 's/[^.]*//'`
	exit ;;
    *:GNU:*:*)
	# the GNU system
	echo `echo ${UNAME_MACHINE}|sed -e 's,[-/].*$,,'`-unknown-gnu`echo ${UNAME_RELEASE}|sed -e 's,/.*$,,'`
	exit ;;
    *:GNU/*:*:*)
	# other systems with GNU libc and userland
	echo ${UNAME_MACHINE}-unknown-`echo ${UNAME_SYSTEM} | sed 's,^[^/]*/,,' | tr '[A-Z]' '[a-z]'``echo ${UNAME_RELEASE}|sed -e 's/[-(].*//'`-gnu
	exit ;;
    i*86:Minix:*:*)
	echo ${UNAME_MACHINE}-pc-minix
	exit ;;
    arm*:Linux:*:*)
	echo ${UNAME_MACHINE}-${VENDOR}-linux
	exit ;;
    cris:Linux:*:*)
	echo cris-axis-linux
	exit ;;
    crisv32:Linux:*:*)
	echo crisv32-axis-linux
	exit ;;
    frv:Linux:*:*)
    	echo frv-${VENDOR}-linux
	exit ;;
    ia64:Linux:*:*)
	echo ${UNAME_MACHINE}-${VENDOR}-linux
	exit ;;
    m32r*:Linux:*:*)
	echo ${UNAME_MACHINE}-${VENDOR}-linux
	exit ;;
    m68*:Linux:*:*)
	echo ${UNAME_MACHINE}-${VENDOR}-linux
	exit ;;
    mips:Linux:*:*)
	eval $set_cc_for_build
	sed 's/^	//' << EOF >$dummy.c
	#undef CPU
	#undef mips
	#undef mipsel
	#if defined(__MIPSEL__) || defined(__MIPSEL) || defined(_MIPSEL) || defined(MIPSEL)
	CPU=mipsel
	#else
	#if defined(__MIPSEB__) || defined(__MIPSEB) || defined(_MIPSEB) || defined(MIPSEB)
	CPU=mips
	#else
	CPU=
	#endif
	#endif
EOF
	eval `$CC_FOR_BUILD -E $dummy.c 2>/dev/null | grep ^CPU=`
	test x"${CPU}" != x && { echo "${CPU}-${VENDOR}-linux"; exit; }
	;;
    mips64:Linux:*:*)
	eval $set_cc_for_build
	sed 's/^	//' << EOF >$dummy.c
	#undef CPU
	#undef mips64
	#undef mips64el
	#if defined(__MIPSEL__) || defined(__MIPSEL) || defined(_MIPSEL) || defined(MIPSEL)
	CPU=mips64el
	#else
	#if defined(__MIPSEB__) || defined(__MIPSEB) || defined(_MIPSEB) || defined(MIPSEB)
	CPU=mips64
	#else
	CPU=
	#endif
	#endif
EOF
	eval `$CC_FOR_BUILD -E $dummy.c 2>/dev/null | grep ^CPU=`
	test x"${CPU}" != x && { echo "${CPU}-${VENDOR}-linux"; exit; }
	;;
    ppc:Linux:*:*)
	echo powerpc-${VENDOR}-linux
	exit ;;
    ppc64:Linux:*:*)
	echo powerpc64-${VENDOR}-linux
	exit ;;
    alpha:Linux:*:*)
	case `sed -n '/^cpu model/s/^.*: \(.*\)/\1/p' < /proc/cpuinfo` in
	  EV5)   UNAME_MACHINE=alphaev5 ;;
	  EV56)  UNAME_MACHINE=alphaev56 ;;
	  PCA56) UNAME_MACHINE=alphapca56 ;;
	  PCA57) UNAME_MACHINE=alphapca56 ;;
	  EV6)   UNAME_MACHINE=alphaev6 ;;
	  EV67)  UNAME_MACHINE=alphaev67 ;;
	  EV68*) UNAME_MACHINE=alphaev68 ;;
        esac
	objdump --private-headers /bin/sh | grep ld.so.1 >/dev/null
	if test "$?" = 0 ; then LIBC="libc1" ; else LIBC="" ; fi
	echo ${UNAME_MACHINE}-${VENDOR}-linux${LIBC}
	exit ;;
    parisc:Linux:*:* | hppa:Linux:*:*)
	# Look for CPU level
	case `grep '^cpu[^a-z]*:' /proc/cpuinfo 2>/dev/null | cut -d' ' -f2` in
	  PA7*) echo hppa1.1-${VENDOR}-linux ;;
	  PA8*) echo hppa2.0-${VENDOR}-linux ;;
	  *)    echo hppa-${VENDOR}-linux ;;
	esac
	exit ;;
    parisc64:Linux:*:* | hppa64:Linux:*:*)
	echo hppa64-${VENDOR}-linux
	exit ;;
    s390:Linux:*:* | s390x:Linux:*:*)
	echo ${UNAME_MACHINE}-ibm-linux
	exit ;;
    sh64*:Linux:*:*)
    	echo ${UNAME_MACHINE}-${VENDOR}-linux
	exit ;;
    sh*:Linux:*:*)
	echo ${UNAME_MACHINE}-${VENDOR}-linux
	exit ;;
    sparc:Linux:*:* | sparc64:Linux:*:*)
	echo ${UNAME_MACHINE}-${VENDOR}-linux
	exit ;;
    x86_64:Linux:*:*)
	echo x86_64-${VENDOR}-linux
	exit ;;
    i*86:Linux:*:*)
	# The BFD linker knows what the default object file format is, so
	# first see if it will tell us. cd to the root directory to prevent
	# problems with other programs or directories called `ld' in the path.
	# Set LC_ALL=C to ensure ld outputs messages in English.
	ld_supported_targets=`cd /; LC_ALL=C ld --help 2>&1 \
			 | sed -ne '/supported targets:/!d
				    s/[ 	][ 	]*/ /g
				    s/.*supported targets: *//
				    s/ .*//
				    p'`
        case "$ld_supported_targets" in
	  elf32-i386)
		TENTATIVE="${UNAME_MACHINE}-${VENDOR}-linux"
		;;
	  a.out-i386-linux)
		echo "${UNAME_MACHINE}-${VENDOR}-linuxaout"
		exit ;;
	  coff-i386)
		echo "${UNAME_MACHINE}-${VENDOR}-linuxcoff"
		exit ;;
	  "")
		# Either a pre-BFD a.out linker (linuxoldld) or
		# one that does not give us useful --help.
		echo "${UNAME_MACHINE}-${VENDOR}-linuxoldld"
		exit ;;
	esac
	# Determine whether the default compiler is a.out or elf
	eval $set_cc_for_build
	sed 's/^	//' << EOF >$dummy.c
	#include <features.h>
	#ifdef __ELF__
	# ifdef __GLIBC__
	#  if __GLIBC__ >= 2
	LIBC=gnu
	#  else
	LIBC=gnulibc1
	#  endif
	# else
	LIBC=gnulibc1
	# endif
	#else
	#ifdef __INTEL_COMPILER
	LIBC=gnu
	#else
	LIBC=gnuaout
	#endif
	#endif
	#ifdef __dietlibc__
	LIBC=dietlibc
	#endif
EOF
	eval `$CC_FOR_BUILD -E $dummy.c 2>/dev/null | grep ^LIBC=`
	test x"${LIBC}" != x && {
		echo "${UNAME_MACHINE}-${VENDOR}-linux-${LIBC}" | sed 's/linux-gnu/linux/'
		exit
	}
	test x"${TENTATIVE}" != x && { echo "${TENTATIVE}"; exit; }
	;;
    i*86:DYNIX/ptx:4*:*)
	# ptx 4.0 does uname -s correctly, with DYNIX/ptx in there.
	# earlier versions are messed up and put the nodename in both
	# sysname and nodename.
	echo i386-sequent-sysv4
	exit ;;
    i*86:UNIX_SV:4.2MP:2.*)
        # Unixware is an offshoot of SVR4, but it has its own version
        # number series starting with 2...
        # I am not positive that other SVR4 systems won't match this,
	# I just have to hope.  -- rms.
        # Use sysv4.2uw... so that sysv4* matches it.
	echo ${UNAME_MACHINE}-pc-sysv4.2uw${UNAME_VERSION}
	exit ;;
    i*86:OS/2:*:*)
	# If we were able to find `uname', then EMX Unix compatibility
	# is probably installed.
	echo ${UNAME_MACHINE}-pc-os2-emx
	exit ;;
    i*86:XTS-300:*:STOP)
	echo ${UNAME_MACHINE}-unknown-stop
	exit ;;
    i*86:atheos:*:*)
	echo ${UNAME_MACHINE}-unknown-atheos
	exit ;;
    i*86:syllable:*:*)
	echo ${UNAME_MACHINE}-pc-syllable
	exit ;;
    i*86:LynxOS:2.*:* | i*86:LynxOS:3.[01]*:* | i*86:LynxOS:4.0*:*)
	echo i386-unknown-lynxos${UNAME_RELEASE}
	exit ;;
    i*86:*DOS:*:*)
	echo ${UNAME_MACHINE}-pc-msdosdjgpp
	exit ;;
    i*86:*:4.*:* | i*86:SYSTEM_V:4.*:*)
	UNAME_REL=`echo ${UNAME_RELEASE} | sed 's/\/MP$//'`
	if grep Novell /usr/include/link.h >/dev/null 2>/dev/null; then
		echo ${UNAME_MACHINE}-univel-sysv${UNAME_REL}
	else
		echo ${UNAME_MACHINE}-pc-sysv${UNAME_REL}
	fi
	exit ;;
    i*86:*:5:[678]*)
    	# UnixWare 7.x, OpenUNIX and OpenServer 6.
	case `/bin/uname -X | grep "^Machine"` in
	    *486*)	     UNAME_MACHINE=i486 ;;
	    *Pentium)	     UNAME_MACHINE=i586 ;;
	    *Pent*|*Celeron) UNAME_MACHINE=i686 ;;
	esac
	echo ${UNAME_MACHINE}-unknown-sysv${UNAME_RELEASE}${UNAME_SYSTEM}${UNAME_VERSION}
	exit ;;
    i*86:*:3.2:*)
	if test -f /usr/options/cb.name; then
		UNAME_REL=`sed -n 's/.*Version //p' </usr/options/cb.name`
		echo ${UNAME_MACHINE}-pc-isc$UNAME_REL
	elif /bin/uname -X 2>/dev/null >/dev/null ; then
		UNAME_REL=`(/bin/uname -X|grep Release|sed -e 's/.*= //')`
		(/bin/uname -X|grep i80486 >/dev/null) && UNAME_MACHINE=i486
		(/bin/uname -X|grep '^Machine.*Pentium' >/dev/null) \
			&& UNAME_MACHINE=i586
		(/bin/uname -X|grep '^Machine.*Pent *II' >/dev/null) \
			&& UNAME_MACHINE=i686
		(/bin/uname -X|grep '^Machine.*Pentium Pro' >/dev/null) \
			&& UNAME_MACHINE=i686
		echo ${UNAME_MACHINE}-pc-sco$UNAME_REL
	else
		echo ${UNAME_MACHINE}-pc-sysv32
	fi
	exit ;;
    pc:*:*:*)
	# Left here for compatibility:
        # For DJGPP, uname -m always prints 'pc' and says nothing about
        # the processor, so we play safe by assuming i386.
	echo i386-pc-msdosdjgpp
        exit ;;
    Intel:Mach:3*:*)
	echo i386-pc-mach3
	exit ;;
    paragon:*:*:*)
	echo i860-intel-osf1
	exit ;;
    i860:*:4.*:*) # i860-SVR4
	if grep Stardent /usr/include/sys/uadmin.h >/dev/null 2>&1 ; then
	  echo i860-stardent-sysv${UNAME_RELEASE} # Stardent Vistra i860-SVR4
	else # Add other i860-SVR4 vendors below as they are discovered.
	  echo i860-unknown-sysv${UNAME_RELEASE}  # Unknown i860-SVR4
	fi
	exit ;;
    mini*:CTIX:SYS*5:*)
	# "miniframe"
	echo m68010-convergent-sysv
	exit ;;
    mc68k:UNIX:SYSTEM5:3.51m)
	echo m68k-convergent-sysv
	exit ;;
    M680?0:D-NIX:5.3:*)
	echo m68k-diab-dnix
	exit ;;
    M68*:*:R3V[5678]*:*)
	test -r /sysV68 && { echo 'm68k-motorola-sysv'; exit; } ;;
    3[345]??:*:4.0:3.0 | 3[34]??A:*:4.0:3.0 | 3[34]??,*:*:4.0:3.0 | 3[34]??/*:*:4.0:3.0 | 4400:*:4.0:3.0 | 4850:*:4.0:3.0 | SKA40:*:4.0:3.0 | SDS2:*:4.0:3.0 | SHG2:*:4.0:3.0 | S7501*:*:4.0:3.0)
	OS_REL=''
	test -r /etc/.relid \
	&& OS_REL=.`sed -n 's/[^ ]* [^ ]* \([0-9][0-9]\).*/\1/p' < /etc/.relid`
	/bin/uname -p 2>/dev/null | grep 86 >/dev/null \
	  && { echo i486-ncr-sysv4.3${OS_REL}; exit; }
	/bin/uname -p 2>/dev/null | /bin/grep entium >/dev/null \
	  && { echo i586-ncr-sysv4.3${OS_REL}; exit; } ;;
    3[34]??:*:4.0:* | 3[34]??,*:*:4.0:*)
        /bin/uname -p 2>/dev/null | grep 86 >/dev/null \
          && { echo i486-ncr-sysv4; exit; } ;;
    m68*:LynxOS:2.*:* | m68*:LynxOS:3.0*:*)
	echo m68k-unknown-lynxos${UNAME_RELEASE}
	exit ;;
    mc68030:UNIX_System_V:4.*:*)
	echo m68k-atari-sysv4
	exit ;;
    TSUNAMI:LynxOS:2.*:*)
	echo sparc-unknown-lynxos${UNAME_RELEASE}
	exit ;;
    rs6000:LynxOS:2.*:*)
	echo rs6000-unknown-lynxos${UNAME_RELEASE}
	exit ;;
    PowerPC:LynxOS:2.*:* | PowerPC:LynxOS:3.[01]*:* | PowerPC:LynxOS:4.0*:*)
	echo powerpc-unknown-lynxos${UNAME_RELEASE}
	exit ;;
    SM[BE]S:UNIX_SV:*:*)
	echo mips-dde-sysv${UNAME_RELEASE}
	exit ;;
    RM*:ReliantUNIX-*:*:*)
	echo mips-sni-sysv4
	exit ;;
    RM*:SINIX-*:*:*)
	echo mips-sni-sysv4
	exit ;;
    *:SINIX-*:*:*)
	if uname -p 2>/dev/null >/dev/null ; then
		UNAME_MACHINE=`(uname -p) 2>/dev/null`
		echo ${UNAME_MACHINE}-sni-sysv4
	else
		echo ns32k-sni-sysv
	fi
	exit ;;
    PENTIUM:*:4.0*:*) # Unisys `ClearPath HMP IX 4000' SVR4/MP effort
                      # says <Richard.M.Bartel@ccMail.Census.GOV>
        echo i586-unisys-sysv4
        exit ;;
    *:UNIX_System_V:4*:FTX*)
	# From Gerald Hewes <hewes@openmarket.com>.
	# How about differentiating between stratus architectures? -djm
	echo hppa1.1-stratus-sysv4
	exit ;;
    *:*:*:FTX*)
	# From seanf@swdc.stratus.com.
	echo i860-stratus-sysv4
	exit ;;
    i*86:VOS:*:*)
	# From Paul.Green@stratus.com.
	echo ${UNAME_MACHINE}-stratus-vos
	exit ;;
    *:VOS:*:*)
	# From Paul.Green@stratus.com.
	echo hppa1.1-stratus-vos
	exit ;;
    mc68*:A/UX:*:*)
	echo m68k-apple-aux${UNAME_RELEASE}
	exit ;;
    news*:NEWS-OS:6*:*)
	echo mips-sony-newsos6
	exit ;;
    R[34]000:*System_V*:*:* | R4000:UNIX_SYSV:*:* | R*000:UNIX_SV:*:*)
	if [ -d /usr/nec ]; then
	        echo mips-nec-sysv${UNAME_RELEASE}
	else
	        echo mips-unknown-sysv${UNAME_RELEASE}
	fi
        exit ;;
    BeBox:BeOS:*:*)	# BeOS running on hardware made by Be, PPC only.
	echo powerpc-be-beos
	exit ;;
    BeMac:BeOS:*:*)	# BeOS running on Mac or Mac clone, PPC only.
	echo powerpc-apple-beos
	exit ;;
    BePC:BeOS:*:*)	# BeOS running on Intel PC compatible.
	echo i586-pc-beos
	exit ;;
    SX-4:SUPER-UX:*:*)
	echo sx4-nec-superux${UNAME_RELEASE}
	exit ;;
    SX-5:SUPER-UX:*:*)
	echo sx5-nec-superux${UNAME_RELEASE}
	exit ;;
    SX-6:SUPER-UX:*:*)
	echo sx6-nec-superux${UNAME_RELEASE}
	exit ;;
    Power*:Rhapsody:*:*)
	echo powerpc-apple-rhapsody${UNAME_RELEASE}
	exit ;;
    *:Rhapsody:*:*)
	echo ${UNAME_MACHINE}-apple-rhapsody${UNAME_RELEASE}
	exit ;;
    *:Darwin:*:*)
	UNAME_PROCESSOR=`uname -p` || UNAME_PROCESSOR=unknown
	case $UNAME_PROCESSOR in
	    *86) UNAME_PROCESSOR=i686 ;;
	    unknown) UNAME_PROCESSOR=powerpc ;;
	esac
	echo ${UNAME_PROCESSOR}-apple-darwin${UNAME_RELEASE}
	exit ;;
    *:procnto*:*:* | *:QNX:[0123456789]*:*)
	UNAME_PROCESSOR=`uname -p`
	if test "$UNAME_PROCESSOR" = "x86"; then
		UNAME_PROCESSOR=i386
		UNAME_MACHINE=pc
	fi
	echo ${UNAME_PROCESSOR}-${UNAME_MACHINE}-nto-qnx${UNAME_RELEASE}
	exit ;;
    *:QNX:*:4*)
	echo i386-pc-qnx
	exit ;;
    NSE-?:NONSTOP_KERNEL:*:*)
	echo nse-tandem-nsk${UNAME_RELEASE}
	exit ;;
    NSR-?:NONSTOP_KERNEL:*:*)
	echo nsr-tandem-nsk${UNAME_RELEASE}
	exit ;;
    *:NonStop-UX:*:*)
	echo mips-compaq-nonstopux
	exit ;;
    BS2000:POSIX*:*:*)
	echo bs2000-siemens-sysv
	exit ;;
    DS/*:UNIX_System_V:*:*)
	echo ${UNAME_MACHINE}-${UNAME_SYSTEM}-${UNAME_RELEASE}
	exit ;;
    *:Plan9:*:*)
	# "uname -m" is not consistent, so use $cputype instead. 386
	# is converted to i386 for consistency with other x86
	# operating systems.
	if test "$cputype" = "386"; then
	    UNAME_MACHINE=i386
	else
	    UNAME_MACHINE="$cputype"
	fi
	echo ${UNAME_MACHINE}-unknown-plan9
	exit ;;
    *:TOPS-10:*:*)
	echo pdp10-unknown-tops10
	exit ;;
    *:TENEX:*:*)
	echo pdp10-unknown-tenex
	exit ;;
    KS10:TOPS-20:*:* | KL10:TOPS-20:*:* | TYPE4:TOPS-20:*:*)
	echo pdp10-dec-tops20
	exit ;;
    XKL-1:TOPS-20:*:* | TYPE5:TOPS-20:*:*)
	echo pdp10-xkl-tops20
	exit ;;
    *:TOPS-20:*:*)
	echo pdp10-unknown-tops20
	exit ;;
    *:ITS:*:*)
	echo pdp10-unknown-its
	exit ;;
    SEI:*:*:SEIUX)
        echo mips-sei-seiux${UNAME_RELEASE}
	exit ;;
    *:DragonFly:*:*)
	echo ${UNAME_MACHINE}-unknown-dragonfly`echo ${UNAME_RELEASE}|sed -e 's/[-(].*//'`
	exit ;;
    *:*VMS:*:*)
    	UNAME_MACHINE=`(uname -p) 2>/dev/null`
	case "${UNAME_MACHINE}" in
	    A*) echo alpha-dec-vms ; exit ;;
	    I*) echo ia64-dec-vms ; exit ;;
	    V*) echo vax-dec-vms ; exit ;;
	esac ;;
    *:XENIX:*:SysV)
	echo i386-pc-xenix
	exit ;;
    i*86:skyos:*:*)
	echo ${UNAME_MACHINE}-pc-skyos`echo ${UNAME_RELEASE}` | sed -e 's/ .*$//'
	exit ;;
esac

#echo '(No uname command or uname output not recognized.)' 1>&2
#echo "${UNAME_MACHINE}:${UNAME_SYSTEM}:${UNAME_RELEASE}:${UNAME_VERSION}" 1>&2

eval $set_cc_for_build
cat >$dummy.c <<EOF
#ifdef _SEQUENT_
# include <sys/types.h>
# include <sys/utsname.h>
#endif
main ()
{
#if defined (sony)
#if defined (MIPSEB)
  /* BFD wants "bsd" instead of "newsos".  Perhaps BFD should be changed,
     I don't know....  */
  printf ("mips-sony-bsd\n"); exit (0);
#else
#include <sys/param.h>
  printf ("m68k-sony-newsos%s\n",
#ifdef NEWSOS4
          "4"
#else
	  ""
#endif
         ); exit (0);
#endif
#endif

#if defined (__arm) && defined (__acorn) && defined (__unix)
  printf ("arm-acorn-riscix\n"); exit (0);
#endif

#if defined (hp300) && !defined (hpux)
  printf ("m68k-hp-bsd\n"); exit (0);
#endif

#if defined (NeXT)
#if !defined (__ARCHITECTURE__)
#define __ARCHITECTURE__ "m68k"
#endif
  int version;
  version=`(hostinfo | sed -n 's/.*NeXT Mach \([0-9]*\).*/\1/p') 2>/dev/null`;
  if (version < 4)
    printf ("%s-next-nextstep%d\n", __ARCHITECTURE__, version);
  else
    printf ("%s-next-openstep%d\n", __ARCHITECTURE__, version);
  exit (0);
#endif

#if defined (MULTIMAX) || defined (n16)
#if defined (UMAXV)
  printf ("ns32k-encore-sysv\n"); exit (0);
#else
#if defined (CMU)
  printf ("ns32k-encore-mach\n"); exit (0);
#else
  printf ("ns32k-encore-bsd\n"); exit (0);
#endif
#endif
#endif

#if defined (__386BSD__)
  printf ("i386-pc-bsd\n"); exit (0);
#endif

#if defined (sequent)
#if defined (i386)
  printf ("i386-sequent-dynix\n"); exit (0);
#endif
#if defined (ns32000)
  printf ("ns32k-sequent-dynix\n"); exit (0);
#endif
#endif

#if defined (_SEQUENT_)
    struct utsname un;

    uname(&un);

    if (strncmp(un.version, "V2", 2) == 0) {
	printf ("i386-sequent-ptx2\n"); exit (0);
    }
    if (strncmp(un.version, "V1", 2) == 0) { /* XXX is V1 correct? */
	printf ("i386-sequent-ptx1\n"); exit (0);
    }
    printf ("i386-sequent-ptx\n"); exit (0);

#endif

#if defined (vax)
# if !defined (ultrix)
#  include <sys/param.h>
#  if defined (BSD)
#   if BSD == 43
      printf ("vax-dec-bsd4.3\n"); exit (0);
#   else
#    if BSD == 199006
      printf ("vax-dec-bsd4.3reno\n"); exit (0);
#    else
      printf ("vax-dec-bsd\n"); exit (0);
#    endif
#   endif
#  else
    printf ("vax-dec-bsd\n"); exit (0);
#  endif
# else
    printf ("vax-dec-ultrix\n"); exit (0);
# endif
#endif

#if defined (alliant) && defined (i860)
  printf ("i860-alliant-bsd\n"); exit (0);
#endif

  exit (1);
}
EOF

$CC_FOR_BUILD -o $dummy $dummy.c 2>/dev/null && SYSTEM_NAME=`$dummy` &&
	{ echo "$SYSTEM_NAME"; exit; }

# Apollos put the system type in the environment.

test -d /usr/apollo && { echo ${ISP}-apollo-${SYSTYPE}; exit; }

# Convex versions that predate uname can use getsysinfo(1)

if [ -x /usr/convex/getsysinfo ]
then
    case `getsysinfo -f cpu_type` in
    c1*)
	echo c1-convex-bsd
	exit ;;
    c2*)
	if getsysinfo -f scalar_acc
	then echo c32-convex-bsd
	else echo c2-convex-bsd
	fi
	exit ;;
    c34*)
	echo c34-convex-bsd
	exit ;;
    c38*)
	echo c38-convex-bsd
	exit ;;
    c4*)
	echo c4-convex-bsd
	exit ;;
    esac
fi

cat >&2 <<EOF
$0: unable to guess system type

This script, last modified $timestamp, has failed to recognize
the operating system you are using. It is advised that you
download the most up to date version of the config scripts from

  http://savannah.gnu.org/cgi-bin/viewcvs/*checkout*/config/config/config.guess
and
  http://savannah.gnu.org/cgi-bin/viewcvs/*checkout*/config/config/config.sub

If the version you run ($0) is already up to date, please
send the following data and any information you think might be
pertinent to <config-patches@gnu.org> in order to provide the needed
information to handle your system.

config.guess timestamp = $timestamp

uname -m = `(uname -m) 2>/dev/null || echo unknown`
uname -r = `(uname -r) 2>/dev/null || echo unknown`
uname -s = `(uname -s) 2>/dev/null || echo unknown`
uname -v = `(uname -v) 2>/dev/null || echo unknown`

/usr/bin/uname -p = `(/usr/bin/uname -p) 2>/dev/null`
/bin/uname -X     = `(/bin/uname -X) 2>/dev/null`

hostinfo               = `(hostinfo) 2>/dev/null`
/bin/universe          = `(/bin/universe) 2>/dev/null`
/usr/bin/arch -k       = `(/usr/bin/arch -k) 2>/dev/null`
/bin/arch              = `(/bin/arch) 2>/dev/null`
/usr/bin/oslevel       = `(/usr/bin/oslevel) 2>/dev/null`
/usr/convex/getsysinfo = `(/usr/convex/getsysinfo) 2>/dev/null`

UNAME_MACHINE = ${UNAME_MACHINE}
UNAME_RELEASE = ${UNAME_RELEASE}
UNAME_SYSTEM  = ${UNAME_SYSTEM}
UNAME_VERSION = ${UNAME_VERSION}
EOF

exit 1

# Local variables:
# eval: (add-hook 'write-file-hooks 'time-stamp)
# time-stamp-start: "timestamp='"
# time-stamp-format: "%:y-%02m-%02d"
# time-stamp-end: "'"
# End:
swish-e-2.4.7/config/ltmain.sh
# ltmain.sh - Provide generalized library-building support services.
# NOTE: Changing this file will not affect anything until you rerun configure.
#
# Copyright (C) 1996, 1997, 1998, 1999, 2000, 2001, 2003, 2004, 2005
# Free Software Foundation, Inc.
# Originally by Gordon Matzigkeit <gord@gnu.ai.mit.edu>, 1996
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful, but
# WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
# General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
#
# As a special exception to the GNU General Public License, if you
# distribute this file as part of a program that contains a
# configuration script generated by Autoconf, you may include it under
# the same distribution terms that you use for the rest of that program.

basename="s,^.*/,,g"

# Work around backward compatibility issue on IRIX 6.5. On IRIX 6.4+, sh
# is ksh but when the shell is invoked as "sh" and the current value of
# the _XPG environment variable is not equal to 1 (one), the special
# positional parameter $0, within a function call, is the name of the
# function.
progpath="$0"

# The name of this program:
progname=`echo "$progpath" | $SED $basename`
modename="$progname"

# Global variables:
EXIT_SUCCESS=0
EXIT_FAILURE=1

PROGRAM=ltmain.sh
PACKAGE=libtool
VERSION=1.5.22
TIMESTAMP=" (1.1220.2.365 2005/12/18 22:14:06)"

# See if we are running on zsh, and set the options which allow our
# commands through without removal of \ escapes.
if test -n "${ZSH_VERSION+set}" ; then
  setopt NO_GLOB_SUBST
fi

# Check that we have a working $echo.
if test "X$1" = X--no-reexec; then
  # Discard the --no-reexec flag, and continue.
  shift
elif test "X$1" = X--fallback-echo; then
  # Avoid inline document here, it may be left over
  :
elif test "X`($echo '\t') 2>/dev/null`" = 'X\t'; then
  # Yippee, $echo works!
  :
else
  # Restart under the correct shell, and then maybe $echo will work.
  exec $SHELL "$progpath" --no-reexec ${1+"$@"}
fi

if test "X$1" = X--fallback-echo; then
  # used as fallback echo
  shift
  cat <<EOF
$*
EOF
  exit $EXIT_SUCCESS
fi

default_mode=
help="Try \`$progname --help' for more information."
magic="%%%MAGIC variable%%%"
mkdir="mkdir"
mv="mv -f"
rm="rm -f"

# Sed substitution that helps us do robust quoting.  It backslashifies
# metacharacters that are still active within double-quoted strings.
Xsed="${SED}"' -e 1s/^X//'
sed_quote_subst='s/\([\\`\\"$\\\\]\)/\\\1/g'
# test EBCDIC or ASCII
case `echo X|tr X '\101'` in
 A) # ASCII based system
    # \n is not interpreted correctly by Solaris 8 /usr/ucb/tr
  SP2NL='tr \040 \012'
  NL2SP='tr \015\012 \040\040'
  ;;
 *) # EBCDIC based system
  SP2NL='tr \100 \n'
  NL2SP='tr \r\n \100\100'
  ;;
esac

# NLS nuisances.
# Only set LANG and LC_ALL to C if already set.
# These must not be set unconditionally because not all systems understand
# e.g. LANG=C (notably SCO).
# We save the old values to restore during execute mode.
if test "${LC_ALL+set}" = set; then
  save_LC_ALL="$LC_ALL"; LC_ALL=C; export LC_ALL
fi
if test "${LANG+set}" = set; then
  save_LANG="$LANG"; LANG=C; export LANG
fi

# Make sure IFS has a sensible default
lt_nl='
'
IFS=" 	$lt_nl"

if test "$build_libtool_libs" != yes && test "$build_old_libs" != yes; then
  $echo "$modename: not configured to build any kind of library" 1>&2
  $echo "Fatal configuration error.  See the $PACKAGE docs for more information." 1>&2
  exit $EXIT_FAILURE
fi

# Global variables.
mode=$default_mode
nonopt=
prev=
prevopt=
run=
show="$echo"
show_help=
execute_dlfiles=
duplicate_deps=no
preserve_args=
lo2o="s/\\.lo\$/.${objext}/"
o2lo="s/\\.${objext}\$/.lo/"

#####################################
# Shell function definitions:
# This seems to be the best place for them

# func_mktempdir [string]
# Make a temporary directory that won't clash with other running
# libtool processes, and avoids race conditions if possible.  If
# given, STRING is the basename for that directory.
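#
# Illustrative usage (a sketch, not part of the upstream script; the
# variable `my_work' and the prefix `myprefix' are hypothetical):
#   my_work=`func_mktempdir myprefix`
#   # ... use "$my_work", then clean up with: ${rm}r "$my_work"
# The directory name is printed on stdout, so the function is meant to
# be called through command substitution.  In dry-run mode (run=":")
# only a name is returned and no directory is created.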
func_mktempdir ()
{
    my_template="${TMPDIR-/tmp}/${1-$progname}"

    if test "$run" = ":"; then
      # Return a directory name, but don't create it in dry-run mode
      my_tmpdir="${my_template}-$$"
    else

      # If mktemp works, use that first and foremost
      my_tmpdir=`mktemp -d "${my_template}-XXXXXXXX" 2>/dev/null`

      if test ! -d "$my_tmpdir"; then
	# Failing that, at least try and use $RANDOM to avoid a race
	my_tmpdir="${my_template}-${RANDOM-0}$$"

	save_mktempdir_umask=`umask`
	umask 0077
	$mkdir "$my_tmpdir"
	umask $save_mktempdir_umask
      fi

      # If we're not in dry-run mode, bomb out on failure
      test -d "$my_tmpdir" || {
        $echo "cannot create temporary directory \`$my_tmpdir'" 1>&2
	exit $EXIT_FAILURE
      }
    fi

    $echo "X$my_tmpdir" | $Xsed
}


# func_win32_libid arg
# return the library type of file 'arg'
#
# Need a lot of goo to handle *both* DLLs and import libs
# Has to be a shell function in order to 'eat' the argument
# that is supplied when $file_magic_command is called.
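#
# Illustrative usage (a sketch; `libfoo.dll.a' is a hypothetical file
# and `libtype' a hypothetical variable):
#   libtype=`func_win32_libid libfoo.dll.a`
# The string printed on stdout is one of "x86 archive import",
# "x86 archive static", "x86 DLL" or "unknown".  The function relies on
# $echo, $SED, $EGREP, $OBJDUMP and $NM having been set by the
# generated libtool script.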
func_win32_libid ()
{
  win32_libid_type="unknown"
  win32_fileres=`file -L $1 2>/dev/null`
  case $win32_fileres in
  *ar\ archive\ import\ library*) # definitely import
    win32_libid_type="x86 archive import"
    ;;
  *ar\ archive*) # could be an import, or static
    if eval $OBJDUMP -f $1 | $SED -e '10q' 2>/dev/null | \
      $EGREP -e 'file format pe-i386(.*architecture: i386)?' >/dev/null ; then
      win32_nmres=`eval $NM -f posix -A $1 | \
	$SED -n -e '1,100{/ I /{s,.*,import,;p;q;};}'`
      case $win32_nmres in
      import*)  win32_libid_type="x86 archive import";;
      *)        win32_libid_type="x86 archive static";;
      esac
    fi
    ;;
  *DLL*)
    win32_libid_type="x86 DLL"
    ;;
  *executable*) # but shell scripts are "executable" too...
    case $win32_fileres in
    *MS\ Windows\ PE\ Intel*)
      win32_libid_type="x86 DLL"
      ;;
    esac
    ;;
  esac
  $echo $win32_libid_type
}


# func_infer_tag arg
# Infer tagged configuration to use if any are available and
# if one wasn't chosen via the "--tag" command line option.
# Only attempt this if the compiler in the base compile
# command doesn't match the default compiler.
# arg is usually of the form 'gcc ...'
func_infer_tag ()
{
    if test -n "$available_tags" && test -z "$tagname"; then
      CC_quoted=
      for arg in $CC; do
	case $arg in
	  *[\[\~\#\^\&\*\(\)\{\}\|\;\<\>\?\'\ \	]*|*]*|"")
	  arg="\"$arg\""
	  ;;
	esac
	CC_quoted="$CC_quoted $arg"
      done
      case $@ in
      # Blanks in the command may have been stripped by the calling shell,
      # but not from the CC environment variable when configure was run.
      " $CC "* | "$CC "* | " `$echo $CC` "* | "`$echo $CC` "* | " $CC_quoted"* | "$CC_quoted "* | " `$echo $CC_quoted` "* | "`$echo $CC_quoted` "*) ;;
      # Blanks at the start of $base_compile will cause this to fail
      # if we don't check for them as well.
      *)
	for z in $available_tags; do
	  if grep "^# ### BEGIN LIBTOOL TAG CONFIG: $z$" < "$progpath" > /dev/null; then
	    # Evaluate the configuration.
	    eval "`${SED} -n -e '/^# ### BEGIN LIBTOOL TAG CONFIG: '$z'$/,/^# ### END LIBTOOL TAG CONFIG: '$z'$/p' < $progpath`"
	    CC_quoted=
	    for arg in $CC; do
	    # Double-quote args containing other shell metacharacters.
	    case $arg in
	      *[\[\~\#\^\&\*\(\)\{\}\|\;\<\>\?\'\ \	]*|*]*|"")
	      arg="\"$arg\""
	      ;;
	    esac
	    CC_quoted="$CC_quoted $arg"
	  done
	    case "$@ " in
	      " $CC "* | "$CC "* | " `$echo $CC` "* | "`$echo $CC` "* | " $CC_quoted"* | "$CC_quoted "* | " `$echo $CC_quoted` "* | "`$echo $CC_quoted` "*)
	      # The compiler in the base compile command matches
	      # the one in the tagged configuration.
	      # Assume this is the tagged configuration we want.
	      tagname=$z
	      break
	      ;;
	    esac
	  fi
	done
	# If $tagname still isn't set, then no tagged configuration
	# was found; let the user know that the "--tag" command
	# line option must be used.
	if test -z "$tagname"; then
	  $echo "$modename: unable to infer tagged configuration" 1>&2
	  $echo "$modename: specify a tag with \`--tag'" 1>&2
	  exit $EXIT_FAILURE
#        else
#          $echo "$modename: using $tagname tagged configuration"
	fi
	;;
      esac
    fi
}


# func_extract_an_archive dir oldlib
func_extract_an_archive ()
{
    f_ex_an_ar_dir="$1"; shift
    f_ex_an_ar_oldlib="$1"

    $show "(cd $f_ex_an_ar_dir && $AR x $f_ex_an_ar_oldlib)"
    $run eval "(cd \$f_ex_an_ar_dir && $AR x \$f_ex_an_ar_oldlib)" || exit $?
    if ($AR t "$f_ex_an_ar_oldlib" | sort | sort -uc >/dev/null 2>&1); then
     :
    else
      $echo "$modename: ERROR: object name conflicts: $f_ex_an_ar_dir/$f_ex_an_ar_oldlib" 1>&2
      exit $EXIT_FAILURE
    fi
}

# func_extract_archives gentop oldlib ...
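# Extract the objects of each OLDLIB into its own subdirectory of
# GENTOP and collect their names.
#
# Illustrative usage (a sketch; the directory `./extracted', the
# archive names and the variable `my_objs' are hypothetical):
#   func_extract_archives ./extracted libfoo.a libbar.a
#   my_objs="$my_objs $func_extract_archives_result"
# The extracted object names are returned as a blank-separated list in
# $func_extract_archives_result rather than on stdout.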
func_extract_archives ()
{
    my_gentop="$1"; shift
    my_oldlibs=${1+"$@"}
    my_oldobjs=""
    my_xlib=""
    my_xabs=""
    my_xdir=""
    my_status=""

    $show "${rm}r $my_gentop"
    $run ${rm}r "$my_gentop"
    $show "$mkdir $my_gentop"
    $run $mkdir "$my_gentop"
    my_status=$?
    if test "$my_status" -ne 0 && test ! -d "$my_gentop"; then
      exit $my_status
    fi

    for my_xlib in $my_oldlibs; do
      # Extract the objects.
      case $my_xlib in
	[\\/]* | [A-Za-z]:[\\/]*) my_xabs="$my_xlib" ;;
	*) my_xabs=`pwd`"/$my_xlib" ;;
      esac
      my_xlib=`$echo "X$my_xlib" | $Xsed -e 's%^.*/%%'`
      my_xdir="$my_gentop/$my_xlib"

      $show "${rm}r $my_xdir"
      $run ${rm}r "$my_xdir"
      $show "$mkdir $my_xdir"
      $run $mkdir "$my_xdir"
      exit_status=$?
      if test "$exit_status" -ne 0 && test ! -d "$my_xdir"; then
	exit $exit_status
      fi
      case $host in
      *-darwin*)
	$show "Extracting $my_xabs"
	# Do not bother doing anything if just a dry run
	if test -z "$run"; then
	  darwin_orig_dir=`pwd`
	  cd $my_xdir || exit $?
	  darwin_archive=$my_xabs
	  darwin_curdir=`pwd`
	  darwin_base_archive=`$echo "X$darwin_archive" | $Xsed -e 's%^.*/%%'`
	  darwin_arches=`lipo -info "$darwin_archive" 2>/dev/null | $EGREP Architectures 2>/dev/null`
	  if test -n "$darwin_arches"; then 
	    darwin_arches=`echo "$darwin_arches" | $SED -e 's/.*are://'`
	    darwin_arch=
	    $show "$darwin_base_archive has multiple architectures $darwin_arches"
	    for darwin_arch in  $darwin_arches ; do
	      mkdir -p "unfat-$$/${darwin_base_archive}-${darwin_arch}"
	      lipo -thin $darwin_arch -output "unfat-$$/${darwin_base_archive}-${darwin_arch}/${darwin_base_archive}" "${darwin_archive}"
	      cd "unfat-$$/${darwin_base_archive}-${darwin_arch}"
	      func_extract_an_archive "`pwd`" "${darwin_base_archive}"
	      cd "$darwin_curdir"
	      $rm "unfat-$$/${darwin_base_archive}-${darwin_arch}/${darwin_base_archive}"
	    done # $darwin_arches
      ## Okay now we have a bunch of thin objects, gotta fatten them up :)
	    darwin_filelist=`find unfat-$$ -type f -name \*.o -print -o -name \*.lo -print| xargs basename | sort -u | $NL2SP`
	    darwin_file=
	    darwin_files=
	    for darwin_file in $darwin_filelist; do
	      darwin_files=`find unfat-$$ -name $darwin_file -print | $NL2SP`
	      lipo -create -output "$darwin_file" $darwin_files
	    done # $darwin_filelist
	    ${rm}r unfat-$$
	    cd "$darwin_orig_dir"
	  else
	    cd "$darwin_orig_dir"
 	    func_extract_an_archive "$my_xdir" "$my_xabs"
	  fi # $darwin_arches
	fi # $run
	;;
      *)
        func_extract_an_archive "$my_xdir" "$my_xabs"
        ;;
      esac
      my_oldobjs="$my_oldobjs "`find $my_xdir -name \*.$objext -print -o -name \*.lo -print | $NL2SP`
    done
    func_extract_archives_result="$my_oldobjs"
}
# End of Shell function definitions
#####################################

# shrext_cmds may contain shell commands (for example on Darwin),
# so expand it with eval.
eval std_shrext=\"$shrext_cmds\"

disable_libs=no

# Parse our command line options once, thoroughly.
while test "$#" -gt 0
do
  arg="$1"
  shift

  case $arg in
  -*=*) optarg=`$echo "X$arg" | $Xsed -e 's/[-_a-zA-Z0-9]*=//'` ;;
  *) optarg= ;;
  esac

  # If the previous option needs an argument, assign it.
  if test -n "$prev"; then
    case $prev in
    execute_dlfiles)
      execute_dlfiles="$execute_dlfiles $arg"
      ;;
    tag)
      tagname="$arg"
      preserve_args="${preserve_args}=$arg"

      # Check whether tagname contains only valid characters
      case $tagname in
      *[!-_A-Za-z0-9,/]*)
	$echo "$progname: invalid tag name: $tagname" 1>&2
	exit $EXIT_FAILURE
	;;
      esac

      case $tagname in
      CC)
	# Don't test for the "default" C tag: we know it's there, but it is
	# not specially marked.
	;;
      *)
	if grep "^# ### BEGIN LIBTOOL TAG CONFIG: $tagname$" < "$progpath" > /dev/null; then
	  taglist="$taglist $tagname"
	  # Evaluate the configuration.
	  eval "`${SED} -n -e '/^# ### BEGIN LIBTOOL TAG CONFIG: '$tagname'$/,/^# ### END LIBTOOL TAG CONFIG: '$tagname'$/p' < $progpath`"
	else
	  $echo "$progname: ignoring unknown tag $tagname" 1>&2
	fi
	;;
      esac
      ;;
    *)
      eval "$prev=\$arg"
      ;;
    esac

    prev=
    prevopt=
    continue
  fi

  # Have we seen a non-optional argument yet?
  case $arg in
  --help)
    show_help=yes
    ;;

  --version)
    $echo "$PROGRAM (GNU $PACKAGE) $VERSION$TIMESTAMP"
    $echo
    $echo "Copyright (C) 2005  Free Software Foundation, Inc."
    $echo "This is free software; see the source for copying conditions.  There is NO"
    $echo "warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE."
    exit $?
    ;;

  --config)
    ${SED} -e '1,/^# ### BEGIN LIBTOOL CONFIG/d' -e '/^# ### END LIBTOOL CONFIG/,$d' $progpath
    # Now print the configurations for the tags.
    for tagname in $taglist; do
      ${SED} -n -e "/^# ### BEGIN LIBTOOL TAG CONFIG: $tagname$/,/^# ### END LIBTOOL TAG CONFIG: $tagname$/p" < "$progpath"
    done
    exit $?
    ;;

  --debug)
    $echo "$progname: enabling shell trace mode"
    set -x
    preserve_args="$preserve_args $arg"
    ;;

  --dry-run | -n)
    run=:
    ;;

  --features)
    $echo "host: $host"
    if test "$build_libtool_libs" = yes; then
      $echo "enable shared libraries"
    else
      $echo "disable shared libraries"
    fi
    if test "$build_old_libs" = yes; then
      $echo "enable static libraries"
    else
      $echo "disable static libraries"
    fi
    exit $?
    ;;

  --finish) mode="finish" ;;

  --mode) prevopt="--mode" prev=mode ;;
  --mode=*) mode="$optarg" ;;

  --preserve-dup-deps) duplicate_deps="yes" ;;

  --quiet | --silent)
    show=:
    preserve_args="$preserve_args $arg"
    ;;

  --tag)
    prevopt="--tag"
    prev=tag
    preserve_args="$preserve_args --tag"
    ;;
  --tag=*)
    set tag "$optarg" ${1+"$@"}
    shift
    prev=tag
    preserve_args="$preserve_args --tag"
    ;;

  -dlopen)
    prevopt="-dlopen"
    prev=execute_dlfiles
    ;;

  -*)
    $echo "$modename: unrecognized option \`$arg'" 1>&2
    $echo "$help" 1>&2
    exit $EXIT_FAILURE
    ;;

  *)
    nonopt="$arg"
    break
    ;;
  esac
done

if test -n "$prevopt"; then
  $echo "$modename: option \`$prevopt' requires an argument" 1>&2
  $echo "$help" 1>&2
  exit $EXIT_FAILURE
fi

case $disable_libs in
no) 
  ;;
shared)
  build_libtool_libs=no
  build_old_libs=yes
  ;;
static)
  build_old_libs=`case $build_libtool_libs in yes) echo no;; *) echo yes;; esac`
  ;;
esac

# If this variable is set in any of the actions, the command in it
# will be execed at the end.  This prevents here-documents from being
# left over by shells.
exec_cmd=

if test -z "$show_help"; then

  # Infer the operation mode.
  if test -z "$mode"; then
    $echo "*** Warning: inferring the mode of operation is deprecated." 1>&2
    $echo "*** Future versions of Libtool will require --mode=MODE be specified." 1>&2
    case $nonopt in
    *cc | cc* | *++ | gcc* | *-gcc* | g++* | xlc*)
      mode=link
      for arg
      do
	case $arg in
	-c)
	   mode=compile
	   break
	   ;;
	esac
      done
      ;;
    *db | *dbx | *strace | *truss)
      mode=execute
      ;;
    *install*|cp|mv)
      mode=install
      ;;
    *rm)
      mode=uninstall
      ;;
    *)
      # If we have no mode, but dlfiles were specified, then do execute mode.
      test -n "$execute_dlfiles" && mode=execute

      # Just use the default operation mode.
      if test -z "$mode"; then
	if test -n "$nonopt"; then
	  $echo "$modename: warning: cannot infer operation mode from \`$nonopt'" 1>&2
	else
	  $echo "$modename: warning: cannot infer operation mode without MODE-ARGS" 1>&2
	fi
      fi
      ;;
    esac
  fi

  # Only execute mode is allowed to have -dlopen flags.
  if test -n "$execute_dlfiles" && test "$mode" != execute; then
    $echo "$modename: unrecognized option \`-dlopen'" 1>&2
    $echo "$help" 1>&2
    exit $EXIT_FAILURE
  fi

  # Change the help message to a mode-specific one.
  generic_help="$help"
  help="Try \`$modename --help --mode=$mode' for more information."

  # These modes are in order of execution frequency so that they run quickly.
  case $mode in
  # libtool compile mode
  compile)
    modename="$modename: compile"
    # Get the compilation command and the source file.
    base_compile=
    srcfile="$nonopt"  #  always keep a non-empty value in "srcfile"
    suppress_opt=yes
    suppress_output=
    arg_mode=normal
    libobj=
    later=

    for arg
    do
      case $arg_mode in
      arg  )
	# do not "continue".  Instead, add this to base_compile
	lastarg="$arg"
	arg_mode=normal
	;;

      target )
	libobj="$arg"
	arg_mode=normal
	continue
	;;

      normal )
	# Accept any command-line options.
	case $arg in
	-o)
	  if test -n "$libobj" ; then
	    $echo "$modename: you cannot specify \`-o' more than once" 1>&2
	    exit $EXIT_FAILURE
	  fi
	  arg_mode=target
	  continue
	  ;;

	-static | -prefer-pic | -prefer-non-pic)
	  later="$later $arg"
	  continue
	  ;;

	-no-suppress)
	  suppress_opt=no
	  continue
	  ;;

	-Xcompiler)
	  arg_mode=arg  #  the next one goes into the "base_compile" arg list
	  continue      #  The current "srcfile" will either be retained or
	  ;;            #  replaced later.  I would guess that would be a bug.

	-Wc,*)
	  args=`$echo "X$arg" | $Xsed -e "s/^-Wc,//"`
	  lastarg=
	  save_ifs="$IFS"; IFS=','
	  for arg in $args; do
	    IFS="$save_ifs"

	    # Double-quote args containing other shell metacharacters.
	    # Many Bourne shells cannot handle close brackets correctly
	    # in scan sets, so we specify it separately.
	    case $arg in
	      *[\[\~\#\^\&\*\(\)\{\}\|\;\<\>\?\'\ \	]*|*]*|"")
	      arg="\"$arg\""
	      ;;
	    esac
	    lastarg="$lastarg $arg"
	  done
	  IFS="$save_ifs"
	  lastarg=`$echo "X$lastarg" | $Xsed -e "s/^ //"`

	  # Add the arguments to base_compile.
	  base_compile="$base_compile $lastarg"
	  continue
	  ;;

	* )
	  # Accept the current argument as the source file.
	  # The previous "srcfile" becomes the current argument.
	  #
	  lastarg="$srcfile"
	  srcfile="$arg"
	  ;;
	esac  #  case $arg
	;;
      esac    #  case $arg_mode

      # Aesthetically quote the previous argument.
      lastarg=`$echo "X$lastarg" | $Xsed -e "$sed_quote_subst"`

      case $lastarg in
      # Double-quote args containing other shell metacharacters.
      # Many Bourne shells cannot handle close brackets correctly
      # in scan sets, and some SunOS ksh mistreat backslash-escaping
      # in scan sets (worked around with variable expansion),
      # and furthermore cannot handle '|' '&' '(' ')' in scan sets 
      # at all, so we specify them separately.
      *[\[\~\#\^\&\*\(\)\{\}\|\;\<\>\?\'\ \	]*|*]*|"")
	lastarg="\"$lastarg\""
	;;
      esac

      base_compile="$base_compile $lastarg"
    done # for arg

    case $arg_mode in
    arg)
      $echo "$modename: you must specify an argument for \`-Xcompiler'" 1>&2
      exit $EXIT_FAILURE
      ;;
    target)
      $echo "$modename: you must specify a target with \`-o'" 1>&2
      exit $EXIT_FAILURE
      ;;
    *)
      # Get the name of the library object.
      [ -z "$libobj" ] && libobj=`$echo "X$srcfile" | $Xsed -e 's%^.*/%%'`
      ;;
    esac

    # Recognize several different file suffixes.
    # If the user specifies -o file.o, it is replaced with file.lo
    xform='[cCFSifmso]'
    case $libobj in
    *.ada) xform=ada ;;
    *.adb) xform=adb ;;
    *.ads) xform=ads ;;
    *.asm) xform=asm ;;
    *.c++) xform=c++ ;;
    *.cc) xform=cc ;;
    *.ii) xform=ii ;;
    *.class) xform=class ;;
    *.cpp) xform=cpp ;;
    *.cxx) xform=cxx ;;
    *.f90) xform=f90 ;;
    *.for) xform=for ;;
    *.java) xform=java ;;
    esac

    libobj=`$echo "X$libobj" | $Xsed -e "s/\.$xform$/.lo/"`

    case $libobj in
    *.lo) obj=`$echo "X$libobj" | $Xsed -e "$lo2o"` ;;
    *)
      $echo "$modename: cannot determine name of library object from \`$libobj'" 1>&2
      exit $EXIT_FAILURE
      ;;
    esac

    func_infer_tag $base_compile

    for arg in $later; do
      case $arg in
      -static)
	build_old_libs=yes
	continue
	;;

      -prefer-pic)
	pic_mode=yes
	continue
	;;

      -prefer-non-pic)
	pic_mode=no
	continue
	;;
      esac
    done

    qlibobj=`$echo "X$libobj" | $Xsed -e "$sed_quote_subst"`
    case $qlibobj in
      *[\[\~\#\^\&\*\(\)\{\}\|\;\<\>\?\'\ \	]*|*]*|"")
	qlibobj="\"$qlibobj\"" ;;
    esac
    test "X$libobj" != "X$qlibobj" \
	&& $echo "X$libobj" | grep '[]~#^*{};<>?"'"'"' 	&()|`$[]' \
	&& $echo "$modename: libobj name \`$libobj' may not contain shell special characters." 1>&2
    objname=`$echo "X$obj" | $Xsed -e 's%^.*/%%'`
    xdir=`$echo "X$obj" | $Xsed -e 's%/[^/]*$%%'`
    if test "X$xdir" = "X$obj"; then
      xdir=
    else
      xdir=$xdir/
    fi
    lobj=${xdir}$objdir/$objname

    if test -z "$base_compile"; then
      $echo "$modename: you must specify a compilation command" 1>&2
      $echo "$help" 1>&2
      exit $EXIT_FAILURE
    fi

    # Delete any leftover library objects.
    if test "$build_old_libs" = yes; then
      removelist="$obj $lobj $libobj ${libobj}T"
    else
      removelist="$lobj $libobj ${libobj}T"
    fi

    $run $rm $removelist
    trap "$run $rm $removelist; exit $EXIT_FAILURE" 1 2 15

    # On Cygwin there's no "real" PIC flag so we must build both object types
    case $host_os in
    cygwin* | mingw* | pw32* | os2*)
      pic_mode=default
      ;;
    esac
    if test "$pic_mode" = no && test "$deplibs_check_method" != pass_all; then
      # non-PIC code in shared libraries is not supported
      pic_mode=default
    fi

    # Calculate the filename of the output object if compiler does
    # not support -o with -c
    if test "$compiler_c_o" = no; then
      output_obj=`$echo "X$srcfile" | $Xsed -e 's%^.*/%%' -e 's%\.[^.]*$%%'`.${objext}
      lockfile="$output_obj.lock"
      removelist="$removelist $output_obj $lockfile"
      trap "$run $rm $removelist; exit $EXIT_FAILURE" 1 2 15
    else
      output_obj=
      need_locks=no
      lockfile=
    fi

    # Lock this critical section if it is needed.
    # We hard-link this script itself as the lock file, which avoids
    # creating a new file.
    if test "$need_locks" = yes; then
      until $run ln "$progpath" "$lockfile" 2>/dev/null; do
	$show "Waiting for $lockfile to be removed"
	sleep 2
      done
    elif test "$need_locks" = warn; then
      if test -f "$lockfile"; then
	$echo "\
*** ERROR, $lockfile exists and contains:
`cat $lockfile 2>/dev/null`

This indicates that another process is trying to use the same
temporary object file, and libtool could not work around it because
your compiler does not support \`-c' and \`-o' together.  If you
repeat this compilation, it may succeed, by chance, but you had better
avoid parallel builds (make -j) on this platform, or get a better
compiler."

	$run $rm $removelist
	exit $EXIT_FAILURE
      fi
      $echo "$srcfile" > "$lockfile"
    fi

    if test -n "$fix_srcfile_path"; then
      eval srcfile=\"$fix_srcfile_path\"
    fi
    qsrcfile=`$echo "X$srcfile" | $Xsed -e "$sed_quote_subst"`
    case $qsrcfile in
      *[\[\~\#\^\&\*\(\)\{\}\|\;\<\>\?\'\ \	]*|*]*|"")
      qsrcfile="\"$qsrcfile\"" ;;
    esac

    $run $rm "$libobj" "${libobj}T"

    # Create a libtool object file (analogous to a ".la" file),
    # but don't create it if we're doing a dry run.
    test -z "$run" && cat > ${libobj}T <<EOF
# $libobj - a libtool object file
# Generated by $PROGRAM - GNU $PACKAGE $VERSION$TIMESTAMP
#
# Please DO NOT delete this file!
# It is necessary for linking the library.

# Name of the PIC object.
EOF

    # Only build a PIC object if we are building libtool libraries.
    if test "$build_libtool_libs" = yes; then
      # Without this assignment, base_compile gets emptied.
      fbsd_hideous_sh_bug=$base_compile

      if test "$pic_mode" != no; then
	command="$base_compile $qsrcfile $pic_flag"
      else
	# Don't build PIC code
	command="$base_compile $qsrcfile"
      fi

      if test ! -d "${xdir}$objdir"; then
	$show "$mkdir ${xdir}$objdir"
	$run $mkdir ${xdir}$objdir
	exit_status=$?
	if test "$exit_status" -ne 0 && test ! -d "${xdir}$objdir"; then
	  exit $exit_status
	fi
      fi

      if test -z "$output_obj"; then
	# Place PIC objects in $objdir
	command="$command -o $lobj"
      fi

      $run $rm "$lobj" "$output_obj"

      $show "$command"
      if $run eval "$command"; then :
      else
	test -n "$output_obj" && $run $rm $removelist
	exit $EXIT_FAILURE
      fi

      if test "$need_locks" = warn &&
	 test "X`cat $lockfile 2>/dev/null`" != "X$srcfile"; then
	$echo "\
*** ERROR, $lockfile contains:
`cat $lockfile 2>/dev/null`

but it should contain:
$srcfile

This indicates that another process is trying to use the same
temporary object file, and libtool could not work around it because
your compiler does not support \`-c' and \`-o' together.  If you
repeat this compilation, it may succeed, by chance, but you had better
avoid parallel builds (make -j) on this platform, or get a better
compiler."

	$run $rm $removelist
	exit $EXIT_FAILURE
      fi

      # Just move the object if needed, then go on to compile the next one
      if test -n "$output_obj" && test "X$output_obj" != "X$lobj"; then
	$show "$mv $output_obj $lobj"
	if $run $mv $output_obj $lobj; then :
	else
	  error=$?
	  $run $rm $removelist
	  exit $error
	fi
      fi

      # Append the name of the PIC object to the libtool object file.
      test -z "$run" && cat >> ${libobj}T <<EOF
pic_object='$objdir/$objname'

EOF

      # Allow error messages only from the first compilation.
      if test "$suppress_opt" = yes; then
        suppress_output=' >/dev/null 2>&1'
      fi
    else
      # No PIC object so indicate it doesn't exist in the libtool
      # object file.
      test -z "$run" && cat >> ${libobj}T <<EOF
pic_object=none

EOF
    fi

    # Only build a position-dependent object if we build old libraries.
    if test "$build_old_libs" = yes; then
      if test "$pic_mode" != yes; then
	# Don't build PIC code
	command="$base_compile $qsrcfile"
      else
	command="$base_compile $qsrcfile $pic_flag"
      fi
      if test "$compiler_c_o" = yes; then
	command="$command -o $obj"
      fi

      # Suppress compiler output if we already did a PIC compilation.
      command="$command$suppress_output"
      $run $rm "$obj" "$output_obj"
      $show "$command"
      if $run eval "$command"; then :
      else
	$run $rm $removelist
	exit $EXIT_FAILURE
      fi

      if test "$need_locks" = warn &&
	 test "X`cat $lockfile 2>/dev/null`" != "X$srcfile"; then
	$echo "\
*** ERROR, $lockfile contains:
`cat $lockfile 2>/dev/null`

but it should contain:
$srcfile

This indicates that another process is trying to use the same
temporary object file, and libtool could not work around it because
your compiler does not support \`-c' and \`-o' together.  If you
repeat this compilation, it may succeed, by chance, but you had better
avoid parallel builds (make -j) on this platform, or get a better
compiler."

	$run $rm $removelist
	exit $EXIT_FAILURE
      fi

      # Just move the object if needed
      if test -n "$output_obj" && test "X$output_obj" != "X$obj"; then
	$show "$mv $output_obj $obj"
	if $run $mv $output_obj $obj; then :
	else
	  error=$?
	  $run $rm $removelist
	  exit $error
	fi
      fi

      # Append the name of the non-PIC object to the libtool object file.
      # Only append if the libtool object file exists.
      test -z "$run" && cat >> ${libobj}T <<EOF
# Name of the non-PIC object.
non_pic_object='$objname'

EOF
    else
      # No non-PIC object, so indicate that it doesn't exist in the
      # libtool object file.
      test -z "$run" && cat >> ${libobj}T <<EOF
# Name of the non-PIC object.
non_pic_object=none

EOF
    fi

    $run $mv "${libobj}T" "${libobj}"

    # Unlock the critical section if it was locked
    if test "$need_locks" != no; then
      $run $rm "$lockfile"
    fi

    exit $EXIT_SUCCESS
    ;;

  # libtool link mode
  link | relink)
    modename="$modename: link"
    case $host in
    *-*-cygwin* | *-*-mingw* | *-*-pw32* | *-*-os2*)
      # It is impossible to link a dll without this setting, and
      # we shouldn't force the makefile maintainer to figure out
      # which system we are compiling for in order to pass an extra
      # flag for every libtool invocation.
      # allow_undefined=no

      # FIXME: Unfortunately, there are problems with the above when trying
      # to make a dll which has undefined symbols, in which case not
      # even a static library is built.  For now, we need to specify
      # -no-undefined on the libtool link line when we can be certain
      # that all symbols are satisfied, otherwise we get a static library.
      allow_undefined=yes
      ;;
    *)
      allow_undefined=yes
      ;;
    esac
    libtool_args="$nonopt"
    base_compile="$nonopt $@"
    compile_command="$nonopt"
    finalize_command="$nonopt"

    compile_rpath=
    finalize_rpath=
    compile_shlibpath=
    finalize_shlibpath=
    convenience=
    old_convenience=
    deplibs=
    old_deplibs=
    compiler_flags=
    linker_flags=
    dllsearchpath=
    lib_search_path=`pwd`
    inst_prefix_dir=

    avoid_version=no
    dlfiles=
    dlprefiles=
    dlself=no
    export_dynamic=no
    export_symbols=
    export_symbols_regex=
    generated=
    libobjs=
    ltlibs=
    module=no
    no_install=no
    objs=
    non_pic_objects=
    notinst_path= # paths that contain not-installed libtool libraries
    precious_files_regex=
    prefer_static_libs=no
    preload=no
    prev=
    prevarg=
    release=
    rpath=
    xrpath=
    perm_rpath=
    temp_rpath=
    thread_safe=no
    vinfo=
    vinfo_number=no

    func_infer_tag $base_compile

    # We need to know -static, to get the right output filenames.
    for arg
    do
      case $arg in
      -all-static | -static)
	if test "X$arg" = "X-all-static"; then
	  if test "$build_libtool_libs" = yes && test -z "$link_static_flag"; then
	    $echo "$modename: warning: complete static linking is impossible in this configuration" 1>&2
	  fi
	  if test -n "$link_static_flag"; then
	    dlopen_self=$dlopen_self_static
	  fi
	  prefer_static_libs=yes
	else
	  if test -z "$pic_flag" && test -n "$link_static_flag"; then
	    dlopen_self=$dlopen_self_static
	  fi
	  prefer_static_libs=built
	fi
	build_libtool_libs=no
	build_old_libs=yes
	break
	;;
      esac
    done

    # See if our shared archives depend on static archives.
    test -n "$old_archive_from_new_cmds" && build_old_libs=yes

    # Go through the arguments, transforming them on the way.
    while test "$#" -gt 0; do
      arg="$1"
      shift
      case $arg in
      *[\[\~\#\^\&\*\(\)\{\}\|\;\<\>\?\'\ \	]*|*]*|"")
	qarg=\"`$echo "X$arg" | $Xsed -e "$sed_quote_subst"`\" ### testsuite: skip nested quoting test
	;;
      *) qarg=$arg ;;
      esac
      libtool_args="$libtool_args $qarg"

      # If the previous option needs an argument, assign it.
      if test -n "$prev"; then
	case $prev in
	output)
	  compile_command="$compile_command @OUTPUT@"
	  finalize_command="$finalize_command @OUTPUT@"
	  ;;
	esac

	case $prev in
	dlfiles|dlprefiles)
	  if test "$preload" = no; then
	    # Add the symbol object into the linking commands.
	    compile_command="$compile_command @SYMFILE@"
	    finalize_command="$finalize_command @SYMFILE@"
	    preload=yes
	  fi
	  case $arg in
	  *.la | *.lo) ;;  # We handle these cases below.
	  force)
	    if test "$dlself" = no; then
	      dlself=needless
	      export_dynamic=yes
	    fi
	    prev=
	    continue
	    ;;
	  self)
	    if test "$prev" = dlprefiles; then
	      dlself=yes
	    elif test "$prev" = dlfiles && test "$dlopen_self" != yes; then
	      dlself=yes
	    else
	      dlself=needless
	      export_dynamic=yes
	    fi
	    prev=
	    continue
	    ;;
	  *)
	    if test "$prev" = dlfiles; then
	      dlfiles="$dlfiles $arg"
	    else
	      dlprefiles="$dlprefiles $arg"
	    fi
	    prev=
	    continue
	    ;;
	  esac
	  ;;
	expsyms)
	  export_symbols="$arg"
	  if test ! -f "$arg"; then
	    $echo "$modename: symbol file \`$arg' does not exist" 1>&2
	    exit $EXIT_FAILURE
	  fi
	  prev=
	  continue
	  ;;
	expsyms_regex)
	  export_symbols_regex="$arg"
	  prev=
	  continue
	  ;;
	inst_prefix)
	  inst_prefix_dir="$arg"
	  prev=
	  continue
	  ;;
	precious_regex)
	  precious_files_regex="$arg"
	  prev=
	  continue
	  ;;
	release)
	  release="-$arg"
	  prev=
	  continue
	  ;;
	objectlist)
	  if test -f "$arg"; then
	    save_arg=$arg
	    moreargs=
	    for fil in `cat $save_arg`
	    do
#	      moreargs="$moreargs $fil"
	      arg=$fil
	      # A libtool-controlled object.

	      # Check to see that this really is a libtool object.
	      if (${SED} -e '2q' $arg | grep "^# Generated by .*$PACKAGE") >/dev/null 2>&1; then
		pic_object=
		non_pic_object=

		# Read the .lo file
		# If there is no directory component, then add one.
		case $arg in
		*/* | *\\*) . $arg ;;
		*) . ./$arg ;;
		esac

		if test -z "$pic_object" || \
		   test -z "$non_pic_object" ||
		   test "$pic_object" = none && \
		   test "$non_pic_object" = none; then
		  $echo "$modename: cannot find name of object for \`$arg'" 1>&2
		  exit $EXIT_FAILURE
		fi

		# Extract subdirectory from the argument.
		xdir=`$echo "X$arg" | $Xsed -e 's%/[^/]*$%%'`
		if test "X$xdir" = "X$arg"; then
		  xdir=
		else
		  xdir="$xdir/"
		fi

		if test "$pic_object" != none; then
		  # Prepend the subdirectory the object is found in.
		  pic_object="$xdir$pic_object"

		  if test "$prev" = dlfiles; then
		    if test "$build_libtool_libs" = yes && test "$dlopen_support" = yes; then
		      dlfiles="$dlfiles $pic_object"
		      prev=
		      continue
		    else
		      # If libtool objects are unsupported, then we need to preload.
		      prev=dlprefiles
		    fi
		  fi

		  # CHECK ME:  I think I busted this.  -Ossama
		  if test "$prev" = dlprefiles; then
		    # Preload the old-style object.
		    dlprefiles="$dlprefiles $pic_object"
		    prev=
		  fi

		  # A PIC object.
		  libobjs="$libobjs $pic_object"
		  arg="$pic_object"
		fi

		# Non-PIC object.
		if test "$non_pic_object" != none; then
		  # Prepend the subdirectory the object is found in.
		  non_pic_object="$xdir$non_pic_object"

		  # A standard non-PIC object
		  non_pic_objects="$non_pic_objects $non_pic_object"
		  if test -z "$pic_object" || test "$pic_object" = none ; then
		    arg="$non_pic_object"
		  fi
		else
		  # If the PIC object exists, use it instead.
		  # $xdir was prepended to $pic_object above.
		  non_pic_object="$pic_object"
		  non_pic_objects="$non_pic_objects $non_pic_object"
		fi
	      else
		# Only an error if not doing a dry-run.
		if test -z "$run"; then
		  $echo "$modename: \`$arg' is not a valid libtool object" 1>&2
		  exit $EXIT_FAILURE
		else
		  # Dry-run case.

		  # Extract subdirectory from the argument.
		  xdir=`$echo "X$arg" | $Xsed -e 's%/[^/]*$%%'`
		  if test "X$xdir" = "X$arg"; then
		    xdir=
		  else
		    xdir="$xdir/"
		  fi

		  pic_object=`$echo "X${xdir}${objdir}/${arg}" | $Xsed -e "$lo2o"`
		  non_pic_object=`$echo "X${xdir}${arg}" | $Xsed -e "$lo2o"`
		  libobjs="$libobjs $pic_object"
		  non_pic_objects="$non_pic_objects $non_pic_object"
		fi
	      fi
	    done
	  else
	    $echo "$modename: link input file \`$arg' does not exist" 1>&2
	    exit $EXIT_FAILURE
	  fi
	  arg=$save_arg
	  prev=
	  continue
	  ;;
	rpath | xrpath)
	  # We need an absolute path.
	  case $arg in
	  [\\/]* | [A-Za-z]:[\\/]*) ;;
	  *)
	    $echo "$modename: only absolute run-paths are allowed" 1>&2
	    exit $EXIT_FAILURE
	    ;;
	  esac
	  if test "$prev" = rpath; then
	    case "$rpath " in
	    *" $arg "*) ;;
	    *) rpath="$rpath $arg" ;;
	    esac
	  else
	    case "$xrpath " in
	    *" $arg "*) ;;
	    *) xrpath="$xrpath $arg" ;;
	    esac
	  fi
	  prev=
	  continue
	  ;;
	xcompiler)
	  compiler_flags="$compiler_flags $qarg"
	  prev=
	  compile_command="$compile_command $qarg"
	  finalize_command="$finalize_command $qarg"
	  continue
	  ;;
	xlinker)
	  linker_flags="$linker_flags $qarg"
	  compiler_flags="$compiler_flags $wl$qarg"
	  prev=
	  compile_command="$compile_command $wl$qarg"
	  finalize_command="$finalize_command $wl$qarg"
	  continue
	  ;;
	xcclinker)
	  linker_flags="$linker_flags $qarg"
	  compiler_flags="$compiler_flags $qarg"
	  prev=
	  compile_command="$compile_command $qarg"
	  finalize_command="$finalize_command $qarg"
	  continue
	  ;;
	shrext)
	  shrext_cmds="$arg"
	  prev=
	  continue
	  ;;
	darwin_framework|darwin_framework_skip)
	  test "$prev" = "darwin_framework" && compiler_flags="$compiler_flags $arg"
	  compile_command="$compile_command $arg"
	  finalize_command="$finalize_command $arg"
	  prev=
	  continue
	  ;;
	*)
	  eval "$prev=\"\$arg\""
	  prev=
	  continue
	  ;;
	esac
      fi # test -n "$prev"

      prevarg="$arg"

      case $arg in
      -all-static)
	if test -n "$link_static_flag"; then
	  compile_command="$compile_command $link_static_flag"
	  finalize_command="$finalize_command $link_static_flag"
	fi
	continue
	;;

      -allow-undefined)
	# FIXME: remove this flag sometime in the future.
	$echo "$modename: \`-allow-undefined' is deprecated because it is the default" 1>&2
	continue
	;;

      -avoid-version)
	avoid_version=yes
	continue
	;;

      -dlopen)
	prev=dlfiles
	continue
	;;

      -dlpreopen)
	prev=dlprefiles
	continue
	;;

      -export-dynamic)
	export_dynamic=yes
	continue
	;;

      -export-symbols | -export-symbols-regex)
	if test -n "$export_symbols" || test -n "$export_symbols_regex"; then
	  $echo "$modename: more than one -export-symbols argument is not allowed" 1>&2
	  exit $EXIT_FAILURE
	fi
	if test "X$arg" = "X-export-symbols"; then
	  prev=expsyms
	else
	  prev=expsyms_regex
	fi
	continue
	;;

      -framework|-arch|-isysroot)
	case " $CC " in
	  *" ${arg} ${1} "* | *" ${arg}	${1} "*) 
		prev=darwin_framework_skip ;;
	  *) compiler_flags="$compiler_flags $arg"
	     prev=darwin_framework ;;
	esac
	compile_command="$compile_command $arg"
	finalize_command="$finalize_command $arg"
	continue
	;;

      -inst-prefix-dir)
	prev=inst_prefix
	continue
	;;

      # The native IRIX linker understands -LANG:*, -LIST:* and -LNO:*
      # so, if we see these flags be careful not to treat them like -L
      -L[A-Z][A-Z]*:*)
	case $with_gcc/$host in
	no/*-*-irix* | /*-*-irix*)
	  compile_command="$compile_command $arg"
	  finalize_command="$finalize_command $arg"
	  ;;
	esac
	continue
	;;

      -L*)
	dir=`$echo "X$arg" | $Xsed -e 's/^-L//'`
	# We need an absolute path.
	case $dir in
	[\\/]* | [A-Za-z]:[\\/]*) ;;
	*)
	  absdir=`cd "$dir" && pwd`
	  if test -z "$absdir"; then
	    $echo "$modename: cannot determine absolute directory name of \`$dir'" 1>&2
	    absdir="$dir"
	    notinst_path="$notinst_path $dir"
	  fi
	  dir="$absdir"
	  ;;
	esac
	case "$deplibs " in
	*" -L$dir "*) ;;
	*)
	  deplibs="$deplibs -L$dir"
	  lib_search_path="$lib_search_path $dir"
	  ;;
	esac
	case $host in
	*-*-cygwin* | *-*-mingw* | *-*-pw32* | *-*-os2*)
	  testbindir=`$echo "X$dir" | $Xsed -e 's*/lib$*/bin*'`
	  case :$dllsearchpath: in
	  *":$dir:"*) ;;
	  *) dllsearchpath="$dllsearchpath:$dir";;
	  esac
	  case :$dllsearchpath: in
	  *":$testbindir:"*) ;;
	  *) dllsearchpath="$dllsearchpath:$testbindir";;
	  esac
	  ;;
	esac
	continue
	;;

      -l*)
	if test "X$arg" = "X-lc" || test "X$arg" = "X-lm"; then
	  case $host in
	  *-*-cygwin* | *-*-mingw* | *-*-pw32* | *-*-beos*)
	    # These systems don't actually have a C or math library (as such)
	    continue
	    ;;
	  *-*-os2*)
	    # These systems don't actually have a C library (as such)
	    test "X$arg" = "X-lc" && continue
	    ;;
	  *-*-openbsd* | *-*-freebsd* | *-*-dragonfly*)
	    # Do not include libc due to us having libc/libc_r.
	    test "X$arg" = "X-lc" && continue
	    ;;
	  *-*-rhapsody* | *-*-darwin1.[012])
	    # Rhapsody C and math libraries are in the System framework
	    deplibs="$deplibs -framework System"
	    continue
	    ;;
	  *-*-sco3.2v5* | *-*-sco5v6*)
	    # Causes problems with __ctype
	    test "X$arg" = "X-lc" && continue
	    ;;
	  *-*-sysv4.2uw2* | *-*-sysv5* | *-*-unixware* | *-*-OpenUNIX*)
	    # Compiler inserts libc in the correct place for threads to work
	    test "X$arg" = "X-lc" && continue
	    ;;
	  esac
	elif test "X$arg" = "X-lc_r"; then
	 case $host in
	 *-*-openbsd* | *-*-freebsd* | *-*-dragonfly*)
	   # Do not include libc_r directly, use -pthread flag.
	   continue
	   ;;
	 esac
	fi
	deplibs="$deplibs $arg"
	continue
	;;

      # Tru64 UNIX uses -model [arg] to determine the layout of C++
      # classes, name mangling, and exception handling.
      -model)
	compile_command="$compile_command $arg"
	compiler_flags="$compiler_flags $arg"
	finalize_command="$finalize_command $arg"
	prev=xcompiler
	continue
	;;

     -mt|-mthreads|-kthread|-Kthread|-pthread|-pthreads|--thread-safe)
	compiler_flags="$compiler_flags $arg"
	compile_command="$compile_command $arg"
	finalize_command="$finalize_command $arg"
	continue
	;;

      -module)
	module=yes
	continue
	;;

      # -64, -mips[0-9] enable 64-bit mode on the SGI compiler
      # -r[0-9][0-9]* specifies the processor on the SGI compiler
      # -xarch=*, -xtarget=* enable 64-bit mode on the Sun compiler
      # +DA*, +DD* enable 64-bit mode on the HP compiler
      # -q* pass through compiler args for the IBM compiler
      # -m*, -t[45]*, -txscale* pass through architecture-specific
      # compiler args for GCC
      # -pg pass through profiling flag for GCC
      # @file GCC response files
      -64|-mips[0-9]|-r[0-9][0-9]*|-xarch=*|-xtarget=*|+DA*|+DD*|-q*|-m*|-pg| \
      -t[45]*|-txscale*|@*)

	# Unknown arguments in both finalize_command and compile_command need
	# to be aesthetically quoted because they are evaled later.
	arg=`$echo "X$arg" | $Xsed -e "$sed_quote_subst"`
	case $arg in
	*[\[\~\#\^\&\*\(\)\{\}\|\;\<\>\?\'\ \	]*|*]*|"")
	  arg="\"$arg\""
	  ;;
	esac
        compile_command="$compile_command $arg"
        finalize_command="$finalize_command $arg"
        compiler_flags="$compiler_flags $arg"
        continue
        ;;

      -shrext)
	prev=shrext
	continue
	;;

      -no-fast-install)
	fast_install=no
	continue
	;;

      -no-install)
	case $host in
	*-*-cygwin* | *-*-mingw* | *-*-pw32* | *-*-os2*)
	  # The PATH hackery in wrapper scripts is required on Windows
	  # in order for the loader to find any dlls it needs.
	  $echo "$modename: warning: \`-no-install' is ignored for $host" 1>&2
	  $echo "$modename: warning: assuming \`-no-fast-install' instead" 1>&2
	  fast_install=no
	  ;;
	*) no_install=yes ;;
	esac
	continue
	;;

      -no-undefined)
	allow_undefined=no
	continue
	;;

      -objectlist)
	prev=objectlist
	continue
	;;

      -o) prev=output ;;

      -precious-files-regex)
	prev=precious_regex
	continue
	;;

      -release)
	prev=release
	continue
	;;

      -rpath)
	prev=rpath
	continue
	;;

      -R)
	prev=xrpath
	continue
	;;

      -R*)
	dir=`$echo "X$arg" | $Xsed -e 's/^-R//'`
	# We need an absolute path.
	case $dir in
	[\\/]* | [A-Za-z]:[\\/]*) ;;
	*)
	  $echo "$modename: only absolute run-paths are allowed" 1>&2
	  exit $EXIT_FAILURE
	  ;;
	esac
	case "$xrpath " in
	*" $dir "*) ;;
	*) xrpath="$xrpath $dir" ;;
	esac
	continue
	;;

      -static)
	# The effects of -static are defined in a previous loop.
	# We used to do the same as -all-static on platforms that
	# didn't have a PIC flag, but the assumption that the effects
	# would be equivalent was wrong.  It would break on at least
	# Digital Unix and AIX.
	continue
	;;

      -thread-safe)
	thread_safe=yes
	continue
	;;

      -version-info)
	prev=vinfo
	continue
	;;
      -version-number)
	prev=vinfo
	vinfo_number=yes
	continue
	;;

      -Wc,*)
	args=`$echo "X$arg" | $Xsed -e "$sed_quote_subst" -e 's/^-Wc,//'`
	arg=
	save_ifs="$IFS"; IFS=','
	for flag in $args; do
	  IFS="$save_ifs"
	  case $flag in
	    *[\[\~\#\^\&\*\(\)\{\}\|\;\<\>\?\'\ \	]*|*]*|"")
	    flag="\"$flag\""
	    ;;
	  esac
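	  # -Wc flags are meant for the compiler driver, so they are passed
	  # through without the $wl linker prefix.
	  arg="$arg $flag"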
	  compiler_flags="$compiler_flags $flag"
	done
	IFS="$save_ifs"
	arg=`$echo "X$arg" | $Xsed -e "s/^ //"`
	;;

      -Wl,*)
	args=`$echo "X$arg" | $Xsed -e "$sed_quote_subst" -e 's/^-Wl,//'`
	arg=
	save_ifs="$IFS"; IFS=','
	for flag in $args; do
	  IFS="$save_ifs"
	  case $flag in
	    *[\[\~\#\^\&\*\(\)\{\}\|\;\<\>\?\'\ \	]*|*]*|"")
	    flag="\"$flag\""
	    ;;
	  esac
	  arg="$arg $wl$flag"
	  compiler_flags="$compiler_flags $wl$flag"
	  linker_flags="$linker_flags $flag"
	done
	IFS="$save_ifs"
	arg=`$echo "X$arg" | $Xsed -e "s/^ //"`
	;;
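	# Illustrative example, assuming wl="-Wl," (the usual GCC value;
	# other compilers may use a different prefix):
	#   -Wl,-rpath,/opt/lib
	# is split on commas, so the arm above yields
	#   arg="-Wl,-rpath -Wl,/opt/lib"
	# and appends " -rpath /opt/lib" to $linker_flags.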

      -Xcompiler)
	prev=xcompiler
	continue
	;;

      -Xlinker)
	prev=xlinker
	continue
	;;

      -XCClinker)
	prev=xcclinker
	continue
	;;

      # Some other compiler flag.
      -* | +*)
	# Unknown arguments in both finalize_command and compile_command need
	# to be aesthetically quoted because they are evaled later.
	arg=`$echo "X$arg" | $Xsed -e "$sed_quote_subst"`
	case $arg in
	*[\[\~\#\^\&\*\(\)\{\}\|\;\<\>\?\'\ \	]*|*]*|"")
	  arg="\"$arg\""
	  ;;
	esac
	;;

      *.$objext)
	# A standard object.
	objs="$objs $arg"
	;;

      *.lo)
	# A libtool-controlled object.

	# Check to see that this really is a libtool object.
	if (${SED} -e '2q' $arg | grep "^# Generated by .*$PACKAGE") >/dev/null 2>&1; then
	  pic_object=
	  non_pic_object=

	  # Read the .lo file
	  # If there is no directory component, then add one.
	  case $arg in
	  */* | *\\*) . $arg ;;
	  *) . ./$arg ;;
	  esac

	  if test -z "$pic_object" || \
	     test -z "$non_pic_object" ||
	     test "$pic_object" = none && \
	     test "$non_pic_object" = none; then
	    $echo "$modename: cannot find name of object for \`$arg'" 1>&2
	    exit $EXIT_FAILURE
	  fi

	  # Extract subdirectory from the argument.
	  xdir=`$echo "X$arg" | $Xsed -e 's%/[^/]*$%%'`
	  if test "X$xdir" = "X$arg"; then
	    xdir=
	  else
	    xdir="$xdir/"
	  fi

	  if test "$pic_object" != none; then
	    # Prepend the subdirectory the object is found in.
	    pic_object="$xdir$pic_object"

	    if test "$prev" = dlfiles; then
	      if test "$build_libtool_libs" = yes && test "$dlopen_support" = yes; then
		dlfiles="$dlfiles $pic_object"
		prev=
		continue
	      else
		# If libtool objects are unsupported, then we need to preload.
		prev=dlprefiles
	      fi
	    fi

	    # CHECK ME:  I think I busted this.  -Ossama
	    if test "$prev" = dlprefiles; then
	      # Preload the old-style object.
	      dlprefiles="$dlprefiles $pic_object"
	      prev=
	    fi

	    # A PIC object.
	    libobjs="$libobjs $pic_object"
	    arg="$pic_object"
	  fi

	  # Non-PIC object.
	  if test "$non_pic_object" != none; then
	    # Prepend the subdirectory the object is found in.
	    non_pic_object="$xdir$non_pic_object"

	    # A standard non-PIC object
	    non_pic_objects="$non_pic_objects $non_pic_object"
	    if test -z "$pic_object" || test "$pic_object" = none ; then
	      arg="$non_pic_object"
	    fi
	  else
	    # If the PIC object exists, use it instead.
	    # $xdir was prepended to $pic_object above.
	    non_pic_object="$pic_object"
	    non_pic_objects="$non_pic_objects $non_pic_object"
	  fi
	else
	  # Only an error if not doing a dry-run.
	  if test -z "$run"; then
	    $echo "$modename: \`$arg' is not a valid libtool object" 1>&2
	    exit $EXIT_FAILURE
	  else
	    # Dry-run case.

	    # Extract subdirectory from the argument.
	    xdir=`$echo "X$arg" | $Xsed -e 's%/[^/]*$%%'`
	    if test "X$xdir" = "X$arg"; then
	      xdir=
	    else
	      xdir="$xdir/"
	    fi

	    pic_object=`$echo "X${xdir}${objdir}/${arg}" | $Xsed -e "$lo2o"`
	    non_pic_object=`$echo "X${xdir}${arg}" | $Xsed -e "$lo2o"`
	    libobjs="$libobjs $pic_object"
	    non_pic_objects="$non_pic_objects $non_pic_object"
	  fi
	fi
	;;

      *.$libext)
	# An archive.
	deplibs="$deplibs $arg"
	old_deplibs="$old_deplibs $arg"
	continue
	;;

      *.la)
	# A libtool-controlled library.

	if test "$prev" = dlfiles; then
	  # This library was specified with -dlopen.
	  dlfiles="$dlfiles $arg"
	  prev=
	elif test "$prev" = dlprefiles; then
	  # The library was specified with -dlpreopen.
	  dlprefiles="$dlprefiles $arg"
	  prev=
	else
	  deplibs="$deplibs $arg"
	fi
	continue
	;;

      # Some other compiler argument.
      *)
	# Unknown arguments in both finalize_command and compile_command need
	# to be aesthetically quoted because they are evaled later.
	arg=`$echo "X$arg" | $Xsed -e "$sed_quote_subst"`
	case $arg in
	*[\[\~\#\^\&\*\(\)\{\}\|\;\<\>\?\'\ \	]*|*]*|"")
	  arg="\"$arg\""
	  ;;
	esac
	;;
      esac # arg

      # Now actually substitute the argument into the commands.
      if test -n "$arg"; then
	compile_command="$compile_command $arg"
	finalize_command="$finalize_command $arg"
      fi
    done # argument parsing loop

    if test -n "$prev"; then
      $echo "$modename: the \`$prevarg' option requires an argument" 1>&2
      $echo "$help" 1>&2
      exit $EXIT_FAILURE
    fi

    if test "$export_dynamic" = yes && test -n "$export_dynamic_flag_spec"; then
      eval arg=\"$export_dynamic_flag_spec\"
      compile_command="$compile_command $arg"
      finalize_command="$finalize_command $arg"
    fi

    oldlibs=
    # calculate the name of the file, without its directory
    outputname=`$echo "X$output" | $Xsed -e 's%^.*/%%'`
    libobjs_save="$libobjs"

    if test -n "$shlibpath_var"; then
      # get the directories listed in $shlibpath_var
      eval shlib_search_path=\`\$echo \"X\${$shlibpath_var}\" \| \$Xsed -e \'s/:/ /g\'\`
    else
      shlib_search_path=
    fi
    eval sys_lib_search_path=\"$sys_lib_search_path_spec\"
    eval sys_lib_dlsearch_path=\"$sys_lib_dlsearch_path_spec\"

    output_objdir=`$echo "X$output" | $Xsed -e 's%/[^/]*$%%'`
    if test "X$output_objdir" = "X$output"; then
      output_objdir="$objdir"
    else
      output_objdir="$output_objdir/$objdir"
    fi
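    # Illustrative example, assuming objdir=".libs" (a common value, but
    # platform dependent):
    #   output=lib/libfoo.la  ->  output_objdir=lib/.libs
    #   output=libfoo.la      ->  output_objdir=.libs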
    # Create the object directory.
    if test ! -d "$output_objdir"; then
      $show "$mkdir $output_objdir"
      $run $mkdir $output_objdir
      exit_status=$?
      if test "$exit_status" -ne 0 && test ! -d "$output_objdir"; then
	exit $exit_status
      fi
    fi

    # Determine the type of output
    case $output in
    "")
      $echo "$modename: you must specify an output file" 1>&2
      $echo "$help" 1>&2
      exit $EXIT_FAILURE
      ;;
    *.$libext) linkmode=oldlib ;;
    *.lo | *.$objext) linkmode=obj ;;
    *.la) linkmode=lib ;;
    *) linkmode=prog ;; # Anything else should be a program.
    esac
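    # Illustrative examples of the mapping above, assuming libext=a and
    # objext=o (both are platform dependent):
    #   libfoo.a   ->  linkmode=oldlib
    #   foo.lo     ->  linkmode=obj
    #   libfoo.la  ->  linkmode=lib
    #   foo        ->  linkmode=prog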

    case $host in
    *cygwin* | *mingw* | *pw32*)
      # don't eliminate duplications in $postdeps and $predeps
      duplicate_compiler_generated_deps=yes
      ;;
    *)
      duplicate_compiler_generated_deps=$duplicate_deps
      ;;
    esac
    specialdeplibs=

    libs=
    # Find all interdependent deplibs by searching for libraries
    # that are linked more than once (e.g. -la -lb -la)
    for deplib in $deplibs; do
      if test "X$duplicate_deps" = "Xyes" ; then
	case "$libs " in
	*" $deplib "*) specialdeplibs="$specialdeplibs $deplib" ;;
	esac
      fi
      libs="$libs $deplib"
    done
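    # Illustrative example, assuming duplicate_deps=yes and
    # deplibs="-la -lb -la": the loop above leaves
    #   libs=" -la -lb -la"
    #   specialdeplibs=" -la"
    # because the second -la is already present in $libs when it is seen.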

    if test "$linkmode" = lib; then
      libs="$predeps $libs $compiler_lib_search_path $postdeps"

      # Compute libraries that are listed more than once in $predeps and
      # $postdeps and mark them as special (i.e., their duplicates are
      # not to be eliminated).
      pre_post_deps=
      if test "X$duplicate_compiler_generated_deps" = "Xyes" ; then
	for pre_post_dep in $predeps $postdeps; do
	  case "$pre_post_deps " in
	  *" $pre_post_dep "*) specialdeplibs="$specialdeplibs $pre_post_deps" ;;
	  esac
	  pre_post_deps="$pre_post_deps $pre_post_dep"
	done
      fi
      pre_post_deps=
    fi

    deplibs=
    newdependency_libs=
    newlib_search_path=
    need_relink=no # whether we're linking any uninstalled libtool libraries
    notinst_deplibs= # not-installed libtool libraries
    case $linkmode in
    lib)
	passes="conv link"
	for file in $dlfiles $dlprefiles; do
	  case $file in
	  *.la) ;;
	  *)
	    $echo "$modename: libraries can \`-dlopen' only libtool libraries: $file" 1>&2
	    exit $EXIT_FAILURE
	    ;;
	  esac
	done
	;;
    prog)
	compile_deplibs=
	finalize_deplibs=
	alldeplibs=no
	newdlfiles=
	newdlprefiles=
	passes="conv scan dlopen dlpreopen link"
	;;
    *)  passes="conv"
	;;
    esac
    for pass in $passes; do
      if test "$linkmode,$pass" = "lib,link" ||
	 test "$linkmode,$pass" = "prog,scan"; then
	libs="$deplibs"
	deplibs=
      fi
      if test "$linkmode" = prog; then
	case $pass in
	dlopen) libs="$dlfiles" ;;
	dlpreopen) libs="$dlprefiles" ;;
	link) libs="$deplibs %DEPLIBS% $dependency_libs" ;;
	esac
      fi
      if test "$pass" = dlopen; then
	# Collect dlpreopened libraries
	save_deplibs="$deplibs"
	deplibs=
      fi
      for deplib in $libs; do
	lib=
	found=no
	case $deplib in
	-mt|-mthreads|-kthread|-Kthread|-pthread|-pthreads|--thread-safe)
	  if test "$linkmode,$pass" = "prog,link"; then
	    compile_deplibs="$deplib $compile_deplibs"
	    finalize_deplibs="$deplib $finalize_deplibs"
	  else
	    compiler_flags="$compiler_flags $deplib"
	  fi
	  continue
	  ;;
	-l*)
	  if test "$linkmode" != lib && test "$linkmode" != prog; then
	    $echo "$modename: warning: \`-l' is ignored for archives/objects" 1>&2
	    continue
	  fi
	  name=`$echo "X$deplib" | $Xsed -e 's/^-l//'`
	  for searchdir in $newlib_search_path $lib_search_path $sys_lib_search_path $shlib_search_path; do
	    for search_ext in .la $std_shrext .so .a; do
	      # Search the libtool library
	      lib="$searchdir/lib${name}${search_ext}"
	      if test -f "$lib"; then
		if test "$search_ext" = ".la"; then
		  found=yes
		else
		  found=no
		fi
		break 2
	      fi
	    done
	  done
	  if test "$found" != yes; then
	    # deplib doesn't seem to be a libtool library
	    if test "$linkmode,$pass" = "prog,link"; then
	      compile_deplibs="$deplib $compile_deplibs"
	      finalize_deplibs="$deplib $finalize_deplibs"
	    else
	      deplibs="$deplib $deplibs"
	      test "$linkmode" = lib && newdependency_libs="$deplib $newdependency_libs"
	    fi
	    continue
	  else # deplib is a libtool library
	    # If $allow_libtool_libs_with_static_runtimes is set and $deplib is a
	    # stdlib, we need to do some special things here, and not later.
	    if test "X$allow_libtool_libs_with_static_runtimes" = "Xyes" ; then
	      case " $predeps $postdeps " in
	      *" $deplib "*)
		if (${SED} -e '2q' $lib |
                    grep "^# Generated by .*$PACKAGE") >/dev/null 2>&1; then
		  library_names=
		  old_library=
		  case $lib in
		  */* | *\\*) . $lib ;;
		  *) . ./$lib ;;
		  esac
		  for l in $old_library $library_names; do
		    ll="$l"
		  done
		  if test "X$ll" = "X$old_library" ; then # only static version available
		    found=no
		    ladir=`$echo "X$lib" | $Xsed -e 's%/[^/]*$%%'`
		    test "X$ladir" = "X$lib" && ladir="."
		    lib=$ladir/$old_library
		    if test "$linkmode,$pass" = "prog,link"; then
		      compile_deplibs="$deplib $compile_deplibs"
		      finalize_deplibs="$deplib $finalize_deplibs"
		    else
		      deplibs="$deplib $deplibs"
		      test "$linkmode" = lib && newdependency_libs="$deplib $newdependency_libs"
		    fi
		    continue
		  fi
		fi
	        ;;
	      *) ;;
	      esac
	    fi
	  fi
	  ;; # -l
	-L*)
	  case $linkmode in
	  lib)
	    deplibs="$deplib $deplibs"
	    test "$pass" = conv && continue
	    newdependency_libs="$deplib $newdependency_libs"
	    newlib_search_path="$newlib_search_path "`$echo "X$deplib" | $Xsed -e 's/^-L//'`
	    ;;
	  prog)
	    if test "$pass" = conv; then
	      deplibs="$deplib $deplibs"
	      continue
	    fi
	    if test "$pass" = scan; then
	      deplibs="$deplib $deplibs"
	    else
	      compile_deplibs="$deplib $compile_deplibs"
	      finalize_deplibs="$deplib $finalize_deplibs"
	    fi
	    newlib_search_path="$newlib_search_path "`$echo "X$deplib" | $Xsed -e 's/^-L//'`
	    ;;
	  *)
	    $echo "$modename: warning: \`-L' is ignored for archives/objects" 1>&2
	    ;;
	  esac # linkmode
	  continue
	  ;; # -L
	-R*)
	  if test "$pass" = link; then
	    dir=`$echo "X$deplib" | $Xsed -e 's/^-R//'`
	    # Make sure the xrpath contains only unique directories.
	    case "$xrpath " in
	    *" $dir "*) ;;
	    *) xrpath="$xrpath $dir" ;;
	    esac
	  fi
	  deplibs="$deplib $deplibs"
	  continue
	  ;;
	*.la) lib="$deplib" ;;
	*.$libext)
	  if test "$pass" = conv; then
	    deplibs="$deplib $deplibs"
	    continue
	  fi
	  case $linkmode in
	  lib)
	    valid_a_lib=no
	    case $deplibs_check_method in
	      match_pattern*)
		set dummy $deplibs_check_method
	        match_pattern_regex=`expr "$deplibs_check_method" : "$2 \(.*\)"`
		if eval $echo \"$deplib\" 2>/dev/null \
		    | $SED 10q \
		    | $EGREP "$match_pattern_regex" > /dev/null; then
		  valid_a_lib=yes
		fi
		;;
	      pass_all)
		valid_a_lib=yes
		;;
	    esac
	    if test "$valid_a_lib" != yes; then
	      $echo
	      $echo "*** Warning: Trying to link with static lib archive $deplib."
	      $echo "*** I have the capability to make that library automatically link in when"
	      $echo "*** you link to this library.  But I can only do this if you have a"
	      $echo "*** shared version of the library, which you do not appear to have"
	      $echo "*** because the file extension .$libext of this argument makes me believe"
	      $echo "*** that it is just a static archive that I should not use here."
	    else
	      $echo
	      $echo "*** Warning: Linking the shared library $output against the"
	      $echo "*** static library $deplib is not portable!"
	      deplibs="$deplib $deplibs"
	    fi
	    continue
	    ;;
	  prog)
	    if test "$pass" != link; then
	      deplibs="$deplib $deplibs"
	    else
	      compile_deplibs="$deplib $compile_deplibs"
	      finalize_deplibs="$deplib $finalize_deplibs"
	    fi
	    continue
	    ;;
	  esac # linkmode
	  ;; # *.$libext
	*.lo | *.$objext)
	  if test "$pass" = conv; then
	    deplibs="$deplib $deplibs"
	  elif test "$linkmode" = prog; then
	    if test "$pass" = dlpreopen || test "$dlopen_support" != yes || test "$build_libtool_libs" = no; then
	      # If there is no dlopen support or we're linking statically,
	      # we need to preload.
	      newdlprefiles="$newdlprefiles $deplib"
	      compile_deplibs="$deplib $compile_deplibs"
	      finalize_deplibs="$deplib $finalize_deplibs"
	    else
	      newdlfiles="$newdlfiles $deplib"
	    fi
	  fi
	  continue
	  ;;
	%DEPLIBS%)
	  alldeplibs=yes
	  continue
	  ;;
	esac # case $deplib
	if test "$found" = yes || test -f "$lib"; then :
	else
	  $echo "$modename: cannot find the library \`$lib' or unhandled argument \`$deplib'" 1>&2
	  exit $EXIT_FAILURE
	fi

	# Check to see that this really is a libtool archive.
	if (${SED} -e '2q' $lib | grep "^# Generated by .*$PACKAGE") >/dev/null 2>&1; then :
	else
	  $echo "$modename: \`$lib' is not a valid libtool archive" 1>&2
	  exit $EXIT_FAILURE
	fi

	ladir=`$echo "X$lib" | $Xsed -e 's%/[^/]*$%%'`
	test "X$ladir" = "X$lib" && ladir="."

	dlname=
	dlopen=
	dlpreopen=
	libdir=
	library_names=
	old_library=
	# If the library was installed with an old release of libtool,
	# it will not redefine the variables `installed' or `shouldnotlink'.
	installed=yes
	shouldnotlink=no
	avoidtemprpath=


	# Read the .la file
	case $lib in
	*/* | *\\*) . $lib ;;
	*) . ./$lib ;;
	esac

	if test "$linkmode,$pass" = "lib,link" ||
	   test "$linkmode,$pass" = "prog,scan" ||
	   { test "$linkmode" != prog && test "$linkmode" != lib; }; then
	  test -n "$dlopen" && dlfiles="$dlfiles $dlopen"
	  test -n "$dlpreopen" && dlprefiles="$dlprefiles $dlpreopen"
	fi

	if test "$pass" = conv; then
	  # Only check for convenience libraries
	  deplibs="$lib $deplibs"
	  if test -z "$libdir"; then
	    if test -z "$old_library"; then
	      $echo "$modename: cannot find name of link library for \`$lib'" 1>&2
	      exit $EXIT_FAILURE
	    fi
	    # It is a libtool convenience library, so add in its objects.
	    convenience="$convenience $ladir/$objdir/$old_library"
	    old_convenience="$old_convenience $ladir/$objdir/$old_library"
	    tmp_libs=
	    for deplib in $dependency_libs; do
	      deplibs="$deplib $deplibs"
              if test "X$duplicate_deps" = "Xyes" ; then
	        case "$tmp_libs " in
	        *" $deplib "*) specialdeplibs="$specialdeplibs $deplib" ;;
	        esac
              fi
	      tmp_libs="$tmp_libs $deplib"
	    done
	  elif test "$linkmode" != prog && test "$linkmode" != lib; then
	    $echo "$modename: \`$lib' is not a convenience library" 1>&2
	    exit $EXIT_FAILURE
	  fi
	  continue
	fi # $pass = conv


	# Get the name of the library we link against.
	linklib=
	for l in $old_library $library_names; do
	  linklib="$l"
	done
	if test -z "$linklib"; then
	  $echo "$modename: cannot find name of link library for \`$lib'" 1>&2
	  exit $EXIT_FAILURE
	fi

	# This library was specified with -dlopen.
	if test "$pass" = dlopen; then
	  if test -z "$libdir"; then
	    $echo "$modename: cannot -dlopen a convenience library: \`$lib'" 1>&2
	    exit $EXIT_FAILURE
	  fi
	  if test -z "$dlname" ||
	     test "$dlopen_support" != yes ||
	     test "$build_libtool_libs" = no; then
	    # If there is no dlname, no dlopen support or we're linking
	    # statically, we need to preload.  We also need to preload any
	    # dependent libraries so libltdl's deplib preloader doesn't
	    # bomb out in the load deplibs phase.
	    dlprefiles="$dlprefiles $lib $dependency_libs"
	  else
	    newdlfiles="$newdlfiles $lib"
	  fi
	  continue
	fi # $pass = dlopen

	# We need an absolute path.
	case $ladir in
	[\\/]* | [A-Za-z]:[\\/]*) abs_ladir="$ladir" ;;
	*)
	  abs_ladir=`cd "$ladir" && pwd`
	  if test -z "$abs_ladir"; then
	    $echo "$modename: warning: cannot determine absolute directory name of \`$ladir'" 1>&2
	    $echo "$modename: passing it literally to the linker, although it might fail" 1>&2
	    abs_ladir="$ladir"
	  fi
	  ;;
	esac
	laname=`$echo "X$lib" | $Xsed -e 's%^.*/%%'`

	# Find the relevant object directory and library name.
	if test "X$installed" = Xyes; then
	  if test ! -f "$libdir/$linklib" && test -f "$abs_ladir/$linklib"; then
	    $echo "$modename: warning: library \`$lib' was moved." 1>&2
	    dir="$ladir"
	    absdir="$abs_ladir"
	    libdir="$abs_ladir"
	  else
	    dir="$libdir"
	    absdir="$libdir"
	  fi
	  test "X$hardcode_automatic" = Xyes && avoidtemprpath=yes
	else
	  if test ! -f "$ladir/$objdir/$linklib" && test -f "$abs_ladir/$linklib"; then
	    dir="$ladir"
	    absdir="$abs_ladir"
	    # Remove this search path later
	    notinst_path="$notinst_path $abs_ladir"
	  else
	    dir="$ladir/$objdir"
	    absdir="$abs_ladir/$objdir"
	    # Remove this search path later
	    notinst_path="$notinst_path $abs_ladir"
	  fi
	fi # $installed = yes
	name=`$echo "X$laname" | $Xsed -e 's/\.la$//' -e 's/^lib//'`
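	# Illustrative example: laname=libfoo.la gives name=foo, which is
	# later used to build -lfoo style arguments.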

	# This library was specified with -dlpreopen.
	if test "$pass" = dlpreopen; then
	  if test -z "$libdir"; then
	    $echo "$modename: cannot -dlpreopen a convenience library: \`$lib'" 1>&2
	    exit $EXIT_FAILURE
	  fi
	  # Prefer using a static library (so that no silly _DYNAMIC symbols
	  # are required to link).
	  if test -n "$old_library"; then
	    newdlprefiles="$newdlprefiles $dir/$old_library"
	  # Otherwise, use the dlname, so that lt_dlopen finds it.
	  elif test -n "$dlname"; then
	    newdlprefiles="$newdlprefiles $dir/$dlname"
	  else
	    newdlprefiles="$newdlprefiles $dir/$linklib"
	  fi
	fi # $pass = dlpreopen

	if test -z "$libdir"; then
	  # Link the convenience library
	  if test "$linkmode" = lib; then
	    deplibs="$dir/$old_library $deplibs"
	  elif test "$linkmode,$pass" = "prog,link"; then
	    compile_deplibs="$dir/$old_library $compile_deplibs"
	    finalize_deplibs="$dir/$old_library $finalize_deplibs"
	  else
	    deplibs="$lib $deplibs" # used for prog,scan pass
	  fi
	  continue
	fi


	if test "$linkmode" = prog && test "$pass" != link; then
	  newlib_search_path="$newlib_search_path $ladir"
	  deplibs="$lib $deplibs"

	  linkalldeplibs=no
	  if test "$link_all_deplibs" != no || test -z "$library_names" ||
	     test "$build_libtool_libs" = no; then
	    linkalldeplibs=yes
	  fi

	  tmp_libs=
	  for deplib in $dependency_libs; do
	    case $deplib in
	    -L*) newlib_search_path="$newlib_search_path "`$echo "X$deplib" | $Xsed -e 's/^-L//'`;; ### testsuite: skip nested quoting test
	    esac
	    # Need to link against all dependency_libs?
	    if test "$linkalldeplibs" = yes; then
	      deplibs="$deplib $deplibs"
	    else
	      # Need to hardcode shared library paths
	      # and/or link against static libraries
	      newdependency_libs="$deplib $newdependency_libs"
	    fi
	    if test "X$duplicate_deps" = "Xyes" ; then
	      case "$tmp_libs " in
	      *" $deplib "*) specialdeplibs="$specialdeplibs $deplib" ;;
	      esac
	    fi
	    tmp_libs="$tmp_libs $deplib"
	  done # for deplib
	  continue
	fi # $linkmode = prog...

	if test "$linkmode,$pass" = "prog,link"; then
	  if test -n "$library_names" &&
	     { test "$prefer_static_libs" = no || test -z "$old_library"; }; then
	    # We need to hardcode the library path
	    if test -n "$shlibpath_var" && test -z "$avoidtemprpath" ; then
	      # Make sure the rpath contains only unique directories.
	      case "$temp_rpath " in
	      *" $dir "*) ;;
	      *" $absdir "*) ;;
	      *) temp_rpath="$temp_rpath $absdir" ;;
	      esac
	    fi

	    # Hardcode the library path.
	    # Skip directories that are in the system default run-time
	    # search path.
	    case " $sys_lib_dlsearch_path " in
	    *" $absdir "*) ;;
	    *)
	      case "$compile_rpath " in
	      *" $absdir "*) ;;
	      *) compile_rpath="$compile_rpath $absdir"
	      esac
	      ;;
	    esac
	    case " $sys_lib_dlsearch_path " in
	    *" $libdir "*) ;;
	    *)
	      case "$finalize_rpath " in
	      *" $libdir "*) ;;
	      *) finalize_rpath="$finalize_rpath $libdir"
	      esac
	      ;;
	    esac
	  fi # -n "$library_names" && ...

	  if test "$alldeplibs" = yes &&
	     { test "$deplibs_check_method" = pass_all ||
	       { test "$build_libtool_libs" = yes &&
		 test -n "$library_names"; }; }; then
	    # We only need to search for static libraries
	    continue
	  fi
	fi

	link_static=no # Whether the deplib will be linked statically
	use_static_libs=$prefer_static_libs
	if test "$use_static_libs" = built && test "$installed" = yes ; then
	  use_static_libs=no
	fi
	if test -n "$library_names" &&
	   { test "$use_static_libs" = no || test -z "$old_library"; }; then
	  if test "$installed" = no; then
	    notinst_deplibs="$notinst_deplibs $lib"
	    need_relink=yes
	  fi
	  # This is a shared library

	  # Warn about portability, can't link against -module's on
	  # some systems (darwin)
	  if test "$shouldnotlink" = yes && test "$pass" = link ; then
	    $echo
	    if test "$linkmode" = prog; then
	      $echo "*** Warning: Linking the executable $output against the loadable module"
	    else
	      $echo "*** Warning: Linking the shared library $output against the loadable module"
	    fi
	    $echo "*** $linklib is not portable!"
	  fi
	  if test "$linkmode" = lib &&
	     test "$hardcode_into_libs" = yes; then
	    # Hardcode the library path.
	    # Skip directories that are in the system default run-time
	    # search path.
	    case " $sys_lib_dlsearch_path " in
	    *" $absdir "*) ;;
	    *)
	      case "$compile_rpath " in
	      *" $absdir "*) ;;
	      *) compile_rpath="$compile_rpath $absdir"
	      esac
	      ;;
	    esac
	    case " $sys_lib_dlsearch_path " in
	    *" $libdir "*) ;;
	    *)
	      case "$finalize_rpath " in
	      *" $libdir "*) ;;
	      *) finalize_rpath="$finalize_rpath $libdir"
	      esac
	      ;;
	    esac
	  fi

	  if test -n "$old_archive_from_expsyms_cmds"; then
	    # figure out the soname
	    set dummy $library_names
	    realname="$2"
	    shift; shift
	    libname=`eval \\$echo \"$libname_spec\"`
	    # use dlname if we got it. it's perfectly good, no?
	    if test -n "$dlname"; then
	      soname="$dlname"
	    elif test -n "$soname_spec"; then
	      # bleh windows
	      case $host in
	      *cygwin* | mingw*)
		major=`expr $current - $age`
		versuffix="-$major"
		;;
	      esac
	      eval soname=\"$soname_spec\"
	    else
	      soname="$realname"
	    fi

	    # Make a new name for the extract_expsyms_cmds to use
	    soroot="$soname"
	    soname=`$echo $soroot | ${SED} -e 's/^.*\///'`
	    newlib="libimp-`$echo $soname | ${SED} 's/^lib//;s/\.dll$//'`.a"

	    # If the library has no export list, then create one now
	    if test -f "$output_objdir/$soname-def"; then :
	    else
	      $show "extracting exported symbol list from \`$soname'"
	      save_ifs="$IFS"; IFS='~'
	      cmds=$extract_expsyms_cmds
	      for cmd in $cmds; do
		IFS="$save_ifs"
		eval cmd=\"$cmd\"
		$show "$cmd"
		$run eval "$cmd" || exit $?
	      done
	      IFS="$save_ifs"
	    fi

	    # Create $newlib
	    if test -f "$output_objdir/$newlib"; then :; else
	      $show "generating import library for \`$soname'"
	      save_ifs="$IFS"; IFS='~'
	      cmds=$old_archive_from_expsyms_cmds
	      for cmd in $cmds; do
		IFS="$save_ifs"
		eval cmd=\"$cmd\"
		$show "$cmd"
		$run eval "$cmd" || exit $?
	      done
	      IFS="$save_ifs"
	    fi
	    # make sure the library variables are pointing to the new library
	    dir=$output_objdir
	    linklib=$newlib
	  fi # test -n "$old_archive_from_expsyms_cmds"

	  if test "$linkmode" = prog || test "$mode" != relink; then
	    add_shlibpath=
	    add_dir=
	    add=
	    lib_linked=yes
	    case $hardcode_action in
	    immediate | unsupported)
	      if test "$hardcode_direct" = no; then
		add="$dir/$linklib"
		case $host in
		  *-*-sco3.2v5.0.[024]*) add_dir="-L$dir" ;;
		  *-*-sysv4*uw2*) add_dir="-L$dir" ;;
		  *-*-sysv5OpenUNIX* | *-*-sysv5UnixWare7.[01].[10]* | \
		    *-*-unixware7*) add_dir="-L$dir" ;;
		  *-*-darwin* )
		    # if the lib is a module then we can not link against
		    # it, someone is ignoring the new warnings I added
		    if /usr/bin/file -L $add 2> /dev/null |
                      $EGREP ": [^:]* bundle" >/dev/null ; then
		      $echo "** Warning, lib $linklib is a module, not a shared library"
		      if test -z "$old_library" ; then
		        $echo
		        $echo "** And there doesn't seem to be a static archive available"
		        $echo "** The link will probably fail, sorry"
		      else
		        add="$dir/$old_library"
		      fi
		    fi
		esac
	      elif test "$hardcode_minus_L" = no; then
		case $host in
		*-*-sunos*) add_shlibpath="$dir" ;;
		esac
		add_dir="-L$dir"
		add="-l$name"
	      elif test "$hardcode_shlibpath_var" = no; then
		add_shlibpath="$dir"
		add="-l$name"
	      else
		lib_linked=no
	      fi
	      ;;
	    relink)
	      if test "$hardcode_direct" = yes; then
		add="$dir/$linklib"
	      elif test "$hardcode_minus_L" = yes; then
		add_dir="-L$dir"
		# Try looking first in the location we're being installed to.
		if test -n "$inst_prefix_dir"; then
		  case $libdir in
		    [\\/]*)
		      add_dir="$add_dir -L$inst_prefix_dir$libdir"
		      ;;
		  esac
		fi
		add="-l$name"
	      elif test "$hardcode_shlibpath_var" = yes; then
		add_shlibpath="$dir"
		add="-l$name"
	      else
		lib_linked=no
	      fi
	      ;;
	    *) lib_linked=no ;;
	    esac

	    if test "$lib_linked" != yes; then
	      $echo "$modename: configuration error: unsupported hardcode properties" 1>&2
	      exit $EXIT_FAILURE
	    fi

	    if test -n "$add_shlibpath"; then
	      case :$compile_shlibpath: in
	      *":$add_shlibpath:"*) ;;
	      *) compile_shlibpath="$compile_shlibpath$add_shlibpath:" ;;
	      esac
	    fi
	    if test "$linkmode" = prog; then
	      test -n "$add_dir" && compile_deplibs="$add_dir $compile_deplibs"
	      test -n "$add" && compile_deplibs="$add $compile_deplibs"
	    else
	      test -n "$add_dir" && deplibs="$add_dir $deplibs"
	      test -n "$add" && deplibs="$add $deplibs"
	      if test "$hardcode_direct" != yes && \
		 test "$hardcode_minus_L" != yes && \
		 test "$hardcode_shlibpath_var" = yes; then
		case :$finalize_shlibpath: in
		*":$libdir:"*) ;;
		*) finalize_shlibpath="$finalize_shlibpath$libdir:" ;;
		esac
	      fi
	    fi
	  fi

	  if test "$linkmode" = prog || test "$mode" = relink; then
	    add_shlibpath=
	    add_dir=
	    add=
	    # Finalize command for both is simple: just hardcode it.
	    if test "$hardcode_direct" = yes; then
	      add="$libdir/$linklib"
	    elif test "$hardcode_minus_L" = yes; then
	      add_dir="-L$libdir"
	      add="-l$name"
	    elif test "$hardcode_shlibpath_var" = yes; then
	      case :$finalize_shlibpath: in
	      *":$libdir:"*) ;;
	      *) finalize_shlibpath="$finalize_shlibpath$libdir:" ;;
	      esac
	      add="-l$name"
	    elif test "$hardcode_automatic" = yes; then
	      if test -n "$inst_prefix_dir" &&
		 test -f "$inst_prefix_dir$libdir/$linklib" ; then
	        add="$inst_prefix_dir$libdir/$linklib"
	      else
	        add="$libdir/$linklib"
	      fi
	    else
	      # We cannot seem to hardcode it, guess we'll fake it.
	      add_dir="-L$libdir"
	      # Try looking first in the location we're being installed to.
	      if test -n "$inst_prefix_dir"; then
		case $libdir in
		  [\\/]*)
		    add_dir="$add_dir -L$inst_prefix_dir$libdir"
		    ;;
		esac
	      fi
	      add="-l$name"
	    fi

	    if test "$linkmode" = prog; then
	      test -n "$add_dir" && finalize_deplibs="$add_dir $finalize_deplibs"
	      test -n "$add" && finalize_deplibs="$add $finalize_deplibs"
	    else
	      test -n "$add_dir" && deplibs="$add_dir $deplibs"
	      test -n "$add" && deplibs="$add $deplibs"
	    fi
	  fi
	elif test "$linkmode" = prog; then
	  # Here we assume that one of hardcode_direct or hardcode_minus_L
	  # is not unsupported.  This is valid on all known static and
	  # shared platforms.
	  if test "$hardcode_direct" != unsupported; then
	    test -n "$old_library" && linklib="$old_library"
	    compile_deplibs="$dir/$linklib $compile_deplibs"
	    finalize_deplibs="$dir/$linklib $finalize_deplibs"
	  else
	    compile_deplibs="-l$name -L$dir $compile_deplibs"
	    finalize_deplibs="-l$name -L$dir $finalize_deplibs"
	  fi
	elif test "$build_libtool_libs" = yes; then
	  # Not a shared library
	  if test "$deplibs_check_method" != pass_all; then
	    # We're trying to link a shared library against a static one
	    # but the system doesn't support it.

	    # Just print a warning and add the library to dependency_libs so
	    # that the program can be linked against the static library.
	    $echo
	    $echo "*** Warning: This system can not link to static lib archive $lib."
	    $echo "*** I have the capability to make that library automatically link in when"
	    $echo "*** you link to this library.  But I can only do this if you have a"
	    $echo "*** shared version of the library, which you do not appear to have."
	    if test "$module" = yes; then
	      $echo "*** But as you try to build a module library, libtool will still create "
	      $echo "*** a static module, that should work as long as the dlopening application"
	      $echo "*** is linked with the -dlopen flag to resolve symbols at runtime."
	      if test -z "$global_symbol_pipe"; then
		$echo
		$echo "*** However, this would only work if libtool was able to extract symbol"
		$echo "*** lists from a program, using \`nm' or equivalent, but libtool could"
		$echo "*** not find such a program.  So, this module is probably useless."
		$echo "*** \`nm' from GNU binutils and a full rebuild may help."
	      fi
	      if test "$build_old_libs" = no; then
		build_libtool_libs=module
		build_old_libs=yes
	      else
		build_libtool_libs=no
	      fi
	    fi
	  else
	    deplibs="$dir/$old_library $deplibs"
	    link_static=yes
	  fi
	fi # link shared/static library?

	if test "$linkmode" = lib; then
	  if test -n "$dependency_libs" &&
	     { test "$hardcode_into_libs" != yes ||
	       test "$build_old_libs" = yes ||
	       test "$link_static" = yes; }; then
	    # Extract -R from dependency_libs
	    temp_deplibs=
	    for libdir in $dependency_libs; do
	      case $libdir in
	      -R*) temp_xrpath=`$echo "X$libdir" | $Xsed -e 's/^-R//'`
		   case " $xrpath " in
		   *" $temp_xrpath "*) ;;
		   *) xrpath="$xrpath $temp_xrpath";;
		   esac;;
	      *) temp_deplibs="$temp_deplibs $libdir";;
	      esac
	    done
	    dependency_libs="$temp_deplibs"
	  fi

	  newlib_search_path="$newlib_search_path $absdir"
	  # Link against this library
	  test "$link_static" = no && newdependency_libs="$abs_ladir/$laname $newdependency_libs"
	  # ... and its dependency_libs
	  tmp_libs=
	  for deplib in $dependency_libs; do
	    newdependency_libs="$deplib $newdependency_libs"
	    if test "X$duplicate_deps" = "Xyes" ; then
	      case "$tmp_libs " in
	      *" $deplib "*) specialdeplibs="$specialdeplibs $deplib" ;;
	      esac
	    fi
	    tmp_libs="$tmp_libs $deplib"
	  done

	  if test "$link_all_deplibs" != no; then
	    # Add the search paths of all dependency libraries
	    for deplib in $dependency_libs; do
	      case $deplib in
	      -L*) path="$deplib" ;;
	      *.la)
		dir=`$echo "X$deplib" | $Xsed -e 's%/[^/]*$%%'`
		test "X$dir" = "X$deplib" && dir="."
		# We need an absolute path.
		case $dir in
		[\\/]* | [A-Za-z]:[\\/]*) absdir="$dir" ;;
		*)
		  absdir=`cd "$dir" && pwd`
		  if test -z "$absdir"; then
		    $echo "$modename: warning: cannot determine absolute directory name of \`$dir'" 1>&2
		    absdir="$dir"
		  fi
		  ;;
		esac
		if grep "^installed=no" $deplib > /dev/null; then
		  path="$absdir/$objdir"
		else
		  eval libdir=`${SED} -n -e 's/^libdir=\(.*\)$/\1/p' $deplib`
		  if test -z "$libdir"; then
		    $echo "$modename: \`$deplib' is not a valid libtool archive" 1>&2
		    exit $EXIT_FAILURE
		  fi
		  if test "$absdir" != "$libdir"; then
		    $echo "$modename: warning: \`$deplib' seems to be moved" 1>&2
		  fi
		  path="$absdir"
		fi
		depdepl=
		case $host in
		*-*-darwin*)
		  # we do not want to link against static libs,
		  # but need to link against shared
		  eval deplibrary_names=`${SED} -n -e 's/^library_names=\(.*\)$/\1/p' $deplib`
		  if test -n "$deplibrary_names" ; then
		    for tmp in $deplibrary_names ; do
		      depdepl=$tmp
		    done
		    if test -f "$path/$depdepl" ; then
		      depdepl="$path/$depdepl"
		    fi
		    # do not add paths which are already there
		    case " $newlib_search_path " in
		    *" $path "*) ;;
		    *) newlib_search_path="$newlib_search_path $path";;
		    esac
		  fi
		  path=""
		  ;;
		*)
		  path="-L$path"
		  ;;
		esac
		;;
	      -l*)
		case $host in
		*-*-darwin*)
		  # Again, we only want to link against shared libraries
		  eval tmp_libs=`$echo "X$deplib" | $Xsed -e "s,^\-l,,"`
		  for tmp in $newlib_search_path ; do
		    if test -f "$tmp/lib$tmp_libs.dylib" ; then
		      eval depdepl="$tmp/lib$tmp_libs.dylib"
		      break
		    fi
		  done
		  path=""
		  ;;
		*) continue ;;
		esac
		;;
	      *) continue ;;
	      esac
	      case " $deplibs " in
	      *" $path "*) ;;
	      *) deplibs="$path $deplibs" ;;
	      esac
	      case " $deplibs " in
	      *" $depdepl "*) ;;
	      *) deplibs="$depdepl $deplibs" ;;
	      esac
	    done
	  fi # link_all_deplibs != no
	fi # linkmode = lib
      done # for deplib in $libs
      dependency_libs="$newdependency_libs"
      if test "$pass" = dlpreopen; then
	# Link the dlpreopened libraries before other libraries
	for deplib in $save_deplibs; do
	  deplibs="$deplib $deplibs"
	done
      fi
      if test "$pass" != dlopen; then
	if test "$pass" != conv; then
	  # Make sure lib_search_path contains only unique directories.
	  lib_search_path=
	  for dir in $newlib_search_path; do
	    case "$lib_search_path " in
	    *" $dir "*) ;;
	    *) lib_search_path="$lib_search_path $dir" ;;
	    esac
	  done
	  newlib_search_path=
	fi

	if test "$linkmode,$pass" != "prog,link"; then
	  vars="deplibs"
	else
	  vars="compile_deplibs finalize_deplibs"
	fi
	for var in $vars dependency_libs; do
	  # Add libraries to $var in reverse order
	  eval tmp_libs=\"\$$var\"
	  new_libs=
	  for deplib in $tmp_libs; do
	    # FIXME: Pedantically, this is the right thing to do, so
	    #        that some nasty dependency loop isn't accidentally
	    #        broken:
	    #new_libs="$deplib $new_libs"
	    # Pragmatically, this seems to cause very few problems in
	    # practice:
	    case $deplib in
	    -L*) new_libs="$deplib $new_libs" ;;
	    -R*) ;;
	    *)
	      # And here is the reason: when a library appears more
	      # than once as an explicit dependence of a library, or
	      # is implicitly linked in more than once by the
	      # compiler, it is considered special, and multiple
	      # occurrences thereof are not removed.  Compare this
	      # with having the same library being listed as a
	      # dependency of multiple other libraries: in this case,
	      # we know (pedantically, we assume) the library does not
	      # need to be listed more than once, so we keep only the
	      # last copy.  This is not always right, but it is rare
	      # enough that we require users that really mean to play
	      # such unportable linking tricks to link the library
	      # using -Wl,-lname, so that libtool does not consider it
	      # for duplicate removal.
	      case " $specialdeplibs " in
	      *" $deplib "*) new_libs="$deplib $new_libs" ;;
	      *)
		case " $new_libs " in
		*" $deplib "*) ;;
		*) new_libs="$deplib $new_libs" ;;
		esac
		;;
	      esac
	      ;;
	    esac
	  done
	  tmp_libs=
	  for deplib in $new_libs; do
	    case $deplib in
	    -L*)
	      case " $tmp_libs " in
	      *" $deplib "*) ;;
	      *) tmp_libs="$tmp_libs $deplib" ;;
	      esac
	      ;;
	    *) tmp_libs="$tmp_libs $deplib" ;;
	    esac
	  done
	  eval $var=\"$tmp_libs\"
	done # for var
      fi
      # Last step: remove runtime libs from dependency_libs
      # (they stay in deplibs)
      tmp_libs=
      for i in $dependency_libs ; do
	case " $predeps $postdeps $compiler_lib_search_path " in
	*" $i "*)
	  i=""
	  ;;
	esac
	if test -n "$i" ; then
	  tmp_libs="$tmp_libs $i"
	fi
      done
      dependency_libs=$tmp_libs
    done # for pass
    if test "$linkmode" = prog; then
      dlfiles="$newdlfiles"
      dlprefiles="$newdlprefiles"
    fi

    case $linkmode in
    oldlib)
      if test -n "$deplibs"; then
	$echo "$modename: warning: \`-l' and \`-L' are ignored for archives" 1>&2
      fi

      if test -n "$dlfiles$dlprefiles" || test "$dlself" != no; then
	$echo "$modename: warning: \`-dlopen' is ignored for archives" 1>&2
      fi

      if test -n "$rpath"; then
	$echo "$modename: warning: \`-rpath' is ignored for archives" 1>&2
      fi

      if test -n "$xrpath"; then
	$echo "$modename: warning: \`-R' is ignored for archives" 1>&2
      fi

      if test -n "$vinfo"; then
	$echo "$modename: warning: \`-version-info/-version-number' is ignored for archives" 1>&2
      fi

      if test -n "$release"; then
	$echo "$modename: warning: \`-release' is ignored for archives" 1>&2
      fi

      if test -n "$export_symbols" || test -n "$export_symbols_regex"; then
	$echo "$modename: warning: \`-export-symbols' is ignored for archives" 1>&2
      fi

      # Now set the variables for building old libraries.
      build_libtool_libs=no
      oldlibs="$output"
      objs="$objs$old_deplibs"
      ;;

    lib)
      # Make sure we only generate libraries of the form `libNAME.la'.
      case $outputname in
      lib*)
	name=`$echo "X$outputname" | $Xsed -e 's/\.la$//' -e 's/^lib//'`
	eval shared_ext=\"$shrext_cmds\"
	eval libname=\"$libname_spec\"
	;;
      *)
	if test "$module" = no; then
	  $echo "$modename: libtool library \`$output' must begin with \`lib'" 1>&2
	  $echo "$help" 1>&2
	  exit $EXIT_FAILURE
	fi
	if test "$need_lib_prefix" != no; then
	  # Add the "lib" prefix for modules if required
	  name=`$echo "X$outputname" | $Xsed -e 's/\.la$//'`
	  eval shared_ext=\"$shrext_cmds\"
	  eval libname=\"$libname_spec\"
	else
	  libname=`$echo "X$outputname" | $Xsed -e 's/\.la$//'`
	fi
	;;
      esac

      if test -n "$objs"; then
	if test "$deplibs_check_method" != pass_all; then
	  $echo "$modename: cannot build libtool library \`$output' from non-libtool objects on this host:$objs" 1>&2
	  exit $EXIT_FAILURE
	else
	  $echo
	  $echo "*** Warning: Linking the shared library $output against the non-libtool"
	  $echo "*** objects $objs is not portable!"
	  libobjs="$libobjs $objs"
	fi
      fi

      if test "$dlself" != no; then
	$echo "$modename: warning: \`-dlopen self' is ignored for libtool libraries" 1>&2
      fi

      set dummy $rpath
      if test "$#" -gt 2; then
	$echo "$modename: warning: ignoring multiple \`-rpath's for a libtool library" 1>&2
      fi
      install_libdir="$2"

      oldlibs=
      if test -z "$rpath"; then
	if test "$build_libtool_libs" = yes; then
	  # Building a libtool convenience library.
	  # Some compilers have problems with a `.al' extension so
	  # convenience libraries should have the same extension an
	  # archive normally would.
	  oldlibs="$output_objdir/$libname.$libext $oldlibs"
	  build_libtool_libs=convenience
	  build_old_libs=yes
	fi

	if test -n "$vinfo"; then
	  $echo "$modename: warning: \`-version-info/-version-number' is ignored for convenience libraries" 1>&2
	fi

	if test -n "$release"; then
	  $echo "$modename: warning: \`-release' is ignored for convenience libraries" 1>&2
	fi
      else

	# Parse the version information argument.
	save_ifs="$IFS"; IFS=':'
	set dummy $vinfo 0 0 0
	IFS="$save_ifs"

	if test -n "$8"; then
	  $echo "$modename: too many parameters to \`-version-info'" 1>&2
	  $echo "$help" 1>&2
	  exit $EXIT_FAILURE
	fi

	# convert absolute version numbers to libtool ages
	# this retains compatibility with .la files and attempts
	# to make the code below a bit more comprehensible

	case $vinfo_number in
	yes)
	  number_major="$2"
	  number_minor="$3"
	  number_revision="$4"
	  #
	  # There are really only two kinds -- those that
	  # use the current revision as the major version
	  # and those that subtract age and use age as
	  # a minor version.  But, then there is irix
	  # which has an extra 1 added just for fun
	  #
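	  # Worked example (illustrative only, not taken from a real build):
	  # with `-version-number 2:1:0' and version_type=linux, the branch
	  # below gives current=3 (2 + 1), age=1 and revision=0.  The linux
	  # case further down then computes major=.2 (3 - 1) and
	  # versuffix=.2.1.0, which matches the requested 2.1.0.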
	  case $version_type in
	  darwin|linux|osf|windows)
	    current=`expr $number_major + $number_minor`
	    age="$number_minor"
	    revision="$number_revision"
	    ;;
	  freebsd-aout|freebsd-elf|sunos)
	    current="$number_major"
	    revision="$number_minor"
	    age="0"
	    ;;
	  irix|nonstopux)
	    current=`expr $number_major + $number_minor - 1`
	    age="$number_minor"
	    revision="$number_minor"
	    ;;
	  esac
	  ;;
	no)
	  current="$2"
	  revision="$3"
	  age="$4"
	  ;;
	esac

	# Check that each value is a valid number.
	case $current in
	0|[1-9]|[1-9][0-9]|[1-9][0-9][0-9]|[1-9][0-9][0-9][0-9]|[1-9][0-9][0-9][0-9][0-9]) ;;
	*)
	  $echo "$modename: CURRENT \`$current' must be a nonnegative integer" 1>&2
	  $echo "$modename: \`$vinfo' is not valid version information" 1>&2
	  exit $EXIT_FAILURE
	  ;;
	esac

	case $revision in
	0|[1-9]|[1-9][0-9]|[1-9][0-9][0-9]|[1-9][0-9][0-9][0-9]|[1-9][0-9][0-9][0-9][0-9]) ;;
	*)
	  $echo "$modename: REVISION \`$revision' must be a nonnegative integer" 1>&2
	  $echo "$modename: \`$vinfo' is not valid version information" 1>&2
	  exit $EXIT_FAILURE
	  ;;
	esac

	case $age in
	0|[1-9]|[1-9][0-9]|[1-9][0-9][0-9]|[1-9][0-9][0-9][0-9]|[1-9][0-9][0-9][0-9][0-9]) ;;
	*)
	  $echo "$modename: AGE \`$age' must be a nonnegative integer" 1>&2
	  $echo "$modename: \`$vinfo' is not valid version information" 1>&2
	  exit $EXIT_FAILURE
	  ;;
	esac

	if test "$age" -gt "$current"; then
	  $echo "$modename: AGE \`$age' is greater than the current interface number \`$current'" 1>&2
	  $echo "$modename: \`$vinfo' is not valid version information" 1>&2
	  exit $EXIT_FAILURE
	fi

	# Calculate the version variables.
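	# Worked example (illustrative only): with `-version-info 3:2:1'
	# we have current=3, revision=2 and age=1.  For version_type=linux
	# the code below yields major=.2 (3 - 1) and versuffix=.2.1.2,
	# i.e. a suffix of the form .MAJOR.AGE.REVISION.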
	major=
	versuffix=
	verstring=
	case $version_type in
	none) ;;

	darwin)
	  # Like Linux, but with the current version available in
	  # verstring for coding it into the library header
	  major=.`expr $current - $age`
	  versuffix="$major.$age.$revision"
	  # Darwin ld doesn't like 0 for these options...
	  minor_current=`expr $current + 1`
	  verstring="${wl}-compatibility_version ${wl}$minor_current ${wl}-current_version ${wl}$minor_current.$revision"
	  ;;

	freebsd-aout)
	  major=".$current"
	  versuffix=".$current.$revision"
	  ;;

	freebsd-elf)
	  major=".$current"
	  versuffix=".$current"
	  ;;

	irix | nonstopux)
	  major=`expr $current - $age + 1`

	  case $version_type in
	    nonstopux) verstring_prefix=nonstopux ;;
	    *)         verstring_prefix=sgi ;;
	  esac
	  verstring="$verstring_prefix$major.$revision"

	  # Add in all the interfaces that we are compatible with.
	  loop=$revision
	  while test "$loop" -ne 0; do
	    iface=`expr $revision - $loop`
	    loop=`expr $loop - 1`
	    verstring="$verstring_prefix$major.$iface:$verstring"
	  done

	  # Before this point, $major must not contain `.'.
	  major=.$major
	  versuffix="$major.$revision"
	  ;;

	linux)
	  major=.`expr $current - $age`
	  versuffix="$major.$age.$revision"
	  ;;

	osf)
	  major=.`expr $current - $age`
	  versuffix=".$current.$age.$revision"
	  verstring="$current.$age.$revision"

	  # Add in all the interfaces that we are compatible with.
	  loop=$age
	  while test "$loop" -ne 0; do
	    iface=`expr $current - $loop`
	    loop=`expr $loop - 1`
	    verstring="$verstring:${iface}.0"
	  done

	  # Make executables depend on our current version.
	  verstring="$verstring:${current}.0"
	  ;;

	sunos)
	  major=".$current"
	  versuffix=".$current.$revision"
	  ;;

	windows)
	  # Use '-' rather than '.', since we only want one
	  # extension on DOS 8.3 filesystems.
	  major=`expr $current - $age`
	  versuffix="-$major"
	  ;;

	*)
	  $echo "$modename: unknown library version type \`$version_type'" 1>&2
	  $echo "Fatal configuration error.  See the $PACKAGE docs for more information." 1>&2
	  exit $EXIT_FAILURE
	  ;;
	esac

	# Clear the version info if we defaulted, and they specified a release.
	if test -z "$vinfo" && test -n "$release"; then
	  major=
	  case $version_type in
	  darwin)
	    # we can't check for "0.0" in archive_cmds due to quoting
	    # problems, so we reset it completely
	    verstring=
	    ;;
	  *)
	    verstring="0.0"
	    ;;
	  esac
	  if test "$need_version" = no; then
	    versuffix=
	  else
	    versuffix=".0.0"
	  fi
	fi

	# Remove version info from name if versioning should be avoided
	if test "$avoid_version" = yes && test "$need_version" = no; then
	  major=
	  versuffix=
	  verstring=""
	fi

	# Check to see if the archive will have undefined symbols.
	if test "$allow_undefined" = yes; then
	  if test "$allow_undefined_flag" = unsupported; then
	    $echo "$modename: warning: undefined symbols not allowed in $host shared libraries" 1>&2
	    build_libtool_libs=no
	    build_old_libs=yes
	  fi
	else
	  # Don't allow undefined symbols.
	  allow_undefined_flag="$no_undefined_flag"
	fi
      fi

      if test "$mode" != relink; then
	# Remove our outputs, but don't remove object files since they
	# may have been created when compiling PIC objects.
	removelist=
	tempremovelist=`$echo "$output_objdir/*"`
	for p in $tempremovelist; do
	  case $p in
	    *.$objext)
	       ;;
	    $output_objdir/$outputname | $output_objdir/$libname.* | $output_objdir/${libname}${release}.*)
	       if test "X$precious_files_regex" != "X"; then
	         if echo $p | $EGREP -e "$precious_files_regex" >/dev/null 2>&1
	         then
		   continue
		 fi
	       fi
	       removelist="$removelist $p"
	       ;;
	    *) ;;
	  esac
	done
	if test -n "$removelist"; then
	  $show "${rm}r $removelist"
	  $run ${rm}r $removelist
	fi
      fi

      # Now set the variables for building old libraries.
      if test "$build_old_libs" = yes && test "$build_libtool_libs" != convenience ; then
	oldlibs="$oldlibs $output_objdir/$libname.$libext"

	# Transform .lo files to .o files.
	oldobjs="$objs "`$echo "X$libobjs" | $SP2NL | $Xsed -e '/\.'${libext}'$/d' -e "$lo2o" | $NL2SP`
      fi

      # Eliminate all temporary directories.
      for path in $notinst_path; do
	lib_search_path=`$echo "$lib_search_path " | ${SED} -e "s% $path % %g"`
	deplibs=`$echo "$deplibs " | ${SED} -e "s% -L$path % %g"`
	dependency_libs=`$echo "$dependency_libs " | ${SED} -e "s% -L$path % %g"`
      done

      if test -n "$xrpath"; then
	# If the user specified any rpath flags, then add them.
	temp_xrpath=
	for libdir in $xrpath; do
	  temp_xrpath="$temp_xrpath -R$libdir"
	  case "$finalize_rpath " in
	  *" $libdir "*) ;;
	  *) finalize_rpath="$finalize_rpath $libdir" ;;
	  esac
	done
	if test "$hardcode_into_libs" != yes || test "$build_old_libs" = yes; then
	  dependency_libs="$temp_xrpath $dependency_libs"
	fi
      fi

      # Make sure dlfiles contains only unique files that won't be dlpreopened
      old_dlfiles="$dlfiles"
      dlfiles=
      for lib in $old_dlfiles; do
	case " $dlprefiles $dlfiles " in
	*" $lib "*) ;;
	*) dlfiles="$dlfiles $lib" ;;
	esac
      done

      # Make sure dlprefiles contains only unique files
      old_dlprefiles="$dlprefiles"
      dlprefiles=
      for lib in $old_dlprefiles; do
	case "$dlprefiles " in
	*" $lib "*) ;;
	*) dlprefiles="$dlprefiles $lib" ;;
	esac
      done

      if test "$build_libtool_libs" = yes; then
	if test -n "$rpath"; then
	  case $host in
	  *-*-cygwin* | *-*-mingw* | *-*-pw32* | *-*-os2* | *-*-beos*)
	    # These systems don't actually have a C library (as such)!
	    ;;
	  *-*-rhapsody* | *-*-darwin1.[012])
	    # Rhapsody C library is in the System framework
	    deplibs="$deplibs -framework System"
	    ;;
	  *-*-netbsd*)
	    # Don't link with libc until the a.out ld.so is fixed.
	    ;;
	  *-*-openbsd* | *-*-freebsd* | *-*-dragonfly*)
	    # Do not include libc due to us having libc/libc_r.
	    ;;
	  *-*-sco3.2v5* | *-*-sco5v6*)
	    # Causes problems with __ctype
	    ;;
	  *-*-sysv4.2uw2* | *-*-sysv5* | *-*-unixware* | *-*-OpenUNIX*)
	    # Compiler inserts libc in the correct place for threads to work
	    ;;
	  *)
	    # Add libc to deplibs on all other systems if necessary.
	    if test "$build_libtool_need_lc" = "yes"; then
	      deplibs="$deplibs -lc"
	    fi
	    ;;
	  esac
	fi

	# Transform deplibs into only deplibs that can be linked in shared.
	name_save=$name
	libname_save=$libname
	release_save=$release
	versuffix_save=$versuffix
	major_save=$major
	# I'm not sure if I'm treating the release correctly.  I think
	# release should show up in the -l (ie -lgmp5) so we don't want to
	# add it in twice.  Is that correct?
	release=""
	versuffix=""
	major=""
	newdeplibs=
	droppeddeps=no
	case $deplibs_check_method in
	pass_all)
	  # Don't check for shared/static.  Everything works.
	  # This might be a little naive.  We might want to check
	  # whether the library exists or not.  But this is on
	  # osf3 & osf4 and I'm not really sure... Just
	  # implementing what was already the behavior.
	  newdeplibs=$deplibs
	  ;;
	test_compile)
	  # This code stresses the "libraries are programs" paradigm to its
	  # limits. Maybe even breaks it.  We compile a program, linking it
	  # against the deplibs as a proxy for the library.  Then we can check
	  # whether they linked in statically or dynamically with ldd.
	  $rm conftest.c
	  cat > conftest.c <<EOF
	  int main() { return 0; }
EOF
	  $rm conftest
	  $LTCC $LTCFLAGS -o conftest conftest.c $deplibs
	  if test "$?" -eq 0 ; then
	    ldd_output=`ldd conftest`
	    for i in $deplibs; do
	      name=`expr $i : '-l\(.*\)'`
	      # If $name is empty we are operating on a -L argument.
	      if test "$name" != "" && test "$name" != "0"; then
		if test "X$allow_libtool_libs_with_static_runtimes" = "Xyes" ; then
		  case " $predeps $postdeps " in
		  *" $i "*)
		    newdeplibs="$newdeplibs $i"
		    i=""
		    ;;
		  esac
		fi
		if test -n "$i" ; then
		  libname=`eval \\$echo \"$libname_spec\"`
		  deplib_matches=`eval \\$echo \"$library_names_spec\"`
		  set dummy $deplib_matches
		  deplib_match=$2
		  if test `expr "$ldd_output" : ".*$deplib_match"` -ne 0 ; then
		    newdeplibs="$newdeplibs $i"
		  else
		    droppeddeps=yes
		    $echo
		    $echo "*** Warning: dynamic linker does not accept needed library $i."
		    $echo "*** I have the capability to make that library automatically link in when"
		    $echo "*** you link to this library.  But I can only do this if you have a"
		    $echo "*** shared version of the library, which I believe you do not have"
		    $echo "*** because a test_compile did reveal that the linker did not use it for"
		    $echo "*** its dynamic dependency list that programs get resolved with at runtime."
		  fi
		fi
	      else
		newdeplibs="$newdeplibs $i"
	      fi
	    done
	  else
	    # Error occurred in the first compile.  Let's try to salvage
	    # the situation: Compile a separate program for each library.
	    for i in $deplibs; do
	      name=`expr $i : '-l\(.*\)'`
	      # If $name is empty we are operating on a -L argument.
	      if test "$name" != "" && test "$name" != "0"; then
		$rm conftest
		$LTCC $LTCFLAGS -o conftest conftest.c $i
		# Did it work?
		if test "$?" -eq 0 ; then
		  ldd_output=`ldd conftest`
		  if test "X$allow_libtool_libs_with_static_runtimes" = "Xyes" ; then
		    case " $predeps $postdeps " in
		    *" $i "*)
		      newdeplibs="$newdeplibs $i"
		      i=""
		      ;;
		    esac
		  fi
		  if test -n "$i" ; then
		    libname=`eval \\$echo \"$libname_spec\"`
		    deplib_matches=`eval \\$echo \"$library_names_spec\"`
		    set dummy $deplib_matches
		    deplib_match=$2
		    if test `expr "$ldd_output" : ".*$deplib_match"` -ne 0 ; then
		      newdeplibs="$newdeplibs $i"
		    else
		      droppeddeps=yes
		      $echo
		      $echo "*** Warning: dynamic linker does not accept needed library $i."
		      $echo "*** I have the capability to make that library automatically link in when"
		      $echo "*** you link to this library.  But I can only do this if you have a"
		      $echo "*** shared version of the library, which you do not appear to have"
		      $echo "*** because a test_compile did reveal that the linker did not use this one"
		      $echo "*** as a dynamic dependency that programs can get resolved with at runtime."
		    fi
		  fi
		else
		  droppeddeps=yes
		  $echo
		  $echo "*** Warning!  Library $i is needed by this library but I was not able to"
		  $echo "***  make it link in!  You will probably need to install it or some"
		  $echo "*** library that it depends on before this library will be fully"
		  $echo "*** functional.  Installing it before continuing would be even better."
		fi
	      else
		newdeplibs="$newdeplibs $i"
	      fi
	    done
	  fi
	  ;;
	file_magic*)
	  set dummy $deplibs_check_method
	  file_magic_regex=`expr "$deplibs_check_method" : "$2 \(.*\)"`
	  for a_deplib in $deplibs; do
	    name=`expr $a_deplib : '-l\(.*\)'`
	    # If $name is empty we are operating on a -L argument.
	    if test "$name" != "" && test "$name" != "0"; then
	      if test "X$allow_libtool_libs_with_static_runtimes" = "Xyes" ; then
		case " $predeps $postdeps " in
		*" $a_deplib "*)
		  newdeplibs="$newdeplibs $a_deplib"
		  a_deplib=""
		  ;;
		esac
	      fi
	      if test -n "$a_deplib" ; then
		libname=`eval \\$echo \"$libname_spec\"`
		for i in $lib_search_path $sys_lib_search_path $shlib_search_path; do
		  potential_libs=`ls $i/$libname[.-]* 2>/dev/null`
		  for potent_lib in $potential_libs; do
		      # Follow soft links.
		      if ls -lLd "$potent_lib" 2>/dev/null \
			 | grep " -> " >/dev/null; then
			continue
		      fi
		      # The statement above tries to avoid entering an
		      # endless loop below, in case of cyclic links.
		      # We might still enter an endless loop, since a link
		      # loop can be closed while we follow links,
		      # but so what?
		      potlib="$potent_lib"
		      while test -h "$potlib" 2>/dev/null; do
			potliblink=`ls -ld $potlib | ${SED} 's/.* -> //'`
			case $potliblink in
			[\\/]* | [A-Za-z]:[\\/]*) potlib="$potliblink";;
			*) potlib=`$echo "X$potlib" | $Xsed -e 's,[^/]*$,,'`"$potliblink";;
			esac
		      done
		      if eval $file_magic_cmd \"\$potlib\" 2>/dev/null \
			 | ${SED} 10q \
			 | $EGREP "$file_magic_regex" > /dev/null; then
			newdeplibs="$newdeplibs $a_deplib"
			a_deplib=""
			break 2
		      fi
		  done
		done
	      fi
	      if test -n "$a_deplib" ; then
		droppeddeps=yes
		$echo
		$echo "*** Warning: linker path does not have real file for library $a_deplib."
		$echo "*** I have the capability to make that library automatically link in when"
		$echo "*** you link to this library.  But I can only do this if you have a"
		$echo "*** shared version of the library, which you do not appear to have"
		$echo "*** because I did check the linker path looking for a file starting"
		if test -z "$potlib" ; then
		  $echo "*** with $libname but no candidates were found. (...for file magic test)"
		else
		  $echo "*** with $libname and none of the candidates passed a file format test"
		  $echo "*** using a file magic. Last file checked: $potlib"
		fi
	      fi
	    else
	      # Add a -L argument.
	      newdeplibs="$newdeplibs $a_deplib"
	    fi
	  done # Gone through all deplibs.
	  ;;
	match_pattern*)
	  set dummy $deplibs_check_method
	  match_pattern_regex=`expr "$deplibs_check_method" : "$2 \(.*\)"`
	  for a_deplib in $deplibs; do
	    name=`expr $a_deplib : '-l\(.*\)'`
	    # If $name is empty we are operating on a -L argument.
	    if test -n "$name" && test "$name" != "0"; then
	      if test "X$allow_libtool_libs_with_static_runtimes" = "Xyes" ; then
		case " $predeps $postdeps " in
		*" $a_deplib "*)
		  newdeplibs="$newdeplibs $a_deplib"
		  a_deplib=""
		  ;;
		esac
	      fi
	      if test -n "$a_deplib" ; then
		libname=`eval \\$echo \"$libname_spec\"`
		for i in $lib_search_path $sys_lib_search_path $shlib_search_path; do
		  potential_libs=`ls $i/$libname[.-]* 2>/dev/null`
		  for potent_lib in $potential_libs; do
		    potlib="$potent_lib" # see symlink-check above in file_magic test
		    if eval $echo \"$potent_lib\" 2>/dev/null \
		        | ${SED} 10q \
		        | $EGREP "$match_pattern_regex" > /dev/null; then
		      newdeplibs="$newdeplibs $a_deplib"
		      a_deplib=""
		      break 2
		    fi
		  done
		done
	      fi
	      if test -n "$a_deplib" ; then
		droppeddeps=yes
		$echo
		$echo "*** Warning: linker path does not have real file for library $a_deplib."
		$echo "*** I have the capability to make that library automatically link in when"
		$echo "*** you link to this library.  But I can only do this if you have a"
		$echo "*** shared version of the library, which you do not appear to have"
		$echo "*** because I did check the linker path looking for a file starting"
		if test -z "$potlib" ; then
		  $echo "*** with $libname but no candidates were found. (...for regex pattern test)"
		else
		  $echo "*** with $libname and none of the candidates passed a file format test"
		  $echo "*** using a regex pattern. Last file checked: $potlib"
		fi
	      fi
	    else
	      # Add a -L argument.
	      newdeplibs="$newdeplibs $a_deplib"
	    fi
	  done # Gone through all deplibs.
	  ;;
	none | unknown | *)
	  newdeplibs=""
	  tmp_deplibs=`$echo "X $deplibs" | $Xsed -e 's/ -lc$//' \
	    -e 's/ -[LR][^ ]*//g'`
	  if test "X$allow_libtool_libs_with_static_runtimes" = "Xyes" ; then
	    for i in $predeps $postdeps ; do
	      # can't use Xsed below, because $i might contain '/'
	      tmp_deplibs=`$echo "X $tmp_deplibs" | ${SED} -e "1s,^X,," -e "s,$i,,"`
	    done
	  fi
	  if $echo "X $tmp_deplibs" | $Xsed -e 's/[ 	]//g' \
	    | grep . >/dev/null; then
	    $echo
	    if test "X$deplibs_check_method" = "Xnone"; then
	      $echo "*** Warning: inter-library dependencies are not supported on this platform."
	    else
	      $echo "*** Warning: inter-library dependencies are not known to be supported."
	    fi
	    $echo "*** All declared inter-library dependencies are being dropped."
	    droppeddeps=yes
	  fi
	  ;;
	esac
	versuffix=$versuffix_save
	major=$major_save
	release=$release_save
	libname=$libname_save
	name=$name_save

	case $host in
	*-*-rhapsody* | *-*-darwin1.[012])
	  # On Rhapsody, replace the C library with the System framework
	  newdeplibs=`$echo "X $newdeplibs" | $Xsed -e 's/ -lc / -framework System /'`
	  ;;
	esac

	if test "$droppeddeps" = yes; then
	  if test "$module" = yes; then
	    $echo
	    $echo "*** Warning: libtool could not satisfy all declared inter-library"
	    $echo "*** dependencies of module $libname.  Therefore, libtool will create"
	    $echo "*** a static module, that should work as long as the dlopening"
	    $echo "*** application is linked with the -dlopen flag."
	    if test -z "$global_symbol_pipe"; then
	      $echo
	      $echo "*** However, this would only work if libtool was able to extract symbol"
	      $echo "*** lists from a program, using \`nm' or equivalent, but libtool could"
	      $echo "*** not find such a program.  So, this module is probably useless."
	      $echo "*** \`nm' from GNU binutils and a full rebuild may help."
	    fi
	    if test "$build_old_libs" = no; then
	      oldlibs="$output_objdir/$libname.$libext"
	      build_libtool_libs=module
	      build_old_libs=yes
	    else
	      build_libtool_libs=no
	    fi
	  else
	    $echo "*** The inter-library dependencies that have been dropped here will be"
	    $echo "*** automatically added whenever a program is linked with this library"
	    $echo "*** or is declared to -dlopen it."

	    if test "$allow_undefined" = no; then
	      $echo
	      $echo "*** Since this library must not contain undefined symbols,"
	      $echo "*** because either the platform does not support them or"
	      $echo "*** it was explicitly requested with -no-undefined,"
	      $echo "*** libtool will only create a static version of it."
	      if test "$build_old_libs" = no; then
		oldlibs="$output_objdir/$libname.$libext"
		build_libtool_libs=module
		build_old_libs=yes
	      else
		build_libtool_libs=no
	      fi
	    fi
	  fi
	fi
	# Done checking deplibs!
	deplibs=$newdeplibs
      fi


      # move library search paths that coincide with paths to not yet
      # installed libraries to the beginning of the library search list
      new_libs=
      for path in $notinst_path; do
	case " $new_libs " in
	*" -L$path/$objdir "*) ;;
	*)
	  case " $deplibs " in
	  *" -L$path/$objdir "*)
	    new_libs="$new_libs -L$path/$objdir" ;;
	  esac
	  ;;
	esac
      done
      for deplib in $deplibs; do
	case $deplib in
	-L*)
	  case " $new_libs " in
	  *" $deplib "*) ;;
	  *) new_libs="$new_libs $deplib" ;;
	  esac
	  ;;
	*) new_libs="$new_libs $deplib" ;;
	esac
      done
      deplibs="$new_libs"


      # All the library-specific variables (install_libdir is set above).
      library_names=
      old_library=
      dlname=

      # Test again, we may have decided not to build it any more
      if test "$build_libtool_libs" = yes; then
	if test "$hardcode_into_libs" = yes; then
	  # Hardcode the library paths
	  hardcode_libdirs=
	  dep_rpath=
	  rpath="$finalize_rpath"
	  test "$mode" != relink && rpath="$compile_rpath$rpath"
	  for libdir in $rpath; do
	    if test -n "$hardcode_libdir_flag_spec"; then
	      if test -n "$hardcode_libdir_separator"; then
		if test -z "$hardcode_libdirs"; then
		  hardcode_libdirs="$libdir"
		else
		  # Just accumulate the unique libdirs.
		  case $hardcode_libdir_separator$hardcode_libdirs$hardcode_libdir_separator in
		  *"$hardcode_libdir_separator$libdir$hardcode_libdir_separator"*)
		    ;;
		  *)
		    hardcode_libdirs="$hardcode_libdirs$hardcode_libdir_separator$libdir"
		    ;;
		  esac
		fi
	      else
		eval flag=\"$hardcode_libdir_flag_spec\"
		dep_rpath="$dep_rpath $flag"
	      fi
	    elif test -n "$runpath_var"; then
	      case "$perm_rpath " in
	      *" $libdir "*) ;;
	      *) perm_rpath="$perm_rpath $libdir" ;;
	      esac
	    fi
	  done
	  # Substitute the hardcoded libdirs into the rpath.
	  if test -n "$hardcode_libdir_separator" &&
	     test -n "$hardcode_libdirs"; then
	    libdir="$hardcode_libdirs"
	    if test -n "$hardcode_libdir_flag_spec_ld"; then
	      eval dep_rpath=\"$hardcode_libdir_flag_spec_ld\"
	    else
	      eval dep_rpath=\"$hardcode_libdir_flag_spec\"
	    fi
	  fi
	  if test -n "$runpath_var" && test -n "$perm_rpath"; then
	    # We should set the runpath_var.
	    rpath=
	    for dir in $perm_rpath; do
	      rpath="$rpath$dir:"
	    done
	    eval "$runpath_var='$rpath\$$runpath_var'; export $runpath_var"
	  fi
	  test -n "$dep_rpath" && deplibs="$dep_rpath $deplibs"
	fi

	shlibpath="$finalize_shlibpath"
	test "$mode" != relink && shlibpath="$compile_shlibpath$shlibpath"
	if test -n "$shlibpath"; then
	  eval "$shlibpath_var='$shlibpath\$$shlibpath_var'; export $shlibpath_var"
	fi

	# Get the real and link names of the library.
	eval shared_ext=\"$shrext_cmds\"
	eval library_names=\"$library_names_spec\"
	set dummy $library_names
	realname="$2"
	shift; shift

	if test -n "$soname_spec"; then
	  eval soname=\"$soname_spec\"
	else
	  soname="$realname"
	fi
	if test -z "$dlname"; then
	  dlname=$soname
	fi

	lib="$output_objdir/$realname"
	linknames=
	for link
	do
	  linknames="$linknames $link"
	done

	# Use standard objects if they are pic
	test -z "$pic_flag" && libobjs=`$echo "X$libobjs" | $SP2NL | $Xsed -e "$lo2o" | $NL2SP`

	# Prepare the list of exported symbols
	if test -z "$export_symbols"; then
	  if test "$always_export_symbols" = yes || test -n "$export_symbols_regex"; then
	    $show "generating symbol list for \`$libname.la'"
	    export_symbols="$output_objdir/$libname.exp"
	    $run $rm $export_symbols
	    cmds=$export_symbols_cmds
	    save_ifs="$IFS"; IFS='~'
	    for cmd in $cmds; do
	      IFS="$save_ifs"
	      eval cmd=\"$cmd\"
	      if len=`expr "X$cmd" : ".*"` &&
	       test "$len" -le "$max_cmd_len" || test "$max_cmd_len" -le -1; then
	        $show "$cmd"
	        $run eval "$cmd" || exit $?
	        skipped_export=false
	      else
	        # The command line is too long to execute in one step.
	        $show "using reloadable object file for export list..."
	        skipped_export=:
		# Break out early, otherwise skipped_export may be
		# set to false by a later but shorter cmd.
		break
	      fi
	    done
	    IFS="$save_ifs"
	    if test -n "$export_symbols_regex"; then
	      $show "$EGREP -e \"$export_symbols_regex\" \"$export_symbols\" > \"${export_symbols}T\""
	      $run eval '$EGREP -e "$export_symbols_regex" "$export_symbols" > "${export_symbols}T"'
	      $show "$mv \"${export_symbols}T\" \"$export_symbols\""
	      $run eval '$mv "${export_symbols}T" "$export_symbols"'
	    fi
	  fi
	fi

	if test -n "$export_symbols" && test -n "$include_expsyms"; then
	  $run eval '$echo "X$include_expsyms" | $SP2NL >> "$export_symbols"'
	fi

	tmp_deplibs=
	for test_deplib in $deplibs; do
		case " $convenience " in
		*" $test_deplib "*) ;;
		*)
			tmp_deplibs="$tmp_deplibs $test_deplib"
			;;
		esac
	done
	deplibs="$tmp_deplibs"

	if test -n "$convenience"; then
	  if test -n "$whole_archive_flag_spec"; then
	    save_libobjs=$libobjs
	    eval libobjs=\"\$libobjs $whole_archive_flag_spec\"
	  else
	    gentop="$output_objdir/${outputname}x"
	    generated="$generated $gentop"

	    func_extract_archives $gentop $convenience
	    libobjs="$libobjs $func_extract_archives_result"
	  fi
	fi
	
	if test "$thread_safe" = yes && test -n "$thread_safe_flag_spec"; then
	  eval flag=\"$thread_safe_flag_spec\"
	  linker_flags="$linker_flags $flag"
	fi

	# Make a backup of the uninstalled library when relinking
	if test "$mode" = relink; then
	  $run eval '(cd $output_objdir && $rm ${realname}U && $mv $realname ${realname}U)' || exit $?
	fi

	# Do each of the archive commands.
	if test "$module" = yes && test -n "$module_cmds" ; then
	  if test -n "$export_symbols" && test -n "$module_expsym_cmds"; then
	    eval test_cmds=\"$module_expsym_cmds\"
	    cmds=$module_expsym_cmds
	  else
	    eval test_cmds=\"$module_cmds\"
	    cmds=$module_cmds
	  fi
	else
	  if test -n "$export_symbols" && test -n "$archive_expsym_cmds"; then
	    eval test_cmds=\"$archive_expsym_cmds\"
	    cmds=$archive_expsym_cmds
	  else
	    eval test_cmds=\"$archive_cmds\"
	    cmds=$archive_cmds
	  fi
	fi

	if test "X$skipped_export" != "X:" &&
	   len=`expr "X$test_cmds" : ".*" 2>/dev/null` &&
	   test "$len" -le "$max_cmd_len" || test "$max_cmd_len" -le -1; then
	  :
	else
	  # The command line is too long to link in one step, link piecewise.
	  $echo "creating reloadable object files..."

	  # Save the value of $output and $libobjs because we want to
	  # use them later.  If we have whole_archive_flag_spec, we
	  # want to use save_libobjs as it was before
	  # whole_archive_flag_spec was expanded, because we can't
	  # assume the linker understands whole_archive_flag_spec.
	  # This may have to be revisited, in case too many
	  # convenience libraries get linked in and end up exceeding
	  # the spec.
	  if test -z "$convenience" || test -z "$whole_archive_flag_spec"; then
	    save_libobjs=$libobjs
	  fi
	  save_output=$output
	  output_la=`$echo "X$output" | $Xsed -e "$basename"`

	  # Clear the reloadable object creation command queue and
	  # initialize k to one.
	  test_cmds=
	  concat_cmds=
	  objlist=
	  delfiles=
	  last_robj=
	  k=1
	  output=$output_objdir/$output_la-${k}.$objext
	  # Loop over the list of objects to be linked.
	  for obj in $save_libobjs
	  do
	    eval test_cmds=\"$reload_cmds $objlist $last_robj\"
	    if test "X$objlist" = X ||
	       { len=`expr "X$test_cmds" : ".*" 2>/dev/null` &&
		 test "$len" -le "$max_cmd_len"; }; then
	      objlist="$objlist $obj"
	    else
	      # The command $test_cmds is almost too long, add a
	      # command to the queue.
	      if test "$k" -eq 1 ; then
		# The first file doesn't have a previous command to add.
		eval concat_cmds=\"$reload_cmds $objlist $last_robj\"
	      else
		# All subsequent reloadable object files will link in
		# the last one created.
		eval concat_cmds=\"\$concat_cmds~$reload_cmds $objlist $last_robj\"
	      fi
	      last_robj=$output_objdir/$output_la-${k}.$objext
	      k=`expr $k + 1`
	      output=$output_objdir/$output_la-${k}.$objext
	      objlist=$obj
	      len=1
	    fi
	  done
	  # Handle the remaining objects by creating one last
	  # reloadable object file.  All subsequent reloadable object
	  # files will link in the last one created.
	  test -z "$concat_cmds" || concat_cmds=$concat_cmds~
	  eval concat_cmds=\"\${concat_cmds}$reload_cmds $objlist $last_robj\"

	  if ${skipped_export-false}; then
	    $show "generating symbol list for \`$libname.la'"
	    export_symbols="$output_objdir/$libname.exp"
	    $run $rm $export_symbols
	    libobjs=$output
	    # Append the command to create the export file.
	    eval concat_cmds=\"\$concat_cmds~$export_symbols_cmds\"
	  fi

	  # Set up a command to remove the reloadable object files
	  # after they are used.
	  i=0
	  while test "$i" -lt "$k"
	  do
	    i=`expr $i + 1`
	    delfiles="$delfiles $output_objdir/$output_la-${i}.$objext"
	  done

	  $echo "creating a temporary reloadable object file: $output"

	  # Loop through the commands generated above and execute them.
	  save_ifs="$IFS"; IFS='~'
	  for cmd in $concat_cmds; do
	    IFS="$save_ifs"
	    $show "$cmd"
	    $run eval "$cmd" || exit $?
	  done
	  IFS="$save_ifs"

	  libobjs=$output
	  # Restore the value of output.
	  output=$save_output

	  if test -n "$convenience" && test -n "$whole_archive_flag_spec"; then
	    eval libobjs=\"\$libobjs $whole_archive_flag_spec\"
	  fi
	  # Expand the library linking commands again to reset the
	  # value of $libobjs for piecewise linking.

	  # Do each of the archive commands.
	  if test "$module" = yes && test -n "$module_cmds" ; then
	    if test -n "$export_symbols" && test -n "$module_expsym_cmds"; then
	      cmds=$module_expsym_cmds
	    else
	      cmds=$module_cmds
	    fi
	  else
	    if test -n "$export_symbols" && test -n "$archive_expsym_cmds"; then
	      cmds=$archive_expsym_cmds
	    else
	      cmds=$archive_cmds
	    fi
	  fi

	  # Append the command to remove the reloadable object files
	  # to the just-reset $cmds.
	  eval cmds=\"\$cmds~\$rm $delfiles\"
	fi
	save_ifs="$IFS"; IFS='~'
	for cmd in $cmds; do
	  IFS="$save_ifs"
	  eval cmd=\"$cmd\"
	  $show "$cmd"
	  $run eval "$cmd" || {
	    lt_exit=$?

	    # Restore the uninstalled library and exit
	    if test "$mode" = relink; then
	      $run eval '(cd $output_objdir && $rm ${realname}T && $mv ${realname}U $realname)'
	    fi

	    exit $lt_exit
	  }
	done
	IFS="$save_ifs"

	# Restore the uninstalled library and exit
	if test "$mode" = relink; then
	  $run eval '(cd $output_objdir && $rm ${realname}T && $mv $realname ${realname}T && $mv "$realname"U $realname)' || exit $?

	  if test -n "$convenience"; then
	    if test -z "$whole_archive_flag_spec"; then
	      $show "${rm}r $gentop"
	      $run ${rm}r "$gentop"
	    fi
	  fi

	  exit $EXIT_SUCCESS
	fi

	# Create links to the real library.
	for linkname in $linknames; do
	  if test "$realname" != "$linkname"; then
	    $show "(cd $output_objdir && $rm $linkname && $LN_S $realname $linkname)"
	    $run eval '(cd $output_objdir && $rm $linkname && $LN_S $realname $linkname)' || exit $?
	  fi
	done

	# If -module or -export-dynamic was specified, set the dlname.
	if test "$module" = yes || test "$export_dynamic" = yes; then
	  # On all known operating systems, these are identical.
	  dlname="$soname"
	fi
      fi
      ;;

    obj)
      if test -n "$deplibs"; then
	$echo "$modename: warning: \`-l' and \`-L' are ignored for objects" 1>&2
      fi

      if test -n "$dlfiles$dlprefiles" || test "$dlself" != no; then
	$echo "$modename: warning: \`-dlopen' is ignored for objects" 1>&2
      fi

      if test -n "$rpath"; then
	$echo "$modename: warning: \`-rpath' is ignored for objects" 1>&2
      fi

      if test -n "$xrpath"; then
	$echo "$modename: warning: \`-R' is ignored for objects" 1>&2
      fi

      if test -n "$vinfo"; then
	$echo "$modename: warning: \`-version-info' is ignored for objects" 1>&2
      fi

      if test -n "$release"; then
	$echo "$modename: warning: \`-release' is ignored for objects" 1>&2
      fi

      case $output in
      *.lo)
	if test -n "$objs$old_deplibs"; then
	  $echo "$modename: cannot build library object \`$output' from non-libtool objects" 1>&2
	  exit $EXIT_FAILURE
	fi
	libobj="$output"
	obj=`$echo "X$output" | $Xsed -e "$lo2o"`
	;;
      *)
	libobj=
	obj="$output"
	;;
      esac

      # Delete the old objects.
      $run $rm $obj $libobj

      # Objects from convenience libraries.  This assumes
      # single-version convenience libraries.  Whenever we create
      # different ones for PIC/non-PIC, we'll have to duplicate
      # the extraction.
      reload_conv_objs=
      gentop=
      # reload_cmds runs $LD directly, so let us get rid of
      # -Wl from whole_archive_flag_spec
      wl=

      if test -n "$convenience"; then
	if test -n "$whole_archive_flag_spec"; then
	  eval reload_conv_objs=\"\$reload_objs $whole_archive_flag_spec\"
	else
	  gentop="$output_objdir/${obj}x"
	  generated="$generated $gentop"

	  func_extract_archives $gentop $convenience
	  reload_conv_objs="$reload_objs $func_extract_archives_result"
	fi
      fi

      # Create the old-style object.
      reload_objs="$objs$old_deplibs "`$echo "X$libobjs" | $SP2NL | $Xsed -e '/\.'${libext}$'/d' -e '/\.lib$/d' -e "$lo2o" | $NL2SP`" $reload_conv_objs" ### testsuite: skip nested quoting test

      output="$obj"
      cmds=$reload_cmds
      save_ifs="$IFS"; IFS='~'
      for cmd in $cmds; do
	IFS="$save_ifs"
	eval cmd=\"$cmd\"
	$show "$cmd"
	$run eval "$cmd" || exit $?
      done
      IFS="$save_ifs"

      # Exit if we aren't doing a library object file.
      if test -z "$libobj"; then
	if test -n "$gentop"; then
	  $show "${rm}r $gentop"
	  $run ${rm}r $gentop
	fi

	exit $EXIT_SUCCESS
      fi

      if test "$build_libtool_libs" != yes; then
	if test -n "$gentop"; then
	  $show "${rm}r $gentop"
	  $run ${rm}r $gentop
	fi

	# Create an invalid libtool object if no PIC, so that we don't
	# accidentally link it into a program.
	# $show "echo timestamp > $libobj"
	# $run eval "echo timestamp > $libobj" || exit $?
	exit $EXIT_SUCCESS
      fi

      if test -n "$pic_flag" || test "$pic_mode" != default; then
	# Only do commands if we really have different PIC objects.
	reload_objs="$libobjs $reload_conv_objs"
	output="$libobj"
	cmds=$reload_cmds
	save_ifs="$IFS"; IFS='~'
	for cmd in $cmds; do
	  IFS="$save_ifs"
	  eval cmd=\"$cmd\"
	  $show "$cmd"
	  $run eval "$cmd" || exit $?
	done
	IFS="$save_ifs"
      fi

      if test -n "$gentop"; then
	$show "${rm}r $gentop"
	$run ${rm}r $gentop
      fi

      exit $EXIT_SUCCESS
      ;;

    prog)
      case $host in
	*cygwin*) output=`$echo $output | ${SED} -e 's,\.exe$,,;s,$,.exe,'` ;;
      esac
      if test -n "$vinfo"; then
	$echo "$modename: warning: \`-version-info' is ignored for programs" 1>&2
      fi

      if test -n "$release"; then
	$echo "$modename: warning: \`-release' is ignored for programs" 1>&2
      fi

      if test "$preload" = yes; then
	if test "$dlopen_support" = unknown && test "$dlopen_self" = unknown &&
	   test "$dlopen_self_static" = unknown; then
	  $echo "$modename: warning: \`AC_LIBTOOL_DLOPEN' not used. Assuming no dlopen support."
	fi
      fi

      case $host in
      *-*-rhapsody* | *-*-darwin1.[012])
	# On Rhapsody, replace the C library with the System framework.
	compile_deplibs=`$echo "X $compile_deplibs" | $Xsed -e 's/ -lc / -framework System /'`
	finalize_deplibs=`$echo "X $finalize_deplibs" | $Xsed -e 's/ -lc / -framework System /'`
	;;
      esac

      case $host in
      *darwin*)
        # Don't allow lazy linking, it breaks C++ global constructors
        if test "$tagname" = CXX ; then
          compile_command="$compile_command ${wl}-bind_at_load"
          finalize_command="$finalize_command ${wl}-bind_at_load"
        fi
        ;;
      esac


      # move library search paths that coincide with paths to not yet
      # installed libraries to the beginning of the library search list
      new_libs=
      for path in $notinst_path; do
	case " $new_libs " in
	*" -L$path/$objdir "*) ;;
	*)
	  case " $compile_deplibs " in
	  *" -L$path/$objdir "*)
	    new_libs="$new_libs -L$path/$objdir" ;;
	  esac
	  ;;
	esac
      done
      for deplib in $compile_deplibs; do
	case $deplib in
	-L*)
	  case " $new_libs " in
	  *" $deplib "*) ;;
	  *) new_libs="$new_libs $deplib" ;;
	  esac
	  ;;
	*) new_libs="$new_libs $deplib" ;;
	esac
      done
      compile_deplibs="$new_libs"


      compile_command="$compile_command $compile_deplibs"
      finalize_command="$finalize_command $finalize_deplibs"

      if test -n "$rpath$xrpath"; then
	# If the user specified any rpath flags, then add them.
	for libdir in $rpath $xrpath; do
	  # This is the magic to use -rpath.
	  case "$finalize_rpath " in
	  *" $libdir "*) ;;
	  *) finalize_rpath="$finalize_rpath $libdir" ;;
	  esac
	done
      fi

      # Now hardcode the library paths
      rpath=
      hardcode_libdirs=
      for libdir in $compile_rpath $finalize_rpath; do
	if test -n "$hardcode_libdir_flag_spec"; then
	  if test -n "$hardcode_libdir_separator"; then
	    if test -z "$hardcode_libdirs"; then
	      hardcode_libdirs="$libdir"
	    else
	      # Just accumulate the unique libdirs.
	      case $hardcode_libdir_separator$hardcode_libdirs$hardcode_libdir_separator in
	      *"$hardcode_libdir_separator$libdir$hardcode_libdir_separator"*)
		;;
	      *)
		hardcode_libdirs="$hardcode_libdirs$hardcode_libdir_separator$libdir"
		;;
	      esac
	    fi
	  else
	    eval flag=\"$hardcode_libdir_flag_spec\"
	    rpath="$rpath $flag"
	  fi
	elif test -n "$runpath_var"; then
	  case "$perm_rpath " in
	  *" $libdir "*) ;;
	  *) perm_rpath="$perm_rpath $libdir" ;;
	  esac
	fi
	case $host in
	*-*-cygwin* | *-*-mingw* | *-*-pw32* | *-*-os2*)
	  testbindir=`$echo "X$libdir" | $Xsed -e 's*/lib$*/bin*'`
	  case :$dllsearchpath: in
	  *":$libdir:"*) ;;
	  *) dllsearchpath="$dllsearchpath:$libdir";;
	  esac
	  case :$dllsearchpath: in
	  *":$testbindir:"*) ;;
	  *) dllsearchpath="$dllsearchpath:$testbindir";;
	  esac
	  ;;
	esac
      done
      # Substitute the hardcoded libdirs into the rpath.
      if test -n "$hardcode_libdir_separator" &&
	 test -n "$hardcode_libdirs"; then
	libdir="$hardcode_libdirs"
	eval rpath=\" $hardcode_libdir_flag_spec\"
      fi
      compile_rpath="$rpath"

      rpath=
      hardcode_libdirs=
      for libdir in $finalize_rpath; do
	if test -n "$hardcode_libdir_flag_spec"; then
	  if test -n "$hardcode_libdir_separator"; then
	    if test -z "$hardcode_libdirs"; then
	      hardcode_libdirs="$libdir"
	    else
	      # Just accumulate the unique libdirs.
	      case $hardcode_libdir_separator$hardcode_libdirs$hardcode_libdir_separator in
	      *"$hardcode_libdir_separator$libdir$hardcode_libdir_separator"*)
		;;
	      *)
		hardcode_libdirs="$hardcode_libdirs$hardcode_libdir_separator$libdir"
		;;
	      esac
	    fi
	  else
	    eval flag=\"$hardcode_libdir_flag_spec\"
	    rpath="$rpath $flag"
	  fi
	elif test -n "$runpath_var"; then
	  case "$finalize_perm_rpath " in
	  *" $libdir "*) ;;
	  *) finalize_perm_rpath="$finalize_perm_rpath $libdir" ;;
	  esac
	fi
      done
      # Substitute the hardcoded libdirs into the rpath.
      if test -n "$hardcode_libdir_separator" &&
	 test -n "$hardcode_libdirs"; then
	libdir="$hardcode_libdirs"
	eval rpath=\" $hardcode_libdir_flag_spec\"
      fi
      finalize_rpath="$rpath"

      if test -n "$libobjs" && test "$build_old_libs" = yes; then
	# Transform all the library objects into standard objects.
	compile_command=`$echo "X$compile_command" | $SP2NL | $Xsed -e "$lo2o" | $NL2SP`
	finalize_command=`$echo "X$finalize_command" | $SP2NL | $Xsed -e "$lo2o" | $NL2SP`
      fi

      dlsyms=
      if test -n "$dlfiles$dlprefiles" || test "$dlself" != no; then
	if test -n "$NM" && test -n "$global_symbol_pipe"; then
	  dlsyms="${outputname}S.c"
	else
	  $echo "$modename: not configured to extract global symbols from dlpreopened files" 1>&2
	fi
      fi

      if test -n "$dlsyms"; then
	case $dlsyms in
	"") ;;
	*.c)
	  # Discover the nlist of each of the dlfiles.
	  nlist="$output_objdir/${outputname}.nm"

	  $show "$rm $nlist ${nlist}S ${nlist}T"
	  $run $rm "$nlist" "${nlist}S" "${nlist}T"

	  # Parse the name list into a source file.
	  $show "creating $output_objdir/$dlsyms"

	  test -z "$run" && $echo > "$output_objdir/$dlsyms" "\
/* $dlsyms - symbol resolution table for \`$outputname' dlsym emulation. */
/* Generated by $PROGRAM - GNU $PACKAGE $VERSION$TIMESTAMP */

#ifdef __cplusplus
extern \"C\" {
#endif

/* Prevent the only kind of declaration conflicts we can make. */
#define lt_preloaded_symbols some_other_symbol

/* External symbol declarations for the compiler. */\
"

	  if test "$dlself" = yes; then
	    $show "generating symbol list for \`$output'"

	    test -z "$run" && $echo ': @PROGRAM@ ' > "$nlist"

	    # Add our own program objects to the symbol list.
	    progfiles=`$echo "X$objs$old_deplibs" | $SP2NL | $Xsed -e "$lo2o" | $NL2SP`
	    for arg in $progfiles; do
	      $show "extracting global C symbols from \`$arg'"
	      $run eval "$NM $arg | $global_symbol_pipe >> '$nlist'"
	    done

	    if test -n "$exclude_expsyms"; then
	      $run eval '$EGREP -v " ($exclude_expsyms)$" "$nlist" > "$nlist"T'
	      $run eval '$mv "$nlist"T "$nlist"'
	    fi

	    if test -n "$export_symbols_regex"; then
	      $run eval '$EGREP -e "$export_symbols_regex" "$nlist" > "$nlist"T'
	      $run eval '$mv "$nlist"T "$nlist"'
	    fi

	    # Prepare the list of exported symbols
	    if test -z "$export_symbols"; then
	      export_symbols="$output_objdir/$outputname.exp"
	      $run $rm $export_symbols
	      $run eval "${SED} -n -e '/^: @PROGRAM@ $/d' -e 's/^.* \(.*\)$/\1/p' "'< "$nlist" > "$export_symbols"'
              case $host in
              *cygwin* | *mingw* )
	        $run eval "echo EXPORTS "'> "$output_objdir/$outputname.def"'
		$run eval 'cat "$export_symbols" >> "$output_objdir/$outputname.def"'
                ;;
              esac
	    else
	      $run eval "${SED} -e 's/\([].[*^$]\)/\\\\\1/g' -e 's/^/ /' -e 's/$/$/'"' < "$export_symbols" > "$output_objdir/$outputname.exp"'
	      $run eval 'grep -f "$output_objdir/$outputname.exp" < "$nlist" > "$nlist"T'
	      $run eval 'mv "$nlist"T "$nlist"'
              case $host in
              *cygwin* | *mingw* )
	        $run eval "echo EXPORTS "'> "$output_objdir/$outputname.def"'
		$run eval 'cat "$nlist" >> "$output_objdir/$outputname.def"'
                ;;
              esac
	    fi
	  fi

	  for arg in $dlprefiles; do
	    $show "extracting global C symbols from \`$arg'"
	    name=`$echo "$arg" | ${SED} -e 's%^.*/%%'`
	    $run eval '$echo ": $name " >> "$nlist"'
	    $run eval "$NM $arg | $global_symbol_pipe >> '$nlist'"
	  done

	  if test -z "$run"; then
	    # Make sure we have at least an empty file.
	    test -f "$nlist" || : > "$nlist"

	    if test -n "$exclude_expsyms"; then
	      $EGREP -v " ($exclude_expsyms)$" "$nlist" > "$nlist"T
	      $mv "$nlist"T "$nlist"
	    fi

	    # Try sorting and uniquifying the output.
	    if grep -v "^: " < "$nlist" |
		if sort -k 3 </dev/null >/dev/null 2>&1; then
		  sort -k 3
		else
		  sort +2
		fi |
		uniq > "$nlist"S; then
	      :
	    else
	      grep -v "^: " < "$nlist" > "$nlist"S
	    fi

	    if test -f "$nlist"S; then
	      eval "$global_symbol_to_cdecl"' < "$nlist"S >> "$output_objdir/$dlsyms"'
	    else
	      $echo '/* NONE */' >> "$output_objdir/$dlsyms"
	    fi

	    $echo >> "$output_objdir/$dlsyms" "\

#undef lt_preloaded_symbols

#if defined (__STDC__) && __STDC__
# define lt_ptr void *
#else
# define lt_ptr char *
# define const
#endif

/* The mapping between symbol names and symbols. */
"

	    case $host in
	    *cygwin* | *mingw* )
	      $echo >> "$output_objdir/$dlsyms" "\
/* DATA imports from DLLs on WIN32 can't be const, because
   runtime relocations are performed -- see ld's documentation
   on pseudo-relocs */
struct {
"
	      ;;
	    * )
	      $echo >> "$output_objdir/$dlsyms" "\
const struct {
"
	      ;;
	    esac


	    $echo >> "$output_objdir/$dlsyms" "\
  const char *name;
  lt_ptr address;
}
lt_preloaded_symbols[] =
{\
"

	    eval "$global_symbol_to_c_name_address" < "$nlist" >> "$output_objdir/$dlsyms"

	    $echo >> "$output_objdir/$dlsyms" "\
  {0, (lt_ptr) 0}
};

/* This works around a problem in the FreeBSD linker */
#ifdef FREEBSD_WORKAROUND
static const void *lt_preloaded_setup() {
  return lt_preloaded_symbols;
}
#endif

#ifdef __cplusplus
}
#endif\
"
	  fi

	  pic_flag_for_symtable=
	  case $host in
	  # compiling the symbol table file with pic_flag works around
	  # a FreeBSD bug that causes programs to crash when -lm is
	  # linked before any other PIC object.  But we must not use
	  # pic_flag when linking with -static.  The problem exists in
	  # FreeBSD 2.2.6 and is fixed in FreeBSD 3.1.
	  *-*-freebsd2*|*-*-freebsd3.0*|*-*-freebsdelf3.0*)
	    case "$compile_command " in
	    *" -static "*) ;;
	    *) pic_flag_for_symtable=" $pic_flag -DFREEBSD_WORKAROUND";;
	    esac;;
	  *-*-hpux*)
	    case "$compile_command " in
	    *" -static "*) ;;
	    *) pic_flag_for_symtable=" $pic_flag";;
	    esac
	  esac

	  # Now compile the dynamic symbol file.
	  $show "(cd $output_objdir && $LTCC  $LTCFLAGS -c$no_builtin_flag$pic_flag_for_symtable \"$dlsyms\")"
	  $run eval '(cd $output_objdir && $LTCC  $LTCFLAGS -c$no_builtin_flag$pic_flag_for_symtable "$dlsyms")' || exit $?

	  # Clean up the generated files.
	  $show "$rm $output_objdir/$dlsyms $nlist ${nlist}S ${nlist}T"
	  $run $rm "$output_objdir/$dlsyms" "$nlist" "${nlist}S" "${nlist}T"

	  # Transform the symbol file into the correct name.
          case $host in
          *cygwin* | *mingw* )
            if test -f "$output_objdir/${outputname}.def" ; then
              compile_command=`$echo "X$compile_command" | $Xsed -e "s%@SYMFILE@%$output_objdir/${outputname}.def $output_objdir/${outputname}S.${objext}%"`
              finalize_command=`$echo "X$finalize_command" | $Xsed -e "s%@SYMFILE@%$output_objdir/${outputname}.def $output_objdir/${outputname}S.${objext}%"`
            else
              compile_command=`$echo "X$compile_command" | $Xsed -e "s%@SYMFILE@%$output_objdir/${outputname}S.${objext}%"`
              finalize_command=`$echo "X$finalize_command" | $Xsed -e "s%@SYMFILE@%$output_objdir/${outputname}S.${objext}%"`
            fi
            ;;
          * )
            compile_command=`$echo "X$compile_command" | $Xsed -e "s%@SYMFILE@%$output_objdir/${outputname}S.${objext}%"`
            finalize_command=`$echo "X$finalize_command" | $Xsed -e "s%@SYMFILE@%$output_objdir/${outputname}S.${objext}%"`
            ;;
          esac
	  ;;
	*)
	  $echo "$modename: unknown suffix for \`$dlsyms'" 1>&2
	  exit $EXIT_FAILURE
	  ;;
	esac
      else
	# We keep going just in case the user didn't refer to
	# lt_preloaded_symbols.  The linker will fail if global_symbol_pipe
	# really was required.

	# Nullify the symbol file.
	compile_command=`$echo "X$compile_command" | $Xsed -e "s% @SYMFILE@%%"`
	finalize_command=`$echo "X$finalize_command" | $Xsed -e "s% @SYMFILE@%%"`
      fi

      if test "$need_relink" = no || test "$build_libtool_libs" != yes; then
	# Replace the output file specification.
	compile_command=`$echo "X$compile_command" | $Xsed -e 's%@OUTPUT@%'"$output"'%g'`
	link_command="$compile_command$compile_rpath"

	# We have no uninstalled library dependencies, so finalize right now.
	$show "$link_command"
	$run eval "$link_command"
	exit_status=$?

	# Delete the generated files.
	if test -n "$dlsyms"; then
	  $show "$rm $output_objdir/${outputname}S.${objext}"
	  $run $rm "$output_objdir/${outputname}S.${objext}"
	fi

	exit $exit_status
      fi

      if test -n "$shlibpath_var"; then
	# We should set the shlibpath_var
	rpath=
	for dir in $temp_rpath; do
	  case $dir in
	  [\\/]* | [A-Za-z]:[\\/]*)
	    # Absolute path.
	    rpath="$rpath$dir:"
	    ;;
	  *)
	    # Relative path: add a thisdir entry.
	    rpath="$rpath\$thisdir/$dir:"
	    ;;
	  esac
	done
	temp_rpath="$rpath"
      fi

      if test -n "$compile_shlibpath$finalize_shlibpath"; then
	compile_command="$shlibpath_var=\"$compile_shlibpath$finalize_shlibpath\$$shlibpath_var\" $compile_command"
      fi
      if test -n "$finalize_shlibpath"; then
	finalize_command="$shlibpath_var=\"$finalize_shlibpath\$$shlibpath_var\" $finalize_command"
      fi

      compile_var=
      finalize_var=
      if test -n "$runpath_var"; then
	if test -n "$perm_rpath"; then
	  # We should set the runpath_var.
	  rpath=
	  for dir in $perm_rpath; do
	    rpath="$rpath$dir:"
	  done
	  compile_var="$runpath_var=\"$rpath\$$runpath_var\" "
	fi
	if test -n "$finalize_perm_rpath"; then
	  # We should set the runpath_var.
	  rpath=
	  for dir in $finalize_perm_rpath; do
	    rpath="$rpath$dir:"
	  done
	  finalize_var="$runpath_var=\"$rpath\$$runpath_var\" "
	fi
      fi

      if test "$no_install" = yes; then
	# We don't need to create a wrapper script.
	link_command="$compile_var$compile_command$compile_rpath"
	# Replace the output file specification.
	link_command=`$echo "X$link_command" | $Xsed -e 's%@OUTPUT@%'"$output"'%g'`
	# Delete the old output file.
	$run $rm $output
	# Link the executable and exit
	$show "$link_command"
	$run eval "$link_command" || exit $?
	exit $EXIT_SUCCESS
      fi

      if test "$hardcode_action" = relink; then
	# Fast installation is not supported
	link_command="$compile_var$compile_command$compile_rpath"
	relink_command="$finalize_var$finalize_command$finalize_rpath"

	$echo "$modename: warning: this platform does not like uninstalled shared libraries" 1>&2
	$echo "$modename: \`$output' will be relinked during installation" 1>&2
      else
	if test "$fast_install" != no; then
	  link_command="$finalize_var$compile_command$finalize_rpath"
	  if test "$fast_install" = yes; then
	    relink_command=`$echo "X$compile_var$compile_command$compile_rpath" | $Xsed -e 's%@OUTPUT@%\$progdir/\$file%g'`
	  else
	    # fast_install is set to needless
	    relink_command=
	  fi
	else
	  link_command="$compile_var$compile_command$compile_rpath"
	  relink_command="$finalize_var$finalize_command$finalize_rpath"
	fi
      fi

      # Replace the output file specification.
      link_command=`$echo "X$link_command" | $Xsed -e 's%@OUTPUT@%'"$output_objdir/$outputname"'%g'`

      # Delete the old output files.
      $run $rm $output $output_objdir/$outputname $output_objdir/lt-$outputname

      $show "$link_command"
      $run eval "$link_command" || exit $?

      # Now create the wrapper script.
      $show "creating $output"

      # Quote the relink command for shipping.
      if test -n "$relink_command"; then
	# Preserve any variables that may affect compiler behavior
	for var in $variables_saved_for_relink; do
	  if eval test -z \"\${$var+set}\"; then
	    relink_command="{ test -z \"\${$var+set}\" || unset $var || { $var=; export $var; }; }; $relink_command"
	  elif eval var_value=\$$var; test -z "$var_value"; then
	    relink_command="$var=; export $var; $relink_command"
	  else
	    var_value=`$echo "X$var_value" | $Xsed -e "$sed_quote_subst"`
	    relink_command="$var=\"$var_value\"; export $var; $relink_command"
	  fi
	done
	relink_command="(cd `pwd`; $relink_command)"
	relink_command=`$echo "X$relink_command" | $Xsed -e "$sed_quote_subst"`
      fi

      # Quote $echo for shipping.
      if test "X$echo" = "X$SHELL $progpath --fallback-echo"; then
	case $progpath in
	[\\/]* | [A-Za-z]:[\\/]*) qecho="$SHELL $progpath --fallback-echo";;
	*) qecho="$SHELL `pwd`/$progpath --fallback-echo";;
	esac
	qecho=`$echo "X$qecho" | $Xsed -e "$sed_quote_subst"`
      else
	qecho=`$echo "X$echo" | $Xsed -e "$sed_quote_subst"`
      fi

      # Only actually do things if our run command is non-null.
      if test -z "$run"; then
	# win32 will think the script is a binary if it has
	# a .exe suffix, so we strip it off here.
	case $output in
	  *.exe) output=`$echo $output|${SED} 's,\.exe$,,'` ;;
	esac
	# test for cygwin because mv fails w/o .exe extensions
	case $host in
	  *cygwin*)
	    exeext=.exe
	    outputname=`$echo $outputname|${SED} 's,\.exe$,,'` ;;
	  *) exeext= ;;
	esac
	case $host in
	  *cygwin* | *mingw* )
            output_name=`basename $output`
            output_path=`dirname $output`
            cwrappersource="$output_path/$objdir/lt-$output_name.c"
            cwrapper="$output_path/$output_name.exe"
            $rm $cwrappersource $cwrapper
            trap "$rm $cwrappersource $cwrapper; exit $EXIT_FAILURE" 1 2 15

	    cat > $cwrappersource <<EOF

/* $cwrappersource - temporary wrapper executable for $objdir/$outputname
   Generated by $PROGRAM - GNU $PACKAGE $VERSION$TIMESTAMP

   The $output program cannot be directly executed until all the libtool
   libraries that it depends on are installed.

   This wrapper executable should never be moved out of the build directory.
   If it is, it will not operate correctly.

   Currently, it simply execs the wrapper *script* "/bin/sh $output",
   but could eventually absorb all of the script's functionality and
   exec $objdir/$outputname directly.
*/
EOF
	    cat >> $cwrappersource<<"EOF"
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <malloc.h>
#include <stdarg.h>
#include <assert.h>
#include <string.h>
#include <ctype.h>
#include <sys/stat.h>

#if defined(PATH_MAX)
# define LT_PATHMAX PATH_MAX
#elif defined(MAXPATHLEN)
# define LT_PATHMAX MAXPATHLEN
#else
# define LT_PATHMAX 1024
#endif

#ifndef DIR_SEPARATOR
# define DIR_SEPARATOR '/'
# define PATH_SEPARATOR ':'
#endif

#if defined (_WIN32) || defined (__MSDOS__) || defined (__DJGPP__) || \
  defined (__OS2__)
# define HAVE_DOS_BASED_FILE_SYSTEM
# ifndef DIR_SEPARATOR_2
#  define DIR_SEPARATOR_2 '\\'
# endif
# ifndef PATH_SEPARATOR_2
#  define PATH_SEPARATOR_2 ';'
# endif
#endif

#ifndef DIR_SEPARATOR_2
# define IS_DIR_SEPARATOR(ch) ((ch) == DIR_SEPARATOR)
#else /* DIR_SEPARATOR_2 */
# define IS_DIR_SEPARATOR(ch) \
        (((ch) == DIR_SEPARATOR) || ((ch) == DIR_SEPARATOR_2))
#endif /* DIR_SEPARATOR_2 */

#ifndef PATH_SEPARATOR_2
# define IS_PATH_SEPARATOR(ch) ((ch) == PATH_SEPARATOR)
#else /* PATH_SEPARATOR_2 */
# define IS_PATH_SEPARATOR(ch) ((ch) == PATH_SEPARATOR_2)
#endif /* PATH_SEPARATOR_2 */

#define XMALLOC(type, num)      ((type *) xmalloc ((num) * sizeof(type)))
#define XFREE(stale) do { \
  if (stale) { free ((void *) stale); stale = 0; } \
} while (0)

/* -DDEBUG is fairly common in CFLAGS.  */
#undef DEBUG
#if defined DEBUGWRAPPER
# define DEBUG(format, ...) fprintf(stderr, format, __VA_ARGS__)
#else
# define DEBUG(format, ...)
#endif

const char *program_name = NULL;

void * xmalloc (size_t num);
char * xstrdup (const char *string);
const char * base_name (const char *name);
char * find_executable(const char *wrapper);
int    check_executable(const char *path);
char * strendzap(char *str, const char *pat);
void lt_fatal (const char *message, ...);

int
main (int argc, char *argv[])
{
  char **newargz;
  int i;

  program_name = (char *) xstrdup (base_name (argv[0]));
  DEBUG("(main) argv[0]      : %s\n",argv[0]);
  DEBUG("(main) program_name : %s\n",program_name);
  newargz = XMALLOC(char *, argc+2);
EOF

            cat >> $cwrappersource <<EOF
  newargz[0] = (char *) xstrdup("$SHELL");
EOF

            cat >> $cwrappersource <<"EOF"
  newargz[1] = find_executable(argv[0]);
  if (newargz[1] == NULL)
    lt_fatal("Couldn't find %s", argv[0]);
  DEBUG("(main) found exe at : %s\n",newargz[1]);
  /* we know the script has the same name, without the .exe */
  /* so make sure newargz[1] doesn't end in .exe */
  strendzap(newargz[1],".exe");
  for (i = 1; i < argc; i++)
    newargz[i+1] = xstrdup(argv[i]);
  newargz[argc+1] = NULL;

  for (i=0; i<argc+1; i++)
  {
    DEBUG("(main) newargz[%d]   : %s\n",i,newargz[i]);
    ;
  }

EOF

            case $host_os in
              mingw*)
                cat >> $cwrappersource <<EOF
  execv("$SHELL",(char const **)newargz);
EOF
              ;;
              *)
                cat >> $cwrappersource <<EOF
  execv("$SHELL",newargz);
EOF
              ;;
            esac

            cat >> $cwrappersource <<"EOF"
  return 127;
}

void *
xmalloc (size_t num)
{
  void * p = (void *) malloc (num);
  if (!p)
    lt_fatal ("Memory exhausted");

  return p;
}

char *
xstrdup (const char *string)
{
  return string ? strcpy ((char *) xmalloc (strlen (string) + 1), string)
                : NULL;
}

const char *
base_name (const char *name)
{
  const char *base;

#if defined (HAVE_DOS_BASED_FILE_SYSTEM)
  /* Skip over the disk name in MSDOS pathnames. */
  if (isalpha ((unsigned char)name[0]) && name[1] == ':')
    name += 2;
#endif

  for (base = name; *name; name++)
    if (IS_DIR_SEPARATOR (*name))
      base = name + 1;
  return base;
}

int
check_executable(const char * path)
{
  struct stat st;

  DEBUG("(check_executable)  : %s\n", path ? (*path ? path : "EMPTY!") : "NULL!");
  if ((!path) || (!*path))
    return 0;

  if ((stat (path, &st) >= 0) &&
      (
        /* MinGW & native WIN32 do not support S_IXOTH or S_IXGRP */
#if defined (S_IXOTH)
       ((st.st_mode & S_IXOTH) == S_IXOTH) ||
#endif
#if defined (S_IXGRP)
       ((st.st_mode & S_IXGRP) == S_IXGRP) ||
#endif
       ((st.st_mode & S_IXUSR) == S_IXUSR))
      )
    return 1;
  else
    return 0;
}

/* Searches for the full path of the wrapper.  Returns
   newly allocated full path name if found, NULL otherwise */
char *
find_executable (const char* wrapper)
{
  int has_slash = 0;
  const char* p;
  const char* p_next;
  /* fixed-size buffer for getcwd */
  char tmp[LT_PATHMAX + 1];
  int tmp_len;
  char* concat_name;

  DEBUG("(find_executable)  : %s\n", wrapper ? (*wrapper ? wrapper : "EMPTY!") : "NULL!");

  if ((wrapper == NULL) || (*wrapper == '\0'))
    return NULL;

  /* Absolute path? */
#if defined (HAVE_DOS_BASED_FILE_SYSTEM)
  if (isalpha ((unsigned char)wrapper[0]) && wrapper[1] == ':')
  {
    concat_name = xstrdup (wrapper);
    if (check_executable(concat_name))
      return concat_name;
    XFREE(concat_name);
  }
  else
  {
#endif
    if (IS_DIR_SEPARATOR (wrapper[0]))
    {
      concat_name = xstrdup (wrapper);
      if (check_executable(concat_name))
        return concat_name;
      XFREE(concat_name);
    }
#if defined (HAVE_DOS_BASED_FILE_SYSTEM)
  }
#endif

  for (p = wrapper; *p; p++)
    if (*p == '/')
    {
      has_slash = 1;
      break;
    }
  if (!has_slash)
  {
    /* no slashes; search PATH */
    const char* path = getenv ("PATH");
    if (path != NULL)
    {
      for (p = path; *p; p = p_next)
      {
        const char* q;
        size_t p_len;
        for (q = p; *q; q++)
          if (IS_PATH_SEPARATOR(*q))
            break;
        p_len = q - p;
        p_next = (*q == '\0' ? q : q + 1);
        if (p_len == 0)
        {
          /* empty path: current directory */
          if (getcwd (tmp, LT_PATHMAX) == NULL)
            lt_fatal ("getcwd failed");
          tmp_len = strlen(tmp);
          concat_name = XMALLOC(char, tmp_len + 1 + strlen(wrapper) + 1);
          memcpy (concat_name, tmp, tmp_len);
          concat_name[tmp_len] = '/';
          strcpy (concat_name + tmp_len + 1, wrapper);
        }
        else
        {
          concat_name = XMALLOC(char, p_len + 1 + strlen(wrapper) + 1);
          memcpy (concat_name, p, p_len);
          concat_name[p_len] = '/';
          strcpy (concat_name + p_len + 1, wrapper);
        }
        if (check_executable(concat_name))
          return concat_name;
        XFREE(concat_name);
      }
    }
    /* not found in PATH; assume curdir */
  }
  /* Relative path, or not found in PATH: prepend cwd */
  if (getcwd (tmp, LT_PATHMAX) == NULL)
    lt_fatal ("getcwd failed");
  tmp_len = strlen(tmp);
  concat_name = XMALLOC(char, tmp_len + 1 + strlen(wrapper) + 1);
  memcpy (concat_name, tmp, tmp_len);
  concat_name[tmp_len] = '/';
  strcpy (concat_name + tmp_len + 1, wrapper);

  if (check_executable(concat_name))
    return concat_name;
  XFREE(concat_name);
  return NULL;
}

char *
strendzap(char *str, const char *pat)
{
  size_t len, patlen;

  assert(str != NULL);
  assert(pat != NULL);

  len = strlen(str);
  patlen = strlen(pat);

  if (patlen <= len)
  {
    str += len - patlen;
    if (strcmp(str, pat) == 0)
      *str = '\0';
  }
  return str;
}

static void
lt_error_core (int exit_status, const char * mode,
          const char * message, va_list ap)
{
  fprintf (stderr, "%s: %s: ", program_name, mode);
  vfprintf (stderr, message, ap);
  fprintf (stderr, ".\n");

  if (exit_status >= 0)
    exit (exit_status);
}

void
lt_fatal (const char *message, ...)
{
  va_list ap;
  va_start (ap, message);
  lt_error_core (EXIT_FAILURE, "FATAL", message, ap);
  va_end (ap);
}
EOF
          # we should really use a build-platform specific compiler
          # here, but OTOH, the wrappers (shell script and this C one)
          # are only useful if you want to execute the "real" binary.
          # Since the "real" binary is built for $host, then this
          # wrapper might as well be built for $host, too.
          $run $LTCC $LTCFLAGS -s -o $cwrapper $cwrappersource
          ;;
        esac
        $rm $output
        trap "$rm $output; exit $EXIT_FAILURE" 1 2 15

	$echo > $output "\
#! $SHELL

# $output - temporary wrapper script for $objdir/$outputname
# Generated by $PROGRAM - GNU $PACKAGE $VERSION$TIMESTAMP
#
# The $output program cannot be directly executed until all the libtool
# libraries that it depends on are installed.
#
# This wrapper script should never be moved out of the build directory.
# If it is, it will not operate correctly.

# Sed substitution that helps us do robust quoting.  It backslashifies
# metacharacters that are still active within double-quoted strings.
Xsed='${SED} -e 1s/^X//'
sed_quote_subst='$sed_quote_subst'

# The HP-UX ksh and POSIX shell print the target directory to stdout
# if CDPATH is set.
(unset CDPATH) >/dev/null 2>&1 && unset CDPATH

relink_command=\"$relink_command\"

# This environment variable determines our operation mode.
if test \"\$libtool_install_magic\" = \"$magic\"; then
  # install mode needs the following variable:
  notinst_deplibs='$notinst_deplibs'
else
  # When we are sourced in execute mode, \$file and \$echo are already set.
  if test \"\$libtool_execute_magic\" != \"$magic\"; then
    echo=\"$qecho\"
    file=\"\$0\"
    # Make sure echo works.
    if test \"X\$1\" = X--no-reexec; then
      # Discard the --no-reexec flag, and continue.
      shift
    elif test \"X\`(\$echo '\t') 2>/dev/null\`\" = 'X\t'; then
      # Yippee, \$echo works!
      :
    else
      # Restart under the correct shell, and then maybe \$echo will work.
      exec $SHELL \"\$0\" --no-reexec \${1+\"\$@\"}
    fi
  fi\
"
	$echo >> $output "\

  # Find the directory that this script lives in.
  thisdir=\`\$echo \"X\$file\" | \$Xsed -e 's%/[^/]*$%%'\`
  test \"x\$thisdir\" = \"x\$file\" && thisdir=.

  # Follow symbolic links until we get to the real thisdir.
  file=\`ls -ld \"\$file\" | ${SED} -n 's/.*-> //p'\`
  while test -n \"\$file\"; do
    destdir=\`\$echo \"X\$file\" | \$Xsed -e 's%/[^/]*\$%%'\`

    # If there was a directory component, then change thisdir.
    if test \"x\$destdir\" != \"x\$file\"; then
      case \"\$destdir\" in
      [\\\\/]* | [A-Za-z]:[\\\\/]*) thisdir=\"\$destdir\" ;;
      *) thisdir=\"\$thisdir/\$destdir\" ;;
      esac
    fi

    file=\`\$echo \"X\$file\" | \$Xsed -e 's%^.*/%%'\`
    file=\`ls -ld \"\$thisdir/\$file\" | ${SED} -n 's/.*-> //p'\`
  done

  # Try to get the absolute directory name.
  absdir=\`cd \"\$thisdir\" && pwd\`
  test -n \"\$absdir\" && thisdir=\"\$absdir\"
"

	if test "$fast_install" = yes; then
	  $echo >> $output "\
  program=lt-'$outputname'$exeext
  progdir=\"\$thisdir/$objdir\"

  if test ! -f \"\$progdir/\$program\" || \\
     { file=\`ls -1dt \"\$progdir/\$program\" \"\$progdir/../\$program\" 2>/dev/null | ${SED} 1q\`; \\
       test \"X\$file\" != \"X\$progdir/\$program\"; }; then

    file=\"\$\$-\$program\"

    if test ! -d \"\$progdir\"; then
      $mkdir \"\$progdir\"
    else
      $rm \"\$progdir/\$file\"
    fi"

	  $echo >> $output "\

    # relink executable if necessary
    if test -n \"\$relink_command\"; then
      if relink_command_output=\`eval \$relink_command 2>&1\`; then :
      else
	$echo \"\$relink_command_output\" >&2
	$rm \"\$progdir/\$file\"
	exit $EXIT_FAILURE
      fi
    fi

    $mv \"\$progdir/\$file\" \"\$progdir/\$program\" 2>/dev/null ||
    { $rm \"\$progdir/\$program\";
      $mv \"\$progdir/\$file\" \"\$progdir/\$program\"; }
    $rm \"\$progdir/\$file\"
  fi"
	else
	  $echo >> $output "\
  program='$outputname'
  progdir=\"\$thisdir/$objdir\"
"
	fi

	$echo >> $output "\

  if test -f \"\$progdir/\$program\"; then"

	# Export our shlibpath_var if we have one.
	if test "$shlibpath_overrides_runpath" = yes && test -n "$shlibpath_var" && test -n "$temp_rpath"; then
	  $echo >> $output "\
    # Add our own library path to $shlibpath_var
    $shlibpath_var=\"$temp_rpath\$$shlibpath_var\"

    # Some systems cannot cope with colon-terminated $shlibpath_var
    # The second colon is a workaround for a bug in BeOS R4 sed
    $shlibpath_var=\`\$echo \"X\$$shlibpath_var\" | \$Xsed -e 's/::*\$//'\`

    export $shlibpath_var
"
	fi

	# fixup the dll searchpath if we need to.
	if test -n "$dllsearchpath"; then
	  $echo >> $output "\
    # Add the dll search path components to the executable PATH
    PATH=$dllsearchpath:\$PATH
"
	fi

	$echo >> $output "\
    if test \"\$libtool_execute_magic\" != \"$magic\"; then
      # Run the actual program with our arguments.
"
	case $host in
	# Backslashes separate directories on plain windows
	*-*-mingw | *-*-os2*)
	  $echo >> $output "\
      exec \"\$progdir\\\\\$program\" \${1+\"\$@\"}
"
	  ;;

	*)
	  $echo >> $output "\
      exec \"\$progdir/\$program\" \${1+\"\$@\"}
"
	  ;;
	esac
	$echo >> $output "\
      \$echo \"\$0: cannot exec \$program \${1+\"\$@\"}\"
      exit $EXIT_FAILURE
    fi
  else
    # The program doesn't exist.
    \$echo \"\$0: error: \\\`\$progdir/\$program' does not exist\" 1>&2
    \$echo \"This script is just a wrapper for \$program.\" 1>&2
    $echo \"See the $PACKAGE documentation for more information.\" 1>&2
    exit $EXIT_FAILURE
  fi
fi\
"
	chmod +x $output
      fi
      exit $EXIT_SUCCESS
      ;;
    esac

    # See if we need to build an old-fashioned archive.
    for oldlib in $oldlibs; do

      if test "$build_libtool_libs" = convenience; then
	oldobjs="$libobjs_save"
	addlibs="$convenience"
	build_libtool_libs=no
      else
	if test "$build_libtool_libs" = module; then
	  oldobjs="$libobjs_save"
	  build_libtool_libs=no
	else
	  oldobjs="$old_deplibs $non_pic_objects"
	fi
	addlibs="$old_convenience"
      fi

      if test -n "$addlibs"; then
	gentop="$output_objdir/${outputname}x"
	generated="$generated $gentop"

	func_extract_archives $gentop $addlibs
	oldobjs="$oldobjs $func_extract_archives_result"
      fi

      # Do each command in the archive commands.
      if test -n "$old_archive_from_new_cmds" && test "$build_libtool_libs" = yes; then
       cmds=$old_archive_from_new_cmds
      else
	# POSIX demands no paths to be encoded in archives.  We have
	# to avoid creating archives with duplicate basenames if we
	# might have to extract them afterwards, e.g., when creating a
	# static archive out of a convenience library, or when linking
	# the entirety of a libtool archive into another (currently
	# not supported by libtool).
	if (for obj in $oldobjs
	    do
	      $echo "X$obj" | $Xsed -e 's%^.*/%%'
	    done | sort | sort -uc >/dev/null 2>&1); then
	  :
	else
	  $echo "copying selected object files to avoid basename conflicts..."

	  if test -z "$gentop"; then
	    gentop="$output_objdir/${outputname}x"
	    generated="$generated $gentop"

	    $show "${rm}r $gentop"
	    $run ${rm}r "$gentop"
	    $show "$mkdir $gentop"
	    $run $mkdir "$gentop"
	    exit_status=$?
	    if test "$exit_status" -ne 0 && test ! -d "$gentop"; then
	      exit $exit_status
	    fi
	  fi

	  save_oldobjs=$oldobjs
	  oldobjs=
	  counter=1
	  for obj in $save_oldobjs
	  do
	    objbase=`$echo "X$obj" | $Xsed -e 's%^.*/%%'`
	    case " $oldobjs " in
	    " ") oldobjs=$obj ;;
	    *[\ /]"$objbase "*)
	      while :; do
		# Make sure we don't pick an alternate name that also
		# overlaps.
		newobj=lt$counter-$objbase
		counter=`expr $counter + 1`
		case " $oldobjs " in
		*[\ /]"$newobj "*) ;;
		*) if test ! -f "$gentop/$newobj"; then break; fi ;;
		esac
	      done
	      $show "ln $obj $gentop/$newobj || cp $obj $gentop/$newobj"
	      $run ln "$obj" "$gentop/$newobj" ||
	      $run cp "$obj" "$gentop/$newobj"
	      oldobjs="$oldobjs $gentop/$newobj"
	      ;;
	    *) oldobjs="$oldobjs $obj" ;;
	    esac
	  done
	fi

	eval cmds=\"$old_archive_cmds\"

	if len=`expr "X$cmds" : ".*"` &&
	     test "$len" -le "$max_cmd_len" || test "$max_cmd_len" -le -1; then
	  cmds=$old_archive_cmds
	else
	  # the command line is too long to link in one step, link in parts
	  $echo "using piecewise archive linking..."
	  save_RANLIB=$RANLIB
	  RANLIB=:
	  objlist=
	  concat_cmds=
	  save_oldobjs=$oldobjs

	  # Is there a better way of finding the last object in the list?
	  for obj in $save_oldobjs
	  do
	    last_oldobj=$obj
	  done
	  for obj in $save_oldobjs
	  do
	    oldobjs="$objlist $obj"
	    objlist="$objlist $obj"
	    eval test_cmds=\"$old_archive_cmds\"
	    if len=`expr "X$test_cmds" : ".*" 2>/dev/null` &&
	       test "$len" -le "$max_cmd_len"; then
	      :
	    else
	      # the above command should be used before it gets too long
	      oldobjs=$objlist
	      if test "$obj" = "$last_oldobj" ; then
	        RANLIB=$save_RANLIB
	      fi
	      test -z "$concat_cmds" || concat_cmds=$concat_cmds~
	      eval concat_cmds=\"\${concat_cmds}$old_archive_cmds\"
	      objlist=
	    fi
	  done
	  RANLIB=$save_RANLIB
	  oldobjs=$objlist
	  if test "X$oldobjs" = "X" ; then
	    eval cmds=\"\$concat_cmds\"
	  else
	    eval cmds=\"\$concat_cmds~\$old_archive_cmds\"
	  fi
	fi
      fi
      save_ifs="$IFS"; IFS='~'
      for cmd in $cmds; do
        eval cmd=\"$cmd\"
	IFS="$save_ifs"
	$show "$cmd"
	$run eval "$cmd" || exit $?
      done
      IFS="$save_ifs"
    done

    if test -n "$generated"; then
      $show "${rm}r$generated"
      $run ${rm}r$generated
    fi

    # Now create the libtool archive.
    case $output in
    *.la)
      old_library=
      test "$build_old_libs" = yes && old_library="$libname.$libext"
      $show "creating $output"

      # Preserve any variables that may affect compiler behavior
      for var in $variables_saved_for_relink; do
	if eval test -z \"\${$var+set}\"; then
	  relink_command="{ test -z \"\${$var+set}\" || unset $var || { $var=; export $var; }; }; $relink_command"
	elif eval var_value=\$$var; test -z "$var_value"; then
	  relink_command="$var=; export $var; $relink_command"
	else
	  var_value=`$echo "X$var_value" | $Xsed -e "$sed_quote_subst"`
	  relink_command="$var=\"$var_value\"; export $var; $relink_command"
	fi
      done
      # Quote the link command for shipping.
      relink_command="(cd `pwd`; $SHELL $progpath $preserve_args --mode=relink $libtool_args @inst_prefix_dir@)"
      relink_command=`$echo "X$relink_command" | $Xsed -e "$sed_quote_subst"`
      if test "$hardcode_automatic" = yes ; then
	relink_command=
      fi


      # Only create the output if not a dry run.
      if test -z "$run"; then
	for installed in no yes; do
	  if test "$installed" = yes; then
	    if test -z "$install_libdir"; then
	      break
	    fi
	    output="$output_objdir/$outputname"i
	    # Replace all uninstalled libtool libraries with the installed ones
	    newdependency_libs=
	    for deplib in $dependency_libs; do
	      case $deplib in
	      *.la)
		name=`$echo "X$deplib" | $Xsed -e 's%^.*/%%'`
		eval libdir=`${SED} -n -e 's/^libdir=\(.*\)$/\1/p' $deplib`
		if test -z "$libdir"; then
		  $echo "$modename: \`$deplib' is not a valid libtool archive" 1>&2
		  exit $EXIT_FAILURE
		fi
		newdependency_libs="$newdependency_libs $libdir/$name"
		;;
	      *) newdependency_libs="$newdependency_libs $deplib" ;;
	      esac
	    done
	    dependency_libs="$newdependency_libs"
	    newdlfiles=
	    for lib in $dlfiles; do
	      name=`$echo "X$lib" | $Xsed -e 's%^.*/%%'`
	      eval libdir=`${SED} -n -e 's/^libdir=\(.*\)$/\1/p' $lib`
	      if test -z "$libdir"; then
		$echo "$modename: \`$lib' is not a valid libtool archive" 1>&2
		exit $EXIT_FAILURE
	      fi
	      newdlfiles="$newdlfiles $libdir/$name"
	    done
	    dlfiles="$newdlfiles"
	    newdlprefiles=
	    for lib in $dlprefiles; do
	      name=`$echo "X$lib" | $Xsed -e 's%^.*/%%'`
	      eval libdir=`${SED} -n -e 's/^libdir=\(.*\)$/\1/p' $lib`
	      if test -z "$libdir"; then
		$echo "$modename: \`$lib' is not a valid libtool archive" 1>&2
		exit $EXIT_FAILURE
	      fi
	      newdlprefiles="$newdlprefiles $libdir/$name"
	    done
	    dlprefiles="$newdlprefiles"
	  else
	    newdlfiles=
	    for lib in $dlfiles; do
	      case $lib in
		[\\/]* | [A-Za-z]:[\\/]*) abs="$lib" ;;
		*) abs=`pwd`"/$lib" ;;
	      esac
	      newdlfiles="$newdlfiles $abs"
	    done
	    dlfiles="$newdlfiles"
	    newdlprefiles=
	    for lib in $dlprefiles; do
	      case $lib in
		[\\/]* | [A-Za-z]:[\\/]*) abs="$lib" ;;
		*) abs=`pwd`"/$lib" ;;
	      esac
	      newdlprefiles="$newdlprefiles $abs"
	    done
	    dlprefiles="$newdlprefiles"
	  fi
	  $rm $output
	  # place dlname in correct position for cygwin
	  tdlname=$dlname
	  case $host,$output,$installed,$module,$dlname in
	    *cygwin*,*lai,yes,no,*.dll | *mingw*,*lai,yes,no,*.dll) tdlname=../bin/$dlname ;;
	  esac
	  $echo > $output "\
# $outputname - a libtool library file
# Generated by $PROGRAM - GNU $PACKAGE $VERSION$TIMESTAMP
#
# Please DO NOT delete this file!
# It is necessary for linking the library.

# The name that we can dlopen(3).
dlname='$tdlname'

# Names of this library.
library_names='$library_names'

# The name of the static archive.
old_library='$old_library'

# Libraries that this one depends upon.
dependency_libs='$dependency_libs'

# Version information for $libname.
current=$current
age=$age
revision=$revision

# Is this an already installed library?
installed=$installed

# Should we warn about portability when linking against -modules?
shouldnotlink=$module

# Files to dlopen/dlpreopen
dlopen='$dlfiles'
dlpreopen='$dlprefiles'

# Directory that this library needs to be installed in:
libdir='$install_libdir'"
	  if test "$installed" = no && test "$need_relink" = yes; then
	    $echo >> $output "\
relink_command=\"$relink_command\""
	  fi
	done
      fi

      # Do a symbolic link so that the libtool archive can be found in
      # LD_LIBRARY_PATH before the program is installed.
      $show "(cd $output_objdir && $rm $outputname && $LN_S ../$outputname $outputname)"
      $run eval '(cd $output_objdir && $rm $outputname && $LN_S ../$outputname $outputname)' || exit $?
      ;;
    esac
    exit $EXIT_SUCCESS
    ;;

  # libtool install mode
  install)
    modename="$modename: install"

    # There may be an optional sh(1) argument at the beginning of
    # install_prog (especially on Windows NT).
    if test "$nonopt" = "$SHELL" || test "$nonopt" = /bin/sh ||
       # Allow the use of GNU shtool's install command.
       $echo "X$nonopt" | grep shtool > /dev/null; then
      # Aesthetically quote it.
      arg=`$echo "X$nonopt" | $Xsed -e "$sed_quote_subst"`
      case $arg in
      *[\[\~\#\^\&\*\(\)\{\}\|\;\<\>\?\'\ \	]*|*]*|"")
	arg="\"$arg\""
	;;
      esac
      install_prog="$arg "
      arg="$1"
      shift
    else
      install_prog=
      arg=$nonopt
    fi

    # The real first argument should be the name of the installation program.
    # Aesthetically quote it.
    arg=`$echo "X$arg" | $Xsed -e "$sed_quote_subst"`
    case $arg in
    *[\[\~\#\^\&\*\(\)\{\}\|\;\<\>\?\'\ \	]*|*]*|"")
      arg="\"$arg\""
      ;;
    esac
    install_prog="$install_prog$arg"

    # We need to accept at least all the BSD install flags.
    dest=
    files=
    opts=
    prev=
    install_type=
    isdir=no
    stripme=
    for arg
    do
      if test -n "$dest"; then
	files="$files $dest"
	dest=$arg
	continue
      fi

      case $arg in
      -d) isdir=yes ;;
      -f)
	case " $install_prog " in
	*[\\\ /]cp\ *) ;;
	*) prev=$arg ;;
	esac
	;;
      -g | -m | -o) prev=$arg ;;
      -s)
	stripme=" -s"
	continue
	;;
      -*)
	;;
      *)
	# If the previous option needed an argument, then skip it.
	if test -n "$prev"; then
	  prev=
	else
	  dest=$arg
	  continue
	fi
	;;
      esac

      # Aesthetically quote the argument.
      arg=`$echo "X$arg" | $Xsed -e "$sed_quote_subst"`
      case $arg in
      *[\[\~\#\^\&\*\(\)\{\}\|\;\<\>\?\'\ \	]*|*]*|"")
	arg="\"$arg\""
	;;
      esac
      install_prog="$install_prog $arg"
    done

    if test -z "$install_prog"; then
      $echo "$modename: you must specify an install program" 1>&2
      $echo "$help" 1>&2
      exit $EXIT_FAILURE
    fi

    if test -n "$prev"; then
      $echo "$modename: the \`$prev' option requires an argument" 1>&2
      $echo "$help" 1>&2
      exit $EXIT_FAILURE
    fi

    if test -z "$files"; then
      if test -z "$dest"; then
	$echo "$modename: no file or destination specified" 1>&2
      else
	$echo "$modename: you must specify a destination" 1>&2
      fi
      $echo "$help" 1>&2
      exit $EXIT_FAILURE
    fi

    # Strip any trailing slash from the destination.
    dest=`$echo "X$dest" | $Xsed -e 's%/$%%'`

    # Check to see that the destination is a directory.
    test -d "$dest" && isdir=yes
    if test "$isdir" = yes; then
      destdir="$dest"
      destname=
    else
      destdir=`$echo "X$dest" | $Xsed -e 's%/[^/]*$%%'`
      test "X$destdir" = "X$dest" && destdir=.
      destname=`$echo "X$dest" | $Xsed -e 's%^.*/%%'`

      # Not a directory, so check to see that there is only one file specified.
      set dummy $files
      if test "$#" -gt 2; then
	$echo "$modename: \`$dest' is not a directory" 1>&2
	$echo "$help" 1>&2
	exit $EXIT_FAILURE
      fi
    fi
    case $destdir in
    [\\/]* | [A-Za-z]:[\\/]*) ;;
    *)
      for file in $files; do
	case $file in
	*.lo) ;;
	*)
	  $echo "$modename: \`$destdir' must be an absolute directory name" 1>&2
	  $echo "$help" 1>&2
	  exit $EXIT_FAILURE
	  ;;
	esac
      done
      ;;
    esac

    # This variable tells wrapper scripts just to set variables rather
    # than running their programs.
    libtool_install_magic="$magic"

    staticlibs=
    future_libdirs=
    current_libdirs=
    for file in $files; do

      # Do each installation.
      case $file in
      *.$libext)
	# Do the static libraries later.
	staticlibs="$staticlibs $file"
	;;

      *.la)
	# Check to see that this really is a libtool archive.
	if (${SED} -e '2q' $file | grep "^# Generated by .*$PACKAGE") >/dev/null 2>&1; then :
	else
	  $echo "$modename: \`$file' is not a valid libtool archive" 1>&2
	  $echo "$help" 1>&2
	  exit $EXIT_FAILURE
	fi

	library_names=
	old_library=
	relink_command=
	# If there is no directory component, then add one.
	case $file in
	*/* | *\\*) . $file ;;
	*) . ./$file ;;
	esac

	# Add the libdir to current_libdirs if it is the destination.
	if test "X$destdir" = "X$libdir"; then
	  case "$current_libdirs " in
	  *" $libdir "*) ;;
	  *) current_libdirs="$current_libdirs $libdir" ;;
	  esac
	else
	  # Note the libdir as a future libdir.
	  case "$future_libdirs " in
	  *" $libdir "*) ;;
	  *) future_libdirs="$future_libdirs $libdir" ;;
	  esac
	fi

	dir=`$echo "X$file" | $Xsed -e 's%/[^/]*$%%'`/
	test "X$dir" = "X$file/" && dir=
	dir="$dir$objdir"

	if test -n "$relink_command"; then
	  # Determine the prefix the user has applied to our future dir.
	  inst_prefix_dir=`$echo "$destdir" | $SED "s%$libdir\$%%"`

	  # Don't allow the user to place us outside of our expected
	  # location b/c this prevents finding dependent libraries that
	  # are installed to the same prefix.
	  # At present, this check doesn't affect windows .dll's that
	  # are installed into $libdir/../bin (currently, that works fine)
	  # but it's something to keep an eye on.
	  if test "$inst_prefix_dir" = "$destdir"; then
	    $echo "$modename: error: cannot install \`$file' to a directory not ending in $libdir" 1>&2
	    exit $EXIT_FAILURE
	  fi

	  if test -n "$inst_prefix_dir"; then
	    # Stick the inst_prefix_dir data into the link command.
	    relink_command=`$echo "$relink_command" | $SED "s%@inst_prefix_dir@%-inst-prefix-dir $inst_prefix_dir%"`
	  else
	    relink_command=`$echo "$relink_command" | $SED "s%@inst_prefix_dir@%%"`
	  fi

	  $echo "$modename: warning: relinking \`$file'" 1>&2
	  $show "$relink_command"
	  if $run eval "$relink_command"; then :
	  else
	    $echo "$modename: error: relink \`$file' with the above command before installing it" 1>&2
	    exit $EXIT_FAILURE
	  fi
	fi

	# See the names of the shared library.
	set dummy $library_names
	if test -n "$2"; then
	  realname="$2"
	  shift
	  shift

	  srcname="$realname"
	  test -n "$relink_command" && srcname="$realname"T

	  # Install the shared library and build the symlinks.
	  $show "$install_prog $dir/$srcname $destdir/$realname"
	  $run eval "$install_prog $dir/$srcname $destdir/$realname" || exit $?
	  if test -n "$stripme" && test -n "$striplib"; then
	    $show "$striplib $destdir/$realname"
	    $run eval "$striplib $destdir/$realname" || exit $?
	  fi

	  if test "$#" -gt 0; then
	    # Delete the old symlinks, and create new ones.
	    # Try `ln -sf' first, because the `ln' binary might depend on
	    # the symlink we replace!  Solaris /bin/ln does not understand -f,
	    # so we also need to try rm && ln -s.
	    for linkname
	    do
	      if test "$linkname" != "$realname"; then
                $show "(cd $destdir && { $LN_S -f $realname $linkname || { $rm $linkname && $LN_S $realname $linkname; }; })"
                $run eval "(cd $destdir && { $LN_S -f $realname $linkname || { $rm $linkname && $LN_S $realname $linkname; }; })"
	      fi
	    done
	  fi

	  # Do each command in the postinstall commands.
	  lib="$destdir/$realname"
	  cmds=$postinstall_cmds
	  save_ifs="$IFS"; IFS='~'
	  for cmd in $cmds; do
	    IFS="$save_ifs"
	    eval cmd=\"$cmd\"
	    $show "$cmd"
	    $run eval "$cmd" || {
	      lt_exit=$?

	      # Restore the uninstalled library and exit
	      if test "$mode" = relink; then
		$run eval '(cd $output_objdir && $rm ${realname}T && $mv ${realname}U $realname)'
	      fi

	      exit $lt_exit
	    }
	  done
	  IFS="$save_ifs"
	fi

	# Install the pseudo-library for information purposes.
	name=`$echo "X$file" | $Xsed -e 's%^.*/%%'`
	instname="$dir/$name"i
	$show "$install_prog $instname $destdir/$name"
	$run eval "$install_prog $instname $destdir/$name" || exit $?

	# Maybe install the static library, too.
	test -n "$old_library" && staticlibs="$staticlibs $dir/$old_library"
	;;

      *.lo)
	# Install (i.e. copy) a libtool object.

	# Figure out destination file name, if it wasn't already specified.
	if test -n "$destname"; then
	  destfile="$destdir/$destname"
	else
	  destfile=`$echo "X$file" | $Xsed -e 's%^.*/%%'`
	  destfile="$destdir/$destfile"
	fi

	# Deduce the name of the destination old-style object file.
	case $destfile in
	*.lo)
	  staticdest=`$echo "X$destfile" | $Xsed -e "$lo2o"`
	  ;;
	*.$objext)
	  staticdest="$destfile"
	  destfile=
	  ;;
	*)
	  $echo "$modename: cannot copy a libtool object to \`$destfile'" 1>&2
	  $echo "$help" 1>&2
	  exit $EXIT_FAILURE
	  ;;
	esac

	# Install the libtool object if requested.
	if test -n "$destfile"; then
	  $show "$install_prog $file $destfile"
	  $run eval "$install_prog $file $destfile" || exit $?
	fi

	# Install the old object if enabled.
	if test "$build_old_libs" = yes; then
	  # Deduce the name of the old-style object file.
	  staticobj=`$echo "X$file" | $Xsed -e "$lo2o"`

	  $show "$install_prog $staticobj $staticdest"
	  $run eval "$install_prog \$staticobj \$staticdest" || exit $?
	fi
	exit $EXIT_SUCCESS
	;;

      *)
	# Figure out destination file name, if it wasn't already specified.
	if test -n "$destname"; then
	  destfile="$destdir/$destname"
	else
	  destfile=`$echo "X$file" | $Xsed -e 's%^.*/%%'`
	  destfile="$destdir/$destfile"
	fi

	# If the file is missing, and there is a .exe on the end, strip it
	# because it is most likely a libtool script we actually want to
	# install
	stripped_ext=""
	case $file in
	  *.exe)
	    if test ! -f "$file"; then
	      file=`$echo $file|${SED} 's,.exe$,,'`
	      stripped_ext=".exe"
	    fi
	    ;;
	esac

	# Do a test to see if this is really a libtool program.
	case $host in
	*cygwin*|*mingw*)
	    wrapper=`$echo $file | ${SED} -e 's,.exe$,,'`
	    ;;
	*)
	    wrapper=$file
	    ;;
	esac
	if (${SED} -e '4q' $wrapper | grep "^# Generated by .*$PACKAGE")>/dev/null 2>&1; then
	  notinst_deplibs=
	  relink_command=

	  # Note that it is not necessary on cygwin/mingw to append a dot to
	  # FILE even if both FILE and FILE.exe exist: automatic-append-.exe
	  # behavior happens only for exec(3), not for open(2)!  Also, sourcing
	  # `FILE.' does not work on cygwin managed mounts.
	  #
	  # If there is no directory component, then add one.
	  case $wrapper in
	  */* | *\\*) . ${wrapper} ;;
	  *) . ./${wrapper} ;;
	  esac

	  # Check the variables that should have been set.
	  if test -z "$notinst_deplibs"; then
	    $echo "$modename: invalid libtool wrapper script \`$wrapper'" 1>&2
	    exit $EXIT_FAILURE
	  fi

	  finalize=yes
	  for lib in $notinst_deplibs; do
	    # Check to see that each library is installed.
	    libdir=
	    if test -f "$lib"; then
	      # If there is no directory component, then add one.
	      case $lib in
	      */* | *\\*) . $lib ;;
	      *) . ./$lib ;;
	      esac
	    fi
	    libfile="$libdir/"`$echo "X$lib" | $Xsed -e 's%^.*/%%g'` ### testsuite: skip nested quoting test
	    if test -n "$libdir" && test ! -f "$libfile"; then
	      $echo "$modename: warning: \`$lib' has not been installed in \`$libdir'" 1>&2
	      finalize=no
	    fi
	  done

	  relink_command=
	  # Note that it is not necessary on cygwin/mingw to append a dot to
	  # foo even if both foo and FILE.exe exist: automatic-append-.exe
	  # behavior happens only for exec(3), not for open(2)!  Also, sourcing
	  # `FILE.' does not work on cygwin managed mounts.
	  #
	  # If there is no directory component, then add one.
	  case $wrapper in
	  */* | *\\*) . ${wrapper} ;;
	  *) . ./${wrapper} ;;
	  esac

	  outputname=
	  if test "$fast_install" = no && test -n "$relink_command"; then
	    if test "$finalize" = yes && test -z "$run"; then
	      tmpdir=`func_mktempdir`
	      file=`$echo "X$file$stripped_ext" | $Xsed -e 's%^.*/%%'`
	      outputname="$tmpdir/$file"
	      # Replace the output file specification.
	      relink_command=`$echo "X$relink_command" | $Xsed -e 's%@OUTPUT@%'"$outputname"'%g'`

	      $show "$relink_command"
	      if $run eval "$relink_command"; then :
	      else
		$echo "$modename: error: relink \`$file' with the above command before installing it" 1>&2
		${rm}r "$tmpdir"
		continue
	      fi
	      file="$outputname"
	    else
	      $echo "$modename: warning: cannot relink \`$file'" 1>&2
	    fi
	  else
	    # Install the binary that we compiled earlier.
	    file=`$echo "X$file$stripped_ext" | $Xsed -e "s%\([^/]*\)$%$objdir/\1%"`
	  fi
	fi

	# remove .exe since cygwin /usr/bin/install will append another
	# one anyway 
	case $install_prog,$host in
	*/usr/bin/install*,*cygwin*)
	  case $file:$destfile in
	  *.exe:*.exe)
	    # this is ok
	    ;;
	  *.exe:*)
	    destfile=$destfile.exe
	    ;;
	  *:*.exe)
	    destfile=`$echo $destfile | ${SED} -e 's,.exe$,,'`
	    ;;
	  esac
	  ;;
	esac
	$show "$install_prog$stripme $file $destfile"
	$run eval "$install_prog\$stripme \$file \$destfile" || exit $?
	test -n "$outputname" && ${rm}r "$tmpdir"
	;;
      esac
    done

    for file in $staticlibs; do
      name=`$echo "X$file" | $Xsed -e 's%^.*/%%'`

      # Set up the ranlib parameters.
      oldlib="$destdir/$name"

      $show "$install_prog $file $oldlib"
      $run eval "$install_prog \$file \$oldlib" || exit $?

      if test -n "$stripme" && test -n "$old_striplib"; then
	$show "$old_striplib $oldlib"
	$run eval "$old_striplib $oldlib" || exit $?
      fi

      # Do each command in the postinstall commands.
      cmds=$old_postinstall_cmds
      save_ifs="$IFS"; IFS='~'
      for cmd in $cmds; do
	IFS="$save_ifs"
	eval cmd=\"$cmd\"
	$show "$cmd"
	$run eval "$cmd" || exit $?
      done
      IFS="$save_ifs"
    done

    if test -n "$future_libdirs"; then
      $echo "$modename: warning: remember to run \`$progname --finish$future_libdirs'" 1>&2
    fi

    if test -n "$current_libdirs"; then
      # Maybe just do a dry run.
      test -n "$run" && current_libdirs=" -n$current_libdirs"
      exec_cmd='$SHELL $progpath $preserve_args --finish$current_libdirs'
    else
      exit $EXIT_SUCCESS
    fi
    ;;

  # libtool finish mode
  finish)
    modename="$modename: finish"
    libdirs="$nonopt"
    admincmds=

    if test -n "$finish_cmds$finish_eval" && test -n "$libdirs"; then
      for dir
      do
	libdirs="$libdirs $dir"
      done

      for libdir in $libdirs; do
	if test -n "$finish_cmds"; then
	  # Do each command in the finish commands.
	  cmds=$finish_cmds
	  save_ifs="$IFS"; IFS='~'
	  for cmd in $cmds; do
	    IFS="$save_ifs"
	    eval cmd=\"$cmd\"
	    $show "$cmd"
	    $run eval "$cmd" || admincmds="$admincmds
       $cmd"
	  done
	  IFS="$save_ifs"
	fi
	if test -n "$finish_eval"; then
	  # Do the single finish_eval.
	  eval cmds=\"$finish_eval\"
	  $run eval "$cmds" || admincmds="$admincmds
       $cmds"
	fi
      done
    fi

    # Exit here if they wanted silent mode.
    test "$show" = : && exit $EXIT_SUCCESS

    $echo "X----------------------------------------------------------------------" | $Xsed
    $echo "Libraries have been installed in:"
    for libdir in $libdirs; do
      $echo "   $libdir"
    done
    $echo
    $echo "If you ever happen to want to link against installed libraries"
    $echo "in a given directory, LIBDIR, you must either use libtool, and"
    $echo "specify the full pathname of the library, or use the \`-LLIBDIR'"
    $echo "flag during linking and do at least one of the following:"
    if test -n "$shlibpath_var"; then
      $echo "   - add LIBDIR to the \`$shlibpath_var' environment variable"
      $echo "     during execution"
    fi
    if test -n "$runpath_var"; then
      $echo "   - add LIBDIR to the \`$runpath_var' environment variable"
      $echo "     during linking"
    fi
    if test -n "$hardcode_libdir_flag_spec"; then
      libdir=LIBDIR
      eval flag=\"$hardcode_libdir_flag_spec\"

      $echo "   - use the \`$flag' linker flag"
    fi
    if test -n "$admincmds"; then
      $echo "   - have your system administrator run these commands:$admincmds"
    fi
    if test -f /etc/ld.so.conf; then
      $echo "   - have your system administrator add LIBDIR to \`/etc/ld.so.conf'"
    fi
    $echo
    $echo "See any operating system documentation about shared libraries for"
    $echo "more information, such as the ld(1) and ld.so(8) manual pages."
    $echo "X----------------------------------------------------------------------" | $Xsed
    exit $EXIT_SUCCESS
    ;;

  # libtool execute mode
  execute)
    modename="$modename: execute"

    # The first argument is the command name.
    cmd="$nonopt"
    if test -z "$cmd"; then
      $echo "$modename: you must specify a COMMAND" 1>&2
      $echo "$help" 1>&2
      exit $EXIT_FAILURE
    fi

    # Handle -dlopen flags immediately.
    for file in $execute_dlfiles; do
      if test ! -f "$file"; then
	$echo "$modename: \`$file' is not a file" 1>&2
	$echo "$help" 1>&2
	exit $EXIT_FAILURE
      fi

      dir=
      case $file in
      *.la)
	# Check to see that this really is a libtool archive.
	if (${SED} -e '2q' $file | grep "^# Generated by .*$PACKAGE") >/dev/null 2>&1; then :
	else
	  $echo "$modename: \`$file' is not a valid libtool archive" 1>&2
	  $echo "$help" 1>&2
	  exit $EXIT_FAILURE
	fi

	# Read the libtool library.
	dlname=
	library_names=

	# If there is no directory component, then add one.
	case $file in
	*/* | *\\*) . $file ;;
	*) . ./$file ;;
	esac

	# Skip this library if it cannot be dlopened.
	if test -z "$dlname"; then
	  # Warn if it was a shared library.
	  test -n "$library_names" && $echo "$modename: warning: \`$file' was not linked with \`-export-dynamic'" 1>&2
	  continue
	fi

	dir=`$echo "X$file" | $Xsed -e 's%/[^/]*$%%'`
	test "X$dir" = "X$file" && dir=.

	if test -f "$dir/$objdir/$dlname"; then
	  dir="$dir/$objdir"
	else
	  $echo "$modename: cannot find \`$dlname' in \`$dir' or \`$dir/$objdir'" 1>&2
	  exit $EXIT_FAILURE
	fi
	;;

      *.lo)
	# Just add the directory containing the .lo file.
	dir=`$echo "X$file" | $Xsed -e 's%/[^/]*$%%'`
	test "X$dir" = "X$file" && dir=.
	;;

      *)
	$echo "$modename: warning: \`-dlopen' is ignored for non-libtool libraries and objects" 1>&2
	continue
	;;
      esac

      # Get the absolute pathname.
      absdir=`cd "$dir" && pwd`
      test -n "$absdir" && dir="$absdir"

      # Now add the directory to shlibpath_var.
      if eval "test -z \"\$$shlibpath_var\""; then
	eval "$shlibpath_var=\"\$dir\""
      else
	eval "$shlibpath_var=\"\$dir:\$$shlibpath_var\""
      fi
    done

    # This variable tells wrapper scripts just to set shlibpath_var
    # rather than running their programs.
    libtool_execute_magic="$magic"

    # Check if any of the arguments is a wrapper script.
    args=
    for file
    do
      case $file in
      -*) ;;
      *)
	# Do a test to see if this is really a libtool program.
	if (${SED} -e '4q' $file | grep "^# Generated by .*$PACKAGE") >/dev/null 2>&1; then
	  # If there is no directory component, then add one.
	  case $file in
	  */* | *\\*) . $file ;;
	  *) . ./$file ;;
	  esac

	  # Transform arg to wrapped name.
	  file="$progdir/$program"
	fi
	;;
      esac
      # Quote arguments (to preserve shell metacharacters).
      file=`$echo "X$file" | $Xsed -e "$sed_quote_subst"`
      args="$args \"$file\""
    done

    if test -z "$run"; then
      if test -n "$shlibpath_var"; then
	# Export the shlibpath_var.
	eval "export $shlibpath_var"
      fi

      # Restore saved environment variables
      if test "${save_LC_ALL+set}" = set; then
	LC_ALL="$save_LC_ALL"; export LC_ALL
      fi
      if test "${save_LANG+set}" = set; then
	LANG="$save_LANG"; export LANG
      fi

      # Now prepare to actually exec the command.
      exec_cmd="\$cmd$args"
    else
      # Display what would be done.
      if test -n "$shlibpath_var"; then
	eval "\$echo \"\$shlibpath_var=\$$shlibpath_var\""
	$echo "export $shlibpath_var"
      fi
      $echo "$cmd$args"
      exit $EXIT_SUCCESS
    fi
    ;;

  # libtool clean and uninstall mode
  clean | uninstall)
    modename="$modename: $mode"
    rm="$nonopt"
    files=
    rmforce=
    exit_status=0

    # This variable tells wrapper scripts just to set variables rather
    # than running their programs.
    libtool_install_magic="$magic"

    for arg
    do
      case $arg in
      -f) rm="$rm $arg"; rmforce=yes ;;
      -*) rm="$rm $arg" ;;
      *) files="$files $arg" ;;
      esac
    done

    if test -z "$rm"; then
      $echo "$modename: you must specify an RM program" 1>&2
      $echo "$help" 1>&2
      exit $EXIT_FAILURE
    fi

    rmdirs=

    origobjdir="$objdir"
    for file in $files; do
      dir=`$echo "X$file" | $Xsed -e 's%/[^/]*$%%'`
      if test "X$dir" = "X$file"; then
	dir=.
	objdir="$origobjdir"
      else
	objdir="$dir/$origobjdir"
      fi
      name=`$echo "X$file" | $Xsed -e 's%^.*/%%'`
      test "$mode" = uninstall && objdir="$dir"

      # Remember objdir for removal later, being careful to avoid duplicates
      if test "$mode" = clean; then
	case " $rmdirs " in
	  *" $objdir "*) ;;
	  *) rmdirs="$rmdirs $objdir" ;;
	esac
      fi

      # Don't error if the file doesn't exist and rm -f was used.
      if (test -L "$file") >/dev/null 2>&1 \
	|| (test -h "$file") >/dev/null 2>&1 \
	|| test -f "$file"; then
	:
      elif test -d "$file"; then
	exit_status=1
	continue
      elif test "$rmforce" = yes; then
	continue
      fi

      rmfiles="$file"

      case $name in
      *.la)
	# Possibly a libtool archive, so verify it.
	if (${SED} -e '2q' $file | grep "^# Generated by .*$PACKAGE") >/dev/null 2>&1; then
	  . $dir/$name

	  # Delete the libtool libraries and symlinks.
	  for n in $library_names; do
	    rmfiles="$rmfiles $objdir/$n"
	  done
	  test -n "$old_library" && rmfiles="$rmfiles $objdir/$old_library"

	  case "$mode" in
	  clean)
	    case "  $library_names " in
	    # "  " in the beginning catches empty $dlname
	    *" $dlname "*) ;;
	    *) rmfiles="$rmfiles $objdir/$dlname" ;;
	    esac
	     test -n "$libdir" && rmfiles="$rmfiles $objdir/$name $objdir/${name}i"
	    ;;
	  uninstall)
	    if test -n "$library_names"; then
	      # Do each command in the postuninstall commands.
	      cmds=$postuninstall_cmds
	      save_ifs="$IFS"; IFS='~'
	      for cmd in $cmds; do
		IFS="$save_ifs"
		eval cmd=\"$cmd\"
		$show "$cmd"
		$run eval "$cmd"
		if test "$?" -ne 0 && test "$rmforce" != yes; then
		  exit_status=1
		fi
	      done
	      IFS="$save_ifs"
	    fi

	    if test -n "$old_library"; then
	      # Do each command in the old_postuninstall commands.
	      cmds=$old_postuninstall_cmds
	      save_ifs="$IFS"; IFS='~'
	      for cmd in $cmds; do
		IFS="$save_ifs"
		eval cmd=\"$cmd\"
		$show "$cmd"
		$run eval "$cmd"
		if test "$?" -ne 0 && test "$rmforce" != yes; then
		  exit_status=1
		fi
	      done
	      IFS="$save_ifs"
	    fi
	    # FIXME: should reinstall the best remaining shared library.
	    ;;
	  esac
	fi
	;;

      *.lo)
	# Possibly a libtool object, so verify it.
	if (${SED} -e '2q' $file | grep "^# Generated by .*$PACKAGE") >/dev/null 2>&1; then

	  # Read the .lo file
	  . $dir/$name

	  # Add PIC object to the list of files to remove.
	  if test -n "$pic_object" \
	     && test "$pic_object" != none; then
	    rmfiles="$rmfiles $dir/$pic_object"
	  fi

	  # Add non-PIC object to the list of files to remove.
	  if test -n "$non_pic_object" \
	     && test "$non_pic_object" != none; then
	    rmfiles="$rmfiles $dir/$non_pic_object"
	  fi
	fi
	;;

      *)
	if test "$mode" = clean ; then
	  noexename=$name
	  case $file in
	  *.exe)
	    file=`$echo $file|${SED} 's,.exe$,,'`
	    noexename=`$echo $name|${SED} 's,.exe$,,'`
	    # $file with .exe has already been added to rmfiles,
	    # add $file without .exe
	    rmfiles="$rmfiles $file"
	    ;;
	  esac
	  # Do a test to see if this is a libtool program.
	  if (${SED} -e '4q' $file | grep "^# Generated by .*$PACKAGE") >/dev/null 2>&1; then
	    relink_command=
	    . $dir/$noexename

	    # note $name still contains .exe if it was in $file originally
	    # as does the version of $file that was added into $rmfiles
	    rmfiles="$rmfiles $objdir/$name $objdir/${name}S.${objext}"
	    if test "$fast_install" = yes && test -n "$relink_command"; then
	      rmfiles="$rmfiles $objdir/lt-$name"
	    fi
	    if test "X$noexename" != "X$name" ; then
	      rmfiles="$rmfiles $objdir/lt-${noexename}.c"
	    fi
	  fi
	fi
	;;
      esac
      $show "$rm $rmfiles"
      $run $rm $rmfiles || exit_status=1
    done
    objdir="$origobjdir"

    # Try to remove the ${objdir}s in the directories where we deleted files
    for dir in $rmdirs; do
      if test -d "$dir"; then
	$show "rmdir $dir"
	$run rmdir $dir >/dev/null 2>&1
      fi
    done

    exit $exit_status
    ;;

  "")
    $echo "$modename: you must specify a MODE" 1>&2
    $echo "$generic_help" 1>&2
    exit $EXIT_FAILURE
    ;;
  esac

  if test -z "$exec_cmd"; then
    $echo "$modename: invalid operation mode \`$mode'" 1>&2
    $echo "$generic_help" 1>&2
    exit $EXIT_FAILURE
  fi
fi # test -z "$show_help"

if test -n "$exec_cmd"; then
  eval exec $exec_cmd
  exit $EXIT_FAILURE
fi

# We need to display help for each of the modes.
case $mode in
"") $echo \
"Usage: $modename [OPTION]... [MODE-ARG]...

Provide generalized library-building support services.

    --config          show all configuration variables
    --debug           enable verbose shell tracing
-n, --dry-run         display commands without modifying any files
    --features        display basic configuration information and exit
    --finish          same as \`--mode=finish'
    --help            display this help message and exit
    --mode=MODE       use operation mode MODE [default=inferred from MODE-ARGS]
    --quiet           same as \`--silent'
    --silent          don't print informational messages
    --tag=TAG         use configuration variables from tag TAG
    --version         print version information

MODE must be one of the following:

      clean           remove files from the build directory
      compile         compile a source file into a libtool object
      execute         automatically set library path, then run a program
      finish          complete the installation of libtool libraries
      install         install libraries or executables
      link            create a library or an executable
      uninstall       remove libraries from an installed directory

MODE-ARGS vary depending on the MODE.  Try \`$modename --help --mode=MODE' for
a more detailed description of MODE.

Report bugs to <bug-libtool@gnu.org>."
  exit $EXIT_SUCCESS
  ;;

clean)
  $echo \
"Usage: $modename [OPTION]... --mode=clean RM [RM-OPTION]... FILE...

Remove files from the build directory.

RM is the name of the program to use to delete files associated with each FILE
(typically \`/bin/rm').  RM-OPTIONS are options (such as \`-f') to be passed
to RM.

If FILE is a libtool library, object or program, all the files associated
with it are deleted. Otherwise, only FILE itself is deleted using RM."
  ;;

compile)
  $echo \
"Usage: $modename [OPTION]... --mode=compile COMPILE-COMMAND... SOURCEFILE

Compile a source file into a libtool library object.

This mode accepts the following additional options:

  -o OUTPUT-FILE    set the output file name to OUTPUT-FILE
  -prefer-pic       try to build PIC objects only
  -prefer-non-pic   try to build non-PIC objects only
  -static           always build a \`.o' file suitable for static linking

COMPILE-COMMAND is a command to be used in creating a \`standard' object file
from the given SOURCEFILE.

The output file name is determined by removing the directory component from
SOURCEFILE, then substituting the C source code suffix \`.c' with the
library object suffix, \`.lo'."
  ;;

execute)
  $echo \
"Usage: $modename [OPTION]... --mode=execute COMMAND [ARGS]...

Automatically set library path, then run a program.

This mode accepts the following additional options:

  -dlopen FILE      add the directory containing FILE to the library path

This mode sets the library path environment variable according to \`-dlopen'
flags.

If any of the ARGS are libtool executable wrappers, then they are translated
into their corresponding uninstalled binary, and any of their required library
directories are added to the library path.

Then, COMMAND is executed, with ARGS as arguments."
  ;;

finish)
  $echo \
"Usage: $modename [OPTION]... --mode=finish [LIBDIR]...

Complete the installation of libtool libraries.

Each LIBDIR is a directory that contains libtool libraries.

The commands that this mode executes may require superuser privileges.  Use
the \`--dry-run' option if you just want to see what would be executed."
  ;;

install)
  $echo \
"Usage: $modename [OPTION]... --mode=install INSTALL-COMMAND...

Install executables or libraries.

INSTALL-COMMAND is the installation command.  The first component should be
either the \`install' or \`cp' program.

The rest of the components are interpreted as arguments to that command (only
BSD-compatible install options are recognized)."
  ;;

link)
  $echo \
"Usage: $modename [OPTION]... --mode=link LINK-COMMAND...

Link object files or libraries together to form another library, or to
create an executable program.

LINK-COMMAND is a command using the C compiler that you would use to create
a program from several object files.

The following components of LINK-COMMAND are treated specially:

  -all-static       do not do any dynamic linking at all
  -avoid-version    do not add a version suffix if possible
  -dlopen FILE      \`-dlpreopen' FILE if it cannot be dlopened at runtime
  -dlpreopen FILE   link in FILE and add its symbols to lt_preloaded_symbols
  -export-dynamic   allow symbols from OUTPUT-FILE to be resolved with dlsym(3)
  -export-symbols SYMFILE
		    try to export only the symbols listed in SYMFILE
  -export-symbols-regex REGEX
		    try to export only the symbols matching REGEX
  -LLIBDIR          search LIBDIR for required installed libraries
  -lNAME            OUTPUT-FILE requires the installed library libNAME
  -module           build a library that can be dlopened
  -no-fast-install  disable the fast-install mode
  -no-install       link a not-installable executable
  -no-undefined     declare that a library does not refer to external symbols
  -o OUTPUT-FILE    create OUTPUT-FILE from the specified objects
  -objectlist FILE  use a list of object files found in FILE to specify objects
  -precious-files-regex REGEX
                    don't remove output files matching REGEX
  -release RELEASE  specify package release information
  -rpath LIBDIR     the created library will eventually be installed in LIBDIR
  -R[ ]LIBDIR       add LIBDIR to the runtime path of programs and libraries
  -static           do not do any dynamic linking of libtool libraries
  -version-info CURRENT[:REVISION[:AGE]]
		    specify library version info [each variable defaults to 0]

All other options (arguments beginning with \`-') are ignored.

Every other argument is treated as a filename.  Files ending in \`.la' are
treated as uninstalled libtool libraries; other files are standard or library
object files.

If the OUTPUT-FILE ends in \`.la', then a libtool library is created,
only library objects (\`.lo' files) may be specified, and \`-rpath' is
required, except when creating a convenience library.

If OUTPUT-FILE ends in \`.a' or \`.lib', then a standard library is created
using \`ar' and \`ranlib', or on Windows using \`lib'.

If OUTPUT-FILE ends in \`.lo' or \`.${objext}', then a reloadable object file
is created, otherwise an executable program is created."
  ;;

uninstall)
  $echo \
"Usage: $modename [OPTION]... --mode=uninstall RM [RM-OPTION]... FILE...

Remove libraries from an installation directory.

RM is the name of the program to use to delete files associated with each FILE
(typically \`/bin/rm').  RM-OPTIONS are options (such as \`-f') to be passed
to RM.

If FILE is a libtool library, all the files associated with it are deleted.
Otherwise, only FILE itself is deleted using RM."
  ;;

*)
  $echo "$modename: invalid operation mode \`$mode'" 1>&2
  $echo "$help" 1>&2
  exit $EXIT_FAILURE
  ;;
esac

$echo
$echo "Try \`$modename --help' for more information about other modes."

exit $?

# The TAGs below are defined such that we never get into a situation
# in which we disable both kinds of libraries.  Given conflicting
# choices, we go for a static library, which is the most portable,
# since we can't tell whether shared libraries were disabled because
# the user asked for that or because the platform doesn't support
# them.  This is particularly important on AIX, because we don't
# support having both static and shared libraries enabled at the same
# time on that platform, so we default to a shared-only configuration.
# If a disable-shared tag is given, we'll fall back to a static-only
# configuration.  But we'll never go from static-only to shared-only.

# ### BEGIN LIBTOOL TAG CONFIG: disable-shared
disable_libs=shared
# ### END LIBTOOL TAG CONFIG: disable-shared

# ### BEGIN LIBTOOL TAG CONFIG: disable-static
disable_libs=static
# ### END LIBTOOL TAG CONFIG: disable-static

# Local Variables:
# mode:shell-script
# sh-indentation:2
# End:
# swish-e-2.4.7/config/install-sh
#!/bin/sh
#
# install - install a program, script, or datafile
# This comes from X11R5 (mit/util/scripts/install.sh).
#
# Copyright 1991 by the Massachusetts Institute of Technology
#
# Permission to use, copy, modify, distribute, and sell this software and its
# documentation for any purpose is hereby granted without fee, provided that
# the above copyright notice appear in all copies and that both that
# copyright notice and this permission notice appear in supporting
# documentation, and that the name of M.I.T. not be used in advertising or
# publicity pertaining to distribution of the software without specific,
# written prior permission.  M.I.T. makes no representations about the
# suitability of this software for any purpose.  It is provided "as is"
# without express or implied warranty.
#
# Calling this script install-sh is preferred over install.sh, to prevent
# `make' implicit rules from creating a file called install from it
# when there is no Makefile.
#
# This script is compatible with the BSD install script, but was written
# from scratch.  It can only install one file at a time, a restriction
# shared with many OS's install programs.


# set DOITPROG to echo to test this script
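#
# A usage sketch, for illustration only (the file names are hypothetical
# and the source file must exist):
#
#   DOITPROG=echo sh ./install-sh -c -m 644 README /tmp/demo/README
#
# With DOITPROG=echo the copy, chmod and rename commands are echoed
# instead of being run.  Note that a missing destination directory is
# still created for real, because the mkdir step below is not prefixed
# with $doit.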

# Don't use :- since 4.3BSD and earlier shells don't like it.
doit="${DOITPROG-}"


# put in absolute paths if you don't have them in your path; or use env. vars.

mvprog="${MVPROG-mv}"
cpprog="${CPPROG-cp}"
chmodprog="${CHMODPROG-chmod}"
chownprog="${CHOWNPROG-chown}"
chgrpprog="${CHGRPPROG-chgrp}"
stripprog="${STRIPPROG-strip}"
rmprog="${RMPROG-rm}"
mkdirprog="${MKDIRPROG-mkdir}"

transformbasename=""
transformarg=""
instcmd="$mvprog"
chmodcmd="$chmodprog 0755"
chowncmd=""
chgrpcmd=""
stripcmd=""
rmcmd="$rmprog -f"
mvcmd="$mvprog"
src=""
dst=""
dir_arg=""

while [ x"$1" != x ]; do
    case $1 in
	-c) instcmd=$cpprog
	    shift
	    continue;;

	-d) dir_arg=true
	    shift
	    continue;;

	-m) chmodcmd="$chmodprog $2"
	    shift
	    shift
	    continue;;

	-o) chowncmd="$chownprog $2"
	    shift
	    shift
	    continue;;

	-g) chgrpcmd="$chgrpprog $2"
	    shift
	    shift
	    continue;;

	-s) stripcmd=$stripprog
	    shift
	    continue;;

	-t=*) transformarg=`echo $1 | sed 's/-t=//'`
	    shift
	    continue;;

	-b=*) transformbasename=`echo $1 | sed 's/-b=//'`
	    shift
	    continue;;

	*)  if [ x"$src" = x ]
	    then
		src=$1
	    else
		# this colon is to work around a 386BSD /bin/sh bug
		:
		dst=$1
	    fi
	    shift
	    continue;;
    esac
done

if [ x"$src" = x ]
then
	echo "$0: no input file specified" >&2
	exit 1
else
	:
fi

if [ x"$dir_arg" != x ]; then
	dst=$src
	src=""

	if [ -d "$dst" ]; then
		instcmd=:
		chmodcmd=""
	else
		instcmd=$mkdirprog
	fi
else

# Waiting for this to be detected by the "$instcmd $src $dsttmp" command
# might cause directories to be created, which would be especially bad
# if $src (and thus $dsttmp) contains '*'.

	if [ -f "$src" ] || [ -d "$src" ]
	then
		:
	else
		echo "$0: $src does not exist" >&2
		exit 1
	fi

	if [ x"$dst" = x ]
	then
		echo "$0: no destination specified" >&2
		exit 1
	else
		:
	fi

# If destination is a directory, append the input filename; if your system
# does not like double slashes in filenames, you may need to add some logic

	if [ -d "$dst" ]
	then
		dst=$dst/`basename "$src"`
	else
		:
	fi
fi

## this sed command emulates the dirname command
dstdir=`echo "$dst" | sed -e 's,[^/]*$,,;s,/$,,;s,^$,.,'`

# Make sure that the destination directory exists.
#  this part is taken from Noah Friedman's mkinstalldirs script

# Skip lots of stat calls in the usual case.
if [ ! -d "$dstdir" ]; then
defaultIFS='
	'
IFS="${IFS-$defaultIFS}"

oIFS=$IFS
# Some sh's can't handle IFS=/ for some reason.
IFS='%'
set - `echo "$dstdir" | sed -e 's@/@%@g' -e 's@^%@/@'`
IFS=$oIFS

pathcomp=''

while [ $# -ne 0 ] ; do
	pathcomp=$pathcomp$1
	shift

	if [ ! -d "$pathcomp" ] ;
        then
		$mkdirprog "$pathcomp"
	else
		:
	fi

	pathcomp=$pathcomp/
done
fi

if [ x"$dir_arg" != x ]
then
	$doit $instcmd "$dst" &&

	if [ x"$chowncmd" != x ]; then $doit $chowncmd "$dst"; else : ; fi &&
	if [ x"$chgrpcmd" != x ]; then $doit $chgrpcmd "$dst"; else : ; fi &&
	if [ x"$stripcmd" != x ]; then $doit $stripcmd "$dst"; else : ; fi &&
	if [ x"$chmodcmd" != x ]; then $doit $chmodcmd "$dst"; else : ; fi
else

# If we're going to rename the final executable, determine the name now.

	if [ x"$transformarg" = x ]
	then
		dstfile=`basename "$dst"`
	else
		dstfile=`basename "$dst" $transformbasename |
			sed $transformarg`$transformbasename
	fi

# don't allow the sed command to completely eliminate the filename

	if [ x"$dstfile" = x ]
	then
		dstfile=`basename "$dst"`
	else
		:
	fi

# Make a couple of temp file names in the proper directory.

	dsttmp=$dstdir/#inst.$$#
	rmtmp=$dstdir/#rm.$$#

# Trap to clean up temp files at exit.

	trap 'status=$?; rm -f "$dsttmp" "$rmtmp" && exit $status' 0
	trap '(exit $?); exit' 1 2 13 15

# Move or copy the file name to the temp name

	$doit $instcmd "$src" "$dsttmp" &&

# and set any options; do chmod last to preserve setuid bits

# If any of these fail, we abort the whole thing.  If we want to
# ignore errors from any of these, just make sure not to ignore
# errors from the above "$doit $instcmd $src $dsttmp" command.

	if [ x"$chowncmd" != x ]; then $doit $chowncmd "$dsttmp"; else :;fi &&
	if [ x"$chgrpcmd" != x ]; then $doit $chgrpcmd "$dsttmp"; else :;fi &&
	if [ x"$stripcmd" != x ]; then $doit $stripcmd "$dsttmp"; else :;fi &&
	if [ x"$chmodcmd" != x ]; then $doit $chmodcmd "$dsttmp"; else :;fi &&

# Now remove or move aside any old file at destination location.  We try this
# two ways since rm can't unlink itself on some systems and the destination
# file might be busy for other reasons.  In this case, the final cleanup
# might fail but the new file should still install successfully.

{
	if [ -f "$dstdir/$dstfile" ]
	then
		$doit $rmcmd -f "$dstdir/$dstfile" 2>/dev/null ||
		$doit $mvcmd -f "$dstdir/$dstfile" "$rmtmp" 2>/dev/null ||
		{
		  echo "$0: cannot unlink or rename $dstdir/$dstfile" >&2
		  (exit 1); exit
		}
	else
		:
	fi
} &&

# Now rename the file to the real destination.

	$doit $mvcmd "$dsttmp" "$dstdir/$dstfile"

fi &&

# The final little trick to "correctly" pass the exit status to the exit trap.

{
	(exit 0); exit
}
# swish-e-2.4.7/config/config.sub
#! /bin/sh
# Configuration validation subroutine script.
#   Copyright (C) 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999,
#   2000, 2001, 2002, 2003, 2004, 2005 Free Software Foundation, Inc.

timestamp='2005-07-08'

# This file is (in principle) common to ALL GNU software.
# The presence of a machine in this file suggests that SOME GNU software
# can handle that machine.  It does not imply ALL GNU software can.
#
# This file is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 51 Franklin Street - Fifth Floor, Boston, MA
# 02110-1301, USA.
#
# As a special exception to the GNU General Public License, if you
# distribute this file as part of a program that contains a
# configuration script generated by Autoconf, you may include it under
# the same distribution terms that you use for the rest of that program.


# Please send patches to <config-patches@gnu.org>.  Submit a context
# diff and a properly formatted ChangeLog entry.
#
# Configuration subroutine to validate and canonicalize a configuration type.
# Supply the specified configuration type as an argument.
# If it is invalid, we print an error message on stderr and exit with code 1.
# Otherwise, we print the canonical config type on stdout and succeed.

# This file is supposed to be the same for all GNU packages
# and recognize all the CPU types, system types and aliases
# that are meaningful with *any* GNU software.
# Each package is responsible for reporting which valid configurations
# it does not support.  The user should be able to distinguish
# a failure to support a valid configuration from a meaningless
# configuration.

# The goal of this file is to map all the various variations of a given
# machine specification into a single specification in the form:
#	CPU_TYPE-MANUFACTURER-OPERATING_SYSTEM
# or in some cases, the newer four-part form:
#	CPU_TYPE-MANUFACTURER-KERNEL-OPERATING_SYSTEM
# It is wrong to echo any other type of specification.
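#
# A few illustrative invocations, traced by hand through the case
# statements in this copy of the script (a sketch, not upstream
# documentation; other config.sub versions may answer differently):
#	config.sub amd64		prints x86_64-pc-none
#	config.sub sun4sol2		prints sparc-sun-solaris2
#	config.sub decstation-3100	prints mips-dec-ultrix4.2
#	config.sub i686-pc-linux-gnu	prints i686-pc-linux-gnu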

me=`echo "$0" | sed -e 's,.*/,,'`

usage="\
Usage: $0 [OPTION] CPU-MFR-OPSYS
       $0 [OPTION] ALIAS

Canonicalize a configuration name.

Operation modes:
  -h, --help         print this help, then exit
  -t, --time-stamp   print date of last modification, then exit
  -v, --version      print version number, then exit

Report bugs and patches to <config-patches@gnu.org>."

version="\
GNU config.sub ($timestamp)

Copyright (C) 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005
Free Software Foundation, Inc.

This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE."

help="
Try \`$me --help' for more information."

# Parse command line
while test $# -gt 0 ; do
  case $1 in
    --time-stamp | --time* | -t )
       echo "$timestamp" ; exit ;;
    --version | -v )
       echo "$version" ; exit ;;
    --help | --h* | -h )
       echo "$usage"; exit ;;
    -- )     # Stop option processing
       shift; break ;;
    - )	# Use stdin as input.
       break ;;
    -* )
       echo "$me: invalid option $1$help" >&2
       exit 1 ;;

    *local*)
       # First pass through any local machine types.
       echo $1
       exit ;;

    * )
       break ;;
  esac
done

case $# in
 0) echo "$me: missing argument$help" >&2
    exit 1;;
 1) ;;
 *) echo "$me: too many arguments$help" >&2
    exit 1;;
esac

# Separate what the user gave into CPU-COMPANY and OS or KERNEL-OS (if any).
# Here we must recognize all the valid KERNEL-OS combinations.
maybe_os=`echo $1 | sed 's/^\(.*\)-\([^-]*-[^-]*\)$/\2/'`
case $maybe_os in
  nto-qnx* | linux-gnu* | linux-dietlibc | linux-uclibc* | uclinux-uclibc* | uclinux-gnu* | \
  kfreebsd*-gnu* | knetbsd*-gnu* | netbsd*-gnu* | storm-chaos* | os2-emx* | rtmk-nova*)
    os=-$maybe_os
    basic_machine=`echo $1 | sed 's/^\(.*\)-\([^-]*-[^-]*\)$/\1/'`
    ;;
  *)
    basic_machine=`echo $1 | sed 's/-[^-]*$//'`
    if [ $basic_machine != $1 ]
    then os=`echo $1 | sed 's/.*-/-/'`
    else os=; fi
    ;;
esac
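
# Illustration of the split above, traced by hand (assumes a sed that
# handles \( \) groups as POSIX specifies):
#	$1=i686-pc-linux-gnu	basic_machine=i686-pc	os=-linux-gnu
#	$1=sparc-sun-solaris2	basic_machine=sparc-sun	os=-solaris2
#	$1=amd64		basic_machine=amd64	os= (empty)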

### Let's recognize common machines as not being operating systems so
### that things like config.sub decstation-3100 work.  We also
### recognize some manufacturers as not being operating systems, so we
### can provide default operating systems below.
case $os in
	-sun*os*)
		# Prevent following clause from handling this invalid input.
		;;
	-dec* | -mips* | -sequent* | -encore* | -pc532* | -sgi* | -sony* | \
	-att* | -7300* | -3300* | -delta* | -motorola* | -sun[234]* | \
	-unicom* | -ibm* | -next | -hp | -isi* | -apollo | -altos* | \
	-convergent* | -ncr* | -news | -32* | -3600* | -3100* | -hitachi* |\
	-c[123]* | -convex* | -sun | -crds | -omron* | -dg | -ultra | -tti* | \
	-harris | -dolphin | -highlevel | -gould | -cbm | -ns | -masscomp | \
	-apple | -axis | -knuth | -cray)
		os=
		basic_machine=$1
		;;
	-sim | -cisco | -oki | -wec | -winbond)
		os=
		basic_machine=$1
		;;
	-scout)
		;;
	-wrs)
		os=-vxworks
		basic_machine=$1
		;;
	-chorusos*)
		os=-chorusos
		basic_machine=$1
		;;
	-chorusrdb)
		os=-chorusrdb
		basic_machine=$1
		;;
	-hiux*)
		os=-hiuxwe2
		;;
	-sco5)
		os=-sco3.2v5
		basic_machine=`echo $1 | sed -e 's/86-.*/86-pc/'`
		;;
	-sco4)
		os=-sco3.2v4
		basic_machine=`echo $1 | sed -e 's/86-.*/86-pc/'`
		;;
	-sco3.2.[4-9]*)
		os=`echo $os | sed -e 's/sco3.2./sco3.2v/'`
		basic_machine=`echo $1 | sed -e 's/86-.*/86-pc/'`
		;;
	-sco3.2v[4-9]*)
		# Don't forget version if it is 3.2v4 or newer.
		basic_machine=`echo $1 | sed -e 's/86-.*/86-pc/'`
		;;
	-sco*)
		os=-sco3.2v2
		basic_machine=`echo $1 | sed -e 's/86-.*/86-pc/'`
		;;
	-udk*)
		basic_machine=`echo $1 | sed -e 's/86-.*/86-pc/'`
		;;
	-isc)
		os=-isc2.2
		basic_machine=`echo $1 | sed -e 's/86-.*/86-pc/'`
		;;
	-clix*)
		basic_machine=clipper-intergraph
		;;
	-isc*)
		basic_machine=`echo $1 | sed -e 's/86-.*/86-pc/'`
		;;
	-lynx*)
		os=-lynxos
		;;
	-ptx*)
		basic_machine=`echo $1 | sed -e 's/86-.*/86-sequent/'`
		;;
	-windowsnt*)
		os=`echo $os | sed -e 's/windowsnt/winnt/'`
		;;
	-psos*)
		os=-psos
		;;
	-mint | -mint[0-9]*)
		basic_machine=m68k-atari
		os=-mint
		;;
esac

# Decode aliases for certain CPU-COMPANY combinations.
case $basic_machine in
	# Recognize the basic CPU types without company name.
	# Some are omitted here because they have special meanings below.
	1750a | 580 \
	| a29k \
	| alpha | alphaev[4-8] | alphaev56 | alphaev6[78] | alphapca5[67] \
	| alpha64 | alpha64ev[4-8] | alpha64ev56 | alpha64ev6[78] | alpha64pca5[67] \
	| am33_2.0 \
	| arc | arm | arm[bl]e | arme[lb] | armv[2345] | armv[345][lb] | avr \
	| bfin \
	| c4x | clipper \
	| d10v | d30v | dlx | dsp16xx \
	| fr30 | frv \
	| h8300 | h8500 | hppa | hppa1.[01] | hppa2.0 | hppa2.0[nw] | hppa64 \
	| i370 | i860 | i960 | ia64 \
	| ip2k | iq2000 \
	| m32r | m32rle | m68000 | m68k | m88k | maxq | mcore \
	| mips | mipsbe | mipseb | mipsel | mipsle \
	| mips16 \
	| mips64 | mips64el \
	| mips64vr | mips64vrel \
	| mips64orion | mips64orionel \
	| mips64vr4100 | mips64vr4100el \
	| mips64vr4300 | mips64vr4300el \
	| mips64vr5000 | mips64vr5000el \
	| mips64vr5900 | mips64vr5900el \
	| mipsisa32 | mipsisa32el \
	| mipsisa32r2 | mipsisa32r2el \
	| mipsisa64 | mipsisa64el \
	| mipsisa64r2 | mipsisa64r2el \
	| mipsisa64sb1 | mipsisa64sb1el \
	| mipsisa64sr71k | mipsisa64sr71kel \
	| mipstx39 | mipstx39el \
	| mn10200 | mn10300 \
	| ms1 \
	| msp430 \
	| ns16k | ns32k \
	| or32 \
	| pdp10 | pdp11 | pj | pjl \
	| powerpc | powerpc64 | powerpc64le | powerpcle | ppcbe \
	| pyramid \
	| sh | sh[1234] | sh[24]a | sh[23]e | sh[34]eb | shbe | shle | sh[1234]le | sh3ele \
	| sh64 | sh64le \
	| sparc | sparc64 | sparc64b | sparc86x | sparclet | sparclite \
	| sparcv8 | sparcv9 | sparcv9b \
	| strongarm \
	| tahoe | thumb | tic4x | tic80 | tron \
	| v850 | v850e \
	| we32k \
	| x86 | xscale | xscalee[bl] | xstormy16 | xtensa \
	| z8k)
		basic_machine=$basic_machine-unknown
		;;
	m32c)
		basic_machine=$basic_machine-unknown
		;;
	m6811 | m68hc11 | m6812 | m68hc12)
		# Motorola 68HC11/12.
		basic_machine=$basic_machine-unknown
		os=-none
		;;
	m88110 | m680[12346]0 | m683?2 | m68360 | m5200 | v70 | w65 | z8k)
		;;

	# We use `pc' rather than `unknown'
	# because (1) that's what they normally are, and
	# (2) the word "unknown" tends to confuse beginning users.
	i*86 | x86_64)
	  basic_machine=$basic_machine-pc
	  ;;
	# Object if more than one company name word.
	*-*-*)
		echo Invalid configuration \`$1\': machine \`$basic_machine\' not recognized 1>&2
		exit 1
		;;
	# Recognize the basic CPU types with company name.
	580-* \
	| a29k-* \
	| alpha-* | alphaev[4-8]-* | alphaev56-* | alphaev6[78]-* \
	| alpha64-* | alpha64ev[4-8]-* | alpha64ev56-* | alpha64ev6[78]-* \
	| alphapca5[67]-* | alpha64pca5[67]-* | arc-* \
	| arm-*  | armbe-* | armle-* | armeb-* | armv*-* \
	| avr-* \
	| bfin-* | bs2000-* \
	| c[123]* | c30-* | [cjt]90-* | c4x-* | c54x-* | c55x-* | c6x-* \
	| clipper-* | craynv-* | cydra-* \
	| d10v-* | d30v-* | dlx-* \
	| elxsi-* \
	| f30[01]-* | f700-* | fr30-* | frv-* | fx80-* \
	| h8300-* | h8500-* \
	| hppa-* | hppa1.[01]-* | hppa2.0-* | hppa2.0[nw]-* | hppa64-* \
	| i*86-* | i860-* | i960-* | ia64-* \
	| ip2k-* | iq2000-* \
	| m32r-* | m32rle-* \
	| m68000-* | m680[012346]0-* | m68360-* | m683?2-* | m68k-* \
	| m88110-* | m88k-* | maxq-* | mcore-* \
	| mips-* | mipsbe-* | mipseb-* | mipsel-* | mipsle-* \
	| mips16-* \
	| mips64-* | mips64el-* \
	| mips64vr-* | mips64vrel-* \
	| mips64orion-* | mips64orionel-* \
	| mips64vr4100-* | mips64vr4100el-* \
	| mips64vr4300-* | mips64vr4300el-* \
	| mips64vr5000-* | mips64vr5000el-* \
	| mips64vr5900-* | mips64vr5900el-* \
	| mipsisa32-* | mipsisa32el-* \
	| mipsisa32r2-* | mipsisa32r2el-* \
	| mipsisa64-* | mipsisa64el-* \
	| mipsisa64r2-* | mipsisa64r2el-* \
	| mipsisa64sb1-* | mipsisa64sb1el-* \
	| mipsisa64sr71k-* | mipsisa64sr71kel-* \
	| mipstx39-* | mipstx39el-* \
	| mmix-* \
	| ms1-* \
	| msp430-* \
	| none-* | np1-* | ns16k-* | ns32k-* \
	| orion-* \
	| pdp10-* | pdp11-* | pj-* | pjl-* | pn-* | power-* \
	| powerpc-* | powerpc64-* | powerpc64le-* | powerpcle-* | ppcbe-* \
	| pyramid-* \
	| romp-* | rs6000-* \
	| sh-* | sh[1234]-* | sh[24]a-* | sh[23]e-* | sh[34]eb-* | shbe-* \
	| shle-* | sh[1234]le-* | sh3ele-* | sh64-* | sh64le-* \
	| sparc-* | sparc64-* | sparc64b-* | sparc86x-* | sparclet-* \
	| sparclite-* \
	| sparcv8-* | sparcv9-* | sparcv9b-* | strongarm-* | sv1-* | sx?-* \
	| tahoe-* | thumb-* \
	| tic30-* | tic4x-* | tic54x-* | tic55x-* | tic6x-* | tic80-* \
	| tron-* \
	| v850-* | v850e-* | vax-* \
	| we32k-* \
	| x86-* | x86_64-* | xps100-* | xscale-* | xscalee[bl]-* \
	| xstormy16-* | xtensa-* \
	| ymp-* \
	| z8k-*)
		;;
	m32c-*)
		;;
	# Recognize the various machine names and aliases which stand
	# for a CPU type and a company and sometimes even an OS.
	386bsd)
		basic_machine=i386-unknown
		os=-bsd
		;;
	3b1 | 7300 | 7300-att | att-7300 | pc7300 | safari | unixpc)
		basic_machine=m68000-att
		;;
	3b*)
		basic_machine=we32k-att
		;;
	a29khif)
		basic_machine=a29k-amd
		os=-udi
		;;
	abacus)
		basic_machine=abacus-unknown
		;;
	adobe68k)
		basic_machine=m68010-adobe
		os=-scout
		;;
	alliant | fx80)
		basic_machine=fx80-alliant
		;;
	altos | altos3068)
		basic_machine=m68k-altos
		;;
	am29k)
		basic_machine=a29k-none
		os=-bsd
		;;
	amd64)
		basic_machine=x86_64-pc
		;;
	amd64-*)
		basic_machine=x86_64-`echo $basic_machine | sed 's/^[^-]*-//'`
		;;
	amdahl)
		basic_machine=580-amdahl
		os=-sysv
		;;
	amiga | amiga-*)
		basic_machine=m68k-unknown
		;;
	amigaos | amigados)
		basic_machine=m68k-unknown
		os=-amigaos
		;;
	amigaunix | amix)
		basic_machine=m68k-unknown
		os=-sysv4
		;;
	apollo68)
		basic_machine=m68k-apollo
		os=-sysv
		;;
	apollo68bsd)
		basic_machine=m68k-apollo
		os=-bsd
		;;
	aux)
		basic_machine=m68k-apple
		os=-aux
		;;
	balance)
		basic_machine=ns32k-sequent
		os=-dynix
		;;
	c90)
		basic_machine=c90-cray
		os=-unicos
		;;
	convex-c1)
		basic_machine=c1-convex
		os=-bsd
		;;
	convex-c2)
		basic_machine=c2-convex
		os=-bsd
		;;
	convex-c32)
		basic_machine=c32-convex
		os=-bsd
		;;
	convex-c34)
		basic_machine=c34-convex
		os=-bsd
		;;
	convex-c38)
		basic_machine=c38-convex
		os=-bsd
		;;
	cray | j90)
		basic_machine=j90-cray
		os=-unicos
		;;
	craynv)
		basic_machine=craynv-cray
		os=-unicosmp
		;;
	cr16c)
		basic_machine=cr16c-unknown
		os=-elf
		;;
	crds | unos)
		basic_machine=m68k-crds
		;;
	crisv32 | crisv32-* | etraxfs*)
		basic_machine=crisv32-axis
		;;
	cris | cris-* | etrax*)
		basic_machine=cris-axis
		;;
	crx)
		basic_machine=crx-unknown
		os=-elf
		;;
	da30 | da30-*)
		basic_machine=m68k-da30
		;;
	decstation | decstation-3100 | pmax | pmax-* | pmin | dec3100 | decstatn)
		basic_machine=mips-dec
		;;
	decsystem10* | dec10*)
		basic_machine=pdp10-dec
		os=-tops10
		;;
	decsystem20* | dec20*)
		basic_machine=pdp10-dec
		os=-tops20
		;;
	delta | 3300 | motorola-3300 | motorola-delta \
	      | 3300-motorola | delta-motorola)
		basic_machine=m68k-motorola
		;;
	delta88)
		basic_machine=m88k-motorola
		os=-sysv3
		;;
	djgpp)
		basic_machine=i586-pc
		os=-msdosdjgpp
		;;
	dpx20 | dpx20-*)
		basic_machine=rs6000-bull
		os=-bosx
		;;
	dpx2* | dpx2*-bull)
		basic_machine=m68k-bull
		os=-sysv3
		;;
	ebmon29k)
		basic_machine=a29k-amd
		os=-ebmon
		;;
	elxsi)
		basic_machine=elxsi-elxsi
		os=-bsd
		;;
	encore | umax | mmax)
		basic_machine=ns32k-encore
		;;
	es1800 | OSE68k | ose68k | ose | OSE)
		basic_machine=m68k-ericsson
		os=-ose
		;;
	fx2800)
		basic_machine=i860-alliant
		;;
	genix)
		basic_machine=ns32k-ns
		;;
	gmicro)
		basic_machine=tron-gmicro
		os=-sysv
		;;
	go32)
		basic_machine=i386-pc
		os=-go32
		;;
	h3050r* | hiux*)
		basic_machine=hppa1.1-hitachi
		os=-hiuxwe2
		;;
	h8300hms)
		basic_machine=h8300-hitachi
		os=-hms
		;;
	h8300xray)
		basic_machine=h8300-hitachi
		os=-xray
		;;
	h8500hms)
		basic_machine=h8500-hitachi
		os=-hms
		;;
	harris)
		basic_machine=m88k-harris
		os=-sysv3
		;;
	hp300-*)
		basic_machine=m68k-hp
		;;
	hp300bsd)
		basic_machine=m68k-hp
		os=-bsd
		;;
	hp300hpux)
		basic_machine=m68k-hp
		os=-hpux
		;;
	hp3k9[0-9][0-9] | hp9[0-9][0-9])
		basic_machine=hppa1.0-hp
		;;
	hp9k2[0-9][0-9] | hp9k31[0-9])
		basic_machine=m68000-hp
		;;
	hp9k3[2-9][0-9])
		basic_machine=m68k-hp
		;;
	hp9k6[0-9][0-9] | hp6[0-9][0-9])
		basic_machine=hppa1.0-hp
		;;
	hp9k7[0-79][0-9] | hp7[0-79][0-9])
		basic_machine=hppa1.1-hp
		;;
	hp9k78[0-9] | hp78[0-9])
		# FIXME: really hppa2.0-hp
		basic_machine=hppa1.1-hp
		;;
	hp9k8[67]1 | hp8[67]1 | hp9k80[24] | hp80[24] | hp9k8[78]9 | hp8[78]9 | hp9k893 | hp893)
		# FIXME: really hppa2.0-hp
		basic_machine=hppa1.1-hp
		;;
	hp9k8[0-9][13679] | hp8[0-9][13679])
		basic_machine=hppa1.1-hp
		;;
	hp9k8[0-9][0-9] | hp8[0-9][0-9])
		basic_machine=hppa1.0-hp
		;;
	hppa-next)
		os=-nextstep3
		;;
	hppaosf)
		basic_machine=hppa1.1-hp
		os=-osf
		;;
	hppro)
		basic_machine=hppa1.1-hp
		os=-proelf
		;;
	i370-ibm* | ibm*)
		basic_machine=i370-ibm
		;;
# I'm not sure what "Sysv32" means.  Should this be sysv3.2?
	i*86v32)
		basic_machine=`echo $1 | sed -e 's/86.*/86-pc/'`
		os=-sysv32
		;;
	i*86v4*)
		basic_machine=`echo $1 | sed -e 's/86.*/86-pc/'`
		os=-sysv4
		;;
	i*86v)
		basic_machine=`echo $1 | sed -e 's/86.*/86-pc/'`
		os=-sysv
		;;
	i*86sol2)
		basic_machine=`echo $1 | sed -e 's/86.*/86-pc/'`
		os=-solaris2
		;;
	i386mach)
		basic_machine=i386-mach
		os=-mach
		;;
	i386-vsta | vsta)
		basic_machine=i386-unknown
		os=-vsta
		;;
	iris | iris4d)
		basic_machine=mips-sgi
		case $os in
		    -irix*)
			;;
		    *)
			os=-irix4
			;;
		esac
		;;
	isi68 | isi)
		basic_machine=m68k-isi
		os=-sysv
		;;
	m88k-omron*)
		basic_machine=m88k-omron
		;;
	magnum | m3230)
		basic_machine=mips-mips
		os=-sysv
		;;
	merlin)
		basic_machine=ns32k-utek
		os=-sysv
		;;
	mingw32)
		basic_machine=i386-pc
		os=-mingw32
		;;
	miniframe)
		basic_machine=m68000-convergent
		;;
	*mint | -mint[0-9]* | *MiNT | *MiNT[0-9]*)
		basic_machine=m68k-atari
		os=-mint
		;;
	mips3*-*)
		basic_machine=`echo $basic_machine | sed -e 's/mips3/mips64/'`
		;;
	mips3*)
		basic_machine=`echo $basic_machine | sed -e 's/mips3/mips64/'`-unknown
		;;
	monitor)
		basic_machine=m68k-rom68k
		os=-coff
		;;
	morphos)
		basic_machine=powerpc-unknown
		os=-morphos
		;;
	msdos)
		basic_machine=i386-pc
		os=-msdos
		;;
	mvs)
		basic_machine=i370-ibm
		os=-mvs
		;;
	ncr3000)
		basic_machine=i486-ncr
		os=-sysv4
		;;
	netbsd386)
		basic_machine=i386-unknown
		os=-netbsd
		;;
	netwinder)
		basic_machine=armv4l-rebel
		os=-linux
		;;
	news | news700 | news800 | news900)
		basic_machine=m68k-sony
		os=-newsos
		;;
	news1000)
		basic_machine=m68030-sony
		os=-newsos
		;;
	news-3600 | risc-news)
		basic_machine=mips-sony
		os=-newsos
		;;
	necv70)
		basic_machine=v70-nec
		os=-sysv
		;;
	next | m*-next )
		basic_machine=m68k-next
		case $os in
		    -nextstep* )
			;;
		    -ns2*)
		      os=-nextstep2
			;;
		    *)
		      os=-nextstep3
			;;
		esac
		;;
	nh3000)
		basic_machine=m68k-harris
		os=-cxux
		;;
	nh[45]000)
		basic_machine=m88k-harris
		os=-cxux
		;;
	nindy960)
		basic_machine=i960-intel
		os=-nindy
		;;
	mon960)
		basic_machine=i960-intel
		os=-mon960
		;;
	nonstopux)
		basic_machine=mips-compaq
		os=-nonstopux
		;;
	np1)
		basic_machine=np1-gould
		;;
	nsr-tandem)
		basic_machine=nsr-tandem
		;;
	op50n-* | op60c-*)
		basic_machine=hppa1.1-oki
		os=-proelf
		;;
	openrisc | openrisc-*)
		basic_machine=or32-unknown
		;;
	os400)
		basic_machine=powerpc-ibm
		os=-os400
		;;
	OSE68000 | ose68000)
		basic_machine=m68000-ericsson
		os=-ose
		;;
	os68k)
		basic_machine=m68k-none
		os=-os68k
		;;
	pa-hitachi)
		basic_machine=hppa1.1-hitachi
		os=-hiuxwe2
		;;
	paragon)
		basic_machine=i860-intel
		os=-osf
		;;
	pbd)
		basic_machine=sparc-tti
		;;
	pbb)
		basic_machine=m68k-tti
		;;
	pc532 | pc532-*)
		basic_machine=ns32k-pc532
		;;
	pentium | p5 | k5 | k6 | nexgen | viac3)
		basic_machine=i586-pc
		;;
	pentiumpro | p6 | 6x86 | athlon | athlon_*)
		basic_machine=i686-pc
		;;
	pentiumii | pentium2 | pentiumiii | pentium3)
		basic_machine=i686-pc
		;;
	pentium4)
		basic_machine=i786-pc
		;;
	pentium-* | p5-* | k5-* | k6-* | nexgen-* | viac3-*)
		basic_machine=i586-`echo $basic_machine | sed 's/^[^-]*-//'`
		;;
	pentiumpro-* | p6-* | 6x86-* | athlon-*)
		basic_machine=i686-`echo $basic_machine | sed 's/^[^-]*-//'`
		;;
	pentiumii-* | pentium2-* | pentiumiii-* | pentium3-*)
		basic_machine=i686-`echo $basic_machine | sed 's/^[^-]*-//'`
		;;
	pentium4-*)
		basic_machine=i786-`echo $basic_machine | sed 's/^[^-]*-//'`
		;;
	pn)
		basic_machine=pn-gould
		;;
	power)	basic_machine=power-ibm
		;;
	ppc)	basic_machine=powerpc-unknown
		;;
	ppc-*)	basic_machine=powerpc-`echo $basic_machine | sed 's/^[^-]*-//'`
		;;
	ppcle | powerpclittle | ppc-le | powerpc-little)
		basic_machine=powerpcle-unknown
		;;
	ppcle-* | powerpclittle-*)
		basic_machine=powerpcle-`echo $basic_machine | sed 's/^[^-]*-//'`
		;;
	ppc64)	basic_machine=powerpc64-unknown
		;;
	ppc64-*) basic_machine=powerpc64-`echo $basic_machine | sed 's/^[^-]*-//'`
		;;
	ppc64le | powerpc64little | ppc64-le | powerpc64-little)
		basic_machine=powerpc64le-unknown
		;;
	ppc64le-* | powerpc64little-*)
		basic_machine=powerpc64le-`echo $basic_machine | sed 's/^[^-]*-//'`
		;;
	ps2)
		basic_machine=i386-ibm
		;;
	pw32)
		basic_machine=i586-unknown
		os=-pw32
		;;
	rom68k)
		basic_machine=m68k-rom68k
		os=-coff
		;;
	rm[46]00)
		basic_machine=mips-siemens
		;;
	rtpc | rtpc-*)
		basic_machine=romp-ibm
		;;
	s390 | s390-*)
		basic_machine=s390-ibm
		;;
	s390x | s390x-*)
		basic_machine=s390x-ibm
		;;
	sa29200)
		basic_machine=a29k-amd
		os=-udi
		;;
	sb1)
		basic_machine=mipsisa64sb1-unknown
		;;
	sb1el)
		basic_machine=mipsisa64sb1el-unknown
		;;
	sei)
		basic_machine=mips-sei
		os=-seiux
		;;
	sequent)
		basic_machine=i386-sequent
		;;
	sh)
		basic_machine=sh-hitachi
		os=-hms
		;;
	sh64)
		basic_machine=sh64-unknown
		;;
	sparclite-wrs | simso-wrs)
		basic_machine=sparclite-wrs
		os=-vxworks
		;;
	sps7)
		basic_machine=m68k-bull
		os=-sysv2
		;;
	spur)
		basic_machine=spur-unknown
		;;
	st2000)
		basic_machine=m68k-tandem
		;;
	stratus)
		basic_machine=i860-stratus
		os=-sysv4
		;;
	sun2)
		basic_machine=m68000-sun
		;;
	sun2os3)
		basic_machine=m68000-sun
		os=-sunos3
		;;
	sun2os4)
		basic_machine=m68000-sun
		os=-sunos4
		;;
	sun3os3)
		basic_machine=m68k-sun
		os=-sunos3
		;;
	sun3os4)
		basic_machine=m68k-sun
		os=-sunos4
		;;
	sun4os3)
		basic_machine=sparc-sun
		os=-sunos3
		;;
	sun4os4)
		basic_machine=sparc-sun
		os=-sunos4
		;;
	sun4sol2)
		basic_machine=sparc-sun
		os=-solaris2
		;;
	sun3 | sun3-*)
		basic_machine=m68k-sun
		;;
	sun4)
		basic_machine=sparc-sun
		;;
	sun386 | sun386i | roadrunner)
		basic_machine=i386-sun
		;;
	sv1)
		basic_machine=sv1-cray
		os=-unicos
		;;
	symmetry)
		basic_machine=i386-sequent
		os=-dynix
		;;
	t3e)
		basic_machine=alphaev5-cray
		os=-unicos
		;;
	t90)
		basic_machine=t90-cray
		os=-unicos
		;;
	tic54x | c54x*)
		basic_machine=tic54x-unknown
		os=-coff
		;;
	tic55x | c55x*)
		basic_machine=tic55x-unknown
		os=-coff
		;;
	tic6x | c6x*)
		basic_machine=tic6x-unknown
		os=-coff
		;;
	tx39)
		basic_machine=mipstx39-unknown
		;;
	tx39el)
		basic_machine=mipstx39el-unknown
		;;
	toad1)
		basic_machine=pdp10-xkl
		os=-tops20
		;;
	tower | tower-32)
		basic_machine=m68k-ncr
		;;
	tpf)
		basic_machine=s390x-ibm
		os=-tpf
		;;
	udi29k)
		basic_machine=a29k-amd
		os=-udi
		;;
	ultra3)
		basic_machine=a29k-nyu
		os=-sym1
		;;
	v810 | necv810)
		basic_machine=v810-nec
		os=-none
		;;
	vaxv)
		basic_machine=vax-dec
		os=-sysv
		;;
	vms)
		basic_machine=vax-dec
		os=-vms
		;;
	vpp*|vx|vx-*)
		basic_machine=f301-fujitsu
		;;
	vxworks960)
		basic_machine=i960-wrs
		os=-vxworks
		;;
	vxworks68)
		basic_machine=m68k-wrs
		os=-vxworks
		;;
	vxworks29k)
		basic_machine=a29k-wrs
		os=-vxworks
		;;
	w65*)
		basic_machine=w65-wdc
		os=-none
		;;
	w89k-*)
		basic_machine=hppa1.1-winbond
		os=-proelf
		;;
	xbox)
		basic_machine=i686-pc
		os=-mingw32
		;;
	xps | xps100)
		basic_machine=xps100-honeywell
		;;
	ymp)
		basic_machine=ymp-cray
		os=-unicos
		;;
	z8k-*-coff)
		basic_machine=z8k-unknown
		os=-sim
		;;
	none)
		basic_machine=none-none
		os=-none
		;;

# Here we handle the default manufacturer of certain CPU types.  In some
# cases it is the only manufacturer; in others, it is the most popular.
	w89k)
		basic_machine=hppa1.1-winbond
		;;
	op50n)
		basic_machine=hppa1.1-oki
		;;
	op60c)
		basic_machine=hppa1.1-oki
		;;
	romp)
		basic_machine=romp-ibm
		;;
	mmix)
		basic_machine=mmix-knuth
		;;
	rs6000)
		basic_machine=rs6000-ibm
		;;
	vax)
		basic_machine=vax-dec
		;;
	pdp10)
		# there are many clones, so DEC is not a safe bet
		basic_machine=pdp10-unknown
		;;
	pdp11)
		basic_machine=pdp11-dec
		;;
	we32k)
		basic_machine=we32k-att
		;;
	sh[1234] | sh[24]a | sh[34]eb | sh[1234]le | sh[23]ele)
		basic_machine=sh-unknown
		;;
	sparc | sparcv8 | sparcv9 | sparcv9b)
		basic_machine=sparc-sun
		;;
	cydra)
		basic_machine=cydra-cydrome
		;;
	orion)
		basic_machine=orion-highlevel
		;;
	orion105)
		basic_machine=clipper-highlevel
		;;
	mac | mpw | mac-mpw)
		basic_machine=m68k-apple
		;;
	pmac | pmac-mpw)
		basic_machine=powerpc-apple
		;;
	*-unknown)
		# Make sure to match an already-canonicalized machine name.
		;;
	*)
		echo Invalid configuration \`$1\': machine \`$basic_machine\' not recognized 1>&2
		exit 1
		;;
esac

# Here we canonicalize certain aliases for manufacturers.
case $basic_machine in
	*-digital*)
		basic_machine=`echo $basic_machine | sed 's/digital.*/dec/'`
		;;
	*-commodore*)
		basic_machine=`echo $basic_machine | sed 's/commodore.*/cbm/'`
		;;
	*)
		;;
esac

# Decode manufacturer-specific aliases for certain operating systems.

if [ x"$os" != x"" ]
then
case $os in
	# First match some system type aliases
	# that might get confused with valid system types.
	# -solaris* is a basic system type, with this one exception.
	-solaris1 | -solaris1.*)
		os=`echo $os | sed -e 's|solaris1|sunos4|'`
		;;
	-solaris)
		os=-solaris2
		;;
	-svr4*)
		os=-sysv4
		;;
	-unixware*)
		os=-sysv4.2uw
		;;
	-gnu/linux*)
		os=`echo $os | sed -e 's|gnu/linux|linux-gnu|'`
		;;
	# First accept the basic system types.
	# The portable systems come first.
	# Each alternative MUST END IN A *, to match a version number.
	# -sysv* is not here because it comes later, after sysvr4.
	-gnu* | -bsd* | -mach* | -minix* | -genix* | -ultrix* | -irix* \
	      | -*vms* | -sco* | -esix* | -isc* | -aix* | -sunos | -sunos[34]*\
	      | -hpux* | -unos* | -osf* | -luna* | -dgux* | -solaris* | -sym* \
	      | -amigaos* | -amigados* | -msdos* | -newsos* | -unicos* | -aof* \
	      | -aos* \
	      | -nindy* | -vxsim* | -vxworks* | -ebmon* | -hms* | -mvs* \
	      | -clix* | -riscos* | -uniplus* | -iris* | -rtu* | -xenix* \
	      | -hiux* | -386bsd* | -knetbsd* | -mirbsd* | -netbsd* | -openbsd* \
	      | -ekkobsd* | -kfreebsd* | -freebsd* | -riscix* | -lynxos* \
	      | -bosx* | -nextstep* | -cxux* | -aout* | -elf* | -oabi* \
	      | -ptx* | -coff* | -ecoff* | -winnt* | -domain* | -vsta* \
	      | -udi* | -eabi* | -lites* | -ieee* | -go32* | -aux* \
	      | -chorusos* | -chorusrdb* \
	      | -cygwin* | -pe* | -psos* | -moss* | -proelf* | -rtems* \
	      | -mingw32* | -linux* | -linux-uclibc* | -uxpv* | -beos* | -mpeix* | -udk* \
	      | -interix* | -uwin* | -mks* | -rhapsody* | -darwin* | -opened* \
	      | -openstep* | -oskit* | -conix* | -pw32* | -nonstopux* \
	      | -storm-chaos* | -tops10* | -tenex* | -tops20* | -its* \
	      | -os2* | -vos* | -palmos* | -uclinux* | -nucleus* \
	      | -morphos* | -superux* | -rtmk* | -rtmk-nova* | -windiss* \
	      | -powermax* | -dnix* | -nx6 | -nx7 | -sei* | -dragonfly* \
	      | -skyos* | -haiku*)
	# Remember, each alternative MUST END IN *, to match a version number.
		;;
	-qnx*)
		case $basic_machine in
		    x86-* | i*86-*)
			;;
		    *)
			os=-nto$os
			;;
		esac
		;;
	-nto-qnx*)
		;;
	-nto*)
		os=`echo $os | sed -e 's|nto|nto-qnx|'`
		;;
	-sim | -es1800* | -hms* | -xray | -os68k* | -none* | -v88r* \
	      | -windows* | -osx | -abug | -netware* | -os9* | -beos* | -haiku* \
	      | -macos* | -mpw* | -magic* | -mmixware* | -mon960* | -lnews*)
		;;
	-mac*)
		os=`echo $os | sed -e 's|mac|macos|'`
		;;
	-linux-dietlibc)
		os=-linux-dietlibc
		;;
	-sunos5*)
		os=`echo $os | sed -e 's|sunos5|solaris2|'`
		;;
	-sunos6*)
		os=`echo $os | sed -e 's|sunos6|solaris3|'`
		;;
	-opened*)
		os=-openedition
		;;
	-os400*)
		os=-os400
		;;
	-wince*)
		os=-wince
		;;
	-osfrose*)
		os=-osfrose
		;;
	-osf*)
		os=-osf
		;;
	-utek*)
		os=-bsd
		;;
	-dynix*)
		os=-bsd
		;;
	-acis*)
		os=-aos
		;;
	-atheos*)
		os=-atheos
		;;
	-syllable*)
		os=-syllable
		;;
	-386bsd)
		os=-bsd
		;;
	-ctix* | -uts*)
		os=-sysv
		;;
	-nova*)
		os=-rtmk-nova
		;;
	-ns2 )
		os=-nextstep2
		;;
	-nsk*)
		os=-nsk
		;;
	# Preserve the version number of sinix5.
	-sinix5.*)
		os=`echo $os | sed -e 's|sinix|sysv|'`
		;;
	-sinix*)
		os=-sysv4
		;;
	-tpf*)
		os=-tpf
		;;
	-triton*)
		os=-sysv3
		;;
	-oss*)
		os=-sysv3
		;;
	-svr4)
		os=-sysv4
		;;
	-svr3)
		os=-sysv3
		;;
	-sysvr4)
		os=-sysv4
		;;
	# This must come after -sysvr4.
	-sysv*)
		;;
	-ose*)
		os=-ose
		;;
	-es1800*)
		os=-ose
		;;
	-xenix)
		os=-xenix
		;;
	-*mint | -mint[0-9]* | -*MiNT | -MiNT[0-9]*)
		os=-mint
		;;
	-aros*)
		os=-aros
		;;
	-kaos*)
		os=-kaos
		;;
	-zvmoe)
		os=-zvmoe
		;;
	-none)
		;;
	*)
		# Get rid of the `-' at the beginning of $os.
		os=`echo $os | sed 's/[^-]*-//'`
		echo Invalid configuration \`$1\': system \`$os\' not recognized 1>&2
		exit 1
		;;
esac
else

# Here we handle the default operating systems that come with various machines.
# The value should be what the vendor currently ships out the door with their
# machine or, put another way, the most popular OS provided with the machine.

# Note that if you're going to try to match "-MANUFACTURER" here (say,
# "-sun"), then you have to tell the case statement up towards the top
# that MANUFACTURER isn't an operating system.  Otherwise, code above
# will signal an error saying that MANUFACTURER isn't an operating
# system, and we'll never get to this point.

case $basic_machine in
	*-acorn)
		os=-riscix1.2
		;;
	arm*-rebel)
		os=-linux
		;;
	arm*-semi)
		os=-aout
		;;
	c4x-* | tic4x-*)
		os=-coff
		;;
	# This must come before the *-dec entry.
	pdp10-*)
		os=-tops20
		;;
	pdp11-*)
		os=-none
		;;
	*-dec | vax-*)
		os=-ultrix4.2
		;;
	m68*-apollo)
		os=-domain
		;;
	i386-sun)
		os=-sunos4.0.2
		;;
	m68000-sun)
		os=-sunos3
		# This also exists in the configure program, but was not the
		# default.
		# os=-sunos4
		;;
	m68*-cisco)
		os=-aout
		;;
	mips*-cisco)
		os=-elf
		;;
	mips*-*)
		os=-elf
		;;
	or32-*)
		os=-coff
		;;
	*-tti)	# must be before sparc entry or we get the wrong os.
		os=-sysv3
		;;
	sparc-* | *-sun)
		os=-sunos4.1.1
		;;
	*-be)
		os=-beos
		;;
	*-haiku)
		os=-haiku
		;;
	*-ibm)
		os=-aix
		;;
	*-knuth)
		os=-mmixware
		;;
	*-wec)
		os=-proelf
		;;
	*-winbond)
		os=-proelf
		;;
	*-oki)
		os=-proelf
		;;
	*-hp)
		os=-hpux
		;;
	*-hitachi)
		os=-hiux
		;;
	i860-* | *-att | *-ncr | *-altos | *-motorola | *-convergent)
		os=-sysv
		;;
	*-cbm)
		os=-amigaos
		;;
	*-dg)
		os=-dgux
		;;
	*-dolphin)
		os=-sysv3
		;;
	m68k-ccur)
		os=-rtu
		;;
	m88k-omron*)
		os=-luna
		;;
	*-next )
		os=-nextstep
		;;
	*-sequent)
		os=-ptx
		;;
	*-crds)
		os=-unos
		;;
	*-ns)
		os=-genix
		;;
	i370-*)
		os=-mvs
		;;
	*-next)
		os=-nextstep3
		;;
	*-gould)
		os=-sysv
		;;
	*-highlevel)
		os=-bsd
		;;
	*-encore)
		os=-bsd
		;;
	*-sgi)
		os=-irix
		;;
	*-siemens)
		os=-sysv4
		;;
	*-masscomp)
		os=-rtu
		;;
	f30[01]-fujitsu | f700-fujitsu)
		os=-uxpv
		;;
	*-rom68k)
		os=-coff
		;;
	*-*bug)
		os=-coff
		;;
	*-apple)
		os=-macos
		;;
	*-atari*)
		os=-mint
		;;
	*)
		os=-none
		;;
esac
fi

# Here we handle the case where we know the os, and the CPU type, but not the
# manufacturer.  We pick the logical manufacturer.
vendor=unknown
case $basic_machine in
	*-unknown)
		case $os in
			-riscix*)
				vendor=acorn
				;;
			-sunos*)
				vendor=sun
				;;
			-aix*)
				vendor=ibm
				;;
			-beos*)
				vendor=be
				;;
			-hpux*)
				vendor=hp
				;;
			-mpeix*)
				vendor=hp
				;;
			-hiux*)
				vendor=hitachi
				;;
			-unos*)
				vendor=crds
				;;
			-dgux*)
				vendor=dg
				;;
			-luna*)
				vendor=omron
				;;
			-genix*)
				vendor=ns
				;;
			-mvs* | -opened*)
				vendor=ibm
				;;
			-os400*)
				vendor=ibm
				;;
			-ptx*)
				vendor=sequent
				;;
			-tpf*)
				vendor=ibm
				;;
			-vxsim* | -vxworks* | -windiss*)
				vendor=wrs
				;;
			-aux*)
				vendor=apple
				;;
			-hms*)
				vendor=hitachi
				;;
			-mpw* | -macos*)
				vendor=apple
				;;
			-*mint | -mint[0-9]* | -*MiNT | -MiNT[0-9]*)
				vendor=atari
				;;
			-vos*)
				vendor=stratus
				;;
		esac
		basic_machine=`echo $basic_machine | sed "s/unknown/$vendor/"`
		;;
esac
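
# Illustration of the vendor guess above, traced by hand through this
# copy of the script (a sketch, not upstream documentation):
#	config.sub powerpc-aix		prints powerpc-ibm-aix
#	config.sub m68k-vxworks		prints m68k-wrs-vxworks
# When no OS pattern matches, $vendor stays "unknown" and the name is
# left as it was, e.g. "config.sub i960" prints i960-unknown-none.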

echo $basic_machine$os
exit

# Local variables:
# eval: (add-hook 'write-file-hooks 'time-stamp)
# time-stamp-start: "timestamp='"
# time-stamp-format: "%:y-%02m-%02d"
# time-stamp-end: "'"
# End:
swish-e-2.4.7/config/mkinstalldirs
#! /bin/sh
# mkinstalldirs --- make directory hierarchy
# Author: Noah Friedman <friedman@prep.ai.mit.edu>
# Created: 1993-05-16
# Public domain

errstatus=0
dirmode=""

usage="\
Usage: mkinstalldirs [-h] [--help] [-m mode] dir ..."
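
# Illustrative invocations (the paths are hypothetical examples, not
# taken from the original file):
#   mkinstalldirs /usr/local/share/doc/swish-e
#   mkinstalldirs -m 755 -- ./build/lib ./build/include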

# process command line arguments
while test $# -gt 0 ; do
  case $1 in
    -h | --help | --h*)         # -h for help
      echo "$usage" 1>&2
      exit 0
      ;;
    -m)                         # -m PERM arg
      shift
      test $# -eq 0 && { echo "$usage" 1>&2; exit 1; }
      dirmode=$1
      shift
      ;;
    --)                         # stop option processing
      shift
      break
      ;;
    -*)                         # unknown option
      echo "$usage" 1>&2
      exit 1
      ;;
    *)                          # first non-opt arg
      break
      ;;
  esac
done

for file
do
  if test -d "$file"; then
    shift
  else
    break
  fi
done

case $# in
  0) exit 0 ;;
esac

case $dirmode in
  '')
    if mkdir -p -- . 2>/dev/null; then
      echo "mkdir -p -- $*"
      exec mkdir -p -- "$@"
    fi
    ;;
  *)
    if mkdir -m "$dirmode" -p -- . 2>/dev/null; then
      echo "mkdir -m $dirmode -p -- $*"
      exec mkdir -m "$dirmode" -p -- "$@"
    fi
    ;;
esac

for file
do
  set fnord `echo ":$file" | sed -ne 's/^:\//#/;s/^://;s/\// /g;s/^#/\//;p'`
  shift

  pathcomp=
  for d
  do
    pathcomp="$pathcomp$d"
    case $pathcomp in
      -*) pathcomp=./$pathcomp ;;
    esac

    if test ! -d "$pathcomp"; then
      echo "mkdir $pathcomp"

      mkdir "$pathcomp" || lasterr=$?

      if test ! -d "$pathcomp"; then
        errstatus=$lasterr
      else
        if test ! -z "$dirmode"; then
          echo "chmod $dirmode $pathcomp"
          lasterr=""
          chmod "$dirmode" "$pathcomp" || lasterr=$?

          if test ! -z "$lasterr"; then
            errstatus=$lasterr
          fi
        fi
      fi
    fi

    pathcomp="$pathcomp/"
  done
done

exit $errstatus

# Local Variables:
# mode: shell-script
# sh-indentation: 2
# End:
# mkinstalldirs ends here
������������������������������������������������������������swish-e-2.4.7/config/compile������������������������������������������������������������������������0000775�0000771�0001750�00000005326�11166010112�012646� �����������������������������������������������������������������������������������������������������������������������������������������������������������������������������0000000�0000000������������������������������������������������������������������������������������������������������������������������������������������������������������������������#! /bin/sh

# Wrapper for compilers which do not understand `-c -o'.

# Copyright 1999, 2000 Free Software Foundation, Inc.
# Written by Tom Tromey <tromey@cygnus.com>.
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2, or (at your option)
# any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.

# As a special exception to the GNU General Public License, if you
# distribute this file as part of a program that contains a
# configuration script generated by Autoconf, you may include it under
# the same distribution terms that you use for the rest of that program.

# Usage:
# compile PROGRAM [ARGS]...
# `-o FOO.o' is removed from the args passed to the actual compile.

prog=$1
shift

ofile=
cfile=
args=
while test $# -gt 0; do
   case "$1" in
    -o)
       # configure might choose to run compile as `compile cc -o foo foo.c'.
       # So we do something ugly here.
       ofile=$2
       shift
       case "$ofile" in
	*.o | *.obj)
	   ;;
	*)
	   args="$args -o $ofile"
	   ofile=
	   ;;
       esac
       ;;
    *.c)
       cfile=$1
       args="$args $1"
       ;;
    *)
       args="$args $1"
       ;;
   esac
   shift
done

if test -z "$ofile" || test -z "$cfile"; then
   # If no `-o' option was seen then we might have been invoked from a
   # pattern rule where we don't need one.  That is ok -- this is a
   # normal compilation that the losing compiler can handle.  If no
   # `.c' file was seen then we are probably linking.  That is also
   # ok.
   exec "$prog" $args
fi

# Name of file we expect compiler to create.
cofile=`echo $cfile | sed -e 's|^.*/||' -e 's/\.c$/.o/'`

# Create the lock directory.
# Note: use `[/.-]' here to ensure that we don't use the same name
# that we are using for the .o file.  Also, base the name on the expected
# object file name, since that is what matters with a parallel build.
lockdir=`echo $cofile | sed -e 's|[/.-]|_|g'`.d
while true; do
   if mkdir $lockdir > /dev/null 2>&1; then
      break
   fi
   sleep 1
done
# FIXME: race condition here if user kills between mkdir and trap.
trap "rmdir $lockdir; exit 1" 1 2 15

# Run the compile.
"$prog" $args
status=$?

if test -f "$cofile"; then
   mv "$cofile" "$ofile"
fi

rmdir $lockdir
exit $status
����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������swish-e-2.4.7/config/depcomp������������������������������������������������������������������������0000775�0000771�0001750�00000031767�11166010112�012655� �����������������������������������������������������������������������������������������������������������������������������������������������������������������������������0000000�0000000������������������������������������������������������������������������������������������������������������������������������������������������������������������������#! /bin/sh

# depcomp - compile a program generating dependencies as side-effects
# Copyright 1999, 2000 Free Software Foundation, Inc.

# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2, or (at your option)
# any later version.

# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.

# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA
# 02111-1307, USA.

# As a special exception to the GNU General Public License, if you
# distribute this file as part of a program that contains a
# configuration script generated by Autoconf, you may include it under
# the same distribution terms that you use for the rest of that program.

# Originally written by Alexandre Oliva <oliva@dcc.unicamp.br>.

if test -z "$depmode" || test -z "$source" || test -z "$object"; then
  echo "depcomp: Variables source, object and depmode must be set" 1>&2
  exit 1
fi
# `libtool' can also be set to `yes' or `no'.

if test -z "$depfile"; then
   base=`echo "$object" | sed -e 's,^.*/,,' -e 's,\.\([^.]*\)$,.P\1,'`
   dir=`echo "$object" | sed 's,/.*$,/,'`
   if test "$dir" = "$object"; then
      dir=
   fi
   # FIXME: should be _deps on DOS.
   depfile="$dir.deps/$base"
fi

tmpdepfile=${tmpdepfile-`echo "$depfile" | sed 's/\.\([^.]*\)$/.T\1/'`}

rm -f "$tmpdepfile"

# Some modes work just like other modes, but use different flags.  We
# parameterize here, but still list the modes in the big case below,
# to make depend.m4 easier to write.  Note that we *cannot* use a case
# here, because this file can only contain one case statement.
if test "$depmode" = hp; then
  # HP compiler uses -M and no extra arg.
  gccflag=-M
  depmode=gcc
fi

if test "$depmode" = dashXmstdout; then
   # This is just like dashmstdout with a different argument.
   dashmflag=-xM
   depmode=dashmstdout
fi

case "$depmode" in
gcc3)
## gcc 3 implements dependency tracking that does exactly what
## we want.  Yay!  Note: for some reason libtool 1.4 doesn't like
## it if -MD -MP comes after the -MF stuff.  Hmm.
  "$@" -MT "$object" -MD -MP -MF "$tmpdepfile"
  stat=$?
  if test $stat -eq 0; then :
  else
    rm -f "$tmpdepfile"
    exit $stat
  fi
  mv "$tmpdepfile" "$depfile"
  ;;

gcc)
## There are various ways to get dependency output from gcc.  Here's
## why we pick this rather obscure method:
## - Don't want to use -MD because we'd like the dependencies to end
##   up in a subdir.  Having to rename by hand is ugly.
##   (We might end up doing this anyway to support other compilers.)
## - The DEPENDENCIES_OUTPUT environment variable makes gcc act like
##   -MM, not -M (despite what the docs say).
## - Using -M directly means running the compiler twice (even worse
##   than renaming).
  if test -z "$gccflag"; then
    gccflag=-MD,
  fi
  "$@" -Wp,"$gccflag$tmpdepfile"
  stat=$?
  if test $stat -eq 0; then :
  else
    rm -f "$tmpdepfile"
    exit $stat
  fi
  rm -f "$depfile"
  echo "$object : \\" > "$depfile"
  alpha=ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz
## The second -e expression handles DOS-style file names with drive letters.
  sed -e 's/^[^:]*: / /' \
      -e 's/^['$alpha']:\/[^:]*: / /' < "$tmpdepfile" >> "$depfile"
## This next piece of magic avoids the `deleted header file' problem.
## The problem is that when a header file which appears in a .P file
## is deleted, the dependency causes make to die (because there is
## typically no way to rebuild the header).  We avoid this by adding
## dummy dependencies for each header file.  Too bad gcc doesn't do
## this for us directly.
  tr ' ' '
' < "$tmpdepfile" |
## Some versions of gcc put a space before the `:'.  On the theory
## that the space means something, we add a space to the output as
## well.
## Some versions of the HPUX 10.20 sed can't process this invocation
## correctly.  Breaking it into two sed invocations is a workaround.
    sed -e 's/^\\$//' -e '/^$/d' -e '/:$/d' | sed -e 's/$/ :/' >> "$depfile"
  rm -f "$tmpdepfile"
  ;;

hp)
  # This case exists only to let depend.m4 do its work.  It works by
  # looking at the text of this script.  This case will never be run,
  # since it is checked for above.
  exit 1
  ;;

sgi)
  if test "$libtool" = yes; then
    "$@" "-Wp,-MDupdate,$tmpdepfile"
  else
    "$@" -MDupdate "$tmpdepfile"
  fi
  stat=$?
  if test $stat -eq 0; then :
  else
    rm -f "$tmpdepfile"
    exit $stat
  fi
  rm -f "$depfile"

  if test -f "$tmpdepfile"; then  # yes, the sourcefile depends on other files
    echo "$object : \\" > "$depfile"

    # Clip off the initial element (the dependent).  Don't try to be
    # clever and replace this with sed code, as IRIX sed won't handle
    # lines with more than a fixed number of characters (4096 in
    # IRIX 6.2 sed, 8192 in IRIX 6.5).  We also remove comment lines;
    # the IRIX cc adds comments like `#:fec' to the end of the
    # dependency line.
    tr ' ' '
' < "$tmpdepfile" \
    | sed -e 's/^.*\.o://' -e 's/#.*$//' -e '/^$/ d' | \
    tr '
' ' ' >> "$depfile"
    echo >> "$depfile"

    # The second pass generates a dummy entry for each header file.
    tr ' ' '
' < "$tmpdepfile" \
   | sed -e 's/^.*\.o://' -e 's/#.*$//' -e '/^$/ d' -e 's/$/:/' \
   >> "$depfile"
  else
    # The sourcefile does not contain any dependencies, so just
    # store a dummy comment line, to avoid errors with the Makefile
    # "include basename.Plo" scheme.
    echo "#dummy" > "$depfile"
  fi
  rm -f "$tmpdepfile"
  ;;

aix)
  # The C for AIX Compiler uses -M and outputs the dependencies
  # in a .u file.  This file always lives in the current directory.
  # Also, the AIX compiler puts `$object:' at the start of each line;
  # $object doesn't have directory information.
  stripped=`echo "$object" | sed -e 's,^.*/,,' -e 's/\(.*\)\..*$/\1/'`
  tmpdepfile="$stripped.u"
  outname="$stripped.o"
  if test "$libtool" = yes; then
    "$@" -Wc,-M
  else
    "$@" -M
  fi

  stat=$?
  if test $stat -eq 0; then :
  else
    rm -f "$tmpdepfile"
    exit $stat
  fi

  if test -f "$tmpdepfile"; then
    # Each line is of the form `foo.o: dependent.h'.
    # Do two passes, one to just change these to
    # `$object: dependent.h' and one to simply `dependent.h:'.
    sed -e "s,^$outname:,$object :," < "$tmpdepfile" > "$depfile"
    sed -e "s,^$outname: \(.*\)$,\1:," < "$tmpdepfile" >> "$depfile"
  else
    # The sourcefile does not contain any dependencies, so just
    # store a dummy comment line, to avoid errors with the Makefile
    # "include basename.Plo" scheme.
    echo "#dummy" > "$depfile"
  fi
  rm -f "$tmpdepfile"
  ;;

icc)
  # Must come before tru64.

  # Intel's C compiler understands `-MD -MF file'.  However
  #    icc -MD -MF foo.d -c -o sub/foo.o sub/foo.c
  # will fill foo.d with something like
  #    foo.o: sub/foo.c
  #    foo.o: sub/foo.h
  # which is wrong.  We want:
  #    sub/foo.o: sub/foo.c
  #    sub/foo.o: sub/foo.h
  #    sub/foo.c:
  #    sub/foo.h:

  "$@" -MD -MF "$tmpdepfile"
  stat=$?
  if test $stat -eq 0; then :
  else
    rm -f "$tmpdepfile"
    exit $stat
  fi
  rm -f "$depfile"
  # Each line is of the form `foo.o: dependent.h'.
  # Do two passes, one to just change these to
  # `$object: dependent.h' and one to simply `dependent.h:'.
  sed -e "s,^[^:]*:,$object :," < "$tmpdepfile" > "$depfile"
  sed -e "s,^[^:]*: \(.*\)$,\1:," < "$tmpdepfile" >> "$depfile"
  rm -f "$tmpdepfile"
  ;;

tru64)
   # The Tru64 compiler uses -MD to generate dependencies as a side
   # effect.  `cc -MD -o foo.o ...' puts the dependencies into `foo.o.d'.
   # At least on Alpha/Redhat 6.1, Compaq CCC V6.2-504 seems to put
   # dependencies in `foo.d' instead, so we check for that too.
   # Subdirectories are respected.
   dir=`echo "$object" | sed -e 's|/[^/]*$|/|'`
   test "x$dir" = "x$object" && dir=
   base=`echo "$object" | sed -e 's|^.*/||' -e 's/\.o$//' -e 's/\.lo$//'`

   if test "$libtool" = yes; then
      tmpdepfile1="$dir.libs/$base.lo.d"
      tmpdepfile2="$dir.libs/$base.d"
      "$@" -Wc,-MD
   else
      tmpdepfile1="$dir$base.o.d"
      tmpdepfile2="$dir$base.d"
      "$@" -MD
   fi

   stat=$?
   if test $stat -eq 0; then :
   else
      rm -f "$tmpdepfile1" "$tmpdepfile2"
      exit $stat
   fi

   if test -f "$tmpdepfile1"; then
      tmpdepfile="$tmpdepfile1"
   else
      tmpdepfile="$tmpdepfile2"
   fi
   if test -f "$tmpdepfile"; then
      sed -e "s,^.*\.[a-z]*:,$object:," < "$tmpdepfile" > "$depfile"
      # That's a space and a tab in the [].
      sed -e 's,^.*\.[a-z]*:[ 	]*,,' -e 's,$,:,' < "$tmpdepfile" >> "$depfile"
   else
      echo "#dummy" > "$depfile"
   fi
   rm -f "$tmpdepfile"
   ;;

#nosideeffect)
  # This comment above is used by automake to tell side-effect
  # dependency tracking mechanisms from slower ones.

dashmstdout)
  # Important note: in order to support this mode, a compiler *must*
  # always write the preprocessed file to stdout, regardless of -o.
  "$@" || exit $?

  # Remove the call to Libtool.
  if test "$libtool" = yes; then
    while test $1 != '--mode=compile'; do
      shift
    done
    shift
  fi

  # Remove `-o $object'.
  IFS=" "
  for arg
  do
    case $arg in
    -o)
      shift
      ;;
    $object)
      shift
      ;;
    *)
      set fnord "$@" "$arg"
      shift # fnord
      shift # $arg
      ;;
    esac
  done

  test -z "$dashmflag" && dashmflag=-M
  # Require at least two characters before searching for `:'
  # in the target name.  This is to cope with DOS-style filenames:
  # a dependency such as `c:/foo/bar' could be seen as target `c' otherwise.
  "$@" $dashmflag |
    sed 's:^[  ]*[^: ][^:][^:]*\:[    ]*:'"$object"'\: :' > "$tmpdepfile"
  rm -f "$depfile"
  cat < "$tmpdepfile" > "$depfile"
  tr ' ' '
' < "$tmpdepfile" | \
## Some versions of the HPUX 10.20 sed can't process this invocation
## correctly.  Breaking it into two sed invocations is a workaround.
    sed -e 's/^\\$//' -e '/^$/d' -e '/:$/d' | sed -e 's/$/ :/' >> "$depfile"
  rm -f "$tmpdepfile"
  ;;

dashXmstdout)
  # This case only exists to satisfy depend.m4.  It is never actually
  # run, as this mode is specially recognized in the preamble.
  exit 1
  ;;

makedepend)
  "$@" || exit $?
  # Remove any Libtool call
  if test "$libtool" = yes; then
    while test $1 != '--mode=compile'; do
      shift
    done
    shift
  fi
  # X makedepend
  shift
  cleared=no
  for arg in "$@"; do
    case $cleared in
    no)
      set ""; shift
      cleared=yes ;;
    esac
    case "$arg" in
    -D*|-I*)
      set fnord "$@" "$arg"; shift ;;
    # Strip any option that makedepend may not understand.  Remove
    # the object too, otherwise makedepend will parse it as a source file.
    -*|$object)
      ;;
    *)
      set fnord "$@" "$arg"; shift ;;
    esac
  done
  obj_suffix="`echo $object | sed 's/^.*\././'`"
  touch "$tmpdepfile"
  ${MAKEDEPEND-makedepend} -o"$obj_suffix" -f"$tmpdepfile" "$@"
  rm -f "$depfile"
  cat < "$tmpdepfile" > "$depfile"
  sed '1,2d' "$tmpdepfile" | tr ' ' '
' | \
## Some versions of the HPUX 10.20 sed can't process this invocation
## correctly.  Breaking it into two sed invocations is a workaround.
    sed -e 's/^\\$//' -e '/^$/d' -e '/:$/d' | sed -e 's/$/ :/' >> "$depfile"
  rm -f "$tmpdepfile" "$tmpdepfile".bak
  ;;

cpp)
  # Important note: in order to support this mode, a compiler *must*
  # always write the preprocessed file to stdout.
  "$@" || exit $?

  # Remove the call to Libtool.
  if test "$libtool" = yes; then
    while test $1 != '--mode=compile'; do
      shift
    done
    shift
  fi

  # Remove `-o $object'.
  IFS=" "
  for arg
  do
    case $arg in
    -o)
      shift
      ;;
    $object)
      shift
      ;;
    *)
      set fnord "$@" "$arg"
      shift # fnord
      shift # $arg
      ;;
    esac
  done

  "$@" -E |
    sed -n '/^# [0-9][0-9]* "\([^"]*\)".*/ s:: \1 \\:p' |
    sed '$ s: \\$::' > "$tmpdepfile"
  rm -f "$depfile"
  echo "$object : \\" > "$depfile"
  cat < "$tmpdepfile" >> "$depfile"
  sed < "$tmpdepfile" '/^$/d;s/^ //;s/ \\$//;s/$/ :/' >> "$depfile"
  rm -f "$tmpdepfile"
  ;;

msvisualcpp)
  # Important note: in order to support this mode, a compiler *must*
  # always write the preprocessed file to stdout, regardless of -o,
  # because we must use -o when running libtool.
  "$@" || exit $?
  IFS=" "
  for arg
  do
    case "$arg" in
    "-Gm"|"/Gm"|"-Gi"|"/Gi"|"-ZI"|"/ZI")
	set fnord "$@"
	shift
	shift
	;;
    *)
	set fnord "$@" "$arg"
	shift
	shift
	;;
    esac
  done
  "$@" -E |
  sed -n '/^#line [0-9][0-9]* "\([^"]*\)"/ s::echo "`cygpath -u \\"\1\\"`":p' | sort | uniq > "$tmpdepfile"
  rm -f "$depfile"
  echo "$object : \\" > "$depfile"
  . "$tmpdepfile" | sed 's% %\\ %g' | sed -n '/^\(.*\)$/ s::	\1 \\:p' >> "$depfile"
  echo "	" >> "$depfile"
  . "$tmpdepfile" | sed 's% %\\ %g' | sed -n '/^\(.*\)$/ s::\1\::p' >> "$depfile"
  rm -f "$tmpdepfile"
  ;;

none)
  exec "$@"
  ;;

*)
  echo "Unknown depmode $depmode" 1>&2
  exit 1
  ;;
esac

exit 0
���������swish-e-2.4.7/README.cvs����������������������������������������������������������������������������0000664�0000771�0001750�00000022547�11166010113�011502� �����������������������������������������������������������������������������������������������������������������������������������������������������������������������������0000000�0000000������������������������������������������������������������������������������������������������������������������������������������������������������������������������Using CVS
=========

Swish-e is available via cvs from SourceForge.  See http://swish-e.org for
links to SourceForge and instructions on downloading via cvs.

The buildswishe.pl script can be used to fetch the latest cvs source, along
with latest libxml2 and zlib sources, and build them all.

Otherwise, swish-e uses the familiar:

  ./configure --prefix=/path/to/install
  make
  make check
  make install

Only 'make install' might require root privileges, depending on the permissions
of /path/to/install.

Modifications to any of the build files (Makefile.am, configure.in) require
running the bootstrap script.  You will need a full set of developer tools
for these to work (autoconf, automake, libtool).
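
Before running bootstrap it can help to confirm those tools are on your $PATH.
A minimal sketch, assuming a POSIX shell; the helper name missing_tools is made
up for this example and is not part of swish-e:

```shell
#!/bin/sh
# Hypothetical helper (not part of swish-e): print each named tool that
# cannot be found on $PATH, one per line.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Report anything the bootstrap script is likely to need.
missing=`missing_tools autoconf automake libtool`
if test -n "$missing"; then
    echo "missing developer tools:" $missing 1>&2
fi
```

An empty result means all three were found; it does not check their versions.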


Building with HTML docs
=======================

The cvs version does not include the HTML documentation.  The HTML docs are
built from the swish_website module (also in cvs) using the same templates that
http://swish-e.org uses.  Links not pointing to the local documentation are
adjusted in the distribution documentation to point to http://swish-e.org.

The tools to build the HTML docs can also be fetched via cvs from SourceForge
(see the swish_website cvs module).

The swish_website module requires several Perl modules, including
Template-Toolkit, Pod::POM and a few others.  Once you have swish_website
downloaded and can build the site using the swish_website/bin/build script,
you can then proceed to building Swish-e with:

  ./configure --with-website=/path/to/swish_website
  make
  make check
  make install

If the program "build-swish-docs" is in your $PATH then you do not need
to specify --with-website above.  "build-swish-docs" can be a symlink
to the actual swish_website/bin/build script.
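
One way to set up that symlink is sketched below.  The function name
link_build_docs and both directory arguments are placeholders for this example;
substitute wherever your swish_website checkout and personal bin directory
actually live:

```shell
#!/bin/sh
# Sketch only: link_build_docs is a made-up helper, not part of swish-e.
#   $1 = path to the swish_website checkout
#   $2 = a directory that is on your $PATH
link_build_docs() {
    website=$1
    bindir=$2
    mkdir -p "$bindir" || return 1
    # -f replaces a stale link left over from an earlier checkout.
    ln -sf "$website/bin/build" "$bindir/build-swish-docs"
}

# Example call (paths are assumptions):
#   link_build_docs "$HOME/swish_website" "$HOME/bin"
```

With the link in place, ./configure can be run without --with-website.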


See the swish_website/README for more about the HTML documentation.


Daily Builds
============

The swish-daily.pl script can be used to build swish-e from cvs (or
from tarball) via cron.  In a top-level build directory a dated sub-directory is
created and the source is fetched, compiled and installed.  For example:

    swish-daily.pl \
        --topdir=$HOME/swish/swish_daily_build \
        --tardir=$HOME/swish/swish-daily

Which creates a directory structure like:

    $ ls -l swish_daily_build/
    latest_swish_build -> swish-e-2005-05-29
    swish-e-2005-05-26
    swish-e-2005-05-27
    swish-e-2005-05-28
    swish-e-2005-05-29

See the documentation in the swish-daily.pl script for more details.
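
The layout above amounts to one dated directory per run plus a symlink that
tracks the newest one.  The sketch below only imitates that layout to show the
idea; make_dated_build is an invented name, the relative link target is an
assumption, and swish-daily.pl itself does considerably more (fetching,
building, installing):

```shell
#!/bin/sh
# Illustration only: mimic the dated-directory layout described above.
# make_dated_build is a made-up name; swish-daily.pl does the real work.
make_dated_build() {
    topdir=$1
    stamp=$2                      # e.g. 2005-05-29
    mkdir -p "$topdir/swish-e-$stamp" || return 1
    # Repoint the "latest" link at the newest build (relative target).
    rm -f "$topdir/latest_swish_build"
    ln -s "swish-e-$stamp" "$topdir/latest_swish_build"
}

# Two consecutive "daily" runs in a scratch directory.
topdir=`mktemp -d`
make_dated_build "$topdir" 2005-05-28
make_dated_build "$topdir" 2005-05-29
ls -l "$topdir"
```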



Developer Notes:

Release and Tag
===============

There are two parts to making a release.  One is building the tarball
and the other is updating the website.  (Plus announcing the release,
of course.)

The tarball is built, and then tested, on your own machine.

Updating the site requires using the swish-daily.pl script to fetch the tarball,
build it and then update the website.  There's a script called Make_Release.sh that
automates this process.


Building the tarball
--------------------

1) Run "cvs -nq update" to make sure your copy is up-to-date.
   Watch carefully for merge errors.

2) Edit the pod/CHANGES.pod file to update the date of the release

3) Edit the version in configure.in

4) Run ./bootstrap to create a new Makefile.in with the new version string

5) Make sure you have the swish_website module available and can build
   the docs.  See above about building the docs.

6) Run make distcheck to make sure it all builds correctly.

    $ mkdir temp && cd temp
    $ ../swish-e/configure >/dev/null  (/dev/null is up to you)
    $ make distcheck > /dev/null

   That will create a tarball in the current directory.

7) It is a good idea to upload the tarball created by make distcheck someplace
   and then test it on a few platforms

8) Check in the updated version Makefiles:

  $ cvs ci

9) Tag the release.  For example:

    $ cvs tag rel-2-4-0

10) Upload to swish-e site -- see "Update Site" below for details before doing this.

Now move on to development.  Change version in configure.in (2.5.0 for example),
run ./bootstrap and check in.
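
The tag in step 9 mirrors the version number with the dots replaced by dashes.
A small sketch of that convention; release_tag is a name invented for this
example, and the convention is inferred from the rel-2-4-0 example above:

```shell
#!/bin/sh
# Hypothetical helper: turn a dotted version into the tag style used above.
release_tag() {
    echo "rel-$1" | sed 's/\./-/g'
}

release_tag 2.4.0    # prints rel-2-4-0
```

The result could then be passed to cvs, e.g. cvs tag `release_tag 2.4.0`.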


Update Site
-----------

Updating the site for a new release means updating the web site, such as the
main page to announce the new release, and to build the web site using the new
release so the on-line documentation is up to date.

This is always a bit more hands-on due to changes in the build system between
releases.

See swish_website/README for more details on building the web site, but
basically the swish_website script (bin/build) needs to know where to find the
source for both the release and the current development builds.  This is currently done by
two symlinks "swishsrc" and "develsrc".  develsrc points to the daily build directory.
The symlinks go in the swish_website directory:


For example:

    develsrc -> /home/bmoseley/swish/swish_daily_build/latest_swish_build/source

The daily builds are created with the swish-daily.pl script.  swish-daily.pl fetches the
source via cvs (with daily builds) or via a URL (for releases), unpacks into the "source"
directory and builds swish-e.  If all goes well a symlink is created to the
"latest_swish_build".

This same process can be used to build a release.  But, instead of fetching from cvs[1]
you fetch a tarball built above and placed in some location to fetch by URL.

The swish_website/Build_Release.sh script does this by running:

    #!/bin/sh

    DIR="${BASE_DIR:=$HOME/swish}"

    if test ! -n "$1"; then
        echo "Must specify URL to fetch"
        exit 1
    fi

    TAR_URL="$1"

    swish-daily.pl \
        --fetchtarurl="$TAR_URL" \
        --topdir=$DIR/swish_release_build \
        --noremove \
        --notimestamp \
        --verbose \
        --tardir=$DIR/swish-releases || exit 1;

So, run that script and pass a URL.  swish-daily.pl will fetch the tarball, attempt to
build and install swish-e, and then build a tarball and place it in the swish-releases
directory and update the latest.tar.gz link.  That basically makes the tarball available in
the download directory, but it cannot be seen until the website is updated.

For that you can run the swish_website/build.sh script, passing -a to tell it to build the
entire site.




Patching a previous version
---------------------------

Say development has started on the next release but a bug is found in the last.
Development has gone on too far to use the development version.

-- Creating a Branch in CVS --

The first thing that is needed is to create a branch of the cvs tree.  This
only needs to be done *once*, and there's more than one way to do it: you can
either branch based on a checked out version (e.g. first check out the tagged
version), or the branch can be done on the repository.

So, to just branch the repository:

  $ cvs -d :ext:<developer>@cvs.sourceforge.net:/cvsroot/swishe rtag -b -r rel-2-4-0 rel-2-4-0-patches swish-e

which says to create a branch called rel-2-4-0-patches branched off of the
tagged source rel-2-4-0.

[See the NOTE below before following this example]

-- Checkout branch --

Again, two ways to go here.  You can convert your existing local source tree
to the branch or simply do a new checkout.  I prefer to do a new checkout.

  $ cvs -d :ext:<moseley>@cvs.sourceforge.net:/cvsroot/swishe -z3 checkout -r rel-2-4-0-patches -d swish-e-2.4.0.patches swish-e

This specifies the Root again with -d; -z3 is compression.  We check out
rel-2-4-0-patches from the swish-e module, but instead of writing to
the "swish-e" directory we use the second -d to create a directory called
swish-e-2.4.0.patches.

-- Make changes --

Make changes as normal.  Check in as normal:

  $ cvs ci

and that will ONLY update this current branch.


-- Merging changes back to main branch --

Changes and fixes that are done on the branch will probably need to be moved
to the main branch.

To merge changes from another branch into your working copy (the main HEAD
branch, 2.5.x in this example):

  (in your 2.5.x working copy)
  $ cvs update -j rel-2-4-0-patches

That should update your sources.  Watch out for conflicts!

Note that this is a bit tricky, because this will likely change the version
number in configure.in.

Verify version in configure.in (might have been changed) then don't forget to
check the new patches into the HEAD branch.

  $ cvs ci -m "Merged patches from 2-4-0-patches"

Now, once merged go to your patches directory (swish-e-2.4.0.patches in this
example) and *tag* it.  That will allow merging later updates into the HEAD
branch without re-merging what was just merged.

  (in the .patches directory)
  $ cvs tag rel-2-4-1

that just tags the rel-2-4-0-patches branch.

Then when 2.4.2 is later created, just the changes from 2.4.1 to 2.4.2 can be
merged:

  (in the main branch)
  $ cvs update -j rel-2-4-1 -j rel-2-4-0-patches

which says merge from the tag rel-2-4-1 to the end of the branch
rel-2-4-0-patches into the current directory.

At least I think that's how it works.


NOTE: The example for naming the branch may not be best.  In this example we
are fixing a bug in 2.4.0 so will likely release a 2.4.1.  Development is now
in 2.5.x.  So it may be better to create the branch as rel-2-4-patches (instead
of rel-2-4-0-patches) and then all 2.4.x updates will get added there.
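
Under the naming suggested in this NOTE, the branch name depends only on the
major and minor version.  A sketch of that mapping; patch_branch is a
hypothetical helper and the scheme is only the suggestion made above, not an
established project rule:

```shell
#!/bin/sh
# Hypothetical helper: derive the per-series patch branch name suggested in
# the NOTE, so 2.4.0 and 2.4.1 both map to rel-2-4-patches.
patch_branch() {
    echo "$1" | sed -e 's/^\([0-9]*\.[0-9]*\).*$/\1/' \
                    -e 's/\./-/g' \
                    -e 's/^/rel-/' \
                    -e 's/$/-patches/'
}

patch_branch 2.4.1    # prints rel-2-4-patches
```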


Developer Hints
===============

There's a bunch of debugging techniques, but I do this:

$ ./configure --prefix=$HOME/swt --enable-memtrace CFLAGS='-O0 -g'
$ make install

Then I cd to src and create a test config

$ gdb ~/swt/bin/swish-e
run -c c -i test.doc

Make changes to the source in another window; running "make install" from
within gdb will then rebuild without having to exit.
���������������������������������������������������������������������������������������������������������������������������������������������������������swish-e-2.4.7/filter-bin/���������������������������������������������������������������������������0000777�0000771�0001750�00000000000�11166013170�012141� 5����������������������������������������������������������������������������������������������������������������������������������������������������������������������������0000000�0000000������������������������������������������������������������������������������������������������������������������������������������������������������������������������swish-e-2.4.7/filter-bin/Makefile.am����������������������������������������������������������������0000664�0000771�0001750�00000001236�11166010111�014105� �����������������������������������������������������������������������������������������������������������������������������������������������������������������������������0000000�0000000������������������������������������������������������������������������������������������������������������������������������������������������������������������������exampledir = $(datadir)/doc/$(PACKAGE)/examples/filter-bin

example_DATA = \
    README \
    swish_filter.pl \
    _binfilter.sh \
    _pdf2html.pl

CLEANFILES = swish_filter.pl


# This is done here to stay in the GNU coding standards
# libexecdir can be modified at make time, so can't use
# variable substitution at configure time

swish_filter.pl: swish_filter.pl.in
	@rm -f swish_filter.pl
	@sed \
		-e 's,@@perlmoduledir@@,$(libexecdir)/perl,' \
		-e 's,@@swishbindir@@,$(bindir),' \
		-e 's,@@perlbinary@@,$(PERL),' \
		$(srcdir)/swish_filter.pl.in > swish_filter.pl


EXTRA_DIST = \
    README \
    swish_filter.pl.in \
    _binfilter.sh \
    _pdf2html.pl



������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������swish-e-2.4.7/filter-bin/swish_filter.pl.in���������������������������������������������������������0000775�0000771�0001750�00000003527�11166010111�015525� �����������������������������������������������������������������������������������������������������������������������������������������������������������������������������0000000�0000000������������������������������������������������������������������������������������������������������������������������������������������������������������������������#!@@perlbinary@@ -w
use strict;

# This is set to where Swish-e's "make install" installed the helper modules.
use lib ( '@@perlmoduledir@@' );


use SWISH::Filter;


=pod

This is an example of how to use the SWISH::Filter module to filter
documents using Swish-e's C<FileFilter> feature.  This will filter any
number of document types, depending on what filter modules are installed.

This program should typically only be used for the -S fs indexing method. 
For -S http the F<swishspider> program calls SWISH::Filter directly.  And -S
prog programs written in Perl can also make use of SWISH::Filter directly.

In general, you will not want to filter with this program if you have a lot
of files to filter.  Running a perl program for many documents will be slow
(due to the compilation of the perl program).  If you have many documents
to convert with the -S fs method of indexing then consider using -S prog
with F<prog-bin/DirTree.pl> and use the SWISH::Filter module (see
F<filters/README>).

Swish-e configuration:

    FileFilter .pdf /path/to/swish_filter.pl
    FileFilter .doc /path/to/swish_filter.pl
    FileFilter .mp3 /path/to/swish_filter.pl
    IndexContents HTML2 .pdf .mp3
    IndexContents TXT2 .doc

Then, when indexing those types of documents, this program will attempt to
filter (convert) them into a text format.

See SWISH-CONFIG documentation on Filtering for more information.

=cut    


    my ( $work_path, $real_path ) = @ARGV;
    my $filter = SWISH::Filter->new;

    my $filtered = $filter->filter(
        document => $work_path,
        name     => $real_path,
        content_type => \$real_path, # use the real path to lookup the content type
    );

    print STDERR $filtered ? " - Filtered: $real_path\n" : " - Not filtered: $real_path ($work_path)\n";

    print $filtered
        ? ${$filter->fetch_doc}
        : $real_path;

    



swish-e-2.4.7/filter-bin/_binfilter.sh0000775000077100017500000000055211166010111014525 00000000000000#!/bin/sh

# Example of a shell script filter

# Simple filter for binary files (eg: exe files) 

# Note: This is just an example of a shell script.  In general you would not
# use a shell script to just call a program -- rather call the program directly from
# swish using a FileFilter command.
# e.g. FileFilter .exe strings "'%p'"

strings "$1" - 2>/dev/null
swish-e-2.4.7/filter-bin/README0000664000077100017500000000125111166010111012726 00000000000000Here are some very basic filter scripts that could be used with the
FileFilter configuration directive.

In general, Swish-e's FileFilter feature is best used with binary programs
to avoid the time spent parsing and compiling a script for each document
that needs to be filtered.

    _binfilter.sh
        This filter uses the strings(1) command to extract
        text out of binary files.
        

    _pdf2html.pl
        This filter uses the xpdf package's pdfinfo and pdftotext programs
        to convert a PDF file into HTML.


    swish_filter.pl
        Uses the SWISH::Filter module to filter a number of different types
        of documents with the same script.        
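
As a rough illustration of the convention these scripts follow, the sketch
below reads the file named by its first argument and writes the extracted
text to standard output, much like _binfilter.sh. It uses tr and awk rather
than strings(1) so that it runs on minimal systems. The function name
extract_text and the minimum run length of 4 are assumptions of this
example, not part of Swish-e.

```shell
#!/bin/sh
# Hypothetical filter sketch: print runs of printable characters found in a file.

extract_text() {
    # $1 = path of the file to filter, $2 = minimum run length (default 4)
    min=${2:-4}
    # Turn every non-printable byte into a newline, then keep only the
    # lines (runs of printable characters) that are long enough.
    LC_ALL=C tr -c '[:print:]' '\n' < "$1" |
        awk -v min="$min" 'length($0) >= min'
}

# When run as a script with a file argument, filter that file to stdout.
if [ -n "${1:-}" ] && [ -f "$1" ]; then
    extract_text "$1"
fi
```

A script like this could then be named in a FileFilter directive in the
same way as the examples above, e.g. "FileFilter .exe /path/to/script",
where the script path is a placeholder.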





swish-e-2.4.7/filter-bin/_pdf2html.pl0000664000077100017500000000441611166010111014270 00000000000000#! /usr/bin/perl -w
use strict;

# -- Filter PDF to simple HTML for swish
# --
# -- 2000-05  rasc
#
=pod

This filter requires two programs "pdfinfo" and "pdftotext".
These programs are part of the xpdf package found at
http://www.foolabs.com/xpdf/xpdf.html.

These programs must be found in the PATH when indexing is run.
Alternatively, set the path explicitly in this program:

  $ENV{PATH} = '/path/to/programs'

"pdfinfo" extracts the document info from a pdf file, if any exist,
and creates metanames for swish to index.  See man pdfinfo(1) for
information on which keywords are available.

An HTML title is created from the "title" and "subject" pdf info data.
Adjust as needed below.

How the extracted keyword info is indexed in Swish-e is controlled by
the following Swish-e configuration settings: MetaNames, PropertyNames,
UndefinedMetaTags.

Passing the -raw option to pdftotext may improve indexing time by
reducing the size of the converted output.

=cut


my $file = shift || die "Usage: $0 <filename>\n";

#
# -- read pdf meta information
#

my %metadata;

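# Use "or" rather than "||" here: "||" would bind to the command string,
# so a failed open would never reach die.
open( F, "pdfinfo $file |" )
    or die "$0: Failed to run pdfinfo on $file: $!";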

while (<F>) {
    if ( /^\s*([^:]+):\s+(.+)$/ ) {
        my ( $metaname, $value ) = ( lc( $1 ), escapeHTML( $2 ) );
        $metaname =~ tr/ /_/;
        $metadata{$metaname} = $value;
    }
}
close F or die "$0: Failed close on pipe to pdfinfo for $file: $?";


# Set the default title from the title and subject info

my @title = grep { $_ } @metadata{ qw/title subject/ };
delete $metadata{$_} for qw/title subject/;


my $title = join ' // ', ( @title ? @title : 'Unknown title' );

my $metadata = 
    join "\n",
        map { qq[<meta name="$_" content="$metadata{$_}">] }
                   sort keys %metadata;


print <<EOF;
<html>
<head>
    <title>
        $title
    </title>
    $metadata
</head>
<body>
EOF

# Might be faster to use sysread and read in larger blocks

open F, "pdftotext $file - |" or die "$0: failed to run pdftotext: $!";
print escapeHTML($_) while ( <F> );
close F or die "$0: Failed close on pipe to pdftotext for $file: $?";

print "</body></html>\n";


# How are URLs printed with pdftotext?
sub escapeHTML {

   my $str = shift;

   for ( $str ) {
       s/&/&amp;/go;
       s/</&lt;/go;
       s/>/&gt;/go;
       s/"/&quot;/go;
    }
   return $str;
}

swish-e-2.4.7/filter-bin/Makefile.in0000664000077100017500000002600011166010111014112 00000000000000# Makefile.in generated by automake 1.9.6 from Makefile.am.
# @configure_input@

# Copyright (C) 1994, 1995, 1996, 1997, 1998, 1999, 2000, 2001, 2002,
# 2003, 2004, 2005  Free Software Foundation, Inc.
# This Makefile.in is free software; the Free Software Foundation
# gives unlimited permission to copy and/or distribute it,
# with or without modifications, as long as this notice is preserved.

# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY, to the extent permitted by law; without
# even the implied warranty of MERCHANTABILITY or FITNESS FOR A
# PARTICULAR PURPOSE.

@SET_MAKE@

srcdir = @srcdir@
top_srcdir = @top_srcdir@
VPATH = @srcdir@
pkgdatadir = $(datadir)/@PACKAGE@
pkglibdir = $(libdir)/@PACKAGE@
pkgincludedir = $(includedir)/@PACKAGE@
top_builddir = ..
am__cd = CDPATH="$${ZSH_VERSION+.}$(PATH_SEPARATOR)" && cd
INSTALL = @INSTALL@
install_sh_DATA = $(install_sh) -c -m 644
install_sh_PROGRAM = $(install_sh) -c
install_sh_SCRIPT = $(install_sh) -c
INSTALL_HEADER = $(INSTALL_DATA)
transform = $(program_transform_name)
NORMAL_INSTALL = :
PRE_INSTALL = :
POST_INSTALL = :
NORMAL_UNINSTALL = :
PRE_UNINSTALL = :
POST_UNINSTALL = :
build_triplet = @build@
host_triplet = @host@
subdir = filter-bin
DIST_COMMON = README $(srcdir)/Makefile.am $(srcdir)/Makefile.in
ACLOCAL_M4 = $(top_srcdir)/aclocal.m4
am__aclocal_m4_deps = $(top_srcdir)/config/acinclude.m4 \
	$(top_srcdir)/configure.in
am__configure_deps = $(am__aclocal_m4_deps) $(CONFIGURE_DEPENDENCIES) \
	$(ACLOCAL_M4)
mkinstalldirs = $(SHELL) $(top_srcdir)/config/mkinstalldirs
CONFIG_HEADER = $(top_builddir)/src/acconfig.h
CONFIG_CLEAN_FILES =
SOURCES =
DIST_SOURCES =
am__vpath_adj_setup = srcdirstrip=`echo "$(srcdir)" | sed 's|.|.|g'`;
am__vpath_adj = case $$p in \
    $(srcdir)/*) f=`echo "$$p" | sed "s|^$$srcdirstrip/||"`;; \
    *) f=$$p;; \
  esac;
am__strip_dir = `echo $$p | sed -e 's|^.*/||'`;
am__installdirs = "$(DESTDIR)$(exampledir)"
exampleDATA_INSTALL = $(INSTALL_DATA)
DATA = $(example_DATA)
DISTFILES = $(DIST_COMMON) $(DIST_SOURCES) $(TEXINFOS) $(EXTRA_DIST)
ACLOCAL = @ACLOCAL@
ALLOCA = @ALLOCA@
AMDEP_FALSE = @AMDEP_FALSE@
AMDEP_TRUE = @AMDEP_TRUE@
AMTAR = @AMTAR@
AR = @AR@
AS = @AS@
AUTOCONF = @AUTOCONF@
AUTOHEADER = @AUTOHEADER@
AUTOMAKE = @AUTOMAKE@
AWK = @AWK@
BTREE_OBJS = @BTREE_OBJS@
BUILDDOCS_FALSE = @BUILDDOCS_FALSE@
BUILDDOCS_TRUE = @BUILDDOCS_TRUE@
CC = @CC@
CCDEPMODE = @CCDEPMODE@
CFLAGS = @CFLAGS@
CPP = @CPP@
CPPFLAGS = @CPPFLAGS@
CXX = @CXX@
CXXCPP = @CXXCPP@
CXXDEPMODE = @CXXDEPMODE@
CXXFLAGS = @CXXFLAGS@
CYGPATH_W = @CYGPATH_W@
DEFS = @DEFS@
DEPDIR = @DEPDIR@
DLLTOOL = @DLLTOOL@
ECHO = @ECHO@
ECHO_C = @ECHO_C@
ECHO_N = @ECHO_N@
ECHO_T = @ECHO_T@
EGREP = @EGREP@
EXEEXT = @EXEEXT@
F77 = @F77@
FFLAGS = @FFLAGS@
INSTALLDOCS_FALSE = @INSTALLDOCS_FALSE@
INSTALLDOCS_TRUE = @INSTALLDOCS_TRUE@
INSTALL_DATA = @INSTALL_DATA@
INSTALL_PROGRAM = @INSTALL_PROGRAM@
INSTALL_SCRIPT = @INSTALL_SCRIPT@
INSTALL_STRIP_PROGRAM = @INSTALL_STRIP_PROGRAM@
LARGEFILES_MACROS = @LARGEFILES_MACROS@
LDFLAGS = @LDFLAGS@
LIBOBJS = @LIBOBJS@
LIBS = @LIBS@
LIBTOOL = @LIBTOOL@
LIBXML2_CFLAGS = @LIBXML2_CFLAGS@
LIBXML2_LIB = @LIBXML2_LIB@
LIBXML2_OBJS = @LIBXML2_OBJS@
LIBXML_REQUIRED_VERSION = @LIBXML_REQUIRED_VERSION@
LN_S = @LN_S@
LTLIBOBJS = @LTLIBOBJS@
MAINT = @MAINT@
MAINTAINER_MODE_FALSE = @MAINTAINER_MODE_FALSE@
MAINTAINER_MODE_TRUE = @MAINTAINER_MODE_TRUE@
MAKEINFO = @MAKEINFO@
OBJDUMP = @OBJDUMP@
OBJEXT = @OBJEXT@
PACKAGE = @PACKAGE@
PACKAGE_BUGREPORT = @PACKAGE_BUGREPORT@
PACKAGE_NAME = @PACKAGE_NAME@
PACKAGE_STRING = @PACKAGE_STRING@
PACKAGE_TARNAME = @PACKAGE_TARNAME@
PACKAGE_VERSION = @PACKAGE_VERSION@
PATH_SEPARATOR = @PATH_SEPARATOR@
PCRE_CFLAGS = @PCRE_CFLAGS@
PCRE_CONFIG = @PCRE_CONFIG@
PCRE_LIBS = @PCRE_LIBS@
PCRE_REQUIRED_VERSION = @PCRE_REQUIRED_VERSION@
PERL = @PERL@
POD2MAN = @POD2MAN@
RANLIB = @RANLIB@
SET_MAKE = @SET_MAKE@
SHELL = @SHELL@
STRIP = @STRIP@
SWISH_WEB = @SWISH_WEB@
VERSION = @VERSION@
XML2_CONFIG = @XML2_CONFIG@
Z_CFLAGS = @Z_CFLAGS@
Z_LIBS = @Z_LIBS@
ac_ct_AR = @ac_ct_AR@
ac_ct_AS = @ac_ct_AS@
ac_ct_CC = @ac_ct_CC@
ac_ct_CXX = @ac_ct_CXX@
ac_ct_DLLTOOL = @ac_ct_DLLTOOL@
ac_ct_F77 = @ac_ct_F77@
ac_ct_OBJDUMP = @ac_ct_OBJDUMP@
ac_ct_RANLIB = @ac_ct_RANLIB@
ac_ct_STRIP = @ac_ct_STRIP@
am__fastdepCC_FALSE = @am__fastdepCC_FALSE@
am__fastdepCC_TRUE = @am__fastdepCC_TRUE@
am__fastdepCXX_FALSE = @am__fastdepCXX_FALSE@
am__fastdepCXX_TRUE = @am__fastdepCXX_TRUE@
am__include = @am__include@
am__leading_dot = @am__leading_dot@
am__quote = @am__quote@
am__tar = @am__tar@
am__untar = @am__untar@
bindir = @bindir@
build = @build@
build_alias = @build_alias@
build_cpu = @build_cpu@
build_os = @build_os@
build_vendor = @build_vendor@
datadir = @datadir@
exec_prefix = @exec_prefix@
host = @host@
host_alias = @host_alias@
host_cpu = @host_cpu@
host_os = @host_os@
host_vendor = @host_vendor@
includedir = @includedir@
infodir = @infodir@
install_sh = @install_sh@
libdir = @libdir@
libexecdir = @libexecdir@
localstatedir = @localstatedir@
mandir = @mandir@
mkdir_p = @mkdir_p@
oldincludedir = @oldincludedir@
prefix = @prefix@
program_transform_name = @program_transform_name@
sbindir = @sbindir@
sharedstatedir = @sharedstatedir@
sysconfdir = @sysconfdir@
target_alias = @target_alias@
exampledir = $(datadir)/doc/$(PACKAGE)/examples/filter-bin
example_DATA = \
    README \
    swish_filter.pl \
    _binfilter.sh \
    _pdf2html.pl

CLEANFILES = swish_filter.pl
EXTRA_DIST = \
    README \
    swish_filter.pl.in \
    _binfilter.sh \
    _pdf2html.pl

all: all-am

.SUFFIXES:
$(srcdir)/Makefile.in: @MAINTAINER_MODE_TRUE@ $(srcdir)/Makefile.am  $(am__configure_deps)
	@for dep in $?; do \
	  case '$(am__configure_deps)' in \
	    *$$dep*) \
	      cd $(top_builddir) && $(MAKE) $(AM_MAKEFLAGS) am--refresh \
		&& exit 0; \
	      exit 1;; \
	  esac; \
	done; \
	echo ' cd $(top_srcdir) && $(AUTOMAKE) --foreign  filter-bin/Makefile'; \
	cd $(top_srcdir) && \
	  $(AUTOMAKE) --foreign  filter-bin/Makefile
.PRECIOUS: Makefile
Makefile: $(srcdir)/Makefile.in $(top_builddir)/config.status
	@case '$?' in \
	  *config.status*) \
	    cd $(top_builddir) && $(MAKE) $(AM_MAKEFLAGS) am--refresh;; \
	  *) \
	    echo ' cd $(top_builddir) && $(SHELL) ./config.status $(subdir)/$@ $(am__depfiles_maybe)'; \
	    cd $(top_builddir) && $(SHELL) ./config.status $(subdir)/$@ $(am__depfiles_maybe);; \
	esac;

$(top_builddir)/config.status: $(top_srcdir)/configure $(CONFIG_STATUS_DEPENDENCIES)
	cd $(top_builddir) && $(MAKE) $(AM_MAKEFLAGS) am--refresh

$(top_srcdir)/configure: @MAINTAINER_MODE_TRUE@ $(am__configure_deps)
	cd $(top_builddir) && $(MAKE) $(AM_MAKEFLAGS) am--refresh
$(ACLOCAL_M4): @MAINTAINER_MODE_TRUE@ $(am__aclocal_m4_deps)
	cd $(top_builddir) && $(MAKE) $(AM_MAKEFLAGS) am--refresh

mostlyclean-libtool:
	-rm -f *.lo

clean-libtool:
	-rm -rf .libs _libs

distclean-libtool:
	-rm -f libtool
uninstall-info-am:
install-exampleDATA: $(example_DATA)
	@$(NORMAL_INSTALL)
	test -z "$(exampledir)" || $(mkdir_p) "$(DESTDIR)$(exampledir)"
	@list='$(example_DATA)'; for p in $$list; do \
	  if test -f "$$p"; then d=; else d="$(srcdir)/"; fi; \
	  f=$(am__strip_dir) \
	  echo " $(exampleDATA_INSTALL) '$$d$$p' '$(DESTDIR)$(exampledir)/$$f'"; \
	  $(exampleDATA_INSTALL) "$$d$$p" "$(DESTDIR)$(exampledir)/$$f"; \
	done

uninstall-exampleDATA:
	@$(NORMAL_UNINSTALL)
	@list='$(example_DATA)'; for p in $$list; do \
	  f=$(am__strip_dir) \
	  echo " rm -f '$(DESTDIR)$(exampledir)/$$f'"; \
	  rm -f "$(DESTDIR)$(exampledir)/$$f"; \
	done
tags: TAGS
TAGS:

ctags: CTAGS
CTAGS:


distdir: $(DISTFILES)
	@srcdirstrip=`echo "$(srcdir)" | sed 's|.|.|g'`; \
	topsrcdirstrip=`echo "$(top_srcdir)" | sed 's|.|.|g'`; \
	list='$(DISTFILES)'; for file in $$list; do \
	  case $$file in \
	    $(srcdir)/*) file=`echo "$$file" | sed "s|^$$srcdirstrip/||"`;; \
	    $(top_srcdir)/*) file=`echo "$$file" | sed "s|^$$topsrcdirstrip/|$(top_builddir)/|"`;; \
	  esac; \
	  if test -f $$file || test -d $$file; then d=.; else d=$(srcdir); fi; \
	  dir=`echo "$$file" | sed -e 's,/[^/]*$$,,'`; \
	  if test "$$dir" != "$$file" && test "$$dir" != "."; then \
	    dir="/$$dir"; \
	    $(mkdir_p) "$(distdir)$$dir"; \
	  else \
	    dir=''; \
	  fi; \
	  if test -d $$d/$$file; then \
	    if test -d $(srcdir)/$$file && test $$d != $(srcdir); then \
	      cp -pR $(srcdir)/$$file $(distdir)$$dir || exit 1; \
	    fi; \
	    cp -pR $$d/$$file $(distdir)$$dir || exit 1; \
	  else \
	    test -f $(distdir)/$$file \
	    || cp -p $$d/$$file $(distdir)/$$file \
	    || exit 1; \
	  fi; \
	done
check-am: all-am
check: check-am
all-am: Makefile $(DATA)
installdirs:
	for dir in "$(DESTDIR)$(exampledir)"; do \
	  test -z "$$dir" || $(mkdir_p) "$$dir"; \
	done
install: install-am
install-exec: install-exec-am
install-data: install-data-am
uninstall: uninstall-am

install-am: all-am
	@$(MAKE) $(AM_MAKEFLAGS) install-exec-am install-data-am

installcheck: installcheck-am
install-strip:
	$(MAKE) $(AM_MAKEFLAGS) INSTALL_PROGRAM="$(INSTALL_STRIP_PROGRAM)" \
	  install_sh_PROGRAM="$(INSTALL_STRIP_PROGRAM)" INSTALL_STRIP_FLAG=-s \
	  `test -z '$(STRIP)' || \
	    echo "INSTALL_PROGRAM_ENV=STRIPPROG='$(STRIP)'"` install
mostlyclean-generic:

clean-generic:
	-test -z "$(CLEANFILES)" || rm -f $(CLEANFILES)

distclean-generic:
	-test -z "$(CONFIG_CLEAN_FILES)" || rm -f $(CONFIG_CLEAN_FILES)

maintainer-clean-generic:
	@echo "This command is intended for maintainers to use"
	@echo "it deletes files that may require special tools to rebuild."
clean: clean-am

clean-am: clean-generic clean-libtool mostlyclean-am

distclean: distclean-am
	-rm -f Makefile
distclean-am: clean-am distclean-generic distclean-libtool

dvi: dvi-am

dvi-am:

html: html-am

info: info-am

info-am:

install-data-am: install-exampleDATA

install-exec-am:

install-info: install-info-am

install-man:

installcheck-am:

maintainer-clean: maintainer-clean-am
	-rm -f Makefile
maintainer-clean-am: distclean-am maintainer-clean-generic

mostlyclean: mostlyclean-am

mostlyclean-am: mostlyclean-generic mostlyclean-libtool

pdf: pdf-am

pdf-am:

ps: ps-am

ps-am:

uninstall-am: uninstall-exampleDATA uninstall-info-am

.PHONY: all all-am check check-am clean clean-generic clean-libtool \
	distclean distclean-generic distclean-libtool distdir dvi \
	dvi-am html html-am info info-am install install-am \
	install-data install-data-am install-exampleDATA install-exec \
	install-exec-am install-info install-info-am install-man \
	install-strip installcheck installcheck-am installdirs \
	maintainer-clean maintainer-clean-generic mostlyclean \
	mostlyclean-generic mostlyclean-libtool pdf pdf-am ps ps-am \
	uninstall uninstall-am uninstall-exampleDATA uninstall-info-am


# This is done here to stay in the GNU coding standards
# libexecdir can be modified at make time, so can't use
# variable substitution at configure time

swish_filter.pl: swish_filter.pl.in
	@rm -f swish_filter.pl
	@sed \
		-e 's,@@perlmoduledir@@,$(libexecdir)/perl,' \
		-e 's,@@swishbindir@@,$(bindir),' \
		-e 's,@@perlbinary@@,$(PERL),' \
		$(srcdir)/swish_filter.pl.in > swish_filter.pl
# Tell versions [3.59,3.63) of GNU make to not export all variables.
# Otherwise a system limit (for SysV at least) may be exceeded.
.NOEXPORT:
swish-e-2.4.7/README0000664000077100017500000002102111166010454010702 00000000000000NAME
    The Swish-e README File

Upgrading?
    If you are upgrading Swish-e, please review the CHANGES file before
    installation. The index format may change and existing indexes may need
    to be re-created before use.

OVERVIEW
    Swish-e is Simple Web Indexing System for Humans - Enhanced. Swish-e can
    quickly and easily index directories of files or remote web sites and
    search the generated indexes.

    Swish-e is extremely fast in both indexing and searching, highly
    configurable, and can be seamlessly integrated with existing web sites
    to maintain a consistent design. Swish-e can index web pages, but can
    just as easily index text files, mailing list archives, or data stored
    in a relational database.

    Swish-e is designed to index small- to medium-sized collections of
    documents. Although a few users index over a million documents,
    typical usage is more often in the tens of thousands. Currently, Swish-e
    only indexes eight-bit character encodings.

    Swish-e version 2.2 was a major rewrite of the code and added many new
    features. Memory requirements for indexing were reduced and indexing
    speed improved significantly over previous versions. New
    features allow more control over indexing, better document parsing,
    improved indexing and searching logic, better filter code, and the
    ability to index from any data source.

    Swish-e version 2.4 includes a major rewrite of the C API and a new Perl
    module for accessing the Swish-e C library. In addition, Swish-e 2.4
    uses the GNU Auto Tools. The significant changes are where files are
    installed, and the use of Libtool to create the Swish-e library as a
    shared library on many platforms. Basically, installation is easier than
    in previous versions, and more files are installed in "standard" locations
    (e.g. documentation is installed in "$prefix/share/doc/swish-e").

    Note: Due to the new build and installation system in Swish-e 2.4, some
    documentation may incorrectly list the location of files. Please report
    any documentation errors to the Swish-e Discussion list.

    Swish-e is not a "turn-key" indexing and searching solution. The Swish-e
    distribution contains most of the parts to create such a system, but you
    need to put the parts together as best meets your needs. This gives you
    the power to index and search your documents the way you wish and to
    seamlessly integrate a search engine into your web site or application.

    To use Swish-e, you will need to configure Swish-e to index your
    documents, create an index by running Swish-e, and set up an interface
    such as a CGI script (a script is included) to search the index and
    display results. Swish uses helper programs to index documents of types
    that Swish-e cannot natively index. These programs may need to be
    installed separately from Swish-e.
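
    The configure, index and search steps above might look like this from
    a shell. This is a minimal sketch: every path is a placeholder, the
    config uses only the IndexDir, IndexOnly and IndexFile directives, and
    the swish-e commands are run only if a swish-e binary is found on the
    PATH. The function name write_config is an invention of this example.

```shell
#!/bin/sh
# Sketch of configure -> index -> search. All paths here are placeholders.

write_config() {
    # $1 = config file to create, $2 = directory to index, $3 = index file
    cat > "$1" <<EOF
# Index only HTML files below $2
IndexDir $2
IndexOnly .htm .html
IndexFile $3
EOF
}

workdir=${TMPDIR:-/tmp}/swish-demo.$$
mkdir -p "$workdir/docs"
write_config "$workdir/demo.config" "$workdir/docs" "$workdir/demo.index"

if command -v swish-e >/dev/null 2>&1; then
    # Build the index from the config file, then search it for one word.
    swish-e -c "$workdir/demo.config" &&
        swish-e -f "$workdir/demo.index" -w example
else
    echo "swish-e not found; wrote $workdir/demo.config only"
fi
```

    A search interface such as the included CGI script would take the
    place of the final search command.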

    Swish-e is an Open Source (see: http://opensource.org ) program
    supported by developers and a large group of users. Please take time to
    join the Swish-e discussion list at http://Swish-e.org .

  Key features
    *   Quickly index a large number of documents in different formats
        including text, HTML, and XML.

    *   Use "filters" to index other types of files such as PDF, gzip, or
        PostScript.

    *   Includes a web spider for indexing remote documents over HTTP.
        Follows Robots Exclusion Rules (including META tags).

    *   Can use an external program to supply documents to Swish-e, such as
        an advanced spider for your web server or a program to read and
        format records from a relational database.

    *   Document "properties" (some subset of the source document, usually
        defined as META or XML elements) may be stored in the index and
        returned with search results.

    *   Document summaries can be returned with each search.

    *   Word stemming, soundex, metaphone, and double-metaphone indexing for
        "fuzzy" searching

    *   Phrase searching and wildcard searching

    *   Limit searches to HTML links.

    *   Use powerful Regular Expressions to select documents for indexing or
        exclusion.

    *   Easily limit searches to parts or all of your web site.

    *   Results can be sorted by relevance or by any number of properties in
        ascending or descending order.

    *   Limit searches to parts of documents such as certain HTML tags
        (META, TITLE, comments, etc.) or to XML elements.

    *   Can report structural errors in your XML and HTML documents.

    *   Index file is portable between platforms.

    *   A Swish-e library is provided to allow embedding Swish-e into your
        applications for very fast searching. A Perl module is available
        that provides a standard API for accessing Swish-e.

    *   Includes example search script with context summaries and search
        term and phrase highlighting. Can be used with popular Perl
        templating systems.

    *   Swish-e is fast.

    *   It's Open Source and FREE! You can customize Swish-e and you can
        contribute your fancy new features to the project.

    *   Supported by on-line user and developer groups.

Where do I get Swish-e?
    The current version of Swish-e can be found at:

    http://Swish-e.org

    Please make sure you use a current version of Swish-e.

    Information about Windows binary distributions can also be found at this
    site.

How Do I Install Swish-e?
    Read the INSTALL page.

    Building from source is recommended. On most platforms, Swish-e should
    build without problems. A list of platforms where Swish-e has been built
    can be found in the INSTALL page. Information on building for VMS and
    Win32 can be found in sub-directories of the "src" directory. Check the
    Swish-e site for information about binary distributions (such as for
    Windows).
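
    On a typical Unix-like system a source build follows the usual GNU
    Auto Tools sequence. The sketch below only prints those commands for a
    chosen prefix instead of running them; the INSTALL page remains the
    authority on options and platform notes. The function name build_steps
    and the PREFIX variable are assumptions of this example.

```shell
#!/bin/sh
# Sketch: list the usual GNU Auto Tools build steps for a given prefix.

build_steps() {
    # $1 = installation prefix (defaults to /usr/local)
    prefix=${1:-/usr/local}
    printf '%s\n' \
        "./configure --prefix=$prefix" \
        "make" \
        "make install"
}

# Print the commands rather than running them, so nothing is installed
# by accident.
build_steps "${PREFIX:-/usr/local}"
```

    With the default prefix, the documentation would then be expected
    under /usr/local/share/doc/swish-e.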

    In addition to the INSTALL page, make sure you read the SWISH-FAQ page
    if you have any questions, or to get an idea of questions that you might
    someday ask.

    Problems or questions about installing Swish-e should be directed to the
    Swish-e discussion list (see the Swish-e web site at
    http://Swish-e.org).

    Please read "Where do I get help with Swish-e?" below before posting any
    questions to the Swish-e list.

The Swish-e Documentation
    Documentation is provided as HTML pages installed in
    $prefix/share/doc/swish-e where $prefix is /usr/local if building from
    source, or /usr if installed as part of a package from your OS vendor.
    Under Windows $prefix is selected at installation time.

    A subset of the documentation is installed as system man pages as well.

    Documentation is also available on-line at http://swish-e.org.

    Patches or updates to the documentation should be done against the POD
    files, located in the pod directory of the distribution, or (preferably)
    against the CVS repository.

Where do I get help with Swish-e?
    If you need help with installing or using Swish-e, please subscribe to
    the Swish-e mailing list. Visit the Swish-e web site (listed above) for
    information on subscribing to the mailing list.

    Before posting any questions, please read QUESTIONS AND TROUBLESHOOTING.

Speling mistakes
    Please contact the Swish-e list with corrections to this documentation.
    Any help in cleaning up the docs will be appreciated!

    Any patches should be made against the ".pod" files, not the ".html"
    files.

Swish-e Development
    Swish-e is currently being developed as an Open-Source project on
    SourceForge http://sourceforge.net.

    Contact the Swish-e list for questions about Swish-e development.

Swish-e's History
    SWISH was created by Kevin Hughes, circa 1994, to fill the need of the
    growing number of Web administrators on the Internet - many of the
    indexing systems were not well documented, were hard to use and install,
    and were too complex for their own good. The system was widely used for
    several years, long enough to collect some bug fixes and requests for
    enhancements.

    In Fall 1996, The Library of UC Berkeley received permission from Kevin
    Hughes to implement bug fixes and enhancements to the original binary.
    The result is Swish-enhanced or Swish-e, brought to you by the Swish-e
    Development Team.

Document Info
    Each document in the Swish-e distribution contains this section. It
    refers only to the specific page it's located in, and not to the Swish-e
    program or the documentation as a whole.

    $Id: README.pod 1663 2005-02-11 17:00:13Z whmoseley $

    .

swish-e-2.4.7/rpm/0000777000077100017500000000000011166013167010712 500000000000000swish-e-2.4.7/rpm/swish-e.xpm0000664000077100017500000000147411166010111012725 00000000000000/* XPM */
static char *swish-e[] = {
/* columns rows colors chars-per-pixel */
"20 20 16 1",
"  c #5a3025",
". c #743f33",
"X c #7c4d44",
"o c #855c50",
"O c #957065",
"+ c #997d70",
"@ c #a68c7f",
"# c #af9a8e",
"$ c #b0a396",
"% c #bbb0a6",
"& c #c5c1b8",
"* c #cecec1",
"= c #d7d7c8",
"- c #e2e2d5",
"; c #f0f0e5",
": c #fcfcf7",
/* pixels */
"::::;&&%&%&%&&&&&#%:",
":::&%%%%%%%%%%%%%X$;",
"::&&%;;;;;;;;;;;; #;",
":;%%:;;;;;;-;;;-- @-",
":%%=;-+XXX....    +=",
":%$;;# +@@@@@@@+++$&",
":%$;-Oo===****=**&&&",
":$$--$+@@@@@@@%*&&&&",
":$%&--#@@@@@+@@O&&&&",
":;@@*==**&&*&&@@O&**",
":;*+O%=********%X+==",
":;;&oXoO+OOO@%**#.*-",
":;;--#Oooooo.+&*&.#=",
":;;--======*$%@=* @-",
":&%$$##@@@@@+$%=* +=",
":%&$#@@@@@@@@@**@ &*",
":%%---**&&&*&***XX**",
":$$;-===*****=%X %*&",
":$+@#@@@@@@@@o  %**&",
";O           o+&&*&*"
};
swish-e-2.4.7/rpm/swish-e.spec.in0000664000077100017500000001677411166010111013471 00000000000000%define	name	@PACKAGE@
%define	version	@VERSION@
%define release 6

# SWISH::API definitions
%define filelist %{_tmppath}/%{name}-%{version}/%{name}-%{version}-filelist
%define NVR %{name}-%{version}-%{release}

Summary:        SWISH-E - Simple Web Indexing System for Humans - Enhanced
Name:           %{name}
Version:        %{version}
Release:        %{release}
License:        GNU General Public License v2.0 or later, with linking exception
Group:          Applications/Internet
Source:         http://swish-e.org/distribution/%{name}-%{version}.tar.gz
URL:            http://swish-e.org/
BuildRoot:      %{_tmppath}/%name-root
Provides:       %{name}
Obsoletes:      %{name}-doc
Obsoletes:      swish
Requires:       libxml2, pcre, zlib
BuildRequires:  libxml2-devel, pcre-devel, zlib-devel, perl(ExtUtils::MakeMaker)

%description
Swish-e is Simple Web Indexing System for Humans - Enhanced

Swish-e can quickly and easily index directories of files or remote 
web sites and search the generated indexes.

Swish-e is extremely fast in both indexing and searching, highly
configurable, and can be seamlessly integrated with existing web sites
to maintain a consistent design. Swish-e can index web pages, but can
just as easily index text files, mailing list archives, or data stored
in a relational database.

%package        perl
Summary:        SWISH-E - PERL Scripts and Modules
Group:          Applications/Internet
Provides:       %{name}-perl
Requires:       %{name} = %{version}
Requires:       %{name}-perl-api = %{version}

%description    perl
PERL SWISH-E language bindings and scripts.


%package	perl-api
Summary:	SWISH::API - Perl interface to the Swish-e C Library
License:	Perl License
Group:		Development/Libraries
Provides:	%{name}-perl-api
Provides:	perl-SWISH-API = 0.04
Requires:	%{name} = %{version}

%description	perl-api
SWISH::API provides a Perl interface to the Swish-e search engine.
SWISH::API allows embedding the swish-e search code into your application
avoiding the need to fork to run the swish-e binary and to keep an index file
open when running multiple queries.  This results in increased search performance.


%package	devel
Summary:	SWISH-E - Static libraries and header files.
Group:		Development/Libraries
Obsoletes:	swish-devel
Provides:       %{name}-devel
Requires:	%{name} = %{version}

%description	devel
Libraries and header files required for compiling applications based on the SWISH-E API.


%prep
%setup -q

%build
%configure --with-pcre=/usr --with-libxml2=/usr --with-zlib=/usr --libexecdir=%{_libexecdir}/swish-e
make

# Make SWISH::API
cp %{_builddir}/%{name}-%{version}/swish-config %{_builddir}/%{name}-%{version}/src/
chmod +x %{_builddir}/%{name}-%{version}/src/swish-config
pushd perl
grep -rsl '^#!.*perl' . |
grep -v '.bak$' |xargs --no-run-if-empty \
%__perl -MExtUtils::MakeMaker -e 'MY->fixin(@ARGV)'

CFLAGS="$RPM_OPT_FLAGS" SWISHBIN="%{_builddir}/%{name}-%{version}/src/swish-e" SWISHBINDIR="%{_builddir}/%{name}-%{version}/src" %{__perl} Makefile.PL `%{__perl} -MExtUtils::MakeMaker -e ' print qq|PREFIX=%{buildroot}%{_prefix}| if \$ExtUtils::MakeMaker::VERSION =~ /5\.9[1-6]|6\.0[0-5]/ '` 

%{__make} PREFIX=%{buildroot}%{_prefix} LIB='%{_libdir}' LIBS='-L%{_libdir} -L%{buildroot}/src/.libs -lswish-e -lz' 'LDFLAGS=-L%{_libdir} -L%{_builddir}/%{name}-%{version}/src/.libs' 'CCFLAGS=-I%{_builddir}/%{name}-%{version}/src' 'LDDLFLAGS=-shared -L%{_builddir}/%{name}-%{version}/src/.libs/ -lswish-e'

popd

%install
[ "%{buildroot}" != "/" ] && [ -d %{buildroot} ] && %{__rm} -rf %{buildroot};
%{__make} DESTDIR=$RPM_BUILD_ROOT prefix=%{prefix} sysconfdir=%{sysconfdir} install

# Install SWISH::API
pushd perl
%{makeinstall} `%{__perl} -MExtUtils::MakeMaker -e ' print \$ExtUtils::MakeMaker::VERSION <= 6.05 ? qq|PREFIX=%{buildroot}%{_prefix}| : qq|DESTDIR=%{buildroot}| '`

[ -x /usr/lib/rpm/brp-compress ] && /usr/lib/rpm/brp-compress

# remove special files
find %{buildroot} -name "perllocal.pod" \
    -o -name ".packlist"                \
    -o -name "*.bs"                     \
    |xargs -i rm -f {}

# fix permissions
find %{buildroot} | xargs -i chmod u+w {}

# no empty directories
find %{buildroot}%{_prefix}             \
    -type d -depth                      \
    -exec rmdir {} \; 2>/dev/null

# build list of installed SWISH::API files
mkdir -p %{_tmppath}/%{name}-%{version} 2>/dev/null
%{__perl} -le '
use strict;
use File::Find;
use File::Spec;
use Config qw(%Config);

my $buildroot = "%{buildroot}";
my $sitearch = File::Spec->catdir( $buildroot , $Config{installsitearch} );
my @sitearch;

find( sub{ 
        push(@sitearch, $File::Find::name =~ /\Q$buildroot\E(.+)$/);
    }, $sitearch );

$" = "\n";
print "@sitearch";
' > %{filelist}

[ -s %{filelist} ] || {
    echo "ERROR: empty files listing"
    exit 1
    }
popd
# end install SWISH::API

%post	-p /sbin/ldconfig
%postun -p /sbin/ldconfig

%clean
[ "${RPM_BUILD_ROOT}" != "/" ] && [ -d ${RPM_BUILD_ROOT} ] && rm -rf ${RPM_BUILD_ROOT};

%files
%defattr(-, root, root)
%{_bindir}/swish-e
%{_libexecdir}/swish-e
%{_libdir}/*.so.*
%{_mandir}/man[^3]/*
%{_datadir}/doc/swish-e/*

%files perl
%defattr(-, root, root)
%{_bindir}/swish-filter-test
%{_libexecdir}/swish-e/*
%{_datadir}/swish-e/*

%files perl-api -f %filelist
%defattr(-,root,root)
%doc perl/Changes perl/README
%{_mandir}/man3/SWISH::API.3pm*

%files devel
%defattr(-, root, root)
#%{_mandir}/man3/*.3*
%{_includedir}/*.h
%{_libdir}/*.la
%{_libdir}/*.a
%{_libdir}/*.so
%{_libdir}/pkgconfig/*.pc
%{_bindir}/swish-config

%changelog
* Sun Dec 07 2008 Josh Rabinowitz  2.5.6-6
- fix Source link to tarball, bump release to 6
* Thu Dec 13 2007 David L Norris  2.5.6
- Lots of changes to make Fedora happy.
- Remove SUSE specific stuff.
- Change SWISH::API License to same as perl.
- Changed main license to match http://fedoraproject.org/wiki/Licensing
* Fri Apr 08 2005 Bernhard Weisshuhn  2.4.3-5
- Differentiate between libdir and libexecdir (for x86_64)
- Use swish-config from builddir for perl-build (pretty crude)
- remove buildroot prior to install
- Added pkgconfig and swish-config to devel package
* Sun Nov 07 2004 David L Norris  2.5.2-4
- Simplify File::Find script.  Merge HTML docs with swish-e package.
* Sun Nov 07 2004 David L Norris  2.5.2-3
- Fix dependencies so SWISH::API requires the correct version of SWISH-E.
* Sun Nov 07 2004 David L Norris  2.5.2-2
- Incorporate File::Find script written by Peter Karman 
* Sat Nov 06 2004 David L Norris  2.5.2-1
- Fix SWISH::API build so it compiles without libswish-e being installed.
* Sat Oct 23 2004 David L Norris  2.5.2-0
- Add SWISH::API support. Based roughly on spec from Bernhard Weisshuhn .
* Sun Jul 11 2004 David L Norris  2.5.1
- Made spec a little more generic.
* Fri Oct 24 2003 David L Norris  2.4.0-pr4-0
- Added new files and moved extra documentation and examples to a separate package.
* Mon Jun 30 2003 David L Norris  2.4.0-pr1-1cefha
- Modified spec file to minimize dependences on CEFHA.org server.
* Thu Jun 19 2003 David L Norris  2.4.0-pr1
- Updated RPM spec to provide recently added files
* Sun Apr 20 2003 David L Norris  2.3.5
- Updated RPM to provide the SWISH-E helper scripts.
* Fri Mar 28 2003 David L Norris  2.3.5
- Updated RPM for the new libtool-based 2.3.5 build system.
* Wed Dec 04 2002 David L Norris  2.3-dev04
- Created RPM spec file
swish-e-2.4.7/configure.in
AC_PREREQ(2.50)/
AC_INIT(src/swish.c)
AC_CONFIG_AUX_DIR(config)

PACKAGE=swish-e

dnl version number
MAJOR_VERSION=2
MINOR_VERSION=4
MICRO_VERSION=7
INTERFACE_AGE=0
BINARY_AGE=0
VERSION=$MAJOR_VERSION.$MINOR_VERSION.$MICRO_VERSION


dnl NOT USED
dnl provide a way to ignore docs
dnl AC_ARG_ENABLE(docs,
dnl              AC_HELP_STRING([--disable-docs], [when building from CVS without doc build tools]),
dnl             docs=no,
dnl             docs=yes)
dnl AM_CONDITIONAL(BUILDDOCS, test x$docs = xyes)



dnl provide a way to build html docs from website
dnl and check if html docs are available for install

SWISH_WEB=""
AM_CONDITIONAL(BUILDDOCS, false )
AM_CONDITIONAL(INSTALLDOCS, false )
AC_ARG_WITH(website,AC_HELP_STRING([--with-website=DIR],[use swish-e.org website src in DIR (YES if found)]),,withval=no)

if test "x$withval" != "xno"; then
    dnl find build program
    SWISH_WEB="$withval/bin/build"

    dnl Not sure how portable -x is (according to the autobook)
    if test ! -f "$SWISH_WEB"; then
        AC_MSG_ERROR([Failed to find program to build swish-e html docs "$SWISH_WEB"])
    fi
else
    AC_PATH_PROG([SWISH_WEB],[build-swish-docs])
fi

if test -n "$SWISH_WEB"; then

    SWISH_WEB_CHK=`$SWISH_WEB -check`

    if test "x$SWISH_WEB_CHK" = xa-ok; then
        AC_MSG_RESULT([Building html docs with $SWISH_WEB])
        AC_SUBST(SWISH_WEB)
        AM_CONDITIONAL(BUILDDOCS, true )
        AM_CONDITIONAL(INSTALLDOCS, true )

    else
        AC_MSG_ERROR([problem running '$SWISH_WEB -check'. Returned '$SWISH_WEB_CHK'])
    fi

else
    if test -f "$srcdir/html/readme.html"; then
        AM_CONDITIONAL(INSTALLDOCS, true)
    else
        AC_MSG_WARN([** Not installing HTML docs.  "$srcdir/html/readme.html" not found **])
    fi
fi






AC_ARG_ENABLE(daystamp,
             AC_HELP_STRING([--enable-daystamp], [Adds today's date to version]),
             daystamp=yes,)

if test x$daystamp = xyes; then
        TODAY=`/bin/date +%Y-%m-%d`
        VERSION="$VERSION-$TODAY"
fi



dnl Header file for -D defines and sets @DEFS@ to -DHAVE_CONFIG_H
AM_CONFIG_HEADER(src/acconfig.h)

AM_INIT_AUTOMAKE($PACKAGE, $VERSION)

dnl Enable DLL builds for Win32.  This must come before AC_PROG_LIBTOOL.
AC_PROG_CC
AM_PROG_CC_STDC
AC_C_CONST
AC_LIBTOOL_WIN32_DLL
AC_PROG_LIBTOOL
AM_PROG_LIBTOOL



dnl prevent automake from generating rules to auto-rebuild tools
dnl see http://sources.redhat.com/automake/automake.html#maintainer-mode
dnl developers: either run configure with --enable-maintainer-mode
dnl or simply rerun ./bootstrap && ./configure when needed

AM_MAINTAINER_MODE



dnl Check for gettimeofday()
AC_CHECK_FUNC(BSDgettimeofday,
              [AC_DEFINE(HAVE_BSDGETTIMEOFDAY,[],[Get time of day])],
              [AC_CHECK_FUNC(gettimeofday, ,
                             [AC_DEFINE(NO_GETTOD,[],[Get time of day])])])




dnl check for #! (shebang)
AC_SYS_INTERPRETER

dnl Set the @SET_MAKE@ variable=make if $(MAKE) not set
AC_PROG_MAKE_SET

dnl Check for Perl - need full path for scripts
AC_PATH_PROG([PERL], [perl], [no])
if test "$PERL" = "no"; then
    AC_MSG_WARN([perl was not found - needed for script shebang lines])
fi


dnl Check pod2man for creating man pages
AC_CHECK_PROG([POD2MAN], [pod2man], [pod2man], [false])
if test "$POD2MAN" = "false"; then
    dnl disable building of man pages?
    AC_MSG_WARN([pod2man was not found - needed for building man pages])
fi

dnl Check for install - used for installing distribution
AC_PROG_INSTALL

dnl -- from src/configure.in --

dnl Check for a C compiler
AC_PROG_CC

dnl Check for vsnprintf in libsnprintf.so
AC_CHECK_LIB(snprintf, vsnprintf)

dnl Checks for header files.
dnl looks for dirent.h and sets HAVE_DIRENT_H -- do we use?
AC_HEADER_DIRENT

AC_HEADER_STAT
AC_HEADER_STDC

dnl Check for some headers
AC_CHECK_HEADERS(unistd.h stdlib.h string.h sys/timeb.h windows.h)
AC_CHECK_HEADERS(sys/resource.h sys/param.h)

AC_HEADER_SYS_WAIT

dnl Checks for typedefs, structures, and compiler characteristics.
AC_C_CONST
AC_TYPE_PID_T
AC_TYPE_SIZE_T
AC_STRUCT_TM

dnl Checks for library functions.
AC_FUNC_ALLOCA
AC_FUNC_STRFTIME
AC_FUNC_VPRINTF
AC_FUNC_FORK
AC_CHECK_FUNCS(waitpid kill)
AC_CHECK_FUNCS(re_comp regcomp strdup strstr lstat access)
AC_CHECK_FUNCS(strchr memcpy)
AC_CHECK_FUNCS(clock times getrusage)
AC_CHECK_LIB(m,log)

dnl Allow use of strcoll() instead of strncmp()/strncasecmp() to enable locale-dependent collating
AC_FUNC_STRCOLL

AC_FUNC_GETGROUPS

AC_TYPE_GETGROUPS

AC_REPLACE_FUNCS(vsnprintf mkstemp)

dnl Optional building with libxml2

dnl Probably should be 2.4.5 + patches
LIBXML_REQUIRED_VERSION=2.4.3

AC_ARG_WITH(libxml2,AC_HELP_STRING([--with-libxml2=DIR],[use libxml2 in DIR (YES if found)]),,withval=maybe)

dnl unless the user explicitly asked for no libxml2
if test "$withval" != "no"; then
    dnl find xml2-config program
    XML2_CONFIG="no"
    if test "$withval" != "yes" && test "$withval" != "maybe" ; then
        XML2_CONFIG_PATH="$withval/bin"
        AC_PATH_PROG(XML2_CONFIG, xml2-config,"no", $XML2_CONFIG_PATH)
    else
        XML2_CONFIG_PATH=$PATH
        AC_PATH_PROG(XML2_CONFIG, xml2-config,"no", $XML2_CONFIG_PATH)
    fi

    dnl we can't do anything without xml2-config
    if test "$XML2_CONFIG" = "no"; then
        withval="no"
    else
        withval=`$XML2_CONFIG --prefix`
    fi

    dnl if withval still maybe then we have failed
    if test "$withval" = "maybe"; then
        withval="no"
    fi
fi

if test "$withval" = "no"; then
    AC_MSG_RESULT([Not building with libxml2 - use --with-libxml2 to enable])
else

    AC_SUBST(LIBXML_REQUIRED_VERSION)
    AC_MSG_CHECKING(for libxml libraries >= $LIBXML_REQUIRED_VERSION)

    AC_DEFUN([VERSION_TO_NUMBER],
    [`$1 | sed -e 's/libxml //' | awk 'BEGIN { FS = "."; } { printf "%d", ([$]1 * 1000 + [$]2) * 1000 + [$]3;}'`])

    dnl
    dnl test version and init our variables
    dnl

    vers=VERSION_TO_NUMBER($XML2_CONFIG --version)
    XML2_VERSION=`$XML2_CONFIG --version`

    if test "$vers" -ge VERSION_TO_NUMBER(echo $LIBXML_REQUIRED_VERSION);then
        LIBXML2_LIB="`$XML2_CONFIG --libs`"
        LIBXML2_CFLAGS="`$XML2_CONFIG --cflags`"
        AC_MSG_RESULT(found version $XML2_VERSION)
    else
        AC_MSG_ERROR(You need at least libxml2 $LIBXML_REQUIRED_VERSION for this version of swish)
    fi


    AC_DEFINE(HAVE_LIBXML2,[],[Libxml2 support included])

    dnl LIBXML2_OBJS="libswishindex_la-parser.lo"
    LIBXML2_OBJS="parser.lo"
    AC_SUBST(LIBXML2_OBJS)
    AC_SUBST(LIBXML2_LIB)
    AC_SUBST(LIBXML2_CFLAGS)
fi



dnl Provide an option for enabling btree/incremental indexing for development
AC_ARG_ENABLE(incremental,
             AC_HELP_STRING([--enable-incremental], [** developer use only **]),btree=yes,)

AC_ARG_ENABLE(psortarray,
             AC_HELP_STRING([--enable-psortarray], [** and use ARRAY presort arrays (if incremental) ]),psortarray=yes,)

if test x$btree = xyes; then
    AC_MSG_WARN([** Building with developer only incremental indexing code **])
    BTREE_OBJS="btree.lo array.lo worddata.lo fhash.lo"
    AC_SUBST(BTREE_OBJS)
    AC_DEFINE(USE_BTREE,[],[Experimental BTREE support])

    if test "x$psortarray" = xyes; then
        AC_MSG_WARN([** And using ARRAY presorted tables **])
        AC_DEFINE(USE_PRESORT_ARRAY,[],[Experimental BTREE PRESORT ARRAYS])
    fi
fi


dnl Checks for zlib library. -- from libxml2 configure.in
_cppflags="${CPPFLAGS}"
_ldflags="${LDFLAGS}"

dnl AC_ARG_WITH(zlib,AC_HELP_STRING([--with-zlib=DIR], [use zlib in DIR (YES if found)]),,withval=maybe)

AC_ARG_WITH(zlib,
[  --with-zlib[[=DIR]]       use libz in DIR],[
  if test "$withval" != "no" -a "$withval" != "yes"; then
    Z_DIR=$withval
    CPPFLAGS="${CPPFLAGS} -I$withval/include"
    LDFLAGS="${LDFLAGS} -L$withval/lib"
  fi
])
if test "$with_zlib" = "no"; then
    echo "Disabling compression support"
else
    AC_CHECK_HEADERS(zlib.h,
        AC_CHECK_LIB(z, gzread,[
            AC_DEFINE(HAVE_ZLIB,[],[Do we have zlib])
            if test "x${Z_DIR}" != "x"; then
                Z_CFLAGS="-I${Z_DIR}/include"
                Z_LIBS="-L${Z_DIR}/lib -lz"
                [case ${host} in
                    *-*-solaris*)
                        Z_LIBS="-L${Z_DIR}/lib -R${Z_DIR}/lib -lz"
                        ;;
                esac]
            else
                Z_LIBS="-lz"
            fi]))
fi

dnl mingw build requires this for win32
if test "x${target}" = "xi586-mingw32msvc"; then
     Z_LIBS="-L${Z_DIR}/lib -R${Z_DIR}/lib -lzdll"
fi

echo "Z_LIBS = $Z_LIBS"


AC_SUBST(Z_CFLAGS)
AC_SUBST(Z_LIBS)


CPPFLAGS=${_cppflags}
LDFLAGS=${_ldflags}


PCRE_REQUIRED_VERSION=3.4
AC_ARG_WITH(pcre,AC_HELP_STRING([--with-pcre=DIR], [use pcre in DIR (YES if found)]),,withval=no)

dnl if withval not no then try to enable pcre support
if test "$withval" != "no"; then
    dnl find pcre-config program
    PCRE_CONFIG="no"
    if test "$withval" != "yes" && test "$withval" != "maybe" ; then
        PCRE_CONFIG_PATH="$withval/bin"
        AC_PATH_PROG(PCRE_CONFIG, pcre-config,"no", $PCRE_CONFIG_PATH)
    else
        PCRE_CONFIG_PATH=$PATH
        AC_PATH_PROG(PCRE_CONFIG, pcre-config,"no", $PCRE_CONFIG_PATH)
    fi

    dnl we won't do anything without pcre-config
    if test "$PCRE_CONFIG" = "no"; then
        withval="no"
    else
        withval=`$PCRE_CONFIG --prefix`
    fi

    dnl if withval still maybe then we have failed
    if test "$withval" = "maybe"; then
        withval="no"
    fi
fi

if test "$withval" != "no"; then
    AC_SUBST(PCRE_REQUIRED_VERSION)
    AC_MSG_CHECKING(for libpcre libraries >= $PCRE_REQUIRED_VERSION)

    AC_DEFUN([VERSION_TO_NUMBER],
    [`$1 | awk 'BEGIN { FS = "."; } { printf "%d", ([$]1 * 1000 + [$]2) * 1000 + [$]3;}'`])

    dnl
    dnl test version and init our variables
    dnl

    vers=VERSION_TO_NUMBER($PCRE_CONFIG --version)
    PCRE_VERSION=`$PCRE_CONFIG --version`

    if test "$vers" -ge VERSION_TO_NUMBER(echo $PCRE_REQUIRED_VERSION);then
        PCRE_LIBS="`$PCRE_CONFIG --libs-posix`"
        PCRE_CFLAGS="`$PCRE_CONFIG --cflags-posix`"
        AC_MSG_RESULT(found version $PCRE_VERSION)
    else
        AC_MSG_ERROR(You need at least libpcre $PCRE_REQUIRED_VERSION for this version of swish)
    fi

    AC_SUBST(PCRE_CFLAGS)
    AC_SUBST(PCRE_LIBS)
    AC_DEFINE(HAVE_PCRE,[],[Perl REGEX library])
else
    AC_MSG_RESULT([Not building with perl compatible regex - use --with-pcre to enable])
fi

dnl enable largefile support by default. disable with --disable-largefile
AC_SYS_LARGEFILE
AC_MSG_NOTICE([fileoffset bits = ${ac_cv_sys_file_offset_bits}])
if test "x${ac_cv_sys_file_offset_bits}" = "x64" ; 
then
	LARGEFILES_MACROS="-D_LARGEFILE_SOURCE -D_LARGEFILE64_SOURCE -D_FILE_OFFSET_BITS=$ac_cv_sys_file_offset_bits"
fi
AC_SUBST(LARGEFILES_MACROS)

CPPFLAGS=${_cppflags}
LDFLAGS=${_ldflags}

dnl Set a better default for libexecdir -- Thanks to David Norris!
libexecdiropt=$(echo $ac_option | grep 'libexecdir=')
if test "x$libexecdiropt" = "x"; then
        libexecdir='${exec_prefix}/lib/${PACKAGE}'
        AC_MSG_NOTICE([Setting libexecdir to \${exec_prefix}/lib/${PACKAGE}])
fi

dnl Memory Debugging options

ENABLE_DEFINE([memdebug], [MEM_DEBUG], [(developers only) checks for memory consistency on alloc/free using guards])
ENABLE_DEFINE([memtrace], [MEM_TRACE], [(developers only)  checks for unfreed memory, and where it is allocated] )
ENABLE_DEFINE([memstats], [MEM_STATISTICS], [(developers only) gives memory statistics (bytes allocated, calls, etc)])



dnl Which files to create (some .in files are handled by Makefile.am files)

AC_CONFIG_FILES(
    Makefile
    html/Makefile
    pod/Makefile
    man/Makefile
    src/Makefile
    src/expat/Makefile
    src/replace/Makefile
    src/snowball/Makefile
    rpm/swish-e.spec
    tests/Makefile
    example/Makefile
    prog-bin/Makefile
    filters/Makefile
    filters/SWISH/Makefile
    conf/Makefile
    filter-bin/Makefile
    swish-e.pc
    swish-config)
AC_OUTPUT

swish-e-2.4.7/pod/Makefile.am
# $id$
#
# Conditionally install the pod docs


# Where docs are installed

poddir = $(datadir)/doc/$(PACKAGE)/pod

#if BUILDDOCS
pod_DATA = \
    $(pod_files)

#endif

pod_files = \
    CHANGES.pod \
    INSTALL.pod \
    README.pod \
    SWISH-3.0.pod \
    SWISH-BUGS.pod \
    SWISH-CONFIG.pod \
    swish-e.pod \
    SWISH-FAQ.pod \
    SWISH-LIBRARY.pod \
    SWISH-RUN.pod \
    SWISH-SEARCH.pod


EXTRA_DIST = \
    $(pod_files)
swish-e-2.4.7/pod/SWISH-CONFIG.pod
=head1 NAME

SWISH-CONFIG - Configuration File Directives

=head1 OVERVIEW

This document lists the configuration directives available in
Swish-e.

=head1 CONFIGURATION FILE

A configuration file controls which files Swish-e indexes, how they are
indexed, and where the index is written.

The configuration file is a text file composed of comments, blank
lines, and B<configuration directives>.  The order of the directives
is not important.  Some directives may be used more than once in the
configuration file, while others can only be used once (i.e. a later
occurrence overwrites an earlier one).  Case of the directive
is not important -- you may use upper, lower, or mixed case.

A comment is any line that begins with a "#".

    # This is a comment

As of 2.4.3 lines may be continued by placing a backslash as the last character
on the line:

    IgnoreWords \
        am \
        the \
        foo
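
The continuation rule above can be sketched in Python.  This is only an
illustration, not Swish-e's own parser: the function name
C<join_continued_lines> is hypothetical, and joining the pieces with a
single space is an assumption.

```python
def join_continued_lines(text):
    """Join lines that end in a backslash into one logical line."""
    logical, pending = [], ""
    for raw in text.splitlines():
        if raw.endswith("\\"):
            # Drop the trailing backslash and keep accumulating.
            pending += raw[:-1] + " "
        else:
            logical.append(pending + raw)
            pending = ""
    if pending:
        # The last line ended in a backslash: keep what was collected.
        logical.append(pending.rstrip())
    return logical


config = "IgnoreWords \\\n    am \\\n    the \\\n    foo\n"
print(join_continued_lines(config)[0].split())
```

Splitting the joined line on whitespace then yields the directive name
followed by its parameters.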


Directives may take more than one parameter.  Enclose single parameters
that include whitespace in quotes (single or double).  Inside of quotes
the backslash escapes the next character.

    ReplaceRules append "foo bar"   <- define "foo bar" as a single parameter

If you need to include a quote character in the value either use a
backslash to escape it, or enclose it in quotes of the other type.

Backslashes also have special meaning in regular expressions.

    FileFilterMatch pdftotext "'%p' -" /\.pdf$/

This says that the dot is a real dot (instead of matching any character).
If you place the regular expression in quotes then you must use
double-backslashes.

    FileFilterMatch pdftotext "'%p' -" "/\\.pdf$/"

Swish-e will convert the double backslash into a single backslash before
passing the parameter to the regular expression compiler.
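
The quoting rules described above can be sketched as a small Python
tokenizer.  It is a sketch under stated assumptions, not the Swish-e
implementation: the name C<split_params> is hypothetical, and the code
assumes that a backslash outside of quotes is passed through unchanged.

```python
def split_params(line):
    """Split a directive line into parameters.

    Assumed rules: whitespace separates parameters, single or double
    quotes group text, and a backslash inside quotes escapes the next
    character.  Outside quotes a backslash is kept as-is.
    """
    params, current, quote, started = [], [], None, False
    i = 0
    while i < len(line):
        ch = line[i]
        if quote:
            if ch == "\\" and i + 1 < len(line):
                current.append(line[i + 1])  # escaped character
                i += 1
            elif ch == quote:
                quote = None                 # closing quote
            else:
                current.append(ch)
        elif ch in "'\"":
            quote, started = ch, True        # opening quote
        elif ch.isspace():
            if started:
                params.append("".join(current))
                current, started = [], False
        else:
            current.append(ch)
            started = True
        i += 1
    if started:
        params.append("".join(current))
    return params


unquoted = r'''FileFilterMatch pdftotext "'%p' -" /\.pdf$/'''
quoted = r'''FileFilterMatch pdftotext "'%p' -" "/\\.pdf$/"'''
print(split_params(unquoted)[-1] == split_params(quoted)[-1])
```

Both forms of the C<FileFilterMatch> example end up with the same regular
expression text, which is the point of the double backslash inside quotes.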

Commented example configuration files are included in the F<conf>
directory of the Swish-e distribution.

Some command line arguments can override directives specified in the
configuration file.  Please see also the L<SWISH-RUN|SWISH-RUN> page for
instructions on running Swish-e, and the L<SWISH-SEARCH|SWISH-SEARCH> page for
information and examples on how to search your index.

The configuration file is specified to Swish-e by the C<-c> switch.  For
example,

    swish-e -c myconfig.conf

You may also split your directives up into different configuration files.  This
allows you to have a master configuration file used for many different indexes,
and smaller configuration files for each separate index.  You can specify the
different configuration files when running from the command line with the C<-c>
switch (see L<SWISH-RUN|SWISH-RUN>), or you may include another configuration
file with the B<IncludeConfigFile> directive below.

Typically, the directives in a configuration file are grouped together in
some logical order -- that is, directives that control the source of the
documents would be grouped together first, and directives that control
how each document is filtered or how its words are indexed would form another
group. (The directives listed below are grouped in this order.)

The configuration file directives are listed below in these groups:    

=over 4

=item *

L -- You may add administrative
information to the header of the index file.

=item *

L -- Directives for selecting the source
documents and the location of the index file.

=item *

L -- Directives that control how a document
content is indexed.

=item *

L -- These directives are only
applicable to the File Access indexing method.

=item *

L -- Likewise, these only apply
to the HTTP Access method.

=item *

L -- These only apply to the prog
Access method.

=item *

L -- This is a special section that describes
using document filters with Swish-e.

=back

=head2 Alphabetical Listing of Directives

=over 4

=item *

L [yes|NO]

=item *

L *string of characters*

=item *

L *string*

=item *

L [*list of buzzwords*|File: path]

=item *

L  [yes|NO]

=item *

L [YES|no]

=item *

L [TXT|HTML|XML|TXT2|HTML2|XML2|TXT*|HTML*|XML*]

=item *

L *seconds*

=item *

L *list of names*

=item *

L *list of names*

=item *

L  [yes|NO]

=item *

L *string of characters*

=item *

L *server alias*

=item *

L *metaname* [replace|remove|prepend|append|regex]

=item *

L *suffix* *program* [options]

=item *

L *program* *options* *regex* [*regex* ...]

=item *

L [yes|NO]

=item *

L [contains|is|regex] *regular expression*

=item *

L [contains|is|regex] *regular expression*

=item *

L [NONE|Stemming|Soundex|Metaphone|DoubleMetaphone]

=item *

L [yes|NO]

=item *

L *metaname*

=item *

L *string of characters*

=item *

L *string of characters*

=item *

L *integer integer*

=item *

L *list of names*

=item *

L *list of characters*

=item *

L [YES|no]

=item *

L [*list of stop words*|File: path]

=item *

L *metaname*

=item *

L

=item *

L *text*

=item *

L *tagname*|as-text

=item *

L [yes|NO]

=item *

L [TXT|HTML|XML|TXT2|HTML2|XML2|TXT*|HTML*|XML*]  *file
extensions*

=item *

L *text*

=item *

L [URL|directories or files]

=item *

L *path*

=item *

L *text*

=item *

L *list of file suffixes*

=item *

L *text*

=item *

L [0|1|2|3]

=item *

L *integer*

=item *

L *integer*

=item *

L *meta name* *list of aliases*

=item *

L *list of names*

=item *

L *integer*

=item *

L *list of file suffixes*

=item *

L [yes|NO]

=item *

L [0|1|2|3]

=item *

L *list of property names*

=item *

L [0-9]

=item *

L *property name* *list of aliases*

=item *

L *list of meta names*

=item *

L *list of meta names*

=item *

L *list of meta names*

=item *

L *list of meta names*

=item *

L *list of meta names*

=item *

L *list of meta names*

=item *

L integer *list of meta names*

=item *

L integer *list of meta names*

=item *

L [replace|remove|prepend|append|regex]

=item *

L  name -x format string

=item *

L *path*

=item *

L [XML E<lt>tagE<gt>|HTML E<lt>metaE<gt>|TXT size]

=item *

L<"SwishProgParameters|/"SwishProgParameters> *list of parameters*

=item *

L   [E<lt>AND-WORDE<gt>|E<lt>or-wordE<gt>]

=item *

L *path*

=item *

L [*string1 string2*|:ascii7:]

=item *

L *number of characters*

=item *

L [error|ignore|INDEX|auto]

=item *

L [DISABLE|error|ignore|index|auto]

=item *

L [yes|NO]

=item *

L [yes|NO]

=item *

L [*list of words*|File: path]

=item *

L *string of characters*

=item *

L *list of XML attribute names*

=back

=head2 Directives that Control Swish

These configuration directives control the general behavior of Swish-e.

=over 4

=item IncludeConfigFile *path to config file*

This directive can be used to include configuration directives located
in another file.

    IncludeConfigFile /usr/local/swish/conf/site_config.config

=item IndexReport [0|1|2|3]

Sets how detailed the reporting is while indexing. You can specify
a number from 0 to 3: 0 is totally silent and 3 is the most verbose.  The default
is 1.

This may be overridden from the command line via the C<-v> switch (see
L<SWISH-RUN|SWISH-RUN>).

=item ParserWarnLevel [0|1|2|3]

Sets the error level when using the libxml2 parser for XML and HTML.
libxml2 will point out structural errors in your documents.

    0 = no report
    1 = fatal errors
    2 = errors
    3 = warnings

Currently (as of 2.4.4 - early 2005) libxml2 only reports errors at level 2.
The default as of 2.4.4 is "2" which should report any errors that might indicate
a problem parsing a document.

The exception is UTF-8 to Latin-1 conversion errors, which are reported at
level 3 (changed from 1 in 2.4.4).  Although these errors indicate a problem indexing
text, they are only reported at level 3 because they can be very common.

It is recommended that you index at ParserWarnLevel 3 when first starting out to see
what errors and warnings are reported.  Then reduce the level when you understand what
documents are causing parsing problems and why.

=item IndexFile *path*

IndexFile specifies the location of the generated index file.  If not
specified, Swish-e will create the file F<index.swish-e> in the current
directory.

    IndexFile /usr/local/swish/site.index

=item obeyRobotsNoIndex [yes|NO]

When enabled, Swish-e will not index any HTML file that contains:

    <meta name="robots" content="noindex">

The default is to ignore these meta tags and index the document.
This tag is described at http://www.robotstxt.org/wc/exclusion.html.

Note: This feature is only available with the libxml2 HTML parser.

Also, if you are using the libxml2 parser (HTML2 and XML2) then you can use the following
comments in your documents to prevent indexing:

       <!-- SwishCommand noindex -->
       <!-- SwishCommand index -->

These shorter forms may also be used:

       <!-- noindex -->
       <!-- index -->

For example, these are very helpful to prevent indexing of common headers, footers, and menus.


=back

B<NOTE>: The following items are currently not available.  These items
require Swish-e to parse the configuration file while searching.


=over 4

=item EnableAltSearchSyntax [yes|NO]

B<NOTE>: The following item is currently not available.

Enable alternate search syntax.  Allows the use of a basic search syntax
like that of "Altavista(c)", "Lycos(c)", etc.  This means a search
query can contain "+" and "-" as syntax operators.

Example:

    swish-e -w "+word1 +word2 -word3  word4 word5"
    "+"  = following word has to be in all found documents
    "-"  = following word may not be in any document found
    " "  = following word will be searched in documents

=item SwishSearchOperators E<lt>and-wordE<gt> E<lt>or-wordE<gt> E<lt>not-wordE<gt>

B<NOTE>: The following item is currently not available.

Using this config directive you can change the boolean search operators of
Swish-e, e.g. to adapt them to your language.
The default is:    AND  OR  NOT

Example (German):

    SwishSearchOperators   UND  ODER  NICHT

=item SwishSearchDefaultRule   [E<lt>AND-WORDE<gt>|E<lt>or-wordE<gt>]

B<NOTE>: The following item is currently not available.

C<SwishSearchDefaultRule> defines the default Boolean operator to use if none
is specified between words or phrases.  The default is C<AND>.

The word you specify must match one of the available C<SwishSearchOperators>.

Example:

    SwishSearchOperators   UND  ODER  NICHT
    # Make it act like a web search engine
    SwishSearchDefaultRule ODER

=item ResultExtFormatName name -x format string

B<NOTE>: The following item is currently not available.

The output of Swish-e can be defined by specifying a format string with the
C<-x> command line argument.  Using C<ResultExtFormatName> you can assign a
predefined format string to a name.

Examples:

    ResultExtFormatName  moreinfo   "%c|%r|%t|%p||\n"

Then when searching you can specify the format string's name

    swish-e   ...  -x moreinfo  ...

See the C<-x> switch in L<SWISH-RUN|SWISH-RUN> for more information about
output formats.

=back


=head2 Administrative Headers Directives

Swish-e stores configuration information in the header of the index file.
This information can be retrieved while searching or by functions in
the Swish-e C library.  There are a number of fields available for your
own use.  None of these fields are required:

=over 4

=item IndexName *text*

=item IndexDescription *text*

=item IndexPointer *text*

=item IndexAdmin *text*

These variables specify information that goes into index files to help
users and administrators.  IndexName should be the name of your index,
like a book title.  IndexDescription is a short description of the index
or a URL pointing to a fuller description.  IndexPointer should be
a pointer to the original information, most likely a URL.  IndexAdmin
should be the name of the index maintainer and can include name and email
information.  These values should not be more than 70 or so characters
and should be contained in quotes.  Note that the automatically generated
date in index files is in D/M/Y and 24-hour format.

Examples:

    IndexName "Linux Documentation"
    IndexDescription "This is an index of /usr/doc on our Linux machine." 
    IndexPointer http://localhost/swish/linux/index.html
    IndexAdmin webmaster


=back

=head2 Document Source Directives

These directives control I<what> documents are indexed and I<how> they are
accessed.  See also L and L for directives that are
specific to those access methods.


=over 4

=item IndexDir [directories or files|URL|external program]

IndexDir defines the source of the documents for Swish-e.  Swish-e
currently supports three file access methods: B<File system>, B<HTTP>
(also called B<spidering>), and B<prog> for reading files from an
external program.

The C<-S> command line argument is used to select the file access method.

    swish-e -c swish.config -S fs    - file system
    swish-e -c swish.config -S http  - internal http spider
    swish-e -c swish.config -S prog  - external program of any type

For the B<fs> method of access B<IndexDir> is a space-separated
list of files and directories to index.  Use a forward slash as the path
separator in MS Windows.

For the B<http> method the B<IndexDir> setting is a list of space-separated
URLs.

For the B<prog> method the B<IndexDir> setting is a list of space-separated
programs to run (which generate documents for swish to index).

You may specify more than one B<IndexDir> directive.

Any sub-directories of any listed directory will also be indexed.

Note: While I<processing> directories, Swish-e will ignore any files or
directories that begin with a dot (".").  You may index files or directories
that begin with a dot by specifying their name with C<IndexDir> or C<-i>.

Examples:

    # Index this directory and any subdirectories
    IndexDir /usr/local/home/http

    # Index the docs directory in current directory
    IndexDir ./docs

    # Index these files in the current directory
    IndexDir ./index.html ./page1.html ./page2.html
    # and index this directory, too
    IndexDir ../public_html

For the B<http> method of access specify the URLs from which
you want the spidering to begin.

Example:

    IndexDir http://www.my-site.com/index.html
    IndexDir http://localhost/index.html

Obviously, using the B<http> method to index is B<much> slower than indexing
local files.  Be well aware that some sites do not appreciate spidering and may
block your IP address.  You may wish to contact the remote site before
spidering their web site.  More information about spidering can be found in
L below.

For the B<prog> method of access B<IndexDir> specifies
the path to the program(s) to execute.  The external program must correctly
format the documents being passed back to Swish-e.  Examples of external
programs are provided in the F<prog-bin> directory.

    IndexDir ./myprogram.pl

See L for details.


Note: Not all directives work with all methods.

=item NoContents *list of file suffixes*

Files with these suffixes will B<not> have their contents indexed,
but will have their path name (file name) indexed instead.

If the file's type is HTML or HTML2 (as set by C<IndexContents> or
C<DefaultContents>) then the file will be parsed for an HTML title and that
title will be indexed.  Note that you must set the file's type with
C<IndexContents> or C<DefaultContents>: C<.html> and C<.htm> are NOT type HTML
by default.  For example:

   IndexContents HTML* .htm .html

If a title is found, it will still be checked for C<FileRules title>, and the
file will be skipped if a match is found.  See C<FileRules>.

If the file's type is not HTML, or it is HTML and no title is found,
then the file's path will be indexed.

For example, this will allow searching by image file name.

    NoContents .gif .xbm .au .mov .mpg .pdf .ps

Note: Using this directive will B<not> cause files with those suffixes to be
indexed.  That is, if you use C<IndexOnly> to limit the types of files that are
indexed, then you must specify in C<IndexOnly> the same suffixes listed in
C<NoContents>.

This does B<not> work:

    # Wrong!
    IndexOnly .htm .html
    NoContents .gif .xbm .au .mov .mpg .pdf .ps

A C<-S prog> program may set the C<No-Contents> header to enable this feature
for a specific document (although it would be smarter for the C<-S prog>
program to simply send only the pathname or title to be indexed).
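
The interaction between C<IndexOnly> and C<NoContents> described above can
be sketched in Python.  This only illustrates the documented behavior; the
function C<classify> and its return values are hypothetical, and matching
suffixes with a plain "ends with" test is an assumption.

```python
def classify(path, index_only, no_contents):
    """Return 'skip', 'path-only', or 'contents' for a file path."""
    # IndexOnly is checked first: a file with an unlisted suffix is
    # never indexed, even if NoContents names that suffix.
    if index_only and not path.endswith(tuple(index_only)):
        return "skip"
    # NoContents: index the path name (or HTML title), not the body.
    if path.endswith(tuple(no_contents)):
        return "path-only"
    return "contents"


no_contents = [".gif", ".pdf"]
print(classify("logo.gif", [".htm", ".html"], no_contents))
print(classify("logo.gif", [".htm", ".html", ".gif", ".pdf"], no_contents))
```

The first call mirrors the "Wrong!" configuration above, where the image is
skipped entirely.  The second lists the same suffixes in both directives, so
the image is indexed by name only.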

=item ReplaceRules [replace|remove|prepend|append|regex]

ReplaceRules allows you to make changes to file pathnames before
they're indexed.  These changed file names or URLs will be returned in
search results.

For example, you may index your files locally (with the File system
indexing method), yet return a URL in search results.  This directive can
be used to map the file names to their respective URLs on your web server.

There are five operations you can specify: B<replace>, B<remove>,
B<prepend>, B<append>, and B<regex>.  They will parse the pathname in the
order you've typed these commands.

This directive uses C library regex.h regular expressions.

   replace "the string you want replaced" "what to change it to"
   remove "a string to remove"   
   prepend "a string to add before the result"
   append "a string to add after the result"
   regex  "/search string/replace string/options"

Remember, quotes are needed if an expression contains white space,
and backslashes have special meaning.

Regex is an Extended Regular Expression.  The first character found is 
the delimiter (but it's not smart enough to use matched chars such as [],
(), and {}).

The B<replace> string may use substitution variables:

    $0      the entire matched (sub)string
    $1-$9   returns patterns captured in "(" ")" pairs
    $`      the string before the matched pattern
    $'      the string after the matched pattern

The B<options> change the behavior of the expression:

    i       ignore the case when matching
    g       repeat the substitution for the entire pattern

Examples:

    ReplaceRules replace testdir/ anotherdir/
    ReplaceRules replace [a-z_0-9]*_m.*\.html index.html

    ReplaceRules remove testdir/

    ReplaceRules prepend http://localhost/
    ReplaceRules append .html

    ReplaceRules regex  !^/web/(.+)/!http://$1.domain.com/!
    replaces a file path:
        /web/search/foo/index.html
    with
        http://search.domain.com/foo/index.html

    ReplaceRules regex  #^#http://localhost/www#
    ReplaceRules prepend http://localhost/www  (same thing)

    # Remove all extensions from C source files
    ReplaceRules remove .c     # ERROR! That "." is *any char*
    ReplaceRules remove \.c    # much better...

    ReplaceRules remove "\\.c" # if in quotes you need double-backslash!  
    ReplaceRules remove "\.c"  # ERROR! "\." -> "." and is *any char*
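
As a further sketch of the file-to-URL mapping described above, C<remove> and
C<prepend> can be combined.  The directory F</var/www/htdocs/> and the host
name are assumptions made only for illustration:

    # Hypothetical layout: files under /var/www/htdocs/ are served
    # from http://localhost/
    ReplaceRules remove /var/www/htdocs/
    ReplaceRules prepend http://localhost/

With these rules, applied in the order listed, a path such as
F</var/www/htdocs/docs/index.html> would be expected to be returned as
C<http://localhost/docs/index.html>.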


=item IndexContents [TXT|HTML|XML|TXT2|HTML2|XML2|TXT*|HTML*|XML*]  *file extensions*

The C<IndexContents> directive assigns one of Swish-e's document parsers to a
document, based on its extension.  Swish-e currently knows how to parse
TXT, HTML, and XML documents.

The XML2, HTML2, and TXT2 parsers are currently only available when
Swish-e is configured to use libxml2.

You may use XML*, HTML*, and TXT* to select the parser automatically.
If libxml2 is installed then it will be used to parse the content.  Otherwise,
Swish-e's internal parsers will be used.

Documents that are not assigned a parser with C<IndexContents> will, by
default, use the HTML2 parser if libxml2 is installed, otherwise will use
Swish-e's internal HTML parser.  The C<DefaultContents> directive may be used
to assign a parser to documents that do not match a file extension defined with
the C<IndexContents> directive.

Example:

    IndexContents HTML* .htm .html .shtml
    IndexContents TXT*  .txt .log .text
    IndexContents XML*  .xml

HTML* is the default type for all files, unless otherwise specified (this
default can be changed with the B<DefaultContents> directive).  Swish-e parses
titles from HTML files, if available, and keeps track of the context of the
text for context searching (see C<-t> in L<SWISH-RUN|SWISH-RUN>).

If using filters (with the C<FileFilter> directive) to convert documents you
should include those extensions, too.  For example, if using a filter to
convert .pdf to .html, you need to tell Swish-e that .pdf should be indexed by
the internal HTML parser:

    FileFilter  .pdf   pdf2html
    IndexContents  HTML  .pdf

See also the C<FileFilter> directive.

B<Note:> Some of this may be changed in the future to use content-types instead
of file extensions.  See L

=item DefaultContents [TXT|HTML|XML|TXT2|HTML2|XML2|TXT*|HTML*|XML*]

This sets the default parser for documents that are not specified in
B<IndexContents>.  If not specified the default is HTML.

The XML2, HTML2, and TXT2 parsers are currently only available when
Swish-e is configured to use libxml2.

You may use XML*, HTML*, and TXT* to select the parser automatically.
If libxml2 is installed then it will be used to parse the content.  Otherwise,
Swish-e's internal parsers will be used.


Example:

    DefaultContents HTML

The C<DefaultContents> directive I<should> be used when spidering, as HTML
files may be returned without a file extension (such as when requesting a
directory and the default index.html is returned).
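
A minimal sketch of how C<IndexContents> and C<DefaultContents> might be
combined when spidering (the extensions chosen are only examples):

    # Known extensions get an explicit parser ...
    IndexContents TXT* .txt .log
    IndexContents XML* .xml
    # ... and anything else, including URLs without an extension,
    # is parsed as HTML
    DefaultContents HTML*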


=item FileInfoCompression [yes|NO]

** This directive is currently not supported **

Setting B<FileInfoCompression> to C<yes> will compress the index file to save
disk space.  This may result in longer indexing times.  The default is C<no>.

Also see the C<-e> switch in L<SWISH-RUN|SWISH-RUN> for saving RAM during
indexing.


=back

=head2 Document Contents Directives

These directives control what information is extracted from your source
documents, and how that information is made available during searching.

=over 4

=item ConvertHTMLEntities [YES|no]

ASCII I<entities> can be converted automatically while indexing documents of
type HTML (not for HTML2).  For performance reasons you may wish to set this to
C<no> if your documents do not contain HTML entities.  The default is C<yes>.

If C<ConvertHTMLEntities> is set to C<no> the entities will be indexed without
conversion.

B<NOTE:> Entities within XML files and files parsed with libxml2 (HTML2) are
converted regardless of this setting.

=item MetaNames *list of names*

META names are a way to define "fields" in your XML and HTML documents.  You
can use the META names in your queries to limit the search to just the words
contained in that META name of your document.  For example, you might have a
META tagged field in your documents called C<subjects> and then you can search
your documents for the word "foo" but only return documents where "foo" is
within the C<subjects> META tag.

    swish-e -w subjects=foo

(See also the C<-t> switch in L<SWISH-RUN|SWISH-RUN> for information about
I<context> searching in HTML documents.)

The B<MetaNames> directive is a space separated list.  For example:

    MetaNames meta1 meta2 keywords subjects

You may also use C<UndefinedMetaTags> to specify automatic extraction of meta
names from your HTML and XML documents, and also to ignore indexing content of
meta tags.

META tags can have two formats in your B<HTML> source documents:

    <META NAME="meta1" CONTENT="some content">

and (if using the HTML2/libxml2 parser)

    <meta1>
        some content
    </meta1>

But this second version is invalid HTML, and will generate a warning if
ParserWarningLevel is set (libxml2 only).

And in B<XML> documents, use the format:

    <meta1>
        Some Content
    </meta1>

Then you can limit your search to just META B<meta1> like this:

    swish-e -w 'meta1=(apples or oranges)'

You may nest the XML and the start/end tag versions:

    <keywords>
        <tag1>
            some content
        </tag1>
        <tag2>
            some other content
        </tag2>
    </keywords>

Then you can search in both tag1 and tag2 with:

    swish-e -w 'keywords=(query words)'

Swish-e indexes all text as some metaname.  The default is C<swishdefault>, so
these two queries are the same:

    swish-e -w foo
    swish-e -w swishdefault=foo

When indexing HTML Swish-e indexes the HTML title as default text, so
when searching Swish-e will find matches in both the HTML body and the
HTML title.  Swish also, by default, indexes content of meta tags.  So:

    swish-e -w foo

will find "foo" in the body, the title, or any meta tags.

Currently, there's no way to prevent Swish-e from indexing the title contents
along with the body contents, but see C<UndefinedMetaTags> for how to control
the indexing of meta tags.

If you would like to search just the title text, you may use:

    MetaNames swishtitle

This will index the title text separately under the built-in swish
internal meta name "swishtitle".  You may then search like

    swish-e -w foo  -- search for "foo" in title, body (and undefined meta tags)
    swish-e -w swishtitle=foo -- search for "foo" in title only

In addition to swishtitle, you can limit searches to documents' path with:

   MetaNames swishdocpath

Then to search for "foo" but also limit searches to documents that include
"manual" or "tutorial" in their path:

   swish-e -w 'foo swishdocpath=(manual or tutorial)'

See also C<ExtractPath>.


=item MetaNameAlias *meta name* *list of aliases*

MetaNameAlias assigns aliases for a meta name.  For example, if your
documents contain meta tags "description", "summary", and "overview"
that all give a summary of your documents you could do this:

    MetaNames summary
    MetaNameAlias summary description overview

Then all three tags will get indexed as meta tag "summary".  You can
then search all the fields as:

    -w summary=foo

The aliases work at search time, too.  So these will also limit the search
to the "summary" meta name.

    -w description=foo
    -w overview=foo

=item MetaNamesRank integer *list of meta names*

You can assign a bias to metanames that will affect how ranking is
calculated.  The range of values is from -10 to +10, with zero being
no bias.

    MetaNamesRank 4 subject
    MetaNamesRank 3 swishdefault
    MetaNamesRank 2 author publisher
    MetaNamesRank -5 wrongwords

This feature is still considered experimental. If you use it, please send feedback
to the discussion list.

=item HTMLLinksMetaName *metaname*

Allows indexing of HTML links.  Normally, HTML links (href tags) are
not indexed by Swish-e.  This directive defines a metaname, and links
will be indexed under this meta name.

Example:

    HTMLLinksMetaName links

Now, to limit searches to files with a link to "home.html" do this:

    -w links='"home.html"'

The double quotes force a phrase search.    

To make Swish-e index links as normal text, you may use:

    HTMLLinksMetaName swishdefault

This feature is only available with the libxml2 HTML parser.    

=item ImageLinksMetaName *metaname*

Allows indexing of image links under a metaname.  Normally, image URLs
are not indexed.

Example:

    ImageLinksMetaName images

Now, if you would like to find pages that include a nice image of a beach:

    -w images='beach'

To make Swish-e index links as normal text, you may use:

    ImageLinksMetaName swishdefault

This feature is only available with the libxml2 HTML parser.


=item IndexAltTagMetaName *tagname*|as-text

Allows indexing of image E<lt>IMGE<gt> ALT tag text.  Specify either a tag name which will be
used as a metaname, or the special text "as-text" which says to index the ALT text as
if it were plain text at the current location.

For example, by specifying a tag name:

   IndexAltTagMetaName bar

would make this markup:

    <foo>
        <img src="image.gif" alt="Alt text here">
    </foo>

appear like

    <foo>
        <bar>Alt text here</bar>
    </foo>

Then the normal rules (C<MetaNames> and C<PropertyNames>) apply to how that
text is indexed.

If you use the special tag "as-text" then

    <foo>
        <img src="image.gif" alt="Alt text here">
    </foo>

simply becomes

    <foo>
        Alt text here
    </foo>

This feature is only available when using the libxml2 parser (HTML2 and XML2).    


=item AbsoluteLinks [yes|NO]

If this is set true then Swish-e will attempt to convert relative URIs
extracted from HTML documents for use with C<HTMLLinksMetaName> and
C<ImageLinksMetaName> into absolute URIs.  Swish-e will use any E<lt>BASEE<gt> tag
found in the document, otherwise it will use the file's pathname.  The pathname
used will be the pathname *after* C<ReplaceRules> has been applied to the
document's pathname.

For example, say you wish to index image links under the metaname
"images".

    ImageLinksMetaName images

If an image is located in http://localhost/vacations/france/index.html and
C<AbsoluteLinks> is set to no, then an image within that document:

     <img src="beach.jpeg">

will only index "beach.jpeg".

But, if you want more detail when searching, you can enable C<AbsoluteLinks>
and Swish-e will index "http://localhost/vacations/france/beach.jpeg".  You can
then look for images of beaches, but only in France:

    -w images=(beach and france)

This also means you can search for any images within France:

    -w images=(france)

This feature is only available with the libxml2 HTML parser.    
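
Putting the pieces of this example together, the configuration might look like
the following sketch.  The C<IndexContents> line is an assumption added here to
select the libxml2 parser for C<.html> files:

    IndexContents HTML2 .html
    ImageLinksMetaName images
    AbsoluteLinks yes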

=item UndefinedMetaTags [error|ignore|INDEX|auto]

This directive defines the behavior of Swish-e during indexing when a
meta name is found but is B<not> listed in B<MetaNames>.  There are
four choices:


=over 2

=item error

If a meta name is found that is not listed in B<MetaNames>
then indexing will be halted and an error reported.

=item ignore

The contents of the meta tag are ignored and B<not> indexed unless a metaname
has been defined with the C<MetaNames> directive.

=item index

The contents of the meta tag are indexed, but placed in the
main index unless there's an enclosing metatag already in force. This
is the default.

=item auto

This method creates meta tags automatically for HTML meta names
and XML elements.  Using this is the same as specifying all the meta
names explicitly in a B<MetaNames> directive.

=back

=item UndefinedXMLAttributes [DISABLE|error|ignore|index|auto]

This is similar to C<UndefinedMetaTags>, but only applies to XML documents
(parsed with libxml2).  This allows indexing of attribute content, and provides
a way to index the content under a metaname.  For example,
C<UndefinedXMLAttributes> can make

    <person age="23">
          John Doe
    </person>

look like the following to swish:

    <person>
        <person.age>
            23
        </person.age>
        John Doe
    </person>

What happens to the text "23" will depend on the setting of
C<UndefinedXMLAttributes>:

=over 2

=item disable

XML attributes are not parsed and not indexed.  This is the default.

=item error

If the concatenated meta name (e.g. person.age) is not listed in
B<MetaNames> then indexing will be halted and an error reported.

=item ignore

The contents of the meta tag are ignored and B<not> indexed unless a metaname
has been defined with the C<MetaNames> directive.

=item index

The contents of the meta tag are indexed, but placed in the main index
unless there's an enclosing metatag already in force.

=item auto

This method will create meta tags from the combined element and attributes
(and XML Class name).  This option should be used with caution as it can
generate a lot of metaname entries.

See also the example below under C<XMLClassAttributes>.


=back

=item XMLClassAttributes *list of XML attribute names*

Combines an XML class name with the element name to make up a metaname.
For example:

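    XMLClassAttributes class

    <person class="first">
        John
    </person>
    <person class="last">
        Doe
    </person>

Will appear to Swish-e as:

    <person>
        <person.first>
        John
        </person.first>
    </person>
    <person>
        <person.last>
        Doe
        </person.last>
    </person>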

How the data is indexed depends on C<MetaNames> and C<PropertyNames>.

Here's an example using the following configuration which combines the two
directives C<UndefinedXMLAttributes> and C<XMLClassAttributes>.

    XMLClassAttributes class
    UndefinedMetaTags auto
    UndefinedXMLAttributes auto
    IndexContents XML2 .xml

The source XML file looks like:    

      John 
    Bill 

Swish-e parses as:

    ./swish-e -c 2 -i 1.xml -T parsed_tags  parsed_text  -v 0
    Indexing Data Source: "File-System"

     (MetaName)

         (MetaName)
             (MetaName)
                 (MetaName)
                    555-1212
                 
                 (MetaName)
                    102
                 
                John
         

         (MetaName)
             (MetaName)
                howdy
             
            Bill
         

     
    Indexing done!

One thing to note is that the first E<lt>personE<gt> block finds a class name
"student" so all metanames that are created from attributes use the
combined name "person.student".  The second E<lt>personE<gt> block doesn't contain
a "class" so the attribute name is combined directly with the element
name (e.g. "person.greeting").

=item ExtractPath *metaname* [replace|remove|prepend|append|regex]

This directive can be used to index extracted parts of a document's path.
A common use would be to limit searches to specific areas of your
file tree.

The extracted string will be indexed under the specified meta name.

See C<ReplaceRules> for a description of the various pattern replacement
methods, but you will use the I<regex> method.

For example, say your file system (or web tree) was organized into departments:

    /web/sales/foo...
    /web/parts/foo...
    /web/accounting/foo...

And you wanted a way to limit searches to just documents under "sales".

    ExtractPath department regex !^/web/([^/]+)/.*$!$1!

Which says, extract out the department name (as substring $1) and index it as
meta name C<department>.  Then to limit a search to the sales department:

    swish-e -w foo AND department=sales

Note that the C<regex> method uses a substitution pattern, so to index only a
sub-string match the I<entire> document path in the regular expression, as
shown above.  Otherwise any part that is not matched will end up in the
substitution pattern.

See the C<ExtractPathDefault> option for a way to set a value if no patterns
match.

Although unlikely, you may use more than one C<ExtractPath> directive.  More
than one directive of the I<same> meta name will operate successively (in order
listed in the configuration file) on the path.  This allows you to use regular
expressions on the results of the previous pattern substitution (as if piping
the output from one expression to the pattern of the next).

    ExtractPath foo regex !^(...).+$!$1!
    ExtractPath foo regex !^.+(.)$!$1!

So, the third letter is indexed as meta name "foo" if both patterns match.    

    ExtractPath foo regex !^X(...).+$!$1!
    ExtractPath foo regex !^.+(.)$!$1!

Now (note the "X"), if the first pattern doesn't match, the last character of
the path name is indexed.  You must be clear on this behavior if you are using
more than one C<ExtractPath> directive with the same metaname.

The document path operated on is the real path swish used to access the
document.  That is, the C<ReplaceRules> directive has no effect on the path
used with C<ExtractPath>.

The full path is used for each meta name if more than one C<ExtractPath>
directive is used.  That is, changes to the path used in C<ExtractPath foo> do
not affect the path used by C<ExtractPath bar>.
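
A sketch of that last point, using two different meta names.  The meta name
C<extension> and its pattern are hypothetical and are not taken from the
Swish-e distribution:

    # Both patterns are applied to the original document path
    ExtractPath department regex !^/web/([^/]+)/.*$!$1!
    ExtractPath extension  regex !^.*\.([^./]+)$!$1!

For a path such as F</web/sales/foo.html>, the first directive should extract
"sales" and the second "html"; the second pattern operates on the full path,
not on the "sales" string left by the first.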

=item ExtractPathDefault *metaname* default_value

This can be used with C<ExtractPath> to set a default string to index under the
given metaname if none of the C<ExtractPath> patterns match.

For example, say you want to index each document with a metaname
"department" based on the following path examples:

    /web/sales/foo...
    /web/parts/foo...
    /web/accounting/foo...

But you are also indexing documents that do not follow that pattern and you want to search those
separately, too.

    ExtractPath department regex !^/web/([^/]+)/.*$!$1!
    ExtractPathDefault department other

Now, you may search like this:

    -w foo department=(sales)      - limit searches to the sales documents
    -w foo department=(parts)      - limit searches to the parts documents
    -w foo department=(accounting) - limit searches to the accounting documents
    -w foo department=(other)      - everything but sales, parts, and accounting.

This basically is a shortcut for:

    -w foo not department=(sales or parts or accounting)

but you don't need to keep track of what was extracted.    

=item PropertyNames *list of meta names*

=item PropertyNamesCompareCase *list of meta names*

=item PropertyNamesIgnoreCase *list of meta names*

Swish-e allows you to specify certain META tags that can be used as B<document
properties>.  The contents of any META tag that has been identified as a
document property can be returned as part of the search results along with the
rank, file name, title, and document size (see the C<-p> and C<-x> switches in
L<SWISH-RUN|SWISH-RUN>).

Properties are useful for returning additional data from documents in
search results -- this saves the effort of reading and parsing the source
files while reading Swish-e search results, and is especially useful
when the source documents are no longer available or slow to access
(e.g. over http).

Another feature of properties is that Swish-e can use the PropertyNames for
sorting the search results (see the C<-s> switch).

    PropertyNames author subjects

Two variations are available: C<PropertyNamesCompareCase> and
C<PropertyNamesIgnoreCase>.  These tell Swish-e to either ignore or compare
case when sorting results.  The default for C<PropertyNames> is to ignore the
case.

    PropertyNamesIgnoreCase subject
    PropertyNamesCompareCase keyword

The defaults for "internal" properties are:

    swishtitle          --  ignore the case
    swishdocpath        --  compare case
    swishdescription    --  compare case

These can be overridden with C<PropertyNamesCompareCase> and
C<PropertyNamesIgnoreCase>.

    PropertyNamesCompareCase swishtitle    

Use of PropertyNames will increase the size of your index files, sometimes
significantly.  Properties will be compressed if Swish-e is compiled with zlib
as described in the L<INSTALL|INSTALL> manual page.

If Swish-e finds more than one property of the same name in a document
the property's contents will be concatenated for strings, and a warning
issued for numeric (or date) properties.

=item PropertyNamesNoStripChars

PropertyNamesNoStripChars specifies that the listed properties should not
have strings of low ASCII characters replaced with a space character.
Properties will be stored as found in the document.

When printing properties with the swish-e binary newlines are replaced with
a space character.  Use the swish-e library (or SWISH::API perl module) to
fetch properties without newlines replaced.


=item PropertyNamesNumeric

This directive is similar to C<PropertyNames>, but it flags the property as
being a string of digits (integer value) that will be stored as binary data
instead of a string.  This allows sorting with C<-s> and limiting with C<-L> to
sort and limit the property correctly.

Swish-e uses C<strtoul(3)> to convert the string into an unsigned long integer.
Therefore, only positive integers can be stored.

Future versions of Swish-e may be able to store different property types
(such as negative integers and real numbers).  This directive may change
in future releases of Swish.

=item PropertyNamesDate

This directive is exactly like C<PropertyNamesNumeric>, but it also flags the
number as a machine timestamp (seconds since Epoch), and will print a formatted
date when returning this property.  See C<-x> in L<SWISH-RUN|SWISH-RUN>.

Swish-e will not parse dates when indexing; you must use a timestamp.

=item PropertyNameAlias  *property name* *list of aliases*

This allows aliases for a property name.  For example, if you are indexing
HTML files, plus XML files that are written in English, German, and
Spanish and thus use the tags "title", "titel", and "título" you can use:

    PropertyNameAlias swishtitle title titel título titulo

Note that "swishtitle" is the built-in property used to store the title of
a document, and therefore you do not need to specify it as a PropertyName
before use.

=item PropertyNamesMaxLength  integer *list of meta names*

This option will set the max length of the text stored in a property.
You must specify a number between 0 and the max integer size on your
platform, and a list of properties.  The properties specified must not
be aliases.

If any of the property names do not exist they will be created (e.g. you
do not need to define the property with PropertyNames first).

In general, this feature will only be useful when parsing HTML or XML
with the libxml2 parser.

For example:

    PropertyNamesMaxLength 1000 swishdescription
    PropertyNameAlias swishdescription body

Is somewhat like

    StoreDescription HTML  1000
    StoreDescription XML  1000
    StoreDescription HTML2  1000
    StoreDescription XML2  1000

but StoreDescription allows setting the tag for each parser type.

    PropertyNamesMaxLength 1000 headings
    PropertyNameAlias headings h1 h2 h3 h4

collects all the heading text into a single property called "headings", not
to exceed 1000 characters.

=item PropertyNamesSortKeyLength  integer *list of meta names*

Sets the length of the string used when sorting.
The default is 100 characters.  The -T metanames debugging option will
list the current values for an index.

This setting is used when sorting during indexing, and perhaps when sorting
while searching.  It also affects the order when limiting to a range of values
with the -L option.

=item PreSortedIndex *list of property names*

By default Swish-e generates presorted tables while indexing for each
property name.  This allows faster sorting when generating results.
On large document collections this presorting may add to the indexing
time, and also adds to the total size of the index.  This directive can
be used to customize exactly which properties will be presorted.

If C<PreSortedIndex> is I<not> present in the config file (default action),
all the properties will be presorted at indexing time.  If it is present
without any parameter, no properties will be presorted.  Otherwise, only the
property names specified will be presorted.

For example, if you only wish to sort results by a property called C<title>:

    PropertyNames title age time
    PreSortedIndex  title


=item StoreDescription [XML E<lt>tagE<gt> size|HTML E<lt>metaE<gt> size|TXT size]

B<StoreDescription> allows you to store a document description in the index
file.  This description can be returned in your search results when the C<-x>
switch is used to include the I<swishdescription> for extended results, or by
using C<-p swishdescription>.

The document type (XML, HTML and TXT) must match the document type currently
being indexed as set by C<IndexContents> or C<DefaultContents>.  See those
directives for possible values.  A common problem is using C<StoreDescription>
yet not setting the document's type with C<IndexContents> or
C<DefaultContents>.  Another problem is different types:

    IndexContents HTML2 .html
    StoreDescription HTML <body>

Then .html documents are assigned a type of HTML2 (and parsed by the libxml2 parser), but the
description will not be stored since it is type HTML instead of HTML2.
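
A sketch of a consistent pairing, in which the type given to
C<StoreDescription> is the same as the one assigned by C<IndexContents> (the
20000 character limit is arbitrary):

    IndexContents HTML2 .html
    StoreDescription HTML2 <body> 20000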

For text documents you specify the type TXT (or TXT2 or TXT*) and the number of I<characters> to capture.

    StoreDescription TXT 20

The above stores only the first twenty characters from the text file in the Swish-e index
file.

For HTML and XML file types, specify the tag to use for the
description, and optionally the number of characters to capture.  If no
number is specified, Swish-e will capture the entire contents of the tag.

    StoreDescription HTML <body> 20000
    StoreDescription XML  <desc> 40

Again, note that documents must be assigned a document type with
C<IndexContents> or C<DefaultContents> to use this feature.

Swish-e will compress the descriptions (or any other large property) if
compiled to use zlib (see L<INSTALL|INSTALL>).  This is recommended when using
StoreDescription and a large number of documents.  Compression of 30% to 50% is
not uncommon with HTML files.

=item PropCompressionLevel [0-9]

This directive sets the compression level used when storing properties
to disk.  A setting of zero is no compression, and a setting of nine is
the most compression.

The default depends on the default setting compiled with zlib, but is
typically six.

This option is useful when using C<StoreDescription> to store a large amount of
text in properties (or if using C<PropertyNames> with large property sizes).

Properties must be over a value defined in F<config.h> (100 is the
default) before compression will be attempted.  Swish-e will never store
the results of the compression if the compressed data is larger than
the original data.

This option is only available when Swish-e is compiled with zlib support.


=item TruncateDocSize *number of characters*

TruncateDocSize limits the size of a document while indexing documents
and/or using filters.  This directive truncates the number of bytes read
from a document to the specified size: if a document is larger, only the
specified number of bytes is read.

Example:

    TruncateDocSize    10000000

The default is zero, which means read all data.


Warning: If you use TruncateDocSize, use it with care!  TruncateDocSize
is a safety belt only, to limit e.g. filter output when accessing
databases, or to limit "runaway" filters.  Truncating document input may
destroy document structures for Swish-e (e.g. swish may miss closing
tags for XML or HTML documents).

TruncateDocSize does not currently work with the C<prog> input source method.

=item FuzzyIndexingMode NONE|Stemming|Soundex|Metaphone|DoubleMetaphone

Selects the type of index to create.  Only one type of index may be created.

It's a good idea to create both a normal index and a fuzzy index and
allow your search interface to select which index to use.  Many people find the
fuzzy searches to be too fuzzy.

The available fuzzy indexing options can be displayed by running

   swish-e -T LIST_FUZZY_MODES

Available options include:

=over 4

=item None

Words are stored in the index without any conversion.  This is the default.

=item Stemming_*

This option uses one of the installed Snowball stemmers (http://snowball.tartarus.org/).

The installed stemmers can be viewed by running

   swish-e -T LIST_FUZZY_MODES

For example, to use the Spanish stemming module:

   FuzzyIndexingMode Stemming_es


=item Stem or Stemming_en

B<**This option is no longer supported.**>

Selects the legacy Swish-e English stemmer.

This is deprecated in favor of the Snowball English stemmer Stemming_en1.

Words are converted using the Porter stemming algorithm.

From: http://www.tartarus.org/~martin/PorterStemmer/

    The Porter stemming algorithm (or Porter stemmer) is a
    process for removing the commoner morphological and inflexional
    endings from words in English. Its main use is as part of a
    term normalisation process that is usually done when setting up
    Information Retrieval systems.


This will help a search for "running" to also find "run" and "runs", for example.

The stemming function does not convert words to their root, rather
programmatically removes endings on words in an attempt to make similar
words with different endings stem to the same string of characters.
It's not a perfect system, and searches on stemmed indexes often return
curious results.  For example, two entirely different words may stem to
the same word.

Stemming also can be confusing when used with a wildcard (truncation).
For example, you might expect to find the word "running" by searching for
"runn*".  But this fails when using a stemmed index, as "running" stems to
"run", yet searching for "runn*" looks for words that start with "runn".

=item Soundex

Soundex was developed in the 1880s so records for people with similar
sounding names could be found more readily.  Soundex is a coded surname
based on the way a surname sounds rather than spelling.  Surnames that
sound similar, like Smith and Smyth, are filed together under the same
Soundex code.  This is mostly useful for US English.

Soundex should not be used to search for sound-alike words.  Metaphone
would be more appropriate for generic sound matching of words.  Soundex
should only be used where you need to search multiple documents for
proper names which sound similar.  This is primarily used for indexing
genealogical records.  This may be useful for indexing other collections
of data consisting mostly of names.  Many common name variations are
matched by Soundex.  The only notable exception is the first letter of
the name.  The first letter is not matched for sound.

=item Metaphone and DoubleMetaphone

Words are transformed into a short series of letters representing the sound of the word (in English).
Metaphone algorithms are often used for looking up mis-spelled words in dictionary programs.

From: http://aspell.sourceforge.net/metaphone/

    Lawrence Philips' Metaphone Algorithm is an algorithm which returns
    the rough approximation of how an English word sounds.

The C<DoubleMetaphone> mode will sometimes generate two different metaphones
for the same word.  This is supposed to be useful when a word may be pronounced
more than one way.

A metaphone index should give results somewhere in between Soundex and Stemming.    

=back
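
One way to follow the advice above about keeping both a normal and a fuzzy
index is to use two configuration files.  This is a sketch only; the file names
and index paths are hypothetical:

    # normal.config
    IndexFile ./index.normal
    FuzzyIndexingMode None

    # fuzzy.config
    IndexFile ./index.fuzzy
    FuzzyIndexingMode Metaphone

Each file would be passed to a separate indexing run with C<-c>, and the search
interface would then choose which of the two index files to search.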

=item UseStemming [yes|NO]

Set to yes to apply the word stemming algorithm during indexing, otherwise no.

    UseStemming no
    UseStemming yes

When UseStemming is set to C<yes> every word is stemmed before placing it into
the index.

This option is deprecated.  It has been superseded by C<FuzzyIndexingMode>.

=item UseSoundex [yes|NO]

When UseSoundex is set to C<yes> every word is converted to a Soundex code
before placing it into the index.

This option is deprecated.  It has been superseded by C<FuzzyIndexingMode>.

=item IgnoreTotalWordCountWhenRanking [YES|no]

Set to yes to ignore the total number of words in the file when calculating
ranking.  This is often better with merges and small files.  The default is yes.

    IgnoreTotalWordCountWhenRanking no

The default was changed from no to yes in version 2.2.

B<NOTE:> This directive must be set to B<no> if you intend to use the C<-R 1>
option when searching.
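
For example, an index that is meant to be searched with C<-R 1> could be built
with the following setting.  This is only a sketch; the index name
F<docs.index> and the search word are placeholders.

    IgnoreTotalWordCountWhenRanking no

The index could then be searched with, for example:

    swish-e -f docs.index -w apple -R 1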

=item MinWordLimit *integer*

Set the minimum length of a word. Shorter words will not be indexed.
The default is 1 (as defined in F<src/config.h>).

    MinWordLimit 5

=item MaxWordLimit *integer*

Set the maximum length of an indexable word.  Longer words will not
be indexed.  The default is 40 (as defined in F<src/config.h>).
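
For example, to skip very long tokens such as encoded strings (the value 30 is
only an illustration, not a recommendation):

    MaxWordLimit 30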

=item WordCharacters *string of characters*

=item IgnoreFirstChar *string of characters*

=item IgnoreLastChar *string of characters*

=item BeginCharacters *string of characters*

=item EndCharacters *string of characters*


These settings define what a word consists of to the Swish-e indexing engine.
Compiled in defaults are in F<src/config.h>.

When indexing, Swish-e uses B<WordCharacters> to split up the document
into words.  Words are defined by any string of non-blank characters
that contains only the characters listed in WordCharacters.  If a string
of characters includes a character that is not in WordCharacters then
the word will be split into two or more separate words.

For example:

    WordCharacters abde

would turn "abcde" into two words, "ab" and "de".

Next, of these words, any characters defined in B<IgnoreFirstChar> are
stripped off the start of the word, and B<IgnoreLastChar> characters
are stripped off the end of the word.  This allows, for example,
periods within a word (www.slashdot.com), but not at the end of
a word.  Characters in IgnoreFirstChar and IgnoreLastChar must be in
WordCharacters.

Finally, the resulting words MUST begin with one of the characters
listed in B<BeginCharacters> and end with one of the characters listed in
B<EndCharacters>.  BeginCharacters and EndCharacters must be a subset of
the characters in WordCharacters.  Often, WordCharacters, BeginCharacters
and EndCharacters will all be the same.

Note that the same process applies to the query while searching.

Getting these settings correct will take careful consideration and practice.
It's helpful to create an index of a single test file, and then look at the
words that are placed in the index (see the C<-v 4>, C<-D> and C<-k> searching
switches).
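
A minimal sketch of such a test run, assuming a single file F<test.html>, a
configuration file F<test.config> and a scratch index F<test.index> (all three
names are placeholders):

    swish-e -c test.config -i test.html -f test.index -v 4
    swish-e -f test.index -k '*'

The first command indexes the one file verbosely.  The second should list the
words stored in the index, which can then be compared with what you expected.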

Currently there is only support for eight-bit characters.

Example:

    WordCharacters  .abcdefghijklmnopqrstuvwxyz
    BeginCharacters abcdefghijklmnopqrstuvwxyz
    EndCharacters   abcdefghijklmnopqrstuvwxyz
    IgnoreFirstChar .
    IgnoreLastChar  .

So the string

    Please visit http://www.example.com/path/to/file.html.

will be indexed as the following words:

    please
    visit
    http
    www.example.com
    path
    to
    file.html

Which means that you can search for C<www.example.com> as a single word, but
searching for just C<example> will not find the document.

Note: when indexing HTML documents HTML entities are converted to their
character equivalents before being processed with these directives.  This is a
change from previous versions of Swish-e where you were required to include the
characters C<0123456789&#;> to index entities.  See also C<ConvertHTMLEntities>.

=item Buzzwords [*list of buzzwords*|File: path]

The Buzzwords option allows you to specify words that will be indexed
regardless of WordCharacters, BeginCharacters, EndCharacters, stemming,
soundex and many of the other checks done on words while indexing.

Buzzwords are case insensitive.

Buzzwords should be separated by spaces and may span multiple directives.  If
the special format C<File:filename> is used then the Buzzwords will be read
from an external file during indexing.

Examples:

    Buzzwords C++ TCP/IP

    Buzzwords File: ./buzzwords.lst

If a Buzzword contains search operator characters they must be backslashed
when searching.  For example:

    Buzzwords C++ TCP/IP web=http

    ./swish-e -w 'web\=http'

Buzzwords are found by splitting the text on whitespace, removing
C<IgnoreFirstChar> and C<IgnoreLastChar> characters from the word, and then
comparing with the list of C<Buzzwords>.  Therefore, if adding C<Buzzwords> to
an index you will probably want to define C<IgnoreFirstChar> and
C<IgnoreLastChar> settings.

Note: Buzzwords specific settings for C<IgnoreFirstChar> and C<IgnoreLastChar>
may be used in the future.

=item CompressPositions  [yes|NO]

This option enables zlib compression for individual word data in the index file.
The default is NO, that is the index word data is not compressed by default.

Enabling this option can reduce the size of the index file, but at the expense of
slower wildcard search times.

The default changed from YES to NO starting with version 2.4.3.
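
To turn compression back on, for example when index size matters more than
wildcard search speed:

    CompressPositions yes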




=item IgnoreWords [*list of stop words*|File: path]

The IgnoreWords option allows you to specify words to ignore, called
I<stopwords>.  The default is to not use any stopwords.

Words should be separated by spaces and may span multiple directives.  If the
special format C<File:filename> is used then the stop words will be read from
an external file during indexing.

In previous versions of Swish-e you could use the directive

    IgnoreWords swishdefault - obsolete!

to include a default list of compiled in stopwords.  This keyword is no
longer supported.

Examples:

    IgnoreWords www http a an the of and or

    IgnoreWords File: ./stopwords.de

=item UseWords [*list of words*|File: path]

UseWords defines the words that Swish-e will index.  B<Only> the words
listed will be indexed.

You can specify a list of words following the directive (you may specify more
than one C<UseWords> directive in a config file), and/or use the C<File:> form
to specify a path to a file containing the words:

    UseWords perl python pascal fortran basic cobol php
    UseWords File: /path/to/my/wordlist

Please drop the Swish-e list a note if you actually use this feature.
It may be removed from future versions.

=item IgnoreLimit *integer integer*

This automatically omits words that appear too often in the files (these
words are called stopwords). Specify a whole percentage and a number,
such as "80 256". This omits words that occur in over 80% of the files
and appear in over 256 files. Comment out to turn off auto-stopwording.

    IgnoreLimit 50 1000

Swish-e must do extra processing to adjust the entire index when this
feature is used.  Instead of using this feature, it is recommended
that you decide which words are stopwords and add them to B<IgnoreWords>
in your configuration file.  To do this, use IgnoreLimit one time and
note the stop words that are found while indexing.  Add this list to
IgnoreWords, and then remove IgnoreLimit from the configuration file.
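
As a sketch of that two-step approach (the percentage, count and words below
are only placeholders):

    # Step 1: index once with auto-stopwording and note the words reported
    IgnoreLimit 80 256

    # Step 2: replace the line above with an explicit list
    IgnoreWords the and of to in for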

=item IgnoreMetaTags *list of names*

C<IgnoreMetaTags> defines a list of metatags to ignore while indexing XML files
(and HTML files if using libxml2 for parsing HTML).  All text within the tags
will be ignored -- both for indexing (C<MetaNames>) and properties
(C<PropertyNames>).  To still parse properties without indexing the text, see
C<UndefinedMetaTags>.

This option is useful to avoid indexing specific data from a file.
For example:

    <person>
        <first_name>
            William
        </first_name> <last_name>
            Shakespeare
        </last_name> <updated_date>
            April 25, 1999
        </updated_date>
    </person>

In the above example you might B<not> want to index the updated date,
and therefore prevent finding this record by searching

    -w 'person=(April)'

This is solved by:

    IgnoreMetaTags updated_date


See also C<UndefinedMetaTags>.

=item IgnoreNumberChars *list of characters*

Experimental Feature

This experimental feature can be used to define a set of characters that
describe a number.  If a word is found to contain only those characters it will
not be indexed.  The characters listed must be part of C<WordCharacters>
settings.  In other words, the "word" checked is a word that Swish-e would
otherwise index.

For example,

    IgnoreNumberChars 0123456789$.,

Then Swish-e would not index the following:

    123
    123,456.78
    $123.45

You might be tempted to avoid indexing hex numbers with:

    IgnoreNumberChars 0123456789abcdef

which will not index 0D31, but will also not index the word "bad".

This is an experimental feature that may change in future versions.
One possible change is to use regular expressions instead.


=item IndexComments [NO|yes]

This option lets you decide whether to index the contents of HTML
comments.  The default is no.  Set to yes if comment indexing is required.

    IndexComments yes

Note: This is a change from the default behavior prior to version 2.2.

=item TranslateCharacters [*string1 string2*|:ascii7:]

The TranslateCharacters directive maps the characters in string1 to the
characters listed in string2.

For example:

    # This will index a_b as a-b and ámo as amo
    TranslateCharacters _á -a

C<TranslateCharacters :ascii7:> is a predefined set of characters that will
translate eight bit characters to ascii7 characters.  Using the :ascii7: rule
will translate "Ääç" to "aac". This means: searching "Çelik", "çelik" or
"celik" will all match the same word.

TranslateCharacters is done early in the indexing process, after
converting HTML entities but before splitting the input text into words
based on B<WordCharacters>.  So characters you are translating I<from>
do not need to be listed in word characters.

The same character translations take place when searching.
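
Because translation happens before word splitting, it is the characters being
translated I<to> that matter for C<WordCharacters>.  A minimal sketch, assuming
you want C<a_b> indexed as the single word C<a-b>:

    TranslateCharacters _ -
    WordCharacters  abcdefghijklmnopqrstuvwxyz-
    BeginCharacters abcdefghijklmnopqrstuvwxyz
    EndCharacters   abcdefghijklmnopqrstuvwxyz

Here C<_> does not appear in C<WordCharacters>, but C<-> does.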

=item BumpPositionCounterCharacters *string*

When indexing Swish-e assigns a word position to each word.  This enables
phrase searching.  There may be cases where you would like to prevent
phrase matching.  The BumpPositionCounterCharacters directive allows
you to specify a set of characters that when found in the text will
increment the word position -- effectively preventing phrase matches
across that character.

For example, if you have a tag:

    <subjects>
        computer programming | apple computers
    </subjects>

You might want to prevent matching "programming apple" in that meta name.

    BumpPositionCounterCharacters |

There is no default, and you may list a string of characters.
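
Several characters may be given in one string.  As a sketch, assuming the
subject lists are separated by either C<|> or C<;> and that C<subjects> is
defined as a metaname:

    BumpPositionCounterCharacters |;

With the C<subjects> example above, a phrase search such as

    -w 'subjects=("programming apple")'

should then no longer match, while a search for the phrase "apple computers"
still does.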

=item DontBumpPositionOnEndTags *list of names*

=item DontBumpPositionOnStartTags *list of names*

Since metatags are typically separate data fields, the word position counter is
automatically bumped between metatags (actually, bumped when a start tag is
found and when an end tag is found).  This prevents matching a phrase that
spans more than one metaname.  C<DontBumpPositionOnEndTags> and
C<DontBumpPositionOnStartTags> disable this feature for the listed metanames.

For example,

    <person>
        <first_name>
            William
        </first_name>
        <last_name>
            Shakespeare
        </last_name>
        <updated_date>
            April 25, 1999
        </updated_date>
    </person>

In the configuration file:

    DontBumpPositionOnEndTags first_name
    DontBumpPositionOnStartTags last_name

This configuration allows this phrase search

    -w 'person=("william shakespeare")'

but this phrase search will fail

    -w 'person=("shakespeare april")'



=back


=head2 Directives for the File Access method only

Some directives have different uses depending on the source of the
documents.  These directives are only valid when using the B<File system>
method of indexing.

=over 4

=item IndexOnly *list of file suffixes*

This directive specifies the allowable file suffixes (extensions) while
indexing.  The default is to index all files specified in B<IndexDir>.

    # Only index .html .htm and .q files
    IndexOnly .html .htm .q

C<IndexOnly> checks that the file name ends in the characters listed.  It does not
check "extensions".  C<IndexOnly> is tested right before C<FileRules> is
processed.
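
Since only the end of the name is compared, a suffix does not have to begin
with a dot.  A sketch using made-up file names, assuming the comparison works
as described above:

    IndexOnly .html README

    # would accept:  docs/index.html   docs/README   docs/OLD_README
    # would reject:  docs/index.htm    docs/README.txt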

=item FollowSymLinks [yes|NO]

Put "yes" to follow symbolic links in indexing, else "no".  Default is no.

    FollowSymLinks no
    FollowSymLinks yes

Note that when set to C<no> extra stat(2) system calls must be made for each
file.  For a large number of files you may see a small reduction in indexing time
by setting this to C<yes>.

See also the C<-l> switch in L<SWISH-RUN|SWISH-RUN>.

=item FileRules [type] [contains|is|regex] *regular expression*

=item FileMatch [type] [contains|is|regex] *regular expression*

FileRules and FileMatch are used to, respectively, exclude and include files
and directories to index.  Since, by default, Swish-e indexes all files and
recurses all directories (but see also C<FollowSymLinks>) you will typically
only use C<FileRules> to exclude files or directories.  C<FileMatch> is useful
in a few cases, for example, to override the behavior of C<IndexOnly>.  Some
examples are included below.

Except for C<FileRules title ...>, this feature is only available for the file
access method (-S fs), which is the default indexing mode.  Also, any pathname
modification with C<ReplaceRules> happens after the check for C<FileRules>.
(It's unlikely that you would exclude files with C<FileRules> based on text you
added with C<ReplaceRules>!)

The regular expression is a C regex.h extended regular expression.
You may supply more than one regular expression per line, or use
separate directives.  Preceding the regular expression with the word
"not" negates the match.

The regular expression is compared against B<[type]> as described below.

For historical reasons, you can specify C<contains> or C<is>.  C<is> simply
forces the regular expression to match at the start and end of the string (by
internally prepending "^" and appending "$" to the regular expression).

The C<regex> option requires delimiter characters:

    FileRules title regex /^private/i

The only advantage of C<regex> is if you want to do case insensitive matches,
or simply like your regular expressions to look like perl regular expressions.
You must use matching delimiters; (), {}, and [], are not currently supported
for no good reason other than laziness.

Use quotes (" or ') around a pattern if it contains any white space.
Note that the backslash character becomes the escape character within
quotes.

For example, these sets generate the same regular expressions.

    FileRules title is hello
    FileRules title contains ^hello$
    FileRules title regex /^hello$/

These all need quotes due to the included space character

    FileRules title is "hello there"
    FileRules title contains "^hello there$"
    FileRules title regex "!^hello there$!"

These show how the backslash must be doubled inside of quotes.
Swish-e converts a double-backslash into a single backslash, and then
passes that single backslash on to the regular expression compiler.

    FileRules filename regex /\.pdf/
    FileRules filename regex "/\\.pdf/"

    FileRules filename regex !hello\\there!     # need double for real backslash 
    FileRules filename regex "!hello\\\\there!" # need double-double inside of quotes


B<Matching Types>

The following types of match strings may be supplied:

    FileRules pathname
    FileRules dirname
    FileRules filename
    FileRules directory
    FileRules title

    FileMatch pathname
    FileMatch filename
    FileMatch dirname
    FileMatch directory

B<pathname> matches the regular expression against the current pathname.  The
pathname may or may not be absolute depending on what you supplied to
C<IndexDir>.

Example:

    # Don't index paths that contain private or hidden
    FileRules pathname contains (private|hidden)

    # Same thing
    FileRules pathname regex /(private|hidden)/

    # Don't index exe files
    FileRules pathname contains \.exe$

B<dirname> and B<filename> split the path name by the last delimiter
character into a directory name, and a file name.  Then these are compared
against the patterns supplied.  Directory names do B<not> have a trailing
slash.  All path names use the forward slash as a delimiter within Swish-e.

Example:

    # Same as last example - don't index *.exe files.
    FileRules filename contains \.exe$

    # Don't index any file called test.html
    FileRules filename contains ^test\.html$

    # Same thing
    FileRules filename is test\.html

    # Don't index any directories that contain "old"  (/usr/local/myold/docs)
    FileRules dirname contains old

    # Don't index any directories that contain the path segment "old" (/usr/local/old/foo)
    FileRules dirname contains /old/  

    # Index only .htm, .html, plus any all-digit file names
    IndexOnly .htm .html
    FileMatch filename contains ^[0-9]+$

    # Same as previous, but maybe a little slower
    FileRules filename regex not !\.(htm|html)$!
    FileMatch filename contains ^[0-9]+$

Swish-e checks these settings in the order of C<pathname>, C<dirname>, and
C<filename>, and C<FileMatch> patterns are checked before C<FileRules>, in
general.  This allows you to exclude most files with C<FileRules>, yet allow in
a few special cases with C<FileMatch>. For example:

    # Exclude all files of .exe, .bin, and .bat
    FileRules filename contains \.(exe|bin|bat)$
    # But, let these two in
    FileMatch filename is baseball\.bat incoming_mail\.bin

    # Same, but as a single pattern
    FileMatch filename is (baseball\.bat|incoming_mail\.bin)

The C<directory> type is somewhat unique. When Swish-e recurses into a
directory it will compare all the I<files> in the directory with the pattern
and then decide if that entire directory should or should not be indexed (or
recursed).  Note that you are matching against file names in a directory -- and
some of those names may be directory names.

A C<FileRules directory> match will cause Swish-e to ignore all files and
sub-directories in the current directory.

Warning: A match with C<FileMatch directory> says to index B<everything> in the
I<current> directory and B<ignore> any FileRules for this directory.


Example:

    # Don't index any directories (and sub directories) that contain
    # a file (or sub-directory) called "index.skip"
    FileRules directory contains ^index\.skip$

    # Don't index directories that contain a .htaccess file.
    FileRules directory contains ^\.htaccess
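
The reverse case, C<FileMatch directory>, can be sketched the same way.  The
marker file name F<index.all> is made up for this illustration:

    # Index everything in a directory that contains a file (or
    # sub-directory) called "index.all", ignoring FileRules there
    FileMatch directory contains ^index\.all$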

Note: While I<processing> directories, Swish-e will ignore any files or
directories that begin with a dot (".").  You may index files or directories
that begin with a dot by specifying their name with C<IndexDir> or C<-i>.

C<title> checks for a pattern match in an HTML title.

Example:

    FileRules title contains construction example pointers

    # This example says to ignore case
    FileRules title regex "/^Internal document/i"

Note: C<FileRules title> works for any input method (fs, prog, or http) that is
parsed as HTML, and where a title was found in the document.

In case all this seems a bit confusing, processing a directory happens
in the following order.

First the directory name is checked:

    FileRules dirname - reject entire directory if matches

Next the directory is scanned and each file name (which might be the
name of a sub-directory) is checked:

    FileRules directory - reject entire dir if *any* files match
    FileMatch directory - accept entire dir if *any* files match

Then, unless C<FileMatch directory> matched, each file is tested with
FileMatch.  A match says to index the file without further testing (i.e.
overrides FileRules and IndexOnly):

    FileMatch pathname  \
    FileMatch dirname   - file is accepted if any match
    FileMatch filename  /

otherwise    

    IndexOnly - file is checked for the correct file extension

    FileRules pathname  \
    FileRules dirname   - file is rejected if any match
    FileRules filename  /

finally, the file is indexed.

Files (not directories) listed with C<IndexDir> or C<-i> are processed in a
similar way:

    FileMatch pathname  \
    FileMatch dirname   - file is accepted if any match
    FileMatch filename  /

otherwise, the file is rejected if it doesn't have the correct extension
or a FileRules matches.

    IndexOnly - file is checked for the correct file extension

    FileRules pathname  \
    FileRules dirname   - file is rejected if any match
    FileRules filename  /

Note:  If things are not indexing as you expect, create a directory with some
test files and use the C<-T regex> trace option to see how file names are
checked.  Start with very simple tests!
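
For example, with a scratch configuration F<test.config> and a test directory
F<testdir> (both names are placeholders):

    swish-e -c test.config -i testdir -T regex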


=back    

=head2 Directives for the HTTP Access Method Only

The HTTP Access method is enabled by the "-S http" switch when indexing.  It works by
running a Perl program called SwishSpider which fetches documents from a web server.

By default, only text files (content-type of "text/*") are indexed with the HTTP Access Method.
Other document types (e.g. PDF or MSWord) may be indexed as well: the SwishSpider will
attempt to make use of the SWISH::Filter module (included with the Swish-e distribution) to
convert such documents into a format that Swish-e can index.

Note: The -S prog method of spidering (using spider.pl) can be a replacement for the -S http method.
It offers more configuration options and better spidering speed.

The directives below are available when using the HTTP Access Method of indexing.

=over 4

=item MaxDepth *integer*

MaxDepth defines how many links the spider should follow before stopping.
A value of 0 configures the spider to traverse all links.  The default
is MaxDepth 0.

    MaxDepth 5

Note: The default was changed from 5 to 0 in release 2.4.0.

=item Delay *seconds*

The number of seconds to wait between issuing requests to a server.
This setting allows for more friendly spidering of remote sites.
The default is 5 seconds.

    Delay 1

Note: The default was changed from 60 to 5 seconds in release 2.4.0.

=item TmpDir *path*

The location of a writable temp directory on your system.  The HTTP access
method tells the Perl helper to place its files in this location, and the C<-e>
switch causes Swish-e to use this directory while indexing.  There is no
default.

    TmpDir /tmp/swish

If this directory does not exist or is not writable Swish-e will fail
with an error during indexing.

Note, the environment variables of C<TMPDIR>, C<TMP>, and C<TEMP> (in that
order) will B<override> this setting.
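
For example, from a Bourne-style shell the directory can be overridden for a
single run (the path and the configuration file name are placeholders):

    TMPDIR=/var/tmp/swish swish-e -S http -c http.config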

=item SpiderDirectory *path*

The location of the Perl helper script called F<swishspider>.  If you
use a relative directory, it is relative to your directory when you run
Swish-e, not to the directory that Swish-e is in.
The default is the location swishspider was installed.
Normally this does not need to be set.

    SpiderDirectory /usr/local/swish

=item EquivalentServer *server alias*

Often the same site may be referred to by different names.
A common example is that http://www.some-server.com and
http://some-server.com are the same.  Each line should have a list of
all the method/names that should be considered equivalent.  Multiple
EquivalentServer directives may be used.  Each directive defines its
own set of equivalent servers.

    EquivalentServer http://library.berkeley.edu http://www.lib.berkeley.edu
    EquivalentServer http://sunsite.berkeley.edu:2000 http://sunsite.berkeley.edu

=back
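
Putting these directives together, a minimal sketch of a configuration for the
HTTP Access Method might look like this.  The URL, index name and values are
placeholders, not recommendations:

    IndexDir http://localhost/index.html
    IndexFile ./http.index
    MaxDepth 3
    Delay 5
    TmpDir /tmp/swish
    EquivalentServer http://localhost http://127.0.0.1

Assuming the file is saved as F<http.config>, indexing would be started with:

    swish-e -S http -c http.config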

=head2 Directives for the prog Access Method Only

This section details the directives that are only available for the
"prog" document source feature of Swish-e.  The "prog" access method runs
an external program that "feeds" documents to Swish-e.  This allows indexing
and filtering of documents from any source.

See L<prog - general purpose access method|SWISH-RUN/"item_prog"> in the
SWISH-RUN man page for more information.


A number of example programs for use with the "prog" access method are
provided in the F<prog-bin> directory.  Please see those examples if you
have questions about implementing a "prog" input program.

=over 4

=item SwishProgParameters *list of parameters*

This is a list of parameters that will be sent to the external program
when running with the "prog" document source method.

    SwishProgParameters /path/to/config hello there
    IndexDir /path/to/program.pl

Then running:

    swish-e -c config -S prog

Swish-e will execute C</path/to/program.pl> and pass C</path/to/config hello
there> as three command line arguments to the program.  This directive makes it
easy to pass settings from the Swish-e configuration file to the external
program.

For example, the C<spider.pl> program (included in the C<prog-bin> directory)
uses the C<SwishProgParameters> to specify what file to read for configuration
information.

    SwishProgParameters spider.config
    IndexDir ./spider.pl

The C<spider.pl> program also has a default action so you can avoid using a
configuration file:

    SwishProgParameters default http://www.swishe.org/ http://some.other.site/
    IndexDir ./spider.pl

And the spider program will use default settings for spidering those sites.

Swish-e can read documents from standard input, so another way to run an external program
with parameters is:

    ./spider.pl spider.conf | ./swish-e -S prog -i stdin

=back

B<Notes when using MS Windows>

You should use unix style path separators to specify your external program.
Swish will convert forward slashes to backslashes before calling the external
program.  This is only true for the program name specified with C<IndexDir> or
the C<-i> command line option.

In addition, Swish-e will make sure the program specified actually exists,
which means you need to use the full name of the program.

For example, to run the perl spider program F<spider.pl> you would need
a Swish-e configuration file such as:

    IndexDir e:/perl/bin/perl.exe
    SwishProgParameters prog-bin/spider.pl default http://swish-e.org

and run indexing with the command:

    swish-e -c swish.cfg -S prog -v 9

The C<IndexDir> command tells Swish-e the name of the program to run.  Under
unix you can just specify the name of the script, since unix will figure out
the program from the first line of the script.

The C<SwishProgParameters> are the parameters passed to the program specified
by C<IndexDir> (perl.exe in this case).  The first parameter is the perl script
to run (F<prog-bin/spider.pl>).  Perl passes the rest of the parameters
directly to the perl script.  The second parameter F<default> tells the
F<spider.pl> program to use default settings for spidering (or you could
specify a spider config file -- see C<perldoc spider.pl> for details), and
lastly, the URL is passed into the spider program.


=head2 Document Filter Directives

Internally, Swish-e knows how to parse only text, HTML, and XML documents.
With "filters" you can index other types of documents.  For example,
if all your web pages are in gzip format a filter can uncompress these
on the fly for indexing.

You may wish to read the Swish-e FAQ question on filtering before continuing
here.  L<How Do I filter documents?|SWISH-FAQ/"How Do I filter documents?">

There are two suggested methods for filtering.

=head3 Filtering with SWISH::Filter

The Swish-e distribution includes a Perl module called SWISH::Filter and individual
filters located in the F<filters> directory.  This system uses plug-in filters to
extend the types of documents that Swish-e can index.  The plug-in filters do not
actually do the filtering, but rather provide a standard interface for accessing programs that
can filter or convert documents.  The programs that do the filtering are not part of
the Swish-e distribution; they must be downloaded and installed separately.

The advantage of this method is that new filtering methods can be installed easily.

This system is designed to work with the -S http and -S prog methods, but may
also be used with the C<FileFilter> feature and -S fs indexing method.  See
F<$prefix/share/doc/swish-e/examples/filter-bin/swish_filter.pl> for an
example.

See the F<filters/README> file for more information.

=head3 Filtering with the FileFilter feature

A filter is an external program that Swish-e executes while processing
a document of a given type.  Swish-e will execute the filter program
for each file that matches the file suffix (extension) set in the
B<FileFilter> or B<FileFilterMatch> directives.  B<FileFilterMatch>
matches using regular expressions and is described below.

Filters may be used with any type of input method (i.e. -S fs, -S http, or -S prog).

Swish-e calls the external program passing as B<default> arguments:

=over 4

=item $0 

the name of the filter program

=item $1

the physical path name of the file to read.  This may be a temporary
file location if indexing by the http method.

=item $2

When indexing under the file system this will be the same as $1 (the
path to the source file), but when indexing under the http method this
will be the URL of the source document.

=back

Swish-e can also pass other parameters to the filter program.  These
parameters can be defined using the B<FileFilter> or B<FileFilterMatch>
directives.  See Filter Options below.

The filter program must open the file, process its contents, and return
it to Swish-e by printing to STDOUT.

Note that this can add a significant amount of time to the indexing
process if your external program is a perl or shell script.  If you
have many files to filter you should consider writing your filter in C
instead of a shell or perl script, or using the "prog" Access Method
along with SWISH::Filter.

=over 4

=item FilterDir  *path-to-directory*

Deprecated.

This is the path to a directory where the filter programs are stored.
Swish-e looks in this directory to find the filter specified in the
B<FileFilter> directive.

This directive is not needed if the filter program can be found in
your system's path.  Even if your filter is not in your system's path
you can specify the full path to the filter in the FileFilter or
FileFilterMatch directives.


Example:

    FilterDir /usr/local/swish/filters

=item FileFilter   *suffix*   "filter-prog"   ["filter-options"]

This maps file suffix (extension) to a filter program.  If I<filter-prog>
starts with a directory delimiter (absolute path), Swish-e doesn't use
the FilterDir settings, but uses the given I<filter-prog> path directly.
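
A sketch of the difference, reusing filter names from the examples in this
section:

    FilterDir /usr/local/swish/filters

    # Relative name: looked up using FilterDir
    FileFilter .pdf pdf2html.sh

    # Absolute path: FilterDir is not consulted
    FileFilter .doc /usr/local/bin/catdoc "-s8859-1 -d8859-1 %p"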

On systems that have a working fork(2) system call the filter program
is run by forking swish then executing the filter.  This means the shell
is not used for running the filter and no arguments are passed through the
shell.

On other systems (e.g. Windows) the arguments are double-quoted and
popen(3) is used to run the program.  This does pass arguments through
the shell and may be a security concern depending on the abilities of
the shell.

Filter options:

Filter options are a string passed as arguments to the I<filter-prog>.
Filter options can contain variables, replaced by Swish-e.  If you omit
I<filter-options> Swish-e will use default parameters for the options
listed above.

    Default:      %p %P
    Which means:  pass   "workfile path" and "documentfile path" to filter.

Variables in filter options:

    %%   =  %
    %P   =  Full document pathname (e.g. URL, or path on filesystem)  
    %p   =  Full pathname to work file (maybe a tmpfile or the real document path on filesystem)
    %F   =  Filename stripped from full document pathname
    %f   =  Filename stripped from "work" pathname
    %D   =  Directoryname stripped from full document pathname
    %d   =  Directoryname stripped from full "work" pathname

Examples of strings passed:

    %P =  document pathname:  http://myserver/path1/mydoc.txt
    %p =  work pathname:      /tmp/tmp.1234.mydoc.txt
    %F =     mydoc.txt
    %f =     tmp.1234.mydoc.txt
    %D =     http://myserver/path1
    %d =     /tmp
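
As a worked example, take the C<pdftotext> filter line used in this section
and a file indexed with the file system method, where the work path is the
real document path.  The path F<docs/report.pdf> is made up:

    FileFilter .pdf pdftotext "%p -"

    # For docs/report.pdf, %p expands to the document path itself,
    # so the program is called roughly as:
    #     pdftotext docs/report.pdf -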




B<Notes when using MS Windows>

Windows uses double quotes to escape shell metacharacters, so if you need
to use quotes then use single quotes around the entire option string.

    FileFilter .mydoc mydocfilter.exe '--title "text with spaces"'

You can specify the filter program using forward slashes (unix style).
Swish will convert the slashes to backslashes before running your program.

    FileFilter .mydoc     c:/some/path/mydocfilter.exe  '-d "%d" -example -url "%P" "%f"'


Examples of filters:

    FileFilter .doc       /usr/local/bin/catdoc "-s8859-1 -d8859-1 %p"
    FileFilter .pdf       pdftotext   "%p -"
    FileFilter .html.gz   gzip  "-c %p"
    FileFilter .mydoc     "/some/path/mydocfilter"  "-d %d -example -url %P %f"

The above examples are running a I<binary> filter program.  For more
complicated filtering needs you may use a scripting language such as
Perl or a shell script.  Here are some examples of calling a shell script
and a Perl script:

    FileFilter .pdf       pdf2html.sh
    FileFilter .ps        ghostscript-filter.pl

Using a scripting language (or any language that has a large startup cost) can
B<greatly increase the indexing time>.  For small indexing jobs, this may not
be an issue, but for large collections of files that require processing by a
scripting language, you may be better off using the C<-S prog> access method
where the script will only be compiled once, instead of for each document.

Filters are probably easier to write than a C<-S prog> program.  Which you
decide to use depends on your requirements.  Examples of filter scripts can be
found in the F<filter-bin> directory, and examples of C<-S prog> programs can
be found in the F<prog-bin> directory.

=item FileFilterMatch   *filter-prog*   *filter-options*  *regex* [*regex* ...]

This is similar to C<FileFilter> except that it uses regular expressions to match
against the file name.  *filter-prog* is the path to the program.  Unlike
C<FileFilter> this does B<not> use the C<FilterDir> option.  Also unlike
C<FileFilter> you B<must> specify the *filter-options*.

Examples:

    FileFilterMatch ./pdftotext "%p -" /\.pdf$/

Note that this will also match a file called ".pdf", so you may want to use
something that requires a filename that has more than just an extension.
For example:

    FileFilterMatch ./pdftotext "%p -" /.\.pdf$/

To specify more than one extension:

    FileFilterMatch ./check_title.pl "%p" /\.html$/  /\.htm$/

Or a few ways to do the same thing:

    FileFilterMatch ./check_title.pl %p /\.(htm|html)$/
    FileFilterMatch ./check_title.pl %p /\.html?$/

And to ignore case:

    FileFilterMatch ./check_title.pl %p /\.html?$/i

You may also precede an expression with "not" to negate the regular expressions
that follow.  For example, to match files that do not have an extension:

    FileFilterMatch ./convert "%p %P" not /\..+$/
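
To try out expressions like the ones above before putting them in a config
file, a small C helper built on POSIX regcomp(3) can be used.  This is only a
sketch, and it assumes the patterns behave like POSIX extended regular
expressions.  C<name_matches()> is a hypothetical name.  The delimiting
slashes and the trailing C<i> are config file syntax, so they are not part of
the pattern passed to the helper.

```c
#include <regex.h>
#include <stddef.h>

/* Return 1 if name matches the POSIX extended regular expression pattern,
 * 0 if it does not, and -1 if the pattern fails to compile.  Pass a
 * non-zero ignore_case to get the effect of a trailing "i". */
int name_matches(const char *pattern, const char *name, int ignore_case)
{
    regex_t re;
    int flags = REG_EXTENDED | REG_NOSUB;
    int rc;

    if (ignore_case)
        flags |= REG_ICASE;
    if (regcomp(&re, pattern, flags) != 0)
        return -1;

    rc = regexec(&re, name, 0, NULL, 0);   /* 0 means the name matched */
    regfree(&re);
    return rc == 0;
}
```

For example, C<name_matches("\\.pdf$", ".pdf", 0)> returns 1 while
C<name_matches(".\\.pdf$", ".pdf", 0)> returns 0.  The "not" keyword
corresponds to negating the result.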

=back

=head1 Document Info

$Id: SWISH-CONFIG.pod 1846 2006-10-20 20:18:30Z whmoseley $

.


swish-e-2.4.7/pod/SWISH-3.0.pod

=head1 NAME

Proposed changes for Swish-e 3.0

=head1 OVERVIEW

This page is intended to give users of Swish-e an idea of the changes
to come, to foster discussion of the direction of Swish-e, and a place
where developers can map out new ideas.

None of this is written in stone.  Any of the developers can write their
ideas in this document, but that doesn't mean it will actually happen ;).


=head1 UTF-8 support

Supporting Unicode basically requires a full re-write of all the code.


=head1 drop expat-based parsers, require libxml2

This might simplify the code somewhat as well.


=head1 Support Incremental Indexing

The Swish-e index structure currently makes it difficult to do incremental
indexing and range limiting, and memory requirements limit how much can be
indexed.  A database may solve some of these issues, possibly at
a cost in performance.

Swish-e has been linked with Berkeley DB.  Although much slower in
indexing, this may allow incremental indexing.  Currently, the idea is
to offer both database backends.

UPDATE: Mon Nov  8 15:07:59 CST 2004 (karman@cray.com)

This feature is in the 2.5 branch already. What kind of requirements do we
have to label it 'stable'?


=head1 Split code into Search and Indexing code

There may be a small benefit from creating a smaller search-only program.
CGI scripts may be faster, and the code would be smaller for those that
want to embed Swish-e into other applications.

Currently, linking libswish-e into a program adds about 720K.  Not really
significant, but it could be if a number of processes are running with
Swish-e.  Another option is to build libswish-e as a shared library.

UPDATE: Mon Nov  8 15:09:12 CST 2004 (karman@cray.com)

This seems done in the 2.4 release. Is that true?

=head1 Switch to Content-Types

Moseley: Dec 28, 2000

I'm wondering if it might be smart to switch from the current "Document
Types" to Content-Types.  Currently, Swish-e knows how to parse three
types of documents: TXT, HTML, and XML.  There are currently two new
configuration directives, DefaultContents and IndexContents, that map
file extensions to one of the three types.  This doesn't really work
when spidering since it's the content-type that describes the document
and not the file extension.

It's an issue that can wait, but I'm concerned about backward compatibility
before people start using the IndexContents and DefaultContents config
directives and then we change to content-type in the future.  There are
probably not that many people using those, but it might be worth noting
in the documentation that it will change, if we agree.

The main reason to use content-type instead is for http processing where
you can't depend on the file extension to determine the document type,
so with http we have to use content-type to determine how to deal with
the file.  This is somewhat moot, as mapping can now be done with -S prog.

I'd propose that Swish-e uses a mime.types file to map from extension
to content-type.  You could add or override mappings in the config file:

   AddType text/plain .doc .log

   DefaultType text/html  # like DefaultContents currently

The file source "plug-in" (whatever that ends up being) would return a
content-type, but if not returned then Swish-e would map the type from
the file name using the mime.types file or any AddType directives.
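
The lookup order proposed above could be sketched as follows.  Nothing here
exists in Swish-e; C<struct type_map> and C<resolve_content_type()> are
hypothetical names used only to illustrate the precedence: a type returned
by the document source wins, then an extension mapping (from mime.types or
AddType), then the default type.

```c
#include <string.h>

/* One extension-to-type mapping, e.g. from "AddType text/plain .log". */
struct type_map {
    const char *ext;    /* extension including the dot, e.g. ".log" */
    const char *type;   /* content type, e.g. "text/plain" */
};

/* Pick a content type for file_name.  reported_type is what the document
 * source returned (NULL or "" if it returned nothing). */
const char *resolve_content_type(const char *reported_type,
                                 const char *file_name,
                                 const struct type_map *map, size_t map_len,
                                 const char *default_type)
{
    const char *dot;
    size_t i;

    if (reported_type && *reported_type)
        return reported_type;            /* the source knows best */

    dot = strrchr(file_name, '.');       /* last extension, if any */
    if (dot) {
        for (i = 0; i < map_len; i++)
            if (strcmp(dot, map[i].ext) == 0)
                return map[i].type;
    }
    return default_type;                 /* like DefaultType above */
}
```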

Again, internally Swish-e only knows about text/[TXT|HTML|XML], so there
should be a way to map other types, otherwise Swish-e might ignore
the file.  We could continue to use the three type names or switch
completely to content-types.

For example, if we continued to use [TXT|HTML|XML]

    MapType TXT  text/directory text/logfile
    MapType HTML text/html

Or maybe just extend the current directives

    IndexContents HTML .htm .html text/html

Where the content-type would have precedence over the file extensions.    

This would tell Swish-e that those types are handled by those internal
handlers.

Then as I've mentioned before, you might specify filters as such

   FilterDocument application/msword /path/to/word-to-text

And word-to-text would convert to text and return one of the three
content-types that Swish-e knows how to parse, or a different content
type if we were to chain filters.


=head1 Enhance the PropertyNames directive

Moseley: Updated Jan 13, 2001

If the PropertyNames directive was enhanced to be able to limit the number
of characters stored, optionally extract text from HTML, and was able
to define what type of docs (text, XML, HTML) it applied to, then the
existing PropertyNames feature would work like the new StoreDescription
feature but be useful for more than just one use.

I'm not clear how to enhance the syntax of Properties and/or Metanames,
but here are some ideas.  Rainer suggested that an XML-style format
might be best and commonly understood.  That's a good idea.  Below are
some older ideas that I had.  But you will get the idea...

The metaname structure could have flags for properties:

    1 - limiting to a length
    2 - stripping HTML
    3 - encoding HTML entities on output

Oct 9, 2001 - The code is now in Swish-e to limit a string property to
a length.  The stripping of HTML is an issue for discussion.  And encoding
entities on output should be a result_output.c issue.


UPDATE: Mon Nov  8 15:13:26 CST 2004 (karman@cray.com)

Is this fully supported in 2.4?

=head1 Apache/XML style configuration

This would be to allow some directives to be set per directory, or per
file extension (or content-type).


=head1 Document Info

$Id: SWISH-3.0.pod 1613 2005-02-02 22:53:39Z whmoseley $

.


swish-e-2.4.7/pod/SWISH-LIBRARY.pod

=head1 NAME

SWISH-LIBRARY - Interface to the Swish-e C library

=head1 OVERVIEW

The C library is an interface to the Swish-e search code.  It provides
a way to embed Swish-e into your applications.
This API is based on Swish-e version 2.3.

B<Note:> This is a NEW API as of Swish-e version 2.3.
The C language interface has changed as has the perl interface to Swish-e.
The new Perl interface is the SWISH::API module and is included with the Swish-e
distribution.
The old SWISHE perl module has been rewritten to work with the new API.  The SWISHE perl module
is no longer included with the Swish-e distribution, but can be downloaded
from the Swish-e web site.

The advantage of the library is that the index file or files can be opened one time
and many queries made on the open index.  This saves the startup time required
to fork and run the swish-e binary, and the expensive time of opening up the
index file.  Some benchmarks have shown a threefold increase in speed.

The downside is that your program now has more code and data in it (the index tables can
use quite a bit of memory), and if a fatal error happens in swish it will bring down your
program.  These are things to think about, especially if embedding swish into a web server
such as Apache where there are many processes serving requests.

The best way to learn about the library is to look at two files included with
the Swish-e distribution that make use of the library.


=over 4

=item src/libtest.c

This file gives a basic overview of linking a C program with the Swish-e library.
Not all available functions are used in that example, but it should give you a good overview
of building a C program with swish-e.

To build and run libtest chdir to the src directory and run the commands:

    $ make libtest
    $ ./libtest [optional name of index file]

You will be prompted for the search words.  The default index used is F<index.swish-e>.
This can be overridden by placing a list of index files in a quote-protected string.

    $ ./libtest 'index1 index2 index3'

=item perl/API.xs

The F<API.xs> file is a Perl "xsub" interface to the C library and is part of the
SWISH::API Perl module.  This is an object-oriented interface to the Swish-e library
and demonstrates how the various search "objects" are created by C calls and how
they are destroyed when no longer needed.

=back

=head1 Installing the Swish-e library

The Swish-e library is installed when you run "make install" when building 
Swish-e.  No extra installation steps are required.

The library consists of a header file "swish-e.h" and a library
"libswish-e.*" that can either be a static or shared library depending on 
your platform.

=head1 Library Overview

When you first attach to an index file (or index files) you are returned a "swish handle".
From the handle you create one or more "search objects" which hold
the parameters to query the index, such as the query string, sort order, search phrase delimiter,
limit parameters and HTML structure bits.  The "object" is really just a pointer to a C structure, but
it's helpful to think of it as an object that has data and functionality associated with it.

The search object is used to query the index.  A query returns a "results object".
The results object holds the number of hits, the parsed query per index, and the result set.
The results object keeps track of the current position in the result set.
You may "seek" to a specific record within the result set (useful for displaying a page of results).

Finally, a result object represents a single result from the result list.  A result object provides
access to the result's properties (such as file name, rank, etc.).

In addition to results, there are functions available to access the header values stored
in the index file, functions to check and report errors, and a few utility functions.


=head1 Available Functions

Below is the list of available functions included in the Swish-e C language API.

These functions (and typedefs) are defined in the F<swish-e.h> header file.
The common objects (e.g. structures) used are:

    SW_HANDLE  - swish handle that associates with an index file
    SW_SEARCH  - search "object" that holds search parameters
    SW_RESULTS - results "object" that holds a result set
    SW_RESULT  - a single result used for accessing the result's properties
    SW_FUZZYWORD - used for fuzzy (stemming) word conversion    

=head2 Searching

=over 4

=item SW_HANDLE SwishInit(char *IndexFiles);

This function opens and reads the header info of the index files
included in the IndexFiles string.  The string should contain a space-separated
list of index files.

    SW_HANDLE myhandle;
    myhandle = SwishInit("file1.idx");

Typically you will open a handle at the beginning of your program and use it to make
multiple queries on an index.

This function will always return a swish handle.  You must check for errors, and on
error free the memory used by the handle, or abort.

Here's an example of aborting:

    SW_HANDLE swish_handle;
    swish_handle = SwishInit("file1.idx file2.idx");
    if ( SwishError( swish_handle ) )
        SwishAbortLastError( swish_handle );

And here's an example of catching the error:        

    SW_HANDLE swish_handle;
    swish_handle = SwishInit("file1.idx file2.idx");
    if ( SwishError( swish_handle ) )
    {
        printf("Failed to connect to swish. %s\n", SwishErrorString( swish_handle ) );
        SwishClose( swish_handle );  /* free the memory used */
        return 0;
    }

You may have more than one handle active at a time.

Swish-e will not tell you if the index file changes on disk (such as after reindexing).
In a persistent environment (e.g. mod_perl) the calling program should check to see if
the index file has changed on disk.  A common way to do this is to store the inode
number before opening the index file(s), and then stat the file name every so often
and reopen the index files if the inode number changes.
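
The inode check described above might look like the following on a POSIX
system.  This is a minimal sketch: C<remember_index()> and
C<index_changed()> are hypothetical helpers, not part of the Swish-e
library, and the approach assumes that reindexing replaces the index file
(for example by renaming a new file over it) rather than rewriting it in
place.

```c
#include <sys/types.h>
#include <sys/stat.h>

/* Store the inode number of path in *saved.  Call this right before
 * opening the index.  Returns 0 on success, -1 if stat(2) fails. */
int remember_index(const char *path, ino_t *saved)
{
    struct stat st;

    if (stat(path, &st) == -1)
        return -1;
    *saved = st.st_ino;
    return 0;
}

/* Returns 1 if path now refers to a different inode than saved,
 * 0 if it is unchanged, and -1 if stat(2) fails. */
int index_changed(const char *path, ino_t saved)
{
    struct stat st;

    if (stat(path, &st) == -1)
        return -1;
    return st.st_ino != saved;
}
```

When C<index_changed()> reports a change, the calling program would free its
search and results objects, call SwishClose(), and open the index again with
SwishInit().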


=item void SwishClose(SW_HANDLE handle);

This function closes and frees the memory of a Swish handle.
Every swish handle should be freed when done searching the index.
Failing to close the handle will result in a memory leak.

=item SW_SEARCH New_Search_Object(SW_HANDLE handle, const char *query);

Returns a new search "object".  The search object holds the parameters used for searching
an index.  A single search object can be used to query the index multiple times.
The available settings listed below are "sticky" in that they remain set on the search
object until changed.

=item int SwishGetStructure( SW_SEARCH srch );

Returns the "structure" flag of the search object passed or 0 if the search
object is NULL.

=item void SwishPhraseDelimiter( SW_SEARCH srch, char delimiter );

Sets the phrase delimiter character.  The default is double-quotes.

=item char SwishGetPhraseDelimiter( SW_SEARCH srch );

Returns the phrase delimiter character used in the search object or 0 if the
search object is NULL.



=item void SwishSetStructure( SW_SEARCH srch, int structure );

Sets the "structure" flag in the search object.  The structure flag is used to limit
searches to parts of HTML files (such as to the title or headers).  The default
is to not limit.  This provides the functionality of the -H command line switch.

=item void SwishSetSort( SW_SEARCH srch, char *sort );

Sets the sort order of the results.  This is the same as the -s switch used
with the swish-e binary.

=item void SwishSetQuery( SW_SEARCH srch, char *query );

Sets the query string in the search object.  This typically is not needed since
it can be set when creating the search object or when executing a query.

=item void SwishSetSearchLimit( SW_SEARCH srch, char *propertyname, char *low, char *hi);

Sets the limit parameters for a search.  Provides the same functionality as the -L command
line switch.
You may specify a range of property values that search results must be within.
You may call SwishSetSearchLimit() only one time for each property (but can set
limits on more than one property at a time).

Unlike the other settings on the search object, once you run a query on the
search object you must call SwishResetSearchLimit() to change or clear
the limit parameters.

=item void SwishResetSearchLimit( SW_SEARCH srch );

Resets the limits set on a search object set by SwishSetSearchLimit().

=item void Free_Search_Object( SW_SEARCH srch );

Frees the search object.  This must be called when done with the 
search object.  Generally, you can reuse a search object for
multiple queries so typically you would call this right before
calling SwishClose().

You may free the search object before freeing any
generated results objects.

=item SW_RESULTS SwishExecute( SW_SEARCH search, const char *query);

Searches the index or indexes based on the parameters in the search object.
Returns a results object.  See below for functions to access the data stored
in the results object.

You should always check for errors after calling SwishExecute().


=item SW_RESULTS SwishQuery(SW_HANDLE, const char *words );

This is a short-cut function that bypasses the creation of a
search object (actually, bypasses the need to create and free a search object).
This only allows passing in a query string; other search parameters cannot be set.
The results are sorted by rank.

You should always check for errors after calling SwishQuery().

=back

=head2 Reading Results

=over 4

=item int SwishHits( SW_RESULTS results );

Returns the number of results in the results object.

=item SWISH_HEADER_VALUE SwishParsedWords( SW_RESULTS, const char *index_name );

Returns the tokenized query.  Words are split by WordCharacters and stopwords are
removed.  The parsed words are useful for highlighting search terms in your
program.

The "index_name" is the name of the index supplied in the SwishInit() function call.

Returns a SWISH_HEADER_VALUE union of type SWISH_LIST which is a char **.
See src/libtest.c for an example of accessing the strings in this list, but in
general you may cast this to a (char **).

=item SWISH_HEADER_VALUE SwishRemovedStopwords( SW_RESULTS, const char *index_name );

Returns a list of stopwords removed from the input query.

Returns a SWISH_HEADER_VALUE union of type SWISH_LIST which is a char **.
See src/libtest.c for an example of accessing the strings in this list, but in
general you may cast this to a (char **).

=item int SwishSeekResult( SW_RESULTS, int position );

Sets the current seek position in the list of results, with position zero
being the first record (unlike -b where one is the first result).

Returns the position or a negative number on error.
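
Because the seek position is zero-based, paging code needs a small
conversion.  The helper below is a sketch with a hypothetical name,
C<page_to_seek_position()>.  It turns a one-based page number into the
position to pass to SwishSeekResult(), using the hit count reported by
SwishHits().

```c
/* Convert a one-based page number into a zero-based seek position.
 * Returns -1 if the arguments are invalid or the page starts beyond the
 * last hit. */
long page_to_seek_position(long page, long page_size, long total_hits)
{
    long pos;

    if (page < 1 || page_size < 1)
        return -1;
    pos = (page - 1) * page_size;        /* page 1 starts at position 0 */
    return pos < total_hits ? pos : -1;
}
```

The equivalent value for the -b switch of the swish-e binary would be the
returned position plus one.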

=item SW_RESULT SwishNextResult( SW_RESULTS );

Returns the next result, or NULL if no more results are available.

The result object returned does not need to be freed after use
(unlike the swish handle, search object, and results object).

=item const char *SwishResultPropertyStr(SW_RESULT, char *propertyname);

This function is mostly useful for testing as it returns odd results on errors.

Aborts if called with a NULL SW_RESULT object.

Returns a string value of the specified property.

Returns the empty string "" if the current result does not have
the specified property assigned.

Returns the string "(null)" on invalid property name (i.e. property name
is not defined in the index) and sets an error (see below) indicating the
invalid property name.

The string returned does not need to be freed, but is only valid
for the current result.  If you wish to save the string you must
copy it locally.

Dates are formatted using the hard-coded format string: "%Y-%m-%d %H:%M:%S" in
localtime.
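
A program that fetches a date as a number instead (see
SwishResultPropertyULong() below) can produce the same layout with the
standard C library.  This is a sketch only; C<format_property_date()> is a
hypothetical helper and not part of the Swish-e API.

```c
#include <time.h>

/* Format a Unix timestamp as "%Y-%m-%d %H:%M:%S" in local time.
 * Returns the number of characters written (19 for four-digit years),
 * or 0 if the time cannot be converted or buf is too small. */
size_t format_property_date(unsigned long timestamp, char *buf, size_t buflen)
{
    time_t t = (time_t)timestamp;
    struct tm *tm = localtime(&t);

    if (tm == NULL)
        return 0;
    return strftime(buf, buflen, "%Y-%m-%d %H:%M:%S", tm);
}
```

A buffer of 20 bytes is enough for the formatted string and its terminating
NUL for dates with four-digit years.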

=item unsigned long SwishResultPropertyULong(SW_RESULT r, char *propertyname);

Returns a numeric property as an unsigned long.
Numeric properties are used for both PropertyNamesNumeric and PropertyNamesDate
type of properties.  Dates are returned as a unix timestamp as reported by the system
when the index was created.

Swish-e will abort if called with a NULL SW_RESULT object.  Without the SW_RESULT object
swish-e cannot set any error codes.

On error returns ULONG_MAX.  This is commonly defined in limits.h.
Check SwishError() (see below) for the type of error.

If SwishError() returns false (zero)
then it simply means that this result does not have any data for the specified
property.

If SwishError() returns true (non-zero) then either the propertyname specified is
invalid, or the property requested is not a numeric (or date) property (e.g. it's
a string property).

See below on how to fetch the specific error message when SwishError() is true.


=item PropValue *getResultPropValue (SW_RESULT r, char *propertyname, int ID );

This is a low-level function to fetch a property regardless of type.
This is likely the best function for accessing properties.

Swish-e will abort if called with a NULL SW_RESULT object.  Propertyname is the name
of the property.  ID is the id number of the property, if known.  ID is not normally
used in the API, but its purpose is to avoid looking up the property ID for every
result displayed.

The returned PropValue is a structure that contains a flag to indicate the
type, and a union that holds the property value.  The flags and structure are
defined in swish-e.h.

The property must be copied locally and the returned "PropValue" value must be freed by
calling freeResultPropValue() to avoid a memory leak.

On error returns NULL.
Check SwishError() (see below) for the type of error.

If it returns NULL but SwishError() returns false (zero)
then it simply means that this result does not have any data for the specified
property.

If SwishError() returns true (non-zero) then the property name specified
is invalid (i.e. not defined for the index).

See below on how to fetch the specific error message when SwishError() is true.

See perl/API.xs for an example on using this function.

=item void freeResultPropValue(void)

Frees the "PropValue" returned after calling getResultPropValue().

=item void Free_Results_Object( SW_RESULTS results );

Frees the results object (frees the result set).  This must be called
when done reading the results and before calling SwishClose().


=back

=head2 Accessing the Index Header Values

Each index file has associated header values that describe the index.  These functions
provide access to this data.  The header data is returned as a union SWISH_HEADER_VALUE,
and a pointer to a SWISH_HEADER_TYPE is passed in and the returned value indicates the
type of data that is returned.  See src/libtest.c and perl/API.xs for examples.

=over 4

=item const char **SwishHeaderNames( SW_HANDLE );

Returns the list of possible header names.  This list is the same for all index
files of a given version of Swish-e.  It provides a way to gain access to all
headers without having to list them in your program.

=item const char **SwishIndexNames( SW_HANDLE );

Returns a list of index files opened.  This is just the list of index files
specified in the SwishInit() call.  You need the name of the index file
to access a specific index's header values.

=item SWISH_HEADER_VALUE SwishHeaderValue( SW_HANDLE, const char *index_name, const  char *cur_header, SWISH_HEADER_TYPE *type );

Fetches the header value for the given index file, and the header name.  The call
sets the "type" passed in to the type of value returned.

See src/libtest.c and perl/API.xs for examples.


=item SWISH_HEADER_VALUE SwishResultIndexValue( SW_RESULT, const char *name, SWISH_HEADER_TYPE *type );

This is like SwishHeaderValue() above, but instead of supplying an index file name and
a swish handle, supply a result object and the header value is fetched from the result's
related index file.

=back

=head2 Accessing Property Meta Data

In addition to the pre-defined standard properties, you have the option
of adding additional "meta" properties to be indexed and/or added to the
list of properties returned with each result.  Consult the sections on the
MetaNames and PropertyNames directives in the CONFIGURATION FILE for an
explanation of how to do this.

These functions provide access to the meta data stored in an index.  You can
use them to determine what meta/property information is available for an index
including all the pre-defined standard properties.  See libtest.c for an example.

=over 4

=item SWISH_META_LIST SwishMetaList( SW_HANDLE, const char *index_name );

Returns the list of meta entries for the given index file  as a null-terminated 
array of SW_META objects.  Use the functions below to extract specific fields
from the SW_META structure.  Meta's are distinct from properties.

=item SWISH_META_LIST SwishPropertyList( SW_HANDLE, const char *index_name );

This function is the same as SwishMetaList() but it returns an array of properties
as opposed to meta objects.  Property attributes can be extracted in the same
way as meta objects using the functions below.

=item SWISH_META_LIST SwishResultMetaList( SW_RESULT );

This is like SwishMetaList() above but determines the index to use from a result
object.

=item SWISH_META_LIST SwishResultPropertyList( SW_RESULT );

This is like SwishPropertyList() above but like SwishResultMetaList() uses a
result object instead of an index name.

=item const char *SwishMetaName( SW_META );

Given a SW_META object returned by one of the above, this function
will return the meta/property's name.  You can use this name to access a
property's value for a given result as described above.

=item int SwishMetaType( SW_META );

Get the data type for the given meta/property. Known types are listed in 
swish-e.h

=item SwishMetaID( SW_META );

Get the internal ID number for the given meta/property.  These IDs are
unique per index file but are not unique per results.

=back

=head2 Checking for Errors

You should check for errors after all calls.  The last error is stored in the
swish handle object, and is only valid until the next operation (which resets
the error flags).  

Currently, some errors are flagged as "critical" errors.  In these cases you should
destroy (by calling the SwishClose() function ) the current swish handle.  If you have
other objects in scope (e.g. a search object or results object) destroy those first.

The types of errors that are critical can be seen in src/error.c.
Currently the list includes:

    Could not open index file
    Unknown index file format
    Index file(s) is empty
    Index file error
    Invalid swish handle
    Invalid results object

=over 4    

=item int  SwishError( SW_HANDLE );

This returns true if an error condition exists.  It returns the error number, which
is an integer less than zero on error.  This should be checked before calling any of the other
error functions below.

=item const char *SwishErrorString( SW_HANDLE );

This returns a general text description of the current error.

=item const char *SwishLastErrorMsg( SW_HANDLE );

In some cases this will return a string with specifics about the current error.
For example, SwishErrorString() may return "Unknown metaname", but SwishLastErrorMsg()
will return a string with the name of the unknown metaname.

=item int  SwishCriticalError( SW_HANDLE );

Returns true if the current error condition is a critical error.
On critical errors you should free up any current objects and call SwishClose()
as swish may be in an unstable state.

=item void SwishAbortLastError( SW_HANDLE );

This is a convenience  function that will format and print the last error message, and
then abort the program.


=item void set_error_handle( FILE *where );

Sets where errors and warnings are printed (when printed by swish).
For historical reasons, when swish-e first starts up errors and warnings are
sent to stdout.

=item void SwishErrorsToStderr( void );

A convenience method to send errors to stderr instead of stdout.

=back

=head2 Utility Functions

=over 4

=item const char *SwishWordsByLetter(SWISH * sw, char *indexname, char c);

Returns all the words in the index "indexname" that begin with the letter passed in.
Returns NULL if the name of the index file is invalid.

This function may change in the future since only 8-bit chars can currently be used.

=item char * SwishStemWord( SW_HANDLE sw, char *in_word );

Deprecated

This can be used to convert a word to its stem.  It uses only the 
original Porter Stemmer.

=item SW_FUZZYWORD SwishFuzzyWord( SW_RESULT r, char *word );

Stems "word" based on the fuzzy mode selected during indexing.

The fuzzy mode used during indexing is stored in the index file.
Since each result is linked to a given index file this method allows
stemming a word based on its index file.

One possible use for this is to highlight search terms in a document 
summary, which would be based on a given result.

The methods below can be used to access the data returned.  The 
SW_FUZZYWORD object must be freed when done to avoid a memory leak.

=item const char **SwishFuzzyWordList( SW_FUZZYWORD fw );

Returns a null terminated list of strings returned by the stemmer.  In most
cases this will be a single string.

Here's an example:

    SW_FUZZYWORD fuzzy_word = SwishFuzzyWord( result, word );  /* word: the char * to stem */
    const char **word_list = SwishFuzzyWordList( fuzzy_word );
    while ( *word_list )
    {
        printf("%s\n", *word_list );
        word_list++;
    }
    SwishFuzzyWordFree( fuzzy_word );

If the stemmer does not convert the string (for example attempting to 
stem numeric data) the word_list will contain the original word.
To tell if the stemmer actually stemmed the word check the return value with 
SwishFuzzyWordError().

=item int SwishFuzzyWordError( SW_FUZZYWORD fw );

This returns zero if the stemming operation was successful, otherwise it
returns a value indicating the reason the word was not stemmed.  The return
values are defined in the swish-e src/stemmer.h file.

Not all stemmers set this value correctly.  But since SwishFuzzyWordList() 
will return a valid string regardless of the return value, you can often 
just ignore this setting.  That's what I do.

=item int SwishFuzzyWordCount( SW_FUZZYWORD fw );

Returns the count of strings in the word list available by calling
SwishFuzzyWordList().

This is normally just one, but in the case of DoubleMetaphone it can be one 
or two (i.e. DoubleMetaphone can return one or two strings).

=item const char *SwishFuzzyMode( SW_RESULT r );

Returns the name of the stemmer used for the given result (which is related 
to an index).

=item void SwishFuzzyWordFree( SW_FUZZYWORD fw );

Frees the memory used by the SW_FUZZYWORD.

=back

=head1 Bug-Reports

Please send bug reports to the Swish-e discussion group.
Also feel free to improve or enhance this feature.

=head1 Author

Original interface: Aug 2000 Jose Ruiz jmruiz@boe.es

Updated: Aug 22, 2002 - Bill Moseley

Interface redesigned for Swish-e version 2.3 Oct 17, 2002 - Bill Moseley

=head1 Document Info

$Id: SWISH-LIBRARY.pod 1906 2007-02-07 19:25:16Z moseley $

.

swish-e-2.4.7/pod/SWISH-FAQ.pod

=head1 NAME

The Swish-e FAQ - Answers to Common Questions

=head1 OVERVIEW

List of commonly asked and answered questions.  Please review this
document before asking questions on the Swish-e discussion list.

=head2 General Questions

=head3 What is Swish-e?

Swish-e is B<S>imple B<W>eb B<I>ndexing B<S>ystem for B<H>umans -
B<E>nhanced.  With it, you can quickly and easily index directories of
files or remote web sites and search the generated indexes for words
and phrases.

=head3 So, is Swish-e a search engine?

Well, yes.  Probably the most common use of Swish-e is to provide a search
engine for web sites.  The Swish-e distribution includes CGI scripts that
can be used with it to add a I<search engine> for your web site.  The CGI
scripts can be found in the F<example> directory of the distribution
package.  See the F<README> file for information about the scripts.

But Swish-e can also be used to index all sorts of data, such as email
messages, data stored in a relational database management system,
XML documents, or documents such as Word and PDF documents -- or any
combination of those sources at the same time.  Searches can be limited
to fields or I<MetaNames> within a document, or limited to areas within
an HTML document (e.g. body, title).  Programs other than CGI applications
can use Swish-e, as well.

=head3 Should I upgrade if I'm already running a previous version
of Swish-e?

A large number of bug fixes, feature additions, and logic corrections were
made in version 2.2.  In addition, indexing speed has been drastically
improved (reports of indexing times changing from four hours to 5
minutes), and major parts of the indexing and search parsers have been
rewritten.  There are better debugging options, enhanced output formats,
more document meta data (e.g. last modified date, document summary),
options for indexing from external data sources, and faster spidering,
just to name a few changes.  (See the CHANGES file for more information.)

Since so much effort has gone into version 2.2, support for previous
versions will probably be limited.

=head3 Are there binary distributions available for Swish-e on platform foo?

Foo?  Well, yes there are some binary distributions available.  Please see
the Swish-e web site for a list at http://swish-e.org/.

In general, it is recommended that you build Swish-e from source,
if possible.

=head3 Do I need to reindex my site each time I upgrade to a new Swish-e
version?

At times it might not strictly be necessary, but since you don't really
know if anything in the index has changed, it is a good rule to reindex.

=head3 What's the advantage of using the libxml2 library for parsing HTML?

Swish-e may be linked with libxml2, a library for working with HTML and XML
documents.  Swish-e can use libxml2 for parsing HTML and XML documents.

The libxml2 parser is a better parser than Swish-e's built-in HTML
parser.  It offers more features, and it does a much better job at
extracting out the text from a web page.  In addition, you can use the
C<ParserWarningLevel> configuration setting to find structural errors
in your documents that could (and would with Swish-e's HTML parser)
cause documents to be indexed incorrectly.

Libxml2 is not required, but is strongly recommended for parsing HTML
documents.  It's also recommended for parsing XML, as it offers many
more features than the internal Expat xml.c parser.

The internal HTML parser will have limited support, and does have a
number of bugs.  For example, HTML entities may not always be correctly
converted and properties do not have entities converted.  The internal
parser tends to get confused when invalid HTML is parsed where the libxml2
parser doesn't get confused as often.  The structure is better detected
with the libxml2 parser.

If you are using the Perl module (the C interface to the Swish-e
library) you may wish to build two versions of Swish-e, one with the
libxml2 library linked in the binary, and one without, and build the
Perl module against the library without the libxml2 code.  This is to
save space in the library.  Hopefully, the library will someday soon be
split into indexing and searching code (volunteers welcome).

=head3 Does Swish-e include a CGI interface?

Yes.  Kind of.

There are two example CGI scripts included, swish.cgi and search.cgi.
Both are installed at F<$prefix/lib/swish-e>.

Both require a bit of work to set up and use.  Swish.cgi is probably what most
people will want to use as it contains more features.  Search.cgi is for those
who want to start with a small script and customize it to fit their needs.

An example of using swish.cgi is given in
the L<INSTALL|INSTALL> man page, and in the swish.cgi documentation.
As is often the case, it will be easier to use if you first read the documentation.

Please use caution about CGI scripts found on the Internet for use with Swish-e.
Some are not secure.

The included example CGI scripts were designed with security in mind.
Regardless, you are encouraged to have your local Perl expert review it
(and all other CGI scripts you use) before placing it into production.
This is just a good policy to follow.

=head3 How secure is Swish-e?

We know of no security issues with using Swish-e.  Careful attention
has been paid to common security problems such as buffer
overruns when programming Swish-e.

The most likely security issue with Swish-e is when it is run via
a poorly written CGI interface.  This is not limited to CGI scripts
written in Perl, as it's just as easy to write an insecure CGI script
in C, Java, PHP, or Python.  A good source of information is included
with the Perl distribution.  Type C<perldoc perlsec> at your local
prompt for more information.  Another must-read document is located at
C<http://www.w3.org/Security/faq/wwwsf4.html>.

Note that there are many I<free> yet insecure and poorly written CGI
scripts available -- even some designed for use with Swish-e.  Please
carefully review any CGI script you use.  Free is not such a good price
when you get your server hacked...

=head3 Should I run Swish-e as the superuser (root)?

No.  Never.

=head3 What files does Swish-e write?

Swish writes the index file, of course.  This is specified with the
C<IndexFile> configuration directive or by the C<-f> command line switch.

The index file is actually a collection of files, but all start with
the file name specified with the C<IndexFile> directive or the C<-f>
command line switch.

For example, the file ending in F<.prop> contains the document properties.

When creating the index files Swish-e appends the extension F<.temp>
to the index file names.  When indexing is complete Swish-e renames the
F<.temp> files to the index files specified by C<IndexFile> or C<-f>.
This is done so that existing indexes remain untouched until it completes
indexing.

Swish-e also writes temporary files in some cases during indexing
(e.g. C<-S http>, C<-S prog> with filters), when merging, and when
using C<-e>.  Temporary files are created with the mkstemp(3) function
(with 0600 permission on unix-like operating systems).

The temporary files are created in the directory specified by the
environment variables C<TMPDIR> and C<TMP> in that order.  If those
are not set then swish uses the configuration setting
L<TmpDir|SWISH-CONFIG/"item_TmpDir">.  Otherwise, the temporary file
will be located in the current directory.
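
The lookup order described above can be sketched as a small shell function.
This is only an illustration of the documented order, not code taken from
Swish-e; the function name C<swish_tmpdir> and its single argument (the
value of C<TmpDir> from your configuration file, or an empty string) are
made up for this sketch.

```shell
# swish_tmpdir [TMPDIR_FROM_CONFIG]
# Print the directory that the order described above would select:
# $TMPDIR, then $TMP, then the TmpDir configuration value, then ".".
swish_tmpdir() {
    if [ -n "${TMPDIR:-}" ]; then
        printf '%s\n' "$TMPDIR"
    elif [ -n "${TMP:-}" ]; then
        printf '%s\n' "$TMP"
    elif [ -n "${1:-}" ]; then
        printf '%s\n' "$1"          # hypothetical: TmpDir from the config
    else
        printf '%s\n' .             # fall back to the current directory
    fi
}
```

Running C<swish_tmpdir /var/tmp> before indexing is a quick way to check
which directory needs free space and write permission.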

=head3 Can I index PDF and MS-Word documents?

Yes, you can use a I<Filter> to convert documents while indexing, or you
can use a program that "feeds" documents to Swish-e that have already
been converted.  See C<Indexing> below.

=head3 Can I index documents on a web server?

Yes, Swish-e provides two ways to index (spider) documents on a web
server.  See C<Spidering> below.

Swish-e can retrieve documents from a file system or from a remote web
server.  It can also execute a program that returns documents back
to it.  This program can retrieve documents from a database, filter
compressed documents files, convert PDF files, extract data from mail
archives, or spider remote web sites.

=head3 Can I implement keywords in my documents? 

Yes, Swish-e can associate words with I<MetaNames> while indexing,
and you can limit your searches to these MetaNames while searching.

In your HTML files you can put keywords in HTML META tags or in XML blocks.

META tags can have two formats in your source documents:

    <META NAME="DC.subject" CONTENT="digital libraries">


And in XML format (can also be used in HTML documents when using libxml2):

    <meta2>
        Some Content
    </meta2>


Then, to inform Swish-e about the existence of the meta name in your
documents, edit the line in your configuration file:

    MetaNames DC.subject meta1 meta2

When searching you can now limit some or all search terms to that
MetaName.  For example, you can look for documents that contain the word
apple and also have either fruit or cooking in the DC.subject meta tag.
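
A search of that shape might be run as sketched here.  The quoting follows
the C<creator=(accounting)> example later in this FAQ.  C<SWISH> and
C<search_subject> are hypothetical names added for this sketch (the
variable lets you substitute another command for a dry run); Swish-e
itself does not read them.

```shell
# Single quotes keep the shell from interpreting the parentheses.
query='apple and DC.subject=(fruit or cooking)'

# SWISH is a hypothetical hook for this sketch; it defaults to swish-e.
SWISH=${SWISH:-swish-e}

# Run the search against the default index in the current directory.
search_subject() {
    "$SWISH" -w "$query"
}
```

This assumes the index was built with the C<MetaNames> line shown above.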

=head3 What are document properties?

A document property is typically data that describes the document.
For example, properties might include a document's path name, its last
modified date, its title, or its size.  Swish-e stores a document's
properties in the index file, and they can be reported back in search
results.

Swish-e also uses properties for sorting.  You may sort your results by
one or more properties, in ascending or descending order.

Properties can also be defined within your documents.  HTML and
XML files can specify tags (see previous question) as properties.
The I<contents> of these tags can then be returned with search results.
These user-defined properties can also be used for sorting search results.

For example, if you had the following in your documents

   <meta name="creator" content="accounting department">

and C<creator> is defined as a property (see C<PropertyNames> in
L<SWISH-CONFIG|SWISH-CONFIG>) Swish-e can return C<accounting department>
with the result for that document.

    swish-e -w foo -p creator

Or for sorting:

    swish-e -w foo -s creator

=head3 What's the difference between MetaNames and PropertyNames?

MetaNames allow keyword searches in your documents.  That is, you can
use MetaNames to restrict searches to just parts of your documents.

PropertyNames, on the other hand, define text that can be returned with
results, and can be used for sorting.

Both use I<meta tags> found in your documents (as shown in the above two
questions) to define the text you wish to use as a property or meta name.

You may define a tag as B<both> a property and a meta name.  For example:

   <meta name="creator" content="accounting department">

placed in your documents and then using configuration settings of:

    PropertyNames creator
    MetaNames creator

will allow you to limit your searches to documents created by accounting:

    swish-e -w 'foo and creator=(accounting)'

That will find all documents with the word C<foo> that also have a creator
meta tag that contains the word C<accounting>.  This is using MetaNames.

And you can also say:

    swish-e -w foo -p creator

which will return all documents with the word C<foo>, but the results will
also include the contents of the C<creator> meta tag along with results.
This is using properties.

You can use properties and meta names at the same time, too:

    swish-e -w 'creator=(accounting or marketing)' -p creator -s creator

That searches only in the C<creator> I<meta name> for either of the words
C<accounting> or C<marketing>, prints out the contents
of the C<creator> I<property>, and sorts the results by the C<creator>
I<property name>.

(See also the C<-x> output format switch in L<SWISH-RUN|SWISH-RUN>.)

=head3 Can Swish-e index multi-byte characters?

No.  This will require much work to change.  But, Swish-e works with
eight-bit characters, so many character sets can be used.  Note that it
does call the ANSI-C tolower() function which does depend on the current
locale setting.  See C<locale(7)> for more information.

=head2 Indexing

=head3 How do I pass Swish-e a list of files to index?

Currently, there is not a configuration directive to include a file that
contains a list of files to index.  But, there is a directive to include
another configuration file.

    IncludeConfigFile /path/to/other/config

And in C</path/to/other/config> you can say:

    IndexDir file1 file2 file3 file4 file5 ...
    IndexDir file20 file21 file22

You may also specify more than one configuration file on the command line:

    ./swish-e -c config_one config_two config_three

Another option is to create a directory with symbolic links of the files
to index, and index just that directory.
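
The symbolic link approach could be scripted roughly as follows.  This is
a sketch under a few assumptions: the list file holds one absolute path
per line, and the function name C<make_link_dir> and the numbered link
names are inventions of this example.

```shell
# make_link_dir LISTFILE LINKDIR
# Create LINKDIR and fill it with symbolic links to each path named in
# LISTFILE (one absolute path per line).  Link names are numbered so that
# files sharing a base name do not clash.
make_link_dir() {
    list=$1
    linkdir=$2
    mkdir -p "$linkdir" || return 1
    n=0
    while IFS= read -r path; do
        [ -n "$path" ] || continue          # skip blank lines
        n=$((n + 1))
        ln -s "$path" "$linkdir/$n-$(basename "$path")" || return 1
    done < "$list"
}
```

After something like C<make_link_dir files.txt /tmp/to-index>, point
C<IndexDir> at the link directory.  Since C<FollowSymLinks> defaults to
"no", it most likely needs to be set to "yes" for the links to be indexed.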

=head3 How does Swish-e know which parser to use?

Swish can parse HTML, XML, and text documents.  The parser is set by
associating a file extension with a parser by the C<IndexContents>
directive.  You may set the default parser with the C<DefaultContents>
directive.  If a document is not assigned a parser it will default to
the HTML parser (HTML2 if built with libxml2).

You may use Filters or an external program to convert documents to HTML,
XML, or text.

=head3 Can I reindex and search at the same time?

Yes.  Starting with version 2.2 Swish-e indexes to temporary files, and then
renames the files when indexing is complete.  On most systems renames
are atomic.  But, since Swish-e also generates more than one file during
indexing there will be a very short period of time between renaming the
various files when the index is out of sync.

Settings in F<src/config.h> control some options related to temporary files,
and their use during indexing.

=head3 Can I index phrases? 

Phrases are indexed automatically.  To search for a phrase simply place
double quotes around the phrase.

For example:

    swish-e -w 'free and "fast search engine"'

=head3 How can I prevent phrases from matching across sentences?

Use the
L<BumpPositionCounterCharacters|SWISH-CONFIG/"item_BumpPositionCounterCharacters">
configuration directive.

=head3 Swish-e isn't indexing a certain word or phrase.

There are a number of configuration parameters that control what Swish-e
considers a "word" and it has a debugging feature to help pinpoint
any indexing problems.

Configuration file directives (L<SWISH-CONFIG|SWISH-CONFIG>)
C<WordCharacters>, C<BeginCharacters>, C<EndCharacters>,
C<IgnoreFirstChar>, and C<IgnoreLastChar> are the main settings that
Swish-e uses to define a "word".  See L<SWISH-CONFIG|SWISH-CONFIG> and
L<SWISH-RUN|SWISH-RUN> for details.

Swish-e also uses compile-time defaults for many settings.  These are
located in F<src/config.h> file.

The command line arguments C<-k>, C<-v> and C<-T> are useful when
debugging these problems.  Using C<-T INDEXED_WORDS> while indexing will
display each word as it is indexed.  You should specify one file when
using this feature since it can generate a lot of output.

     ./swish-e -c my.conf -i problem.file -T INDEXED_WORDS

You may also wish to index a single file that contains words that are or
are not indexing as you expect and use -T to output debugging information
about the index.  A useful command might be:

    ./swish-e -f index.swish-e -T INDEX_FULL

Once you see how Swish-e is parsing and indexing your words, you can
adjust the configuration settings mentioned above to control what words
are indexed.

Another useful command might be:

     ./swish-e -c my.conf -i problem.file -T PARSED_WORDS INDEXED_WORDS

This will show white-spaced words parsed from the document (PARSED_WORDS),
and how those words are split up into separate words for indexing
(INDEXED_WORDS).


=head3 How do I keep Swish-e from indexing numbers?

Swish-e indexes words as defined by the C<WordCharacters> setting, as
described above.  So to avoid indexing numbers you simply remove digits
from the C<WordCharacters> setting.

There are also some settings in F<src/config.h> that control what "words"
are indexed.  You can configure swish to never index words that are all
digits, vowels, or consonants, or that contain more than some consecutive
number of digits, vowels, or consonants.  In general, you won't need to
change these settings.

Also, there's an experimental feature called C<IgnoreNumberChars>
which allows you to define a set of characters that describe a number.
If a word is made up of B<only> those characters it will not be indexed.


=head3 Swish-e crashes and burns on a certain file. What can I do?

This shouldn't happen.  If it does please post to the Swish-e discussion
list the details so it can be reproduced by the developers.

In the meantime, you can use a C<FileRules> directive to exclude the
particular file name, or pathname, or its title.  If there are serious
problems in indexing certain types of files, they may not have valid text
in them (they may be binary files, for instance). You can use NoContents
to exclude that type of file.

Swish-e will issue a warning if an embedded null character is found in a
document.  This warning will be an indication that you are trying to index
binary data.  If you need to index binary files try to find a program
that will extract out the text (e.g. strings(1), catdoc(1), pdftotext(1)).
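
To find out ahead of time which files would trigger that warning, a check
along these lines may help.  It is a sketch using only standard utilities;
C<has_nul> is a name made up here and is not part of Swish-e.

```shell
# has_nul FILE
# Succeed (exit status 0) if FILE contains at least one NUL byte.
# Compares the byte count of the file with and without NUL bytes.
has_nul() {
    total=$(wc -c < "$1" | tr -d ' ')
    without=$(tr -d '\000' < "$1" | wc -c | tr -d ' ')
    [ "$total" -ne "$without" ]
}
```

For example, C<has_nul problem.file && strings problem.file E<gt> problem.txt>
would extract the printable text only from files that look binary.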

=head3 How do I prevent indexing of some documents?

When using the file system to index your files you can use the
C<FileRules> directive.  Other than C<FileRules title>, C<FileRules>
only works with the file system (C<-S fs>) indexing method, not with
C<-S prog> or C<-S http>.

If you are spidering a site you have control over, use a F<robots.txt> file in
your document root.  This is a standard way to exclude files from search
engines, and is fully supported by Swish-e.  See http://www.robotstxt.org/

If spidering a website with the included F<spider.pl> program then add any
necessary tests to the spider's configuration file.
Type C<perldoc spider.pl> in the C<prog-bin> directory for details or
see the spider documentation on the Swish-e website.  Look for the section
on L<callback functions|spider/"callback_functions">.

If using the libxml2 library for parsing HTML (which you probably are), you may
also use the Meta Robots Exclusion in your documents:

    <meta name="robots" content="noindex">

See the L<obeyRobotsNoIndex|SWISH-CONFIG/"item_obeyRobotsNoIndex"> directive.

=head3 How do I prevent indexing parts of a document?

To prevent Swish-e from indexing a common header, footer, or navigation
bar, AND you are using libxml2 for parsing HTML, then you may
use a fake HTML tag around the text you wish to ignore and use the
C<IgnoreMetaTags> directive.  This will generate an error message if
the C<ParserWarningLevel> is set as it's invalid HTML.

C<IgnoreMetaTags> works with XML documents (and HTML documents when
using libxml2 as the parser), but not with documents parsed by the text
(TXT) parser.

If you are using the libxml2 parser (HTML2 and XML2) then you can use the following
comments in your documents to prevent indexing:

       <!-- SwishCommand noindex -->
       <!-- SwishCommand index -->

and/or these may be used also:

       <!-- noindex -->
       <!-- index -->


=head3 How do I modify the path or URL of the indexed documents?

Use the C<ReplaceRules> configuration directive to rewrite path names
and URLs.  If you are using C<-S prog> input method you may set the path
to any string.

=head3 How can I index data from a database?

Use the "prog" document source method of indexing.  Write a program to
extract out the data from your database, and format it as XML, HTML,
or text.  See the examples in the C<prog-bin> directory, and the next
question.

=head3 How do I index my PDF, Word, and compressed documents?

Swish-e can internally only parse HTML, XML and TXT (text) files by
default, but can make use of I<filters> that will convert other types
of files such as MS Word documents, PDF, or gzipped files into one of
the file types that Swish-e understands.

Please see L<SWISH-CONFIG|SWISH-CONFIG/"Document Filter Directives">
and the examples in the F<filters> and F<filter-bin> directory for more information.

See the next question to learn about the filtering options with Swish-e.

=head3 How do I filter documents?

The term "filter" in Swish-e means the conversion of a document of one type (one that
swish-e cannot index directly) into a type that Swish-e can index, namely HTML, plain text, or XML.
To add to the confusion, there are a number of ways to accomplish this in Swish-e.
So here's a bit of background.

The L<FileFilter|SWISH-CONFIG/"Document Filter Directives"> directive was added to swish first.
This feature allows you to specify a program to run for documents that match a given file extension.
For example, to filter PDF files (files that end in .pdf) you can specify the configuration setting of:

    FileFilter .pdf pdftotext   "'%p' -"

which says to run the program "pdftotext" passing it the pathname of the file (%p)
and a dash (which tells pdftotext to output to stdout).   Then for each .pdf file Swish-e runs this
program and reads in the filtered document from the output from the filter program.

This has the advantage that it is easy to set up -- a single line in the config file is all that is
needed to add the filter into Swish-e.  But it also has a number of problems.  For example,
if you use a Perl script to do your filtering it can be very slow since the filter script must be
run (and thus compiled) for each processed document.
This is exacerbated when using the -S http method since the -S http method also uses a Perl script
that is run for every URL fetched.  Also, when using -S prog method of input
(reading input from a program) using FileFilter means that Swish-e must first read the file
in from the external program and then write the file out to a temporary file before running the
filter.

With -S prog it makes much more sense to filter the document in the program that is
fetching the documents than to have swish-e read the file into memory, write it to a temporary
file and then run an external program.

The Swish-e distribution contains a couple of example -S prog programs.  F<spider.pl> is a reasonably
full-featured web spider that offers many more options than the -S http method.  And it is much faster
than running -S http, too.

The spider has a perl configuration file, which means you can add programming logic right into the
configuration file without editing the spider program.  One bit of logic that is provided in the
spider's configuration file is a "call-back" function that allows you to filter the content.
In other words, before the spider passes a fetched web document to swish for indexing the spider can call
a simple subroutine in the spider's configuration file passing the document and its content type.
The subroutine can then look at the content type and decide if the document needs to be filtered.

For example, when processing a document of type "application/msword" the call-back subroutine
might call the doc2txt.pm perl module, and a document of type
"application/pdf" could use the pdf2html.pm module.  The F<prog-bin/SwishSpiderConfig.pl> file
shows this usage.

This system works reasonably well, but also means that more work is required
to set up the filters.  First, you must explicitly check for specific content types and then call
the appropriate Perl module, and second, you have to know how each module must be called and how
each returns the possibly modified content.

In comes SWISH::Filter.

To make things easier the SWISH::Filter Perl module was created.  The idea of this module is that
there is one interface used to filter all types of documents.  So instead of checking for specific
types of content you just pass the content type and the document to the SWISH::Filter module and
it returns a new content type and document if it was filtered.  The filters that do the actual work
are designed with a standard interface and work like filter "plug-ins". Adding new filters
means just downloading the filter to a directory and no changes are needed to the spider's configuration
file.  Download a filter for Postscript and next time you run indexing your Postscript files will be indexed.

Since the filters are standardized, hopefully when you have the need to filter documents of a specific
type there will already be a filter ready for your use.

Now, note that the perl modules may or may not do the actual conversion of a document.
For example, the PDF conversion
module calls the pdfinfo and pdftotext programs.  Those programs (part of the Xpdf package)
must be installed separately from the filters.

The SwishSpiderConfig.pl example spider configuration file shows how to use the SWISH::Filter module for filtering.
This file is installed at $prefix/share/doc/swish-e/examples/prog-bin, where $prefix is normally /usr/local on
unix-type machines.

The SWISH::Filter method of filtering can also be used with the -S http method of indexing.  By default
the F<swishspider> program (the Perl helper script that fetches documents from the web) will attempt to
use the SWISH::Filter module if it can be found in Perl's library path.  This path is set automatically for
spider.pl but not for swishspider (because it would slow down a method that's already slow and spider.pl is
recommended over the -S http method).

Therefore, all that's required to use this system with -S http is setting
the @INC array to point to the filter directory.

For example, if the swish-e distribution was unpacked into ~/swish-e:

   PERL5LIB=~/swish-e/filters swish-e -c conf -S http

will allow the -S http method to make use of the SWISH::Filter module.

Note that if you are not using the SWISH::Filter module you may wish to edit the F<swishspider> program
and disable the use of the SWISH::Filter module using this setting:

    use constant USE_FILTERS  => 0;  # disable SWISH::Filter

This prevents the program from attempting to use the SWISH::Filter module for every non-text
URL that is fetched.  Of course, if you are concerned with indexing speed you should be using
the -S prog method with spider.pl instead of -S http.

If you are not spidering, but you still want to make use of the SWISH::Filter module for
filtering you can use the DirTree.pl program (in $prefix/lib/swish-e).  This is a simple
program that traverses the file system and uses SWISH::Filter for filtering.


Here are two examples of how to run a filter program, one using Swish-e's
C<FileFilter> directive, another using a C<prog> input method program.
See the F<SwishSpiderConfig.pl> file for an example of using the SWISH::Filter
module.

These filters simply use the program C</bin/cat> as a filter and only
index .html files.

First, using the C<FileFilter> method, here's the entire configuration
file (swish.conf):

    IndexDir .
    IndexOnly .html
    FileFilter .html "/bin/cat"   "'%p'"

and index with the command

    swish-e -c swish.conf -v 1

Now, the same thing with using the C<-S prog> document source input method
and a Perl program called catfilter.pl.  You can see that it's much
more work than using the C<FileFilter> method above, but provides a
place to do additional processing.  In this example, the C<prog> method
is only slightly faster.  But if you needed a perl script to run as a
FileFilter then C<prog> will be significantly faster.

    #!/usr/local/bin/perl -w
    use strict;
    use File::Find;  # for recursing a directory tree

    $/ = undef;
    find(
        { wanted => \&wanted, no_chdir => 1, },
        '.',
    );

    sub wanted {
        return if -d;
        return unless /\.html$/;

        my $mtime  = (stat)[9];

        my $child = open( FH, '-|' );
        die "Failed to fork $!" unless defined $child;
        exec '/bin/cat', $_ unless $child;

        my $content = <FH>;
        my $size = length $content;

        print <<EOF;
    Content-Length: $size
    Last-Mtime: $mtime
    Path-Name: $_

    EOF

        print $content;  # <FH> was already read to EOF above
        close FH;
    }

And index with the command:

    swish-e -S prog -i ./catfilter.pl -v 1

This example will probably not work under Windows due to the '-|' open.
A simple piped open may work just as well:

That is, replace:

    my $child = open( FH, '-|' );
    die "Failed to fork $!" unless defined $child;
    exec '/bin/cat', $_ unless $child;

with this:

    open( FH, "/bin/cat $_ |" ) or die $!;

Perl will try to avoid running the command through the shell if meta
characters are not passed to the open.  See C<perldoc -f open> for
more information.
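
For comparison, the same output format can be produced from a plain shell
script.  This is a minimal sketch, not one of the programs shipped in
C<prog-bin>: the function names are made up, and the C<Last-Mtime> header
is left out because there is no portable way to get a file's modification
time in the shell (the header is believed to be optional, but check
L<SWISH-RUN|SWISH-RUN> before relying on that).

```shell
# emit_doc PATH
# Write one document to stdout in the form the Perl example uses:
# header lines, a blank line, then exactly Content-Length bytes of content.
emit_doc() {
    size=$(wc -c < "$1" | tr -d ' ')
    printf 'Path-Name: %s\n' "$1"
    printf 'Content-Length: %s\n' "$size"
    printf '\n'
    cat "$1"
}

# emit_tree DIR
# Emit every .html file found below DIR, one document after another.
emit_tree() {
    find "$1" -type f -name '*.html' | while IFS= read -r f; do
        emit_doc "$f"
    done
}
```

A script that ends with C<emit_tree .> could then be passed to Swish-e
with C<-S prog -i> in the same way as catfilter.pl above.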

=head3 Eh, but I just want to know how to index PDF documents!

See the examples in the F<conf> directory and the comments in
the F<SwishSpiderConfig.pl> file.

See the previous question for the details on filtering.  The method you decide to use
will depend on how fast you want to index, and your comfort level with using Perl modules.

Regardless of the filtering method you use you will need to install the Xpdf packages
available from http://www.foolabs.com/xpdf/.

=head3 I'm using Windows and can't get Filters or the prog input method
to work!

Both the C<-S prog> input method and filters use the C<popen()> system
call to run the external program.  If your external program is, for
example, a perl script, you have to tell Swish-e to run perl, instead of
the script.  Swish-e will convert forward slashes to backslashes
when running under Windows.

For example, you would need to specify the path to perl as (assuming
this is where perl is on your system):

    IndexDir e:/perl/bin/perl.exe

Or run a filter like:

    FileFilter .foo e:/perl/bin/perl.exe 'myscript.pl "%p"'

It's often easier to just install Linux.

=head3 How do I index non-English words?

Swish-e indexes 8-bit characters only.  This is the ISO 8859-1 Latin-1
character set, and includes many non-English letters (and symbols).
As long as they are listed in C<WordCharacters> they will be indexed.

Actually, you probably can index any 8-bit character set, as long as
you don't mix character sets in the same index and don't use libxml2 for
parsing (see below).

The C<TranslateCharacters> directive (L<SWISH-CONFIG|SWISH-CONFIG>)
can translate characters while indexing and searching.  You may
specify the mapping of one character to another character with the
C<TranslateCharacters> directive.

C<TranslateCharacters :ascii7:> is a predefined set of characters that
will translate eight-bit characters to ascii7 characters.  Using the
C<:ascii7:> rule will, for example, translate "Ääç" to "aac".  This means:
searching "Çelik", "çelik" or "celik" will all match the same word.
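
The effect of folding eight-bit characters to seven-bit ones can be tried out
with a short Python sketch.  This is only an approximation built on Unicode
decomposition, not Swish-e's own C<:ascii7:> table, and the function name
C<fold_to_ascii7> is made up for this example.

```python
import unicodedata

def fold_to_ascii7(word):
    """Strip accents and lowercase a word (an approximation of :ascii7:)."""
    # NFKD splits an accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", word)
    # Dropping everything outside ASCII removes the combining marks.  Letters
    # with no decomposition (for example "ø") are dropped too, which a real
    # translation table would handle differently.
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    return ascii_only.lower()

for word in ("Çelik", "çelik", "celik"):
    print(fold_to_ascii7(word))
```

All three spellings fold to the same string, which is why they match the same
indexed word.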

Note: When using libxml2 for parsing, parsed documents are converted
internally (within libxml2) to UTF-8.  This is converted to ISO 8859-1
Latin-1 when indexing.  In cases where a string cannot be converted
from UTF-8 to ISO 8859-1 (because it contains non-8859-1 characters),
the string will be sent to Swish-e in UTF-8 encoding.  This will result
in some words being indexed incorrectly.  Setting C<ParserWarningLevel> to 1
or more will display warnings when UTF-8 to 8859-1 conversion fails.

=head3 Can I add/remove files from an index?

Try building swish-e with the C<--enable-incremental> option.

The rest of this FAQ applies to the default swish-e format.

Swish-e currently has no way to add or remove items from
its index.  But Swish-e indexes so quickly that it's often possible to
reindex the entire document set when a file needs to be added, modified or removed.
If you are spidering a remote site, consider caching compressed copies of the documents locally.

Incremental additions can be handled in a couple of ways, depending on
your situation.  It's probably easiest to create one main index every
night (or every week), and then create an index of just the new files
between main indexing jobs and use the C<-f> option to pass both indexes
to Swish-e while searching.

You can merge the indexes into one index (instead of using -f), but it's
not clear that this has any advantage over searching multiple indexes.

How does one create the incremental index?

One method is by using the C<-N> switch to pass a file path to
Swish-e when indexing.  It will only index files that have a last
modification date C<newer> than the file supplied with the C<-N> switch.

This option has the disadvantage that Swish-e must process every file in
every directory as if it were going to be indexed (the test for C<-N>
is done last, right before indexing of the file contents begins and after
all other tests on the file have been completed) -- all that just to
find a few new files.

Also, if you use the Swish-e index file as the file passed to C<-N> there
may be files that were added after indexing was started, but before the
index file was written.  This could result in a file not being added to
the index.
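
The test that C<-N> applies can be pictured as a comparison of modification
times.  The sketch below shows that comparison only; it is not Swish-e's
implementation, and the function name C<newer_than> is made up for this
example.

```python
import os

def newer_than(reference, paths):
    """Return the paths modified more recently than the reference file."""
    cutoff = os.path.getmtime(reference)
    # Every candidate still has to be examined, which is the cost noted above.
    return [p for p in paths if os.path.getmtime(p) > cutoff]
```

Using a dedicated timestamp file that is touched just before indexing starts,
rather than the index file itself, is one way to avoid the gap described above.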

Another option is to maintain a parallel directory tree that contains
symlinks pointing to the main files.  When a file is added to (or
changed in) the main directory tree you create a symlink to the real file
in the parallel directory tree.  Then just index the symlink directory
to generate the incremental index.

This option has the disadvantage that you need to have a central
program that creates the new files that can also create the symlinks.
But, indexing is quite fast since Swish-e only has to look at the files
that need to be indexed.  When you run full indexing you simply unlink
(delete) all the symlinks.
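
A minimal sketch of the symlink bookkeeping, assuming a Unix-like system.
The function names and the directory layout are hypothetical; the program
that writes your documents would call something like this.  The configuration
used for the incremental run would presumably also need C<FollowSymLinks yes>.

```python
import os

def link_into_incremental_tree(real_path, main_root, link_root):
    """Create a symlink under link_root mirroring real_path's place under main_root."""
    relative = os.path.relpath(real_path, main_root)
    link_path = os.path.join(link_root, relative)
    os.makedirs(os.path.dirname(link_path), exist_ok=True)
    if os.path.lexists(link_path):
        os.remove(link_path)  # replace a stale link left by an earlier change
    os.symlink(os.path.abspath(real_path), link_path)
    return link_path

def clear_incremental_tree(link_root):
    """Remove every symlink under link_root, as done after a full reindex."""
    for dirpath, _dirnames, filenames in os.walk(link_root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                os.unlink(path)
```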

Both of these methods have issues where files could end up in both
indexes, or be left out of an index.  Use of file locks while
indexing, and hash lookups during searches can help prevent these
problems.

=head3 I run out of memory trying to index my files. 

It's true that indexing can take up a lot of memory!  Swish-e is extremely
fast at indexing, but that comes at the cost of memory.

The best answer is to install more memory.

Another option is to use the C<-e> switch.  This will require less memory,
but indexing will take longer as not all data will be stored in memory
while indexing.  How much less memory and how much more time depends on
the documents you are indexing, and the hardware that you are using.

Here's an example of indexing all .html files in /usr/doc on Linux.
This first example is I<without> C<-e> and used about 84M of memory:

    270279 unique words indexed.
    23841 files indexed.  177640166 total bytes.
    Elapsed time: 00:04:45 CPU time: 00:03:19

This is I<with> C<-e>, and used about 26M of memory:

    270279 unique words indexed.
    23841 files indexed.  177640166 total bytes.
    Elapsed time: 00:06:43 CPU time: 00:04:12

You can also build a number of smaller indexes and then merge them together
with C<-M>.  Using C<-e> while merging will save memory.

Finally, if you do build a number of smaller indexes, you can specify more
than one index when searching by using the C<-f> switch.  Sorting large
result sets by a property will be slower when specifying multiple index
files while searching.

=head3 "too many open files" when indexing with -e option

Some platforms report "too many open files" when using the C<-e> economy option.
The C<-e> feature uses many temporary files (something like 377) plus
the index files, and this may exceed your system's limits.

Depending on your platform you may need to set "ulimit" or "unlimit".

For example, under Linux bash shell:

  $ ulimit -n 1024

Or under an old Sparc:

  % unlimit openfiles
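
To see whether the current limit is likely to be a problem before starting a
long indexing run, the limit can be read from a script.  The sketch below uses
Python's standard C<resource> module (Unix only).  The figure 377 is the
approximate number quoted above, and the extra headroom is an arbitrary
assumption.

```python
import resource

# "Something like 377" temporary files is the figure quoted above; the extra
# headroom for index files, stdio and so on is an assumed, arbitrary margin.
ECONOMY_TEMP_FILES = 377
HEADROOM = 32

def limit_is_enough(soft_limit):
    """Return True if soft_limit leaves room for the -e temporary files."""
    if soft_limit == resource.RLIM_INFINITY:
        return True
    return soft_limit >= ECONOMY_TEMP_FILES + HEADROOM

soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print("soft limit:", soft, "enough for -e:", limit_is_enough(soft))
```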

=head3 My system admin says Swish-e uses too much of the CPU!

That's a good thing!  That expensive CPU is supposed to be busy.

Indexing takes a lot of work -- to make indexing fast much of the work is
done in memory which reduces the amount of time Swish-e is waiting on I/O.
But there are two things you can try:

The C<-e> option will run Swish-e in economy mode, which uses the disk
to store data while indexing.  This makes Swish-e run somewhat slower,
but also uses less memory.  Since it is writing to disk more often it
will be spending more time waiting on I/O and less time in CPU.  Maybe.

The other thing is to simply lower the priority of the job using the
nice(1) command:

    /bin/nice -15 swish-e -c search.conf

If you are concerned about search time, make sure you are using the C<-b> and C<-m>
switches to return only a page at a time.  If you know that your result
sets will be large, that you wish to return results one page at a
time, and that many pages of the same query will often be requested,
it may be smart to request all the documents on the first request and
then cache the results to a temporary file.  The perl module File::Cache
makes this very simple to accomplish.
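
As a rough illustration of paging, a front-end can translate a page number
into C<-b> (first result to return, counted from 1) and C<-m> (maximum number
of results).  The function name C<page_command> and the default page size are
made up for this sketch, which only builds the argument list and does not run
swish-e.

```python
def page_command(query, index_file, page, per_page=10):
    """Build a swish-e argument list that returns only one page of results."""
    if page < 1:
        raise ValueError("page numbers start at 1")
    first = (page - 1) * per_page + 1  # -b is the position of the first result
    return ["swish-e", "-w", query, "-f", index_file,
            "-b", str(first), "-m", str(per_page)]

print(page_command("foo", "index.swish-e", 3))
```

Passing the arguments as a list (for example to C<subprocess.run>) rather than
as one shell string avoids having to quote the user's query for a shell.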


=head2 Spidering

=head3 How can I index documents on a web server?

If possible, use the file system method C<-S fs> of indexing to index
documents in your web area of the file system.  This avoids the overhead
of spidering a web server and is much faster.  (C<-S fs> is the default
method if C<-S> is not specified.)

If this is impossible (the web server is not local, or documents are dynamically
generated), Swish-e provides two methods of spidering. First, it includes the http method
of indexing C<-S http>. A number of special configuration directives are available that
control spidering (see L<SWISH-CONFIG/"Directives for the HTTP Access Method Only">).  A perl helper
script (swishspider) is included in the F<src> directory to assist with spidering web
servers. There are example configurations for spidering in the F<conf> directory.

As of Swish-e 2.2, there's a general purpose "prog" document source where
a program can feed documents to it for indexing.  A number of example
programs can be found in the C<prog-bin> directory, including a program
to spider web servers.  The provided spider.pl program is full-featured
and is easily customized.

The advantage of the "prog" document source feature over the "http" method
is that the program is only executed one time, whereas the F<swishspider>
program used in the "http" method is executed once for every document
read from the web server.  The forking of Swish-e and compiling of the
perl script can be quite expensive, time-wise.

The other advantage of the C<spider.pl> program is that it's simple and
efficient to add filtering (such as for PDF or MS Word docs) right into
the spider.pl's configuration, and it includes features such as MD5 checks
to prevent duplicate indexing, options to avoid spidering some files,
or index but avoid spidering.  And since it's a perl program there's no
limit on the features you can add.

=head3 Why does swish report "./swishspider: not found"?

Does the file F<swishspider> exist where the error message displays?  If not, either
set the configuration option L<SpiderDirectory|SWISH-CONFIG/"item_spiderdirectory">
to point to the directory where the F<swishspider> program is found, or place the
F<swishspider> program in the current directory when running swish-e.

If you are running Windows, make sure "perl" is in your path.  Try typing F<perl> from
a command prompt.

If you are not running Windows, make sure that the shebang line (the first line of the
swishspider program that starts with #!) points to the correct location of perl.
Typically this will be F</usr/bin/perl> or F</usr/local/bin/perl>.  Also, make sure that
you have execute and read permissions on F<swishspider>.

The F<swishspider> perl script is only used with the -S http method of indexing.

=head3 I'm using the spider.pl program to spider my web site, but some
large files are not indexed.

The C<spider.pl> program has a default limit of 5MB file size.  This can
be changed with the C<max_size> parameter setting.  See C<perldoc
spider.pl> for more information.

=head3 I still don't think all my web pages are being indexed.

The F<spider.pl> program has a number of debugging switches and can be
quite verbose in telling you what's happening, and why.  See C<perldoc
spider.pl> for instructions.

=head3 Swish is not spidering Javascript links!

Swish cannot follow links generated by Javascript, as they are generated
by the browser and are not part of the document.

=head3 How do I spider other websites and combine it with my own
(filesystem) index?

You can either merge C<-M> two indexes into a single index, or use C<-f>
to specify more than one index while searching.

You will have better results with the C<-f> method.


=head2 Searching

=head3 How do I limit searches to just parts of the index?

If you can identify "parts" of your index by the path name you have
two options.

The first option is to index the document path.  Add this to your
configuration:

    MetaNames swishdocpath

Now you can search for words or phrases in the path name:

    swish-e -w 'foo AND swishdocpath=(sales)'

So that will only find documents with the word "foo" and where the file's
path contains "sales".  That might not work as well as you like, though,
as both of these paths will match:

    /web/sales/products/index.html
    /web/accounting/private/sales_we_messed_up.html

This can be solved by searching with a phrase (assuming "/" is not
a WordCharacter):

    swish-e -w 'foo AND swishdocpath=("/web/sales/")'
    swish-e -w 'foo AND swishdocpath=("web sales")'  (same thing)


The second option is a bit more powerful.  With the C<ExtractPath>
directive you can use a regular expression to extract out a sub-set of
the path and save it as a separate meta name:

    MetaNames department
    ExtractPath department regex !^/web/([^/]+).+$!$1!

Which says match a path that starts with "/web/" and extract out
everything after that up to, but not including, the next "/" and save it in
variable $1, and then match everything from the "/" onward.  Then replace
the entire matched string with $1.  And that gets indexed as meta name
"department".

Now you can search like:

    swish-e -w 'foo AND department=sales'

and be sure that you will only match the documents in the /web/sales/*
path.  Note that you can map completely different areas of your file
system to the same metaname:

    # flag the marketing specific pages
    ExtractPath department regex !^/web/(marketing|sales)/.+$!marketing!
    ExtractPath department regex !^/internal/marketing/.+$!marketing!

    # flag the technical departments pages
    ExtractPath department regex !^/web/(tech|bugs)/.+$!tech!
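
The pattern from the first C<ExtractPath> example (the one that captures the
directory after "/web/") can be checked against sample paths before you
reindex.  The sketch below uses Python's C<re> module as a stand-in.  Swish-e
uses its own regular expression library, so treat this as a way to reason
about the pattern rather than as a guarantee of identical behavior.

```python
import re

# Same pattern as the ExtractPath example, with \1 playing the role of $1.
EXTRACT = re.compile(r"^/web/([^/]+).+$")

def department(path):
    """Return what the substitution would produce for a path."""
    # re.sub leaves a non-matching path unchanged; what Swish-e does with a
    # path that does not match is not covered by this sketch.
    return EXTRACT.sub(r"\1", path)

for path in ("/web/sales/products/index.html",
             "/web/accounting/private/sales_we_messed_up.html"):
    print(path, "->", department(path))
```

The second path maps to "accounting", so a search for C<department=sales> no
longer matches it.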


Finally, if you have something more complicated, use C<-S prog> and
write a perl program or use a filter to set a meta tag when processing
each file.

=head3 How is ranking calculated?

The C<swishrank> property value is calculated based on which Ranking Scheme (or algorithm)
you have selected. In this discussion, any time the word B<fancy> is used, you should
consult the actual code for more details. It is open source, after all.

Things you can do to affect ranking:

=over

=item MetaNamesRank

You may configure your index to bias certain metaname values more or less than others.
See the C<MetaNamesRank> configuration option in L<SWISH-CONFIG>.

=item IgnoreTotalWordCountWhenRanking

Set to 1 (default) or 0 in your config file. See L<SWISH-CONFIG>.
B<NOTE:> You must set this to 0 to use the IDF Ranking Scheme.

=item structure

Each term's position in each HTML document is given a structure value based on the context
in which the word appears. The structure value is used to artificially inflate
the frequency of each term in that particular document.
These structural values are defined in F<config.h>:

 #define RANK_TITLE		7
 #define RANK_HEADER		5
 #define RANK_META		3
 #define RANK_COMMENTS		1
 #define RANK_EMPHASIZED 	0


For example, if the word C<foo> appears in the title of a document, the Scheme
will treat that document as if C<foo> appeared 7 additional times.


=back


All Schemes share the following characteristics:

=over

=item AND searches

The rank value is averaged for all AND'd terms. Terms within a set of parentheses () are
averaged as a single term (this is an acknowledged weakness and is on the TODO list).

=item OR searches

The rank value is summed and then doubled for each pair of OR'd terms. This results
in higher ranks for documents that have multiple OR'd terms.

=item scaled rank

After a document's raw rank score is calculated, a final rank score is calculated using a
fancy C<log()> function. All the documents are then scaled against a base score of 1000.
The top-ranked document will therefore always have a C<swishrank> value of 1000.

=back
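
The AND and OR rules above can be written down as two small functions.  This
is a sketch of the prose description only.  It ignores integer rounding, the
later scaling step, and whatever else the actual code does, and the function
names are made up.

```python
def and_rank(ranks):
    """Average the rank values of all AND'd terms."""
    return sum(ranks) / len(ranks)

def or_rank(left, right):
    """Sum the rank values of a pair of OR'd terms, then double the sum."""
    return (left + right) * 2

print(and_rank([10, 20]))  # 15.0
print(or_rank(10, 20))     # 60
```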

Here is a brief overview of how the different Schemes work. The number in parentheses after
the name is the value to invoke that scheme with C<swish-e -R> or C<RankScheme()>.

=over

=item Default (0)

The default ranking scheme considers the number of times a term appears in a 
document (frequency), the MetaNamesRank and the structure value. The rank might be summarized
as:

 DocRank = Sum of ( structure + metabias )

Consider this output with the DEBUG_RANK variable set at compile time: 

 Ranking Scheme: 0 
 Word entry 0 at position 6 has struct 7
 Word entry 1 at position 64 has struct 41
 Word entry 2 at position 71 has struct 9
 Word entry 3 at position 132 has struct 9
 Word entry 4 at position 154 has struct 9
 Word entry 5 at position 423 has struct 73
 Word entry 6 at position 541 has struct 73
 Word entry 7 at position 662 has struct 73
 File num: 1104.  Raw Rank: 21.  Frequency: 8 scaled rank: 30445
  Structure tally:
  struct 0x7 = count of 1 ( HEAD TITLE FILE ) x rank map of 8 = 8

  struct 0x9 = count of 3 ( BODY FILE ) x rank map of 1 = 3

  struct 0x29 = count of 1 ( HEADING BODY FILE ) x rank map of 6 = 6

  struct 0x49 = count of 3 ( EM BODY FILE ) x rank map of 1 = 3

Every word instance starts with a base score of 1.
Then for each instance of your word, a running
sum is taken of the structural value of that word position plus any bias you've configured.
In the example above, the raw rank is C<1 + 8 + 3 + 6 + 3 = 21>.

Consider this line:

  struct 0x7 = count of 1 ( HEAD TITLE FILE ) x rank map of 8 = 8

That means there was one instance of our word in the title of the file.
Its context was in the E<lt>headE<gt> tagset, inside the E<lt>titleE<gt>.
The E<lt>titleE<gt> is the most specific structure, so it gets the
RANK_TITLE score: 7. The base rank of 1 plus the structure score of 7 equals 8. If there
had been two instances of this word in the title, then the score would have been C<8 + 8 = 16>.
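
The arithmetic behind the raw rank of 21 can be reproduced from the debug
output.  The sketch below copies the "rank map" values from the structure
tally, assumes no C<MetaNamesRank> bias, and treats the leading 1 in the sum
as a starting value.  It is a reading of the output shown here, not an
excerpt from the Swish-e source.

```python
from collections import Counter

# "rank map" values copied from the structure tally in the debug output above
RANK_MAP = {0x07: 8, 0x09: 1, 0x29: 6, 0x49: 1}

# "struct" value of each of the eight word entries, in the order listed
STRUCTS = [7, 41, 9, 9, 9, 73, 73, 73]

def raw_rank(structs, rank_map, start=1):
    """Add count x rank map for every structure value to the starting value."""
    tally = Counter(structs)
    return start + sum(count * rank_map[s] for s, count in tally.items())

print(raw_rank(STRUCTS, RANK_MAP))  # 21
```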

=item IDF (1)

IDF is short for Inverse Document Frequency. That's fancy ranking lingo for taking into
account the total frequency of a term across the entire index, in addition to the term's
frequency in a single document. IDF ranking also uses the relative density of a word in a
document to judge its relevancy. Words that appear more often in a doc make that doc's rank
higher, and longer docs are not weighted higher than shorter docs.

The IDF Scheme might be summarized as:

  DocRank = Sum of ( density * idf * ( structure + metabias ) )

Consider this output from DEBUG_RANK:

 Ranking Scheme: 1 
 File num: 1104  Word Score: 1  Frequency: 8  Total files: 1451   
 Total word freq: 108   IDF: 2564  
 Total words: 1145877   Indexed words in this doc: 562   
 Average words: 789   Density: 1120    Word Weight: 28716   
 Word entry 0 at position 6 has struct 7
 Word entry 1 at position 64 has struct 41
 Word entry 2 at position 71 has struct 9
 Word entry 3 at position 132 has struct 9
 Word entry 4 at position 154 has struct 9
 Word entry 5 at position 423 has struct 73
 Word entry 6 at position 541 has struct 73
 Word entry 7 at position 662 has struct 73
 Rank after IDF weighting: 574321  
 scaled rank: 132609
  Structure tally:
  struct 0x7 = count of  1 ( HEAD TITLE FILE ) x rank map of 8 = 8

  struct 0x9 = count of  3 ( BODY FILE ) x rank map of 1 = 3

  struct 0x29 = count of  1 ( HEADING BODY FILE ) x rank map of 6 = 6

  struct 0x49 = count of  3 ( EM BODY FILE ) x rank map of 1 = 3

It is similar to the default Scheme, but notice how the total number of files in the index
and the total word frequency (as opposed to the document frequency) are both part of the
equation.

=back
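
The figures in the IDF debug output above are consistent with a simple piece
of arithmetic, shown here as a check.  The division by 100 and the starting
value of 1 are inferred from these numbers alone, not taken from the Swish-e
source, so consult the code before relying on them.

```python
density = 1120   # "Density" from the debug output
idf = 2564       # "IDF" from the debug output

# 1120 * 2564 = 2871680; dropping the last two digits gives the "Word Weight".
word_weight = density * idf // 100

# Structure tally: 1 x 8 + 3 x 1 + 1 x 6 + 3 x 1
structure_sum = 1 * 8 + 3 * 1 + 1 * 6 + 3 * 1

rank_after_idf = 1 + word_weight * structure_sum
print(word_weight, rank_after_idf)  # 28716 574321
```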

Ranking is a complicated subject. SWISH-E allows for more Ranking Schemes to be developed
and experimented with, using the C<-R> option (from the swish-e command) or the C<RankScheme()>
method (see the API documentation). Experiment and share your findings via the discussion list.


=head3 How can I limit searches to the title, body, or comment?

Use the C<-t> switch.

=head3 I can't limit searches to title/body/comment.

Or, I<I can't search with meta names, all the names are indexed as
"plain".>

Check in the config.h file if #define INDEXTAGS is set to 1. If it is,
change it to 0, recompile, and index again.  When INDEXTAGS is 1, ALL
the tags are indexed as plain text, that is you index "title", "h1", and
so on, AND they lose their indexing meaning.  If INDEXTAGS is set to 0,
you will still index meta tags and comments, unless you have indicated
otherwise in the user config file with the IndexComments directive.

Also, check for the C<UndefinedMetaTags> setting in your configuration
file.

=head3 I've tried running the included CGI script and I get an "Internal
Server Error"

Debugging CGI scripts is beyond the scope of this document.
Internal Server Error basically means "check the web server's log for
an error message", as it can mean a bad shebang (#!) line, a missing
perl module, FTP transfer error, or simply an error in the program.
The CGI script F<swish.cgi> in the F<example> directory contains some
debugging suggestions.  Type C<perldoc swish.cgi> for information.

There are also many, many CGI FAQs available on the Internet.  A quick web
search should offer help.  As a last resort you might ask your webadmin
for help...

=head3 When I try to view the swish.cgi page I see the contents of the
Perl program.

Your web server is not configured to run the program as a CGI script.
This problem is described in C<perldoc swish.cgi>.


=head3 How do I make Swish-e highlight words in search results?

Short answer:

Use the supplied swish.cgi or search.cgi scripts located in the F<example> directory.

Long answer:

Swish-e can't because it doesn't have access to the source documents when
returning results, of course.  But a front-end program of your creation
can highlight terms.  Your program can open up the source documents and
then use regular expressions to replace search terms with highlighted
or bolded words.

But, that will fail with all but the most simple source documents.
For HTML documents, for example, you must parse the document into words
and tags (and comments).  A word you wish to highlight may span multiple
HTML tags, or be a word in a URL and you wish to highlight the entire
link text.

Perl modules such as HTML::Parser and XML::Parser make word extraction
possible.  Next, you need to consider that Swish-e uses settings such
as WordCharacters, BeginCharacters, EndCharacters, IgnoreFirstChar,
and IgnoreLastChar to define a "word".  That is, you can't assume
that a string of characters with white space on each side is a word.

Then things like TranslateCharacters, and HTML Entities may transform a
source word into something else, as far as Swish-e is concerned.  Finally,
searches can be limited by metanames, so you may need to limit your
highlighting to only parts of the source document.  Throw phrase searches
and stopwords into the equation and you can see that it's not a trivial
problem to solve.

All hope is not lost, though, as Swish-e does provide some help.
Using the C<-H> option it will return in the headers the current index
(or indexes) settings for WordCharacters (and others) required to parse
your source documents as it parses them during indexing, and will return a
"Parsed Words:" header that will show how it parsed the query internally.
If you use fuzzy indexing (word stemming, soundex, or metaphone)
then you will also need to stem each word in your
document before comparing with the "Parsed Words:" returned by Swish-e.

The Swish-e stemming code is available either by using the Swish-e
Perl module (SWISH::API) or the C library (included with the swish-e distribution),
or by using the SWISH::Stemmer module available on CPAN.  Also on CPAN is
the module Text::DoubleMetaphone.  Using SWISH::API probably provides the best
stemming support.
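
For plain text (not HTML), the simplest form of highlighting driven by the
"Parsed Words:" header might look like the sketch below.  The C<word_chars>
argument is a hypothetical stand-in for the index's WordCharacters setting,
and stemming, phrases, stopwords and character translation are all ignored,
so this only illustrates the tokenize-and-compare step.

```python
import re

def highlight(text, parsed_words, word_chars="a-z0-9"):
    """Wrap every token that matches a parsed query word in <b> tags."""
    terms = {w.lower() for w in parsed_words}
    # A "word" is a run of word characters, not a whitespace-delimited string.
    token = re.compile("[" + word_chars + "]+", re.IGNORECASE)

    def mark(match):
        word = match.group(0)
        return "<b>" + word + "</b>" if word.lower() in terms else word

    return token.sub(mark, text)

print(highlight("Swish-e indexes files fast", ["indexes", "fast"]))
```

Because "-" is not in the assumed word characters, "Swish-e" is split into
two tokens here, which shows why the real WordCharacters setting matters.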

=head3 Do filters affect the performance during search?

No.  Filters (FileFilter or via "prog" method) are only used for building
the search index database.  During search requests there will be no
filter calls.


=head2 I have read the FAQ but I still have questions about using Swish-e.

The Swish-e discussion list is the place to go.  http://swish-e.org/.
Please do not email developers directly.  The list is the best place to
ask questions.

Before you post please read I<QUESTIONS AND TROUBLESHOOTING> located
in the L<INSTALL|INSTALL> page.  You should also search the Swish-e
discussion list archive which can be found on the swish-e web site.

In short, be sure to include the following when asking for help.

=over 4

=item * The swish-e version (./swish-e -V)

=item * What you are indexing (and perhaps a sample), and the number
of files

=item * Your Swish-e configuration file

=item * Any error messages that Swish-e is reporting

=back

=head1 Document Info

$Id: SWISH-FAQ.pod 2147 2008-07-21 02:48:55Z karpet $

.


=head1 NAME

SWISH-BUGS - List of bugs known in Swish-e

=head1 DESCRIPTION

This file contains a list of bugs reported or known in Swish-e.  If
you find a bug listed here you do not need to report it as a bug.  But feel
free to bug the developers about it on the Swish-e discussion list.

Note that this list is incomplete and may not be up to date.


=head1 Bugs in Swish-e version 2.4

=over 4

=item * Stopwords not removed from query with Soundex

In dev version 2.5.2 it was noticed that stopwords are not removed from the query
when using Soundex.  The plan is to rewrite the parser soon... (July 2004)

=item * Wild card searching can be very slow

Wild card searching needs to be optimized.

Here's a three letter search:

  $ swish-e -w 'tra*' -m1
  # Number of hits: 99952
  # Search time: 5.424 seconds

Two letters:

  $ swish-e -w 'tr*' -m1
  # Number of hits: 100000
  # Search time: 10.563 seconds

Single letter search:

  $ swish-e -w 't*' -m1
  # Number of hits: 100000
  # Search time: 510.939 seconds

and used about 280MB of RAM.

This is a potential vector for a DoS attack.  If you have a large index you may wish to filter
out single character wild cards.
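
A front-end script could reject such queries before they reach swish-e.  The
sketch below is one possible check, with a made-up function name and an
assumed minimum prefix length of two characters.  It treats letters, digits
and underscores as word characters, which may differ from your index's
WordCharacters setting.

```python
import re

# A wildcard term: zero or more word characters followed by "*".
WILDCARD_TERM = re.compile(r"(\w*)\*")

def has_short_wildcard(query, min_prefix=2):
    """Return True if any wildcard term has fewer than min_prefix characters."""
    return any(len(m.group(1)) < min_prefix
               for m in WILDCARD_TERM.finditer(query))

for query in ("tra*", "tr*", "t*", "foo AND t*"):
    print(query, has_short_wildcard(query))
```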


=item * Character Encodings

The XML parser (Expat) returns UTF-8 data to swish-e.  Therefore, the XML
parser should only be used for parsing US-ASCII encoded text.

The XML2 & HTML2 parsers (Libxml2) convert characters from UTF-8 to 8859-1 encoding before indexing
and writing properties.  Indexing non-8859-1 data may result in invalid character mappings.

These issues will be resolved soon.

=item *

Phrase search fails with DoubleMetaphone

DoubleMetaphone searching can produce two search words for a single query
word.  The words are expanded to (word1 OR word2), but that fails in a
phrase query:   "some phrase (word1 or word2) here"

swish-e query parser is due for a rewrite, and this could be resolved then.

    Reported: August 20, 2002 - moseley

=item *

Merging

merge.c does not check for matching stopwords or buzzwords in each index.

History:

    Reported: September 3, 2002 - moseley


=item *

ResultSortOrder

ResultSortOrder is not used (and is not documented).  The problem is that
the data passed to Compare_Properties() does not have access to the
ResultSortOrder table.

History:

    Reported: September 3, 2002 - moseley

=back



=head1 Document Info

$Id: SWISH-BUGS.pod 1613 2005-02-02 22:53:39Z whmoseley $

.



=head1 NAME

The Swish-e README File

=head1 Upgrading?

If you are upgrading Swish-e, please review the CHANGES file before installation.
The index format may change and existing indexes may need to be re-created before
use.

=head1 OVERVIEW

Swish-e is B<S>imple B<W>eb B<I>ndexing B<S>ystem for B<H>umans - B<E>nhanced.
Swish-e can quickly and easily index directories of files or remote web sites
and search the generated indexes.

Swish-e is extremely fast in both indexing and searching, highly
configurable, and can be seamlessly integrated with existing web sites to
maintain a consistent design. Swish-e can index web pages, but can just as
easily index text files, mailing list archives, or data stored in a
relational database.

Swish is designed to index small- to medium-sized collections of documents.
Although a few users are indexing over a million documents, typical usage
is more often in the tens of thousands.  Currently, Swish-e only indexes
eight bit character encodings.

Swish-e version 2.2 was a major rewrite of the code and added many
new features.  Memory requirements for indexing have been reduced and
indexing speed is significantly improved from previous versions. New
features allow more control over indexing, better document parsing, improved
indexing and searching logic, better filter code, and the ability to index
from any data source.

Swish-e version 2.4 includes a major rewrite of the C API and a new Perl
module for accessing the Swish-e C library.  In addition, Swish-e 2.4 uses
the GNU Auto Tools.  The significant changes are where files are installed,
and the use of Libtool to create the Swish-e library as a shared library on
many platforms.  Basically, installation is easier than previous versions,
and more files are installed in "standard" locations (e.g. documentation
is installed in C<$prefix/share/doc/swish-e>).

Note: Due to the new build and installation system in Swish-e 2.4, some 
documentation may incorrectly list the location of files.  Please report
any documentation errors to the Swish-e Discussion list.

Swish-e is not a "turn-key" indexing and searching solution.  The Swish-e
distribution contains most of the parts to create such a system, but you
need to put the parts together as best meets your needs. This gives you the
power to index and search your documents the way you wish and to seamlessly
integrate a search engine into your web site or application.

To use Swish-e, you will need to configure Swish-e to index your documents,
create an index by running Swish-e, and set up an interface such as a CGI
script (a script is included) to search the index and display results.
Swish uses helper programs to index documents of types that Swish-e cannot
natively index.  These programs may need to be installed separately from
Swish-e.

Swish-e is an Open Source (see: http://opensource.org ) program supported by
developers and a large group of users. Please take time to join the Swish-e
discussion list at http://Swish-e.org .


=head2 Key features

=over 4

=item *

Quickly index a large number of documents in different formats
including text, HTML, and XML.

=item *

Use "filters" to index other types of files such as PDF, gzip, or
PostScript.

=item *

Includes a web spider for indexing remote documents over HTTP.
Follows Robots Exclusion Rules (including META tags).

=item *

Can use an external program to supply documents to Swish-e, such as an
advanced spider for your web server or a program to read and format
records from a relational database.

=item *

Document "properties" (some subset of the source document, usually defined
as META or XML elements) may be stored in the index and returned with
search results.

=item *

Document summaries can be returned with each search.

=item *

Word stemming, soundex, metaphone, and double-metaphone indexing for "fuzzy" searching

=item *

Phrase searching and wildcard searching

=item *

Limit searches to HTML links.

=item *

Use powerful Regular Expressions to select documents for indexing or exclusion.

=item *

Easily limit searches to parts or all of your web site.

=item *

Results can be sorted by relevance or by any number of properties
in ascending or descending order.

=item *

Limit searches to parts of documents such as certain HTML tags
(META, TITLE, comments, etc.) or to XML elements.

=item *

Can report structural errors in your XML and HTML documents.

=item *

Index file is portable between platforms.

=item *

A Swish-e library is provided to allow embedding Swish-e into your applications for
very fast searching. 
A Perl module is available that provides a standard API for accessing Swish-e.

=item *

Includes example search script with context summaries and search term and phrase highlighting.
Can be used with popular Perl templating systems.

=item *

Swish-e is fast.

=item *

It's Open Source and FREE!  You can customize Swish-e and you can
contribute your fancy new features to the project.

=item *

Supported by on-line user and developer groups.

=back


=head1 Where do I get Swish-e?

The current version of Swish-e can be found at:

http://Swish-e.org

Please make sure you use a current version of Swish-e.

Information about Windows binary distributions can also be found at
this site.

=head1 How Do I Install Swish-e?

Read the L<INSTALL|INSTALL> page.

Building from source is recommended.  On most platforms, Swish-e should build without problems.
A list of platforms where Swish-e has been built can be found in the L<INSTALL|INSTALL> page.
Information on building for VMS and Win32 can be found in sub-directories of the C<src> directory.
Check the Swish-e site for information about binary distributions (such as for Windows).

In addition to the INSTALL page, make sure you read the L<SWISH-FAQ|SWISH-FAQ> page if
you have any questions, or to get an idea of questions that you might someday ask.

Problems or questions about installing Swish-e should be directed to the Swish-e discussion list (see the
Swish-e web site at http://Swish-e.org).

Please read C<Where do I get help with Swish-e?> below before posting any questions to the
Swish-e list.


=head1 The Swish-e Documentation

Documentation is provided as HTML pages installed in
$prefix/share/doc/swish-e where $prefix is /usr/local if building from
source, or /usr if installed as part of a package from your OS vendor.
Under Windows $prefix is selected at installation time.

A subset of the documentation is installed as system man pages as well.

Documentation is also available on-line at http://swish-e.org.

Patches or updates to the documentation should be done against the POD files,
located in the pod directory of the distribution, or (preferably) against
the CVS repository.

=head1 Where do I get help with Swish-e?

If you need help with installing or using Swish-e, please subscribe to
the Swish-e mailing list.  Visit the Swish-e web site (listed above)
for information on subscribing to the mailing list.

Before posting any questions, please read
L<QUESTIONS AND TROUBLESHOOTING|INSTALL/"QUESTIONS AND TROUBLESHOOTING">.

=head1 Speling mistakes

Please contact the Swish-e list with corrections to this documentation.
Any help in cleaning up the docs will be appreciated!

Any patches should be made against the C<.pod> files, not the C<.html> files.

=head1 Swish-e Development

Swish-e is currently being developed as an Open-Source project on
SourceForge http://sourceforge.net.

Contact the Swish-e list for questions about Swish-e development.

=head1 Swish-e's History

SWISH was created by Kevin Hughes, circa 1994, to fill the need of the growing number
of Web administrators on the Internet - many of the indexing systems were
not well documented, were hard to use and install, and were too complex
for their own good. The system was widely used for several years, long
enough to collect some bug fixes and requests for enhancements.

In Fall 1996, The Library of UC Berkeley received permission from
Kevin Hughes to implement bug fixes and enhancements to the original
binary. The result is Swish-enhanced or Swish-e, brought to you by the
Swish-e Development Team.

=head1 Document Info

Each document in the Swish-e distribution contains this section.
It refers only to the specific page it's located in, and not to the
Swish-e program or the documentation as a whole.

$Id: README.pod 1663 2005-02-11 17:00:13Z whmoseley $

.
=head1 NAME

CHANGES - List of revisions

=head1 OVERVIEW

This document contains a list of bug fixes and feature additions to Swish-e.

=head2 Version 2.4.7 - 4 April 2009

=over 4

=item Added ReturnRawRank for raw rank score

Setting ReturnRawRank to a true value
will return the rank score unscaled. Can be set with the -a command line
option (mnemonic: "a"bsolute rank score).

=item Yanked setenv feature introduced in 2.4.6

The ranking debugging feature using setenv introduced in 2.4.6 was yanked.
Some platforms (notably HP-UX and Windows) lack the setenv feature, and the
convenience of setting the env var was not worth the limitations.

=back

=head2 Version 2.4.6 - 10 March 2008

=over 4

=item MinWordLength respected in query parser

Clark Vent reported that the query parser was not respecting MinWordLength
settings.  See http://dev.swish-e.org/changeset/2145

=item Patch to file.c.  

The file.c patch was in response to
http://swish-e.org/archive/2007-03/11321.html
although that user never responded about that patch.

=item SWISH_DEBUG_RANK env var now enables rank debugging

Set SWISH_DEBUG_RANK to a true value to enable lots of rank debugging
on stderr.

=item Perl Makefile.PL patched to fix MakeMaker issue

Recent versions of ExtUtils::MakeMaker revealed a bug in Makefile.PL.
Patch from mschwern via RT, report by mpeters.

=item LARGEFILE support detected automatically in configure

jrobinson852@yahoo.com suggested that LARGEFILE support be auto-detected, since
it is needed so often on Linux systems.

=item New Snowball stemmers

Trygve Falch contributed patches to update
the Snowball stemmers, including new Hungarian and Romanian stemmers.

=item Patched leaks

Antony Dovgal patched two leaks.  First, when a file failed to open,
the file name was not freed.

Second, SwishSetSearchLimit() was nulling the search limits when an error was
found in the parameters, but not freeing the existing limits.

=item Leak in SwishResetSearchLimit

Fixed a leak if a limit was set and then reset but not prepared.
Patch provided by Antony Dovgal.

=item New API functions added

Added SwishGetStructure() and SwishGetPhraseDelimiter() functions which return
relevant properties of the search object.
Patch provided by Antony Dovgal.


=back

=head2 Version 2.4.5 - 22 Jan 2007

=over 4

=item Fixed 'deflate' handling in spider.pl

spider.pl was using the wrong method to uncompress HTTP responses that were
'deflate' encoded.  Also decode content based on the document's charset and
encode back to charset before outputting.

=item re-indexing required

The magic numbers in src/swish.h were changed to require re-indexing from
version 2.4.4 indexes. This should have been done in 2.4.4 as well, and anytime
the index format changes. -- karman

=item fixed stemmer bug introduced in 2.4.4

stemmer.c had a mix up in the deprecated stemmer assignments for "Stemmer_en"
and "Stem". Also fixed stemmer.h so that 2.4.3 indexes can be read correctly.
-- karman

=item Now fork/exec to run filters

FileFilter* was using popen to run the filter, which could pass user
data through the shell.  Now uses fork/exec if fork is available, which
should be everywhere except Windows.  On Windows popen is used, but all
parameters are double-quoted. -- moseley
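
The difference between the two approaches can be sketched outside of Swish-e.
The following Python fragment is only an illustration, not the code in
Swish-e: C<run_filter> is a hypothetical helper, and the Python interpreter
stands in for a real filter program.  Passing the arguments as a list runs the
program directly, so shell metacharacters in a file name arrive as plain data.

```python
import subprocess
import sys


def run_filter(program, args):
    """Run a filter program directly (no shell) and return its output.

    Hypothetical helper for illustration; Swish-e itself does this in C
    with fork/exec.
    """
    # The list form hands each argument to the program unchanged, so
    # characters such as ';' are never interpreted by a shell.
    result = subprocess.run(
        [program, *args], capture_output=True, text=True, check=True
    )
    return result.stdout


# A file name that would be dangerous if it were pasted into a shell command.
hostile = "report.pdf; echo INJECTED"

# The stand-in "filter" simply prints the first argument it receives.
out = run_filter(sys.executable, ["-c", "import sys; print(sys.argv[1])", hostile])
print(out.strip())
```

The stand-in filter receives the whole string, semicolon included, as a single
argument.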

=item fixed signed/unsigned warnings from gcc 4.x

Cleaned up search.c to catch mismatched signedness warnings from newer GCC versions.
This issue pre-existed 2.4.4 but the new wildcard features in search.c made for a lot
more warnings. -- karman

=item Makefile.mingw included in distrib

Modified root Makefile to include the perl/Makefile.mingw file. -- karman

=back

=head2 Version 2.4.4 - 11 Oct 2006

=over 4

=item Version 2.4.4 RC1

Release Candidate 1 for 2.4.4, 2 Oct 2006.

=item quote fix for FileFilter config param

Ludovic Drolez contributed a patch to fix a quoting issue with filenames. This affects
non-Windows builds only.

=item SWISH::Filter now on CPAN

SWISH::Filter is now available on http://cpan.org/. The version in the distribution is
B<not> kept in sync with the CPAN version. Install the CPAN version if you want
the latest and greatest version.

=item SWISH::API updated to 0.04

Added several fixes, including:

=over

=item Perlish method names from mpeters@plusthree.com

=item switched to XSLoader with DynaLoader as fallback

=item added VERSION method to satisfy some versions of MakeMaker

=item Fuzzify() method now actually works as advertised

=back

=item added proximity feature and single character wildcard with '?' instead of '*'

Herman Knoops contributed these patches.
See http://swish-e.org/archive/2006-05/10543.html

Error messages were also changed to better reflect correct use of wildcards.

=item fixed bug when using DoubleMetaphone

Fixed problem reported by Andreas Völter where a query that generated a
two-word query with DoubleMetaphone fuzzy mode was not working.

=item fix sparc64 property issue

Sorithy Seng (pourlassi@gmail.com) submitted a patch against docprop.c to fix
an issue on sparc64 platforms. It is unknown whether this bug affected other 64-bit
architectures.

=item fixed bug when StopWords resulted in no unique words

Added check in db_native.c to check that some words exist before writing index.

=item updates to SWISH-RUN.1

Added doc for -u and -r options.

=item filename only in SWISH::Filters

Added a fix to SWISH::Filters::pp2html and SWISH::Filters::XLtoHTML to
save only the filename as the title, without the full path.

=item Removed Stem and Stemmer_en

The legacy Porter stemmer was removed. This had been deprecated some time ago.
A warning will issue if the old stemmer is indicated in config file, and Stemmer_en1
will be used instead.

=item GPL'd all the source files with the new Swish-e License

After a source code review, the developers decided to put Swish-e under the GPL
with a special exception for linking against libswish-e. See http://swish-e.org/license.html
for the details.

=item Fixed Segfault with updating incremental index

Dobrica Pavlinusic reported a segfault after updating an index multiple times.
José provided updated worddata.c.  - April 27, 2005

=item Fixed NOT check with incremental indexes

Swish was returning results for deleted files when the NOT operator was used.

=item Fixed bug when using old parsers with zero length input

Thomas Angst reported swish consuming memory when using -S prog
to process large number of empty documents.

When -S prog generated a zero length file the old parsers (e.g. TXT) would
attempt to read in *all* content from the -S prog program into a buffer.
The old parser incorrectly assumed it was reading from a filter and tried to
read to eof().

=item Changes to ParserWarnLevel

The default value for ParserWarnLevel was changed from zero to two.

The ParserWarnLevel controls the error handling of the libxml2 parser. The higher
the setting, the more verbose the output. The change to the default is to report
when libxml2 has problems parsing a document (which often times results in processing
only part of a document).

To get the old behavior, either set ParserWarnLevel to zero in your config file,
or use the new -W command line option to set the ParserWarnLevel at run time.
If ParserWarnLevel is set in the config file, it will override the -W option.

Also, to see UTF-8 to 8859-1 conversion errors set ParserWarnLevel to 3 or more.  Previously,
these warnings were issued at a ParserWarnLevel of one.

=item Documentation changes

Removed all the target documentation (html, pdf, ps) from cvs.  There's now a separate
cvs module "swish_website" that is used to generate both the website and the html
docs.  If building swish-e from cvs please see the README.cvs file for instructions.

=item Fixed bug in pre-sorted indexes with USE_BTREE

Gunnar Mätzler reported a problem with reading the pre-sorted property index
tables when running with USE_BTREE (--enable-incremental).  Not all entries were
being written to disk.  There was/is a question if the "array" code used for
pre-sorted indexes with USE_BTREE would be slower.  So, added a separate 
define USE_PRESORT_ARRAY to enable that code when USE_BTREE is set.  This allows
using the old integer arrays with USE_BTREE.  Gunnar reported that this is working,
but more testing is needed.  Need to compare speed of the array code vs. the non-array
code, and to verify the workings of USE_PRESORT_ARRAY code.

=item Add strcoll() usage for sorting properties

Andreas Seltenreich provided a patch to use strcoll when sorting properties.
strcoll is locale dependent.

=item Fix incremental indexing when adding back a file

Jose fixed a problem with incremental indexing where a file could not be
added back to the index once removed.

Patch initially provided by Dobrica Pavlinusic:

    http://swish-e.org/Discussion/archive/2004-12/8694.html



=item Documentation correction

A change in the default way the index is compressed was not documented
in 2.4.3.  The change resulted in larger indexes.  See CompressPositions
below and in SWISH-CONFIG.

=item libxml2 UTF-8 conversion failures

Fixed issue where a UTF-8 to Latin1 encoding failure would skip
more input than just the failed character.   Libxml2 passes swish text
that is not null terminated, but the libxml2 functions to skip UTF-8
chars expected a null-terminated string.  Replace libxml2 call with
fixed version.

=back

=head2 Version 2.4.3 December 9, 2004

=over 4

=item New config directive: CompressPositions

This option enables zlib compression for word data in the index.
Previously word data was always compressed but resulted in slower
wildcard searches.  The default now is to not compress the word data,
but results in larger index files.  Set to "YES" to get pre-2.4.3 index
sizes.

[This CHANGES entry was added after 2.4.3 was released]

=item Improved error messages when using incremental indexing

There was a bit of confusion on how to use incremental indexing (still
experimental) so added better logic for error messages.

Also fixed a logic error when setting the incremental update mode.  Caught by
Paul Loner.

=back

=head2 Version 2.4.3-pr1 - Wed Dec  1 09:52:50 PST 2004

=over 4

=item "Fixed" libxml2's change in UTF8Toisolat1() return value

Bernhard Weisshuhn supplied a patch to parser.c for checking the return value of
UTF8Toisolat1().  Seems that libxml2 now returns the number of characters converted
instead of zero for success.

   http://bugzilla.gnome.org/show_bug.cgi?id=153937

=item Added swish-config and pkg-config

Swish now provides a swish-config script and config file for the pkg-config
utility.  These tools help when building programs that link with the swish-e
library.

The SWISH::API Makefile.PL program uses swish-config to locate the installation
directory of swish-e.  This should make building SWISH::API easier when swish-e
is installed in a non-standard location.

=item Fixed rank bias in merge

Peter van Dijk noticed that MetaNamesRank settings were not being copied to the output
index when merging.

=item Added SwishFuzzy function

SwishFuzzy function (SWISH::API::Fuzzy) lets you stem a word without first searching.
This might be helpful for playing with queries prior to the search.


=item Fixed translate character table

Michael Levy found an error in the table used to translate 8859-1 to
ascii7.  Luckily, it was an upper case translation and the table is only used on lower
case characters.

=item MetaNamesRank documentation

Changed the 'not yet implemented' caveat to 'implemented but experimental'.

=item Added Continuation option to config processing

You can now use continuation lines in the config file:

    IgnoreWords \
        the \
        am \
        is \
        are \
        was

There may not be any characters following the backslash.
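
The joining rule can be sketched as follows.  This is an illustration in
Python, not the Swish-e config parser; C<join_continuation_lines> is a
hypothetical name, and the whitespace handling is an assumption made for the
example.

```python
def join_continuation_lines(lines):
    """Join each line ending in a backslash with the line that follows it."""
    joined = []
    pending = ""
    for raw in lines:
        line = raw.rstrip("\n")
        if line.endswith("\\"):
            # Drop the backslash and keep accumulating the directive.
            pending += line[:-1]
        else:
            joined.append(pending + line)
            pending = ""
    if pending:
        # The input ended on a continuation line; keep what was collected.
        joined.append(pending)
    return joined


config = [
    "IgnoreWords \\\n",
    "    the \\\n",
    "    am \\\n",
    "    is\n",
]

# Collapse runs of whitespace so the directive reads as a single line.
directives = [" ".join(line.split()) for line in join_continuation_lines(config)]
print(directives)
```

In this sketch a backslash followed by a space does not end the line, so it is
not treated as a continuation, which mirrors the restriction described above.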

=item Fixed Buzzwords (and other word lists entered in the config)

Words entered in config were not converted to lower case before storing in the index.


=item Fixed metaname mapping problem in Merge

Peter Karman found an error when merging indexes where the source indexes had the
same metanames, but listed in a different order in their config files.  Words
would then be indexed under the wrong metaID number in the output index.


=item SWISH::Filters and spider.pl updates

The web spider F<spider.pl> was updated to work better with SWISH::Filter
by default and also make it easier to use the spider default along with
a spider config file.  See spider.pl for details.

SWISH::Filter was updated.  The way filters are created has changed.
If you created your own filters you will need to update them.  Take a look
at SWISH::Filter and the filters included in the distribution.

=item Updates to Documentation

Richard Morin submitted formatting and punctuation updates to the README and
INSTALL docs.

=item Added -R option to support IDF word weighting in ranking. (karman)

Added Inverse Document Frequency calculation to the getrank() routine.
This will allow the relative frequency of a word in relationship to other
words in the query to impact the ranking of documents.

Example: if 'foo' is present twice as often as 'bar' in the collection as a whole,
a search for 'foo bar' will weight documents with 'bar' more heavily (i.e., higher
rank) than those with 'foo'. 

The impact is greatest when OR'ing words in a query rather than
AND'ing them (which is the default).

Also added Rank discussion to the FAQ.
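
The 'foo bar' example can be worked through with the textbook IDF formula,
log(N / df).  This Python sketch is an assumption for illustration only: the
exact calculation in getrank() is not described here, and the collection size
and document counts are made-up numbers.

```python
import math


def idf(total_docs, docs_with_word):
    """Textbook inverse document frequency: rarer words get larger weights."""
    return math.log(total_docs / docs_with_word)


# Hypothetical collection: 'foo' appears in twice as many documents as 'bar'.
total_docs = 1000
doc_freq = {"foo": 200, "bar": 100}

weights = {word: idf(total_docs, count) for word, count in doc_freq.items()}

for word, weight in weights.items():
    print(f"{word}: {weight:.3f}")
```

Because 'bar' is the rarer word, it receives the larger weight, so documents
matching 'bar' contribute more to the rank than those matching only 'foo'.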


=item Updates to the example scripts

Updated PhraseHighlight.pm as suggested by Bill Schell for an optimization
when all words in a document are highlighted.

Updated search.cgi and PhraseHighlight.pm to use the internal stemmers via
the SWISH::API module as suggested by Jonas Wolf.


=item Leak when using C library

David Windmueller found a memory leak when calling multiple searches
on a swish handle.  The problem was swish loading the pre-sorted
property index on every search, even after the table had been loaded
into memory.

=item Swish.cgi now kills swish-e on time out

The example script F<swish.cgi> uses an alarm (on platforms that support
alarm) to abort processing after some number of seconds, but it was not
killing the child process, swish-e.  Bill Schell submitted a patch to kill
the child when the alarm triggers.

=item The template search.tt was renamed to swish.tt

The template was renamed because it's used by F<swish.cgi>, not by
F<search.cgi>, which was confusing.

=item Updates to the search.cgi

The example script F<search.cgi> was updated to work better with mod_perl
and to use external template files and style sheets.


=item New MS Word Filter

James Job provided the SWISH::Filter::Doc2html filter that uses
the wvWare (http://wvware.sourceforge.net/) program for filtering
MS Word documents.  If both catdoc and wvWare are installed then wvWare
will be used.

wvWare is reported to do a good job at converting MS Word docs
to HTML.  In a few tests it did work well, but other cases it
failed to generate correct output.  It was also much, much slower
than catdoc.  I tested with wvWare 0.7.3 on Debian Linux.  Testing with
both is recommended.

=item Change in way symbolic links are followed

John-Marc Chandonia pointed out that if a symlink is skipped
by FileRules, then the actual file/directory is marked as
"already seen" and cannot be indexed by other links or directly.

Now, files and directories are not marked "already seen" until
after passing FileRules (i.e after a file is actually indexed
or a directory is processed).

=item Could not set SwishSetSort() more than once

David Windmueller found a problem when trying to set the sort
order more than once on an existing search object.  Memory was not
correctly reset after clearing the previous sort values.

=item Access MetaNames and PropertyNames from API

Patch provided by Jamie Herre to access the MetaNames and PropertyNames
via the C API and to test via the testlib program.  Swish::API also updated
to access this data.

=item SwishResultPropertyULong() bug fixed

David Windmueller reported that SwishResultPropertyULong() was 
returning ULONG_MAX on all calls.  This was fixed.

=item Null written to wrong location in file.c

Bill Schell with the help of valgrind found a null written past the end of a
buffer in file.c in the code that supports the old parsers.  This resulted in a
segfault while indexing a large set of XML documents.

=item Fixed problem when indexing very large files

Steve Harris reported a problem when indexing a very large document that
caused an integer overflow.  José Ruiz updated the code to use unsigned integers.

=item Bump word position on block tags with HTML2 parser

Peter Karman pointed out that the libxml2 HTML parser was allowing phrase
matches across block level html elements.  Swish now bumps the word
position on these elements.


=back

=head2 Version 2.4.2 - March 09, 2004

=over 4

=item * UseStemming didn't take no for an answer

UseStemming was coded as an alias for FuzzyIndexingMode when Snowball was
compiled in (the default), but "no" doesn't always mean no when the Norwegian
stemmer is available.

=item * Fixed problem building incremental version

Fixed compile problem with building incremental indexing mode.  This is an
experimental option with swish-e to allow adding files to an index.
See configure --help for build option.  Incremental indexes are not
compatible with standard indexes.

=item * Updated build instructions in INSTALL

Added a few comments about use of CPPFLAGS and LDFLAGS.

=item * Updated the index_hypermail.pl

Updated to work with latest version of hypermail (pre-2.1.9).


=item * Time zone in ResultPropertyStr()

Format string for generating date did not include the time zone in location.
Add strftime format string to config.h

=item * Undefined and Blank Properties and (NULL)

Fixed a few problems with printing properties:

1) Using -p and -x showed different results if a bad property value was given:

    $ swish-e -w not dkdk -p badname -H0
    err: Unknown Display property name "badname"
    .
    $ swish-e -w not dkdk -x '<badname>\n' -H0
    (NULL)

Now both return an error.

2) Fixed bug where using a "fmt" string with -x output generated (bad) output
if the result did not have the specified property.

    $ swish-e -w not dkdk -x '<somedate>\n' -H0  # undefined value

    $ swish-e -w not dkdk -x '<somedate fmt="%Y %B %d">\n' -H0
    %Y %B 1075353525

Now nothing is printed if the property does not exist.

3) Updated SWISH::API to croak() on invalid property names, and to return
undefined values for missing properties.

4) Updated swish.cgi and search.cgi to not generate warnings on undefined values
return as properties.  Note that swish.cgi will now die on undefined properties.
Previously would just display (NULL).


=item * Fixed segfault when generating warnings while parsing

Parser.c was calling warning() incorrectly.
And -Wall was not catching this!

=item * Added check for internal property names.

Parser was not checking for use of Swish-e reserved property
names.

   <swishrank>foo</swishrank>

This will now generate a warning.

=back

=head2 Version 2.4.1 - December 17, 2003

=over 4

=item * Added new example CGI script

search.cgi is a new skeleton CGI script that uses SWISH::API for searching.
It is installed in the same location as swish.cgi.

=item * Add Fuzzy access to C and Perl interfaces

Added a number of functions to the C API (and SWISH::API)
to access the stemmer used when indexing a given index.

=item * Commas in numbers

Added commas to summary display at end of indexing.

=item * Insert whitespace between tags

Parser.c was updated to flush the text buffer before and after
every (non-inline HTML) tag.

The problem was that:

    foo<tag>bar</tag>baz

would index as a single word "foobarbaz".
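
The effect of the change can be sketched in a few lines of Python.  This is an
illustration, not the code in Parser.c: the regular expression is a crude
stand-in for a real parser, and unlike Swish-e it treats every tag as a word
break, including inline HTML tags.

```python
import re

# Crude tag matcher used only for this illustration.
TAG = re.compile(r"<[^>]+>")


def words_without_tag_breaks(markup):
    """Old behavior: tags are dropped and the surrounding text runs together."""
    return TAG.sub("", markup).split()


def words_with_tag_breaks(markup):
    """New behavior: each tag acts as a word boundary."""
    return TAG.sub(" ", markup).split()


sample = "foo<tag>bar</tag>baz"
print(words_without_tag_breaks(sample))
print(words_with_tag_breaks(sample))
```

The first call yields the single word "foobarbaz"; the second yields three
separate words.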

=item * DirTree.pl

DirTree.pl was updated to work with SWISH::Filter and to work on Windows.
DirTree.pl is a program to fetch files from the file system and works with
the -S prog input method.

=item * Problem with --enable-incremental option

Fixed configure script to build incremental option.  Note that this is still
experimental.  But testers are welcome.

=item * headers.c bug

Mark Fletcher with the help of valgrind found a bug in headers.c
function SwishIndexHeaderNames used by the C API.

=item * Clarify documentation regarding search order

At the prompting of Doralyn Rossmann updated SEARCH.pod to
try and make the explanation of searching clearer, and to fix an error
in the description of nested searches.

=back

=head2 Version 2.4.0 - October 27, 2003

=over 4

=item * Note: Different Index Format

Swish-e version 2.4.0 has a different index file format from previous
versions of Swish-e.  Upgrading will B<require> reindexing -- version 2.4.0
cannot read indexes created with previous versions.

=back

=head2 Version 2.4.0 (Release Candidate 4)  September 26, 2003

=over 4

=item * robots.txt not closed correctly

When using -S http method robots.txt was not closed and that caused
the (last) .contents file to not be unlinked under Windows.  Windows
seems to think filenames are related to files.

=item * SWISH::Filter and locating programs on Windows

SWISH::Filter now scans $libexecdir in addition to the PATH for programs (such as catdoc and
pdftotext), and also checks for programs by adding the extensions ".exe" and ".bat" to the 
program name.

=item * Install sample templates

The sample templates included with swish.cgi are now installed
in $pkgdatadir (typically /usr/local/share/swish-e).

=back

=head2 Version 2.4.0 (Release Candidate 3)  September 11, 2003

=over 4

=item * Fix parser bug meta=(foo*)

Fixed a bug in the query parser caused by rc2's (pr2) attempt to catch wildcard
errors.

=back

=head2 Version 2.4.0 (Release Candidate 2)  September 10, 2003

=over 4

=item * Indexing HTML title

Fixed a problem when these were used in combination:

  MetaNames swishtitle
  MetaNameAlias swishtitle title

That failed to correctly reset the metaname stack and indexed text under
the wrong metaID.

=item * Single Wildcards

Due to the way the query parser "works" a search of

   "foo *"

would result in a search of "foo*".  Now that results in:

   err: Single wildcard not allowed as word 

=item * Fixed search parsing bug

Brad Miele reported that the word "andes" was not being found.  It was being
stemmed to "and", which was then considered an operator.  [moseley]

=item * Add new directive PropertyNamesSortKeyLength

PropertyNamesSortKeyLength sets the sort key length to use when sorting
string properties.  The default is 100 characters.  There was a hard-coded
100 char limit before, but that was a problem where people were not building
from source (Windows).  The value of this is questionable -- it's intended to
limit how much memory is used when sorting while indexing and searching. [moseley]

=item * Fixed sorting issues with multiple indexes and reverse sorting

Reworked much of the sorting code.  Still to do is setting the character sort order. 
[moseley]

=item * Fixed minor memory leak

Fixed a leak where the memory for the index file name was not released on
swish_handle destroy, and fixed SwishStemWord to default to the Stemmer_en. [moseley]

Fixed libtest.c example program that was not cleaning up memory after an
error condition.

=item * Replaced Swish-e's Porter Stemmer with Snowball

Swish-e now has support for Snowball stemmers (http://snowball.tartarus.org/).
The stemmers are enabled for an index with FuzzyIndexingMode Stemming_* where "*" can be:

  de, dk, en1, en2, es, fi, fr, it, nl, no, pt, ru, se

In addition, UseStemming yes or FuzzyIndexingMode Stemming_en will use the old stemmer.

=back

=head2 Version 2.4.0 (Release Candidate 1)  May 21, 2003

=over 4

=item * Security Fix: swish.cgi

The swish.cgi script was not correctly escaping HTML when searching by 
the right combination of metanames and highlighting module.  This could
lead to cross-site scripting if indexing un-trusted documents. [moseley]

=item * Added Support for building a Debian Package

To build as a .deb unpack the distribution and chdir then run

   $ fakeroot debian/build binary

Then install the generated .deb file with dpkg -i

=item * Use SWISH::Filter by default with spider.pl

spider.pl is installed in the libexecdir directory as well as the SWISH::Filter modules. 
PDF, MS Word, MP3, and XML documents will be indexed automatically if the required helper
applications (e.g. catdoc, pdftotext) or scripts (e.g. MP3::Tag) are installed.

Swish also knows about libexecdir, so if you specify a relative path with -S prog
swish-e will look for the program in libexecdir.  This is mostly for spider.pl so
indexing only requires:

    IndexDir spider.pl
    SwishProgParameters default http://localhost/index.html

And swish-e will find spider.pl and SWISH::Filter will be used to convert docs.

=item * Fixed Document-Type bug

Document-Type was not being reset after being set by input from a -S prog program, causing
the wrong parser to be used. [moseley]

=item * New Directive: PropertyNamesNoStripChars

Swish replaces all series of low ASCII chars with a single space
character.  This option instructs swish to store all chars in the property. [moseley]

=item * Change HTTP access defaults

Defaults used with -S http access method were changed.


Delay was reduced from one minute between start of each request to five seconds
between requests. 

MaxDepth was changed from five to zero, meaning there is no limit to depth indexed by
default. [moseley]

=item * swishspider location and SpiderDirectory

The swishspider program is now installed in $prefix/lib/swish-e by default.  This can
be changed by the --libexecdir option to configure.  

The SpiderDirectory option now defaults to the value of libexecdir instead of the current
directory. [moseley]


=item * Added libtool and automake support

Replaces the build system with Autotools.  Now builds libswish-e as
a shared library on systems that support shared libraries.
The swish-e binary links against this shared library.
Can also build outside the source tree on platforms with GNU make. [moseley]

=item * Updates to installation

Running "make install" now installs additional files.
Files include the swish-e binary, the libswish-e search library, swish-e.h
header, documentation files, the swishspider program, and Perl modules used for the example
swish.cgi search script. Directories will be created if they do not already exist.
Installation directories can be specified at build time.

=item * Fixed bug when searching at end of inverted index

Swish was not correctly detecting the end of the inverted index
when searching a wildcard word that was past the last word in the index.
Caught by Frank Heasley. [moseley]


=item * Increase sort key length from 50 to 100 characters

The setting MAX_SORT_STRING_LEN in F<src/config.h> sets the max length used 
when sorting in swish-e.  You may reduce this number to save memory while 
sorting, or increase it if you have very long properties to sort.

=item * Remove &quot; entity from -p output

The -p option to print properties was escaping double quotes in properties
with the &quot; entity.  -x does not do that, which was inconsistent.  -p no longer
converts double quotes.  The user should pick a good delimiter with -d or preferably use
the -x method for generating output.

=item * XML parser and Windows

The XML parser was being passed the incorrect buffer length when used on Windows
platform causing the parser to abort with an error.

=item * Version Numbering

SWISH-E versions starting with 2.3.4 use kernel version numbering.  Versions are 
in the form: Major.Minor.Build.  Odd minor versions are development.  Even minor 
versions are releases.  2.3.4 would be a development version.  
2.4.0 would be a release version.  2.3.20 would be the 20th build of 2.3.
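
The rule can be expressed as a small check.  The Python below is a sketch for
illustration; C<describe_version> is a hypothetical helper and not part of
Swish-e.

```python
def describe_version(version):
    """Split a Major.Minor.Build string and classify it by the minor number."""
    major, minor, build = (int(part) for part in version.split("."))
    # Odd minor numbers are development versions, even ones are releases.
    kind = "development" if minor % 2 else "release"
    return major, minor, build, kind


for version in ("2.3.4", "2.4.0", "2.3.20"):
    print(version, describe_version(version))
```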

=item * Added RPM support

RPMs can be built with:

    ./configure
    make dist

Copy the resulting tarball to RPM's SOURCES directory and then run as a superuser:

    rpmbuild -ba rpm/swish-e.spec


You should have swish-e packages in your RPMS/$arch directory.  [augur]

=item * Changed default perl binary location

Most perl scripts provided with SWISH-E now use /usr/bin/perl by default.
Note that some scripts are generated at build time, so those will look in the
path for the location of the perl binary.

=item * New Feature: MetaNamesRank

MetaNamesRank can be used to adjust the ranking for words based on
the word's MetaName.

=item * New Swish Library API and Perl Module

The Swish-e C library interface was rewritten to provide
better memory management and better separation of data.
Most indexing related code has been removed from the library.
A new header file is provided for the API: swish-e.h.

The Perl module SWISHE was replaced with the SWISH::API module
in the Swish-e distribution.

B<Previous versions of the SWISHE module will not work with this version of Swish-e.>

If you are using the SWISHE module from a previous version of Swish then you must
either rewrite your code to use the new SWISH::API module (highly recommended)
or use the replacement SWISHE module.  The replacement SWISHE module is a thin
interface to the SWISH::API module.  It can be downloaded from

    http://swish-e.org/Download/old/SWISHE-0.03.tar.gz

=item * NoContents not working with libxml2 parser

Corrected problem when using NoContents with binary files and the HTML2 parser.

Trying to index image file names with:

    IndexOnly .gif .jpeg
    NoContents .gif .jpeg

failed to index the path names because the default parser
(HTML2 when libxml2 is linked with swish-e)
was not finding any text in the binary files. [moseley]

=item * Updates to swish.cgi

The example/swish.cgi script can now use the SWISH::API module
for searching an index.  Combined with mod_perl this module
can improve search performance considerably.

The Perl modules used with the swish.cgi script have all been moved into
the SWISH::* namespace.  Hence, files in the F<modules> directory were moved
into the F<modules/SWISH> directory.

=back

=head2 Version 2.2.3 - December 11, 2002

Multiple -L options were ORing instead of ANDing.
Caught by Patrick Mouret. [moseley]

=head2 Version 2.2.2 - November 14, 2002

Pass non-text/* files on to the indexing code if there is a FileFilter
associated with the *extension* of the URL.  This fixes the problem of not
being able to index, say, PDF files by using the FileFilter configuration
option.

Fixed bug where nulls were stripped when using FileFilter with -S prog.
Caught by Greg Fenton. [moseley]

=head2 Version 2.2.1 - September 26, 2002

=over 4

=item * NoContents with -S prog

Failed to use the correct default parser when using the No-Contents header
and libxml2 linked in. [moseley]

=item * Add tests for IRIX and sparc machines

8-byte alignment in mem_zones is required for these machines. [moseley]


=item * Fixed code when removing files

Was not correctly removing words from index when parser aborted [jmruiz]

=item * Merge segfault

Fixed segfault caused by trying to print null dates while merging
duplicate files. [moseley]

=item * Documentation patches

Spelling corrections to the SWISH-CONFIG pod page [Steve Eckert]

=item * Configure corrections

Fixed a zlib test error that used "==" in a test [Steve Eckert]

=item * Updates to VMS build

The VMS build was updated [Jean-François PIÉRONNE]

=item * MANIFEST corrections

Added missing filters and vms build file into MANIFEST [moseley]

=back

=head2 Version 2.2 - September 18, 2002


=over 4

=item * Default parser

Swish-e will now use the HTML2 (libxml2) parser by default if libxml2 is
installed and DefaultContents or IndexContents is not used.

=item * Selecting parsers

Allow HTML*, XML*, and TXT* to automatically select the libxml2-based parsers
if libxml2 is linked with Swish-e, otherwise fallback to the built-in parsers.

=item * SwishSpider and Filters

Filters (FileFilter directive) did not work correctly when spidering
with the -S http method.  A new filter system was developed and now
filtering of documents (e.g. pdf-E<gt>html or MSWord-E<gt>text) is handled
by the src/SwishSpider program.

When indexing with the -S http method only documents of content-type "text/*"
are indexed.  Other documents must be converted to text by using the filter system.

=item * Buffer overflow in xml.c

Fixed bug in xml.c reported by Rodney Barnett when very long words
were indexed. [moseley]

=item * configure script updates

Updated from _WIN32 checks to feature checks using autoconf [moseley, norris]

=item * updates to run on Alpha (Linux 2.4 (Debian 3.0))

Fixed a cast error when calling zlib, and the calls to read/write packed longs
to disk. [jmruiz, moseley]

=item * COALESCE_BUFFER_MAX_SIZE

Some people were seeing the following error:

    err: Buffer too short in coalesce_word_locations.
    Increase COALESCE_BUFFER_MAX_SIZE in config.h and rebuild.

This was due to indexing binary data or files with a very large number of words.
The best solution is to not index binary data or files with a very large number
of words.

Swish-e will now automatically reallocate the buffer as needed.  [jmruiz]


=back

=head2 Version 2.2rc1 - August 29, 2002

Many large changes were made internally in the code, some for performance
reasons, some for feature changes and additions, and some to prepare
for new features in later versions of Swish-e.

=over 4

=item * Documentation!

Documentation is now included in the source distribution as .pod
(perldoc) files, and as HTML files.  In addition, the distribution can now
generate PDF, postscript, and unix man pages from the source .pod files.
See L<README|README> for more information.

=item * Indexing and searching speed

The indexing process has been improved.  Depending on a number of
factors, you may see a significant improvement in indexing speed,
especially if upgrading from version 1.x.

Searching speed has also been improved.  Properties are not loaded until
results are displayed, and properties are pre-sorted during indexing to
speed up sorting results by properties while searching.

=item * Properties are written to a separate file

Swish-e now stores document properties in a separate file.  This means
there are now two files that make up a Swish-e index.  The default files
are C<index.swish-e> and C<index.swish-e.prop>.

This change frees memory while indexing, allowing larger collections to
be indexed in memory.

=item * Internal data stored as Properties

Pre 2.2 some internal data was stored in fixed locations within the
index, namely the file name, file size, and title.  2.2 introduced new
internal data such as the last modified date, and document summaries.
This data is considered I<meta data> since it is data about a document.

Instead of adding new data to the internal structure of the index file,
it was decided to use the MetaNames and PropertyNames feature of Swish-e
to store this meta information.  This allows for new meta data to be added
at a later time (e.g. Content-type), and provides an easy and customizable
way to print results with the C<-p> switch and the new C<-x> switch.
In addition, search results can now be sorted and limited by properties.

For example, to sort by the rank and title:

    swish-e -w foo -s swishrank desc swishtitle asc


=item * The header display has been slightly reorganized.

If you are parsing output headers in a program then you may need to
adjust your code.  There's a new switch '-H' to control the level of
header output when searching.

=item * Results are now combined when searching more than one index.

Swish-e now merges (and sorts) the results from multiple indexes when
using C<-f> to specify more than one index.  This change affects the way
maxhits (C<-m>) works.  Here's a summary of the way it works for the
different versions.


    1.3.2 - MaxHits returns first N results starting from the first index.
            e.g. maxhits=20; 15 hits Index1, 40 hits Index2
            All 15 from Index1 plus first five from Index2 = 20 hits.

    2.0.0 - MaxHits returns first N results from each index.
            e.g. Maxhits=20; 15 hits Index1, 40 hits Index2
            All 15 from Index1 plus 15 from Index2.

    2.2.0 - Results are merged and first N results are returned.
            e.g. Maxhits=20; 15 hits Index1, 40 hits Index2
            Results are merged from each index and sorted
            (rank is the default sort) and only the first
            20 are returned.
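
The 2.2.0 behavior can be imitated with standard tools.  This is only a
sketch: the C<rank path> line format, the file names, and the helper
C<merge_top_hits> are made up for illustration and do not reflect Swish-e's
internal result handling.

```shell
# Merge ranked hits from several result lists and keep the top N,
# imitating the 2.2.0 behavior described above.
# Input lines are "rank path"; the data below is made up.
merge_top_hits() {
    max=$1
    shift
    cat "$@" | sort -k1,1nr | head -n "$max"
}

dir=$(mktemp -d)
printf '%s\n' '1000 a.html' '400 b.html' > "$dir/index1.hits"
printf '%s\n' '900 c.html' '800 d.html' '100 e.html' > "$dir/index2.hits"

# With a limit of 3, hits from both lists compete on rank.
merge_top_hits 3 "$dir/index1.hits" "$dir/index2.hits"
rm -rf "$dir"
```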


=item * New B<prog> document source indexing method

You can now use -S prog to use an external program to supply documents
to Swish-e.  This external program can be used to spider web servers,
index databases, or to convert any type of document into html, xml,
or text, so it can be indexed by Swish-e.  Examples are given in the
C<prog-bin> directory.

=item * The indexing parser was rewritten to be more logical.

TranslateCharacters now is done before WordCharacters is checked.  For example,

    WordCharacters abcdefghijklmnopqrstuvwxyz
    TranslateCharacters ñ n

Now C<El Niño> will be indexed as El Nino (el and nino), even though C<ñ>
is not listed in WordCharacters.

Previously, stopwords were checked after stemming and soundex conversions,
as well as most of the other word checks (WordCharacters, min/max length
and so on).  This meant that the stopword list probably didn't work as
expected when using stemming.
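
The ordering in the C<El Niño> case can be imitated with standard text tools.
This is only an analogy: C<sed> stands in for TranslateCharacters, the final
C<tr> stands in for the WordCharacters check, and the lowercasing step is
assumed from the "el and nino" result above.  C<index_words> is a hypothetical
name.

```shell
# Step 1 stands in for "TranslateCharacters ñ n".
# Step 2 lowercases the text.
# Step 3 splits on anything outside a-z, standing in for WordCharacters.
index_words() {
    sed 's/ñ/n/g' | tr 'A-Z' 'a-z' | tr -cs 'a-z' '\n'
}

printf '%s\n' 'El Niño' | index_words
```

Because the translation runs first, the C<ñ> has already become C<n> by the
time the word-character check is applied.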

=item * The search parser was rewritten to be more logical

The search parser was rewritten to correct a number of logic errors.
Swish-e did not differentiate between meta names, Swish-e operators
and search words when parsing the query.  This meant, for example,
that metanames might be broken up by the WordCharacters setting, and
that they could be stemmed.

Swish-e operator characters C<"*()=> can now be searched by escaping
with a backslash.  For example:

    ./swish-e -w 'this\=odd\)word'

will end up searching for the word C<this=odd)word>.  To search for a
backslash character, precede it with another backslash.

Currently, searching for:

    ./swish-e -w 'this\*'

is the same as a wildcard search.  This may be fixed in the future.    

Searching for buzzwords with those characters will still require
backslashing.  This also may change to allow some un-escaped operator
characters, but some will always need to be escaped (e.g. the double-quote
phrase character).
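
When a search word comes from user input, the escaping can be done by a small
wrapper.  The sketch below assumes the operator characters are exactly those
listed above plus the backslash; C<escape_swish_word> is a hypothetical helper
and is not shipped with Swish-e.

```shell
# Backslash-escape the operator characters listed above ("*()=)
# and the backslash itself.  escape_swish_word is a hypothetical
# helper, not something shipped with Swish-e.
escape_swish_word() {
    printf '%s\n' "$1" | sed 's/[\\"*()=]/\\&/g'
}

escape_swish_word 'this=odd)word'
```

The result could then be passed to C<-w>.  Note that, as described above, an
escaped C<*> is currently still treated as a wildcard.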

=item * Quotes and Backslash escapes in strings

A bug was fixed in the C<parse_line()> function (in F<string.c>) where
backslashes were not escaping the next character.  C<parse_line()> is used
to parse a string of text into tokens (words).  Normally splitting is done
at whitespace.  You may use quotes (single or double) to define a string
(that might include whitespace) as a single parameter.  The backslash
can also be used to escape the following character when *within* quotes
(e.g. to escape an embedded quote character).

    ReplaceRules append "foo bar"   <- define "foo bar" as a single word
    ReplaceRules append "foo\"bar"  <- escape the quotes
    ReplaceRules append 'foo"bar'   <- same thing


=item * Example C<user.config> file removed.

Previous versions of Swish-e included a configuration file called
C<user.config> which contained examples of all directives.  This has
been replaced by a series of example configuration files located in the
C<conf> directory.  The configuration directives are now described in
L<SWISH-CONFIG|SWISH-CONFIG>.

=item * Ports to Win32 and VMS

David Norris has included the files required to build Swish-e under
Windows.  See C<src/win32>.  A self-extracting Windows version is
available from the Download page of the swish-e.org web site.

Jean-François Piéronne has provided the files required to build Swish-e
under OpenVMS.  See C<src/vms> for more information.

=item * String properties are concatenated

Multiple I<string> properties of the same name in a document are now
concatenated into one property.  A space character is added between
the strings if needed.  A warning will be generated if multiple numeric
or date properties are found in the same document, and the additional
properties will be ignored.

Previously, properties of the same name were added to the index, but
could not be retrieved.

To do: remove the C<next> pointer, and allow a user-defined character to
be placed between properties.

=item * regex type added to ReplaceRules

A more general purpose pattern replacement syntax.


=item * New Parsers

Swish-e's XML parser was replaced with James Clark's expat XML parser
library.

Swish-e can now use Daniel Veillard's libxml2 library for parsing HTML and
XML.  This requires installation of the library before building Swish-e.
See the L<INSTALL|INSTALL> document for information.  libxml2 is not
required, but is strongly recommended for parsing HTML over Swish-e's
internal HTML parser, and provides more features for both HTML and
XML parsing.

=item * Support for zlib

Swish-e can be compiled with zlib.  This is useful for compressing large
properties.  Building Swish-e with zlib is strongly recommended if you
use its C<StoreDescription> feature.

=item * LST type of document no longer supported

LST allowed indexing of files that contained multiple documents.

=item * Temporary files

To improve security Swish-e now uses the C<mkstemp(3)> function to
create temporary files.  Temporary files are used while indexing only.
This may result in some portability issues, but the security issues
were overriding.

(Currently this does not apply to the -S http indexing method.)

C<mkstemp> opens the temporary file with the O_EXCL|O_CREAT flags.  This prevents
overwriting existing files.  In addition, the name of the file created
is a lot harder to guess by attackers.  The temporary file is created
with only owner permissions.

Please report any portability issues on the Swish-e discussion list.

=item * Temporary file locations

Swish-e now uses the environment variables C<TMPDIR>, C<TMP>, and
C<TEMP> (in that order) to decide where to write temporary files.
The configuration setting of L<TmpDir|SWISH-CONFIG/"item_TmpDir"> will
be used if none of the environment variables are set.  Swish-e uses the
current directory otherwise; there is no default temporary directory.

Since the environment variables override the configuration settings,
a warning will be issued if you set L<TmpDir|SWISH-CONFIG/"item_TmpDir">
in the configuration file and there's also an environment variable set.

Temporary files begin with the letters "swtmp" (which can be changed in
F<config.h>), followed by two or more letters that indicate the type of
temporary file, and some random characters to complete the file name.
If indexing is aborted for some reason you may find these temporary
files left behind.
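
The lookup order can be summarized in a few lines of shell.  This is a sketch
of the order described above, not Swish-e's actual code: C<choose_tmpdir> is a
hypothetical helper, its argument stands in for the TmpDir configuration
value, and an empty variable is treated the same as an unset one, which is an
assumption.

```shell
# Pick a temporary directory in the order described above.
# The first argument stands in for the TmpDir configuration value.
# Assumption: an empty variable counts as "not set".
choose_tmpdir() {
    config_tmpdir=$1
    for candidate in "${TMPDIR:-}" "${TMP:-}" "${TEMP:-}" "$config_tmpdir"; do
        if [ -n "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    printf '%s\n' .    # no setting found: use the current directory
}

choose_tmpdir /var/tmp/swish
```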

=item * New Fuzzy indexing method Double Metaphone

Two new methods of creating a fuzzy index were added (in addition to
Stemming and Soundex), based on Lawrence Philips' Metaphone algorithm.


=back

Changes to Configuration File Directives.  Please see
L<SWISH-CONFIG|SWISH-CONFIG> for more info.

=over 4

=item * New directives: IndexContents and DefaultContents

The IndexContents directive assigns internal Swish-e document parsers
to files based on their file type.  The DefaultContents directive
assigns a parser to be used on files that are not assigned a parser with
IndexContents.

=item * New directive: UndefinedMetaTags [error|ignore|index|auto]

This describes what to do when a meta tag is found in a document that
is not listed in the MetaNames directive.

=item * New directive: IgnoreTags

Ignores text within the listed tags.

=item * New directive: SwishProgParameters *list of words*

Passes the listed words to the external program when running with the
C<-S prog> document source method.

=item * New directive: ConvertHTMLEntities [yes|no]

Controls parsing and conversion of HTML entities.

=item * New directive: DontBumpPositionOnMetaTags

The word position is now bumped when a new metatag is found -- this is
to prevent phrases from matching across meta tags.  This directive will
disable this behavior for the listed tags.

This directive works for HTML and XML documents.

=item * Changed directive: IndexComments

This has been changed such that comments are not indexed by default.

=item * Changed directive: IgnoreWords

The builtin list of stopwords has been removed. Use of the SwishDefault
word will generate a warning, and no stop words will be used.  You must
now specify a list of stopwords, or specify a file of stopwords.

A sample file C<stopwords.txt> has been included in the F<conf/stopwords>
directory of the distribution, and can be used by the directive:

    IgnoreWords File: /path/to/stopwords.txt

=item * Change of the default for IgnoreTotalWordCountWhenRanking

The default is now "yes".

=item * New directive: Buzzwords

Buzzwords are words that should be indexed as-is, without checking
for stopwords, word length, WordCharacters, or any other of the word
limiting features.  This allows indexing of things like C<C++> when "+"
is not listed in WordCharacters.

Currently, IgnoreFirstChar and IgnoreLastChar will be stripped before
processing Buzzwords.

In the future we may use separate IgnoreFirst/Last settings for buzzwords
since, for example, you may wish to index all C<+> within Swish-e words,
but strip C<+> from the start/end of Swish-e words, but not from the
buzzword C<C++>.

=item * New directives: PropertyNamesNumeric PropertyNamesDate

Before Swish-e 2.2 all user-defined document properties were stored in
the index as strings.  PropertyNamesNumeric and PropertyNamesDate tell
it that a property should be stored in binary format.  This allows
for correct sorting of numeric properties.

Currently, only integers can be stored, such as a unix timestamp.  (Swish-e
uses C<strtoul> to convert the number to an unsigned long internally.)

PropertyNamesDate only indicates to Swish-e that a number is a unix
timestamp, and to display the property as a formatted time when printing
results.  Swish does not currently parse date strings; you must provide
a unix timestamp.
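
The difference between string and numeric ordering is easy to see with
C<sort>.  This only illustrates the general problem with sorting numbers
stored as strings; it is not how Swish-e sorts properties internally.

```shell
# The same three values sorted as strings and then as numbers.
printf '%s\n' 9 10 100 | LC_ALL=C sort    # string order: 10, 100, 9
printf '%s\n' 9 10 100 | sort -n          # numeric order: 9, 10, 100
```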

=item * New directive: MetaNameAlias

You may now create alias names for MetaNames.  This allows you to map or
group multiple names to the same MetaName.

=item * New directive: PropertyNameAlias

Creates aliases for a PropertyName.

=item * New directive: PropertyNamesMaxLength

Sets the max length of a text property.

=item * New directive: HTMLLinksMetaName

Defines a metaname to use for indexing href links in HTML documents.
Available only with libxml2 parser.

=item * New directive: ImageLinksMetaName

Defines a metaname to use for indexing src links in E<lt>imgE<gt> tags.
Allows you to search image pathnames within HTML pages.  Available only
with libxml2 parser.

=item * New directive: IndexAltTagMetaName

Allows indexing of image ALT tags.  Only available when using the libxml2 parser.

=item * New directive: AbsoluteLinks

Attempts to convert relative links indexed with HTMLLinksMetaName and
ImageLinksMetaName to absolute links.  Available only with libxml2 parser.

=item * New directive: ExtractPath

Allows you to use a regular expression to extract out part of the path
of each file and index it with a meta name.  For example, this allows
searches to be limited to parts of your file tree.

=item * New directive: FileMatch

FileMatch is similar to FileRules.  Where FileRules is used to exclude
files and directories, FileMatch is used to I<include> files.

=item * New directive: PreSortedIndex

Controls which properties are pre-sorted while indexing.  All properties
are sorted by default.

=item * New directive: ParserWarnLevel

Sets the level of warning printed when using libxml2.

=item * New directive: obeyRobotsNoIndex [yes|NO]

When using libxml2 to parse HTML, Swish-e will skip files marked as
NOINDEX.

    <meta name="robots" content="noindex">

Also, comments may be used within HTML and XML source docs to block sections of
content from indexing:

       <!-- SwishCommand noindex -->
       <!-- SwishCommand index -->

and/or these may be used also:

       <!-- noindex -->
       <!-- index -->


=item * New directive: UndefinedXMLAttributes

This describes how the content of XML attributes should be indexed,
if at all.  This is similar to UndefinedMetaTags, but is only for XML
attributes and when parsed by libxml2.  The default is to not index
XML attributes.

=item * New directive: XMLClassAttributes

XMLClassAttributes can specify a list of attribute names whose content
is combined with the element name to form metanames.

=item * New directive: PropCompressionLevel [0-9]

If compiled with zlib, Swish-e uses this setting to control the level
of compression applied to properties.  Properties must be long enough
(defined in config.h) to be compressed.  Useful for StoreDescription.

=item * Experimental directive: IgnoreNumberChars

Defines a set of characters.  If a word is made up of *only* those
characters the word will not be indexed.

=item * New directive: FuzzyIndexingMode

This configuration directive is used to define the type of "fuzzy" index to create.
Currently the options are:

    None
    Stemming
    Soundex
    Metaphone
    DoubleMetaphone



=back

Changes to command line arguments.  See L<SWISH-RUN|SWISH-RUN> for
documentation on these switches.

=over 4

=item * New command line argument C<-H>

Controls the level (verbosity) of header information printed with
search results.

=item * New command line argument C<-x>

Provides additional header output and allows for a I<format string>
to describe what data to print.

=item * New command line argument C<-k>

Prints words stored in the Swish-e index.

=item * New command line argument C<-N>

Provides a way to do incremental indexing by comparing last modification
dates.  You pass C<-N> a path to a file and only files newer than the
last modified date of that file will be indexed.
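
To preview roughly which files such a run would consider, C<find -newer>
applies a similar modification-time comparison.  This is an approximation
under stated assumptions: C<newer_than> is a hypothetical helper, and the
exact comparison Swish-e makes (for example, how equal timestamps are treated)
may differ.

```shell
# List regular files under a directory that were modified more
# recently than a reference file.  newer_than is a hypothetical helper.
newer_than() {
    reference=$1
    dir=$2
    find "$dir" -type f -newer "$reference"
}
```

For example, C<newer_than index.swish-e docs> would list the files under
F<docs> changed since the index file was last written.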

=item * Removed command line argument C<-D>

C<-D> no longer dumps the index file data.  Use C<-T> instead.

=item * New command line argument C<-T>

C<-T> is used for debugging indexing and searching.

=item * Enhanced command line argument C<-d>

Now C<-d> can accept some back-slashed characters to be used as output
separators.

=item * Enhanced command line argument C<-P>

Now -P sets the phrase delimiter character in searches.

=item * New command line argument C<-L>

Swish-e 2.2 contains an B<experimental> feature to limit results by a
range of property values.  The behavior of this feature may change in
the future.

=item * Modified command line argument C<-v>

Now the argument C<-v 0> results in *no* output unless there is an error.
This is a bit more handy when indexing with cron.


=back

# Makefile.in generated by automake 1.9.6 from Makefile.am.
# @configure_input@

# Copyright (C) 1994, 1995, 1996, 1997, 1998, 1999, 2000, 2001, 2002,
# 2003, 2004, 2005  Free Software Foundation, Inc.
# This Makefile.in is free software; the Free Software Foundation
# gives unlimited permission to copy and/or distribute it,
# with or without modifications, as long as this notice is preserved.

# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY, to the extent permitted by law; without
# even the implied warranty of MERCHANTABILITY or FITNESS FOR A
# PARTICULAR PURPOSE.

@SET_MAKE@

# $id$
#
# Conditionally install the pod docs

# Where docs are installed

srcdir = @srcdir@
top_srcdir = @top_srcdir@
VPATH = @srcdir@
pkgdatadir = $(datadir)/@PACKAGE@
pkglibdir = $(libdir)/@PACKAGE@
pkgincludedir = $(includedir)/@PACKAGE@
top_builddir = ..
am__cd = CDPATH="$${ZSH_VERSION+.}$(PATH_SEPARATOR)" && cd
INSTALL = @INSTALL@
install_sh_DATA = $(install_sh) -c -m 644
install_sh_PROGRAM = $(install_sh) -c
install_sh_SCRIPT = $(install_sh) -c
INSTALL_HEADER = $(INSTALL_DATA)
transform = $(program_transform_name)
NORMAL_INSTALL = :
PRE_INSTALL = :
POST_INSTALL = :
NORMAL_UNINSTALL = :
PRE_UNINSTALL = :
POST_UNINSTALL = :
build_triplet = @build@
host_triplet = @host@
subdir = pod
DIST_COMMON = $(srcdir)/Makefile.am $(srcdir)/Makefile.in
ACLOCAL_M4 = $(top_srcdir)/aclocal.m4
am__aclocal_m4_deps = $(top_srcdir)/config/acinclude.m4 \
	$(top_srcdir)/configure.in
am__configure_deps = $(am__aclocal_m4_deps) $(CONFIGURE_DEPENDENCIES) \
	$(ACLOCAL_M4)
mkinstalldirs = $(SHELL) $(top_srcdir)/config/mkinstalldirs
CONFIG_HEADER = $(top_builddir)/src/acconfig.h
CONFIG_CLEAN_FILES =
SOURCES =
DIST_SOURCES =
am__vpath_adj_setup = srcdirstrip=`echo "$(srcdir)" | sed 's|.|.|g'`;
am__vpath_adj = case $$p in \
    $(srcdir)/*) f=`echo "$$p" | sed "s|^$$srcdirstrip/||"`;; \
    *) f=$$p;; \
  esac;
am__strip_dir = `echo $$p | sed -e 's|^.*/||'`;
am__installdirs = "$(DESTDIR)$(poddir)"
podDATA_INSTALL = $(INSTALL_DATA)
DATA = $(pod_DATA)
DISTFILES = $(DIST_COMMON) $(DIST_SOURCES) $(TEXINFOS) $(EXTRA_DIST)
ACLOCAL = @ACLOCAL@
ALLOCA = @ALLOCA@
AMDEP_FALSE = @AMDEP_FALSE@
AMDEP_TRUE = @AMDEP_TRUE@
AMTAR = @AMTAR@
AR = @AR@
AS = @AS@
AUTOCONF = @AUTOCONF@
AUTOHEADER = @AUTOHEADER@
AUTOMAKE = @AUTOMAKE@
AWK = @AWK@
BTREE_OBJS = @BTREE_OBJS@
BUILDDOCS_FALSE = @BUILDDOCS_FALSE@
BUILDDOCS_TRUE = @BUILDDOCS_TRUE@
CC = @CC@
CCDEPMODE = @CCDEPMODE@
CFLAGS = @CFLAGS@
CPP = @CPP@
CPPFLAGS = @CPPFLAGS@
CXX = @CXX@
CXXCPP = @CXXCPP@
CXXDEPMODE = @CXXDEPMODE@
CXXFLAGS = @CXXFLAGS@
CYGPATH_W = @CYGPATH_W@
DEFS = @DEFS@
DEPDIR = @DEPDIR@
DLLTOOL = @DLLTOOL@
ECHO = @ECHO@
ECHO_C = @ECHO_C@
ECHO_N = @ECHO_N@
ECHO_T = @ECHO_T@
EGREP = @EGREP@
EXEEXT = @EXEEXT@
F77 = @F77@
FFLAGS = @FFLAGS@
INSTALLDOCS_FALSE = @INSTALLDOCS_FALSE@
INSTALLDOCS_TRUE = @INSTALLDOCS_TRUE@
INSTALL_DATA = @INSTALL_DATA@
INSTALL_PROGRAM = @INSTALL_PROGRAM@
INSTALL_SCRIPT = @INSTALL_SCRIPT@
INSTALL_STRIP_PROGRAM = @INSTALL_STRIP_PROGRAM@
LARGEFILES_MACROS = @LARGEFILES_MACROS@
LDFLAGS = @LDFLAGS@
LIBOBJS = @LIBOBJS@
LIBS = @LIBS@
LIBTOOL = @LIBTOOL@
LIBXML2_CFLAGS = @LIBXML2_CFLAGS@
LIBXML2_LIB = @LIBXML2_LIB@
LIBXML2_OBJS = @LIBXML2_OBJS@
LIBXML_REQUIRED_VERSION = @LIBXML_REQUIRED_VERSION@
LN_S = @LN_S@
LTLIBOBJS = @LTLIBOBJS@
MAINT = @MAINT@
MAINTAINER_MODE_FALSE = @MAINTAINER_MODE_FALSE@
MAINTAINER_MODE_TRUE = @MAINTAINER_MODE_TRUE@
MAKEINFO = @MAKEINFO@
OBJDUMP = @OBJDUMP@
OBJEXT = @OBJEXT@
PACKAGE = @PACKAGE@
PACKAGE_BUGREPORT = @PACKAGE_BUGREPORT@
PACKAGE_NAME = @PACKAGE_NAME@
PACKAGE_STRING = @PACKAGE_STRING@
PACKAGE_TARNAME = @PACKAGE_TARNAME@
PACKAGE_VERSION = @PACKAGE_VERSION@
PATH_SEPARATOR = @PATH_SEPARATOR@
PCRE_CFLAGS = @PCRE_CFLAGS@
PCRE_CONFIG = @PCRE_CONFIG@
PCRE_LIBS = @PCRE_LIBS@
PCRE_REQUIRED_VERSION = @PCRE_REQUIRED_VERSION@
PERL = @PERL@
POD2MAN = @POD2MAN@
RANLIB = @RANLIB@
SET_MAKE = @SET_MAKE@
SHELL = @SHELL@
STRIP = @STRIP@
SWISH_WEB = @SWISH_WEB@
VERSION = @VERSION@
XML2_CONFIG = @XML2_CONFIG@
Z_CFLAGS = @Z_CFLAGS@
Z_LIBS = @Z_LIBS@
ac_ct_AR = @ac_ct_AR@
ac_ct_AS = @ac_ct_AS@
ac_ct_CC = @ac_ct_CC@
ac_ct_CXX = @ac_ct_CXX@
ac_ct_DLLTOOL = @ac_ct_DLLTOOL@
ac_ct_F77 = @ac_ct_F77@
ac_ct_OBJDUMP = @ac_ct_OBJDUMP@
ac_ct_RANLIB = @ac_ct_RANLIB@
ac_ct_STRIP = @ac_ct_STRIP@
am__fastdepCC_FALSE = @am__fastdepCC_FALSE@
am__fastdepCC_TRUE = @am__fastdepCC_TRUE@
am__fastdepCXX_FALSE = @am__fastdepCXX_FALSE@
am__fastdepCXX_TRUE = @am__fastdepCXX_TRUE@
am__include = @am__include@
am__leading_dot = @am__leading_dot@
am__quote = @am__quote@
am__tar = @am__tar@
am__untar = @am__untar@
bindir = @bindir@
build = @build@
build_alias = @build_alias@
build_cpu = @build_cpu@
build_os = @build_os@
build_vendor = @build_vendor@
datadir = @datadir@
exec_prefix = @exec_prefix@
host = @host@
host_alias = @host_alias@
host_cpu = @host_cpu@
host_os = @host_os@
host_vendor = @host_vendor@
includedir = @includedir@
infodir = @infodir@
install_sh = @install_sh@
libdir = @libdir@
libexecdir = @libexecdir@
localstatedir = @localstatedir@
mandir = @mandir@
mkdir_p = @mkdir_p@
oldincludedir = @oldincludedir@
prefix = @prefix@
program_transform_name = @program_transform_name@
sbindir = @sbindir@
sharedstatedir = @sharedstatedir@
sysconfdir = @sysconfdir@
target_alias = @target_alias@
poddir = $(datadir)/doc/$(PACKAGE)/pod

#if BUILDDOCS
pod_DATA = \
    $(pod_files)


#endif
pod_files = \
    CHANGES.pod \
    INSTALL.pod \
    README.pod \
    SWISH-3.0.pod \
    SWISH-BUGS.pod \
    SWISH-CONFIG.pod \
    swish-e.pod \
    SWISH-FAQ.pod \
    SWISH-LIBRARY.pod \
    SWISH-RUN.pod \
    SWISH-SEARCH.pod

EXTRA_DIST = \
    $(pod_files)

all: all-am

.SUFFIXES:
$(srcdir)/Makefile.in: @MAINTAINER_MODE_TRUE@ $(srcdir)/Makefile.am  $(am__configure_deps)
	@for dep in $?; do \
	  case '$(am__configure_deps)' in \
	    *$$dep*) \
	      cd $(top_builddir) && $(MAKE) $(AM_MAKEFLAGS) am--refresh \
		&& exit 0; \
	      exit 1;; \
	  esac; \
	done; \
	echo ' cd $(top_srcdir) && $(AUTOMAKE) --foreign  pod/Makefile'; \
	cd $(top_srcdir) && \
	  $(AUTOMAKE) --foreign  pod/Makefile
.PRECIOUS: Makefile
Makefile: $(srcdir)/Makefile.in $(top_builddir)/config.status
	@case '$?' in \
	  *config.status*) \
	    cd $(top_builddir) && $(MAKE) $(AM_MAKEFLAGS) am--refresh;; \
	  *) \
	    echo ' cd $(top_builddir) && $(SHELL) ./config.status $(subdir)/$@ $(am__depfiles_maybe)'; \
	    cd $(top_builddir) && $(SHELL) ./config.status $(subdir)/$@ $(am__depfiles_maybe);; \
	esac;

$(top_builddir)/config.status: $(top_srcdir)/configure $(CONFIG_STATUS_DEPENDENCIES)
	cd $(top_builddir) && $(MAKE) $(AM_MAKEFLAGS) am--refresh

$(top_srcdir)/configure: @MAINTAINER_MODE_TRUE@ $(am__configure_deps)
	cd $(top_builddir) && $(MAKE) $(AM_MAKEFLAGS) am--refresh
$(ACLOCAL_M4): @MAINTAINER_MODE_TRUE@ $(am__aclocal_m4_deps)
	cd $(top_builddir) && $(MAKE) $(AM_MAKEFLAGS) am--refresh

mostlyclean-libtool:
	-rm -f *.lo

clean-libtool:
	-rm -rf .libs _libs

distclean-libtool:
	-rm -f libtool
uninstall-info-am:
install-podDATA: $(pod_DATA)
	@$(NORMAL_INSTALL)
	test -z "$(poddir)" || $(mkdir_p) "$(DESTDIR)$(poddir)"
	@list='$(pod_DATA)'; for p in $$list; do \
	  if test -f "$$p"; then d=; else d="$(srcdir)/"; fi; \
	  f=$(am__strip_dir) \
	  echo " $(podDATA_INSTALL) '$$d$$p' '$(DESTDIR)$(poddir)/$$f'"; \
	  $(podDATA_INSTALL) "$$d$$p" "$(DESTDIR)$(poddir)/$$f"; \
	done

uninstall-podDATA:
	@$(NORMAL_UNINSTALL)
	@list='$(pod_DATA)'; for p in $$list; do \
	  f=$(am__strip_dir) \
	  echo " rm -f '$(DESTDIR)$(poddir)/$$f'"; \
	  rm -f "$(DESTDIR)$(poddir)/$$f"; \
	done
tags: TAGS
TAGS:

ctags: CTAGS
CTAGS:


distdir: $(DISTFILES)
	@srcdirstrip=`echo "$(srcdir)" | sed 's|.|.|g'`; \
	topsrcdirstrip=`echo "$(top_srcdir)" | sed 's|.|.|g'`; \
	list='$(DISTFILES)'; for file in $$list; do \
	  case $$file in \
	    $(srcdir)/*) file=`echo "$$file" | sed "s|^$$srcdirstrip/||"`;; \
	    $(top_srcdir)/*) file=`echo "$$file" | sed "s|^$$topsrcdirstrip/|$(top_builddir)/|"`;; \
	  esac; \
	  if test -f $$file || test -d $$file; then d=.; else d=$(srcdir); fi; \
	  dir=`echo "$$file" | sed -e 's,/[^/]*$$,,'`; \
	  if test "$$dir" != "$$file" && test "$$dir" != "."; then \
	    dir="/$$dir"; \
	    $(mkdir_p) "$(distdir)$$dir"; \
	  else \
	    dir=''; \
	  fi; \
	  if test -d $$d/$$file; then \
	    if test -d $(srcdir)/$$file && test $$d != $(srcdir); then \
	      cp -pR $(srcdir)/$$file $(distdir)$$dir || exit 1; \
	    fi; \
	    cp -pR $$d/$$file $(distdir)$$dir || exit 1; \
	  else \
	    test -f $(distdir)/$$file \
	    || cp -p $$d/$$file $(distdir)/$$file \
	    || exit 1; \
	  fi; \
	done
check-am: all-am
check: check-am
all-am: Makefile $(DATA)
installdirs:
	for dir in "$(DESTDIR)$(poddir)"; do \
	  test -z "$$dir" || $(mkdir_p) "$$dir"; \
	done
install: install-am
install-exec: install-exec-am
install-data: install-data-am
uninstall: uninstall-am

install-am: all-am
	@$(MAKE) $(AM_MAKEFLAGS) install-exec-am install-data-am

installcheck: installcheck-am
install-strip:
	$(MAKE) $(AM_MAKEFLAGS) INSTALL_PROGRAM="$(INSTALL_STRIP_PROGRAM)" \
	  install_sh_PROGRAM="$(INSTALL_STRIP_PROGRAM)" INSTALL_STRIP_FLAG=-s \
	  `test -z '$(STRIP)' || \
	    echo "INSTALL_PROGRAM_ENV=STRIPPROG='$(STRIP)'"` install
mostlyclean-generic:

clean-generic:

distclean-generic:
	-test -z "$(CONFIG_CLEAN_FILES)" || rm -f $(CONFIG_CLEAN_FILES)

maintainer-clean-generic:
	@echo "This command is intended for maintainers to use"
	@echo "it deletes files that may require special tools to rebuild."
clean: clean-am

clean-am: clean-generic clean-libtool mostlyclean-am

distclean: distclean-am
	-rm -f Makefile
distclean-am: clean-am distclean-generic distclean-libtool

dvi: dvi-am

dvi-am:

html: html-am

info: info-am

info-am:

install-data-am: install-podDATA

install-exec-am:

install-info: install-info-am

install-man:

installcheck-am:

maintainer-clean: maintainer-clean-am
	-rm -f Makefile
maintainer-clean-am: distclean-am maintainer-clean-generic

mostlyclean: mostlyclean-am

mostlyclean-am: mostlyclean-generic mostlyclean-libtool

pdf: pdf-am

pdf-am:

ps: ps-am

ps-am:

uninstall-am: uninstall-info-am uninstall-podDATA

.PHONY: all all-am check check-am clean clean-generic clean-libtool \
	distclean distclean-generic distclean-libtool distdir dvi \
	dvi-am html html-am info info-am install install-am \
	install-data install-data-am install-exec install-exec-am \
	install-info install-info-am install-man install-podDATA \
	install-strip installcheck installcheck-am installdirs \
	maintainer-clean maintainer-clean-generic mostlyclean \
	mostlyclean-generic mostlyclean-libtool pdf pdf-am ps ps-am \
	uninstall uninstall-am uninstall-info-am uninstall-podDATA

# Tell versions [3.59,3.63) of GNU make to not export all variables.
# Otherwise a system limit (for SysV at least) may be exceeded.
.NOEXPORT:
�����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������swish-e-2.4.7/pod/swish-e.pod�����������������������������������������������������������������������0000664�0000771�0001750�00000004300�11166010103�012657� �����������������������������������������������������������������������������������������������������������������������������������������������������������������������������0000000�0000000������������������������������������������������������������������������������������������������������������������������������������������������������������������������=head1 NAME

Swish-e - A Search Engine

=head1 SYNOPSIS

    swish [-e] [-i dir file ... ] [-S system] [-c file] [-f file] [-l] [-v (num)]
    swish -w word1 word2 ... [-f file1 file2 ...] \
          [-P phrase_delimiter] [-p prop1 ...] [-s sortprop1 [asc|desc] ...] \
          [-m num] [-t str] [-d delim] [-H (num)] [-x output_format]
    swish -k (char|*) [-f file1 file2 ...]
    swish -M index1 index2 ... outputfile
    swish -N /path/to/compare/file
    swish -V

See the SWISH-RUN(1) man page for details on run-time options.

=head1 DESCRIPTION

Swish-e is the Simple Web Indexing System for Humans - Enhanced. Swish-e can
quickly and easily index directories of files or remote web sites and
search the generated indexes.

Swish-e is extremely fast in both indexing and searching, highly
configurable, and can be seamlessly integrated with existing web sites
to maintain a consistent design. Swish-e can index web pages, but can
just as easily index text files, mailing list archives, or data stored
in a relational database.

Swish-e is designed to index small to medium sized collections of
documents. Although a few users are indexing over a million documents,
typical usage is more often in the tens of thousands. Currently, Swish-e
only indexes eight bit character encodings.

=head1 DOCUMENTATION


Documentation is provided as HTML pages installed in
$prefix/share/doc/swish-e where $prefix is /usr/local if building from
source, or /usr if installed as part of a package from your OS vendor.
Under Windows $prefix is selected at installation time.

Documentation is also available on-line at http://swish-e.org.

A subset of the documentation is installed as system man pages as well.
The following man pages should be installed:

=over 4

=item swish-e(1)

This man page.

=item SWISH-CONFIG(1)

Defines options that can be used in a configuration file.

=item SWISH-RUN(1)

Describes the run-time options and switches.

=item SWISH-FAQ(1)

Answers to commonly asked questions.

=item SWISH-LIBRARY(1)

API for the Swish-e search library.  Applications can link against this
library.

=back

=head1 SUPPORT

Support for Swish-e is provided via the Swish-e discussion list.  See
http://swish-e.org for information.









��������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������swish-e-2.4.7/pod/INSTALL.pod�����������������������������������������������������������������������0000664�0000771�0001750�00000140432�11166010103�012415� �����������������������������������������������������������������������������������������������������������������������������������������������������������������������������0000000�0000000������������������������������������������������������������������������������������������������������������������������������������������������������������������������=head1 NAME

INSTALL - Swish-e Installation Instructions

=head1 OVERVIEW

This document describes how to download, build,
and install Swish-e from source.
Also found below is a basic overview of using Swish-e to index documents,
with pointers to other, more advanced examples.

This document also provides instructions
on how to get help installing and using Swish-e
(and the important information you should provide when asking for help).
Please read these instructions B<before requesting help>
on the Swish-e discussion list.
See L<"QUESTIONS AND TROUBLESHOOTING">.

Although building from source is recommended,
some OS distributions (e.g., Debian) provide pre-compiled binaries.
Check with your distribution for available packages.
Build from source if your distribution does not offer
the current version of Swish-e.

Also, please read the Swish-e FAQ (L<SWISH-FAQ|SWISH-FAQ>),
as it answers many frequently-asked questions.

Swish-e knows how to index HTML, XML, and plain text documents.
Helper applications and other tools are used to convert documents
such as PDF or MS Word into a format that Swish-e can index.
These additional applications and tools (listed below)
must be installed separately.
The process of converting documents is called "filtering".

NOTE: Swish-e version 2.4.0 installs a lot more files than earlier versions
when running "make install".
Be aware that the Swish-e documentation may thus include errors
about where files are located.
Please notify the Swish-e discussion list of any documentation errors.

=head2 Upgrading from previous versions of Swish-e

If you are upgrading from a previous version of Swish-e,
read the L<CHANGES|CHANGES> page first.
The Swish-e index format may have changed
and existing indexes may not work with the newer version of Swish-e.

If you have existing indexes,
you may need to re-index your data
before running the "make install" step described below.
Swish-e may be run from the build directory after compiling,
but before installation.

=head2 Windows Users

A Windows binary version is available
as a separate download from the Swish-e site (http://swish-e.org).
Many of the installation instructions below will not apply to Windows users;
the Windows version is pre-compiled
and includes F<libxml2>, F<zlib>, F<xpdf>, and F<catdoc>.

A number of Perl modules may also be needed.
These can be installed with ActiveState's PPM utility.

   libwww-perl   - the LWP modules (for spidering)
   HTML-Tagset   - used by web spider
   HTML-Parser   - used by web spider
   MIME-Types    - used for filtering documents when not spidering
   HTML-Template - formatting output from swish.cgi (optional)
   HTML-FillInForm (if HTML-Template is used)

=head2 Building from CVS

Please refer to the F<README.cvs> file found in the documentation directory
F<$prefix/share/doc/swish-e>.


=head1 SYSTEM REQUIREMENTS

Swish-e makes use of a number of libraries and tools
that are not distributed with Swish-e.
Some libraries need to be installed before building Swish-e from source;
other tools can be installed at any time.
See below for details.

=head2 Software Requirements

Swish-e is written in C.
It has been tested on a number of platforms,
including Sun/Solaris, DEC Alpha, BSD, Linux, Mac OS X, and OpenVMS.

The GNU C compiler (gcc) and GNU make are strongly recommended.
Repeat: you will find life easier if you use the GNU tools.

=head2 Optional but Recommended Packages

Most of the packages listed below are available
as easily installable packages.
Check with your operating system vendor or install them from source.
Most are very common packages that may already be installed on your computer.

As noted below,
some packages need to be installed before building Swish-e from source,
while others may be added after Swish-e is installed.

=over 4

=item * Libxml2

F<libxml2> is very strongly recommended.
It is used for parsing both HTML and XML files.
Swish-e can be built and installed without F<libxml2>,
but the HTML parser that is built into Swish-e
is not as accurate as F<libxml2>.

    http://xmlsoft.org/

F<libxml2> must be installed before Swish-e is built,
or it will not be used.

If F<libxml2> is installed in a non-standard location
(e.g., F<libxml2> is built with C<--prefix $HOME/local>),
make sure that you add the C<bin> directory to your C<$PATH>
before building Swish-e.
Swish-e's configure script uses a program
created by F<libxml2> (C<xml2-config>)
to find the location of F<libxml2>.
Use C<which xml2-config> to verify
that the program can be found where expected.

=item * Zlib Compression

The F<Zlib> compression library is commonly installed on most systems
and is recommended for use with Swish-e.
F<Zlib> is used for compressing text stored in the Swish-e index.

    http://www.gzip.org/zlib/

F<Zlib> must be installed before building Swish-e.

=item * Perl Modules

Although Swish-e is a compiled C program,
many support features use Perl.
For example, both the web spiders
and modules to help with filtering documents are written in Perl.

The following Perl modules may be required.
Check your current Perl installation,
as many may already be installed.

    LWP
    URI
    HTML::Parser
    HTML::Tagset
    MIME::Types (optional)

Note that installing C<Bundle::LWP> with the CPAN module

    perl -MCPAN -e 'install Bundle::LWP'

will install many of the above modules.

If you wish to use C<HTML-Template> with C<swish.cgi> to generate output,
install:


    HTML::Template
    HTML::FillInForm

If you wish to use C<Template-Toolkit> with C<swish.cgi>
to generate output, install:

    Template

Questions about installing these modules
may be sent to the Swish-e discussion list.

The C<search.cgi> example script
requires both C<Template-Toolkit> and C<HTML::FillInForm>.

=item * Indexing PDF Documents

Indexing PDF files requires the C<xpdf> package.
This is a common package,
available with most operating systems
and often provided as an add-on package.

    http://www.foolabs.com/xpdf/

Xpdf may be added after Swish-e is installed.

=item * Indexing MS Word Documents

Indexing MS Word documents requires the F<Catdoc> program.

    http://www.wagner.pp.ru/~vitus/software/catdoc/

F<Catdoc> may be added after Swish-e is installed.

=item * Indexing MP3 ID3 Tags

Indexing MP3 ID3 Tags requires the C<MP3::Tag> Perl module.
See http://search.cpan.org.
C<MP3::Tag> may be installed after Swish-e is installed.

=item * Indexing MS Excel Files

Indexing MS Excel files is supported by the following Perl modules,
also available at http://search.cpan.org.

    Spreadsheet::ParseExcel
    HTML::Entities

These Perl modules may be installed after Swish-e is installed.

=back

=head1 INSTALLATION

Here are brief installation instructions that should work in most cases.
Following this section are more detailed instructions and examples.

=head2 Building Swish-e

Download Swish-e using your favorite web browser
or a utility such as C<wget>, C<lynx>, or C<lwp-download>.
Unpack and build the distribution, using the following steps:

Note: "swish-e-2.4.0" is used as an example.
Download the most current available version
and adjust the commands below!
Also, if you are running Debian,
see the notes below on building a C<.deb> package
from the Swish-e source package.

Pay careful attention to the "prompt" character used
on the following command lines.
A "$" prompt indicates steps run as an unprivileged user.
A "#" indicates steps run as the superuser (root).


    $ wget http://swish-e.org/Download/swish-e-2.4.0.tar.gz
    $ gzip -dc swish-e-2.4.0.tar.gz | tar xof -
    $ cd swish-e-2.4.0  (this directory will depend on the version of Swish-e)

    $ ./configure
    $ make
    $ make check
    ...
    ==================
    All 3 tests passed
    ==================

    $ su root  (or use sudo)
    (enter password)

    # make install
    # exit
    $ swish-e -V
    SWISH-E 2.4.0

B<IMPORTANT:>
Once Swish-e is installed, do not run it as the superuser (root) --
root is only required during the installation step,
when installing into system directories.
Please do not break this rule.

B<NOTE:>
If you are upgrading from an older version of Swish-e,
be sure to review the L<CHANGES|CHANGES> file.
Old index files may not be compatible with newer versions of Swish-e.
After building Swish-e (but before running "make install"),
Swish-e can be run from the build directory:

    $ src/swish-e -V

To minimize downtime,
create new index files before running "make install",
by using Swish-e from the build directory.
Then, copy the index files to the live location and run "make install":

    $ src/swish-e -c /path/to/config -f index.new

Keep in mind that the location you index from
may affect the paths stored in the index file.
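
The copy step can be scripted.
The following is only a sketch, under a few assumptions:
the new index was written as F<index.new>,
Swish-e stored the document properties next to it
in a companion file with a F<.prop> suffix,
and the function name C<swap_index> and both paths are placeholders
chosen for illustration.

```shell
#!/bin/sh
# swap_index NEW LIVE
# Move a freshly built index (and its companion .prop file, assumed to
# sit next to it) over the live index. Both names are placeholders.
swap_index() {
    new_index=$1
    live_index=$2

    # Refuse to touch the live index unless both new files exist.
    [ -f "$new_index" ] && [ -f "$new_index.prop" ] || {
        echo "missing $new_index or $new_index.prop" >&2
        return 1
    }

    # Move the property file first, then the index itself.
    mv "$new_index.prop" "$live_index.prop" &&
    mv "$new_index" "$live_index"
}

# Example call (paths are hypothetical):
#   swap_index index.new /var/lib/search/index.swish-e
```

The two moves are not atomic as a pair,
so a search that runs between them may fail;
run the swap at a quiet moment if that matters for your site.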

=head2 Installing without root access

Here's another installation example.
This might be used if you do not have root access
or you wish to install Swish-e someplace other than C</usr/local>.

This example also shows building Swish-e in a "build" directory
that is separate from where the source files are located.
This is the recommended way to build Swish-e,
but it requires GNU Make.
Without GNU Make,
you will likely need to build from within the source directory,
as shown in the previous example.

    $ tar zxof swish-e-2.4.0.tar.gz  (GNU tar with "z" option)
    $ mkdir build
    $ cd build

Note that the current directory is not where Swish-e was unpacked.

Swish-e uses a F<configure> script.
F<configure> has many options,
but it uses reasonable and standard defaults.
Running

    $ ../swish-e-2.4.0/configure --help

will display the options.

Two options are of common interest:
C<--prefix> sets the top-level installation directory;
C<--disable-shared> will link Swish-e statically,
which may be needed on some platforms (Solaris 2.6, perhaps).

Platforms may require varying link instructions
when libraries are installed in non-standard locations.
Swish-e uses the GNU F<autoconf> tools for building the package.
F<autoconf> is good at building and testing,
but still requires you to provide information appropriate for your platform.
This may mean reading the manual page for your compiler and linker
to see how to specify non-standard file locations.

For most Unix-type platforms,
you can use C<LDFLAGS> and C<CPPFLAGS> environment variables
to specify paths to "include" (header) files
and to libraries that are not in standard locations.

In this example, we do not have root access.
We have installed F<libxml2> and F<libz> in C<$HOME/local>.
Swish-e will also be installed in C<$HOME/local>
(by using the C<--prefix> setting).

In this case,
you would need to add C<$HOME/local/bin>
to the start of your shell's C<$PATH> setting.
This is required because F<libxml2> installs a program
that is used when running the configure script.
Before running configure, type:

    $ which xml2-config

It should list C<$HOME/local/bin/xml2-config>.
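
For a Bourne-style shell, the C<$PATH> change and the check
might be combined as in the sketch below.
It assumes the C<$HOME/local> prefix used in this example;
adjust the directory if your libraries were installed elsewhere.

```shell
#!/bin/sh
# Put $HOME/local/bin at the front of the search path so that the
# xml2-config installed there is found before any system copy.
PATH="$HOME/local/bin:$PATH"
export PATH

# Report which xml2-config (if any) the configure script would find.
if xml2_path=$(command -v xml2-config); then
    echo "configure will use: $xml2_path"
else
    echo "xml2-config not found in PATH" >&2
fi
```

The setting only lasts for the current shell session,
so run F<configure> from the same shell.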

Now run F<configure> (remember, we are in a separate "build" directory):

    $ ../swish-e-2.4.0/configure \
        --prefix=$HOME/local \
        CPPFLAGS=-I$HOME/local/include \
        LDFLAGS="-R$HOME/local/lib -L$HOME/local/lib"

    $ make >/dev/null  (redirect output to only see warnings and errors)

    $ make check
    ...
    ==================
    All 3 tests passed
    ==================

    $ make install
    $ $HOME/local/bin/swish-e -V 
    SWISH-E 2.4.0

Note the use of double quotes in the C<LDFLAGS> line above.
This allows C<$HOME> to be expanded within the text string.


=head2 Run-time paths

The C<-R> option says to add a specified path (or paths)
to those that are used to find shared libraries at run time.
These paths are stored in the Swish-e binary.
When Swish-e is run,
it will look in these directories for shared libraries.

Some platforms may not support the C<-R> option.
In this event,
set the C<LD_RUN_PATH> environment variable B<before> running make.
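
In a Bourne-style shell, that might look like the following sketch.
The directory C<$HOME/local/lib> is carried over from the earlier example
and is an assumption about where your libraries live.

```shell
#!/bin/sh
# Add the library directory to LD_RUN_PATH, keeping any value that is
# already set. The linker reads this variable when "make" links Swish-e.
LD_RUN_PATH="$HOME/local/lib${LD_RUN_PATH:+:$LD_RUN_PATH}"
export LD_RUN_PATH

echo "LD_RUN_PATH is now: $LD_RUN_PATH"
```

Run C<make> from the same shell session so that the setting is seen.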

Some systems, such as Red Hat,
do not look in C</usr/local/lib> for libraries.
In these cases,
you can either use C<-R>, as above, when building Swish-e
or add C</usr/local/lib> to C</etc/ld.so.conf> and run F<ldconfig> as root.

If all else fails,
you may need to actually read the man pages for your platform.


=head2 Building a Debian Package

The Swish-e distribution includes the files required
to build a Debian package.

    $ tar zxof swish-e-2.4.0.tar.gz  (GNU tar with "z" option)
    $ cd swish-e-2.4.0
    $ fakeroot debian/rules binary
    [lots of output]
    dpkg-deb: building package `swish-e' in `../swish-e_2.4.0-0_i386.deb'.
    $ su
    # dpkg -i ../swish-e_2.4.0-0_i386.deb


=head2 What's installed

Swish installs a number of files.
By default, all files are installed below C</usr/local>,
but this can be changed by setting C<--prefix>
when running F<configure> (as shown above).
Individual paths may also be set.
Run C<configure --help> for details.

   $prefix/bin/swish-e         The Swish-e binary program
   $prefix/share/doc/swish-e/  Full documentation and examples
   $prefix/lib/libswish-e      The Swish-e C library
   $prefix/include/swish-e.h   The library header file
   $prefix/man/man1/           Documentation as manual pages
   $prefix/lib/swish-e/        Helper programs (spider.pl, swishspider, swish.cgi)
   $prefix/lib/swish-e/perl/   Perl helper modules

Note that the Perl modules are I<not> installed in the system Perl library.
Swish-e and the Perl scripts that require the modules
know where to find the modules,
but the F<perldoc> program (used for reading documentation) does not. 
This can be corrected by adding C<$prefix/lib/swish-e>
and C<$prefix/lib/swish-e/perl> to the C<PERL5LIB> environment variable.
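
As a sketch for a Bourne-style shell,
assuming the default prefix of C</usr/local>
(change C<prefix> to match your C<--prefix> setting):

```shell
#!/bin/sh
# Assumed installation prefix; change this if you ran configure
# with a different --prefix.
prefix=/usr/local

# Prepend both Swish-e directories, keeping any existing PERL5LIB value.
PERL5LIB="$prefix/lib/swish-e:$prefix/lib/swish-e/perl${PERL5LIB:+:$PERL5LIB}"
export PERL5LIB

echo "PERL5LIB is now: $PERL5LIB"
```

With this in place, a command such as C<perldoc spider.pl>
should be able to locate the documentation,
provided the files were installed under that prefix.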

=head2 Documentation

Documentation can be found in the C<$prefix/share/doc/swish-e> directory.
Documentation is in HTML format at C<$prefix/share/doc/swish-e/html> and
can also be read on-line at the Swish-e web site:

    http://swish-e.org/

=head2 The Swish-e documentation as man(1) pages

Running "make install" installs some of the Swish-e documentation as man pages.
The following man pages are installed:

    SWISH-FAQ(1)
    SWISH-CONFIG(1)
    SWISH-RUN(1)
    SWISH-LIBRARY(1)

The man pages are installed, by default, in the system man directory.
This directory is determined when F<configure> is run;
it can be set by passing a directory name to F<configure>.

For example,

    ./configure --mandir=/usr/local/doc/man

The man directory is specified relative to the C<--prefix> setting.
If you use C<--prefix>,
you do not normally need to also specify C<--mandir>.

Information on running F<configure> can be found by typing:

    ./configure --help

=head2 Join the Swish-e discussion list

The final step, when installing Swish-e,
is to join the Swish-e discussion list.

The Swish-e discussion list is the place
to ask questions about installing and using Swish-e,
see or post bug fixes or security announcements,
and offer help to others.
Please do not contact the developers directly.

The list is typically I<very low traffic>,
so it won't overload your inbox.
Please take the time to subscribe.
See http://swish-e.org.

If you are using Swish-e on a public site,
please let the list know,
so that your URL can be added to the list of sites that use Swish-e!

Please review the next section
before posting questions to the Swish-e list.

=head1 QUESTIONS AND TROUBLESHOOTING

Support for installation, configuration, and usage
is available via the Swish-e discussion list.
Visit http://swish-e.org for information.
Do not contact developers directly for help --
always post your question to the list.

It's very important to provide the right information
when asking for help.

Please search the Swish-e list archive before posting a question.
Also, check the L<SWISH-FAQ|SWISH-FAQ>
to see if your question has already been asked and answered.

Before posting, use the available tools to narrow down the problem.

Swish-e has several switches
(e.g., C<-T>, C<-v>, and C<-k>)
that may help you resolve issues.
These switches are described on the L<SWISH-RUN|SWISH-RUN> page.
For example, if you cannot find a document by a keyword
that you believe should be indexed,
try indexing just that single file
and use the C<-T INDEXED_WORDS> option
to see if the word is actually being indexed.
First, try it without any changes to default settings:

    swish-e -i testdoc.html -T indexed_words | less

If that works, add in your configuration file:

    swish-e -i testdoc.html -c swish.conf -T indexed_words | less

If it still isn't working as you expect,
try to reduce the test document to a very small example.
This will be very helpful to your readers,
when you are asking for help.

Another useful trick is to use C<-H9> when searching,
to display full headers in search results.
Look at the "Parsed Words" header
to see what words Swish-e is searching for.

=head2 When posting, please provide the following information:

Use these guidelines when asking for help.
The most important tip is to provide the B<least> amount of information
that can be used to reproduce your problem.
Do not paraphrase output -- copy-and-paste --
but trim text that is not necessary. 

=over 4

=item *

The exact version of Swish-e that you are using.
Running Swish-e with the C<-V> switch will print the version number.
Also, supply the output from C<uname -a> or similar command
that identifies the operating system you are running on.
If you are running an old version of Swish-e,
be prepared for a response of "upgrade" to your question.

=item *

A summary of the problem.
This should include the commands issued
(e.g. for indexing or searching) and their output,
along with an explanation of why you don't think it's working correctly.
Please copy-and-paste the exact commands and their output,
instead of retyping, to avoid errors.

=item *

Include a copy of the configuration file you are using, if any.
Swish-e has reasonable defaults,
so in many cases you can run it without using a configuration file.
But, if you need to use a configuration file,
B<reduce it down> to the absolute minimum number of commands
that is required to demonstrate your problem.
Again, copy-and-paste.

=item *

A small copy of a source document that demonstrates the problem.

If you are having problems spidering a web server,
use lwp-download or wget to copy the file locally,
then make sure you can index the document using the file system method.
This will help you determine if the problem
is with spidering or indexing.

If you expect help with spidering,
don't post fake URLs, as it makes it impossible to test.
If you don't want to expose your web page to the people on the Swish-e list,
find some other site to test spidering on.
If that works, but you still cannot spider your own site,
you may need to request help from others.
If so, you must post your real URL
or make a test document available via some other source.

=item *

If you are having trouble building Swish-e,
please copy-and-paste the output from make
(or from C<./configure>, if that's where the problem is).


=back

The key is to provide enough information
so that others may reproduce the problem. 
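
The version and platform details from the first item above
can be gathered with a small helper.
This is only a sketch; the function name C<swish_report> is our own,
and it uses nothing beyond C<swish-e -V> and C<uname -a>.

```shell
#!/bin/sh
# swish_report: print the platform and Swish-e version details that
# are worth pasting into a message to the discussion list.
swish_report() {
    echo "== System =="
    uname -a

    echo "== Swish-e version =="
    if command -v swish-e >/dev/null 2>&1; then
        swish-e -V
    else
        echo "swish-e not found in PATH"
    fi
}

# Example call:
#   swish_report > report.txt
```

Paste the output into your message as-is,
followed by the exact commands you ran and the output they produced.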

=head1 ADDITIONAL INSTALLATION OPTIONS

These steps are not required for normal use of Swish-e.

=head2 The SWISH::API Perl Module

The Swish-e distribution includes a module
that provides a Perl interface to the Swish-e C library.
This module provides a way to search a Swish-e index
without running the Swish-e program.
Searching an index will be many times faster
when running under a persistent environment
such as Apache/mod_perl with the C<SWISH::API> module.

See the F<perl/README> file for information
on installing and using the C<SWISH::API> Perl module.

=head1 GENERAL CONFIGURATION AND USAGE

This section should give you a basic overview
of indexing and searching with B<Swish-e>.
Other examples can be found in the C<conf> directory;
these will step you through a number of different configurations.
Also, please review the L<SWISH-FAQ|SWISH-FAQ>.

Swish-e is a command-line program.
The program is controlled by passing switches on the command line.
A configuration file may be used,
but often is not required.
Swish-e does not include a graphical user interface.
Example CGI scripts are provided in the distribution,
but they require additional setup to use.

=head2 Introduction to Indexing and Searching

Swish-e can index files that are located on the local file system.
For example, running:

     swish-e -i /var/www/htdocs

will index I<all> files in the C</var/www/htdocs> directory.
You may specify one or more files or directories with the C<-i> option.
By default, this will create an index called C<index.swish-e>
in the current directory.

To search the resulting index for a given word, try:

     swish-e -w apache

This will find the word "apache" in the body or title
of the indexed documents.

As mentioned above,
Swish-e will index all files in a directory,
unless instructed otherwise.
So, if C</var/www/htdocs> contains non-HTML files,
you will need a configuration file to limit the files that Swish-e indexes.
Create a file called C<swish.conf>:

    # Example configuration file

    # Tell Swish-e what to index (same as -i switch above)
    IndexDir /var/www/htdocs

    # Only index HTML and text files
    IndexOnly .htm .html .txt

    # Tell Swish-e that .txt files are to use the text parser.
    IndexContents TXT* .txt

    # Otherwise, use the HTML parser
    DefaultContents HTML*

    # Ask libxml2 to report any parsing errors and warnings or 
    # any UTF-8 to 8859-1 conversion errors
    ParserWarnLevel 9

After saving the configuration file, reindex:

    swish-e -c swish.conf

The Swish-e configuration settings are described
in the L<SWISH-CONFIG|SWISH-CONFIG> manual page.
The order of statements in the configuration file is typically not important,
although some statements depend on previously set statements.
There are many possible settings. 
Good advice is to use as few settings as possible
when first starting out with Swish-e.

The runtime options (switches) are described
in the L<SWISH-RUN|SWISH-RUN> manual page.
You may also see a summary of options by running:

    swish-e -h

Swish-e has two other methods for reading input files.
One method uses a Perl helper script and the LWP Perl library
to spider remote web sites:

    swish-e -S http -i http://localhost/index.html -v2

This will spider the web server running on the local host.
The C<-S> option defines the input source method to be "http",
C<-i> specifies the URL to spider,
and C<-v> sets the verbose level to two.
There are a number of configuration options
that are specific to the C<-S> http input source.
See L<SWISH-CONFIG|SWISH-CONFIG>.
Note that only files of C<Content-Type text/*> will be indexed.

The C<-S http> method is deprecated, however,
in favor of a variation on the following input method.

There is a general-purpose input method
wherein Swish-e reads input from a program 
that produces documents in a special format.
The program might read and format data stored in a database,
or parse and format messages in a mailing list archive,
or run a program that spiders web sites (like the previous method).

The Swish-e distribution includes a spider program
that uses this method of input.
This spider program is much more configurable and feature-rich
than the previous (C<-S http>) method.

To duplicate the previous example,
create a configuration file called C<swish2.conf>:

    # Example for spidering
    # Use the "spider.pl" program included with Swish-e
    IndexDir spider.pl

    # Define what site to index
    SwishProgParameters default http://localhost/index.html

Then, create the index using the command:

    swish-e -S prog -c swish2.conf

This says to use the C<-S prog> input source method.
Note that, in this case,
the C<IndexDir> setting does not specify a file or directory to index,
but a program name to be run.
This program, C<spider.pl>,
does the work of fetching the documents from the web server
and passing them to Swish-e for indexing.

The C<SwishProgParameters> option is a special feature
that allows passing command-line parameters
to the program specified with C<IndexDir>.
In this case, we are passing the word C<default>
(which tells C<spider.pl> to use default settings)
and the URL to spider.

Running a script under Windows requires specifying the interpreter
(e.g., C<perl.exe>)
and then using C<SwishProgParameters>
to specify the script and the script's parameters.
See I<Notes when using C<-S prog> on MS Windows>
on the L<SWISH-RUN|SWISH-RUN> page.

The advantage of the C<-S prog> method of spidering
(over the previous C<-S http> method)
is that the Perl code is only compiled once
instead of once for every document fetched from the web server.
In addition, it is a much more advanced spider with many, many features.
Even with the C<default> settings used here,
C<spider.pl> will automatically index PDF and MS Word documents
if Xpdf and Catdoc are installed.

A special form of the C<-S prog> input source method is:

    ./myprog --option | swish-e -S prog -i stdin -c config

This allows running Swish-e from a program
(instead of running the external program from Swish-e).
So, this also can be done as:

    ./myprog --option > outfile
    swish-e -S prog -i stdin -c config < outfile

or

    ./myprog --option > outfile
    cat outfile | swish-e -S prog -i stdin -c config

One final note about the C<-S prog> input source method.
The program specified with C<-i> or C<IndexDir> needs to be an absolute path.
The exception is when the program is installed in the C<libexecdir> directory.
Then, a plain program name may be specified
(as in the example showing C<spider.pl>, above).
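
To check where C<libexecdir> points on a given installation,
the output of C<swish-e -h> includes a line that names it.
The sketch below extracts just the path.
It assumes the line has the form
C<Scripts and Modules at: (libexecdir) = /some/path>;
the function name is invented for this example.

```shell
#!/bin/sh
# Hypothetical helper: read "swish-e -h" output on stdin and
# print only the directory named on the "(libexecdir) = ..." line.
libexecdir_from_help() {
    # In a basic regular expression the parentheses are literal characters
    sed -n 's/.*(libexecdir) = //p'
}

# Typical use (requires swish-e to be installed):
#   swish-e -h | libexecdir_from_help
```

The result can then be used to build the absolute path of a program
that is not installed in C<libexecdir>.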

All three input source methods are described in more detail
on the L<SWISH-RUN|SWISH-RUN> page.

=head2 Metanames and Properties

There are two key Swish-e concepts
that you need to be familiar with: 
Metanames and Properties.

=over 4

=item * Metanames

Swish-e creates a reverse (i.e., inverted) index.
Just like an index in a book,
you look up a word and it lists the pages (or documents)
where that word can be found.

Swish-e can create multiple index tables within the same index file.
For example,
you might want to create an index that only contains words in HTML titles,
so that searches can be limited to title text.
Or, you might have descriptive words
that you would like to search,
stored in a meta tag called "keywords".

Some database systems might call these different "fields" or "columns",
but Swish-e calls them I<MetaNames>
(as a result of its first indexing HTML "meta" tags).

To find documents containing "foo" in their titles, you might run:

    swish-e -w swishtitle=foo

or, a more advanced example:

    swish-e -w 'swishtitle=(foo or bar) or swishdefault=(baz)'

The Metaname "swishdefault" is the name that is used by Swish-e
if no other name is specified.
The following two searches are thus equivalent:

    swish-e -w foo
    swish-e -w swishdefault=foo

When indexing HTML documents,
Swish-e indexes words in the body and title 
under the Metaname "swishdefault".

=item * Properties

Swish-e's search result is a list of files --
actually, Swish-e uses file numbers internally.
Data can be associated with each file number when indexing. 
For example, by default Swish-e associates the file's name, title,
last modified date, and size with the file number.
These items can be printed in search results.

In Swish-e, this associated data is called a file's I<Properties>.
Properties can be any data you wish to associate with a document --
in fact, the entire text of the document can be stored in the index. 
What data is stored as a Property is controlled by the I<PropertyNames>
(and other) configuration directives.

What properties are printed with search results
depends on the C<-x> or C<-p> switches.
By default,
Swish-e returns the rank, path/URL, title, and file size in bytes
for each result.

=back

=head2 Getting Started With Swish-e

Swish-e reads a configuration file (see L<SWISH-CONFIG|SWISH-CONFIG>)
for directives that control whether and how Swish-e indexes files.
Swish-e is also controlled by command-line arguments
(see L<SWISH-RUN|SWISH-RUN>).
Many of the command-line arguments
have equivalent configuration directives (e.g., C<-i> and C<IndexDir>).

Swish-e does not require a configuration file,
but most people change its default behavior
by placing settings in a configuration file.

To try the examples below,
go to the C<tests> subdirectory of the distribution.
The tests will use the C<*.html> files in this directory
when creating the test index.
You may wish to review these C<*.html> files
to get an idea of the various native file formats that Swish-e supports.

You may also use your own test documents.
It's recommended to use small test documents when first using Swish-e.

=head2 Step 1: Create a Configuration File

The configuration file controls what and how Swish-e indexes.  The
configuration file consists of directives, comments, and blank lines.
The configuration file can have any name you like.

This example will work with the documents in the F<tests> directory.
You may wish to review the F<tests/test.config> configuration file used
for the C<make test> tests.

For example, a simple configuration file (F<swish-e.conf>):

    # Example Swish-e Configuration file

    # Define *what* to index
    # IndexDir can point to directories and/or files
    # Here it's pointing to the current directory
    # Swish-e will also recurse into sub-directories.
    IndexDir .

    # But only index the .html files
    IndexOnly .html

    # Show basic info while indexing
    IndexReport 1

And that's a simple configuration file.
It says to index all the C<.html> files
in the current directory and sub-directories, if any,
and provide some basic output while indexing.

As mentioned above,
the complete list of all configuration file directives
is detailed in L<SWISH-CONFIG|SWISH-CONFIG>.

=head2 Step 2: Index your Files

Run Swish-e,
using the C<-c> switch to specify the name of the configuration file.

    swish-e -c swish-e.conf

    Indexing Data Source: "File-System"
    Indexing "."
    Removing very common words...
    no words removed.
    Writing main index...
    Sorting words ...
    Sorting 55 words alphabetically
    Writing header ...
    Writing index entries ...
      Writing word text: Complete
      Writing word hash: Complete
      Writing word data: Complete
    55 unique words indexed.
    4 properties sorted.                                              
    5 files indexed.  1252 total bytes.  140 total words.
    Elapsed time: 00:00:00 CPU time: 00:00:00
    Indexing done!

This created the index file C<index.swish-e>.
This is the default index file name,
unless the B<IndexFile> directive is specified in the configuration file:

    IndexFile ./website.index

You may use the C<-f> switch to specify an index file at indexing time.
The C<-f> option overrides any C<IndexFile> setting
that may be in the configuration file.

=head2 Step 3: Search

You specify your search terms with the C<-w> switch.
For example, to find the files that contain the word C<sample>,
you would issue the command:

    swish-e -w sample

This example assumes that you are in the C<tests> directory.
Swish-e returns the following, in response to this command:

    swish-e -w sample

    # SWISH format: 2.4.0
    # Search words: sample
    # Number of hits: 2
    # Search time: 0.000 seconds
    # Run time: 0.005 seconds
    1000 ./test_xml.html "If you are seeing this, the METATAG XML search was successful!" 159
    1000 ./test.html "If you are seeing this, the test was successful!" 437
    .

So, the word C<sample> was found in two documents.
The first number shown is the relevance (or rank) of the search term,
followed by the file containing the search term,
the title of the document,
and finally, the length of the document (in bytes).

The period ("."), sitting alone at the end,
marks the end of the search results.
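
A calling script can pick the result lines apart using the layout just described.
The following is a rough sketch under stated assumptions:
it expects the default output format shown above,
paths that contain no spaces,
and error lines that begin with C<err:>.
The function name C<parse_hits> is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: read swish-e's default search output on stdin
# and print "rank<TAB>path" for each hit.
parse_hits() {
    while IFS= read -r line; do
        case $line in
            '#'*)  continue ;;   # header lines start with "#"
            err:*) continue ;;   # assumed form of error lines
            '')    continue ;;   # skip blank lines
            '.')   break ;;      # a lone period ends the result list
        esac
        rank=${line%% *}         # text before the first space
        rest=${line#* }          # everything after the first space
        path=${rest%% *}         # assumes the path contains no spaces
        printf '%s\t%s\n' "$rank" "$path"
    done
}

# Typical use (requires swish-e and an index):
#   swish-e -w sample | parse_hits
```

For anything beyond a quick script,
the C<-x> switch mentioned below is likely a better way
to get output that is easy to parse.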

Much more information may be retrieved while searching,
by using the C<-x> and C<-H> switches (see L<SWISH-RUN|SWISH-RUN>)
and by using Document Properties (see L<SWISH-CONFIG|SWISH-CONFIG>).

=head2 Phrase Searching

To search for a phrase in a document,
use double-quotes to delimit your search terms.
(The default phrase delimiter is set in C<src/swish.h>.)

You must protect the quotes from the shell.

For example, under Unix:

    swish-e -w '"this is a phrase" or (this and that)'
    swish-e -w 'meta1=("this is a phrase") or (this and that)'

Or, under the Windows C<command.com> shell:

    swish-e -w \"this is a phrase\" or (this and that)

The phrase delimiter can be set with the C<-P> switch.
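
To see why the quotes need protecting,
it helps to look at the arguments a program actually receives from a Unix shell.
In this sketch, C<show_args> is a made-up stand-in for C<swish-e>
that simply prints each argument in brackets.

```shell
#!/bin/sh
# Hypothetical stand-in for swish-e: print every argument on its own line
show_args() {
    for arg in "$@"; do
        printf '[%s]\n' "$arg"
    done
}

# Single quotes keep the whole query, double quotes included,
# together as one argument after -w
show_args -w '"this is a phrase" or (this and that)'

# Without the outer single quotes, the shell removes the double quotes
# and splits the rest into separate words. The parentheses are escaped
# here only so that the shell accepts the line at all.
show_args -w "this is a phrase" or \(this and that\)
```

The first call passes two arguments, and the phrase delimiters survive.
The second passes six, and the double quotes never reach the program.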

=head2 Boolean Searching

You can use the Boolean operators B<and>, B<or>, or B<not> in searching.
Without these Boolean operators,
Swish-e will assume you're B<and>ing the words together.

Here are some examples:

    swish-e -w 'apples oranges'
    swish-e -w 'apples and oranges'  ( Same thing )

    swish-e -w 'apples or oranges'

    swish-e -w 'apples or oranges not juice' -f myIndex 

retrieves first the files that contain either of the words "apples" or "oranges";
then among those, selects the ones that do not contain the word "juice".

A few other examples to ponder:

    swish-e -w 'apples and oranges or pears'
    swish-e -w '(apples and oranges) or pears'  ( Same thing )
    swish-e -w 'apples and (oranges or pears)'  ( Not the same thing )

Swish processes the query left to right.

See L<SWISH-SEARCH|SWISH-SEARCH> for more information.


=head2 Context Searching

The C<-t> option in the search command line
allows you to search for words that exist only in specific HTML tags.
This option takes a string of characters as its argument.
Each character represents a different tag in which the word is searched;
that is, you can use any combinations of the following characters:

    H search in all <HEAD> tags
    B search in the <BODY> tags
    t search in <TITLE> tags
    h is <H1> to <H6> (header) tags
    e is emphasized tags (this may be <B>, <I>, <EM>, or <STRONG>)
    c is HTML comment tags (<!-- ... -->)

For example:

    # Find only documents with the word "linux" in the <TITLE> tags.
    swish-e -w linux -t t

    # Find the word "apple" in titles or comments
    swish-e -w apple -t tc


=head2 META Tags

As mentioned above,
Metanames are a way to define "fields" in your documents.
You can use the Metanames in your queries to limit the search
to just the words contained in that META name of your document.
For example,
you might have a META-tagged field called C<subjects> in your documents.
This would let you search your documents for the word "foo",
but only return documents where "foo" is within the C<subjects> META tag.

Document I<Properties> are somewhat related:
Properties allow the content of a META tag in a source document
to be stored within the index,
and that text to be returned along with search results.

META tags can have two formats in your documents.

    <META NAME="keyName" CONTENT="some Content">

And in XML format:

    <keyName>
        Some Content
    </keyName>

If using F<libxml>, you can optionally use a non-HTML tag as a metaname:

    <html>
        <body>
            Hello swish users!
            <keyName>
                this is meta data
            </keyName>.
        </body>
    </html>

This, of course, is invalid HTML.

To continue with our sample C<swish-e.conf> file,
add the following lines:

    # Define META tags
    MetaNames meta1 meta2 meta3

Reindex to include the changes:

    swish-e -c swish-e.conf

Now search, but this time limit your search to META tag C<meta1>:

    swish-e -w 'meta1=metatest1'

Again, please see L<SWISH-RUN|SWISH-RUN> and L<SWISH-CONFIG|SWISH-CONFIG>
for complete documentation of the various indexing and searching options.

=head2 Spidering and Searching with a Web form.

This example demonstrates how to spider a web site
and set up the included CGI script to provide a web-based search page.
This example uses Perl programs that are included in the Swish-e distribution:
F<spider.pl> will be used for reading files from the web server;
F<swish.cgi> will provide the web search form and display results.

As an example,
we will index the Apache Web Server documentation,
installed on the local computer at http://localhost/apache_docs/index.html.

=over 4

=item 1 Make a Working Directory

Create a directory to store the Swish-e configuration and the Swish-e index.

    ~$ mkdir web_index
    ~$ cd web_index/
    ~/web_index$

=item 2 Create a Swish-e Configuration file

    ~/web_index$ cat swish.conf 
    # Swish-e config to index the Apache documentation
    #
    # Use spider.pl for indexing (location of spider.pl set at installation time)
    IndexDir spider.pl

    # Use spider.pl's default configuration and specify the URL to spider
    SwishProgParameters default http://localhost/apache_docs/index.html

    # Allow extra searching by title, path
    Metanames swishtitle swishdocpath

    # Set StoreDescription for each parser
    #  to display context with search results
    StoreDescription TXT* 10000
    StoreDescription HTML* <body> 10000

=item 3 Generate the Index

Now, run Swish-e to create the index:

    ~/web_index$ swish-e -S prog -c swish.conf 

    Indexing Data Source: "External-Program"
    Indexing "spider.pl"
    /usr/local/lib/swish-e/spider.pl: Reading parameters from 'default'

    Summary for: http://localhost/apache_docs/index.html
        Duplicates:     4,188  (349.0/sec)
    Off-site links:       276  (23.0/sec)
           Skipped:         1  (0.1/sec)
       Total Bytes: 2,090,125  (174177.1/sec)
        Total Docs:       147  (12.2/sec)
       Unique URLs:       149  (12.4/sec)
    Removing very common words...
    no words removed.
    Writing main index...
    Sorting words ...
    Sorting 7736 words alphabetically
    Writing header ...
    Writing index entries ...
      Writing word text: Complete
      Writing word hash: Complete
      Writing word data: Complete
    7736 unique words indexed.
    5 properties sorted.                                              
    147 files indexed.  2090125 total bytes.  200783 total words.
    Elapsed time: 00:00:13 CPU time: 00:00:02
    Indexing done!

The above output is actually a mix of output from both Swish-e
and C<spider.pl>.
C<spider.pl> reports the
"Summary for: http://localhost/apache_docs/index.html".

Also note that Swish-e knows to find C<spider.pl>
at C</usr/local/lib/swish-e/spider.pl>.
The script installation directory (called C<libexecdir>)
is set at configure time.
You can see your setting by running C<swish-e -h>:

    ~/web_index$ swish-e -h | grep libexecdir
     Scripts and Modules at: (libexecdir) = /usr/local/lib/swish-e

This directory will be needed in the next step,
when setting up the CGI script.

Finally, verify that the index can be searched from the command line:

    ~/web_index$ swish-e -w installing -m3
    # SWISH format: 2.4.0
    # Search words: installing
    # Removed stopwords: 
    # Number of hits: 17
    # Search time: 0.018 seconds
    # Run time: 0.050 seconds
    1000 http://localhost/apache_docs/install.html "Compiling and Installing Apache" 17960
    718 http://localhost/apache_docs/install-tpf.html "Installing Apache on TPF" 25734
    680 http://localhost/apache_docs/windows.html "Using Apache with Microsoft Windows" 27165
    .

Now, try limiting the search to the title:

    ~/web_index$ swish-e -w swishtitle=installing -m3 
    # SWISH format: 2.3.5
    # Search words: swishtitle=installing
    # Removed stopwords: 
    # Number of hits: 2
    # Search time: 0.018 seconds
    # Run time: 0.048 seconds
    1000 http://localhost/apache_docs/install-tpf.html "Installing Apache on TPF" 25734
    1000 http://localhost/apache_docs/install.html "Compiling and Installing Apache" 17960
    .

Note that the above can also be done using the C<-t> option:

    ~/web_index$ swish-e -w installing -m3 -tH

=item 4 Set up the CGI script

Swish-e does not include a web server.
So, you must use your locally installed web server.
Apache is highly recommended, of course.

Locate your web server's CGI directory.
This may be a C<cgi-bin> directory in your home directory
or a central C<cgi-bin> directory set up by the web server administrator.
Once this is located,
copy the C<swish.cgi> script into the C<cgi-bin> directory.

Where CGI scripts can be located
depends completely on the web server that is being used
and how it has been configured.
See your web server's documentation or your site's administrator
for additional information.

This example will use a site C<cgi-bin> directory,
located at C</usr/lib/cgi-bin>. 
Copy the C<swish.cgi> script into the C<cgi-bin> directory.
Again, we will need the location of the C<libexecdir> directory:

    ~/web_index$ swish-e -h | grep libexecdir
     Scripts and Modules at: (libexecdir) = /usr/local/lib/swish-e


    ~/web_index$ cd /usr/lib/cgi-bin
    /usr/lib/cgi-bin$ su
    Password: 
    /usr/lib/cgi-bin# cp /usr/local/lib/swish-e/swish.cgi .

If your operating system supports symbolic links
B<and> your web server allows programs to be symbolic links,
then you may wish to create a link to the C<swish.cgi> program, instead.

    /usr/lib/cgi-bin# ln -s /usr/local/lib/swish-e/swish.cgi

We need to tell the C<swish.cgi> script where to look
for the index created in the previous step.
It's also recommended to enter the path to the swish-e binary.
Otherwise, the C<swish.cgi> script will look for the binary in the C<PATH>,
and that may change when running under the CGI environment.

Here's the configuration file:

    /usr/lib/cgi-bin# cat .swishcgi.conf 
    return {
        title        => 'Search Apache Documentation',
        swish_binary => '/usr/local/bin/swish-e',
        swish_index  => '/home/moseley/web_index/index.swish-e',
    }


Now, test the script from the command line (as a normal user!):

    /usr/lib/cgi-bin# exit
    exit

    /usr/lib/cgi-bin$  ./swish.cgi | head
    Content-Type: text/html; charset=ISO-8859-1

    <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
    <html>
        <head>
           <title>
              Search Apache Documentation
           </title>
        </head>
        <body>

Notice that the CGI script returns the HTTP header (Content-Type)
and the body of the web page,
just like a well-behaved CGI script should.

Now, test using the web server
(this step depends on the location of your C<cgi-bin> directory).
This example uses the "GET" command that is part of the LWP Perl library,
but any web browser can run this test.

    /usr/lib/cgi-bin$ GET http://localhost/cgi-bin/swish.cgi | head
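    <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
    <html>
        <head>
           <title>
              Search Apache Documentation
           </title>
        </head>
        <body>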
            

The script reports errors to stderr,
so consult the web server's error log if problems occur.
The message "Service currently unavailable",
reported by running C<swish.cgi>,
typically indicates a configuration error;
the exact problem will be listed in the web server's error log.

Detailed instructions on using the C<swish.cgi> script
and debugging tips can be found by running:

    $ perldoc swish.cgi

while in the C<cgi-bin> directory where C<swish.cgi> was copied.

The spider program C<spider.pl>
also has a large number of configuration options.
Documentation is also available
in the directory C<$prefix/share/doc/swish-e>
or at http://swish-e.org.

Note: Also check out the C<search.cgi> script,
found at the same location as the C<swish.cgi> script.
This is more of a skeleton script,
for those that want to create a custom search script.

=back

Now you are ready to search.

=head1 Indexing Other Types of Documents - Filtering

Swish-e can only index HTML, XML, and text documents.
In order to index other documents,
such as PDF or MS Word documents,
you must use a utility to convert or "filter" those documents.

How documents are filtered with Swish-e has changed over time.
This has resulted in a bit of confusion.
It's also a somewhat complex process,
as different programs need to communicate with each other.
You may wish to read the Swish-e FAQ question on filtering
before continuing here (see L<SWISH-FAQ|SWISH-FAQ>).

=head2 Filtering Overview

There are two ways to filter documents with Swish-e.
Both are described in the L<SWISH-CONFIG|SWISH-CONFIG> man page.
They use the C<FileFilter> directive and the C<SWISH::Filter> Perl module.

The C<FileFilter> directive is a general-purpose method of filtering.
It allows running of an external program
for each document processed (based on file extension),
and requires one or more external programs.
These programs open an input file, convert as needed,
and write their output to standard output.

Previous versions of Swish-e (before 2.4.0)
used a collection of filter programs
for converting files such as PDF or MS Word documents.
The external programs call other programs
to do the work of filtering (e.g.
F<pdftotext> to extract the contents from PDF files).
Although these filter programs are still included
with the Swish-e distribution as examples,
it is recommended to use the C<SWISH::Filter> method, instead.
One disadvantage of using C<FileFilter>
is that the filter program is run once
for every document that needs to be filtered.
This can slow down the indexing process.

The C<SWISH::Filter> Perl module works very much like the old system
and uses the same helper programs.
Conveniently, however,
it provides a single interface for filtering all types of documents.

The primary advantage of C<SWISH::Filter>
is that it is built into the program used for spidering web sites (C<spider.pl>),
so all that's required is installing the filter programs
that do the actual work of filtering (e.g. F<pdftotext>, F<catdoc>).
(The Windows binary includes some of the filter programs.)

But, Swish-e will not use C<SWISH::Filter> by default
when using the file system method of indexing.
To use C<SWISH::Filter> when indexing by file system method (C<-S fs>),
you can use a C<FileFilter> directive with the C<swish_filter.pl> filter
(which is just a program that uses C<SWISH::Filter>)
or use the C<-S prog> method of indexing
and use the C<DirTree.pl> program for fetching documents.
C<DirTree.pl> is included with the Swish-e distribution
and is designed to work with C<SWISH::Filter>.
Using C<DirTree.pl> will likely be a faster way to index,
since the C<SWISH::Filter> set of modules does not need to be compiled
for every document that needs to be filtered.
See the contents of C<swish_filter.pl> and C<DirTree.pl>
for specifics on their use.

=head2 Filtering Examples

The C<FileFilter> directive can be used in your config file
to convert documents, based on their extensions.
This is the old way of filtering,
but provides an easy way to add filters to Swish-e.
For example:

    FileFilter .pdf pdftotext "'%p' -"
    IndexContents TXT* .pdf

will cause all C<.pdf> files to be filtered
through the F<pdftotext> program (part of the F<Xpdf> package)
and to parse the resulting output (from F<pdftotext>)
with the text ("TXT") parser.

The other way to filter documents is to use a C<-S prog> program
and convert the documents before passing them on to Swish-e.
For example, C<spider.pl> makes use of the C<SWISH::Filter> Perl module,
included with the Swish-e distribution.
C<SWISH::Filter> is passed a document and the document's content type;
it looks for modules and utilities to convert the document
into one of the types that Swish-e can index.

Swish-e comes ready to index PDF, MS Word, MP3 ID3 tags,
and MS Excel file types.
But these filters need extra modules or tools to do the actual conversion.
For example, the Swish-e distribution includes a module
called C<SWISH::Filters::Pdf2HTML>
that uses the F<pdfinfo> and F<pdftotext> utilities
provided by the F<Xpdf> package.

This means that if you are using C<spider.pl> to spider your web site
and you wish to index PDF documents,
all that is needed is to install the Xpdf package;
Swish-e (with the help of C<spider.pl>) will then begin indexing your PDF files.

Ok, so what does all that mean?
For a very simple site, you should be able to run this:

    $ /usr/local/lib/swish-e/spider.pl default http://localhost/ | swish-e -S prog -i stdin

which is running the spider with default spider settings,
indexing the Web server on localhost,
and piping its output into Swish-e (using the default indexing settings).
Documents will be filtered automatically,
if you have the required helper applications installed.

Most people will not want to just use the default settings
(for one thing, the spider will take a while
because its default is to delay a few seconds between every request).
So, read the documentation for C<spider.pl>
to learn how to use a spider config file.
Also read L<SWISH-CONFIG|SWISH-CONFIG>
to learn about what configuration options can be used with Swish-e.

The C<SWISH::Filter> documentation provides more details on filtering
and hints for debugging problems when filtering.

=head1 Document Info

$Id: INSTALL.pod 1978 2007-12-08 01:59:17Z karpet $

.

swish-e-2.4.7/pod/SWISH-SEARCH.pod0000664000077100017500000003133011166010103013143 00000000000000

=head1 NAME

SWISH-SEARCH - Swish-e Searching Instructions

=head1 OVERVIEW

This page describes the process of searching with Swish-e.
Please see the L<SWISH-CONFIG|SWISH-CONFIG> page
for information on the Swish-e configuration file directives,
and L<SWISH-RUN|SWISH-RUN> for a complete list of command line arguments.

Searching a Swish-e index involves passing
L<command line arguments|SWISH-RUN> to it
that specify the index file to use,
and the L<query|/"Searching Syntax and Operations"> (or search words)
to locate in the index.
Swish-e returns a list of file names (or URLs)
that contain the matched search words.
L<Perl|/"Searching with Perl"> is often used as a front-end to Swish-e,
such as in CGI applications,
and L<Perl modules|/"Perl Modules"> exist for interfacing with Swish-e.

=head1 Searching Syntax and Operations

The C<-w> command line argument is used to specify the search query to Swish-e.

    swish-e -w airplane

will find all documents that contain the word B<airplane>.

When running Swish-e from a shell prompt,
be careful to protect your query from shell metacharacters and shell expansions.
This often means placing single or double quotes around your query.
See L</"Searching with Perl"> if you plan to use Perl as a front end to Swish-e.

In the examples below,
single quotes are used to protect the search from the shell.

The following section describes various aspects of searching with Swish-e.

=head2 Boolean Operators

You can use the Boolean operators B<and>, B<or>, B<near> or B<not> in searching.
Without these Boolean operators,
Swish-e will assume you're B<and>'ing the words together.
The operators are not case sensitive.

These three searches are the same:

    swish-e -w foo bar
    swish-e -w bar foo
    swish-e -w foo AND bar

[Note: you can change the default to B<or>ing
by changing the variable DEFAULT_RULE in the config.h file
and recompiling Swish-e.]

The B<not> operator inverts the results of a search.

    swish-e -w not foo

finds all the documents that do not contain the word foo.

Parentheses can be used to group searches.

    swish-e -w 'not (foo and bar)'

The result is all documents that have none or one term, but not both.

To search for the words B<and>, B<or>, B<near> or B<not>,
place them in double quotes.
Remember to protect the quotes from the shell:

    swish-e -w '"not"'
    swish-e -w \"not\"

will search for the word "not".

Other examples:

    swish-e -w smilla or snow

retrieves files containing either of the words "smilla" or "snow".

    swish-e -w smilla snow not sense
    swish-e -w '(smilla and snow) and not sense'  (same thing)

retrieves first the files that contain both the words "smilla" and "snow";
then among those, the ones that do not contain the word "sense".

The B<near> keyword is similar to B<and>
but implies a proximity between the words.
The B<near> keyword takes an integer argument as well,
indicating the maximum distance between two words to consider a valid match.

Example:

    swish-e -w smilla near5 snow

would match the document if the words C<smilla> and C<snow>
appeared within 5 positions of one another.

A B<near> search with no argument or an argument of 0
is the same as an B<and> search.

=head2 Wildcards

Two different wildcard characters are available,
each evoking different behaviour.
The C<*> means "match zero or more characters."
The C<?> means "match exactly one character."

The wildcard C<*> may only be used at the end of a word.
Otherwise C<*> is considered a normal character
(i.e. can be searched for if included in the WordCharacters directive).

Example:

    swish-e -w librarian

this query only retrieves files which contain the given word.

On the other hand:

    swish-e -w 'librarian*'

retrieves "librarians", "librarianship", etc. along with "librarian".

Note that wildcard searches combined with word stemming
can lead to unexpected results.
If stemming is enabled,
a search term with a wildcard will be stemmed internally before searching.
So searching for C<running*> will actually be a search for C<run*>,
so C<running*> would find C<runway>.
Also, searching for C<runn*> will not find C<running> as you might expect,
since C<running> stems to C<run> in the index,
and thus C<runn*> will not find C<run>.

The C<?> wildcard matches exactly one character,
but may not be used at the start of a word.

Example:

    swish-e -w 's?ow'

will match C<snow>, C<slow> and C<show> but B<not> C<strow>.

This:

    swish-e -w '?how'

will throw an error.

=head2 Order of Evaluation

In general, the order of evaluation is not important.
Internally swish-e processes the search terms from left to right.
Parentheses can be used to group searches together,
effectively changing the order of evaluation.

For example these three are the same:

    swish-e -w foo not bar baz
    swish-e -w not bar foo baz
    swish-e -w baz foo not bar

but these two are B<not> the same:

    swish-e -w foo not bar baz
    swish-e -w foo not (bar baz)

The first finds all documents that contain both foo and baz,
but do not contain bar.
The second finds all documents that contain foo
and do not contain both bar and baz.

It is often helpful in understanding searches
to use the boolean terms and parentheses.
So the above two become:

    swish-e -w foo AND (not bar) AND baz
    swish-e -w foo AND (not (bar AND baz))

These four examples are all the same search
(assuming that AND is the default search type):

    swish-e -w 'juliet not ophelia and pac'
    swish-e -w '(juliet) AND (NOT ophelia) AND (pac)'
    swish-e -w 'juliet not ophelia pac'
    swish-e -w 'pac and juliet and not ophelia'

Looking at the first three searches,
first Swish-e finds all the documents with "juliet".
Then it finds all documents that do not contain "ophelia".
Those two lists are then combined with the boolean AND operator,
resulting in a list of documents that include "juliet" but not "ophelia".
Finally, that list is ANDed with the list of documents that contain "pac".

However, it is always possible to force the order of evaluation
by using parentheses.
For example:

    swish-e -w 'juliet not (ophelia and pac)'

retrieves files with "juliet"
that do not contain both words "ophelia" and "pac".

=head2 Meta Tags

MetaNames are used to represent I<fields>
(called I<columns> in a database)
and provide a way to search in only parts of a document.
See L<SWISH-CONFIG|SWISH-CONFIG> for a description of MetaNames,
and how they are specified in the source document.

To limit a search to words found in a meta tag,
you prefix the keywords with the name of the meta tag,
followed by the equal sign:

    metaname = word
    metaname = (this or that)
    metaname = ( (this or that) or "this phrase" )

It is not necessary to have spaces at either side of the "=";
consequently, the following are equivalent:

    swish-e -w "metaName=word"
    swish-e -w "metaName = word"
    swish-e -w "metaName= word"

To search on a word that contains a "=",
precede the "=" with a "\" (backslash).

    swish-e -w "test\=3 = x\=4 or y\=5"

this query returns the files
where the word "x=4" is associated with the metaName "test=3"
or that contain the word "y=5" not associated with any metaName.

Queries can also be constructed using any of the usual search features;
moreover, metaName and plain search can be mixed in a single query.

    swish-e -w "metaName1 = (a1 or a4) not (a3 and a7)"

This query will retrieve all the files
in which "a1" or "a4" are found in the META tag "metaName1"
and that do not contain the words "a3" and "a7",
where "a3" and "a7" are not associated to any meta name.

=head2 Phrase Searching

To search for a phrase in a document,
use double-quotes to delimit your search terms.
(The phrase delimiter is set in src/swish.h.)

You must protect the quotes from the shell.

For example, under Unix:

    swish-e -w '"this is a phrase" or (this and that)'
    swish-e -w 'meta1=("this is a phrase") or (this and that)'

Or under Windows:

    swish-e -w \"this is a phrase\" or (this and that)

You cannot use boolean search terms inside a phrase.
That is:

    swish-e -w 'this and that'

finds documents with both words "this" and "that", but:

    swish-e -w '"this and that"'

finds documents that have the phrase "this and that".

A phrase can consist of a single word,
so this is how to search for the words used as boolean operators:

    swish-e -w 'this "and" that'

finds documents that contain all three words, but in any order.

You can use the C<-P> switch to set the phrase delimiter character.
See L<SWISH-RUN|SWISH-RUN> for examples.


=head2 Context

At times you might not want to search for a word in every part of your files
since you know that the word(s) are present in a particular tag.
The ability to search according to context
greatly increases the chances that your hits will be relevant,
and Swish-e provides a mechanism to do just that.

The C<-t> option in the search command line
allows you to search for words that exist only in specific HTML tags.
Each character in the string you specify in the argument to this option
represents a different tag in which the word is searched;
that is, you can use any combination of the following characters:

    H means all <HEAD> tags
    B stands for <BODY> tags
    t is all <TITLE> tags
    h is <H1> to <H6> (header) tags
    e is emphasized tags (this may be <B>, <I>, <EM>, or <STRONG>)
    c is HTML comment tags (<!-- ... -->)

For example:

    # This search will look for files with these two words in their titles only.
    swish-e -w "apples oranges" -t t

    # This search will look for files with these words in comments only.
    swish-e -w "keywords draft release" -t c

    # This search will look for words in titles, headers, and emphasized tags.
    swish-e -w "world wide web" -t the

=head1 Searching with Perl

Perl ( http://www.perl.com/ ) is probably the most common programming language
used with Swish-e, especially in CGI interfaces.
Perl makes searching and parsing results with Swish-e easy,
but if not done properly can leave your server vulnerable to attacks.

When designing your CGI scripts you should carefully screen user input,
and include features such as paged results
and a timer to limit time required for a search to complete.
These are to protect your web site against a denial of service (DoS) attack.

Included with every distribution of Perl
is a document called perlsec -- Perl Security.
I<Please> take time to read and understand that document
before writing CGI scripts in perl.

Type at your shell/command prompt: perldoc perlsec If nothing else, start every CGI program in perl as such: #!/usr/local/bin/perl -wT use strict; That alone won't make your script secure, but may help you find insecure code. =head2 CGI Danger! There are many examples of CGI scripts on the Internet. Many are poorly written and insecure. A commonly seen way to execute Swish-e from a perl CGI script is with a I<piped open>. For example, it is common to see this type of C<open()>: open(SWISH, "$swish -w $query -f $index|"); This C<open()> gives shell access to the entire Internet! Often an attempt is made to strip C<$query> of I<bad> characters. But, this often fails since it's hard to guess what every I<bad> character is. Would you have thought about a null? A better approach is to only allow I<in> known safe characters. Even if you can be sure that any user supplied data is safe, this I<piped open> still passes the command parameters through the shell. If nothing else, it's just an extra unnecessary step to running Swish-e. Therefore, the recommended approach is to fork and exec C<swish-e> directly without passing through the shell. This process is described in the perl man page C<perlipc> under the appropriate heading B<Safe Pipe Opens>. Type: perldoc perlipc If all this sounds complicated you may wish to use a Perl module that does all the hard work for you. =head2 Perl Modules The Swish-e distribution includes a Perl module called SWISH::API. SWISH::API provides access to the Swish-e C Library. The SWISH::API module is I<not> installed by default. The SWISH::API module will I<embed> Swish-e into your perl program so that searching does not require running an external program. 
Embedding the Swish-e program into your perl program results in faster Swish-e searches, especially when running under a persistent environment like mod_perl since it avoids the cost of opening the index file for every request (mod_perl is also much faster than CGI because it avoids the need to compile Perl code for every request).

See the README file in the F<perl> directory of the Swish-e distribution for installation instructions. Documentation for the SWISH::API module is available at http://swish-e.org and is installed along with other HTML documentation on your computer.

=head1 Document Info

$Id: SWISH-SEARCH.pod 1815 2006-08-27 20:22:54Z karman $

.

swish-e-2.4.7/pod/SWISH-RUN.pod

=head1 NAME

SWISH-RUN - Running Swish-e and Command Line Switches

=head1 OVERVIEW

The Swish-e program is controlled by command line arguments (called I<switches>). Often, it is run manually from a shell (command prompt), or from a program such as a CGI script that passes the command line arguments to swish.

Note: A number of the command line switches may be specified in the Swish-e configuration file specified with the C<-c> command line argument.
Please see L<SWISH-CONFIG|SWISH-CONFIG> for a complete description of available configuration file directives. There are two basic operating modes of Swish-e: indexing and searching. There are command line arguments that are unique to each mode, and others that apply to both (yet may have different meaning depending on the operating mode). These command line arguments are listed below, grouped by: L<INDEXING|/"INDEXING"> -- describes the command line arguments used while indexing. L<SEARCHING|/"SEARCHING"> -- lists the command line arguments used while searching. L<OTHER SWITCHES|/"OTHER SWITCHES"> -- lists switches that don't apply to searching or indexing. Beginning with Swish-e version 2.1, you may embed its search engine into your applications. Please see L<SWISH-LIBRARY|SWISH-LIBRARY>. =head1 INDEXING Swish-e indexing is initiated by passing I<command line arguments> to swish. The command line arguments used for I<searching> are described in L<SEARCHING|/"SEARCHING">. Also, see L<SWISH-SEARCH|SWISH-SEARCH> for examples of searching with Swish-e. Swish-e usage: swish-e [-i dir file ... ] [-c file] [-f file] [-l] \ [-v (num)] [-S method(fs|http|prog)] [-N path] The C<-h> switch (help) will list the available Swish-e command line arguments: swish-e -h Typically, most if not all indexing settings are placed in a configuration file (specified with the C<-c> switch). Once the configuration file is setup indexing is initiated as: swish-e -c /path/to/config/file See L<SWISH-CONFIG|SWISH-CONFIG> for information on the configuration file. Security Note: If the swish binary is named F<swish-search> then swish will not allow any operation that would cause swish to write to the index file. When indexing it may be advisable to index to a temporary file, and then after indexing has successfully completed rename the file to the final location. This is especially important when replacing an index that is currently in use. 
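The bracketed check in this pattern can be made concrete. The sketch below builds the index into a temporary file and renames it only when swish exits successfully and prints no "err:" line. The C<safe_reindex> function, the C<SWISH> variable (an override for the binary to run), and the F<.tmp> and F<.log> suffixes are all inventions of this example.

```shell
#!/bin/sh
# safe_reindex CONFIG FINAL_INDEX
# Index into a temporary file and replace FINAL_INDEX only on success.
# SWISH names the binary to run; it defaults to "swish-e" (assumed here).
safe_reindex() {
    config=$1
    final=$2
    tmp="$final.tmp"
    log="$final.log"

    # Errors and warnings are written to standard output, so capture it.
    if ! "${SWISH:-swish-e}" -c "$config" -f "$tmp" > "$log" 2>&1; then
        echo "indexing failed, see $log" >&2
        return 1
    fi

    # Treat any "err:" text in the captured output as a failure as well.
    if grep -q 'err:' "$log"; then
        echo "indexing reported errors, see $log" >&2
        return 1
    fi

    # Move a companion property file too, if this Swish-e version made one.
    if [ -f "$tmp.prop" ]; then
        mv "$tmp.prop" "$final.prop"
    fi
    mv "$tmp" "$final"
}
```

A call such as C<safe_reindex swish.config index.swish-e> leaves the existing index untouched whenever indexing fails. In its shortest form the same sequence is: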
swish-e -c swish.config -f index.tmp [check return code from swish or look for err: output] mv index.tmp index.swish-e =head2 Indexing Command Line Arguments =over 4 =item -i *directories and/or files* (input file) This specifies the directories and/or files to index. Directories will be indexed recursively. This is typically specified in the L<configuration file|SWISH-CONFIG> with the B<IndexDir> directive instead of on the command line. Use of this switch overrides the configuration file settings. =item -S [fs|http|prog] (document source/access mode) This specifies the method to use for accessing documents to index. Can be either C<fs> for local indexing via the file system (the default), C<http> for spidering, or C<prog> for reading documents from an external program. Located in the C<conf> directory are example configuration files that demonstrate indexing with the different document source methods. See the L<SWISH-FAQ|SWISH-FAQ> for a discussion on the different indexing methods, and the difference between spidering with the http method vs. using the file system method. =over 4 =item fs - file system The C<fs> method simply reads files from a local (or networked) drive. This is the default method if the C<-S> switch is not specified. See L<SWISH-CONFIG|SWISH-CONFIG> for configuration directives specific to the C<fs> method. =item http - spider a web server The C<http> method is used to spider web servers. It uses an included helper program called F<swishspider>. See L<SWISH-CONFIG|SWISH-CONFIG> for configuration directives specific to the C<http> method. Security Note: Under Windows swish passes the URLs fetched from remote documents through the shell (swish uses the system() command for running F<swishspider> under Windows), and this may be considered an additional security risk. The C<http> method is deprecated (or at least not very well appreciated). Consider using the C<prog> method described below for spidering. 
There's a spider program available in the F<prog-bin> directory for use with the C<prog> method.

Here are a number of limitations of the C<http> method that are solved by the C<prog> method:

=over 4

=item *

swishspider only spiders standard E<lt>a href="..."E<gt> links. Frames and other links are not followed.

=item *

By default, this method of spidering only indexes files that have a content type of "text/*" (e.g. text/plain, text/html, text/xml). You should use C<DefaultContents> and C<IndexContents> to map file extensions to parsers used by swish (e.g. C<IndexContents HTML* .html .htm>), but this will fail where a document does not have a file extension.

=item *

Swish-e's C<FileFilter> directive can be used with the C<http> access method, although it requires a separate process (in addition to the swishspider process) for each document filtered.

=item *

The SWISH::Filter modules can be used with the swishspider program. SWISH::Filter provides a general purpose filtering system (see SWISH::Filter documentation). To use SWISH::Filter set PERL5LIB to point to the location of the SWISH module name space (typically /usr/local/lib/swish-e under Unix). For example:

    export PERL5LIB=/usr/local/lib/swish-e   # bash, bourne shells
    setenv PERL5LIB /usr/local/lib/swish-e   # csh, tcsh

or under Windows

    set PERL5LIB=c:\program files\swish-e2.4\lib\swish-e

SWISH::Filter is not enabled by default due to the overhead of loading the modules for every document fetched.

The Swish-e distribution includes perl modules in the SWISH::Filters::* namespace to make it easy to convert non-text documents into a format that Swish-e can parse. As mentioned above, the helper script F<swishspider> will use these modules if they can be found via PERL5LIB.

These modules only provide an interface to programs that do the conversion. For example, you will need to download and install the "catdoc" program to convert MSWord documents into text for indexing.
Please see F<filters/README> to see how to use this filter system. =back =item prog - general purpose access method The C<prog> method is new to Swish-e version 2.2. It's designed as a general purpose method to feed documents to swish from an external program. For example, the external program can read a database (e.g. MySQL), spider a web server, or convert documents from one format to another (e.g. pdf to html). Or, you can simply use it to read the files of the file system (like C<-S fs>), yet provide you with full control of what files are indexed. The external program name to run is passed to swish either by the L<IndexDir|SWISH-CONFIG/"item_IndexDir"> directive, or via the C<-i> option. The program specified should be an absolute path as swish-e will attempt to stat() the program to make sure it exists. Swish does this to help in error reporting. If the program specified with -i or IndexDir is not an absolute path (i.e. does not include "/" ) then swish-e will append the "libexecdir" directory defined during configuration. Typically, libexecdir is set to "$prefix/lib/swish-e" (/usr/local/lib/swish-e), but is platform and installation dependent. Running swish-e -h will report the directory. For example, the -S prog program "spider.pl" is a Perl helper program for use with -S prog and is installed in libexecdir. IndexDir spider.pl SwishProgParameters default http://localhost/index.html and swish-e will find spider.pl in libexecdir. Additional parameters may be passed to the external program via the L<SwishProgParameters|SWISH-CONFIG/"item_SwishProgParameters"> directive. In the example above swish-e will pass two parameters to spider.pl, "default" and "http://localhost/index.html". A special name "stdin" may be used with C<-i> or L<IndexDir|SWISH-CONFIG/"item_IndexDir"> which tells swish to read from standard input instead of from an external program. See example below. 
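As a concrete illustration of the C<prog> method, here is a minimal sketch of an external program written as a shell script. It assumes only the two headers C<Path-Name> and C<Content-Length>, then a blank line, then exactly that many bytes of content. The C<emit_doc> function and the document names are made up for this example.

```shell
#!/bin/sh
# emit_doc PATH CONTENT: print one document in header / blank line / body form.
emit_doc() {
    path=$1
    content=$2

    # Content-Length must be the exact number of BYTES in the body that
    # follows, so count with wc -c; tr strips any padding wc adds.
    size=$(printf '%s' "$content" | wc -c | tr -d ' ')

    printf 'Path-Name: %s\n' "$path"
    printf 'Content-Length: %s\n' "$size"
    printf '\n'
    # No trailing newline is added: it would not be covered by the length.
    printf '%s' "$content"
}

# Two documents, written back to back with no separator between them.
emit_doc 'first.txt'  'This is the first document.'
emit_doc 'second.txt' 'And this is the second one.'
```

Saved as an executable file, say F<emit_docs.sh> (a hypothetical name), the script could be indexed with C<swish-e -S prog -i ./emit_docs.sh>. Running it on its own first and inspecting the output is a cheap way to confirm that each length matches its body.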
The external program prints to standard output (which swish captures) a set of headers followed by the content of the file to index. The output looks similar to an email message or a HTTP document returned by a web server in that it includes name/value pairs of headers, a blank line, and the content. The content length is determined by a content-length header supplied to swish by the program; there is no "end of record" character or flag sent between documents. Therefore, it is critical that the content-length header is correct. This is a common source of errors. One advantage of this method (over using filters, for example) is that the external program is run only once for the entire indexing job, instead of once for every document. This avoids forking and creating a new process for every document, and makes a huge difference when your external program is something like perl that has a large startup cost. Here's a simple example written in Perl: #!/usr/local/bin/perl -w use strict; # Build a document my $doc = <<EOF; <html> <head> <title>Document Title This is the text. EOF # Prepare the headers for swish my $path = 'Example.file'; my $size = length $doc; my $mtime = time; # Output the document (to swish) print <) by using the C header. The document type is used to select which parser Swish-e uses to parse the document's contents. For example, a spider program might map the content-type returned from a web server to one of the types Swish-e understands. For example, my $doc_type = 'HTML*' if $response->content_type =~ m!text/html!' This header is not required. =item Update-Mode: When updating an incremental index this header can be used to select the mode for updating the index. There are three possible values: Update Remove Index "Update" will update the index with the given file if the date of the given file is newer than the date of the file already in the index. Setting to "Update" is the same as using -u on the command line. 
"Remove" mode will remove the file specified by the Path-Name header. Setting "Remove" is the same as using -r on the command line.

"Index" will add the file to the index. NOTE: swish-e will not check to see if the file already exists.

If this header is not specified, the default is the mode specified on the command line (-u, -r, or none).

This option is still experimental and is subject to change in the future. Ask on the Swish-e list before using.

=back

The above example program only returns one document and exits, which is not very useful. Normally, your program would read data from some source, such as files or a database, format it as XML, HTML, or text, and pass the documents to swish, one after another. The C<Content-Length> header tells swish where each document ends -- there is not any special "end of record" character or marker.

To index with the above example you need to make sure that the program is executable (and that the path to perl is correct), and then call swish, telling it to run in C<prog> mode and giving the name of the program to use for input.

    % chmod 755 example.pl
    % ./swish-e -S prog -i ./example.pl

Programs can and should be tested prior to running swish. For example:

    % ./example.pl > test.out

A few more useful example programs are provided in the swish-e distribution located in the F<prog-bin> directory. Some include documentation:

    % cd prog-bin
    % perldoc spider.pl

Others are small examples that include comments:

    % cd prog-bin
    % less DirTree.pl

The F<spider.pl> program can be used as a replacement for the C<-S http> method. It is far more feature-rich and offers much more control over indexing.

If you use the special program name "stdin" with C<-i> or L<IndexDir|SWISH-CONFIG/"item_IndexDir"> then swish-e will read from standard input instead of from a program.
For example:

    % ./example.pl --count=1000 /path/to/data | ./swish-e -S prog -i stdin

This is basically the same as using a swish-e configuration file of:

    SwishProgParameters --count=1000 /path/to/data
    IndexDir ./example.pl

and running

    % ./swish-e -S prog -c swish.conf

This gives an easy way to run swish without a configuration file with a C<-S prog> program that requires parameters. It also means you can capture data to a file and then index more than once with the same data:

    % ./example.pl /path/to/data --count=1000 > docs.txt
    % cat docs.txt | ./swish-e -S prog -i stdin -c normal_index
    % cat docs.txt | ./swish-e -S prog -i stdin -c fuzzy_index

Using "stdin" might also be useful for programs that call swish (instead of swish calling the program).

(The reason "stdin" is used instead of the more common "-" dash is due to the rotten way swish parses the command line. This should be fixed in the future.)

The C<prog> method bypasses some of the configuration parameters available to the file system method -- settings such as C, C, C and C are ignored when using the C<prog> method. It's expected that these operations are better accomplished in the external program before passing the document on to swish. In other words, when using the C<prog> method, only send the documents to swish that you want indexed.

You may use swish's filter feature with the C<prog> method, but performance will be better if you run filtering programs from within your external program. See also F for an example of how to easily add document conversion and filtering to your Perl-based programs.

B Windows does not use the shebang (#!) line of a program to determine the program to run. So, when running, for example, a perl program you may need to specify the perl.exe binary as the program, and use C<SwishProgParameters> to name the file.

    IndexDir e:/perl/bin/perl.exe
    SwishProgParameters read_database.pl

Swish will replace the forward slashes with backslashes before running the command specified with C<IndexDir>.
Swish uses the popen(3) command which passes the command through the shell. =back =item -f *indexfile* (index file) If you are indexing, this specifies the file to save the generated index in, and you can only specify one file. See also B in the L. If you are searching, this specifies the index files (one or more) to search from. The default index file is index.swish-e in the current directory. =item -c *file ...* (configuration files) Specify the configuration file(s) to use for indexing. This file contains many directives that control how Swish-e proceeds. See L for a complete listing of configuration file directives. Example: swish-e -c docs.conf If you specify a directory to index, an index file, or the verbose option on the command-line, these values will override any specified in the configuration file. You can specify multiple configuration files. For example, you may have one configuration file that has common site-wide settings, and another for a specific index. Examples: 1) swish-e -c swish-e.conf 2) swish-e -i /usr/local/www -f index.swish-e -v -c swish-e.conf 3) swish-e -c swish-e.conf stopwords.conf =over 3 =item 1 The settings in the configuration file will be used to index a site. =item 2 These command-line options will override anything in the configuration file. =item 3 The variables in swish-e.conf will be read, then the variable in stopwords.conf will be read. Note that if the same variables occur in both files, older values may be written over. =back =item -e (economy mode) For large sites indexing may require more RAM than is available. The C<-e> switch tells swish to use disk space to store data structures while indexing, saving memory. This option is recommended if swish uses so much RAM that the computer begins to swap excessively, and you cannot increase available memory. The trade-off is slightly longer indexing times, and a busy disk drive. 
=item -l (symbolic links)

Specifying this option tells swish to follow symbolic links when indexing. The configuration file value B<FollowSymLinks> will override the command-line value.

The default is not to follow symlinks. A small improvement in indexing time may result from enabling FollowSymLinks since swish does not need to stat every directory and file processed to determine if it is a symbolic link.

=item -N path (index only newer files)

The C<-N> option takes a path to a file, and only files I<newer> than the specified file will be indexed. This is helpful for creating incremental indexes -- that is, indexes that contain just files added since the last full index was created of all files.

Example (bad example)

    swish-e -c config.file -N index.swish-e -f index.new

This will index as normal, but only files with a modified date newer than F<index.swish-e> will be indexed.

This is a bad example because it uses F<index.swish-e>, which one might assume was the date of last indexing. The problem is that files might have been added between the time indexing read the directory and when the F<index.swish-e> file was created -- which can be quite a bit of time for very large indexing jobs.

The only solution is to prevent any new file additions while full indexing is running. If this is impossible then it will be slightly better to do this:

Full indexing:

    touch indexing_time.file
    swish-e -c config.file -f index.tmp
    mv index.tmp index.full

Incremental indexing:

    swish-e -c config.file -N indexing_time.file -f index.tmp
    mv index.tmp index.incremental

Then search with

    swish-e -w foo -f index.full index.incremental

or merge the indexes

    swish-e -M index.full index.incremental index.tmp
    mv index.tmp index.swish-e
    swish-e -w foo

=item -r

B<**incremental index format only**>

The C<-r> option puts swish-e into "removal" mode. Any input files (given with C<-i> or the C<IndexDir> parameter) are removed from an existing index.

Example:

    swish-e -r -i file.html

would remove F<file.html> from the existing index.
=item -u B<**incremental index format only**> The C<-u> option puts swish-e into "update" mode. The timestamp of each input file is compared against the corresponding file in the existing index. If swish-e encounters an input file that either does not exist yet in the index or exists with a timestamp older than the input file, the input file is updated in the index. Any words in the input file that have been added or removed are reflected as such in the index. Example: swish-e -i file.html -u would update the index.swish-e index with the contents of file.html. If file.html was new, it would be added. If file.html already existed in the index, its contents would be updated in the index. =item -v [0|1|2|3] (verbosity level) The C<-v> option can take a numerical value from 0 to 3. Specify 0 for completely silent operation and 3 for detailed reports. If no value is given then 1 is assumed. See also B in the L. Warnings and errors are reported regardless of the verbosity level. In addition, all error and warnings are written to standard out. This is for historical reasons (many scripts exist that parse standard out for error messages). =item -W (0|1|2|3) (parser warning level) If using the libxml2 parser, the default parser warning level is set at C<2>. Use the C<-W> option to override that default. Most often, you might want to turn it off altogether: swish-e -W0 -i path/to/files would fail silently if the parser encountered any errors. =back =head1 SEARCHING The following command line arguments are available when searching with Swish-e. These switches are used to select the index to search, what fields to search, and how and what to print as results. This section just lists the available command line arguments and their usage. Please see L for detailed searching instructions. B: If using Swish-e via a CGI interface, please see L Security Note: If the swish binary is named F then swish will not allow any operation that would cause swish to write to the index file. 
=head2 Searching Command Line Arguments

=over 4

=item -w *word1 word2 ...* (query words)

This performs a case-insensitive search using a number of keywords. If no index file to search is specified (via the C<-f> switch), swish-e will try to search a file called index.swish-e in the current directory.

    swish-e -w word

Phrase searching is accomplished by placing the quote delimiter (a double-quote by default) around the search phrase.

    swish-e -w 'word or "this phrase"'

Search words should be protected from the shell by quotes. Typically, this is single quotes when running under Unix.

Under Windows you may not need to use quotes, but you will need to backslash the quotes used to delimit phrases:

    swish-e -w \"a phrase\"

The phrase delimiter can be set with the C<-P> switch.

The search may be limited to a I<MetaName>. For example:

    swish-e -w 'meta1=(foo or baz)'

will only search within the B<meta1> tag.

Please see L for a description of MetaNames.

=item -f *file1 file2 ...* (index files)

Specifies the index file(s) used while searching. More than one file may be listed, and each file will be searched. If no C<-f> switch is specified then the file F<index.swish-e> in the current directory will be used as the index file.

=item -m *number* (max results)

While searching, this specifies the maximum number of results to return. The default is to return all results.

This switch is often used in conjunction with the C<-b> switch to return results one page at a time (strongly recommended for large indexes).

=item -b *number* (beginning result)

Sets the I<beginning> search result to return (records are numbered from 1). This switch can be used with the C<-m> switch to return results in groups or pages.

Example:

    swish-e -w 'word' -b 1 -m 20    # first 'page'
    swish-e -w 'word' -b 21 -m 20   # second 'page'

=item -t HBthec (context searching)

The C<-t> option allows you to search for words that exist only in specific HTML tags.
Each character in the string you specify in the argument to this option represents a different tag in which to search for the word. H means all HEAD tags, B stands for BODY tags, t is all TITLE tags, h is H1 to H6 (header) tags, e is emphasized tags (this may be B, I, EM, or STRONG), and c is HTML comment tags.

Search only in header (E<lt>H*E<gt>) tags:

    swish-e -w word -t h

=item -d *string* (delimiter)

Set the delimiter used when printing results. By default, Swish-e separates the output fields by a space, and places double-quotes around the document title. This output may be hard to parse, so it is recommended to use C<-d> to specify a character or string used as a separator between fields.

The string C<dq> means "double-quotes".

    swish-e -w word -d ,     # single char
    swish-e -w word -d ::    # string
    swish-e -w word -d '"'   # double quotes under Unix
    swish-e -w word -d \"    # double quotes under Windows
    swish-e -w word -d dq    # double quotes

The following control characters may also be specified: C<\t \r \n \f>.

Warning: This string is passed directly to sprintf() and therefore exposes a security hole. Do not allow user data to set -d format strings directly.

=item -P *character*

Sets the delimiter used for phrase searches. The default is double quotes C<">.

Some examples under bash (be careful about your shell metacharacters):

    swish-e -P ^ -w 'title=^words in a phrase^'
    swish-e -P \' -w "title='words in a phrase'"

=item -p *property1 property2 ...* (display properties)

This causes swish to print the listed properties in the search results. The properties are returned in the order they are listed in the C<-p> argument.

Properties are defined by the B<PropertyNames> directive in the configuration file (see L<SWISH-CONFIG|SWISH-CONFIG>) and properties must also be defined in B<MetaNames>. Swish stores the text of the meta name as a I<property>, and then will return this text while searching if this option is used.

Properties are very useful for returning data included in a source document without having to re-read the source document while searching.
For example, this could be used to return a short document description. See also see B and L in L. To return the subject and category properties while indexing. swish-e -w word -p subject category Properties are returned in double quotes. If a property contains a double quote it is HTML escaped ("). See the C<-x> switch for a more advanced method of returning a list of properties. NOTE: it is necessary to have indexed with the proper PropertyNames directive in the user config file in order to use this option. =item -s *property [asc|desc] ...* (sort) Normally, search results are printed out in order of relevancy, with the most relevant listed first. The C<-s> sort switch allows you to sort results in order of a specified I, where a I was defined using the B and B directives during indexing (see L). The string passed can include the strings C and C to specify the sort order, and more than one property may be specified to sort on more than one key. Examples: sort by title property ascending order -s title sort descending by title, ascending by name -s title desc name asc Note: Swish limits sort keys to 100 characters. This limit can be changed by changing MAX_SORT_STRING_LEN in src/config.h and rebuilding swish-e. =item -L limit to a range of property values (Limit) B The C<-L> switch can be used to limit search results to a range of property values Example: swish-e -w foo -L swishtitle a m finds all documents that contain the word C, and where the document's title is in the range of C to C, inclusive. By default, the case of the property is ignored, but this can be changed by using L configuation directive. Limiting may be done with user-defined properties, as well. For example, if you indexed documents that contain a created timestamp in a meta tag: Then you tell Swish that you have a property called C, and that it's a timestamp. 
    PropertyNamesDate created_on

After indexing you will be able to limit documents to a range of timestamps:

    -w foo -L created_on 946684800 949363199

will find documents containing the word foo and that have a created_on date from the start of Jan 1, 2000 to the end of Jan 31, 2000. Note: swish currently does not parse dates; Unix timestamps must be used.

Two special formats can be used:

    -L swishtitle <= m
    -L swishtitle >= m

These find titles less than or equal to, or greater than or equal to, the letter C<m>.

This feature will not work with C or C properties.

This feature takes advantage of the pre-sorted tables built by swish during indexing to make limiting fast while searching. You should see in the indexing output a line such as:

    6 properties sorted.

That indicates that six pre-sorted tables were built during indexing. By default, all properties are presorted while indexing. What properties are pre-sorted can be controlled by the configuration parameter C<PreSortedIndex>. Using the C<-L> switch on a property that was not pre-sorted will still work, but may be I slower during searching.

Note that the PropertyNamesSortKeyLength setting is used for sorting properties. Using too small a PropertyNamesSortKeyLength could result in -L selecting the wrong properties due to incomplete sorting.

This is an experimental feature, and its use and interface are subject to change.

=item -x formatstring (extended output format)

The C<-x> switch defines the output format string. The format string can contain plain text and property names (including swish-defined internal property names) and is used to generate the output for every result. In addition, the output format of the property name can be controlled with C-like printf format strings. This feature overrides the command line switches C<-d> and C<-p>, and a warning will be generated if C<-d> or C<-p> are used with C<-x>.

Warning: The format string (fmt) is passed directly to sprintf() and therefore exposes a security hole.
Do not allow user data to set -x format strings directly. For example, to return just the title, one per line, in the search results: swish-e -w ... -x '\n' ... Note: the C<\n> may need to be protected from your shell. See also L for a way to define I format strings in the swish configuration file. B "texttexttext..." Where B is: =over 4 =item * the name of a user property as specified with the config file directive "PropertyNames" =item * the name of a swish Auto property (see below). These properties are defined automatically by swish -- you do not need to specify them with PropertyNames directive. (This may change in the future.) =back propertynames must be placed within "E" and "E". B Swish-e allows you to specify certain META tags within your documents that can be used as B. The contents of any META tag that has been identified as a document property can be returned as part of the search results. Doucment properties must be defined while indexing using the B configuration directive (see L). Examples of user-defined PropertyNames: B Swish defines a number of "Auto" properties for each document indexed. These are available for output when using the C<-x> format. 
    Name               Type     Contents
    -----------------  -------  ----------------------------------------------
    swishreccount      Integer  Result record counter
    swishtitle         String   Document title
    swishrank          Integer  Result rank for this hit
    swishdocpath       String   URL or filepath to document
    swishdocsize       Integer  Document size in bytes
    swishlastmodified  Date     Last modified date of document
    swishdescription   String   Description of document (see:StoreDescription)
    swishdbfile        String   Path of swish database indexfile

The Auto properties can also be specified using shortcuts:

    Shortcut  Property Name
    --------  -----------------
    %c        swishreccount
    %d        swishdescription
    %D        swishlastmodified
    %I        swishdbfile
    %p        swishdocpath
    %r        swishrank
    %l        swishdocsize
    %t        swishtitle

For example, these are equivalent:

    -x '<swishrank>:<swishdocpath>:<swishtitle>\n'
    -x '%r:%p:%t\n'

Use a double percent sign "%%" to enter a literal percent sign in the output.

B Properties listed in an C<-x> format string can include format control strings. These "propertyformats" are used to control how the contents of the associated property are printed. Property formats are used like C-language printf formats. The property format is specified by including the attribute "fmt" within the property tag.

Format strings cannot be used with the "%" shortcuts described above.

General syntax: -x '' where C controls the output format of C.

Examples of property format strings: date type: string type: integer type:

Please see the manual pages for strftime(3) and sprintf(3) for an explanation of format strings. Note: some versions of strftime do not offer the %s format string (number of seconds since the Epoch), so swish provides a special format string "%ld" to display the number of seconds since the Epoch.

The first character of a property format string defines the delimiter for the format string.
For example,

    -x " ...\n"
    -x " ...\n"
    -x " ...\n"

If you omit the sub-format, the following formats are used:

    String type:   "%s"                 (like printf char *)
    Integer type:  "%d"                 (like printf int)
    Float type:    "%f"                 (like printf double)
    Date type:     "%Y-%m-%d %H:%M:%S"  (like strftime)

Text will be output as-is in format strings (and property format strings). Special characters can be escaped with a backslash. To get a new line for each result hit, you have to include the newline character "\n" at the end of "fmtstr".

    -x "||\n"
    -x "Count=, Rank=\n"
    -x "Title=\\"
    -x 'Date: \n'
    -x 'Date in seconds: \n'

You can use C-like control escapes in the format string:

    known controls:     \a, \b, \f, \n, \r, \t, \v
    digit escapes:      \xhexdigits \0octaldigits
    character escapes:  \anychar

Example:

    swish -x "%c\t%r\t%p\t\"\"\n"

Examples:

    -x "%c|%r|%p|%t|%D|%d\n"
    -x "%c|%r|%p|%t||%d\n"
    -x "\t\t\t\n"
    -x "xml_out: \\>\\n"
    -x "xml_out: \n"

=item -H [0|1|2|3|<n>] (header output verbosity)

The C<-H n> switch generates extended header
output. This is most useful when searching more than one index file at a time by specifying more than one index file with the C<-f> switch. C<-H 2> will generate a set of headers specific to each index file. This gives access to the settings used to generate each index file.

Even when searching a single index file, C<-H n> will provide additional information about the index file, how it was indexed, and how swish is interpreting the query.

    -H 0 : print no header information, output only search result entries.
    -H 1 : print standard result header (default).
    -H 2 : print additional header information for each searched index file.
    -H 3 : enhanced header output (e.g. print stopwords).
    -H 9 : print diagnostic information in the header of the results (changed from: C<-v 4>)

=item -R [0|1] (Ranking Scheme)

The default ranking scheme in SWISH-E evaluates each word in a query in terms of its frequency and position in each document. The default scheme is 0.

As of version 2.4.3 you may optionally select an experimental ranking scheme that, in addition to document frequency and position, uses Inverse Document Frequency (IDF), or the relative frequency of each word across all the indexes being searched, and Relative Density, or the normalization of the frequency of a word in relationship to the number of words in the document.

IgnoreTotalWordCountWhenRanking must be set to B<no> or B<0> in your index(es) for -R 1 to work.

Specify -R 1 to turn on IDF ranking.

See the API documentation for how to set the ranking scheme in your Perl or C program.

=back

=head1 OTHER SWITCHES

=over 4

=item -V (version)

Print the current version.

=item -k *letter* (print out keywords)

The C<-k> switch is used for testing and will cause swish to print out all keywords in the index beginning with that letter. You may enter C<-k '*'> to generate a list of all words indexed by swish.

=item -D *index file* (debug index)

The -D option is no longer supported in version 2.2.
=item -T *options* (trace/debug swish) The -T option is used to print out information that may be helpful when debugging swish-e's operation. This option replaced the C<-D> option of previous versions. Running C<-T help> will print out a list of available *options* =back =head1 Merging Index Files In previous versions of Swish-e indexing would require a very large amount of memory and the indexing process could be very slow. Merging provided a way to index in chunks and then combine the indexes together into a single index. Indexing is much faster now and uses much less memory, and with the C<-e> switch very little memory is needed to index a large site. Still, at times it can be useful to merge different index files into one file for searching. This could be because you want to keep separate site indexes and a common one for a global search, or you have separate collections of documents that you wish to search all at one time, but manage separately. =over 4 =item -M *index1 index2 ... indexN out_index Merges the indexes specified on the command line -- the last file name entered is the output file. The output index must not exist (otherwise merge will not proceed). Only indexes that were indexed with common settings may be merged. (e.g. don't mix stemming and non-stemming indexes, or indexes with different WordCharacter settings, etc.). Use the C<-e> switch while merging to reduce memory usage. Merge generates progress messages regardless of the setting of C<-v>. =item -c *configuration file* Specify a configuration file while indexing to add administrative information to the output index file. =back =head1 Document Info $Id: SWISH-RUN.pod 1741 2005-05-17 02:22:40Z karman $ . 
swish-e-2.4.7/COPYING0000664000077100017500000003336511166010113011063 00000000000000Swish-e is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. See below for version 2 of the GNU GPL. Unless otherwise indicated, all source files are Copyright (C) 2005, the Swish-e Project, http://swish-e.org. Swish-e includes a library for searching with a well-defined API. The library is named libswish-e. Linking libswish-e statically or dynamically with other modules is making a combined work based on Swish-e. Thus, the terms and conditions of the GNU General Public License cover the whole combination. As a special exception, the copyright holders of Swish-e give you permission to link Swish-e with independent modules that communicate with Swish-e solely through the libswish-e API interface, regardless of the license terms of these independent modules, and to copy and distribute the resulting combined work under terms of your choice, provided that every copy of the combined work is accompanied by, or provides a URL link to, a complete copy of the source code of Swish-e (the version of Swish-e used to produce the combined work), being distributed under the terms of the GNU General Public License plus this exception. An independent module is a module which is not derived from or based on Swish-e. Note that people who make modified versions of Swish-e are not obligated to grant this special exception for their modified versions; it is their choice whether to do so. The GNU General Public License gives permission to release a modified version without this exception; this exception also makes it possible to release a modified version which carries forward this exception. GNU GENERAL PUBLIC LICENSE TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION 0. 
This License applies to any program or other work which contains a notice placed by the copyright holder saying it may be distributed under the terms of this General Public License. The "Program", below, refers to any such program or work, and a "work based on the Program" means either the Program or any derivative work under copyright law: that is to say, a work containing the Program or a portion of it, either verbatim or with modifications and/or translated into another language. (Hereinafter, translation is included without limitation in the term "modification".) Each licensee is addressed as "you". Activities other than copying, distribution and modification are not covered by this License; they are outside its scope. The act of running the Program is not restricted, and the output from the Program is covered only if its contents constitute a work based on the Program (independent of having been made by running the Program). Whether that is true depends on what the Program does. 1. You may copy and distribute verbatim copies of the Program's source code as you receive it, in any medium, provided that you conspicuously and appropriately publish on each copy an appropriate copyright notice and disclaimer of warranty; keep intact all the notices that refer to this License and to the absence of any warranty; and give any other recipients of the Program a copy of this License along with the Program. You may charge a fee for the physical act of transferring a copy, and you may at your option offer warranty protection in exchange for a fee. 2. You may modify your copy or copies of the Program or any portion of it, thus forming a work based on the Program, and copy and distribute such modifications or work under the terms of Section 1 above, provided that you also meet all of these conditions: a) You must cause the modified files to carry prominent notices stating that you changed the files and the date of any change. 
b) You must cause any work that you distribute or publish, that in whole or in part contains or is derived from the Program or any part thereof, to be licensed as a whole at no charge to all third parties under the terms of this License. c) If the modified program normally reads commands interactively when run, you must cause it, when started running for such interactive use in the most ordinary way, to print or display an announcement including an appropriate copyright notice and a notice that there is no warranty (or else, saying that you provide a warranty) and that users may redistribute the program under these conditions, and telling the user how to view a copy of this License. (Exception: if the Program itself is interactive but does not normally print such an announcement, your work based on the Program is not required to print an announcement.) These requirements apply to the modified work as a whole. If identifiable sections of that work are not derived from the Program, and can be reasonably considered independent and separate works in themselves, then this License, and its terms, do not apply to those sections when you distribute them as separate works. But when you distribute the same sections as part of a whole which is a work based on the Program, the distribution of the whole must be on the terms of this License, whose permissions for other licensees extend to the entire whole, and thus to each and every part regardless of who wrote it. Thus, it is not the intent of this section to claim rights or contest your rights to work written entirely by you; rather, the intent is to exercise the right to control the distribution of derivative or collective works based on the Program. In addition, mere aggregation of another work not based on the Program with the Program (or with a work based on the Program) on a volume of a storage or distribution medium does not bring the other work under the scope of this License. 3. 
You may copy and distribute the Program (or a work based on it, under Section 2) in object code or executable form under the terms of Sections 1 and 2 above provided that you also do one of the following: a) Accompany it with the complete corresponding machine-readable source code, which must be distributed under the terms of Sections 1 and 2 above on a medium customarily used for software interchange; or, b) Accompany it with a written offer, valid for at least three years, to give any third party, for a charge no more than your cost of physically performing source distribution, a complete machine-readable copy of the corresponding source code, to be distributed under the terms of Sections 1 and 2 above on a medium customarily used for software interchange; or, c) Accompany it with the information you received as to the offer to distribute corresponding source code. (This alternative is allowed only for noncommercial distribution and only if you received the program in object code or executable form with such an offer, in accord with Subsection b above.) The source code for a work means the preferred form of the work for making modifications to it. For an executable work, complete source code means all the source code for all modules it contains, plus any associated interface definition files, plus the scripts used to control compilation and installation of the executable. However, as a special exception, the source code distributed need not include anything that is normally distributed (in either source or binary form) with the major components (compiler, kernel, and so on) of the operating system on which the executable runs, unless that component itself accompanies the executable. 
If distribution of executable or object code is made by offering access to copy from a designated place, then offering equivalent access to copy the source code from the same place counts as distribution of the source code, even though third parties are not compelled to copy the source along with the object code. 4. You may not copy, modify, sublicense, or distribute the Program except as expressly provided under this License. Any attempt otherwise to copy, modify, sublicense or distribute the Program is void, and will automatically terminate your rights under this License. However, parties who have received copies, or rights, from you under this License will not have their licenses terminated so long as such parties remain in full compliance. 5. You are not required to accept this License, since you have not signed it. However, nothing else grants you permission to modify or distribute the Program or its derivative works. These actions are prohibited by law if you do not accept this License. Therefore, by modifying or distributing the Program (or any work based on the Program), you indicate your acceptance of this License to do so, and all its terms and conditions for copying, distributing or modifying the Program or works based on it. 6. Each time you redistribute the Program (or any work based on the Program), the recipient automatically receives a license from the original licensor to copy, distribute or modify the Program subject to these terms and conditions. You may not impose any further restrictions on the recipients' exercise of the rights granted herein. You are not responsible for enforcing compliance by third parties to this License. 7. If, as a consequence of a court judgment or allegation of patent infringement or for any other reason (not limited to patent issues), conditions are imposed on you (whether by court order, agreement or otherwise) that contradict the conditions of this License, they do not excuse you from the conditions of this License. 
If you cannot distribute so as to satisfy simultaneously your obligations under this License and any other pertinent obligations, then as a consequence you may not distribute the Program at all. For example, if a patent license would not permit royalty-free redistribution of the Program by all those who receive copies directly or indirectly through you, then the only way you could satisfy both it and this License would be to refrain entirely from distribution of the Program. If any portion of this section is held invalid or unenforceable under any particular circumstance, the balance of the section is intended to apply and the section as a whole is intended to apply in other circumstances. It is not the purpose of this section to induce you to infringe any patents or other property right claims or to contest validity of any such claims; this section has the sole purpose of protecting the integrity of the free software distribution system, which is implemented by public license practices. Many people have made generous contributions to the wide range of software distributed through that system in reliance on consistent application of that system; it is up to the author/donor to decide if he or she is willing to distribute software through any other system and a licensee cannot impose that choice. This section is intended to make thoroughly clear what is believed to be a consequence of the rest of this License. 8. If the distribution and/or use of the Program is restricted in certain countries either by patents or by copyrighted interfaces, the original copyright holder who places the Program under this License may add an explicit geographical distribution limitation excluding those countries, so that distribution is permitted only in or among countries not thus excluded. In such case, this License incorporates the limitation as if written in the body of this License. 9. 
The Free Software Foundation may publish revised and/or new versions of the General Public License from time to time. Such new versions will be similar in spirit to the present version, but may differ in detail to address new problems or concerns. Each version is given a distinguishing version number. If the Program specifies a version number of this License which applies to it and "any later version", you have the option of following the terms and conditions either of that version or of any later version published by the Free Software Foundation. If the Program does not specify a version number of this License, you may choose any version ever published by the Free Software Foundation. 10. If you wish to incorporate parts of the Program into other free programs whose distribution conditions are different, write to the author to ask for permission. For software which is copyrighted by the Free Software Foundation, write to the Free Software Foundation; we sometimes make exceptions for this. Our decision will be guided by the two goals of preserving the free status of all derivatives of our free software and of promoting the sharing and reuse of software generally. NO WARRANTY 11. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, REPAIR OR CORRECTION. 12. 
IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. swish-e-2.4.7/perl/0000777000077100017500000000000011166013167011056 500000000000000swish-e-2.4.7/perl/typemap0000664000077100017500000000232111166010111012356 00000000000000# $Id: typemap 1467 2004-05-25 23:52:26Z whmoseley $ TYPEMAP SW_HANDLE O_OBJECT SW_SEARCH O_OBJECT SW_RESULTS O_OBJECT SW_RESULT O_OBJECT SW_FUZZYWORD O_OBJECT META_OBJ * O_OBJECT SW_META O_OBJECT_META const char * T_PV # From: "perlobject.map" Dean Roehrich, version 19960302 # O_OBJECT -> link an opaque C or C++ object to a blessed Perl object. OUTPUT # The Perl object is blessed into 'CLASS', which should be a # char* having the name of the package for the blessing. 
O_OBJECT sv_setref_pv( $arg, CLASS, (void*)$var ); INPUT O_OBJECT if( sv_isobject($arg) && (SvTYPE(SvRV($arg)) == SVt_PVMG) ) $var = ($type)SvIV((SV*)SvRV( $arg )); else{ warn( \"${Package}::$func_name() -- $var is not a blessed SV reference\" ); XSRETURN_UNDEF; } O_OBJECT_META if( sv_isobject($arg) && (SvTYPE(SvRV($arg)) == SVt_PVMG) ) { META_OBJ *m = (META_OBJ *)SvIV((SV*)SvRV( $arg )); $var = m->meta; } else { warn( \"${Package}::$func_name() -- $var is not a blessed SV reference\" ); XSRETURN_UNDEF; } swish-e-2.4.7/perl/Makefile.PL0000664000077100017500000002157611166010111012743 00000000000000#!/usr/bin/perl -w use strict; use ExtUtils::MakeMaker; use Config; # for path separator use File::Spec; # for catpath use File::Basename ; # for locating swish-e binary based on location of swish-config # $Id: Makefile.PL 1994 2007-12-14 14:25:55Z karpet $ #---------------------------------------------------------------------------------- # Default settings my %make_maker_opts = ( NAME => 'SWISH::API', VERSION_FROM => 'API.pm', AUTHOR => 'Bill Moseley', ABSTRACT => 'Perl interface to the swish-e search library', # Set LIBS and INC from swish-confg NORECURS => 1, # keep it from recursing into subdirectories DIR => [], XSPROTOARG => '-noprototypes', PREREQ_PM => { 'File::Spec' => '0.8', }, test => { TESTS => 't/*.t', }, clean => { FILES => join( ' ', qw( t/index.swish-e t/index.swish-e.prop ) ), }, ); my $SWISH_BINARY = 'swish-e'; my $SWISH_CONFIG = 'swish-config'; my $MIN_VERSION = '2.4.3'; my @valid_params = qw/ SWISHBINDIR SWISHHELP SWISHIGNOREVER SWISHSKIPTEST SWISHLIBS SWISHINC SWISHVERSION /; my $help = <catdir( $swish_config{BINDIR}, $SWISH_BINARY ); create_index($swish_binary); } else { $config{test}{TESTS} = 't/dummy.t'; } WriteMakefile( %make_maker_opts, %config ); #---------------------------------------------------------------------------------- # Test the swish-e version #---------------------------------------------------------------------------------- sub 
test_version {
    my %versions;
    my %split_vers;

    return 1 if exists $ENV{SWISHIGNOREVER};

    my @tags     = qw/ running_swish_version required_version /;
    my @versions = qw/ major minor release /;

    @versions{@tags} = @_;

    for (@tags) {
        die "Failed to find version for $_\n" unless $versions{$_};
        die "Failed to parse version ($versions{$_}) for $_\n"
            unless $versions{$_} =~ /(\d+)\.(\d+)\.(\d+)/;
        @{ $split_vers{$_} }{@versions} = ( $1, $2, $3 );
    }

    # Compare major, then minor, then release
    for (@versions) {
        return 1 if $split_vers{running_swish_version}{$_} > $split_vers{required_version}{$_};
        return 0 if $split_vers{running_swish_version}{$_} < $split_vers{required_version}{$_};
    }

    return 1;    # same version.
}

#------------------------------------------------------------------------
# Returns a hash of LIBS, INC, VERSION either from environment or command
# line if all three are set, otherwise, looks for swish-config for values
#------------------------------------------------------------------------
sub get_swish_configuration {
    if ( $ENV{SWISHINC} && $ENV{SWISHLIBS} && $ENV{SWISHVERSION} ) {
        die "Must set SWISHBINDIR if not using swish-config\n" unless $ENV{SWISHBINDIR};
        return (
            INC     => $ENV{SWISHINC},
            LIBS    => $ENV{SWISHLIBS},
            VERSION => $ENV{SWISHVERSION},
            BINDIR  => $ENV{SWISHBINDIR},
        );
    }

    # Otherwise, read from swish-config
    my $swish_config_path = find_swish_config($SWISH_CONFIG);
    return read_swish_config($swish_config_path);
}

#----------------------------------------------------------------------------------
# Locates the swish-config program (in SWISHBINDIR if set, otherwise in
# $PATH) and returns its path
#----------------------------------------------------------------------------------
sub find_swish_config {
    my $prog = shift;

    my $binary = find_program($prog);

    if ( $ENV{SWISHBINDIR} ) {
        die "SWISHBINDIR [$ENV{SWISHBINDIR}] is not a directory\n"
            unless -d $ENV{SWISHBINDIR};
        my $p = find_program( $prog, $ENV{SWISHBINDIR} );
        die "Failed to find [$prog] in directory $ENV{SWISHBINDIR}: $!"
            unless $p;
        print "Using config program [$p], but also noticed you have $binary available in \$PATH\n"
            if $binary;
        $binary = $p;
    }

    die "Failed to find [$prog] in PATH\n" unless $binary;
    print "Using swish-config found at [$binary]\n";
    return $binary;
}

#----------------------------------------------------------------------------------
# Reads swish-config and returns hash of values
#----------------------------------------------------------------------------------
sub read_swish_config {
    my $binary = shift;
    my %config;

    $config{VERSION} = backtick("$binary --version");
    $config{LIBS}    = backtick("$binary --libs");
    $config{INC}     = backtick("$binary --cflags");
    $config{BINDIR}  = dirname($binary);

    return %config;
}

#----------------------------------------------------------------------------------
# Sub to fetch parameters from the command line.
# Sets $ENV for SWISH options, otherwise returns them
#----------------------------------------------------------------------------------
sub load_command_line {
    my %valid = map { $_, 1 } @_;
    my %config;

    while ( $_ = shift @ARGV ) {
        if ( $_ eq 'SWISHHELP' ) {
            $ENV{SWISHHELP} = 'y';
            last;
        }

        my ( $param, $value ) = split /=/, $_, 2;

        if ( $param =~ /^SWISH/ ) {
            die "Invalid option '$param'\n" unless $valid{$param};
            $ENV{$param} = $value || '';
        }
        else {
            $config{$param} = $value || '';
        }
    }

    return %config;
}

#----------------------------------------------------------------------------------
# Find a program in either $PATH or path/directory passed in.
#----------------------------------------------------------------------------------
sub find_program {
    my ( $name, $search_path ) = @_;

    $search_path ||= $ENV{PATH} || '';

    for my $dir ( split /$Config{path_sep}/, $search_path ) {
        my $path = File::Spec->catfile( $dir, $name );
        for my $extension ( '', '.exe' ) {
            my $file = $path .
                $extension;
            return $file if -x $file && !-d _;
        }
    }
    return;
}

#----------------------------------------------------------------------------------
# Run a program with backticks, checking for errors
#----------------------------------------------------------------------------------
sub backtick {
    my ($command) = @_;

    my $output = `$command`;

    # Decode the wait status in $? into a message (empty string on success)
    my $status =
          $? == 0  ? ''
        : $? == -1 ? "Failed to execute: $!"
        : $? & 127 ? sprintf( "Child died with signal %d, %s corefile", ( $? & 127 ), ( $? & 128 ) ? 'with' : 'without' )
        :            sprintf( "Child exited with value %d", $? >> 8 );

    die "Failed to run program [$command]: $status\n" if $status;

    chomp $output;
    return $output;
}

sub create_index {
    my ($swish) = @_;

    die "Failed to find swish-e binary [$swish]: $!\n" unless -e $swish;
    die "Cannot execute swish-e binary [$swish]: $!\n" unless -x $swish;

    my $index = 't/index.swish-e';
    my $conf  = 't/test.conf';

    unlink $index if -e $index;

    my @command = ( $swish, '-c', $conf, '-f', $index, '-v', '0' );
    print "Creating index...'@command'\n\n";
    system(@command);

    die "Failed to create index file '$index'" unless -r $index;
}
swish-e-2.4.7/perl/README0000664000077100017500000000644211166010111011644 00000000000000SWISH::API - Perl interface to the Swish-e C search library

$Id: README 2049 2008-03-08 15:33:49Z moseley $

DESCRIPTION
-----------

SWISH::API is an Object Oriented Perl interface to the swish-e C library.
This can be used to embed the swish-e search code into your perl program,
avoiding the need to run the swish-e binary for searching.

The real difference is that search speed is improved, since you may attach
to a swish-e index once and then run many queries on that open "swish
handle". This speed comes at a cost of memory added to your program.

Note: This module replaces the SWISHE module available with versions
prior to 2.3 of Swish-e. It's recommended to upgrade your Perl code to use
the SWISH::API module.
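
The "attach once, query many times" pattern described above might look
roughly like the sketch below. It relies only on the new, query, hits and
Error calls that appear in the module's own synopsis; the helper name
count_hits, the example queries and the index path 'index.swish-e' are
made up for illustration and are not part of SWISH::API.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of SWISH::API): run each query against an
# already-open swish handle and return a hash of query => hit count.
# A query that leaves the handle in an error state is recorded as undef.
sub count_hits {
    my ( $swish, @queries ) = @_;
    my %hits;
    for my $query (@queries) {
        my $results = $swish->query($query);
        $hits{$query} = $swish->Error ? undef : $results->hits;
    }
    return %hits;
}

# Usage sketch: runs only if SWISH::API is installed and an index named
# 'index.swish-e' (an assumed path) is readable in the current directory.
if ( eval { require SWISH::API; 1 } && -r 'index.swish-e' ) {
    my $swish = SWISH::API->new('index.swish-e');    # attach once
    unless ( $swish->Error ) {
        my %hits = count_hits( $swish, 'foo', 'foo OR bar' );
        printf "%-12s %s\n", $_, ( defined $hits{$_} ? $hits{$_} : 'error' )
            for sort keys %hits;
    }
}
```

Passing the handle into the helper, rather than opening the index inside
it, is what keeps the index attached across queries.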
INSTALLATION ------------ See the FAQ below if you do not have root access or installed swish in a non-standard directory. 1) Download, build and install swish-e See http://swish-e.org for instructions. Swish is also available as a binary package from some operating system distributions (e.g. Debian). 2) Build the module in the normal way $ perl Makefile.PL $ make $ make test Then install, this may need to be done as the root user $ sudo make install Makefile.PL requires the "swish-config" program which is created when installing swish-e. It must reside in the same directory as the swish-e binary. See below if installing swish-e in a non-standard location. FAQ --- 1) I do not have root access. How do I link to the swish-e library? When building the SWISH::API module the compiler and linker look in locations for header and library files. If swish was installed in a non-standard location you will need to specify that location when building the module. For example, to install *swish* in $HOME/local: $ ./configure --prefix=$HOME/local $ make && make install Now build SWISH::API $ cd perl Makefile.PL has to find the "swish-config" program. It does this normally by searching your PATH environment variable: $ PATH=$HOME/local/bin:$PATH perl Makefile.PL another way is to specify the path with the SWISHBINDIR parameter: $ perl Makefile.PL SWISHBINDIR=$HOME/local/bin (or as an environment variable) $ SWISHBINDIR=$HOME/local/bin perl Makefile.PL Since you don't have root access, you should also specify where to install the SWISH::API perl module by using the PREFIX parameter: $ perl Makefile.PL SWISHBINDIR=$HOME/local/bin PREFIX=$HOME/my_perl_lib Note, that you can also specify LIBS and INC to override the settings that the swish-config program reports. If you have a reason to do this then you probably already know how to override these settings. 2) How do I build a PPM under Windows using MSVC and PERL 5.8? 
$ cd perl $ perl Makefile.PL \ LIBS="../src/win32/libswish-e-mt.lib ../../zlib/lib/zlib.lib libcmt.lib" \ OPTIMIZE="-MT -Zi -DNDEBUG -O1 -I../src" # Logic says to use CCFLAGS for -I../src but it explodes spectacularly... $ nmake $ nmake ppd $ tar cvzf SWISH-API.tar.gz blib Edit SWISH-API.ppd to your liking and upload it and SWISH-API.tar.gz to your repository in the appropriate locations. PROBLEMS ======== If you have problems or need help please contact the swish-e discussion list. The list is low traffic and is the place to get help with this module or swish-e in general. swish-e-2.4.7/perl/Changes0000664000077100017500000000107311166010111012252 00000000000000Revision history for Perl extension SWISH::API ## $Id: Changes 1994 2007-12-14 14:25:55Z karpet $ ## 0.04 Thu Jul 14 11:59:04 CDT 2005 Added perlize() function to make all method names Perl-ish. Fixed namespace issue with SwishFuzzy (now SwishFuzzify) 0.03 Wed Sep 01, 2004 Added RankScheme to access the new RankScheme feature. IgnoreTotalWordCountWhenRanking set to 0 in test.config. [karman] 0.02 Sun May 02, 2004 Added access to MetaNames and PropertyNames defined in the index. 
0.01 Mon Oct 14, 2002 Initial module [moseley] swish-e-2.4.7/perl/MANIFEST0000664000077100017500000000023211166010111012104 00000000000000Changes MANIFEST README Makefile.PL API.pm API.xs typemap Makefile.mingw t/test.t t/dummy.t t/test.conf t/first.html t/second.html t/third.html t/dummy.t swish-e-2.4.7/perl/API.pm0000664000077100017500000005210211166010111011725 00000000000000package SWISH::API; # $Id: API.pm 1806 2006-06-21 19:01:20Z karman $ use vars qw/ @ISA $VERSION /; $VERSION = '0.04'; # prefer XSLoader over DynaLoader eval { require XSLoader; XSLoader::load('SWISH::API',$VERSION); 1; } or do { require DynaLoader; push(@ISA,'DynaLoader'); bootstrap SWISH::API $VERSION; }; # VERSION sub satisfies some versions of MakeMaker sub VERSION { $VERSION } # create perl-ish aliases for all C method names # based on patch contributed by mpeters@plusthree.com sub perlize { my $m = shift; $m =~ s/_//g; $m =~ s/([a-z])([A-Z])/$1_$2/g; $m = lc($m); return $m; } CL: for my $class ( grep { m/::$/ } keys %SWISH::API:: ) { local *c = $SWISH::API::{$class}; METH: foreach my $meth ( keys %c ) { next METH if $meth eq 'DESTROY'; # special name my $new_meth = perlize( $meth ); # now create the typeglob alias local *name = 'SWISH::API::' . $class . $meth; *{'SWISH::API::' . $class . $new_meth} = \&name; } } M: for my $meth ( grep { ! m/::$/ } keys %SWISH::API:: ) { next M if $meth eq 'DESTROY'; my $new_meth = perlize( $meth ); local *name = 'SWISH::API::' . $meth; *{ 'SWISH::API::' . 
        $new_meth } = \&name;
}

sub dispSymbols {
    my ($hashRef) = shift;
    for ( sort keys %$hashRef ) {
        printf( "%-15.15s| %s\n", $_, $hashRef->{$_} );
    }
}

# for debugging symbol table
#dispSymbols( \%SWISH::API:: );

1;
__END__

=head1 NAME

SWISH::API - Perl interface to the Swish-e C Library

=head1 SYNOPSIS

    use SWISH::API;

    my $swish = SWISH::API->new( 'index.swish-e' );

    $swish->abort_last_error
        if $swish->Error;

    # A short-cut way to search
    my $results = $swish->query( "foo OR bar" );

    # Or more typically
    my $search = $swish->new_search_object;

    # then in a loop
    my $results = $search->execute( $query );

    # always check for errors (but aborting is not always necessary)
    $swish->abort_last_error
        if $swish->Error;

    # Display a list of results
    my $hits = $results->hits;
    if ( !$hits ) {
        print "No Results\n";
        return;    # for example
    }

    print "Found ", $results->hits, " hits\n";

    # Seek to a given page - should check for errors
    $results->seek_result( ($page-1) * $page_size );

    while ( my $result = $results->next_result ) {
        printf("Path: %s\n  Rank: %lu\n  Size: %lu\n  Title: %s\n  Index: %s\n  Modified: %s\n  Record #: %lu\n  File #: %lu\n\n",
            $result->property( "swishdocpath" ),
            $result->property( "swishrank" ),
            $result->property( "swishdocsize" ),
            $result->property( "swishtitle" ),
            $result->property( "swishdbfile" ),
            $result->result_property_str( "swishlastmodified" ),
            $result->property( "swishreccount" ),
            $result->property( "swishfilenum" )
        );
    }

    # display properties and metanames
    for my $index_name ( $swish->index_names ) {
        my @metas = $swish->meta_list( $index_name );
        my @props = $swish->property_list( $index_name );

        for my $m ( @metas ) {
            my $name = $m->name;
            my $id   = $m->id;
            my $type = $m->type;
        }

        # (repeat above for @props)
    }

=head1 DESCRIPTION

This module provides a Perl interface to the Swish-e search engine.
This module allows embedding the swish-e search code into your application,
avoiding the need to fork to run the swish-e binary and allowing an index
file to be kept open when running multiple queries.  This results in
increased search performance.

=head1 DEPENDENCIES

You must have installed Swish-e version 2.4 before building this module.
Download from:

    http://swish-e.org

=head1 OVERVIEW

This module includes a number of classes.

Searching consists of connecting to a swish-e index (or indexes), and then
running queries against the open index.  Connecting to the index creates a
swish object blessed into the SWISH::API class.

A SWISH::API::Search object is created from the SWISH::API object.
The SWISH::API::Search object can have associated parameters (e.g. result
sort order).

The SWISH::API::Search object is used to query the associated index file or
files.  A query on a search object returns a results object of the class
SWISH::API::Results.  Then individual results of the SWISH::API::Result
class can be fetched by calling a method of the results object.

Finally, a result's properties can be accessed by calling methods on the
result object.

=head1 METHODS

=head2 SWISH::API - Swish Handle Object

To begin using Swish you must first create a Swish Handle object.  This
object makes the connection to one or more index files and is used to create
objects used for searching the associated index files.

=over 4

=item $swish = SWISH::API-E<gt>new( $index_files );

This method returns a swish handle object blessed into the SWISH::API
class.  $index_files is a space separated list of index files to open.
This always returns an object, even on errors.
Caller must check for errors (see below).

=item @indexes = $swish-E<gt>index_names;

Returns a list of index names associated with the swish handle.
These were the indexes specified as a parameter on the SWISH::API-E<gt>new
call.  This can be used in calls below that require specifying the index
file name.

=item @header_names = $swish-E<gt>header_names;

Returns a list of possible header names.  These can be used to look up
header values.  See the header_value method below.

=item @values = $swish-E<gt>header_value( $index_file, $header_name );

A swish-e index has data associated with it stored in the index header.
This method provides access to that data.

Returns the header value for the header and index file specified.  Most
headers are a single item, but some headers (e.g. "Stopwords") return a
list.

The list of possible header names can be obtained from the
$swish-E<gt>header_names method.

=item $swish-E<gt>rank_scheme( 0|1 );

Similar to the -R option with the swish-e command line tool.  The default
ranking scheme is 0.  Set it to 1 to experiment with other ranking features.
See the SWISH-CONFIG documentation for more on ranking schemes.

=back

=head3 Error Handling

All errors are stored in and accessed via the SWISH::API object (the Swish
Handle).  That is, even an error that occurs when calling a method on a
result (SWISH::API::Result) object will store the error in the parent
SWISH::API object.

Check for errors after every method call.  Some errors are critical errors
and will require destruction of the SWISH::API object.  Critical errors will
typically only happen when attaching to the database and are errors such as
an invalid index file name, permissions errors, or passing invalid objects
to calls.

Typically, if you receive an error when attaching to an index file or files
you should assume that the error is critical and let the swish object fall
out of scope (and be destroyed).  Otherwise, if an error is detected you
should check if it is a critical error.  If the error is not critical you
may continue using the objects that have been created (for example, an
invalid meta name will generate a non-critical error, so you may continue
searching using the same search object).

Error state is cleared upon a new query.
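
The checks described above could be wrapped up roughly as follows.  This is
only a sketch: the helper name check_swish is hypothetical and not part of
this module, and the index file name and query are placeholders.  It uses
only the error methods documented in this section.

```perl
use strict;
use warnings;
use SWISH::API;

# check_swish() is a hypothetical helper, not part of SWISH::API.
# Returns true when the last operation succeeded, warns and returns
# false on a non-critical error, and dies on a critical error.
sub check_swish {
    my ( $swish, $what ) = @_;

    return 1 unless $swish->error;

    my $msg = sprintf '%s: %s (%s)',
        $what, $swish->error_string, $swish->last_error_msg;

    # A critical error means the swish object should not be reused.
    die "$msg\n" if $swish->critical_error;

    warn "$msg\n";
    return 0;
}

# 'index.swish-e' is a placeholder index file name.
my $swish = SWISH::API->new('index.swish-e');
check_swish( $swish, 'open index' );

my $search  = $swish->new_search_object;
my $results = $search->execute('badmeta=foo');

# The text above describes an invalid meta name as a non-critical
# error, so $search should remain usable after a failed check.
if ( check_swish( $swish, 'execute' ) ) {
    print "Found ", $results->hits, " hits\n";
}
```

Errors are always read from the swish handle, never from the search,
results or result objects, which is why the helper takes $swish even when
the failing call was made on $search.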
Again, all error methods need to be called on the parent swish object.

=over 4

=item $swish-E<gt>error

Returns true if an error occurred on the last operation.  On errors the
value returned is the internal Swish-e error number (which is less than
zero).

=item $swish-E<gt>critical_error

Returns true if the last error was a critical error.

=item $swish-E<gt>abort_last_error

Aborts the running program and prints an error message to STDERR.

=item $str = $swish-E<gt>error_string

Returns the string description of the current error (based on the value
returned by $swish-E<gt>error).  This is a generic error string.

=item $msg = $swish-E<gt>last_error_msg

Returns a string with specific information about the last error, if any.
For example, if the query is:

    badmeta=foo

and "badmeta" is an invalid metaname, $swish-E<gt>error_string might return
"Unknown metaname", but $swish-E<gt>last_error_msg might return "badmeta".

=back

=head3 Generating Search and Result Objects

=over 4

=item $search = $swish-E<gt>new_search_object( $query );

This creates a new search object blessed into the SWISH::API::Search class.
The optional $query parameter is a query string to store in the search
object.

See the section on SWISH::API::Search for methods available on the returned
object.

The advantage of this method is that a search object can be used for
multiple queries:

    $search = $swish->new_search_object;
    while ( $query = next_query() ) {
        $results = $search->execute( $query );
        ...
    }

=item $results = $swish-E<gt>query( $query );

This is a short-cut which avoids the step of creating a separate search
object.  It returns a results object blessed into the SWISH::API::Results
class described below.

This method is essentially the equivalent of

    $results = $swish->new_search_object->execute( $query );

=back

=head2 SWISH::API::Search - Search Objects

A search object holds the parameters used to generate a list of results.
These methods are used to adjust these parameters and to create the list of
results for the current set of search parameters.

=over 4

=item $search-E<gt>set_query( $query );

This will set (or replace) the query string associated with a search
object.  This method is typically not used as the query can be set when
executing the actual query or when creating a search object.

=item $search-E<gt>set_structure( $structure_bits );

This method may change in the future.

A "structure" is a bit-mapped flag used to limit search results to specific
parts of an HTML document, such as the title or heading tags.  The possible
bits are:

    IN_FILE       = 1      This is the default
    IN_TITLE      = 2      In <title> tag
    IN_HEAD       = 4      In <head> tag
    IN_BODY       = 8      In <body>
    IN_COMMENTS   = 16     In html comments
    IN_HEADER     = 32     In <h*>
    IN_EMPHASIZED = 64     In <em>, <b>, <strong>, <i>
    IN_META       = 128    In a meta tag (e.g. not swishdefault)

So if you wish to limit your searches to words in heading tags (e.g.
E<lt>H1E<gt>) or in the E<lt>titleE<gt> tag use:

    $search->set_structure( IN_HEADER | IN_TITLE );

=item $search-E<gt>phrase_delimiter( $char );

Sets the character used as the phrase delimiter in searches.  The default
is double-quotes (").

=item $search-E<gt>set_search_limit( $property, $low, $high );

Sets a range from $low to $high inclusive that the given $property must be
in to be selected as a result.  Call multiple times to set more than one
limit on different properties.  Limits are ANDed, that is, a result must be
within the range of all limits specified to be included in a list of
results.

For example, to limit searches to documents modified in the last 48 hours:

    my $start = time - 48 * 60 * 60;
    $search->set_search_limit( 'swishlastmodified', $start, time() );

An error will be set if the property has already been specified or if
$high E<lt> $low.  Other errors may not be reported until running the
query, such as an invalid property name, or a $low or $high that is not
numeric when the property specified is a numeric property.

Once a query is run you cannot change the limit settings for the search
object without calling the reset_search_limit method first.

=item $search-E<gt>reset_search_limit;

Clears the limit parameters for the given object.  This must be called if
the limit parameters need to be changed.

=item $search-E<gt>set_sort( $sort_string );

Sets the sort order of search results.  The string is a space separated
list of valid document properties.  Each property may contain a qualifier
that sets the direction of the sort.

For example, to sort the results by path name in ascending order and by
rank in descending order:

    $search->set_sort( 'swishdocpath asc swishrank desc' );

The "asc" and "desc" qualifiers are optional, and if omitted ascending is
assumed.

Currently, errors (e.g. invalid property name) are not detected on this
call, but rather when executing a query.  This may change in the future.

=back

=head2 SWISH::API::Results - Generating and accessing results

Searching generates a results object blessed into the SWISH::API::Results
class.

=over 4

=item $results = $search-E<gt>execute( $query );

Executes a query based on the parameters in the search object.  $query is
an optional query string to use for the search ($query replaces the set
query string in the search object).

A typical use would be to create a search object once and then call this
method for each query using the same search object, changing only the
passed in $query.

The caller should check for errors after making this call.

=back

=head2 Results Methods

A query creates a results object that contains information about the query
(e.g. number of hits) and access to the individual results.

=over 4

=item $hits = $results-E<gt>hits;

Returns the number of results for the query.  If zero and no errors were
reported after calling $search-E<gt>execute then the query returned zero
results.

=item @parsed_words = $results-E<gt>parsed_words( $index_name );

Returns an array of tokenized words and operators with stopwords removed.
This is the array of tokens used by swish for the query.
$index_name must match one of the index files specified on the creation of
the swish object (via the SWISH::API-E<gt>new call).

The parsed words are useful for highlighting search terms in associated
documents.

=item @removed_stopwords = $results-E<gt>removed_stopwords( $index_name );

Returns an array of stopwords removed from a query, if any, for the index
specified.

$index_name must match one of the index files specified on the creation of
the swish object (via the SWISH::API-E<gt>new call).

=item $results-E<gt>seek_result( $position );

Seeks to the position specified in the result list.  Zero is the first
position and $results-E<gt>hits-1 is the last position.  Seeking past the
end of results sets a non-critical error condition.

Useful for seeking to a specific "page" of results.

=item $result = $results-E<gt>next_result;

Fetches the next result from the list of results.  Returns undef if no more
results are available.  $result is an object blessed into the
SWISH::API::Result class.

=back

=head2 SWISH::API::Result - Result Methods

The following methods provide access to data related to an individual
result.

=over 4

=item $prop = $result-E<gt>property( $prop_name );

Fetches the property specified for the current result.  An invalid property
name will cause an exception (which can be caught by wrapping the call in
an eval block).  Can return undefined.

Date properties are returned as a timestamp.  Use something like
Date::Format to format the strings (or just call
scalar localtime( $prop ) ).

=item $prop = $result-E<gt>result_property_str( $prop_name );

Fetches and formats the property.  Unlike above, invalid property names
return the string "(null)" -- this will likely change to match the above
(i.e. throw an exception).  Undefined values are returned as the empty
string ("").

=item $value = $result-E<gt>result_index_value( $header_name );

Returns the header value specified.
This is similar to $swish-E<gt>header_value(), but the index file is not
specified (it is determined by the result).

=back

=head2 Utility Methods

=over 4

=item @metas = $swish-E<gt>meta_list( $index_name );

Swish-e has "MetaNames" which allow searching by fields in the index.  This
method returns information about the Metanames.

Pass in the name of an open index file and it returns a list of
SWISH::API::MetaName objects.  Three methods are currently defined on these
objects:

    $meta->name;
    $meta->id;
    $meta->type;

name returns the name of the meta as defined in the MetaNames config option
when the index was created.

The id is the internal ID number used to represent the meta name.

type is the type of metaname.  Currently only one type exists and its value
is zero.

=item @props = $swish-E<gt>property_list( $index_name );

Swish-e can store content or "properties" in the index and return this data
when running a query.  A document's path, URL, title, size, date or summary
are examples of properties.  Each property is accessed via its
PropertyName.

This method returns information about the PropertyNames stored in the
index.

Pass in the name of an open index file and it returns a list of
SWISH::API::MetaName objects.  Three methods are currently defined on these
objects:

    $prop->name;
    $prop->id;
    $prop->type;

name returns the name of the meta as defined in the MetaNames config option
when the index was created.

The id is the internal ID number used to represent the meta name.

type is the type of metaname.  Currently only one type exists and its value
is zero.

=item @props = $result-E<gt>property_list;

=item @meta = $result-E<gt>meta_list;

These also return a list of Property or Metaname description objects, but
are accessed via a result record.  Since the result comes from a specific
index file there's no need to specify the index file name.

=item $stemmed_word = $swish-E<gt>stem_word( $word );

*Deprecated*

Returns the stemmed version of the passed in word.
Deprecated because it only stems using the original Porter Stemmer and uses
a shared memory location in the SW_HANDLE object to store the stemmed word.

See below for other stemming options.

=item $fuzzy_word = $swish-E<gt>Fuzzify( $indexname, $word );

Like stem_word() used to work, only it uses whatever stemmer is named in
$indexname.  Returns the same kind of fuzzy_word object as the fuzzy_word()
method.

=item $mode_string = $result-E<gt>fuzzy_mode;

Returns the string (e.g. "Stemming_en", "Soundex", "None" ) indicating the
stemming method used while indexing the given document.

=item $fuzzy_word = $result-E<gt>fuzzy_word( $word );

Converts $word using the same fuzzy mode used to index the $result.

Returns a SWISH::API::FuzzyWord object.  Methods on the object are used to
access the converted words and other data as shown below.

=item $count = $fuzzy_word-E<gt>word_count;

Returns the number of output words.  Normally this is the value one, but
may be more depending on the stemmer used.  DoubleMetaphone can return two
strings for a single input string.

=item $status = $fuzzy_word-E<gt>word_error;

Returns any error code that the stemmer might set.  Normally, this return
value is zero, indicating that the stemming/fuzzy operation succeeded.  The
values returned are defined in the swish-e source file /src/stemmer.h.

=item @words = $fuzzy_word-E<gt>word_list;

Returns the converted words from the stemming/fuzzy operation.  Normally,
the array will contain a single element, although it may contain more (i.e.
if DoubleMetaphone is used and the input word returns two strings).

In the event that a word does not stem (e.g. trying to stem a number), this
method will return the original input word specified when
$result-E<gt>fuzzy_word( $word ) was called.

=item @parsed_words = $swish-E<gt>swish_words( $string, $index_file );

* Not implemented *

Splits up the input string into tokens of swish words and operators.

=back

=head1 NOTES

Perl's garbage collection makes it easy to write code for searching with
Swish-e, but care must be taken not to keep objects around too long, which
can use up memory.

Here's an example of a potential problem.  Say you have a very large number
of documents indexed and you want to find the first hit for a number of
popular keywords (error checking omitted in this bad example):

    sub first_hit {
        my $query = shift;
        my $handle = SWISH::API->new( 'index.swish-e');
        my $results = $handle->query( $query );
        my $first_hit = $results->next_result;
        return $first_hit;
    }

    my @first_hit_list;
    for ( @keywords ) {
        push @first_hit_list, first_hit($_);
    }

The first_hit() subroutine is returning a SWISH::API::Result object.  That
makes it easy to access properties:

    # print file names
    for my $result ( @first_hit_list ) {
        print $result->property('swishdocpath'),"\n";
    }

But as long as a SWISH::API::Result object is around, so is the entire list
of results generated by the $handle-E<gt>query() call, and the index file is
still open (because a SWISH::API::Result depends on a SWISH::API::Results
object, which depends on a SWISH::API object).

In this case it would be better to return from first_hit() just the
properties you need:

        ...
        my $first_hit = $results->next_result;
        return $first_hit->property('swishdocpath');
    }

Then when the first_hit() sub ends, the result list will be freed and the
index file closed, thanks to Perl's reference count tracking.

Note: the other problem with the above code is that the same index file is
opened for each call to the function.  Don't do that; instead, open the
index file once.

=head1 COPYRIGHT

This library is free software; you can redistribute it
and/or modify it under the same terms as Perl itself.

=head1 AUTHOR

Bill Moseley moseley@hank.org. 2002/2003/2004

=head1 SUPPORT

Please contact the Swish-e discussion email list for support with this
module or with Swish-e.  Please do not contact the developers directly.
=cut swish-e-2.4.7/perl/Makefile.mingw0000775000077100017500000000464011166010111013545 00000000000000#!/usr/bin/make -f # # File: Makefile.mingw # Desc: Build SWISH::API for ActivePerl/Win32 on Linux using # WINE, ActivePerl, and MinGW # # Some brief notes on how to setup a build environment: # 1.) Install mingw (Debian: mingw32, mingw32-runtime, mingw32-binutils) # 2.) Install WINE (Debian: wine) # 3.) Install xvfb dummy X server (Debian: xvfb) # 4.) Install ActivePerl on Windows (or WINE if you can) # 5.) Copy over the entire Perl tree to Linux (e.g. C:\Perl) # 6.) Run the Perl/bin/reloc_perl script to correct the runtime Prefix. # 7.) Make sure that your WINE install sets up a binfmt handler # for Windows executables. (Debian wine does) # 8.) Note: you may wish/need to rebuild Debian wine with debugging # disabled. WINE debug output is printed to stdout. (Why? Why?) 
# Also you need to comment out the "Wine exited with successful status" # line in /usr/bin/wine to prevent corruption of API.c # SWISH-E "install" Location SWISH_PREFIX = ../../prefix SWISH_BIN = $(SWISH_PREFIX)/bin/swish-e.exe # PERL Location (Need WINE and ActivePerl on Linux) PERL_PREFIX = ../../perl PERL_BIN = xvfb-run /usr/bin/wine $(PERL_PREFIX)/bin/perl.exe PERL_LIB = $(PERL_PREFIX)/bin/perl510.dll #PERL_LIB = $(PERL_PREFIX)/bin/perl58.dll # Compiler Stuff CC=i586-mingw32msvc-gcc LIBS=../src/.libs/libswish-e.dll.a $(PERL_LIB) # OS Commands CP = cp -f MKDIR = mkdir -p TOUCH = touch # Perl module directories INST_LIB = blib/lib INST_ARCHLIB = blib/arch API_DLL = $(INST_ARCHLIB)/auto/SWISH/API/API.dll $(API_DLL): API.c $(MKDIR) $(INST_ARCHLIB)/auto/SWISH/API $(MKDIR) $(INST_LIB)/SWISH $(CC) -shared -o $(API_DLL) -I ../../perl/lib/CORE -I ../src API.c $(LIBS) $(TOUCH) $(INST_ARCHLIB)/auto/SWISH/API/API.bs $(CP) API.pm $(INST_LIB)/SWISH API.c: $(PERL_BIN) $(PERL_PREFIX)/lib/ExtUtils/xsubpp -noprototypes -typemap $(PERL_PREFIX)/lib/ExtUtils/typemap -typemap typemap API.xs > API.c TEST_VERBOSE=1 TEST_TYPE=test_$(LINKTYPE) TEST_FILE = test.pl TEST_FILES = t/test.t TEST_CONF = t/test.conf TEST_DB = t/index.swish-e TESTDB_SW = -d test: $(API_DLL) index.swish-e # Ugh. API.dll needs pcre, zlib, libxml2, libswish-e, etc. 
$(CP) ../../test/*.dll $(INST_ARCHLIB)/auto/SWISH/API $(PERL_BIN) "-MExtUtils::Command::MM" "-e" "test_harness($(TEST_VERBOSE), '$(INST_LIB)', '$(INST_ARCHLIB)')" $(TEST_FILES) index.swish-e: $(SWISH_BIN) -c $(TEST_CONF) -f $(TEST_DB) -v0 clean: rm -fr API.c API.dll t/index.swish-e* blib swish-e-2.4.7/perl/t/0000777000077100017500000000000011166013167011321 500000000000000swish-e-2.4.7/perl/t/dummy.t0000664000077100017500000000013411166010111012539 00000000000000# Dummy test file for when index is not created print "1..1\n"; print "ok - Dummy test\n"; 
swish-e-2.4.7/perl/t/second.html0000664000077100017500000000025711166010111013366 00000000000000<html> <head> <title>Title of Second Secondbody foo swish-e-2.4.7/perl/t/first.html0000664000077100017500000000025411166010111013237 00000000000000 Title of First Firstbody swish-e-2.4.7/perl/t/test.t0000775000077100017500000002226111166010111012373 00000000000000#!perl -w # $Id: test.t 1806 2006-06-21 19:01:20Z karman $ use strict; require SWISH::API; my $lastcase = 147; print "1..$lastcase\n"; my $test_num = 1; my $mem_test = 0; is_ok( SWISH::API->VERSION, $SWISH::API::VERSION ); ###################################################################### { my $swish = SWISH::API->new( 't/index.swish-e' ); check_error('Call SWISH::API::new', $swish); my @header_names = $swish->HeaderNames; is_ok( "header names " . join(':',@header_names), @header_names); my @index_names = $swish->IndexNames; $swish->RankScheme( 1 ); # default is 0 -- just testing the method for my $index ( @index_names ) { is_ok( "index name '$index'", $index); for my $header ( @header_names ) { my @value = $swish->HeaderValue( $index, $header ); my $value = @value ? 
join( ':', @value) : '*undefined*'; is_ok( "Header '$header' = '$value'", defined $value ); } my @metas = $swish->MetaList( $index ); for my $meta ( @metas ) { my $name = $meta->Name; my $type = $meta->Type; my $id = $meta->ID; is_ok("Meta: $name type=$type id=$id", $name ); } my @props = $swish->PropertyList( $index ); for my $meta ( @props ) { my $name = $meta->Name; my $type = $meta->Type; my $id = $meta->ID; is_ok("Prop: $name type=$type id=$id", $name ); } } # A short-cut way to search { my $results = $swish->Query( "foo OR bar" ); check_error('Call $swish->Query', $swish); my $hits = $results->Hits; is_ok( "returned $hits hits", $hits ); my $result = $results->NextResult; if ( !$result ) { is_ok("failed to read a resut -- can't test stemmers", 0); } else { for my $word (qw/ running runs sugar 1234/, '') { stem_it($result,$word); swish_stem($swish,$index_names[0],$word); } # fetch the related metanames and properties my @metas = $result->MetaList; for my $meta ( @metas ) { my $name = $meta->Name; my $type = $meta->Type; my $id = $meta->ID; is_ok("Meta: $name type=$type id=$id", $name ); } my @props = $result->PropertyList; for my $meta ( @props ) { my $name = $meta->Name; my $type = $meta->Type; my $id = $meta->ID; is_ok("Prop: $name type=$type id=$id", $name ); } } } # A short-cut way to search with a metaname { my $results = $swish->Query( "meta_name=f*" ); check_error('metaname Call $swish->Query', $swish); my $hits = $results->Hits; is_ok( "returned $hits hits", $hits ); } # Or more typically my $search = $swish->New_Search_Object; check_error('Call $swish->New_Search_Object', $swish); $search->SetSort("swishfilenum"); # then in a loop my $query = "not dkdkd stopword otherstop"; my $results = $search->Execute( $query ); check_error('Call $swish->Execute', $swish); # Check parsed words my @parsed_words = $results->ParsedWords( 't/index.swish-e' ); is_ok("ParsedWords [" . join(', ', @parsed_words) . 
"]", scalar @parsed_words ); my @removed_stopwords = $results->RemovedStopwords( 't/index.swish-e' ); is_ok("RemovedStopwords [" . join( ', ', @removed_stopwords). "]", scalar @removed_stopwords ); # Display a list of results my $hits = $results->Hits; is_ok( "returned $hits results", $hits ); # Seek to a given page - should check for errors #$results->SeekResult( ($page-1) * $page_size ); my @props = qw/ swishreccount swishfilenum age blankdate swishdocpath swishrank swishdocsize swishtitle swishdbfile swishlastmodified /; # access results my $seen; my @results; while ( my $result = $results->NextResult ) { push @results, $result; check_error('Call $swish->NextResult', $swish) unless $seen; my %props; $props{$_} = $result->Property( $_ ) for @props; check_error('Call $result->Property', $swish) unless $seen; my $string = $result->Property('swishdocpath') ."\n" . join "\n", map { " $_ => " . (defined $props{$_} ? $props{$_} : '*not defined*') } @props; is_ok( "$string\n", $string ); for ( @props ) { my $propstr = $result->ResultPropertyStr( $_ ); # I don't like this method ' is_ok(" ResultPropertyStr($_) = " . $propstr || '??', defined $propstr ); } unless ( $seen++ ) { my $header = $result->ResultIndexValue( 'WordCharacters' ); is_ok("header '$header'", $header ); } last if $seen >= 20; } # Check for catching invalid property name is_ok("Seek to start of results", $results->SeekResult(0) == 0 ); eval { $results->NextResult->Property('badpropname') }; is_ok( "Croak on bad property: " . 
($@ || "nope!"), $@ ); my $strnull = $results->NextResult->ResultPropertyStr('blankdate'); # check on blank props using the Str method is_ok( "Returns empty string for ResultPropertyStr: [$strnull]", $strnull eq '' ); $strnull = $results->NextResult->ResultPropertyStr('badpropname'); # check on blank props using the Str method is_ok( "Returns '(null)' string for ResultPropertyStr: [$strnull]", $strnull eq '(null)' ); $results = $search->Execute('firstbody or secondbody'); is_ok("firstbody or secondbody", $results->Hits == 2 ); $results = $search->Execute('foo'); is_ok("foo", $results->Hits == 2 ); my $IN_HEAD = 32; $search->SetStructure( $IN_HEAD ); $results = $search->Execute('foo'); $hits = $results->Hits; is_ok("foo in tags $hits hits", $hits == 1 ); $search->SetStructure( 1 ); $results = $search->Execute('foo'); $hits = $results->Hits; is_ok("foo again $hits hits", $hits == 2 ); $search->SetSearchLimit("age", 30, 40 ); check_error('SetSearchLimit', $swish); $results = $search->Execute('not dkdkd'); check_error('1st Execute', $swish); $hits = $results->Hits; is_ok("Limit Search range $hits hits", $hits == 2 ); $search->ResetSearchLimit; $search->SetSearchLimit("age", 40, 40 ); check_error('2nd SetSearchLimit', $swish); $results = $search->Execute('not dkdkd'); check_error('2nd Execute', $swish); $hits = $results->Hits; is_ok("2nd Limit Search range $hits hits", $hits == 1 ); if ( $mem_test ) { require Time::HiRes; my $t0 = [Time::HiRes::gettimeofday()]; my $count = 0; my $flags = 'v'; my $ttl; while ( 1 ) { my $results = $search->Execute("not dkdk"); while ( my $result = $results->NextResult ) { my $path = $result->Property('swishdocpath'); $ttl ++; } unless ( $count % 1000 ) { $hits = $results->Hits; my $elapsed = Time::HiRes::tv_interval ( $t0, [Time::HiRes::gettimeofday()]); my $ps = $count % 10000 ? 
'': `/bin/ps $flags -p $$`; printf("$count - Results: $hits - Total Results: $ttl %d req/s\n$ps", $count/$elapsed ); $flags = 'hv'; } $count++; } } my @words = $swish->WordsByLetter( 't/index.swish-e' , 'f' ); check_error('WordsByLetter', $swish); is_ok( "WordsByLetter 'f' [@words]", @words ); for ( qw/running runs library libraries/ ) { my $fw = $swish->fuzzify( 't/index.swish-e', $_ ); my $stemmed = ( $fw->WordList )[0]; if ( $fw->WordError ) { warn $fw->WordError, $/; } is_ok( "Stemmed: '$_' => '" . ($stemmed||'*failed to stem*') ."'", $stemmed ); } # cough, hack, cough.... print "ok $_ (noop)\n" for $test_num..$lastcase } sub check_error { my ( $str, $swish ) = @_; my $num = $test_num++; if ( !$swish->Error ) { print "ok $num $str\n"; return; } my $msg = $swish->ErrorString . ' (' . $swish->LastErrorMsg . ')'; print "not ok $num $str - $msg\n"; die "Found critical error" if $swish->CriticalError; } sub is_ok { my ( $str, $is_ok ) = @_; my $num = $test_num++; print $is_ok ? "ok $num $str\n" : "not ok $num $str\n"; } sub stem_it { my ($result, $word) = @_; my $fw; is_ok("Testing FuzzyWord [$word]", ($fw = $result->FuzzyWord($word)) ); return unless $fw; my $wc = $fw->WordCount; is_ok(" Word count $wc", $wc ); my $error = $fw->WordError; is_ok(" Fuzzy status $error", 1); my @words = $fw->WordList; is_ok(" [$word] -> [@words]", scalar @words ); } sub swish_stem { my ($swish,$index,$word) = @_; my $fw; is_ok("Testing Fuzzy [$word]", ($fw = $swish->Fuzzify($index,$word)) ); return unless $fw; my @fuzzed = $fw->WordList; is_ok(" [$word] -> [@fuzzed]", scalar @fuzzed ); } swish-e-2.4.7/perl/t/test.conf0000664000077100017500000000050611166010111013050 00000000000000# Test swish-e config file for the SWISH::API perl module IndexDir t/first.html t/second.html t/third.html MetaNames meta_name PropertyNamesNumeric age PropertyNamesDate blankdate IgnoreWords stopword otherstop BuzzWords C++ mod_perl FuzzyIndexingMode Stemming_en1 # to test the RankScheme 
IgnoreTotalWordCountWhenRanking 0 swish-e-2.4.7/perl/t/third.html0000664000077100017500000000026611166010111013225 00000000000000 Title of Third Thirdbody

foo

swish-e-2.4.7/perl/API.xs0000664000077100017500000004766111166010111011761 00000000000000#include "EXTERN.h"
#include "perl.h"
#include "XSUB.h"

/* $Id: API.xs 2291 2009-03-31 01:56:00Z karpet $ */

#include <swish-e.h>

#ifndef newSVuv
#   define newSVuv(i) newSViv(i)
#endif

#ifndef call_pv
#   define call_pv(i,j) perl_call_pv(i,j)
#endif

/*
 * Create a typedef for managing the metanames objects.  This allows storing
 * both the SV and the pointer to the parent object so its refcount can be
 * adjusted on DESTROY.  The other way is to provide a way to get to the
 * parent's SV in the swish-e library.  That means modifying the swish-e
 * library.  This is already done for most other objects -- the SV of the perl
 * swish handle is stored in the C SW_HANDLE struct -- see SwishSetRefPtr and
 * Swish*Parent functions.  It's actually much easier to provide a way to get
 * the SV via the C library since we don't need to malloc and free an extra
 * structure.  But, done for the meta descriptions as an exercise.
 */

typedef struct {
    SV      *handle_sv;  /* Parent SV for DESTROY */
    SW_META meta;        /* meta description C pointer */
} META_OBJ;

MODULE = SWISH::API        PACKAGE = SWISH::API    PREFIX = Swish

# Make sure that we have at least xsubpp version 1.922.
REQUIRE: 1.922 # This returns SW_HANDLE void new(CLASS, index_file_list ) char *CLASS char *index_file_list PREINIT: SW_HANDLE handle; PPCODE: SwishErrorsToStderr(); handle = SwishInit( index_file_list ); ST(0) = sv_newmortal(); sv_setref_pv( ST(0), CLASS, (void *)handle ); SwishSetRefPtr( handle, (void *)SvRV(ST(0)) ); XSRETURN(1); void DESTROY(self) SW_HANDLE self; CODE: SwishClose( self ); void SwishIndexNames(self) SW_HANDLE self PREINIT: const char **index_name; PPCODE: index_name = SwishIndexNames( self ); while ( *index_name ) { XPUSHs(sv_2mortal(newSVpv( (char *)*index_name ,0 ))); index_name++; } ############################################ # set RankScheme # karman - Wed Sep 1 09:22:50 CDT 2004 void SwishRankScheme(self, scheme) SW_HANDLE self int scheme ############################################# # set ReturnRawRank # karman - 30 Mar 2009 void SwishReturnRawRank(self, flag) SW_HANDLE self int flag ############################################# # added SwishFuzzy to give access directly from SW object # karman - Wed Oct 27 11:16:45 CDT 2004 # This returns a fuzzy word object based on the result # Thu Jul 14 11:33:27 CDT 2005 # fixed namespace issue: now SwishFuzzify (called like $swish->Fuzzify) SW_FUZZYWORD SwishFuzzify(swobj, index_name, word) SW_HANDLE swobj char * index_name char * word PREINIT: char * CLASS = "SWISH::API::FuzzyWord"; CODE: RETVAL = SwishFuzzify(swobj, index_name, word); OUTPUT: RETVAL void SwishHeaderNames(self) SW_HANDLE self PREINIT: const char **name; PPCODE: name = SwishHeaderNames( self ); while ( *name ) { XPUSHs(sv_2mortal(newSVpv( (char *)*name ,0 ))); name++; } void SwishHeaderValue(swish_handle, index_file, header_name) SW_HANDLE swish_handle char * index_file char * header_name PREINIT: SWISH_HEADER_TYPE header_type; SWISH_HEADER_VALUE head_value; int i; PPCODE: head_value = SwishHeaderValue( swish_handle, index_file, header_name, &header_type ); PUSHMARK(SP); XPUSHs((SV *)swish_handle); XPUSHs((SV *)&head_value); 
XPUSHs((SV *)&header_type); PUTBACK; i = call_pv( "SWISH::API::decode_header_value", G_ARRAY ); SPAGAIN; void decode_header_value( swish_handle, header_value, header_type ) SV *swish_handle SV *header_value SV *header_type PREINIT: const char **string_list; SWISH_HEADER_VALUE *head_value; PPCODE: head_value = (SWISH_HEADER_VALUE *)header_value; switch ( *(SWISH_HEADER_TYPE *)header_type ) { case SWISH_STRING: if ( head_value->string && head_value->string[0] ) XPUSHs(sv_2mortal(newSVpv( (char *)head_value->string,0 ))); else ST(0) = &PL_sv_undef; break; case SWISH_NUMBER: XPUSHs(sv_2mortal(newSVuv( head_value->number ))); break; case SWISH_BOOL: // how about pushing &PL_sv_yes and &PL_sv_no or using boolSV()? XPUSHs(sv_2mortal(newSViv( head_value->boolean ? 1 : 0 ))); break; case SWISH_LIST: string_list = head_value->string_list; if ( !string_list ) /* Don't think this can happen */ XSRETURN_EMPTY; while ( *string_list ) { XPUSHs(sv_2mortal(newSVpv( (char *)*string_list ,0 ))); string_list++; } break; case SWISH_HEADER_ERROR: SwishAbortLastError( (SW_HANDLE)swish_handle ); break; default: croak(" Unknown header type '%d'\n", header_type ); } # Error Management void SwishAbortLastError(self) SW_HANDLE self int SwishError(self) SW_HANDLE self char * SwishErrorString(self) SW_HANDLE self char * SwishLastErrorMsg(self) SW_HANDLE self int SwishCriticalError(self) SW_HANDLE self # Return a search object (uses a typemap to bless the return object) SW_SEARCH New_Search_Object(swish_handle, query = NULL) SW_HANDLE swish_handle char *query PREINIT: char * CLASS = "SWISH::API::Search"; CODE: RETVAL = New_Search_Object( swish_handle, query ); if ( RETVAL ) SvREFCNT_inc( (SV *)SwishSearch_parent( RETVAL ) ); OUTPUT: RETVAL # Returns a SW_RESULTS object void SwishQuery( swish_handle, query = NULL ) SW_HANDLE swish_handle char *query PREINIT: char * CLASS = "SWISH::API::Results"; SW_RESULTS results; PPCODE: results = SwishQuery( swish_handle, query ); if ( results ) { 
SvREFCNT_inc( (SV *)SwishResults_parent( results ) ); ST(0) = sv_newmortal(); sv_setref_pv( ST(0), CLASS, (void *)results ); ResultsSetRefPtr( results, (void *)SvRV(ST(0)) ); XSRETURN(1); } # Methods to return info about MetaNames and Properties # The C API provided by Jamie Herre in March 2004 # Returns an array of SWISH::API::MetaName objects void SwishMetaList( swish_handle, index_name ) SW_HANDLE swish_handle char *index_name PREINIT: SWISH_META_LIST meta_list; PPCODE: /* Grab the list of pointers */ meta_list = SwishMetaList( swish_handle, index_name ); PUSHMARK(SP) ; /* always need to PUSHMARK, even w/o params */ XPUSHs( (SV *)swish_handle ); XPUSHs( (SV *)meta_list ); XPUSHs( (SV *)"SWISH::API::MetaName"); PUTBACK ; /* lets perl know how many parameters are here */ call_pv("SWISH::API::push_meta_list", G_ARRAY ); SPAGAIN; # Returns an array of SWISH::API::MetaName objects void SwishPropertyList( swish_handle, index_name ) SW_HANDLE swish_handle char *index_name PREINIT: SWISH_META_LIST meta_list; PPCODE: /* Grab the list of pointers */ meta_list = SwishPropertyList( swish_handle, index_name ); PUSHMARK(SP) ; XPUSHs( (SV *)swish_handle ); XPUSHs( (SV *)meta_list ); XPUSHs( (SV *)"SWISH::API::PropertyName"); PUTBACK ; call_pv("SWISH::API::push_meta_list", G_ARRAY ); SPAGAIN; void push_meta_list( s_handle, m_list, m_class ) SV *s_handle SV *m_list SV *m_class PREINIT: SW_HANDLE swish_handle; SWISH_META_LIST meta_list; char *class; PPCODE: class = (char *)m_class; swish_handle = (SW_HANDLE)s_handle; meta_list = (SWISH_META_LIST)m_list; /* Check for an error -- typically this would be an invalid index name */ /* Fix: calling with an invalid swish_handle will call progerr */ if ( SwishError( swish_handle ) ) croak("%s %s", SwishErrorString( swish_handle ), SwishLastErrorMsg( swish_handle ) ); /* Make sure a list is returned and it's not empty */ if ( !meta_list || !*meta_list ) XSRETURN_EMPTY; while ( *meta_list ) { SV *o; /* Create a new structure for storing the 
meta description and the parent SV */ META_OBJ *object = (META_OBJ *)safemalloc(sizeof(META_OBJ)); /* Store the meta entry */ object->meta = *meta_list; /* Store the and bump the swish_handle SV */ object->handle_sv = (SV *)SwishGetRefPtr( swish_handle ); SvREFCNT_inc( object->handle_sv ); /* And create the Perl object and assign the object to it */ o = sv_newmortal(); sv_setref_pv( o, class, (void *)object ); /* and push onto list */ XPUSHs( o ); meta_list++; } # Misc utility routines void SwishWordsByLetter(handle, filename, c) SW_HANDLE handle char *filename char c PREINIT: char *Words,*tmp; int c2; PPCODE: if(c=='*') { for(c2=1;c2<256;c2++) { Words=(char *)SwishWordsByLetter(handle,filename,(unsigned char)c2); for(tmp=Words;tmp && tmp[0];tmp+=strlen(tmp)+1) { XPUSHs(sv_2mortal(newSVpv(tmp,0))); } } } else { Words=(char *)SwishWordsByLetter(handle,filename,c); for(tmp=Words;tmp && tmp[0];tmp+=strlen(tmp)+1) { XPUSHs(sv_2mortal(newSVpv(tmp,0))); } } char * SwishStemWord(handle, word) SW_HANDLE handle char *word # ************************************************************** # # SWISH::API::Search # # *************************************************************** MODULE = SWISH::API PACKAGE = SWISH::API::Search PREFIX = Swish void DESTROY(search) SW_SEARCH search CODE: if ( search ) { SV *parent = (SV *)SwishSearch_parent( search ); Free_Search_Object( search ); SvREFCNT_dec( parent ); } void SwishSetQuery(search,query) SW_SEARCH search char * query void SwishSetStructure(search, structure) SW_SEARCH search int structure void SwishPhraseDelimiter(search, delimiter) SW_SEARCH search char * delimiter CODE: SwishPhraseDelimiter(search, delimiter[0] ); void SwishSetSearchLimit(search, property, low, high) SW_SEARCH search char * property char * low char * high void SwishResetSearchLimit(search) SW_SEARCH search void SwishSetSort(search, sort_string) SW_SEARCH search char *sort_string # Returns a SW_RESULTS object void SwishExecute( search, query = NULL ) SW_SEARCH 
search char *query PREINIT: char * CLASS = "SWISH::API::Results"; SW_RESULTS results; PPCODE: results = SwishExecute( search, query ); { SvREFCNT_inc( (SV *)SwishResults_parent( results ) ); ST(0) = sv_newmortal(); sv_setref_pv( ST(0), CLASS, (void *)results ); ResultsSetRefPtr( results, (void *)SvRV(ST(0)) ); XSRETURN(1); } # ************************************************************** # # SWISH::API::Results # # *************************************************************** MODULE = SWISH::API PACKAGE = SWISH::API::Results PREFIX = Swish void DESTROY(results) SW_RESULTS results CODE: if ( results ) { SV *parent = (SV *)SwishResults_parent( results ); Free_Results_Object( results ); SvREFCNT_dec( parent ); } int SwishHits(self) SW_RESULTS self int SwishSeekResult(self, position) SW_RESULTS self int position SW_RESULT SwishNextResult(results) SW_RESULTS results PREINIT: char * CLASS = "SWISH::API::Result"; CODE: RETVAL = SwishNextResult(results); if ( RETVAL ) SvREFCNT_inc( (SV *)SwishResult_parent( RETVAL )); OUTPUT: RETVAL void SwishRemovedStopwords(results, index_name) SW_RESULTS results char * index_name PREINIT: SW_HANDLE swish_handle; SWISH_HEADER_TYPE header_type; SWISH_HEADER_VALUE head_value; int i; PPCODE: swish_handle = SW_ResultsToSW_HANDLE( results ); header_type = SWISH_LIST; head_value = SwishRemovedStopwords( results, index_name ); PUSHMARK(SP); XPUSHs((SV *)swish_handle); XPUSHs((SV *)&head_value); XPUSHs((SV *)&header_type); PUTBACK; i = call_pv( "SWISH::API::decode_header_value", G_ARRAY ); SPAGAIN; # PUTBACK; void SwishParsedWords(results, index_name) SW_RESULTS results char * index_name PREINIT: SW_HANDLE swish_handle; SWISH_HEADER_TYPE header_type; SWISH_HEADER_VALUE head_value; int i; PPCODE: swish_handle = SW_ResultsToSW_HANDLE( results ); header_type = SWISH_LIST; head_value = SwishParsedWords( results, index_name ); PUSHMARK(SP); XPUSHs((SV *)swish_handle); XPUSHs((SV *)&head_value); XPUSHs((SV *)&header_type); PUTBACK; i = call_pv( 
"SWISH::API::decode_header_value", G_ARRAY ); SPAGAIN; # PUTBACK; # ************************************************************** # # SWISH::API::Result (single result) # # # *************************************************************** MODULE = SWISH::API PACKAGE = SWISH::API::Result PREFIX = Swish void DESTROY(result) SW_RESULT result CODE: if ( result ) { SV *parent = (SV *)SwishResult_parent(result); SvREFCNT_dec( parent ); } void SwishProperty(result, property) SW_RESULT result char *property PREINIT: PropValue *pv; PPCODE: # This will abort swish-e if result is NULL pv = getResultPropValue( result, property, 0 ); if ( !pv ) { # this is always the case SW_HANDLE h = SW_ResultToSW_HANDLE( result ); if ( SwishError( h ) ) croak("%s %s", SwishErrorString( h ), SwishLastErrorMsg( h ) ); XSRETURN_UNDEF; } switch (pv->datatype) { case PROP_INTEGER: PUSHs(sv_2mortal(newSViv(pv->value.v_int))); break; case PROP_ULONG: PUSHs(sv_2mortal(newSViv(pv->value.v_ulong))); break; case PROP_STRING: PUSHs(sv_2mortal(newSVpv(pv->value.v_str,0))); break; case PROP_DATE: PUSHs(sv_2mortal(newSViv(pv->value.v_date))); break; case PROP_UNDEFINED: freeResultPropValue(pv); XSRETURN_UNDEF; break; default: croak("Unknown property data type '%d' for property '%s'\n", pv->datatype, property); } freeResultPropValue(pv); char * SwishResultPropertyStr( result, pname) SW_RESULT result char * pname void SwishResultIndexValue(self, header_name) SW_RESULT self char * header_name PREINIT: SW_HANDLE swish_handle; SWISH_HEADER_TYPE header_type; SWISH_HEADER_VALUE head_value; int i; PPCODE: swish_handle = SW_ResultToSW_HANDLE( self ); head_value = SwishResultIndexValue( self, header_name, &header_type ); PUSHMARK(SP); XPUSHs((SV *)swish_handle); XPUSHs((SV *)&head_value); XPUSHs((SV *)&header_type); PUTBACK; i = call_pv( "SWISH::API::decode_header_value", G_ARRAY ); SPAGAIN; # PUTBACK; # This returns a fuzzy word object based on the result SW_FUZZYWORD SwishFuzzyWord(result, word) SW_RESULT result 
char *word PREINIT: char * CLASS = "SWISH::API::FuzzyWord"; CODE: RETVAL = SwishFuzzyWord(result,word); OUTPUT: RETVAL # This returns the name of the stemmer used for this index const char* SwishFuzzyMode(result) SW_RESULT result MODULE = SWISH::API PACKAGE = SWISH::API::Result PREFIX = SwishResult void SwishResultMetaList(result) SW_RESULT result PREINIT: SWISH_META_LIST meta_list; SW_HANDLE swish_handle; PPCODE: meta_list = SwishResultMetaList( result ); swish_handle = SW_ResultToSW_HANDLE( result ); PUSHMARK(SP) ; XPUSHs( (SV *)swish_handle ); XPUSHs( (SV *)meta_list ); XPUSHs( (SV *)"SWISH::API::MetaName"); PUTBACK ; call_pv("SWISH::API::push_meta_list", G_ARRAY ); SPAGAIN; void SwishResultPropertyList(result) SW_RESULT result PREINIT: SWISH_META_LIST meta_list; SW_HANDLE swish_handle; PPCODE: meta_list = SwishResultPropertyList( result ); swish_handle = SW_ResultToSW_HANDLE( result ); PUSHMARK(SP) ; XPUSHs( (SV *)swish_handle ); XPUSHs( (SV *)meta_list ); XPUSHs( (SV *)"SWISH::API::PropertyName"); PUTBACK ; call_pv("SWISH::API::push_meta_list", G_ARRAY ); SPAGAIN; # ************************************************************** # # SWISH::API::FuzzyWord # # Methods for accessing a SW_FUZZYWORD structure # # *************************************************************** MODULE = SWISH::API PACKAGE = SWISH::API::FuzzyWord PREFIX = SwishFuzzy # method to automatically free memory when object goes out of scope void DESTROY(fw) SW_FUZZYWORD fw CODE: if ( fw ) SwishFuzzyWordFree( fw ); # returns number of words in the fuzzy structure int SwishFuzzyWordCount( fw ) SW_FUZZYWORD fw # returns return value from stemmer int SwishFuzzyWordError( fw ) SW_FUZZYWORD fw # returns an array of stemmed (or not) words. # The "or not" is because the word might not have been stemmed. 
void SwishFuzzyWordList( fw ) SW_FUZZYWORD fw PREINIT: const char **list; PPCODE: list = SwishFuzzyWordList( fw ); while ( *list ) { XPUSHs(sv_2mortal( newSVpv( (char *)*list, 0 ) )); list++; } # ******************************************************************** # # SWISH::API::MetaName # # Methods for accessing data about metanames # # ******************************************************************** MODULE = SWISH::API PACKAGE = SWISH::API::MetaName PREFIX = SwishMeta void DESTROY ( self ) META_OBJ *self CODE: SvREFCNT_dec( self->handle_sv ); safefree( self ); const char * SwishMetaName( meta ) SW_META meta int SwishMetaType( meta ) SW_META meta int SwishMetaID( meta ) SW_META meta # ******************************************************************** # # SWISH::API::PropertyName # # Methods for accessing data about metanames # Should set a base class for both, but they are small classes # and may want different behavior in the future. # # ******************************************************************** MODULE = SWISH::API PACKAGE = SWISH::API::PropertyName PREFIX = SwishMeta void DESTROY ( self ) META_OBJ *self CODE: SvREFCNT_dec( self->handle_sv ); safefree( self ); const char * SwishMetaName( meta ) SW_META meta int SwishMetaType( meta ) SW_META meta int SwishMetaID( meta ) SW_META meta swish-e-2.4.7/man/0000777000077100017500000000000011166013171010662 500000000000000swish-e-2.4.7/man/SWISH-FAQ.10000664000077100017500000020115011166010472012164 00000000000000.\" Automatically generated by Pod::Man v1.37, Pod::Parser v1.14 .\" .\" Standard preamble: .\" ======================================================================== .de Sh \" Subsection heading .br .if t .Sp .ne 5 .PP \fB\\$1\fR .PP .. .de Sp \" Vertical space (when we can't use .PP) .if t .sp .5v .if n .sp .. .de Vb \" Begin verbatim text .ft CW .nf .ne \\$1 .. .de Ve \" End verbatim text .ft R .fi .. .\" Set up some character translations and predefined strings. 
\*(-- will .\" give an unbreakable dash, \*(PI will give pi, \*(L" will give a left .\" double quote, and \*(R" will give a right double quote. | will give a .\" real vertical bar. \*(C+ will give a nicer C++. Capital omega is used to .\" do unbreakable dashes and therefore won't be available. \*(C` and \*(C' .\" expand to `' in nroff, nothing in troff, for use with C<>. .tr \(*W-|\(bv\*(Tr .ds C+ C\v'-.1v'\h'-1p'\s-2+\h'-1p'+\s0\v'.1v'\h'-1p' .ie n \{\ . ds -- \(*W- . ds PI pi . if (\n(.H=4u)&(1m=24u) .ds -- \(*W\h'-12u'\(*W\h'-12u'-\" diablo 10 pitch . if (\n(.H=4u)&(1m=20u) .ds -- \(*W\h'-12u'\(*W\h'-8u'-\" diablo 12 pitch . ds L" "" . ds R" "" . ds C` "" . ds C' "" 'br\} .el\{\ . ds -- \|\(em\| . ds PI \(*p . ds L" `` . ds R" '' 'br\} .\" .\" If the F register is turned on, we'll generate index entries on stderr for .\" titles (.TH), headers (.SH), subsections (.Sh), items (.Ip), and index .\" entries marked with X<> in POD. Of course, you'll have to process the .\" output yourself in some meaningful fashion. .if \nF \{\ . de IX . tm Index:\\$1\t\\n%\t"\\$2" .. . nr % 0 . rr F .\} .\" .\" For nroff, turn off justification. Always turn off hyphenation; it makes .\" way too many mistakes in technical documents. .hy 0 .if n .na .\" .\" Accent mark definitions (@(#)ms.acc 1.5 88/02/08 SMI; from UCB 4.2). .\" Fear. Run. Save yourself. No user-serviceable parts. . \" fudge factors for nroff and troff .if n \{\ . ds #H 0 . ds #V .8m . ds #F .3m . ds #[ \f1 . ds #] \fP .\} .if t \{\ . ds #H ((1u-(\\\\n(.fu%2u))*.13m) . ds #V .6m . ds #F 0 . ds #[ \& . ds #] \& .\} . \" simple accents for nroff and troff .if n \{\ . ds ' \& . ds ` \& . ds ^ \& . ds , \& . ds ~ ~ . ds / .\} .if t \{\ . ds ' \\k:\h'-(\\n(.wu*8/10-\*(#H)'\'\h"|\\n:u" . ds ` \\k:\h'-(\\n(.wu*8/10-\*(#H)'\`\h'|\\n:u' . ds ^ \\k:\h'-(\\n(.wu*10/11-\*(#H)'^\h'|\\n:u' . ds , \\k:\h'-(\\n(.wu*8/10)',\h'|\\n:u' . ds ~ \\k:\h'-(\\n(.wu-\*(#H-.1m)'~\h'|\\n:u' . 
ds / \\k:\h'-(\\n(.wu*8/10-\*(#H)'\z\(sl\h'|\\n:u' .\} . \" troff and (daisy-wheel) nroff accents .ds : \\k:\h'-(\\n(.wu*8/10-\*(#H+.1m+\*(#F)'\v'-\*(#V'\z.\h'.2m+\*(#F'.\h'|\\n:u'\v'\*(#V' .ds 8 \h'\*(#H'\(*b\h'-\*(#H' .ds o \\k:\h'-(\\n(.wu+\w'\(de'u-\*(#H)/2u'\v'-.3n'\*(#[\z\(de\v'.3n'\h'|\\n:u'\*(#] .ds d- \h'\*(#H'\(pd\h'-\w'~'u'\v'-.25m'\f2\(hy\fP\v'.25m'\h'-\*(#H' .ds D- D\\k:\h'-\w'D'u'\v'-.11m'\z\(hy\v'.11m'\h'|\\n:u' .ds th \*(#[\v'.3m'\s+1I\s-1\v'-.3m'\h'-(\w'I'u*2/3)'\s-1o\s+1\*(#] .ds Th \*(#[\s+2I\s-2\h'-\w'I'u*3/5'\v'-.3m'o\v'.3m'\*(#] .ds ae a\h'-(\w'a'u*4/10)'e .ds Ae A\h'-(\w'A'u*4/10)'E . \" corrections for vroff .if v .ds ~ \\k:\h'-(\\n(.wu*9/10-\*(#H)'\s-2\u~\d\s+2\h'|\\n:u' .if v .ds ^ \\k:\h'-(\\n(.wu*10/11-\*(#H)'\v'-.4m'^\v'.4m'\h'|\\n:u' . \" for low resolution devices (crt and lpr) .if \n(.H>23 .if \n(.V>19 \ \{\ . ds : e . ds 8 ss . ds o a . ds d- d\h'-1'\(ga . ds D- D\h'-1'\(hy . ds th \o'bp' . ds Th \o'LP' . ds ae ae . ds Ae AE .\} .rm #[ #] #H #V #F C .\" ======================================================================== .\" .IX Title "SWISH-FAQ 1" .TH SWISH-FAQ 1 "2009-04-04" "2.4.7" "SWISH-E Documentation" .SH "NAME" The Swish\-e FAQ \- Answers to Common Questions .SH "OVERVIEW" .IX Header "OVERVIEW" List of commonly asked and answered questions. Please review this document before asking questions on the Swish-e discussion list. .Sh "General Questions" .IX Subsection "General Questions" \fIWhat is Swish\-e?\fR .IX Subsection "What is Swish-e?" .PP Swish-e is \fBS\fRimple \fBW\fReb \fBI\fRndexing \fBS\fRystem for \fBH\fRumans \- \&\fBE\fRnhanced. With it, you can quickly and easily index directories of files or remote web sites and search the generated indexes for words and phrases. .PP \fISo, is Swish-e a search engine?\fR .IX Subsection "So, is Swish-e a search engine?" .PP Well, yes. Probably the most common use of Swish-e is to provide a search engine for web sites. 
The Swish-e distribution includes \s-1CGI\s0 scripts that can be used with it to add a \fIsearch engine\fR for your web site. The \s-1CGI\s0 scripts can be found in the \fIexample\fR directory of the distribution package. See the \fI\s-1README\s0\fR file for information about the scripts. .PP But Swish-e can also be used to index all sorts of data, such as email messages, data stored in a relational database management system, \&\s-1XML\s0 documents, or documents such as Word and \s-1PDF\s0 documents \*(-- or any combination of those sources at the same time. Searches can be limited to fields or \fIMetaNames\fR within a document, or limited to areas within an \s-1HTML\s0 document (e.g. body, title). Programs other than \s-1CGI\s0 applications can use Swish\-e, as well. .PP \fIShould I upgrade if I'm already running a previous version of Swish\-e?\fR .IX Subsection "Should I upgrade if I'm already running a previous version of Swish-e?" .PP A large number of bug fixes, feature additions, and logic corrections were made in version 2.2. In addition, indexing speed has been drastically improved (reports of indexing times changing from four hours to five minutes), and major parts of the indexing and search parsers have been rewritten. There are better debugging options, enhanced output formats, more document meta data (e.g. last modified date, document summary), options for indexing from external data sources, and faster spidering, just to name a few changes. (See the \s-1CHANGES\s0 file for more information.) .PP Since so much effort has gone into version 2.2, support for previous versions will probably be limited. .PP \fIAre there binary distributions available for Swish-e on platform foo?\fR .IX Subsection "Are there binary distributions available for Swish-e on platform foo?" .PP Foo? Well, yes there are some binary distributions available. Please see the Swish-e web site for a list at http://swish\-e.org/. 
.PP In general, it is recommended that you build Swish-e from source, if possible. .PP \fIDo I need to reindex my site each time I upgrade to a new Swish-e version?\fR .IX Subsection "Do I need to reindex my site each time I upgrade to a new Swish-e version?" .PP At times it might not strictly be necessary, but since you don't really know if anything in the index has changed, it is a good rule to reindex. .PP \fIWhat's the advantage of using the libxml2 library for parsing \s-1HTML\s0?\fR .IX Subsection "What's the advantage of using the libxml2 library for parsing HTML?" .PP Swish-e may be linked with libxml2, a library for working with \s-1HTML\s0 and \s-1XML\s0 documents. Swish-e can use libxml2 for parsing \s-1HTML\s0 and \s-1XML\s0 documents. .PP The libxml2 parser is a better parser than Swish\-e's built-in \s-1HTML\s0 parser. It offers more features, and it does a much better job at extracting out the text from a web page. In addition, you can use the \&\f(CW\*(C`ParserWarningLevel\*(C'\fR configuration setting to find structural errors in your documents that could (and would with Swish\-e's \s-1HTML\s0 parser) cause documents to be indexed incorrectly. .PP Libxml2 is not required, but is strongly recommended for parsing \s-1HTML\s0 documents. It's also recommended for parsing \s-1XML\s0, as it offers many more features than the internal Expat xml.c parser. .PP The internal \s-1HTML\s0 parser will have limited support, and does have a number of bugs. For example, \s-1HTML\s0 entities may not always be correctly converted and properties do not have entities converted. The internal parser tends to get confused when invalid \s-1HTML\s0 is parsed where the libxml2 parser doesn't get confused as often. The structure is better detected with the libxml2 parser. 
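.PP
As a minimal sketch of how the libxml2 parsers might be selected, assuming a Swish\-e binary that was built with libxml2 (the file extensions below are only illustrative), the configuration file could contain:
.PP
.Vb 3
\&    IndexContents HTML2 .htm .html
\&    IndexContents XML2 .xml
\&    DefaultContents HTML2
.Ve
.PP
Here \f(CW\*(C`HTML2\*(C'\fR and \f(CW\*(C`XML2\*(C'\fR are the names of the libxml2\-based parsers; check SWISH-CONFIG for the parser names accepted by your version.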
.PP If you are using the Perl module (the C interface to the Swish-e library) you may wish to build two versions of Swish\-e, one with the libxml2 library linked in the binary, and one without, and build the Perl module against the library without the libxml2 code. This is to save space in the library. Hopefully, the library will someday soon be split into indexing and searching code (volunteers welcome). .PP \fIDoes Swish-e include a \s-1CGI\s0 interface?\fR .IX Subsection "Does Swish-e include a CGI interface?" .PP Yes. Kind of. .PP There are two example \s-1CGI\s0 scripts included, swish.cgi and search.cgi. Both are installed at \fI$prefix/lib/swish\-e\fR. .PP Both require a bit of work to set up and use. Swish.cgi is probably what most people will want to use as it contains more features. Search.cgi is for those that want to start with a small script and customize it to fit their needs. .PP An example of using swish.cgi is given in the \s-1INSTALL\s0 man page, and in the swish.cgi documentation. As is often the case, it will be easier to use if you first read the documentation. .PP Please use caution about \s-1CGI\s0 scripts found on the Internet for use with Swish\-e. Some are not secure. .PP The included example \s-1CGI\s0 scripts were designed with security in mind. Regardless, you are encouraged to have your local Perl expert review them (and all other \s-1CGI\s0 scripts you use) before placing them into production. This is just a good policy to follow. .PP \fIHow secure is Swish\-e?\fR .IX Subsection "How secure is Swish-e?" .PP We know of no security issues with using Swish\-e. Careful attention has been paid to common security problems such as buffer overruns when programming Swish\-e. .PP The most likely security issue with Swish-e is when it is run via a poorly written \s-1CGI\s0 interface. This is not limited to \s-1CGI\s0 scripts written in Perl, as it's just as easy to write an insecure \s-1CGI\s0 script in C, Java, \s-1PHP\s0, or Python. 
A good source of information is included with the Perl distribution. Type \f(CW\*(C`perldoc perlsec\*(C'\fR at your local prompt for more information. Another must-read document is located at \&\f(CW\*(C`http://www.w3.org/Security/faq/wwwsf4.html\*(C'\fR. .PP Note that there are many \fIfree\fR yet insecure and poorly written \s-1CGI\s0 scripts available \*(-- even some designed for use with Swish\-e. Please carefully review any \s-1CGI\s0 script you use. Free is not such a good price when you get your server hacked... .PP \fIShould I run Swish-e as the superuser (root)?\fR .IX Subsection "Should I run Swish-e as the superuser (root)?" .PP No. Never. .PP \fIWhat files does Swish-e write?\fR .IX Subsection "What files does Swish-e write?" .PP Swish writes the index file, of course. This is specified with the \&\f(CW\*(C`IndexFile\*(C'\fR configuration directive or by the \f(CW\*(C`\-f\*(C'\fR command line switch. .PP The index file is actually a collection of files, but all start with the file name specified with the \f(CW\*(C`IndexFile\*(C'\fR directive or the \f(CW\*(C`\-f\*(C'\fR command line switch. .PP For example, the file ending in \fI.prop\fR contains the document properties. .PP When creating the index files Swish-e appends the extension \fI.temp\fR to the index file names. When indexing is complete Swish-e renames the \&\fI.temp\fR files to the index files specified by \f(CW\*(C`IndexFile\*(C'\fR or \f(CW\*(C`\-f\*(C'\fR. This is done so that existing indexes remain untouched until it completes indexing. .PP Swish-e also writes temporary files in some cases during indexing (e.g. \f(CW\*(C`\-s http\*(C'\fR, \f(CW\*(C`\-s prog\*(C'\fR with filters), when merging, and when using \f(CW\*(C`\-e\*(C'\fR). Temporary files are created with the \fImkstemp\fR\|(3) function (with 0600 permission on unix-like operating systems). 
.PP The temporary files are created in the directory specified by the environment variables \f(CW\*(C`TMPDIR\*(C'\fR and \f(CW\*(C`TMP\*(C'\fR in that order. If those are not set, Swish-e uses the \f(CW\*(C`TmpDir\*(C'\fR configuration setting. Otherwise, the temporary files will be located in the current directory. .PP \fICan I index \s-1PDF\s0 and MS-Word documents?\fR .IX Subsection "Can I index PDF and MS-Word documents?" .PP Yes, you can use a \fIFilter\fR to convert documents while indexing, or you can use a program that \*(L"feeds\*(R" documents to Swish-e that have already been converted. See \f(CW\*(C`Indexing\*(C'\fR below. .PP \fICan I index documents on a web server?\fR .IX Subsection "Can I index documents on a web server?" .PP Yes, Swish-e provides two ways to index (spider) documents on a web server. See \f(CW\*(C`Spidering\*(C'\fR below. .PP Swish-e can retrieve documents from a file system or from a remote web server. It can also execute a program that returns documents to it. This program can retrieve documents from a database, filter compressed document files, convert \s-1PDF\s0 files, extract data from mail archives, or spider remote web sites. .PP \fICan I implement keywords in my documents?\fR .IX Subsection "Can I implement keywords in my documents?" .PP Yes, Swish-e can associate words with \fIMetaNames\fR while indexing, and you can limit your searches to these MetaNames while searching. .PP In your \s-1HTML\s0 files you can put keywords in \s-1HTML\s0 \s-1META\s0 tags or in \s-1XML\s0 blocks. 
.PP \&\s-1META\s0 tags can have two formats in your source documents: .PP .Vb 1 \& .Ve .PP And in \s-1XML\s0 format (can also be used in \s-1HTML\s0 documents when using libxml2): .PP .Vb 3 \& \& Some Content \& .Ve .PP Then, to inform Swish-e about the existence of the meta name in your documents, edit the line in your configuration file: .PP .Vb 1 \& MetaNames DC.subject meta1 meta2 .Ve .PP When searching you can now limit some or all search terms to that MetaName. For example, you can look for documents that contain the word apple and also have either fruit or cooking in the \s-1DC\s0.subject meta tag. .PP \fIWhat are document properties?\fR .IX Subsection "What are document properties?" .PP A document property is typically data that describes the document. For example, properties might include a document's path name, its last modified date, its title, or its size. Swish-e stores a document's properties in the index file, and they can be reported back in search results. .PP Swish-e also uses properties for sorting. You may sort your results by one or more properties, in ascending or descending order. .PP Properties can also be defined within your documents. \s-1HTML\s0 and \&\s-1XML\s0 files can specify tags (see previous question) as properties. The \fIcontents\fR of these tags can then be returned with search results. These user-defined properties can also be used for sorting search results. .PP For example, if you had the following in your documents .PP .Vb 1 \& <meta name="creator" content="accounting department"> .Ve .PP and \f(CW\*(C`creator\*(C'\fR is defined as a property (see \f(CW\*(C`PropertyNames\*(C'\fR in SWISH-CONFIG), Swish-e can return \f(CW\*(C`accounting department\*(C'\fR with the result for that document. .PP .Vb 1 \& swish-e -w foo -p creator .Ve .PP Or for sorting: .PP .Vb 1 \& swish-e -w foo -s creator .Ve .PP \fIWhat's the difference between MetaNames and PropertyNames?\fR .IX Subsection "What's the difference between MetaNames and PropertyNames?" 
.PP MetaNames allow keyword searches in your documents. That is, you can use MetaNames to restrict searches to just parts of your documents. .PP PropertyNames, on the other hand, define text that can be returned with results, and can be used for sorting. .PP Both use \fImeta tags\fR found in your documents (as shown in the above two questions) to define the text you wish to use as a property or meta name. .PP You may define a tag as \fBboth\fR a property and a meta name. For example: .PP .Vb 1 \& <meta name="creator" content="accounting department"> .Ve .PP placed in your documents and then using configuration settings of: .PP .Vb 2 \& PropertyNames creator \& MetaNames creator .Ve .PP will allow you to limit your searches to documents created by accounting: .PP .Vb 1 \& swish-e -w 'foo and creator=(accounting)' .Ve .PP That will find all documents with the word \f(CW\*(C`foo\*(C'\fR that also have a creator meta tag that contains the word \f(CW\*(C`accounting\*(C'\fR. This is using MetaNames. .PP And you can also say: .PP .Vb 1 \& swish-e -w foo -p creator .Ve .PP which will return all documents with the word \f(CW\*(C`foo\*(C'\fR, but the results will also include the contents of the \f(CW\*(C`creator\*(C'\fR meta tag along with results. This is using properties. .PP You can use properties and meta names at the same time, too: .PP .Vb 1 \& swish-e -w 'creator=(accounting or marketing)' -p creator -s creator .Ve .PP That searches only in the \f(CW\*(C`creator\*(C'\fR \fImeta name\fR for either of the words \&\f(CW\*(C`accounting\*(C'\fR or \f(CW\*(C`marketing\*(C'\fR, prints out the contents of the \f(CW\*(C`creator\*(C'\fR \fIproperty\fR, and sorts the results by the \f(CW\*(C`creator\*(C'\fR \&\fIproperty name\fR. .PP (See also the \f(CW\*(C`\-x\*(C'\fR output format switch in SWISH-RUN.) .PP \fICan Swish-e index multi-byte characters?\fR .IX Subsection "Can Swish-e index multi-byte characters?" .PP No. This will require much work to change. 
But, Swish-e works with eight-bit characters, so many characters sets can be used. Note that it does call the ANSI-C \fItolower()\fR function which does depend on the current locale setting. See \f(CWlocale(7)\fR for more information. .Sh "Indexing" .IX Subsection "Indexing" \fIHow do I pass Swish-e a list of files to index?\fR .IX Subsection "How do I pass Swish-e a list of files to index?" .PP Currently, there is not a configuration directive to include a file that contains a list of files to index. But, there is a directive to include another configuration file. .PP .Vb 1 \& IncludeConfigFile /path/to/other/config .Ve .PP And in \f(CW\*(C`/path/to/other/config\*(C'\fR you can say: .PP .Vb 2 \& IndexDir file1 file2 file3 file4 file5 ... \& IndexDir file20 file21 file22 .Ve .PP You may also specify more than one configuration file on the command line: .PP .Vb 1 \& ./swish-e -c config_one config_two config_three .Ve .PP Another option is to create a directory with symbolic links of the files to index, and index just that directory. .PP \fIHow does Swish-e know which parser to use?\fR .IX Subsection "How does Swish-e know which parser to use?" .PP Swish can parse \s-1HTML\s0, \s-1XML\s0, and text documents. The parser is set by associating a file extension with a parser by the \f(CW\*(C`IndexContents\*(C'\fR directive. You may set the default parser with the \f(CW\*(C`DefaultContents\*(C'\fR directive. If a document is not assigned a parser it will default to the \s-1HTML\s0 parser (\s-1HTML2\s0 if built with libxml2). .PP You may use Filters or an external program to convert documents to \s-1HTML\s0, \&\s-1XML\s0, or text. .PP \fICan I reindex and search at the same time?\fR .IX Subsection "Can I reindex and search at the same time?" .PP Yes. Starting with version 2.2 Swish-e indexes to temporary files, and then renames the files when indexing is complete. On most systems renames are atomic. 
But, since Swish-e also generates more than one file during indexing there will be a very short period of time between renaming the various files when the index is out of sync. .PP Settings in \fIsrc/config.h\fR control some options related to temporary files, and their use during indexing. .PP \fICan I index phrases?\fR .IX Subsection "Can I index phrases?" .PP Phrases are indexed automatically. To search for a phrase simply place double quotes around the phrase. .PP For example: .PP .Vb 1 \& swish-e -w 'free and "fast search engine"' .Ve .PP \fIHow can I prevent phrases from matching across sentences?\fR .IX Subsection "How can I prevent phrases from matching across sentences?" .PP Use the BumpPositionCounterCharacters configuration directive. .PP \fISwish-e isn't indexing a certain word or phrase.\fR .IX Subsection "Swish-e isn't indexing a certain word or phrase." .PP There are a number of configuration parameters that control what Swish-e considers a \*(L"word\*(R" and it has a debugging feature to help pinpoint any indexing problems. .PP Configuration file directives (SWISH-CONFIG) \&\f(CW\*(C`WordCharacters\*(C'\fR, \f(CW\*(C`BeginCharacters\*(C'\fR, \f(CW\*(C`EndCharacters\*(C'\fR, \&\f(CW\*(C`IgnoreFirstChar\*(C'\fR, and \f(CW\*(C`IgnoreLastChar\*(C'\fR are the main settings that Swish-e uses to define a \*(L"word\*(R". See SWISH-CONFIG and SWISH-RUN for details. .PP Swish-e also uses compile-time defaults for many settings. These are located in \fIsrc/config.h\fR file. .PP Use of the command line arguments \f(CW\*(C`\-k\*(C'\fR, \f(CW\*(C`\-v\*(C'\fR and \f(CW\*(C`\-T\*(C'\fR are useful when debugging these problems. Using \f(CW\*(C`\-T INDEXED_WORDS\*(C'\fR while indexing will display each word as it is indexed. You should specify one file when using this feature since it can generate a lot of output. 
.PP .Vb 1 \& ./swish-e -c my.conf -i problem.file -T INDEXED_WORDS .Ve .PP You may also wish to index a single file that contains words that are or are not indexing as you expect and use \-T to output debugging information about the index. A useful command might be: .PP .Vb 1 \& ./swish-e -f index.swish-e -T INDEX_FULL .Ve .PP Once you see how Swish-e is parsing and indexing your words, you can adjust the configuration settings mentioned above to control what words are indexed. .PP Another useful command might be: .PP .Vb 1 \& ./swish-e -c my.conf -i problem.file -T PARSED_WORDS INDEXED_WORDS .Ve .PP This will show white-spaced words parsed from the document (\s-1PARSED_WORDS\s0), and how those words are split up into separate words for indexing (\s-1INDEXED_WORDS\s0). .PP \fIHow do I keep Swish-e from indexing numbers?\fR .IX Subsection "How do I keep Swish-e from indexing numbers?" .PP Swish-e indexes words as defined by the \f(CW\*(C`WordCharacters\*(C'\fR setting, as described above. So to avoid indexing numbers you simply remove digits from the \f(CW\*(C`WordCharacters\*(C'\fR setting. .PP There are also some settings in \fIsrc/config.h\fR that control what \*(L"words\*(R" are indexed. You can configure swish to never index words that are all digits, vowels, or consonants, or that contain more than some consecutive number of digits, vowels, or consonants. In general, you won't need to change these settings. .PP Also, there's an experimental feature called \f(CW\*(C`IgnoreNumberChars\*(C'\fR which allows you to define a set of characters that describe a number. If a word is made up of \fBonly\fR those characters it will not be indexed. .PP \fISwish-e crashes and burns on a certain file. What can I do?\fR .IX Subsection "Swish-e crashes and burns on a certain file. What can I do?" .PP This shouldn't happen. If it does please post to the Swish-e discussion list the details so it can be reproduced by the developers. 
.PP In the meantime, you can use a \f(CW\*(C`FileRules\*(C'\fR directive to exclude the particular file by its file name, path name, or title. If there are serious problems in indexing certain types of files, they may not have valid text in them (they may be binary files, for instance). You can use NoContents to skip the contents of that type of file. .PP Swish-e will issue a warning if an embedded null character is found in a document. This warning is an indication that you are trying to index binary data. If you need to index binary files try to find a program that will extract the text (e.g. \fIstrings\fR\|(1), \fIcatdoc\fR\|(1), \fIpdftotext\fR\|(1)). .PP \fIHow do I prevent indexing of some documents?\fR .IX Subsection "How do I prevent indexing of some documents?" .PP When using the file system to index your files you can use the \&\f(CW\*(C`FileRules\*(C'\fR directive. Other than \f(CW\*(C`FileRules title\*(C'\fR, \f(CW\*(C`FileRules\*(C'\fR only works with the file system (\f(CW\*(C`\-S fs\*(C'\fR) indexing method, not with \&\f(CW\*(C`\-S prog\*(C'\fR or \f(CW\*(C`\-S http\*(C'\fR. .PP If you are spidering a site you have control over, use a \fIrobots.txt\fR file in your document root. This is a standard way to exclude files from search engines, and is fully supported by Swish\-e. See http://www.robotstxt.org/ .PP If spidering a website with the included \fIspider.pl\fR program then add any necessary tests to the spider's configuration file. Type \f(CW\*(C`perldoc spider.pl\*(C'\fR in the \f(CW\*(C`prog\-bin\*(C'\fR directory for details or see the spider documentation on the Swish-e website. Look for the section on callback functions. .PP If using the libxml2 library for parsing \s-1HTML\s0 (which you probably are), you may also use the Meta Robots Exclusion in your documents: .PP .Vb 1 \& <meta name="robots" content="noindex"> .Ve .PP See the obeyRobotsNoIndex directive. .PP \fIHow do I prevent indexing parts of a document?\fR .IX Subsection "How do I prevent indexing parts of a document?"
.PP To prevent Swish-e from indexing a common header, footer, or navigation bar, \s-1AND\s0 you are using libxml2 for parsing \s-1HTML\s0, then you may use a fake \s-1HTML\s0 tag around the text you wish to ignore and use the \&\f(CW\*(C`IgnoreMetaTags\*(C'\fR directive. This will generate an error message if the \f(CW\*(C`ParserWarningLevel\*(C'\fR is set as it's invalid \s-1HTML\s0. .PP \&\f(CW\*(C`IgnoreMetaTags\*(C'\fR works with \s-1XML\s0 documents (and \s-1HTML\s0 documents when using libxml2 as the parser), but not with documents parsed by the text (\s-1TXT\s0) parser. .PP If you are using the libxml2 parser (\s-1HTML2\s0 and \s-1XML2\s0) then you can use the the following comments in your documents to prevent indexing: .PP .Vb 2 \& \& .Ve .PP and/or these may be used also: .PP .Vb 2 \& \& .Ve .PP \fIHow do I modify the path or \s-1URL\s0 of the indexed documents.\fR .IX Subsection "How do I modify the path or URL of the indexed documents." .PP Use the \f(CW\*(C`ReplaceRules\*(C'\fR configuration directive to rewrite path names and URLs. If you are using \f(CW\*(C`\-S prog\*(C'\fR input method you may set the path to any string. .PP \fIHow can I index data from a database?\fR .IX Subsection "How can I index data from a database?" .PP Use the \*(L"prog\*(R" document source method of indexing. Write a program to extract out the data from your database, and format it as \s-1XML\s0, \s-1HTML\s0, or text. See the examples in the \f(CW\*(C`prog\-bin\*(C'\fR directory, and the next question. .PP \fIHow do I index my \s-1PDF\s0, Word, and compressed documents?\fR .IX Subsection "How do I index my PDF, Word, and compressed documents?" .PP Swish-e can internally only parse \s-1HTML\s0, \s-1XML\s0 and \s-1TXT\s0 (text) files by default, but can make use of \fIfilters\fR that will convert other types of files such as \s-1MS\s0 Word documents, \s-1PDF\s0, or gzipped files into one of the file types that Swish-e understands. 
.PP Please see SWISH-CONFIG and the examples in the \fIfilters\fR and \fIfilter-bin\fR directories for more information. .PP See the next question to learn about the filtering options with Swish\-e. .PP \fIHow do I filter documents?\fR .IX Subsection "How do I filter documents?" .PP The term \*(L"filter\*(R" in Swish-e means the conversion of a document of one type (one that swish-e cannot index directly) into a type that Swish-e can index, namely \s-1HTML\s0, plain text, or \s-1XML\s0. To add to the confusion, there are a number of ways to accomplish this in Swish\-e. So here's a bit of background. .PP The FileFilter directive was added to swish first. This feature allows you to specify a program to run for documents that match a given file extension. For example, to filter \s-1PDF\s0 files (files that end in .pdf) you can specify the configuration setting of: .PP .Vb 1 \& FileFilter .pdf pdftotext "'%p' -" .Ve .PP which says to run the program \*(L"pdftotext\*(R" passing it the pathname of the file (%p) and a dash (which tells pdftotext to output to stdout). Then for each .pdf file Swish-e runs this program and reads the filtered document from the filter program's output. .PP This has the advantage that it is easy to set up \*(-- a single line in the config file is all that is needed to add the filter into Swish\-e. But it also has a number of problems. For example, if you use a Perl script to do your filtering it can be very slow since the filter script must be run (and thus compiled) for each processed document. This is exacerbated when using the \-S http method since the \-S http method also uses a Perl script that is run for every \s-1URL\s0 fetched. Also, when using the \-S prog method of input (reading input from a program) using FileFilter means that Swish-e must first read the file in from the external program and then write the file out to a temporary file before running the filter.
.PP With \-S prog it makes much more sense to filter the document in the program that is fetching the documents than to have swish-e read the file into memory, write it to a temporary file and then run an external program. .PP The Swish-e distribution contains a couple of example \-S prog programs. \fIspider.pl\fR is a reasonably full-featured web spider that offers many more options than the \-S http method. And it is much faster than running \-S http, too. .PP The spider has a Perl configuration file, which means you can add programming logic right into the configuration file without editing the spider program. One bit of logic that is provided in the spider's configuration file is a \*(L"call\-back\*(R" function that allows you to filter the content. In other words, before the spider passes a fetched web document to swish for indexing the spider can call a simple subroutine in the spider's configuration file, passing the document and its content type. The subroutine can then look at the content type and decide if the document needs to be filtered. .PP For example, when processing a document of type \*(L"application/msword\*(R" the call-back subroutine might call the doc2txt.pm Perl module, and a document of type \&\*(L"application/pdf\*(R" could use the pdf2html.pm module. The \fIprog\-bin/SwishSpiderConfig.pl\fR file shows this usage. .PP This system works reasonably well, but also means that more work is required to set up the filters. First, you must explicitly check for specific content types and then call the appropriate Perl module, and second, you have to know how each module must be called and how each returns the possibly modified content. .PP In comes SWISH::Filter. .PP To make things easier the SWISH::Filter Perl module was created. The idea of this module is that there is one interface used to filter all types of documents.
So instead of checking for specific types of content you just pass the content type and the document to the SWISH::Filter module and it returns a new content type and document if it was filtered. The filters that do the actual work are designed with a standard interface and work like filter \*(L"plug\-ins\*(R". Adding a new filter means just downloading the filter to a directory; no changes are needed to the spider's configuration file. Download a filter for PostScript and next time you run indexing your PostScript files will be indexed. .PP Since the filters are standardized, hopefully when you have the need to filter documents of a specific type there will already be a filter ready for your use. .PP Now, note that the Perl modules may or may not do the actual conversion of a document. For example, the \s-1PDF\s0 conversion module calls the pdfinfo and pdftotext programs. Those programs (part of the Xpdf package) must be installed separately from the filters. .PP The SwishSpiderConfig.pl example spider configuration file shows how to use the SWISH::Filter module for filtering. This file is installed at \f(CW$prefix\fR/share/doc/swish\-e/examples/prog\-bin, where \f(CW$prefix\fR is normally /usr/local on unix-type machines. .PP The SWISH::Filter method of filtering can also be used with the \-S http method of indexing. By default the \fIswishspider\fR program (the Perl helper script that fetches documents from the web) will attempt to use the SWISH::Filter module if it can be found in Perl's library path. This path is set automatically for spider.pl but not for swishspider (because it would slow down a method that's already slow and spider.pl is recommended over the \-S http method). .PP Therefore, all that's required to use this system with \-S http is setting the \f(CW@INC\fR array to point to the filter directory.
.PP For example, if the swish-e distribution was unpacked into ~/swish\-e: .PP .Vb 1 \& PERL5LIB=~/swish-e/filters swish-e -c conf -S http .Ve .PP will allow the \-S http method to make use of the SWISH::Filter module. .PP Note that if you are not using the SWISH::Filter module you may wish to edit the \fIswishspider\fR program and disable the use of the SWISH::Filter module using this setting: .PP .Vb 1 \& use constant USE_FILTERS => 0; # disable SWISH::Filter .Ve .PP This prevents the program from attempting to use the SWISH::Filter module for every non-text \&\s-1URL\s0 that is fetched. Of course, if you are concerned with indexing speed you should be using the \-S prog method with spider.pl instead of \-S http. .PP If you are not spidering, but you still want to make use of the SWISH::Filter module for filtering you can use the DirTree.pl program (in \f(CW$prefix\fR/lib/swish\-e). This is a simple program that traverses the file system and uses SWISH::Filter for filtering. .PP Here are two examples of how to run a filter program, one using Swish\-e's \&\f(CW\*(C`FileFilter\*(C'\fR directive, another using a \f(CW\*(C`prog\*(C'\fR input method program. See the \fISwishSpiderConfig.pl\fR file for an example of using the SWISH::Filter module. .PP These examples simply use the program \f(CW\*(C`/bin/cat\*(C'\fR as a filter and index only .html files. .PP First, using the \f(CW\*(C`FileFilter\*(C'\fR method, here's the entire configuration file (swish.conf): .PP .Vb 3 \& IndexDir . \& IndexOnly .html \& FileFilter .html "/bin/cat" "'%p'" .Ve .PP and index with the command .PP .Vb 1 \& swish-e -c swish.conf -v 1 .Ve .PP Now, the same thing using the \f(CW\*(C`\-S prog\*(C'\fR document source input method and a Perl program called catfilter.pl. You can see that it's much more work than using the \f(CW\*(C`FileFilter\*(C'\fR method above, but it provides a place to do additional processing. In this example, the \f(CW\*(C`prog\*(C'\fR method is only slightly faster.
But if you needed a Perl script to run as a FileFilter then \f(CW\*(C`prog\*(C'\fR will be significantly faster. .PP .Vb 3 \& #!/usr/local/bin/perl -w \& use strict; \& use File::Find; # for recursing a directory tree .Ve .PP .Vb 5 \& $/ = undef; # slurp mode: read each file in one go \& find( \& { wanted => \e&wanted, no_chdir => 1, }, \& '.', \& ); .Ve .PP .Vb 3 \& sub wanted { \& return if -d; \& return unless /\e.html$/; .Ve .PP .Vb 1 \& my $mtime = (stat)[9]; .Ve .PP .Vb 3 \& my $child = open( FH, '-|' ); \& die "Failed to fork $!" unless defined $child; \& exec '/bin/cat', $_ unless $child; .Ve .PP .Vb 3 \& my $content = <FH>; \& close FH; \& my $size = length $content; .Ve .PP .Vb 5 \& # headers expected by the prog input method, then a blank line \& print <<EOF; \& Content-Length: $size \& Last-Mtime: $mtime \& Path-Name: $_ .Ve .PP .Vb 1 \& EOF .Ve .PP .Vb 2 \& print $content; \& } .Ve .PP And index with the command: .PP .Vb 1 \& swish-e -S prog -i ./catfilter.pl -v 1 .Ve .PP This example will probably not work under Windows due to the '\-|' open. A simple piped open may work just as well: .PP That is, replace: .PP .Vb 3 \& my $child = open( FH, '-|' ); \& die "Failed to fork $!" unless defined $child; \& exec '/bin/cat', $_ unless $child; .Ve .PP with this: .PP .Vb 1 \& open( FH, "/bin/cat $_ |" ) or die $!; .Ve .PP Perl will try to avoid running the command through the shell if meta characters are not passed to the open. See \f(CW\*(C`perldoc \-f open\*(C'\fR for more information. .PP \fIEh, but I just want to know how to index \s-1PDF\s0 documents!\fR .IX Subsection "Eh, but I just want to know how to index PDF documents!" .PP See the examples in the \fIconf\fR directory and the comments in the \fISwishSpiderConfig.pl\fR file. .PP See the previous question for the details on filtering. The method you decide to use will depend on how fast you want to index, and your comfort level with using Perl modules. .PP Regardless of the filtering method you use you will need to install the Xpdf packages available from http://www.foolabs.com/xpdf/.
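.PP As a minimal sketch of the \f(CW\*(C`FileFilter\*(C'\fR route (an illustration only: the \fIdocs\fR directory name is hypothetical, and the \f(CW\*(C`IndexContents\*(C'\fR line assumes you want the converted output handled by the text parser), a configuration file might look something like this: .PP .Vb 5 \& # hypothetical directory name; adjust to your own tree \& IndexDir docs \& IndexOnly .pdf \& FileFilter .pdf pdftotext "'%p' -" \& IndexContents TXT .pdf .Ve .PP This assumes \fIpdftotext\fR can be found on your path. For large collections the \f(CW\*(C`\-S prog\*(C'\fR approach described in the previous question is likely to be faster.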
.PP \fII'm using Windows and can't get Filters or the prog input method to work!\fR .IX Subsection "I'm using Windows and can't get Filters or the prog input method to work!" .PP Both the \f(CW\*(C`\-S prog\*(C'\fR input method and filters use the \f(CW\*(C`popen()\*(C'\fR system call to run the external program. If your external program is, for example, a perl script, you have to tell Swish-e to run perl, instead of the script. Swish-e will convert forward slashes to backslashes when running under Windows. .PP For example, you would need to specify the path to perl as (assuming this is where perl is on your system): .PP .Vb 1 \& IndexDir e:/perl/bin/perl.exe .Ve .PP Or run a filter like: .PP .Vb 1 \& FileFilter .foo e:/perl/bin/perl.exe 'myscript.pl "%p"' .Ve .PP It's often easier to just install Linux. .PP \fIHow do I index non-English words?\fR .IX Subsection "How do I index non-English words?" .PP Swish-e indexes 8\-bit characters only. This is the \s-1ISO\s0 8859\-1 Latin\-1 character set, and includes many non-English letters (and symbols). As long as they are listed in \f(CW\*(C`WordCharacters\*(C'\fR they will be indexed. .PP Actually, you probably can index any 8\-bit character set, as long as you don't mix character sets in the same index and don't use libxml2 for parsing (see below). .PP The \f(CW\*(C`TranslateCharacters\*(C'\fR directive (SWISH-CONFIG) can translate characters while indexing and searching. You may specify the mapping of one character to another character with the \&\f(CW\*(C`TranslateCharacters\*(C'\fR directive. .PP \&\f(CW\*(C`TranslateCharacters :ascii7:\*(C'\fR is a predefined set of characters that will translate eight-bit characters to ascii7 characters. Using the \&\f(CW\*(C`:ascii7:\*(C'\fR rule will, for example, translate \*(L"Ääç\*(R" to \*(L"aac\*(R". This means: searching \*(L"Çelik\*(R", \*(L"çelik\*(R" or \*(L"celik\*(R" will all match the same word. 
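.PP As an illustration (a sketch, not a setting that suits every index), the predefined rule can be enabled with a single configuration line: .PP .Vb 1 \& TranslateCharacters :ascii7: .Ve .PP With that in place, and the index rebuilt using the same configuration, a search such as the following should match documents containing \*(L"Çelik\*(R" or \*(L"çelik\*(R" as well as \*(L"celik\*(R": .PP .Vb 1 \& swish-e -w celik .Ve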
.PP Note: When using libxml2 for parsing, parsed documents are converted internally (within libxml2) to \s-1UTF\-8\s0. This is converted to \s-1ISO\s0 8859\-1 Latin\-1 when indexing. In cases where a string cannot be converted from \s-1UTF\-8\s0 to \s-1ISO\s0 8859\-1 (because it contains non 8859\-1 characters), the string will be sent to Swish-e in \s-1UTF\-8\s0 encoding. This will result in some words being indexed incorrectly. Setting \f(CW\*(C`ParserWarningLevel\*(C'\fR to 1 or more will display warnings when \s-1UTF\-8\s0 to 8859\-1 conversion fails. .PP \fICan I add/remove files from an index?\fR .IX Subsection "Can I add/remove files from an index?" .PP Try building swish-e with the \f(CW\*(C`\-\-enable\-incremental\*(C'\fR option. .PP The rest of this \s-1FAQ\s0 applies to the default swish-e format. .PP Swish-e currently has no way to add or remove items from its index. But, Swish-e indexes so quickly that it's often possible to reindex the entire document set when a file needs to be added, modified or removed. If you are spidering a remote site then consider caching the documents locally in compressed form. .PP Incremental additions can be handled in a couple of ways, depending on your situation. It's probably easiest to create one main index every night (or every week), and then create an index of just the new files between main indexing jobs and use the \f(CW\*(C`\-f\*(C'\fR option to pass both indexes to Swish-e while searching. .PP You can merge the indexes into one index (instead of using \-f), but it's not clear that this has any advantage over searching multiple indexes. .PP How does one create the incremental index? .PP One method is by using the \f(CW\*(C`\-N\*(C'\fR switch to pass a file path to Swish-e when indexing. It will only index files that have a last modification date \f(CW\*(C`newer\*(C'\fR than the file supplied with the \f(CW\*(C`\-N\*(C'\fR switch.
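.PP A sketch of that approach, assuming hypothetical file names (\fIswish.conf\fR, \fImain.index\fR, \fInew.index\fR and a timestamp file \fIindex.stamp\fR that is touched just before the full indexing run starts): .PP .Vb 4 \& touch index.stamp # record the start time first \& swish-e -c swish.conf -f main.index # full index \& swish-e -c swish.conf -f new.index -N index.stamp # later: only files newer than the stamp \& swish-e -w foo -f main.index new.index # search both indexes .Ve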
.PP This option has the disadvantage that Swish-e must process every file in every directory as if they were going to be indexed (the test for \f(CW\*(C`\-N\*(C'\fR is done last right before indexing of the file contents begin and after all other tests on the file have been completed) \*(-- all that just to find a few new files. .PP Also, if you use the Swish-e index file as the file passed to \f(CW\*(C`\-N\*(C'\fR there may be files that were added after indexing was started, but before the index file was written. This could result in a file not being added to the index. .PP Another option is to maintain a parallel directory tree that contains symlinks pointing to the main files. When a new file is added (or changed) to the main directory tree you create a symlink to the real file in the parallel directory tree. Then just index the symlink directory to generate the incremental index. .PP This option has the disadvantage that you need to have a central program that creates the new files that can also create the symlinks. But, indexing is quite fast since Swish-e only has to look at the files that need to be indexed. When you run full indexing you simply unlink (delete) all the symlinks. .PP Both of these methods have issues where files could end up in both indexes, or files being left out of an index. Use of file locks while indexing, and hash lookups during searches can help prevent these problems. .PP \fII run out of memory trying to index my files.\fR .IX Subsection "I run out of memory trying to index my files." .PP It's true that indexing can take up a lot of memory! Swish-e is extremely fast at indexing, but that comes at the cost of memory. .PP The best answer is install more memory. .PP Another option is use the \f(CW\*(C`\-e\*(C'\fR switch. This will require less memory, but indexing will take longer as not all data will be stored in memory while indexing. 
How much less memory and how much more time depends on the documents you are indexing, and the hardware that you are using. .PP Here's an example of indexing all .html files in /usr/doc on Linux. This first example is \fIwithout\fR \f(CW\*(C`\-e\*(C'\fR and used about 84M of memory: .PP .Vb 3 \& 270279 unique words indexed. \& 23841 files indexed. 177640166 total bytes. \& Elapsed time: 00:04:45 CPU time: 00:03:19 .Ve .PP This is \fIwith\fR \f(CW\*(C`\-e\*(C'\fR, and used about 26M of memory: .PP .Vb 3 \& 270279 unique words indexed. \& 23841 files indexed. 177640166 total bytes. \& Elapsed time: 00:06:43 CPU time: 00:04:12 .Ve .PP You can also build a number of smaller indexes and then merge them together with \f(CW\*(C`\-M\*(C'\fR. Using \f(CW\*(C`\-e\*(C'\fR while merging will save memory. .PP Finally, if you do build a number of smaller indexes, you can specify more than one index when searching by using the \f(CW\*(C`\-f\*(C'\fR switch. Sorting large result sets by a property will be slower when specifying multiple index files while searching. .PP \fI\*(L"too many open files\*(R" when indexing with \-e option\fR .IX Subsection "too many open files when indexing with -e option" .PP Some platforms report \*(L"too many open files\*(R" when using the \-e economy option. The \-e feature uses many temporary files (something like 377) plus the index files and this may exceed your system's limits. .PP Depending on your platform you may need to set \*(L"ulimit\*(R" or \*(L"unlimit\*(R". .PP For example, under Linux bash shell: .PP .Vb 1 \& $ ulimit -n 1024 .Ve .PP Or under an old Sparc: .PP .Vb 1 \& % unlimit openfiles .Ve .PP \fIMy system admin says Swish-e uses too much of the \s-1CPU\s0!\fR .IX Subsection "My system admin says Swish-e uses too much of the CPU!" .PP That's a good thing! That expensive \s-1CPU\s0 is supposed to be busy.
.PP Indexing takes a lot of work \*(-- to make indexing fast much of the work is done in memory, which reduces the amount of time Swish-e is waiting on I/O. But, there are two things you can try: .PP The \f(CW\*(C`\-e\*(C'\fR option will run Swish-e in economy mode, which uses the disk to store data while indexing. This makes Swish-e run somewhat slower, but also uses less memory. Since it is writing to disk more often it will be spending more time waiting on I/O and less time in \s-1CPU\s0. Maybe. .PP The other thing is to simply lower the priority of the job using the \&\fInice\fR\|(1) command: .PP .Vb 1 \& /bin/nice -15 swish-e -c search.conf .Ve .PP If concerned about searching time, make sure you are using the \-b and \-m switches to only return a page at a time. If you know that your result sets will be large, that you wish to return results one page at a time, and that many pages of the same query will often be requested, it may be wise to request all the documents on the first request, and then cache the results to a temporary file. The Perl module File::Cache makes this very simple to accomplish. .Sh "Spidering" .IX Subsection "Spidering" \fIHow can I index documents on a web server?\fR .IX Subsection "How can I index documents on a web server?" .PP If possible, use the file system method \f(CW\*(C`\-S fs\*(C'\fR of indexing to index documents in your web area of the file system. This avoids the overhead of spidering a web server and is much faster. (\f(CW\*(C`\-S fs\*(C'\fR is the default method if \f(CW\*(C`\-S\*(C'\fR is not specified). .PP If this is impossible (the web server is not local, or documents are dynamically generated), Swish-e provides two methods of spidering. First, it includes the http method of indexing \f(CW\*(C`\-S http\*(C'\fR. A number of special configuration directives are available that control spidering (see \*(L"Directives for the \s-1HTTP\s0 Access Method Only\*(R" in SWISH-CONFIG).
A perl helper script (swishspider) is included in the \fIsrc\fR directory to assist with spidering web servers. There are example configurations for spidering in the \fIconf\fR directory. .PP As of Swish-e 2.2, there's a general purpose \*(L"prog\*(R" document source where a program can feed documents to it for indexing. A number of example programs can be found in the \f(CW\*(C`prog\-bin\*(C'\fR directory, including a program to spider web servers. The provided spider.pl program is full-featured and is easily customized. .PP The advantage of the \*(L"prog\*(R" document source feature over the \*(L"http\*(R" method is that the program is only executed one time, where the swishspider.pl program used in the \*(L"http\*(R" method is executed once for every document read from the web server. The forking of Swish-e and compiling of the perl script can be quite expensive, time\-wise. .PP The other advantage of the \f(CW\*(C`spider.pl\*(C'\fR program is that it's simple and efficient to add filtering (such as for \s-1PDF\s0 or \s-1MS\s0 Word docs) right into the spider.pl's configuration, and it includes features such as \s-1MD5\s0 checks to prevent duplicate indexing, options to avoid spidering some files, or index but avoid spidering. And since it's a perl program there's no limit on the features you can add. .PP \fIWhy does swish report \*(L"./swishspider: not found\*(R"?\fR .IX Subsection "Why does swish report ./swishspider: not found?" .PP Does the file \fIswishspider\fR exist where the error message displays? If not, either set the configuration option SpiderDirectory to point to the directory where the \fIswishspider\fR program is found, or place the \&\fIswishspider\fR program in the current directory when running swish\-e. .PP If you are running Windows, make sure \*(L"perl\*(R" is in your path. Try typing \fIperl\fR from a command prompt. 
.PP If you are not running Windows, make sure that the shebang line (the first line of the swishspider program that starts with #!) points to the correct location of perl. Typically this will be \fI/usr/bin/perl\fR or \fI/usr/local/bin/perl\fR. Also, make sure that you have execute and read permissions on \fIswishspider\fR. .PP The \fIswishspider\fR perl script is only used with the \-S http method of indexing. .PP \fII'm using the spider.pl program to spider my web site, but some large files are not indexed.\fR .IX Subsection "I'm using the spider.pl program to spider my web site, but some large files are not indexed." .PP The \f(CW\*(C`spider.pl\*(C'\fR program has a default file size limit of 5MB. This can be changed with the \f(CW\*(C`max_size\*(C'\fR parameter setting. See \f(CW\*(C`perldoc spider.pl\*(C'\fR for more information. .PP \fII still don't think all my web pages are being indexed.\fR .IX Subsection "I still don't think all my web pages are being indexed." .PP The \fIspider.pl\fR program has a number of debugging switches and can be quite verbose in telling you what's happening, and why. See \f(CW\*(C`perldoc spider.pl\*(C'\fR for instructions. .PP \fISwish is not spidering Javascript links!\fR .IX Subsection "Swish is not spidering Javascript links!" .PP Swish cannot follow links generated by Javascript, as they are generated by the browser and are not part of the document. .PP \fIHow do I spider other websites and combine it with my own (filesystem) index?\fR .IX Subsection "How do I spider other websites and combine it with my own (filesystem) index?" .PP You can either merge (\f(CW\*(C`\-M\*(C'\fR) two indexes into a single index, or use \f(CW\*(C`\-f\*(C'\fR to specify more than one index while searching. .PP You will have better results with the \f(CW\*(C`\-f\*(C'\fR method. .Sh "Searching" .IX Subsection "Searching" \fIHow do I limit searches to just parts of the index?\fR .IX Subsection "How do I limit searches to just parts of the index?"
.PP If you can identify \*(L"parts\*(R" of your index by the path name you have two options. .PP The first options is by indexing the document path. Add this to your configuration: .PP .Vb 1 \& MetaNames swishdocpath .Ve .PP Now you can search for words or phrases in the path name: .PP .Vb 1 \& swish-e -w 'foo AND swishdocpath=(sales)' .Ve .PP So that will only find documents with the word \*(L"foo\*(R" and where the file's path contains \*(L"sales\*(R". That might not works as well as you like, though, as both of these paths will match: .PP .Vb 2 \& /web/sales/products/index.html \& /web/accounting/private/sales_we_messed_up.html .Ve .PP This can be solved by searching with a phrase (assuming \*(L"/\*(R" is not a WordCharacter): .PP .Vb 2 \& swish-e -w 'foo AND swishdocpath=("/web/sales/")' \& swish-e -w 'foo AND swishdocpath=("web sales")' (same thing) .Ve .PP The second option is a bit more powerful. With the \f(CW\*(C`ExtractPath\*(C'\fR directive you can use a regular expression to extract out a sub-set of the path and save it as a separate meta name: .PP .Vb 2 \& MetaNames department \& ExtractPath department regex !^/web/([^/]+).+$!$1/ .Ve .PP Which says match a path that starts with \*(L"/web/\*(R" and extract out everything after that up to, but not including the next \*(L"/\*(R" and save it in variable \f(CW$1\fR, and then match everything from the \*(L"/\*(R" onward. Then replace the entire matches string with \f(CW$1\fR. And that gets indexed as meta name \&\*(L"department\*(R". .PP Now you can search like: .PP .Vb 1 \& swish-e -w 'foo AND department=sales' .Ve .PP and be sure that you will only match the documents in the /www/sales/* path. 
Note that you can map completely different areas of your file system to the same metaname: .PP .Vb 3 \& # flag the marketing specific pages \& ExtractPath department regex !^/web/(marketing|sales)/.+$!marketing/ \& ExtractPath department regex !^/internal/marketing/.+$!marketing/ .Ve .PP .Vb 2 \& # flag the technical departments pages \& ExtractPath department regex !^/web/(tech|bugs)/.+$!tech/ .Ve .PP Finally, if you have something more complicated, use \f(CW\*(C`\-S prog\*(C'\fR and write a perl program or use a filter to set a meta tag when processing each file. .PP \fIHow is ranking calculated?\fR .IX Subsection "How is ranking calculated?" .PP The \f(CW\*(C`swishrank\*(C'\fR property value is calculated based on which Ranking Scheme (or algorithm) you have selected. In this discussion, any time the word \fBfancy\fR is used, you should consult the actual code for more details. It is open source, after all. .PP Things you can do to affect ranking: .IP "MetaNamesRank" 4 .IX Item "MetaNamesRank" You may configure your index to bias certain metaname values more or less than others. See the \f(CW\*(C`MetaNamesRank\*(C'\fR configuration option in SWISH-CONFIG. .IP "IgnoreTotalWordCountWhenRanking" 4 .IX Item "IgnoreTotalWordCountWhenRanking" Set to 1 (default) or 0 in your config file. See SWISH-CONFIG. \&\fB\s-1NOTE:\s0\fR You must set this to 0 to use the \s-1IDF\s0 Ranking Scheme. .IP "structure" 4 .IX Item "structure" Each term's position in each \s-1HTML\s0 document is given a structure value based on the context in which the word appears. The structure value is used to artificially inflate the frequency of each term in that particular document. 
These structural values are defined in \fIconfig.h\fR: .Sp .Vb 5 \& #define RANK_TITLE 7 \& #define RANK_HEADER 5 \& #define RANK_META 3 \& #define RANK_COMMENTS 1 \& #define RANK_EMPHASIZED 0 .Ve .Sp For example, if the word \f(CW\*(C`foo\*(C'\fR appears in the title of a document, the Scheme will treat that document as if \f(CW\*(C`foo\*(C'\fR appeared 7 additional times. .PP All Schemes share the following characteristics: .IP "\s-1AND\s0 searches" 4 .IX Item "AND searches" The rank value is averaged for all \s-1AND\s0'd terms. Terms within a set of parentheses () are averaged as a single term (this is an acknowledged weakness and is on the \s-1TODO\s0 list). .IP "\s-1OR\s0 searches" 4 .IX Item "OR searches" The rank value is summed and then doubled for each pair of \s-1OR\s0'd terms. This results in higher ranks for documents that have multiple \s-1OR\s0'd terms. .IP "scaled rank" 4 .IX Item "scaled rank" After a document's raw rank score is calculated, a final rank score is calculated using a fancy \f(CW\*(C`log()\*(C'\fR function. All the documents are then scaled against a base score of 1000. The top-ranked document will therefore always have a \f(CW\*(C`swishrank\*(C'\fR value of 1000. .PP Here is a brief overview of how the different Schemes work. The number in parentheses after the name is the value to invoke that scheme with \f(CW\*(C`swish\-e \-R\*(C'\fR or \f(CW\*(C`RankScheme()\*(C'\fR. .IP "Default (0)" 4 .IX Item "Default (0)" The default ranking scheme considers the number of times a term appears in a document (frequency), the MetaNamesRank and the structure value. 
The rank might be summarized as: .Sp .Vb 1 \& DocRank = Sum of ( structure + metabias ) .Ve .Sp Consider this output with the \s-1DEBUG_RANK\s0 variable set at compile time: .Sp .Vb 12 \& Ranking Scheme: 0 \& Word entry 0 at position 6 has struct 7 \& Word entry 1 at position 64 has struct 41 \& Word entry 2 at position 71 has struct 9 \& Word entry 3 at position 132 has struct 9 \& Word entry 4 at position 154 has struct 9 \& Word entry 5 at position 423 has struct 73 \& Word entry 6 at position 541 has struct 73 \& Word entry 7 at position 662 has struct 73 \& File num: 1104. Raw Rank: 21. Frequency: 8 scaled rank: 30445 \& Structure tally: \& struct 0x7 = count of 1 ( HEAD TITLE FILE ) x rank map of 8 = 8 .Ve .Sp .Vb 1 \& struct 0x9 = count of 3 ( BODY FILE ) x rank map of 1 = 3 .Ve .Sp .Vb 1 \& struct 0x29 = count of 1 ( HEADING BODY FILE ) x rank map of 6 = 6 .Ve .Sp .Vb 1 \& struct 0x49 = count of 3 ( EM BODY FILE ) x rank map of 1 = 3 .Ve .Sp Every word instance starts with a base score of 1. Then for each instance of your word, a running sum is taken of the structural value of that word position plus any bias you've configured. In the example above, the raw rank is \f(CW\*(C`1 + 8 + 3 + 6 + 3 = 21\*(C'\fR. .Sp Consider this line: .Sp .Vb 1 \& struct 0x7 = count of 1 ( HEAD TITLE FILE ) x rank map of 8 = 8 .Ve .Sp That means there was one instance of our word in the title of the file. Its context was in the <head> tagset, inside the <title>. The <title> is the most specific structure, so it gets the \&\s-1RANK_TITLE\s0 score: 7. The base rank of 1 plus the structure score of 7 equals 8. If there had been two instances of this word in the title, then the score would have been \f(CW\*(C`8 + 8 = 16\*(C'\fR. .IP "\s-1IDF\s0 (1)" 4 .IX Item "IDF (1)" \&\s-1IDF\s0 is short for Inverse Document Frequency. That's fancy ranking lingo for taking into account the total frequency of a term across the entire index, in addition to the term's frequency in a single document.
\s-1IDF\s0 ranking also uses the relative density of a word in a document to judge its relevancy. Words that appear more often in a doc make that doc's rank higher, and longer docs are not weighted higher than shorter docs. .Sp The \s-1IDF\s0 Scheme might be summarized as: .Sp .Vb 1 \& DocRank = Sum of ( density * idf * ( structure + metabias ) ) .Ve .Sp Consider this output from \s-1DEBUG_RANK:\s0 .Sp .Vb 17 \& Ranking Scheme: 1 \& File num: 1104 Word Score: 1 Frequency: 8 Total files: 1451 \& Total word freq: 108 IDF: 2564 \& Total words: 1145877 Indexed words in this doc: 562 \& Average words: 789 Density: 1120 Word Weight: 28716 \& Word entry 0 at position 6 has struct 7 \& Word entry 1 at position 64 has struct 41 \& Word entry 2 at position 71 has struct 9 \& Word entry 3 at position 132 has struct 9 \& Word entry 4 at position 154 has struct 9 \& Word entry 5 at position 423 has struct 73 \& Word entry 6 at position 541 has struct 73 \& Word entry 7 at position 662 has struct 73 \& Rank after IDF weighting: 574321 \& scaled rank: 132609 \& Structure tally: \& struct 0x7 = count of 1 ( HEAD TITLE FILE ) x rank map of 8 = 8 .Ve .Sp .Vb 1 \& struct 0x9 = count of 3 ( BODY FILE ) x rank map of 1 = 3 .Ve .Sp .Vb 1 \& struct 0x29 = count of 1 ( HEADING BODY FILE ) x rank map of 6 = 6 .Ve .Sp .Vb 1 \& struct 0x49 = count of 3 ( EM BODY FILE ) x rank map of 1 = 3 .Ve .Sp It is similar to the default Scheme, but notice how the total number of files in the index and the total word frequency (as opposed to the document frequency) are both part of the equation. .PP Ranking is a complicated subject. SWISH-E allows for more Ranking Schemes to be developed and experimented with, using the \-R option (from the swish-e command) and the RankScheme (see the \s-1API\s0 documentation). Experiment and share your findings via the discussion list. 
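.PP
As a minimal sketch of trying the \s-1IDF\s0 scheme described above (the index file name \fIindex.swish\-e\fR and the search word are placeholders, and the yes/no spelling of the directive value is taken from the SWISH-CONFIG directive listing, so treat this as an illustration rather than a tested recipe), first turn off \f(CW\*(C`IgnoreTotalWordCountWhenRanking\*(C'\fR in the configuration file and re-index:
.PP
.Vb 2
\&    # assumption: "no" is equivalent to the 0 value mentioned above
\&    IgnoreTotalWordCountWhenRanking no
.Ve
.PP
Then compare the ordering returned by the two schemes for the same query:
.PP
.Vb 2
\&    swish-e -w foo -f index.swish-e -R 0
\&    swish-e -w foo -f index.swish-e -R 1
.Ve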
.PP \fIHow can I limit searches to the title, body, or comment?\fR .IX Subsection "How can I limit searches to the title, body, or comment?" .PP Use the \f(CW\*(C`\-t\*(C'\fR switch. .PP \fII can't limit searches to title/body/comment.\fR .IX Subsection "I can't limit searches to title/body/comment." .PP Or, \fII can't search with meta names, all the names are indexed as \&\*(L"plain\*(R".\fR .PP Check whether #define \s-1INDEXTAGS\s0 is set to 1 in the config.h file. If it is, change it to 0, recompile, and index again. When \s-1INDEXTAGS\s0 is 1, \s-1ALL\s0 the tags are indexed as plain text, that is, you index \*(L"title\*(R", \*(L"h1\*(R", and so on, \s-1AND\s0 they lose their indexing meaning. If \s-1INDEXTAGS\s0 is set to 0, you will still index meta tags and comments, unless you have indicated otherwise in the user config file with the IndexComments directive. .PP Also, check for the \f(CW\*(C`UndefinedMetaTags\*(C'\fR setting in your configuration file. .PP \fII've tried running the included \s-1CGI\s0 script and I get an \*(L"Internal Server Error\*(R"\fR .IX Subsection "I've tried running the included CGI script and I get an Internal Server Error" .PP Debugging \s-1CGI\s0 scripts is beyond the scope of this document. Internal Server Error basically means \*(L"check the web server's log for an error message\*(R", as it can mean a bad shebang (#!) line, a missing perl module, an \s-1FTP\s0 transfer error, or simply an error in the program. The \s-1CGI\s0 script \fIswish.cgi\fR in the \fIexample\fR directory contains some debugging suggestions. Type \f(CW\*(C`perldoc swish.cgi\*(C'\fR for information. .PP There are also many, many \s-1CGI\s0 FAQs available on the Internet. A quick web search should offer help. As a last resort you might ask your webadmin for help... .PP \fIWhen I try to view the swish.cgi page I see the contents of the Perl program.\fR .IX Subsection "When I try to view the swish.cgi page I see the contents of the Perl program."
.PP Your web server is not configured to run the program as a \s-1CGI\s0 script. This problem is described in \f(CW\*(C`perldoc swish.cgi\*(C'\fR. .PP \fIHow do I make Swish-e highlight words in search results?\fR .IX Subsection "How do I make Swish-e highlight words in search results?" .PP Short answer: .PP Use the supplied swish.cgi or search.cgi scripts located in the \fIexample\fR directory. .PP Long answer: .PP Swish-e itself cannot, because it does not have access to the source documents when returning results. But a front-end program of your creation can highlight terms. Your program can open up the source documents and then use regular expressions to replace search terms with highlighted or bolded words. .PP But that will fail with all but the simplest source documents. For \s-1HTML\s0 documents, for example, you must parse the document into words and tags (and comments). A word you wish to highlight may span multiple \&\s-1HTML\s0 tags, or be a word in a \s-1URL\s0 where you wish to highlight the entire link text. .PP Perl modules such as HTML::Parser and XML::Parser make word extraction possible. Next, you need to consider that Swish-e uses settings such as WordCharacters, BeginCharacters, EndCharacters, IgnoreFirstChar, and IgnoreLastChar to define a \*(L"word\*(R". That is, you can't assume that a string of characters with white space on each side is a word. .PP Then things like TranslateCharacters and \s-1HTML\s0 entities may transform a source word into something else, as far as Swish-e is concerned. Finally, searches can be limited by metanames, so you may need to limit your highlighting to only parts of the source document. Throw phrase searches and stopwords into the equation and you can see that it's not a trivial problem to solve. .PP All hope is not lost, though, as Swish-e does provide some help.
Using the \f(CW\*(C`\-H\*(C'\fR option it will return in the headers the current index (or indexes) settings for WordCharacters (and others) required to parse your source documents as it parses them during indexing, and will return a \&\*(L"Parsed Words:\*(R" header that will show how it parsed the query internally. If you use fuzzy indexing (word stemming, soundex, or metaphone) then you will also need to stem each word in your document before comparing with the \*(L"Parsed Words:\*(R" returned by Swish\-e. .PP The Swish-e stemming code is available either by using the Swish-e Perl module (\s-1SWISH::API\s0) or the C library (included with the swish-e distribution), or by using the SWISH::Stemmer module available on \s-1CPAN\s0. Also on \s-1CPAN\s0 is the module Text::DoubleMetaphone. Using \s-1SWISH::API\s0 probably provides the best stemming support. .PP \fIDo filters affect the performance during search?\fR .IX Subsection "Do filters affect the performance during search?" .PP No. Filters (FileFilter or via \*(L"prog\*(R" method) are only used for building the search index database. During search requests there will be no filter calls. .Sh "I have read the \s-1FAQ\s0 but I still have questions about using Swish\-e." .IX Subsection "I have read the FAQ but I still have questions about using Swish-e." The Swish-e discussion list is the place to go. See http://swish\-e.org/. Please do not email developers directly. The list is the best place to ask questions. .PP Before you post please read \fI\s-1QUESTIONS\s0 \s-1AND\s0 \s-1TROUBLESHOOTING\s0\fR located in the \s-1INSTALL\s0 page. You should also search the Swish-e discussion list archive which can be found on the swish-e web site. .PP In short, be sure to include the following when asking for help.
.IP "* The swish-e version (./swish\-e \-V)" 4 .IX Item "The swish-e version (./swish-e -V)" .PD 0 .IP "* What you are indexing (and perhaps a sample), and the number of files" 4 .IX Item "What you are indexing (and perhaps a sample), and the number of files" .IP "* Your Swish-e configuration file" 4 .IX Item "Your Swish-e configuration file" .IP "* Any error messages that Swish-e is reporting" 4 .IX Item "Any error messages that Swish-e is reporting" .PD .SH "Document Info" .IX Header "Document Info" $Id: \s-1SWISH\-FAQ\s0.pod 2147 2008\-07\-21 02:48:55Z karpet $ .PP \&. swish-e-2.4.7/man/Makefile.am0000664000077100017500000000213311166010112012623 00000000000000#if BUILDDOCS man_MANS = \ $(srcdir)/swish-e.1 \ $(srcdir)/SWISH-CONFIG.1 \ $(srcdir)/SWISH-FAQ.1 \ $(srcdir)/SWISH-LIBRARY.1 \ $(srcdir)/SWISH-RUN.1 $(srcdir)/swish-e.1 : $(top_srcdir)/pod/swish-e.pod -rm -f $@ -pod2man --center="SWISH-E Documentation" --lax --release='$(VERSION)' $(top_srcdir)/pod/swish-e.pod > $@ $(srcdir)/SWISH-CONFIG.1 : $(top_srcdir)/pod/SWISH-CONFIG.pod -rm -f $@ -pod2man --center="SWISH-E Documentation" --lax --release='$(VERSION)' $(top_srcdir)/pod/SWISH-CONFIG.pod > $@
$(srcdir)/SWISH-FAQ.1 : $(top_srcdir)/pod/SWISH-FAQ.pod -rm -f $@ -pod2man --center="SWISH-E Documentation" --lax --release='$(VERSION)' $(top_srcdir)/pod/SWISH-FAQ.pod > $@ $(srcdir)/SWISH-LIBRARY.1 : $(top_srcdir)/pod/SWISH-LIBRARY.pod -rm -f $@ -pod2man --center="SWISH-E Documentation" --lax --release='$(VERSION)' $(top_srcdir)/pod/SWISH-LIBRARY.pod > $@ $(srcdir)/SWISH-RUN.1 : $(top_srcdir)/pod/SWISH-RUN.pod -rm -f $@ -pod2man --center="SWISH-E Documentation" --lax --release='$(VERSION)' $(top_srcdir)/pod/SWISH-RUN.pod > $@ #endif EXTRA_DIST = $(man_MANS) swish-e-2.4.7/man/swish-e.10000664000077100017500000001430011166010470012236 00000000000000.\" Automatically generated by Pod::Man v1.37, Pod::Parser v1.14 .\" .\" Standard preamble: .\" ======================================================================== .de Sh \" Subsection heading .br .if t .Sp .ne 5 .PP \fB\\$1\fR .PP .. .de Sp \" Vertical space (when we can't use .PP) .if t .sp .5v .if n .sp .. .de Vb \" Begin verbatim text .ft CW .nf .ne \\$1 .. .de Ve \" End verbatim text .ft R .fi .. .\" Set up some character translations and predefined strings.
\*(-- will .\" give an unbreakable dash, \*(PI will give pi, \*(L" will give a left .\" double quote, and \*(R" will give a right double quote. | will give a .\" real vertical bar. \*(C+ will give a nicer C++. Capital omega is used to .\" do unbreakable dashes and therefore won't be available. \*(C` and \*(C' .\" expand to `' in nroff, nothing in troff, for use with C<>. .tr \(*W-|\(bv\*(Tr .ds C+ C\v'-.1v'\h'-1p'\s-2+\h'-1p'+\s0\v'.1v'\h'-1p' .ie n \{\ . ds -- \(*W- . ds PI pi . if (\n(.H=4u)&(1m=24u) .ds -- \(*W\h'-12u'\(*W\h'-12u'-\" diablo 10 pitch . if (\n(.H=4u)&(1m=20u) .ds -- \(*W\h'-12u'\(*W\h'-8u'-\" diablo 12 pitch . ds L" "" . ds R" "" . ds C` "" . ds C' "" 'br\} .el\{\ . ds -- \|\(em\| . ds PI \(*p . ds L" `` . ds R" '' 'br\} .\" .\" If the F register is turned on, we'll generate index entries on stderr for .\" titles (.TH), headers (.SH), subsections (.Sh), items (.Ip), and index .\" entries marked with X<> in POD. Of course, you'll have to process the .\" output yourself in some meaningful fashion. .if \nF \{\ . de IX . tm Index:\\$1\t\\n%\t"\\$2" .. . nr % 0 . rr F .\} .\" .\" For nroff, turn off justification. Always turn off hyphenation; it makes .\" way too many mistakes in technical documents. .hy 0 .if n .na .\" .\" Accent mark definitions (@(#)ms.acc 1.5 88/02/08 SMI; from UCB 4.2). .\" Fear. Run. Save yourself. No user-serviceable parts. . \" fudge factors for nroff and troff .if n \{\ . ds #H 0 . ds #V .8m . ds #F .3m . ds #[ \f1 . ds #] \fP .\} .if t \{\ . ds #H ((1u-(\\\\n(.fu%2u))*.13m) . ds #V .6m . ds #F 0 . ds #[ \& . ds #] \& .\} . \" simple accents for nroff and troff .if n \{\ . ds ' \& . ds ` \& . ds ^ \& . ds , \& . ds ~ ~ . ds / .\} .if t \{\ . ds ' \\k:\h'-(\\n(.wu*8/10-\*(#H)'\'\h"|\\n:u" . ds ` \\k:\h'-(\\n(.wu*8/10-\*(#H)'\`\h'|\\n:u' . ds ^ \\k:\h'-(\\n(.wu*10/11-\*(#H)'^\h'|\\n:u' . ds , \\k:\h'-(\\n(.wu*8/10)',\h'|\\n:u' . ds ~ \\k:\h'-(\\n(.wu-\*(#H-.1m)'~\h'|\\n:u' . 
ds / \\k:\h'-(\\n(.wu*8/10-\*(#H)'\z\(sl\h'|\\n:u' .\} . \" troff and (daisy-wheel) nroff accents .ds : \\k:\h'-(\\n(.wu*8/10-\*(#H+.1m+\*(#F)'\v'-\*(#V'\z.\h'.2m+\*(#F'.\h'|\\n:u'\v'\*(#V' .ds 8 \h'\*(#H'\(*b\h'-\*(#H' .ds o \\k:\h'-(\\n(.wu+\w'\(de'u-\*(#H)/2u'\v'-.3n'\*(#[\z\(de\v'.3n'\h'|\\n:u'\*(#] .ds d- \h'\*(#H'\(pd\h'-\w'~'u'\v'-.25m'\f2\(hy\fP\v'.25m'\h'-\*(#H' .ds D- D\\k:\h'-\w'D'u'\v'-.11m'\z\(hy\v'.11m'\h'|\\n:u' .ds th \*(#[\v'.3m'\s+1I\s-1\v'-.3m'\h'-(\w'I'u*2/3)'\s-1o\s+1\*(#] .ds Th \*(#[\s+2I\s-2\h'-\w'I'u*3/5'\v'-.3m'o\v'.3m'\*(#] .ds ae a\h'-(\w'a'u*4/10)'e .ds Ae A\h'-(\w'A'u*4/10)'E . \" corrections for vroff .if v .ds ~ \\k:\h'-(\\n(.wu*9/10-\*(#H)'\s-2\u~\d\s+2\h'|\\n:u' .if v .ds ^ \\k:\h'-(\\n(.wu*10/11-\*(#H)'\v'-.4m'^\v'.4m'\h'|\\n:u' . \" for low resolution devices (crt and lpr) .if \n(.H>23 .if \n(.V>19 \ \{\ . ds : e . ds 8 ss . ds o a . ds d- d\h'-1'\(ga . ds D- D\h'-1'\(hy . ds th \o'bp' . ds Th \o'LP' . ds ae ae . ds Ae AE .\} .rm #[ #] #H #V #F C .\" ======================================================================== .\" .IX Title "SWISH-E 1" .TH SWISH-E 1 "2009-04-04" "2.4.7" "SWISH-E Documentation" .SH "NAME" Swish\-e \- A Search Engine .SH "SYNOPSIS" .IX Header "SYNOPSIS" .Vb 8 \& swish [-e] [-i dir file ... ] [-S system] [-c file] [-f file] [-l] [-v (num)] \& swish -w word1 word2 ... [-f file1 file2 ...] \e \& [-P phrase_delimiter] [-p prop1 ...] [-s sortprop1 [asc|desc] ...] \e \& [-m num] [-t str] [-d delim] [-H (num)] [-x output_format] \& swish -k (char|*) [-f file1 file2 ...] \& swish -M index1 index2 ... outputfile \& swish -N /path/to/compare/file \& swish -V .Ve .PP See the the \s-1\fISWISH\-RUN\s0\fR\|(1) man page for details on run-time options. .SH "DESCRIPTION" .IX Header "DESCRIPTION" Swish-e is Simple Web Indexing System for Humans \- Enhanced. Swish-e can quickly and easily index directories of files or remote web sites and search the generated indexes. 
.PP Swish-e is extremely fast in both indexing and searching, highly configurable, and can be seamlessly integrated with existing web sites to maintain a consistent design. Swish-e can index web pages, but can just as easily index text files, mailing list archives, or data stored in a relational database. .PP Swish-e is designed to index small to medium sized collections of documents. Although a few users are indexing over a million documents, typical usage is more often in the tens of thousands. Currently, Swish-e only indexes eight bit character encodings. .SH "DOCUMENTATION" .IX Header "DOCUMENTATION" Documentation is provided as \s-1HTML\s0 pages installed in \&\f(CW$prefix\fR/share/doc/swish\-e where \f(CW$prefix\fR is /usr/local if building from source, or /usr if installed as part of a package from your \s-1OS\s0 vendor. Under Windows \f(CW$prefix\fR is selected at installation time. .PP Documentation is also available on-line at http://swish\-e.org. .PP A subset of the documentation is installed as system man pages as well. The following man pages should be installed: .IP "\fIswish\-e\fR\|(1)" 4 .IX Item "swish-e" This man page. .IP "\s-1\fISWISH\-CONFIG\s0\fR\|(1)" 4 .IX Item "SWISH-CONFIG" Defines options that can be used in a configuration file. .IP "\s-1\fISWISH\-RUN\s0\fR\|(1)" 4 .IX Item "SWISH-RUN" Describes the run-time options and switches. .IP "\s-1\fISWISH\-FAQ\s0\fR\|(1)" 4 .IX Item "SWISH-FAQ" Answers to commonly asked questions. .IP "\s-1\fISWISH\-LIBRARY\s0\fR\|(1)" 4 .IX Item "SWISH-LIBRARY" \&\s-1API\s0 for the Swish-e search library. Applications can link against this library. .SH "SUPPORT" .IX Header "SUPPORT" Support for Swish-e is provided via the Swish-e discussion list. See http://swish\-e.org for information.
��������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������swish-e-2.4.7/man/SWISH-CONFIG.1��������������������������������������������������������������������0000664�0000771�0001750�00000315777�11166010471�012546� �����������������������������������������������������������������������������������������������������������������������������������������������������������������������������0000000�0000000������������������������������������������������������������������������������������������������������������������������������������������������������������������������.\" Automatically generated by Pod::Man v1.37, Pod::Parser v1.14 .\" .\" Standard preamble: .\" ======================================================================== .de Sh \" Subsection heading .br .if t .Sp .ne 5 .PP \fB\\$1\fR .PP .. .de Sp \" Vertical space (when we can't use .PP) .if t .sp .5v .if n .sp .. .de Vb \" Begin verbatim text .ft CW .nf .ne \\$1 .. .de Ve \" End verbatim text .ft R .fi .. .\" Set up some character translations and predefined strings. \*(-- will .\" give an unbreakable dash, \*(PI will give pi, \*(L" will give a left .\" double quote, and \*(R" will give a right double quote. | will give a .\" real vertical bar. \*(C+ will give a nicer C++. Capital omega is used to .\" do unbreakable dashes and therefore won't be available. \*(C` and \*(C' .\" expand to `' in nroff, nothing in troff, for use with C<>. .tr \(*W-|\(bv\*(Tr .ds C+ C\v'-.1v'\h'-1p'\s-2+\h'-1p'+\s0\v'.1v'\h'-1p' .ie n \{\ . ds -- \(*W- . ds PI pi . if (\n(.H=4u)&(1m=24u) .ds -- \(*W\h'-12u'\(*W\h'-12u'-\" diablo 10 pitch . if (\n(.H=4u)&(1m=20u) .ds -- \(*W\h'-12u'\(*W\h'-8u'-\" diablo 12 pitch . ds L" "" . ds R" "" . ds C` "" . ds C' "" 'br\} .el\{\ . 
ds -- \|\(em\| . ds PI \(*p . ds L" `` . ds R" '' 'br\} .\" .\" If the F register is turned on, we'll generate index entries on stderr for .\" titles (.TH), headers (.SH), subsections (.Sh), items (.Ip), and index .\" entries marked with X<> in POD. Of course, you'll have to process the .\" output yourself in some meaningful fashion. .if \nF \{\ . de IX . tm Index:\\$1\t\\n%\t"\\$2" .. . nr % 0 . rr F .\} .\" .\" For nroff, turn off justification. Always turn off hyphenation; it makes .\" way too many mistakes in technical documents. .hy 0 .if n .na .\" .\" Accent mark definitions (@(#)ms.acc 1.5 88/02/08 SMI; from UCB 4.2). .\" Fear. Run. Save yourself. No user-serviceable parts. . \" fudge factors for nroff and troff .if n \{\ . ds #H 0 . ds #V .8m . ds #F .3m . ds #[ \f1 . ds #] \fP .\} .if t \{\ . ds #H ((1u-(\\\\n(.fu%2u))*.13m) . ds #V .6m . ds #F 0 . ds #[ \& . ds #] \& .\} . \" simple accents for nroff and troff .if n \{\ . ds ' \& . ds ` \& . ds ^ \& . ds , \& . ds ~ ~ . ds / .\} .if t \{\ . ds ' \\k:\h'-(\\n(.wu*8/10-\*(#H)'\'\h"|\\n:u" . ds ` \\k:\h'-(\\n(.wu*8/10-\*(#H)'\`\h'|\\n:u' . ds ^ \\k:\h'-(\\n(.wu*10/11-\*(#H)'^\h'|\\n:u' . ds , \\k:\h'-(\\n(.wu*8/10)',\h'|\\n:u' . ds ~ \\k:\h'-(\\n(.wu-\*(#H-.1m)'~\h'|\\n:u' . ds / \\k:\h'-(\\n(.wu*8/10-\*(#H)'\z\(sl\h'|\\n:u' .\} . \" troff and (daisy-wheel) nroff accents .ds : \\k:\h'-(\\n(.wu*8/10-\*(#H+.1m+\*(#F)'\v'-\*(#V'\z.\h'.2m+\*(#F'.\h'|\\n:u'\v'\*(#V' .ds 8 \h'\*(#H'\(*b\h'-\*(#H' .ds o \\k:\h'-(\\n(.wu+\w'\(de'u-\*(#H)/2u'\v'-.3n'\*(#[\z\(de\v'.3n'\h'|\\n:u'\*(#] .ds d- \h'\*(#H'\(pd\h'-\w'~'u'\v'-.25m'\f2\(hy\fP\v'.25m'\h'-\*(#H' .ds D- D\\k:\h'-\w'D'u'\v'-.11m'\z\(hy\v'.11m'\h'|\\n:u' .ds th \*(#[\v'.3m'\s+1I\s-1\v'-.3m'\h'-(\w'I'u*2/3)'\s-1o\s+1\*(#] .ds Th \*(#[\s+2I\s-2\h'-\w'I'u*3/5'\v'-.3m'o\v'.3m'\*(#] .ds ae a\h'-(\w'a'u*4/10)'e .ds Ae A\h'-(\w'A'u*4/10)'E . 
\" corrections for vroff .if v .ds ~ \\k:\h'-(\\n(.wu*9/10-\*(#H)'\s-2\u~\d\s+2\h'|\\n:u' .if v .ds ^ \\k:\h'-(\\n(.wu*10/11-\*(#H)'\v'-.4m'^\v'.4m'\h'|\\n:u' . \" for low resolution devices (crt and lpr) .if \n(.H>23 .if \n(.V>19 \ \{\ . ds : e . ds 8 ss . ds o a . ds d- d\h'-1'\(ga . ds D- D\h'-1'\(hy . ds th \o'bp' . ds Th \o'LP' . ds ae ae . ds Ae AE .\} .rm #[ #] #H #V #F C .\" ======================================================================== .\" .IX Title "SWISH-CONFIG 1" .TH SWISH-CONFIG 1 "2009-04-04" "2.4.7" "SWISH-E Documentation" .SH "NAME" SWISH\-CONFIG \- Configuration File Directives .SH "OVERVIEW" .IX Header "OVERVIEW" This document lists the configuration directives available in Swish\-e. .SH "CONFIGURATION FILE" .IX Header "CONFIGURATION FILE" Which files Swish-e indexes, how they are indexed, and where the index is written can be controlled by a configuration file. .PP The configuration file is a text file composed of comments, blank lines, and \fBconfiguration directives\fR. The order of the directives is not important. Some directives may be used more than once in the configuration file, while others can only be used once (e.g. additional directives will overwrite preceding directives). Case of the directive is not important \*(-- you may use upper, lower, or mixed case. .PP A comment is any line that begins with a \*(L"#\*(R". .PP .Vb 1 \& # This is a comment .Ve .PP As of 2.4.3 lines may be continued by placing a backslash as the last character on the line: .PP .Vb 4 \& IgnoreWords \e \& am \e \& the \e \& foo .Ve .PP Directives may take more than one parameter. Enclose single parameters that include whitespace in quotes (single or double). Inside of quotes the backslash escapes the next character. .PP .Vb 1 \& ReplaceRules append "foo bar" <- define "foo bar" as a single parameter .Ve .PP If you need to include a quote character in the value either use a backslash to escape it, or enclose it in quotes of the other type.
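.PP
For instance, either of the following forms would pass a value containing double quotes as a single parameter (a sketch only; \f(CW\*(C`IndexDescription\*(C'\fR is used here just because it takes free text, and the description itself is made up):
.PP
.Vb 2
\&    IndexDescription "Index of the \e"docs\e" tree"
\&    IndexDescription 'Index of the "docs" tree'
.Ve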
.PP Backslashes also have special meaning in regular expressions. .PP .Vb 1 \& FileFilterMatch pdftotext "'%p' -" /\e.pdf$/ .Ve .PP This says that the dot is a real dot (instead of matching any character). If you place the regular expression in quotes then you must use double\-backslashes. .PP .Vb 1 \& FileFilterMatch pdftotext "'%p' -" "/\e\e.pdf$/" .Ve .PP Swish-e will convert the double backslash into a single backslash before passing the parameter to the regular expression compiler. .PP Commented example configuration files are included in the \fIconf\fR directory of the Swish-e distribution. .PP Some command line arguments can override directives specified in the configuration file. Please see also the SWISH-RUN page for instructions on running Swish\-e, and the SWISH-SEARCH page for information and examples on how to search your index. .PP The configuration file is specified to Swish-e by the \f(CW\*(C`\-c\*(C'\fR switch. For example, .PP .Vb 1 \& swish-e -c myconfig.conf .Ve .PP You may also split your directives up into different configuration files. This allows you to have a master configuration file used for many different indexes, and smaller configuration files for each separate index. You can specify the different configuration files when running from the command line with the \f(CW\*(C`\-c\*(C'\fR switch (see SWISH-RUN), or you may include another configuration file with the \fBIncludeConfigFile\fR directive below. .PP Typically, in a configuration file the directives are grouped together in some logical order \*(-- that is, directives that control the source of the documents would be grouped together first, and directives that control how each document is filtered or how its words are indexed would be placed in another group. (The directives listed below are grouped in this order.)
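.PP
As a rough sketch of the master/per-index layout described above, a small per-index file might pull in shared settings and then add only what is specific to that index. The file names and directory paths below are hypothetical:
.PP
.Vb 5
\&    # docs.config: settings for one index (hypothetical file name)
\&    IncludeConfigFile /usr/local/swish/conf/site_config.config
\&    IndexDir /home/docs
\&    IndexOnly .htm .html
\&    IndexFile /home/indexfiles/docs.index
.Ve
.PP
Indexing with that file would then be started with:
.PP
.Vb 1
\&    swish-e -c docs.config
.Ve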
.PP The configuration file directives are listed below in these groups: .IP "\(bu" 4 \&\*(L"Administrative Headers Directives\*(R" \*(-- You may add administrative information to the header of the index file. .IP "\(bu" 4 \&\*(L"Document Source Directives\*(R" \*(-- Directives for selecting the source documents and the location of the index file. .IP "\(bu" 4 \&\*(L"Document Contents Directives\*(R" \*(-- Directives that control how a document content is indexed. .IP "\(bu" 4 \&\*(L"Directives for the File Access method only\*(R" \*(-- These directives are only applicable to the File Access indexing method. .IP "\(bu" 4 \&\*(L"Directives for the \s-1HTTP\s0 Access Method Only\*(R" \*(-- Likewise, these only apply to the \s-1HTTP\s0 Access method. .IP "\(bu" 4 \&\*(L"Directives for the prog Access Method Only\*(R" \*(-- These only apply to the prog Access method. .IP "\(bu" 4 \&\*(L"Document Filter Directives\*(R" \*(-- This is a special section that describes using document filters with Swish\-e. .Sh "Alphabetical Listing of Directives" .IX Subsection "Alphabetical Listing of Directives" .IP "\(bu" 4 AbsoluteLinks [yes|NO] .IP "\(bu" 4 BeginCharacters *string of characters* .IP "\(bu" 4 BumpPositionCounterCharacters *string* .IP "\(bu" 4 Buzzwords [*list of buzzwords*|File: path] .IP "\(bu" 4 CompressPositions [yes|NO] .IP "\(bu" 4 ConvertHTMLEntities [YES|no] .IP "\(bu" 4 DefaultContents [TXT|HTML|XML|TXT2|HTML2|XML2|TXT*|HTML*|XML*] .IP "\(bu" 4 Delay *seconds* .IP "\(bu" 4 DontBumpPositionOnEndTags *list of names* .IP "\(bu" 4 DontBumpPositionOnStartTags *list of names* .IP "\(bu" 4 EnableAltSearchSyntax [yes|NO] .IP "\(bu" 4 EndCharacters *string of characters* .IP "\(bu" 4 EquivalentServer *server alias* .IP "\(bu" 4 ExtractPath *metaname* [replace|remove|prepend|append|regex] .IP "\(bu" 4 FileFilter *suffix* *program* [options] .IP "\(bu" 4 FileFilterMatch *program* *options* *regex* [*regex* ...] 
.IP "\(bu" 4 FileInfoCompression [yes|NO] .IP "\(bu" 4 FileMatch [contains|is|regex] *regular expression* .IP "\(bu" 4 FileRules [contains|is|regex] *regular expression* .IP "\(bu" 4 FuzzyIndexingMode [NONE|Stemming|Soundex|Metaphone|DoubleMetaphone] .IP "\(bu" 4 FollowSymLinks [yes|NO] .IP "\(bu" 4 HTMLLinksMetaName *metaname* .IP "\(bu" 4 IgnoreFirstChar *string of characters* .IP "\(bu" 4 IgnoreLastChar *string of characters* .IP "\(bu" 4 IgnoreLimit *integer integer* .IP "\(bu" 4 IgnoreMetaTags *list of names* .IP "\(bu" 4 IgnoreNumberChars *list of characters* .IP "\(bu" 4 IgnoreTotalWordCountWhenRanking [YES|no] .IP "\(bu" 4 IgnoreWords [*list of stop words*|File: path] .IP "\(bu" 4 ImageLinksMetaName *metaname* .IP "\(bu" 4 IncludeConfigFile .IP "\(bu" 4 IndexAdmin *text* .IP "\(bu" 4 IndexAltTagMetaName *tagname*|as\-text .IP "\(bu" 4 IndexComments [yes|NO] .IP "\(bu" 4 IndexContents [TXT|HTML|XML|TXT2|HTML2|XML2|TXT*|HTML*|XML*] *file extensions* .IP "\(bu" 4 IndexDescription *text* .IP "\(bu" 4 IndexDir [URL|directories or files] .IP "\(bu" 4 IndexFile *path* .IP "\(bu" 4 IndexName *text* .IP "\(bu" 4 IndexOnly *list of file suffixes* .IP "\(bu" 4 IndexPointer *text* .IP "\(bu" 4 IndexReport [0|1|2|3] .IP "\(bu" 4 MaxDepth *integer* .IP "\(bu" 4 MaxWordLimit *integer* .IP "\(bu" 4 MetaNameAlias *meta name* *list of aliases* .IP "\(bu" 4 MetaNames *list of names* .IP "\(bu" 4 MinWordLimit *integer* .IP "\(bu" 4 NoContents *list of file suffixes* .IP "\(bu" 4 obeyRobotsNoIndex [yes|NO] .IP "\(bu" 4 ParserWarnLevel [0|1|2|3] .IP "\(bu" 4 PreSortedIndex *list of property names* .IP "\(bu" 4 PropCompressionLevel [0\-9] .IP "\(bu" 4 PropertyNameAlias *property name* *list of aliases* .IP "\(bu" 4 PropertyNames *list of meta names* .IP "\(bu" 4 PropertyNamesCompareCase *list of meta names* .IP "\(bu" 4 PropertyNamesIgnoreCase *list of meta names* .IP "\(bu" 4 PropertyNamesNoStripChars *list of meta names* .IP "\(bu" 4 PropertyNamesDate *list of meta names* .IP 
"\(bu" 4 PropertyNamesNumeric *list of meta names* .IP "\(bu" 4 PropertyNamesMaxLength integer *list of meta names* .IP "\(bu" 4 PropertyNamesSortKeyLength integer *list of meta names* .IP "\(bu" 4 ReplaceRules [replace|remove|prepend|append|regex] .IP "\(bu" 4 ResultExtFormatName name \-x format string .IP "\(bu" 4 SpiderDirectory *path* .IP "\(bu" 4 StoreDescription [\s-1XML\s0 <tag>|HTML <meta>|TXT size] .IP "\(bu" 4 "SwishProgParameters *list of parameters* .IP "\(bu" 4 SwishSearchDefaultRule [<AND-WORD>|<or-word>] .IP "\(bu" 4 TmpDir *path* .IP "\(bu" 4 TranslateCharacters [*string1 string2*|:ascii7:] .IP "\(bu" 4 TruncateDocSize *number of characters* .IP "\(bu" 4 UndefinedMetaTags [error|ignore|INDEX|auto] .IP "\(bu" 4 UndefinedXMLAttributes [DISABLE|error|ignore|index|auto] .IP "\(bu" 4 UseStemming [yes|NO] .IP "\(bu" 4 UseSoundex [yes|NO] .IP "\(bu" 4 UseWords [*list of words*|File: path] .IP "\(bu" 4 WordCharacters *string of characters* .IP "\(bu" 4 XMLClassAttributes *list of \s-1XML\s0 attribute names* .Sh "Directives that Control Swish" .IX Subsection "Directives that Control Swish" These configuration directives control the general behavior of Swish\-e. .IP "IncludeConfigFile *path to config file*" 4 .IX Item "IncludeConfigFile *path to config file*" This directive can be used to include configuration directives located in another file. .Sp .Vb 1 \& IncludeConfigFile /usr/local/swish/conf/site_config.config .Ve .IP "IndexReport [0|1|2|3]" 4 .IX Item "IndexReport [0|1|2|3]" This is how detailed you want reporting while indexing. You can specify numbers 0 to 3. 0 is totally silent, 3 is the most verbose. The default is 1. .Sp This may be overridden from the command line via the \f(CW\*(C`\-v\*(C'\fR switch (see SWISH-RUN). .IP "ParserWarnLevel [0|1|2|3]" 4 .IX Item "ParserWarnLevel [0|1|2|3]" Sets the error level when using the libxml2 parser for \s-1XML\s0 and \s-1HTML\s0. libxml2 will point out structural errors in your documents. 
.Sp .Vb 4 \& 0 = no report \& 1 = fatal errors \& 2 = errors \& 3 = warnings .Ve .Sp Currently (as of 2.4.4 \- early 2005) libxml2 only reports errors at level 2. The default as of 2.4.4 is \*(L"2\*(R" which should report any errors that might indicate a problem parsing a document. .Sp The exception is that \s-1UTF\-8\s0 to Latin\-1 conversion errors are reported at level 3 (changed from 1 in 2.4.4). Although these errors indicate a problem indexing text, they are only reported at level 3 because they can be very common. .Sp It is recommended that you index at ParserWarnLevel 3 when first starting out to see what errors and warnings are reported. Then reduce the level when you understand what documents are causing parsing problems and why. .IP "IndexFile *path*" 4 .IX Item "IndexFile *path*" IndexFile specifies the location of the generated index file. If not specified, Swish-e will create the file \fIindex.swish\-e\fR in the current directory. .Sp .Vb 1 \& IndexFile /usr/local/swish/site.index .Ve .IP "obeyRobotsNoIndex [yes|NO]" 4 .IX Item "obeyRobotsNoIndex [yes|NO]" When enabled, Swish-e will not index any \s-1HTML\s0 file that contains: .Sp .Vb 1 \& <meta name="robots" content="noindex"> .Ve .Sp The default is to ignore these meta tags and index the document. This tag is described at http://www.robotstxt.org/wc/exclusion.html. .Sp Note: This feature is only available with the libxml2 \s-1HTML\s0 parser. .Sp Also, if you are using the libxml2 parser (\s-1HTML2\s0 and \s-1XML2\s0) then you can use the following comments in your documents to prevent indexing: .Sp .Vb 2 \& <!-- SwishCommand noindex --> \& <!-- SwishCommand index --> .Ve .Sp and/or these may be used also: .Sp .Vb 2 \& <!-- noindex --> \& <!-- index --> .Ve .Sp For example, these are very helpful to prevent indexing of common headers, footers, and menus. .PP \&\fB\s-1NOTE\s0\fR: The following items are currently not available.
These items require Swish-e to parse the configuration file while searching. .IP "EnableAltSearchSyntax [yes|NO]" 4 .IX Item "EnableAltSearchSyntax [yes|NO]" \&\fB\s-1NOTE\s0\fR: The following item is currently not available. .Sp Enable alternate search syntax. Allows the use of a basic search syntax like that of \&\*(L"Altavista(c)\*(R", \*(L"Lycos(c)\*(R", etc. This means a search query can contain \*(L"+\*(R" and \*(L"\-\*(R" as syntax parameters. .Sp Example: .Sp .Vb 4 \& swish-e -w "+word1 +word2 -word3 word4 word5" \& "+" = following word has to be in all found documents \& "-" = following word may not be in any document found \& " " = following word will be searched in documents .Ve .IP "SwishSearchOperators <and-word> <or-word> <not-word>" 4 .IX Item "SwishSearchOperators <and-word> <or-word> <not-word>" \&\fB\s-1NOTE\s0\fR: The following item is currently not available. .Sp Using this config directive you can change the boolean search operators of Swish\-e, e.g. to adapt these to your language. The default is: \s-1AND\s0 \s-1OR\s0 \s-1NOT\s0 .Sp Example (German): .Sp .Vb 1 \& SwishSearchOperators UND ODER NICHT .Ve .IP "SwishSearchDefaultRule [<AND-WORD>|<or-word>]" 4 .IX Item "SwishSearchDefaultRule [<AND-WORD>|<or-word>]" \&\fB\s-1NOTE\s0\fR: The following item is currently not available. .Sp \&\f(CW\*(C`SwishSearchDefaultRule\*(C'\fR defines the default Boolean operator to use if none is specified between words or phrases. The default is \f(CW\*(C`AND\*(C'\fR. .Sp The word you specify must match one of the available \f(CW\*(C`SwishSearchOperators\*(C'\fR. .Sp Example: .Sp .Vb 3 \& SwishSearchOperators UND ODER NICHT \& # Make it act like a web search engine \& SwishSearchDefaultRule ODER .Ve .IP "ResultExtFormatName name \-x format string" 4 .IX Item "ResultExtFormatName name -x format string" \&\fB\s-1NOTE\s0\fR: The following item is currently not available.
.Sp The output of Swish-e can be defined by specifying a format string with the \&\f(CW\*(C`\-x\*(C'\fR command line argument. Using \f(CW\*(C`ResultExtFormatName\*(C'\fR you can assign a predefined format string to a name. .Sp Examples: .Sp .Vb 1 \& ResultExtFormatName moreinfo "%c|%r|%t|%p|<author>|<publishyear>\en" .Ve .Sp Then when searching you can specify the format string's name .Sp .Vb 1 \& swish-e ... -x moreinfo ... .Ve .Sp See the \f(CW\*(C`\-x\*(C'\fR switch in SWISH-RUN for more information about output formats. .Sh "Administrative Headers Directives" .IX Subsection "Administrative Headers Directives" Swish-e stores configuration information in the header of the index file. This information can be retrieved while searching or by functions in the Swish-e C library. There are a number of fields available for your own use. None of these fields are required: .IP "IndexName *text*" 4 .IX Item "IndexName *text*" .PD 0 .IP "IndexDescription *text*" 4 .IX Item "IndexDescription *text*" .IP "IndexPointer *text*" 4 .IX Item "IndexPointer *text*" .IP "IndexAdmin *text*" 4 .IX Item "IndexAdmin *text*" .PD These variables specify information that goes into index files to help users and administrators. IndexName should be the name of your index, like a book title. IndexDescription is a short description of the index or a \s-1URL\s0 pointing to a more full description. IndexPointer should be a pointer to the original information, most likely a \s-1URL\s0. IndexAdmin should be the name of the index maintainer and can include name and email information. These values should not be more than 70 or so characters and should be contained in quotes. Note that the automatically generated date in index files is in D/M/Y and 24\-hour format. .Sp Examples: .Sp .Vb 4 \& IndexName "Linux Documentation" \& IndexDescription "This is an index of /usr/doc on our Linux machine." 
\& IndexPointer http://localhost/swish/linux/index.html \& IndexAdmin webmaster .Ve .Sh "Document Source Directives" .IX Subsection "Document Source Directives" These directives control \fIwhat\fR documents are indexed and \fIhow\fR they are accessed. See also Directives for the File Access method only and Directives for the \s-1HTTP\s0 Access Method Only for directives that are specific to those access methods. .IP "IndexDir [directories or files|URL|external program]" 4 .IX Item "IndexDir [directories or files|URL|external program]" IndexDir defines the source of the documents for Swish\-e. Swish-e currently supports three file access methods: \fBFile system\fR, \fB\s-1HTTP\s0\fR (also called \fBspidering\fR), and \fBprog\fR for reading files from an external program. .Sp The \f(CW\*(C`\-S\*(C'\fR command line argument is used to select the file access method. .Sp .Vb 3 \& swish-e -c swish.config -S fs - file system \& swish-e -c swish.config -S http - internal http spider \& swish-e -c swish.config -S prog - external program of any type .Ve .Sp For the \fBfs\fR method of access \fBIndexDir\fR is a space-separated list of files and directories to index. Use a forward slash as the path separator in \s-1MS\s0 Windows. .Sp For the \fBhttp\fR method the \fBIndexDir\fR setting is a list of space-separated URLs. .Sp For the \fBprog\fR method the \fBIndexDir\fR setting is a list of space-separated programs to run (which generate documents for swish to index). .Sp You may specify more than one \fBIndexDir\fR directive. .Sp Any sub-directories of any listed directory will also be indexed. .Sp Note: While \fIprocessing\fR directories, Swish-e will ignore any files or directories that begin with a dot (\*(L".\*(R"). You may index files or directories that begin with a dot by specifying their name with \f(CW\*(C`IndexDir\*(C'\fR or \f(CW\*(C`\-i\*(C'\fR. 
.Sp Examples: .Sp .Vb 2 \& # Index this directory and any subdirectories \& IndexDir /usr/local/home/http .Ve .Sp .Vb 2 \& # Index the docs directory in current directory \& IndexDir ./docs .Ve .Sp .Vb 4 \& # Index these files in the current directory \& IndexDir ./index.html ./page1.html ./page2.html \& # and index this directory, too \& IndexDir ../public_html .Ve .Sp For the \fB\s-1HTTP\s0\fR method of access specify the \s-1URL\s0s from which you want the spidering to begin. .Sp Example: .Sp .Vb 2 \& IndexDir http://www.my-site.com/index.html \& IndexDir http://localhost/index.html .Ve .Sp Using the \fB\s-1HTTP\s0\fR method to index is \fBmuch\fR slower than indexing local files. Be aware that some sites do not appreciate spidering and may block your \s-1IP\s0 address. You may wish to contact the remote site before spidering their web site. More information about spidering can be found in Directives for the \s-1HTTP\s0 Access Method Only below. .Sp For the prog method of access \fBIndexDir\fR specifies the path to the program(s) to execute. The external program must correctly format the documents being passed back to Swish\-e. Examples of external programs are provided in the \fIprog-bin\fR directory. .Sp .Vb 1 \& IndexDir ./myprogram.pl .Ve .Sp See prog for details. .Sp Note: Not all directives work with all methods. .IP "NoContents *list of file suffixes*" 4 .IX Item "NoContents *list of file suffixes*" Files with these suffixes will \fBnot\fR have their contents indexed, but will have their path name (file name) indexed instead. .Sp If the file's type is \s-1HTML\s0 or \s-1HTML2\s0 (as set by \f(CW\*(C`IndexContents\*(C'\fR or \&\f(CW\*(C`DefaultContents\*(C'\fR) then the file will be parsed for an \s-1HTML\s0 title and that title will be indexed.
Note that you must set the file's type with \&\f(CW\*(C`IndexContents\*(C'\fR or \f(CW\*(C`DefaultContents\*(C'\fR: \f(CW\*(C`.html\*(C'\fR and \f(CW\*(C`.htm\*(C'\fR are \s-1NOT\s0 type \s-1HTML\s0 by default. For example: .Sp .Vb 1 \& IndexContents HTML* .htm .html .Ve .Sp If a title is found, it will still be checked for \f(CW\*(C`FileRules title\*(C'\fR, and the file will be skipped if a match is found. See \f(CW\*(C`FileRules\*(C'\fR. .Sp If the file's type is not \s-1HTML\s0, or it is \s-1HTML\s0 and no title is found, then the file's path will be indexed. .Sp For example, this will allow searching by image file name. .Sp .Vb 1 \& NoContents .gif .xbm .au .mov .mpg .pdf .ps .Ve .Sp Note: Using this directive will \fBnot\fR cause files with those suffixes to be indexed. That is, if you use \f(CW\*(C`IndexOnly\*(C'\fR to limit the types of files that are indexed, then you must specify in \f(CW\*(C`IndexOnly\*(C'\fR the same suffixes listed in \&\f(CW\*(C`NoContents\*(C'\fR. .Sp This does \fBnot\fR work: .Sp .Vb 3 \& # Wrong! \& IndexOnly .htm .html \& NoContents .gif .xbm .au .mov .mpg .pdf .ps .Ve .Sp A \f(CW\*(C`\-S prog\*(C'\fR program may set the \f(CW\*(C`No\-Contents:\*(C'\fR header to enable this feature for a specific document (although it would be smarter for the \f(CW\*(C`\-S prog\*(C'\fR program to send only the pathname or title to be indexed). .IP "ReplaceRules [replace|remove|prepend|append|regex]" 4 .IX Item "ReplaceRules [replace|remove|prepend|append|regex]" ReplaceRules allows you to make changes to file pathnames before they're indexed. These changed file names or URLs will be returned in search results. .Sp For example, you may index your files locally (with the File system indexing method), yet return a \s-1URL\s0 in search results. This directive can be used to map the file names to their respective URLs on your web server.
.Sp There are five operations you can specify: \fBreplace\fR, \fBappend\fR, \&\fBremove\fR, \fBprepend\fR, and \fBregex\fR They will parse the pathname in the order you've typed these commands. .Sp This directive uses C library regex.h regular expressions. .Sp .Vb 5 \& replace "the string you want replaced" "what to change it to" \& remove "a string to remove" \& prepend "a string to add before the result" \& append "a string to add after the result" \& regex "/search string/replace string/options" .Ve .Sp Remember, quotes are needed if an expression contains white space, and backslashes have special meaning. .Sp Regex is an Extended Regular Expression. The first character found is the delimiter (but it's not smart enough to use matched chars such as [], (), and {}). .Sp The \fBreplace\fR string may use substitution variables: .Sp .Vb 4 \& $0 the entire matched (sub)string \& $1-$9 returns patterns captured in "(" ")" pairs \& $` the string before the matched pattern \& $' the string after the matched pattern .Ve .Sp The \fBoptions\fR change the behavior of expression: .Sp .Vb 2 \& i ignore the case when matching \& g repeat the substitution for the entire pattern .Ve .Sp Examples: .Sp .Vb 2 \& ReplaceRules replace testdir/ anotherdir/ \& ReplaceRules replace [a-z_0-9]*_m.*\e.html index.html .Ve .Sp .Vb 1 \& ReplaceRules remove testdir/ .Ve .Sp .Vb 2 \& ReplaceRules prepend http://localhost/ \& ReplaceRules append .html .Ve .Sp .Vb 5 \& ReplaceRules regex !^/web/(.+)/!http://$1.domain.com/! \& replaces a file path: \& /web/search/foo/index.html \& with \& http://search.domain.com/foo/index.html .Ve .Sp .Vb 2 \& ReplaceRules regex #^#http://localhost/www# \& ReplaceRules prepend http://localhost/www (same thing) .Ve .Sp .Vb 3 \& # Remove all extensions from C source files \& ReplaceRules remove .c # ERROR! That "." is *any char* \& ReplaceRules remove \e.c # much better... .Ve .Sp .Vb 2 \& ReplaceRules remove "\e\e.c" # if in quotes you need double-backslash! 
\& ReplaceRules remove "\e.c" # ERROR! "\e." -> "." and is *any char* .Ve .IP "IndexContents [TXT|HTML|XML|TXT2|HTML2|XML2|TXT*|HTML*|XML*] *file extensions*" 4 .IX Item "IndexContents [TXT|HTML|XML|TXT2|HTML2|XML2|TXT*|HTML*|XML*] *file extensions*" The \f(CW\*(C`IndexContents\*(C'\fR directive assigns one of Swish\-e's document parsers to a document, based on its extension. Swish-e currently knows how to parse \&\s-1TXT\s0, \s-1HTML\s0, and \s-1XML\s0 documents. .Sp The \s-1XML2\s0, \s-1HTML2\s0, and \s-1TXT2\s0 parsers are currently only available when Swish-e is configured to use libxml2. .Sp You may use XML*, HTML*, and TXT* to select the parser automatically. If libxml2 is installed then it will be used to parse the content. Otherwise, Swish\-e's internal parsers will be used. .Sp Documents that are not assigned a parser with \f(CW\*(C`IndexContents\*(C'\fR will, by default, use the \s-1HTML2\s0 parser if libxml2 is installed, otherwise will use Swish\-e's internal \s-1HTML\s0 parser. The \f(CW\*(C`DefaultContents\*(C'\fR directive may be used to assign a parser to documents that do not match a file extension defined with the \f(CW\*(C`IndexContents\*(C'\fR directive. .Sp Example: .Sp .Vb 3 \& IndexContents HTML* .htm .html .shtml \& IndexContents TXT* .txt .log .text \& IndexContents XML* .xml .Ve .Sp HTML* is the default type for all files, unless otherwise specified (and this default can be changed by the \fBDefaultContents\fR directive). Swish-e parses titles from \s-1HTML\s0 files, if available, and keeps track of the context of the text for context searching (see \f(CW\*(C`\-t\*(C'\fR in SWISH-RUN). .Sp If using filters (with the \f(CW\*(C`FileFilter\*(C'\fR directive) to convert documents you should include those extensions, too.
For example, if using a filter to convert .pdf to .html, you need to tell Swish-e that .pdf should be indexed by the internal \s-1HTML\s0 parser: .Sp .Vb 2 \& FileFilter .pdf pdf2html \& IndexContents HTML .pdf .Ve .Sp See also Document Filter Directives. .Sp \&\fBNote:\fR Some of this may be changed in the future to use content-types instead of file extensions. See \s-1SWISH\-3\s0.0. .IP "DefaultContents [TXT|HTML|XML|TXT2|HTML2|XML2|TXT*|HTML*|XML*]" 4 .IX Item "DefaultContents [TXT|HTML|XML|TXT2|HTML2|XML2|TXT*|HTML*|XML*]" This sets the default parser for documents that are not specified in \&\fBIndexContents\fR. If not specified the default is \s-1HTML\s0. .Sp The \s-1XML2\s0, \s-1HTML2\s0, and \s-1TXT2\s0 parsers are currently only available when Swish-e is configured to use libxml2. .Sp You may use XML*, HTML*, and TXT* to select the parser automatically. If libxml2 is installed then it will be used to parse the content. Otherwise, Swish\-e's internal parsers will be used. .Sp Example: .Sp .Vb 1 \& DefaultContents HTML .Ve .Sp The \f(CW\*(C`DefaultContents\*(C'\fR directive \fIshould\fR be used when spidering, as \s-1HTML\s0 files may be returned without a file extension (such as when requesting a directory and the default index.html is returned). .IP "FileInfoCompression [yes|NO]" 4 .IX Item "FileInfoCompression [yes|NO]" ** This directive is currently not supported ** .Sp Setting \fBFileInfoCompression\fR to \f(CW\*(C`yes\*(C'\fR will compress the index file to save disk space. This may result in longer indexing times. The default is \f(CW\*(C`no\*(C'\fR. .Sp Also see the \f(CW\*(C`\-e\*(C'\fR switch in SWISH-RUN for saving \s-1RAM\s0 during indexing. .Sh "Document Contents Directives" .IX Subsection "Document Contents Directives" These directives control what information is extracted from your source documents, and how that information is made available during searching.
.IP "ConvertHTMLEntities [YES|no]" 4 .IX Item "ConvertHTMLEntities [YES|no]" \&\s-1ASCII\s0 \fIentities\fR can be converted automatically while indexing documents of type \s-1HTML\s0 (not for \s-1HTML2\s0). For performance reasons you may wish to set this to \&\f(CW\*(C`no\*(C'\fR if your documents do not contain \s-1HTML\s0 entities. The default is \f(CW\*(C`yes\*(C'\fR. .Sp If \f(CW\*(C`ConvertHTMLEntities\*(C'\fR is set \f(CW\*(C`no\*(C'\fR the entities will be indexed without conversion. .Sp \&\fB\s-1NOTE:\s0\fR Entities within \s-1XML\s0 files and files parsed with libxml2 (\s-1HTML2\s0) are converted regardless of this setting. .IP "MetaNames *list of names*" 4 .IX Item "MetaNames *list of names*" \&\s-1META\s0 names are a way to define \*(L"fields\*(R" in your \s-1XML\s0 and \s-1HTML\s0 documents. You can use the \s-1META\s0 names in your queries to limit the search to just the words contained in that \s-1META\s0 name of your document. For example, you might have a \&\s-1META\s0 tagged field in your documents called \f(CW\*(C`subjects\*(C'\fR and then you can search your documents for the word \*(L"foo\*(R" but only return documents where \*(L"foo\*(R" is within the \f(CW\*(C`subjects\*(C'\fR \s-1META\s0 tag. .Sp .Vb 1 \& swish-e -w subjects=foo .Ve .Sp (See also the \f(CW\*(C`\-t\*(C'\fR switch in SWISH-RUN for information about \&\fIcontext\fR searching in \s-1HTML\s0 documents.) .Sp The \fBMetaNames\fR directive is a space separated list. For example: .Sp .Vb 1 \& MetaNames meta1 meta2 keywords subjects .Ve .Sp You may also use \f(CW\*(C`UndefinedMetaTags\*(C'\fR to specify automatic extraction of meta names from your \s-1HTML\s0 and \s-1XML\s0 documents, and also to ignore indexing content of meta tags. 
.Sp \&\s-1META\s0 tags can have two formats in your \fB\s-1HTML\s0\fR source documents: .Sp .Vb 1 \& <META NAME="meta1" CONTENT="some content"> .Ve .Sp and (if using the HTML2/libxml2 parser) .Sp .Vb 3 \& <meta1> \& some content \& </meta1> .Ve .Sp But this second version is invalid \s-1HTML\s0, and will generate a warning if ParserWarnLevel is set (libxml2 only). .Sp And in \fB\s-1XML\s0\fR documents, use the format: .Sp .Vb 3 \& <meta1> \& Some Content \& </meta1> .Ve .Sp Then you can limit your search to just \s-1META\s0 \fBmeta1\fR like this: .Sp .Vb 1 \& swish-e -w 'meta1=(apples or oranges)' .Ve .Sp You may nest the \s-1XML\s0 and the start/end tag versions: .Sp .Vb 8 \& <keywords> \& <tag1> \& some content \& </tag1> \& <tag2> \& some other content \& </tag2> \& </keywords> .Ve .Sp Then you can search in both tag1 and tag2 with: .Sp .Vb 1 \& swish-e -w 'keywords=(query words)' .Ve .Sp Swish-e indexes all text as some metaname. The default is \f(CW\*(C`swishdefault\*(C'\fR, so these two queries are the same: .Sp .Vb 2 \& swish-e -w foo \& swish-e -w swishdefault=foo .Ve .Sp When indexing \s-1HTML\s0 Swish-e indexes the \s-1HTML\s0 title as default text, so when searching Swish-e will find matches in both the \s-1HTML\s0 body and the \&\s-1HTML\s0 title. Swish also, by default, indexes content of meta tags. So: .Sp .Vb 1 \& swish-e -w foo .Ve .Sp will find \*(L"foo\*(R" in the body, the title, or any meta tags. .Sp Currently, there's no way to prevent Swish-e from indexing the title contents along with the body contents, but see \f(CW\*(C`UndefinedMetaTags\*(C'\fR for how to control the indexing of meta tags. .Sp If you would like to search just the title text, you may use: .Sp .Vb 1 \& MetaNames swishtitle .Ve .Sp This will index the title text separately under the built-in swish internal meta name \*(L"swishtitle\*(R".
You may then search like this: .Sp .Vb 2 \& swish-e -w foo -- search for "foo" in title, body (and undefined meta tags) \& swish-e -w swishtitle=foo -- search for "foo" in title only .Ve .Sp In addition to swishtitle, you can limit searches to documents' path with: .Sp .Vb 1 \& MetaNames swishdocpath .Ve .Sp Then to search for \*(L"foo\*(R" but also limit searches to documents that include \&\*(L"manual\*(R" or \*(L"tutorial\*(R" in their path: .Sp .Vb 1 \& swish-e -w 'foo swishdocpath=(manual or tutorial)' .Ve .Sp See also \f(CW\*(C`ExtractPath\*(C'\fR. .IP "MetaNameAlias *meta name* *list of aliases*" 4 .IX Item "MetaNameAlias *meta name* *list of aliases*" MetaNameAlias assigns aliases for a meta name. For example, if your documents contain meta tags \*(L"description\*(R", \*(L"summary\*(R", and \*(L"overview\*(R" that all give a summary of your documents you could do this: .Sp .Vb 2 \& MetaNames summary \& MetaNameAlias summary description overview .Ve .Sp Then all three tags will get indexed as meta tag \*(L"summary\*(R". You can then search all the fields as: .Sp .Vb 1 \& -w summary=foo .Ve .Sp The aliases work at search time, too. So these will also limit the search to the \*(L"summary\*(R" meta name. .Sp .Vb 2 \& -w description=foo \& -w overview=foo .Ve .IP "MetaNamesRank integer *list of meta names*" 4 .IX Item "MetaNamesRank integer *list of meta names*" You can assign a bias to metanames that will affect how ranking is calculated. The range of values is from \-10 to +10, with zero being no bias. .Sp .Vb 4 \& MetaNamesRank 4 subject \& MetaNamesRank 3 swishdefault \& MetaNamesRank 2 author publisher \& MetaNamesRank -5 wrongwords .Ve .Sp This feature is still considered experimental. If you use it, please send feedback to the discussion list. .IP "HTMLLinksMetaName *metaname*" 4 .IX Item "HTMLLinksMetaName *metaname*" Allows indexing of \s-1HTML\s0 links. Normally, \s-1HTML\s0 links (href tags) are not indexed by Swish\-e.
This directive defines a metaname, and links will be indexed under this meta name. .Sp Example: .Sp .Vb 1 \& HTMLLinksMetaName links .Ve .Sp Now, to limit searches to files with a link to \*(L"home.html\*(R" do this: .Sp .Vb 1 \& -w links='"home.html"' .Ve .Sp The double quotes force a phrase search. .Sp To make Swish-e index links as normal text, you may use: .Sp .Vb 1 \& HTMLLinksMetaName swishdefault .Ve .Sp This feature is only available with the libxml2 \s-1HTML\s0 parser. .IP "ImageLinksMetaName *metaname*" 4 .IX Item "ImageLinksMetaName *metaname*" Allows indexing of image links under a metaname. Normally, image URLs are not indexed. .Sp Example: .Sp .Vb 1 \& ImageLinksMetaName images .Ve .Sp Now, if you would like to find pages that include a nice image of a beach: .Sp .Vb 1 \& -w images='beach' .Ve .Sp To make Swish-e index image links as normal text, you may use: .Sp .Vb 1 \& ImageLinksMetaName swishdefault .Ve .Sp This feature is only available with the libxml2 \s-1HTML\s0 parser. .IP "IndexAltTagMetaName *tagname*|as\-text" 4 .IX Item "IndexAltTagMetaName *tagname*|as-text" Allows indexing of images <\s-1IMG\s0> \s-1ALT\s0 tag text. Specify either a tag name which will be used as a metaname, or the special text \*(L"as\-text\*(R" which says to index the \s-1ALT\s0 text as if it were plain text at the current location. .Sp For example, by specifying a tag name: .Sp .Vb 1 \& IndexAltTagMetaName bar .Ve .Sp would make this markup: .Sp .Vb 3 \& <foo> \& <img src="/someimage.png" alt="Alt text here"> \& </foo> .Ve .Sp appear like .Sp .Vb 3 \& <foo> \& <bar>Alt text here</bar> \& </foo> .Ve .Sp Then the normal rules (\f(CW\*(C`MetaNames\*(C'\fR and \f(CW\*(C`PropertyNames\*(C'\fR) apply to how that text is indexed.
.Sp If you use the special tag \*(L"as\-text\*(R" then .Sp .Vb 3 \& <foo> \& <img src="/someimage.png" alt="Alt text here"> \& </foo> .Ve .Sp simply becomes .Sp .Vb 3 \& <foo> \& Alt text here \& </foo> .Ve .Sp This feature is only available when using the libxml2 parser (\s-1HTML2\s0 and \s-1XML2\s0). .IP "AbsoluteLinks [yes|NO]" 4 .IX Item "AbsoluteLinks [yes|NO]" If this is set to yes then Swish-e will attempt to convert relative URIs extracted from \s-1HTML\s0 documents for use with \f(CW\*(C`HTMLLinksMetaName\*(C'\fR and \&\f(CW\*(C`ImageLinksMetaName\*(C'\fR into absolute URIs. Swish-e will use any <\s-1BASE\s0> tag found in the document, otherwise it will use the file's pathname. The pathname used will be the pathname *after* \f(CW\*(C`ReplaceRules\*(C'\fR has been applied to the document's pathname. .Sp For example, say you wish to index image links under the metaname \&\*(L"images\*(R". .Sp .Vb 1 \& ImageLinksMetaName images .Ve .Sp If a document is located at http://localhost/vacations/france/index.html and \&\f(CW\*(C`AbsoluteLinks\*(C'\fR is set to no, then an image within that document: .Sp .Vb 1 \& <img src="beach.jpeg"> .Ve .Sp will only index \*(L"beach.jpeg\*(R". .Sp But, if you want more detail when searching, you can enable \f(CW\*(C`AbsoluteLinks\*(C'\fR and Swish-e will index \*(L"http://localhost/vacations/france/beach.jpeg\*(R". You can then look for images of beaches, but only in France: .Sp .Vb 1 \& -w images=(beach and france) .Ve .Sp This also means you can search for any images within France: .Sp .Vb 1 \& -w images=(france) .Ve .Sp This feature is only available with the libxml2 \s-1HTML\s0 parser. .IP "UndefinedMetaTags [error|ignore|INDEX|auto]" 4 .IX Item "UndefinedMetaTags [error|ignore|INDEX|auto]" This directive defines the behavior of Swish-e during indexing when a meta name is found but is \fBnot\fR listed in \fBMetaNames\fR.
There are four choices: .RS 4 .IP "error" 2 .IX Item "error" If a meta name is found that is not listed in \fBMetaNames\fR then indexing will be halted and an error reported. .IP "ignore" 2 .IX Item "ignore" The contents of the meta tag are ignored and \fBnot\fR indexed unless a metaname has been defined with the \f(CW\*(C`MetaNames\*(C'\fR directive. .IP "index" 2 .IX Item "index" The contents of the meta tag are indexed, but placed in the main index unless there's an enclosing metatag already in force. This is the default. .IP "auto" 2 .IX Item "auto" This method creates meta tags automatically for \s-1HTML\s0 meta names and \s-1XML\s0 elements. Using this is the same as specifying all the meta names explicitly in a \fBMetaNames\fR directive. .RE .RS 4 .RE .IP "UndefinedXMLAttributes [DISABLE|error|ignore|index|auto]" 4 .IX Item "UndefinedXMLAttributes [DISABLE|error|ignore|index|auto]" This is similar to \f(CW\*(C`UndefinedMetaTags\*(C'\fR, but only applies to \s-1XML\s0 documents (parsed with libxml2). This allows indexing of attribute content, and provides a way to index the content under a metaname. For example, \&\f(CW\*(C`UndefinedXMLAttributes\*(C'\fR can make .Sp .Vb 3 \& <person age="23"> \& John Doe \& </person> .Ve .Sp look like the following to swish: .Sp .Vb 6 \& <person> \& <person.age> \& 23 \& </person.age> \& John Doe \& </person> .Ve .Sp What happens to the text \*(L"23\*(R" will depend on the setting of \&\f(CW\*(C`UndefinedXMLAttributes\*(C'\fR: .RS 4 .IP "disable" 2 .IX Item "disable" \&\s-1XML\s0 attributes are not parsed and not indexed. This is the default. .IP "error" 2 .IX Item "error" If the concatenated meta name (e.g. person.age) is not listed in \&\fBMetaNames\fR then indexing will be halted and an error reported. .IP "ignore" 2 .IX Item "ignore" The contents of the meta tag are ignored and \fBnot\fR indexed unless a metaname has been defined with the \f(CW\*(C`MetaNames\*(C'\fR directive.
.IP "index" 2 .IX Item "index" The contents of the meta tag are indexed, but placed in the main index unless there's an enclosing metatag already in force. .IP "auto" 2 .IX Item "auto" This method will create meta tags from the combined element and attributes (and \s-1XML\s0 Class name). This option should be used with caution as it can generate a lot of metaname entries. .Sp See also the example under \f(CW\*(C`XMLClassAttributes\*(C'\fR below. .RE .RS 4 .RE .IP "XMLClassAttributes *list of \s-1XML\s0 attribute names*" 4 .IX Item "XMLClassAttributes *list of XML attribute names*" Combines an \s-1XML\s0 class name with the element name to make up a metaname. For example: .Sp .Vb 1 \& XMLClassAttributes class .Ve .Sp .Vb 6 \& <person class="first"> \& John \& </person> \& <person class="last"> \& Doe \& </person> .Ve .Sp Will appear to Swish-e as: .Sp .Vb 10 \& <person> \& <person.first> \& John \& </person.first> \& </person> \& <person> \& <person.last> \& Doe \& </person.last> \& </person> .Ve .Sp How the data is indexed depends on \f(CW\*(C`MetaNames\*(C'\fR and \f(CW\*(C`UndefinedMetaTags\*(C'\fR. .Sp Here's an example using the following configuration which combines the two directives \f(CW\*(C`XMLClassAttributes\*(C'\fR and \f(CW\*(C`UndefinedXMLAttributes\*(C'\fR. 
.Sp .Vb 4 \& XMLClassAttributes class \& UndefinedMetaTags auto \& UndefinedXMLAttributes auto \& IndexContents XML2 .xml .Ve .Sp The source \s-1XML\s0 file looks like: .Sp .Vb 2 \& <xml> <person class="student" phone="555-1212" age="102"> John </person> \& <person greeting="howdy">Bill</person> </xml> .Ve .Sp Swish-e parses as: .Sp .Vb 2 \& ./swish-e -c 2 -i 1.xml -T parsed_tags parsed_text -v 0 \& Indexing Data Source: "File-System" .Ve .Sp .Vb 1 \& <xml> (MetaName) .Ve .Sp .Vb 10 \& <person> (MetaName) \& <person.student> (MetaName) \& <person.student.phone> (MetaName) \& 555-1212 \& </person.student.phone> \& <person.student.age> (MetaName) \& 102 \& </person.student.age> \& John \& </person> .Ve .Sp .Vb 6 \& <person> (MetaName) \& <person.greeting> (MetaName) \& howdy \& </person.greeting> \& Bill \& </person> .Ve .Sp .Vb 2 \& </xml> \& Indexing done! .Ve .Sp One thing to note is that the first <person> block finds a class name \&\*(L"student\*(R" so all metanames that are created from attributes use the combined name \*(L"person.student\*(R". The second <person> block doesn't contain a \*(L"class\*(R" so, the attribute name is combined directly with the element name (e.g. \*(L"person.greeting\*(R"). .IP "ExtractPath *metaname* [replace|remove|prepend|append|regex]" 4 .IX Item "ExtractPath *metaname* [replace|remove|prepend|append|regex]" This directive can be used to index extracted parts of a document's path. A common use would be to limit searches to specific areas of your file tree. .Sp The extracted string will be indexed under the specified meta name. .Sp See \f(CW\*(C`ReplaceRules\*(C'\fR for a description of the various pattern replacement methods, but you will use the \fIregex\fR method. .Sp For example, say your file system (or web tree) was organized into departments: .Sp .Vb 3 \& /web/sales/foo... \& /web/parts/foo... \& /web/accounting/foo... .Ve .Sp And you wanted a way to limit searches to just documents under \*(L"sales\*(R". 
.Sp .Vb 1 \& ExtractPath department regex !^/web/([^/]+)/.*$!$1! .Ve .Sp Which says, extract out the department name (as substring \f(CW$1\fR) and index it as meta name \f(CW\*(C`department\*(C'\fR. Then to limit a search to the sales department: .Sp .Vb 1 \& swish-e -w foo AND department=sales .Ve .Sp Note that the \f(CW\*(C`regex\*(C'\fR method uses a substitution pattern, so to index only a sub-string match the \fIentire\fR document path in the regular expression, as shown above. Otherwise any part that is not matched will end up in the substitution pattern. .Sp See the \f(CW\*(C`ExtractPathDefault\*(C'\fR option for a way to set a value if no patterns match. .Sp Although unlikely, you may use more than one \f(CW\*(C`ExtractPath\*(C'\fR directive. More than one directive of the \fIsame\fR meta name will operate successively (in order listed in the configuration file) on the path. This allows you to use regular expressions on the results of the previous pattern substitution (as if piping the output from one expression to the pattern of the next). .Sp .Vb 2 \& ExtractPath foo regex !^(...).+$!$1! \& ExtractPath foo regex !^.+(.)$!$1! .Ve .Sp So, the third letter is indexed as meta name \*(L"foo\*(R" if both patterns match. .Sp .Vb 2 \& ExtractPath foo regex !^X(...).+$!$1! \& ExtractPath foo regex !^.+(.)$!$1! .Ve .Sp Now (note the \*(L"X\*(R"), if the first pattern doesn't match, the last character of the path name is indexed. You must be clear on this behavior if you are using more than one \f(CW\*(C`ExtractPath\*(C'\fR directive with the same metaname. .Sp The document path operated on is the real path swish used to access the document. That is, the \f(CW\*(C`ReplaceRules\*(C'\fR directive has no effect on the path used with \f(CW\*(C`ExtractPath\*(C'\fR. .Sp The full path is used for each meta name if more than one \f(CW\*(C`ExtractPath\*(C'\fR directive is used. 
That is, changes to the path used in \f(CW\*(C`ExtractPath foo\*(C'\fR do not affect the path used by \f(CW\*(C`ExtractPath bar\*(C'\fR. .IP "ExtractPathDefault *metaname* default_value" 4 .IX Item "ExtractPathDefault *metaname* default_value" This can be used with \f(CW\*(C`ExtractPath\*(C'\fR to set a default string to index under the given metaname if none of the \f(CW\*(C`ExtractPath\*(C'\fR patterns match. .Sp For example, say you want to index each document with a metaname \&\*(L"department\*(R" based on the following path examples: .Sp .Vb 3 \& /web/sales/foo... \& /web/parts/foo... \& /web/accounting/foo... .Ve .Sp But you are also indexing documents that do not follow that pattern and you want to search those separately, too. .Sp .Vb 2 \& ExtractPath department regex !^/web/([^/]+)/.*$!$1! \& ExtractPathDefault department other .Ve .Sp Now, you may search like this: .Sp .Vb 4 \& -w foo department=(sales) - limit searches to the sales documents \& -w foo department=(parts) - limit searches to the parts documents \& -w foo department=(accounting) - limit searches to the accounting documents \& -w foo department=(other) - everything but sales, parts, and accounting. .Ve .Sp This basically is a shortcut for: .Sp .Vb 1 \& -w foo not department=(sales or parts or accounting) .Ve .Sp but you don't need to keep track of what was extracted. .IP "PropertyNames *list of meta names*" 4 .IX Item "PropertyNames *list of meta names*" .PD 0 .IP "PropertyNamesCompareCase *list of meta names*" 4 .IX Item "PropertyNamesCompareCase *list of meta names*" .IP "PropertyNamesIgnoreCase *list of meta names*" 4 .IX Item "PropertyNamesIgnoreCase *list of meta names*" .PD Swish-e allows you to specify certain \s-1META\s0 tags that can be used as \fBdocument properties\fR. 
The contents of any \s-1META\s0 tag that has been identified as a document property can be returned as part of the search results along with the rank, file name, title, and document size (see the \f(CW\*(C`\-p\*(C'\fR and \f(CW\*(C`\-x\*(C'\fR switches in SWISH-RUN). .Sp Properties are useful for returning additional data from documents in search results \*(-- this saves the effort of reading and parsing the source files while reading Swish-e search results, and is especially useful when the source documents are no longer available or slow to access (e.g. over http). .Sp Another feature of properties is that Swish-e can use the PropertyNames for sorting the search results (see the \f(CW\*(C`\-s\*(C'\fR switch). .Sp .Vb 1 \& PropertyNames author subjects .Ve .Sp Two variations are available: \f(CW\*(C`PropertyNamesCompareCase\*(C'\fR and \&\f(CW\*(C`PropertyNamesIgnoreCase\*(C'\fR. These tell Swish-e to either compare or ignore case when sorting results. The default for \f(CW\*(C`PropertyNames\*(C'\fR is to ignore the case. .Sp .Vb 2 \& PropertyNamesIgnoreCase subject \& PropertyNamesCompareCase keyword .Ve .Sp The defaults for \*(L"internal\*(R" properties are: .Sp .Vb 3 \& swishtitle -- ignore the case \& swishdocpath -- compare case \& swishdescription -- compare case .Ve .Sp These can be overridden with \f(CW\*(C`PropertyNamesCompareCase\*(C'\fR and \&\f(CW\*(C`PropertyNamesIgnoreCase\*(C'\fR. .Sp .Vb 1 \& PropertyNamesCompareCase swishtitle .Ve .Sp Use of PropertyNames will increase the size of your index files, sometimes significantly. Properties will be compressed if Swish-e is compiled with zlib as described in the \s-1INSTALL\s0 manual page. .Sp If Swish-e finds more than one property of the same name in a document the property's contents will be concatenated for strings, and a warning is issued for numeric (or date) properties. 
.IP "PropertyNamesNoStripChars" 4 .IX Item "PropertyNamesNoStripChars" PropertyNamesNoStripChars specifies that the listed properties should not have strings of low \s-1ASCII\s0 characters replaced with a space character. Properties will be stored as found in the document. .Sp When printing properties with the swish-e binary newlines are replaced with a space character. Use the swish-e library (or \s-1SWISH::API\s0 perl module) to fetch properties without newlines replaced. .IP "PropertyNamesNumeric" 4 .IX Item "PropertyNamesNumeric" This directive is similar to \f(CW\*(C`PropertyNames\*(C'\fR, but it flags the property as being a string of digits (integer value) that will be stored as binary data instead of a string. This allows sorting with \f(CW\*(C`\-s\*(C'\fR and limiting with \f(CW\*(C`\-L\*(C'\fR to sort and limit the property correctly. .Sp Swish-e uses \f(CWstrtoul(3)\fR to convert the string into an unsigned long integer. Therefore, only positive integers can be stored. .Sp Future versions of Swish-e may be able to store different property types (such as negative integers and real numbers). This directive may change in future releases of Swish. .IP "PropertyNamesDate" 4 .IX Item "PropertyNamesDate" This directive is exactly like \f(CW\*(C`PropertyNamesNumeric\*(C'\fR, but it also flags the number as a machine timestamp (seconds since Epoch), and will print a formatted date when returning this property. See \f(CW\*(C`\-x\*(C'\fR in SWISH-RUN. .Sp Swish-e will not parse dates when indexing; you must use a timestamp. .IP "PropertyNameAlias *property name* *list of aliases*" 4 .IX Item "PropertyNameAlias *property name* *list of aliases*" This allows aliases for a property name. 
For example, if you are indexing \&\s-1HTML\s0 files, plus \s-1XML\s0 files that are written in English, German, and Spanish and thus use the tags \*(L"title\*(R", \*(L"titel\*(R", and \*(L"título\*(R" you can use: .Sp .Vb 1 \& PropertyNameAlias swishtitle title titel título titulo .Ve .Sp Note that \*(L"swishtitle\*(R" is the built-in property used to store the title of a document, and therefore you do not need to specify it as a PropertyName before use. .IP "PropertyNamesMaxLength integer *list of meta names*" 4 .IX Item "PropertyNamesMaxLength integer *list of meta names*" This option will set the max length of the text stored in a property. You must specify a number between 0 and the max integer size on your platform, and a list of properties. The properties specified must not be aliases. .Sp If any of the property names do not exist they will be created (e.g. you do not need to define the property with PropertyNames first). .Sp In general, this feature will only be useful when parsing \s-1HTML\s0 or \s-1XML\s0 with the libxml2 parser. .Sp For example: .Sp .Vb 2 \& PropertyNamesMaxLength 1000 swishdescription \& PropertyNameAlias swishdescription body .Ve .Sp Is somewhat like .Sp .Vb 4 \& StoreDescription HTML <body> 1000 \& StoreDescription XML <body> 1000 \& StoreDescription HTML2 <body> 1000 \& StoreDescription XML2 <body> 1000 .Ve .Sp but StoreDescription allows setting the tag for each parser type. .Sp .Vb 2 \& PropertyNamesMaxLength 1000 headings \& PropertyNameAlias headings h1 h2 h3 h4 .Ve .Sp collects all the heading text into a single property called \*(L"headings\*(R", not to exceed 1000 characters. .IP "PropertyNamesSortKeyLength integer *list of meta names*" 4 .IX Item "PropertyNamesSortKeyLength integer *list of meta names*" Sets the length of the string used when sorting. The default is 100 characters. The \-T metanames debugging option will list the current values for an index. 
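.Sp
For illustration only, and assuming a property named \f(CW\*(C`title\*(C'\fR (a hypothetical name chosen for this sketch), the sort key could be shortened from the default as follows. The value 50 is arbitrary.
.Sp
.Vb 3
\&    # Sketch: "title" is a hypothetical property name; 50 is an arbitrary length
\&    PropertyNames title
\&    PropertyNamesSortKeyLength 50 title
.Ve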
.Sp This setting is used when sorting during indexing, and perhaps when sorting while searching. It also affects the order when limiting to a range of values with the \-L option. .IP "PreSortedIndex *list of property names*" 4 .IX Item "PreSortedIndex *list of property names*" By default Swish-e generates presorted tables while indexing for each property name. This allows faster sorting when generating results. On large document collections this presorting may add to the indexing time, and also adds to the total size of the index. This directive can be used to customize exactly which properties will be presorted. .Sp If \f(CW\*(C`PreSortedIndex\*(C'\fR is \fInot\fR present in the config file (default action), all the properties will be presorted at indexing time. If it is present without any parameter, no properties will be presorted. Otherwise, only the property names specified will be presorted. .Sp For example, if you only wish to sort results by a property called \f(CW\*(C`title\*(C'\fR: .Sp .Vb 2 \& PropertyNames title age time \& PreSortedIndex title .Ve .IP "StoreDescription [\s-1XML\s0 <tag> size|HTML <meta> size|TXT size]" 4 .IX Item "StoreDescription [XML <tag> size|HTML <meta> size|TXT size]" \&\fBStoreDescription\fR allows you to store a document description in the index file. This description can be returned in your search results when the \f(CW\*(C`\-x\*(C'\fR switch is used to include the \fIswishdescription\fR for extended results, or by using \f(CW\*(C`\-p swishdescription\*(C'\fR. .Sp The document type (\s-1XML\s0, \s-1HTML\s0 and \s-1TXT\s0) must match the document type currently being indexed as set by \f(CW\*(C`IndexContents\*(C'\fR or \f(CW\*(C`DefaultContents\*(C'\fR. See those directives for possible values. A common problem is using \f(CW\*(C`StoreDescription\*(C'\fR yet not setting the document's type with \f(CW\*(C`IndexContents\*(C'\fR or \&\f(CW\*(C`DefaultContents\*(C'\fR. 
Another problem is different types: .Sp .Vb 2 \& IndexContents HTML2 .html \& StoreDescription HTML <body> .Ve .Sp Then .html documents are assigned a type of \s-1HTML2\s0 (and parsed by the libxml2 parser), but the description will not be stored since it is type \s-1HTML\s0 instead of \s-1HTML2\s0. .Sp For text documents you specify the type \s-1TXT\s0 (or \s-1TXT2\s0 or TXT*) and the number of \fIcharacters\fR to capture. .Sp .Vb 1 \& StoreDescription TXT 20 .Ve .Sp The above stores only the first twenty characters from the text file in the Swish-e index file. .Sp For \s-1HTML\s0 and \s-1XML\s0 file types, specify the tag to use for the description, and optionally the number of characters to capture. If the size is not specified, Swish-e will capture the entire contents of the tag. .Sp .Vb 2 \& StoreDescription HTML <body> 20000 \& StoreDescription XML <desc> 40 .Ve .Sp Again, note that documents must be assigned a document type with \&\f(CW\*(C`IndexContents\*(C'\fR or \f(CW\*(C`DefaultContents\*(C'\fR to use this feature. .Sp Swish-e will compress the descriptions (or any other large property) if compiled to use zlib (see \s-1INSTALL\s0). This is recommended when using StoreDescription and a large number of documents. Compression of 30% to 50% is not uncommon with \s-1HTML\s0 files. .IP "PropCompressionLevel [0\-9]" 4 .IX Item "PropCompressionLevel [0-9]" This directive sets the compression level used when storing properties to disk. A setting of zero is no compression, and a setting of nine is the most compression. .Sp The default depends on the default setting compiled with zlib, but is typically six. .Sp This option is useful when using \f(CW\*(C`StoreDescription\*(C'\fR to store a large amount of text in properties (or if using \f(CW\*(C`PropertyNames\*(C'\fR with large property sizes). .Sp Properties must be over a value defined in \fIconfig.h\fR (100 is the default) before compression will be attempted. 
Swish-e will never store the results of the compression if the compressed data is larger than the original data. .Sp This option is only available when Swish-e is compiled with zlib support. .IP "TruncateDocSize *number of characters*" 4 .IX Item "TruncateDocSize *number of characters*" TruncateDocSize limits the size of a document while indexing documents and/or using filters. This config directive truncates the number of bytes read from a document to the specified size: if a document is larger, only the specified number of bytes is read. .Sp Example: .Sp .Vb 1 \& TruncateDocSize 10000000 .Ve .Sp The default is zero, which means read all data. .Sp Warning: If you use TruncateDocSize, use it with care! TruncateDocSize is a safety belt only, to limit e.g. filter output, when accessing databases, or to limit \*(L"runaway\*(R" filters. Truncating doc input may destroy document structures for Swish-e (e.g. swish may miss closing tags for \s-1XML\s0 or \s-1HTML\s0 documents). .Sp TruncateDocSize does not currently work with the \f(CW\*(C`prog\*(C'\fR input source method. .IP "FuzzyIndexingMode NONE|Stemming|Soundex|Metaphone|DoubleMetaphone" 4 .IX Item "FuzzyIndexingMode NONE|Stemming|Soundex|Metaphone|DoubleMetaphone" Selects the type of index to create. Only one type of index may be created. .Sp It's a good idea to create both a normal index and a fuzzy index and allow your search interface to select which index to use. Many people find the fuzzy searches to be too fuzzy. .Sp The available fuzzy indexing options can be displayed by running .Sp .Vb 1 \& swish-e -T LIST_FUZZY_MODES .Ve .Sp Available options include: .RS 4 .IP "None" 4 .IX Item "None" Words are stored in the index without any conversion. This is the default. .IP "Stemming_*" 4 .IX Item "Stemming_*" This option uses one of the installed Snowball stemmers (http://snowball.tartarus.org/). 
.Sp The installed stemmers can be viewed by running .Sp .Vb 1 \& swish-e -T LIST_FUZZY_MODES .Ve .Sp For example, to use the Spanish stemming module: .Sp .Vb 1 \& FuzzyIndexingMode Stemming_es .Ve .IP "Stem or Stemming_en" 4 .IX Item "Stem or Stemming_en" \&\fB**This option is no longer supported.**\fR .Sp Selects the legacy Swish-e English stemmer. .Sp This is deprecated in favor of the Snowball English stemmer Stemming_en1. .Sp Words are converted using the Porter stemming algorithm. .Sp From: http://www.tartarus.org/~martin/PorterStemmer/ .Sp .Vb 5 \& The Porter stemming algorithm (or Porter stemmer) is a \& process for removing the commoner morphological and inflexional \& endings from words in English. Its main use is as part of a \& term normalisation process that is usually done when setting up \& Information Retrieval systems. .Ve .Sp This will help a search for \*(L"running\*(R" to also find \*(L"run\*(R" and \*(L"runs\*(R", for example. .Sp The stemming function does not convert words to their root, rather programmatically removes endings on words in an attempt to make similar words with different endings stem to the same string of characters. It's not a perfect system, and searches on stemmed indexes often return curious results. For example, two entirely different words may stem to the same word. .Sp Stemming also can be confusing when used with a wildcard (truncation). For example, you might expect to find the word \*(L"running\*(R" by searching for \&\*(L"runn*\*(R". But this fails when using a stemmed index, as \*(L"running\*(R" stems to \&\*(L"run\*(R", yet searching for \*(L"runn*\*(R" looks for words that start with \*(L"runn\*(R". .IP "Soundex" 4 .IX Item "Soundex" Soundex was developed in the 1880s so records for people with similar sounding names could be found more readily. Soundex is a coded surname based on the way a surname sounds rather than spelling. 
Surnames that sound similar, like Smith and Smyth, are filed together under the same Soundex code. This is mostly useful for \s-1US\s0 English. .Sp Soundex should not be used to search for sound-alike words. Metaphone would be more appropriate for generic sound matching of words. Soundex should only be used where you need to search multiple documents for proper names which sound similar. This is primarily used for indexing genealogical records. This may be useful for indexing other collections of data consisting mostly of names. Many common name variations are matched by Soundex. The only notable exception is the first letter of the name. The first letter is not matched for sound. .IP "Metaphone and DoubleMetaphone" 4 .IX Item "Metaphone and DoubleMetaphone" Words are transformed into a short series of letters representing the sound of the word (in English). Metaphone algorithms are often used for looking up mis-spelled words in dictionary programs. .Sp From: http://aspell.sourceforge.net/metaphone/ .Sp .Vb 2 \& Lawrence Philips' Metaphone Algorithm is an algorithm which returns \& the rough approximation of how an English word sounds. .Ve .Sp The \f(CW\*(C`DoubleMetaphone\*(C'\fR mode will sometimes generate two different metaphones for the same word. This is supposed to be useful when a word may be pronounced more than one way. .Sp A metaphone index should give results somewhere in between Soundex and Stemming. .RE .RS 4 .RE .IP "UseStemming [yes|NO]" 4 .IX Item "UseStemming [yes|NO]" Set to yes to apply a word stemming algorithm during indexing; otherwise set to no. .Sp .Vb 2 \& UseStemming no \& UseStemming yes .Ve .Sp When UseStemming is set to \f(CW\*(C`yes\*(C'\fR every word is stemmed before placing it into the index. .Sp This option is deprecated. It has been superseded by \f(CW\*(C`FuzzyIndexingMode\*(C'\fR. 
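.Sp
As a minimal sketch of the replacement, and assuming the Snowball English stemmer \f(CW\*(C`Stemming_en1\*(C'\fR is among the modes your installation lists, a configuration that previously relied on \f(CW\*(C`UseStemming yes\*(C'\fR might be written as:
.Sp
.Vb 3
\&    # Sketch: replaces the deprecated "UseStemming yes"
\&    # Assumes Stemming_en1 is listed by: swish-e -T LIST_FUZZY_MODES
\&    FuzzyIndexingMode Stemming_en1
.Ve
.Sp
An index built this way may not stem words identically to one built with the legacy stemmer, so reindex and check your search results after switching.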
.IP "UseSoundex [yes|NO]" 4 .IX Item "UseSoundex [yes|NO]" When UseSoundex is set to \f(CW\*(C`yes\*(C'\fR every word is converted to a Soundex code before placing it into the index. .Sp This option is deprecated. It has been superseded by \f(CW\*(C`FuzzyIndexingMode\*(C'\fR. .IP "IgnoreTotalWordCountWhenRanking [YES|no]" 4 .IX Item "IgnoreTotalWordCountWhenRanking [YES|no]" Set to yes to ignore the total number of words in the file when calculating ranking. Often better with merges and small files. Default is yes. .Sp .Vb 1 \& IgnoreTotalWordCountWhenRanking no .Ve .Sp The default was changed from no to yes in version 2.2. .Sp \&\fB\s-1NOTE:\s0\fR must be set to \fBno\fR if you intend to use the \-R 1 option when searching. .IP "MinWordLimit *integer*" 4 .IX Item "MinWordLimit *integer*" Set the minimum length of a word. Shorter words will not be indexed. The default is 1 (as defined in \fIsrc/config.h\fR). .Sp .Vb 1 \& MinWordLimit 5 .Ve .IP "MaxWordLimit *integer*" 4 .IX Item "MaxWordLimit *integer*" Set the maximum length of an indexable word. Every longer word will not be indexed. The default is 40 (as defined in \fIsrc/config.h\fR). .IP "WordCharacters *string of characters*" 4 .IX Item "WordCharacters *string of characters*" .PD 0 .IP "IgnoreFirstChar *string of characters*" 4 .IX Item "IgnoreFirstChar *string of characters*" .IP "IgnoreLastChar *string of characters*" 4 .IX Item "IgnoreLastChar *string of characters*" .IP "BeginCharacters *string of characters*" 4 .IX Item "BeginCharacters *string of characters*" .IP "EndCharacters *string of characters*" 4 .IX Item "EndCharacters *string of characters*" .PD These settings define what a word consists of to the Swish-e indexing engine. Compiled-in defaults are in \fIsrc/config.h\fR. .Sp When indexing Swish-e uses \fBWordCharacters\fR to split up the document into words. Words are defined by any string of non-blank characters that contain only the characters listed in WordCharacters. 
If a string of characters includes a character that is not in WordCharacters then the word will be split into two or more separate words. .Sp For example: .Sp .Vb 1 \& WordCharacters abde .Ve .Sp Would turn \*(L"abcde\*(R" into two words \*(L"ab\*(R" and \*(L"de\*(R". .Sp Next, of these words, any characters defined in \fBIgnoreFirstChar\fR are stripped off the start of the word, and \fBIgnoreLastChar\fR characters are stripped off the end of the word. This allows, for example, periods within a word (www.slashdot.com), but not at the end of a word. Characters in IgnoreFirstChar and IgnoreLastChar must be in WordCharacters. .Sp Finally, the resulting words \s-1MUST\s0 begin with one of the characters listed in \fBBeginCharacters\fR and end with one of the characters listed in \&\fBEndCharacters\fR. BeginCharacters and EndCharacters must be a subset of the characters in WordCharacters. Often, WordCharacters, BeginCharacters and EndCharacters will all be the same. .Sp Note that the same process applies to the query while searching. .Sp Getting these settings correct will take careful consideration and practice. It's helpful to create an index of a single test file, and then look at the words that are placed in the index (see the \f(CW\*(C`\-v 4\*(C'\fR, \f(CW\*(C`\-D\*(C'\fR and \f(CW\*(C`\-k\*(C'\fR searching switches). .Sp Currently there is only support for eight-bit characters. .Sp Example: .Sp .Vb 5 \& WordCharacters .abcdefghijklmnopqrstuvwxyz \& BeginCharacters abcdefghijklmnopqrstuvwxyz \& EndCharacters abcdefghijklmnopqrstuvwxyz \& IgnoreFirstChar . \& IgnoreLastChar . .Ve .Sp So the string .Sp .Vb 1 \& Please visit http://www.example.com/path/to/file.html. 
.Ve .Sp will be indexed as the following words: .Sp .Vb 7 \& please \& visit \& http \& www.example.com \& path \& to \& file.html .Ve .Sp Which means that you can search for \f(CW\*(C`www.example.com\*(C'\fR as a single word, but searching for just \f(CW\*(C`example\*(C'\fR will not find the document. 
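.Sp
If matching the individual parts of a host name matters more than matching the whole name, one possible variation is to leave the period out of \fBWordCharacters\fR. This is only a sketch derived from the splitting rule described above, not a tested recommendation. \fBIgnoreFirstChar\fR and \fBIgnoreLastChar\fR are omitted because their characters must be in WordCharacters.
.Sp
.Vb 5
\&    # Sketch: with no "." in WordCharacters, "www.example.com" should
\&    # split into the separate words "www", "example" and "com".
\&    WordCharacters  abcdefghijklmnopqrstuvwxyz
\&    BeginCharacters abcdefghijklmnopqrstuvwxyz
\&    EndCharacters   abcdefghijklmnopqrstuvwxyz
.Ve
.Sp
With these settings a search for \f(CW\*(C`example\*(C'\fR would be expected to find the document, at the cost of no longer storing \f(CW\*(C`www.example.com\*(C'\fR as a single word. Indexing a single test file and inspecting the resulting words is the way to confirm the behavior.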
.Sp Note: when indexing \s-1HTML\s0 documents \s-1HTML\s0 entities are converted to their character equivalents before being processed with these directives. This is a change from previous versions of Swish-e where you were required to include the characters \f(CW\*(C`0123456789&#;\*(C'\fR to index entities. See also \f(CW\*(C`ConvertHTMLEntities\*(C'\fR .IP "Buzzwords [*list of buzzwords*|File: path]" 4 .IX Item "Buzzwords [*list of buzzwords*|File: path]" The Buzzwords option allows you to specify words that will be indexed regardless of WordCharacters, BeginCharacters, EndCharacters, stemming, soundex and many of the other checks done on words while indexing. .Sp Buzzwords are case insensitive. .Sp Buzzwords should be separated by spaces and may span multiple directives. If the special format \f(CW\*(C`File:filename\*(C'\fR is used then the Buzzwords will be read from an external file during indexing. .Sp Examples: .Sp .Vb 1 \& Buzzwords C++ TCP/IP .Ve .Sp .Vb 1 \& Buzzwords File: ./buzzwords.lst .Ve .Sp If a Buzzword contains search operator characters they must be backslashed when searching. For example: .Sp .Vb 1 \& Buzzwords C++ TCP/IP web=http .Ve .Sp .Vb 1 \& ./swish-e -w 'web\e=http' .Ve .Sp Buzzwords are found by splitting the text on whitespace, removing \&\f(CW\*(C`IgnoreFirstChar\*(C'\fR and \f(CW\*(C`IgnoreLastChar\*(C'\fR characters from the word, and then comparing with the list of \f(CW\*(C`Buzzwords\*(C'\fR. Therefore, if adding \f(CW\*(C`Buzzwords\*(C'\fR to an index you will probably want to define \f(CW\*(C`IgnoreFirstChar\*(C'\fR and \&\f(CW\*(C`IgnoreLastChar\*(C'\fR settings. .Sp Note: Buzzwords specific settings for \f(CW\*(C`IgnoreFirstChar\*(C'\fR and \f(CW\*(C`IgnoreLastChar\*(C'\fR may be used in the future. .IP "CompressPositions [yes|NO]" 4 .IX Item "CompressPositions [yes|NO]" This option enables zlib compression for individual word data in the index file. 
The default is \s-1NO\s0, that is, the index word data is not compressed by default. .Sp Enabling this option can reduce the size of the index file, but at the expense of slower wildcard search times. .Sp The default changed from \s-1YES\s0 to \s-1NO\s0 starting with version 2.4.3. .IP "IgnoreWords [*list of stop words*|File: path]" 4 .IX Item "IgnoreWords [*list of stop words*|File: path]" The IgnoreWords option allows you to specify words to ignore, called \&\fIstopwords\fR. The default is to not use any stopwords. .Sp Words should be separated by spaces and may span multiple directives. If the special format \f(CW\*(C`File:filename\*(C'\fR is used then the stop words will be read from an external file during indexing. .Sp In previous versions of Swish-e you could use the directive .Sp .Vb 1 \& IgnoreWords swishdefault - obsolete! .Ve .Sp to include a default list of compiled in stopwords. This keyword is no longer supported. .Sp Examples: .Sp .Vb 1 \& IgnoreWords www http a an the of and or .Ve .Sp .Vb 1 \& IgnoreWords File: ./stopwords.de .Ve .IP "UseWords [*list of words*|File: path]" 4 .IX Item "UseWords [*list of words*|File: path]" UseWords defines the words that Swish-e will index. \fBOnly\fR the words listed will be indexed. .Sp You can specify a list of words following the directive (you may specify more than one \f(CW\*(C`UseWords\*(C'\fR directive in a config file), and/or use the \f(CW\*(C`File:\*(C'\fR form to specify a path to a file containing the words: .Sp .Vb 2 \& UseWords perl python pascal fortran basic cobol php \& UseWords File: /path/to/my/wordlist .Ve .Sp Please drop the Swish-e list a note if you actually use this feature. It may be removed from future versions. .IP "IgnoreLimit *integer integer*" 4 .IX Item "IgnoreLimit *integer integer*" This automatically omits words that appear too often in the files (these words are called stopwords). Specify a whole percentage and a number, such as \*(L"80 256\*(R". 
This omits words that occur in over 80% of the files and appear in over 256 files. Comment out to turn off auto\-stopwording. .Sp .Vb 1 \& IgnoreLimit 50 1000 .Ve .Sp Swish-e must do extra processing to adjust the entire index when this feature is used. It is recommended that instead of using this feature you decide what words are stopwords and add them to \fBIgnoreWords\fR in your configuration file. To do this, use IgnoreLimit one time and note the stop words that are found while indexing. Add this list to IgnoreWords, and then remove IgnoreLimit from the configuration file. .IP "IgnoreMetaTags *list of names*" 4 .IX Item "IgnoreMetaTags *list of names*" \&\f(CW\*(C`IgnoreMetaTags\*(C'\fR defines a list of metatags to ignore while indexing \s-1XML\s0 files (and \s-1HTML\s0 files if using libxml2 for parsing \s-1HTML\s0). All text within the tags will be ignored \*(-- both for indexing (\f(CW\*(C`MetaNames\*(C'\fR) and properties (\f(CW\*(C`PropertyNames\*(C'\fR). To still parse properties without indexing the text, see \&\f(CW\*(C`UndefinedMetaTags\*(C'\fR. .Sp This option is useful to avoid indexing specific data from a file. For example: .Sp .Vb 9 \& <person> \& <first_name> \& William \& </first_name> <last_name> \& Shakespeare \& </last_name> <updated_date> \& April 25, 1999 \& </updated_date> \& </person> .Ve .Sp In the above example you might \fBnot\fR want to index the updated date, and therefore prevent finding this record by searching .Sp .Vb 1 \& -w 'person=(April)' .Ve .Sp This is solved by: .Sp .Vb 1 \& IgnoreMetaTags updated_date .Ve .Sp See also \f(CW\*(C`UndefinedMetaTags\*(C'\fR. .IP "IgnoreNumberChars *list of characters*" 4 .IX Item "IgnoreNumberChars *list of characters*" Experimental Feature .Sp This experimental feature can be used to define a set of characters that describe a number. If a word is found to contain only those characters it will not be indexed. 
The characters listed must be part of \f(CW\*(C`WordCharacters\*(C'\fR settings. In other words, the \*(L"word\*(R" checked is a word that Swish-e would otherwise index. .Sp For example, .Sp .Vb 1 \& IgnoreNumberChars 0123456789$., .Ve .Sp Then Swish-e would not index the following: .Sp .Vb 3 \& 123 \& 123,456.78 \& $123.45 .Ve .Sp You might be tempted to avoid indexing hex numbers with: .Sp .Vb 1 \& IgnoreNumberChars 0123456789abcdef .Ve .Sp which will not index 0D31, but will also not index the word \*(L"bad\*(R". .Sp This is an experimental feature that may change in future versions. One possible change is to use regular expressions instead. .IP "IndexComments [NO|yes]" 4 .IX Item "IndexComments [NO|yes]" This option allows the user to decide whether to index the contents of \s-1HTML\s0 comments. Default is no. Set to yes if comment indexing is required. .Sp .Vb 1 \& IndexComments yes .Ve .Sp Note: This is a change in the default behavior prior to version 2.2. .IP "TranslateCharacters [*string1 string2*|:ascii7:]" 4 .IX Item "TranslateCharacters [*string1 string2*|:ascii7:]" The TranslateCharacters directive maps the characters in string1 to the characters listed in string2. .Sp For example: .Sp .Vb 2 \& # This will index a_b as a-b and ámo as amo \& TranslateCharacters _á -a .Ve .Sp \&\f(CW\*(C`TranslateCharacters :ascii7:\*(C'\fR is a predefined set of characters that will translate eight bit characters to ascii7 characters. Using the :ascii7: rule will translate \*(L"Ääç\*(R" to \*(L"aac\*(R". This means: searching \*(L"Çelik\*(R", \*(L"çelik\*(R" or \&\*(L"celik\*(R" will all match the same word. .Sp TranslateCharacters is done early in the indexing process, after converting \s-1HTML\s0 entities but before splitting the input text into words based on \fBWordCharacters\fR. So characters you are translating \fIfrom\fR do not need to be listed in word characters. .Sp The same character translations take place when searching. 
.IP "BumpPositionCounterCharacters *string*" 4 .IX Item "BumpPositionCounterCharacters *string*" When indexing Swish-e assigns a word position to each word. This enables phrase searching. There may be cases where you would like to prevent phrase matching. The BumpPositionCounterCharacters directive allows you to specify a set of characters that when found in the text will increment the word position \*(-- effectively preventing phrase matches across that character. .Sp For example, if you have a tag: .Sp .Vb 3 \& <subjects> \& computer programming | apple computers \& </subjects> .Ve .Sp You might want to prevent matching \*(L"programming apple\*(R" in that meta name. .Sp .Vb 1 \& BumpPositionCounterCharacters | .Ve .Sp There is no default, and you may list a string of characters. .IP "DontBumpPositionOnEndTags *list of names*" 4 .IX Item "DontBumpPositionOnEndTags *list of names*" .PD 0 .IP "DontBumpPositionOnStartTags *list of names*" 4 .IX Item "DontBumpPositionOnStartTags *list of names*" .PD Since metatags are typically separate data fields, the word position counter is automatically bumped between metatags (actually, bumped when a start tag is found and when an end tag is found). This prevents matching a phrase that spans more than one metaname. \f(CW\*(C`DontBumpPositionOnEndTags\*(C'\fR and \&\f(CW\*(C`DontBumpPositionOnStartTags\*(C'\fR disables this feature for the listed metanames. 
.Sp For example, .Sp .Vb 11 \& <person> \& <first_name> \& William \& </first_name> \& <last_name> \& Shakespeare \& </last_name> \& <updated_date> \& April 25, 1999 \& </updated_date> \& </person> .Ve .Sp In the configuration file: .Sp .Vb 2 \& DontBumpPositionOnEndTags first_name \& DontBumpPositionOnStartTags last_name .Ve .Sp This configuration allows this phrase search .Sp .Vb 1 \& -w 'person=("william shakespeare")' .Ve .Sp but this phrase search will fail .Sp .Vb 1 \& -w 'person=("shakespeare april")' .Ve .Sh "Directives for the File Access method only" .IX Subsection "Directives for the File Access method only" Some directives have different uses depending on the source of the documents. These directives are only valid when using the \fBFile system\fR method of indexing. .IP "IndexOnly *list of file suffixes*" 4 .IX Item "IndexOnly *list of file suffixes*" This directive specifies the allowable file suffixes (extensions) while indexing. The default is to index all files specified in \fBIndexDir\fR. .Sp .Vb 2 \& # Only index .html .htm and .q files \& IndexOnly .html .htm .q .Ve .Sp \&\f(CW\*(C`IndexOnly\*(C'\fR checks that the file name ends in the characters listed. It does not check \*(L"extensions\*(R". \f(CW\*(C`IndexOnly\*(C'\fR is tested right before \f(CW\*(C`FileRules\*(C'\fR is processed. .IP "FollowSymLinks [yes|NO]" 4 .IX Item "FollowSymLinks [yes|NO]" Set to \*(L"yes\*(R" to follow symbolic links while indexing, otherwise \*(L"no\*(R". Default is no. .Sp .Vb 2 \& FollowSymLinks no \& FollowSymLinks yes .Ve .Sp Note that when set to \f(CW\*(C`no\*(C'\fR extra \fIstat\fR\|(2) system calls must be made for each file. For a large number of files you may see a small reduction in indexing time by setting this to \f(CW\*(C`yes\*(C'\fR. .Sp See also the \f(CW\*(C`\-l\*(C'\fR switch in SWISH-RUN.
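Returning to IndexOnly in this section: the statement that it compares trailing characters rather than true "extensions" can be sketched as follows. This is a minimal Python illustration under stated assumptions, not Swish-e code. The name index_only_allows is invented here, and treating an empty suffix list as "allow everything" reflects the documented default of indexing all files.

```python
def index_only_allows(filename, suffixes):
    """Return True if filename ends with any of the listed suffixes.

    Hypothetical helper: a plain trailing-characters test, with no notion
    of a dot-separated extension.
    """
    if not suffixes:
        # No IndexOnly directive: every file is a candidate.
        return True
    return filename.endswith(tuple(suffixes))


print(index_only_allows("page.html", [".html", ".htm", ".q"]))  # True
print(index_only_allows("notes.txt", [".html", ".htm", ".q"]))  # False
# A suffix listed without the dot also matches names that merely end in it.
print(index_only_allows("faq", ["q"]))                          # True
```

The last call shows why the leading dot in the directive matters: "q" alone would admit a file named "faq".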
.IP "FileRules [type] [contains|is|regex] *regular expression*" 4 .IX Item "FileRules [type] [contains|is|regex] *regular expression*" .PD 0 .IP "FileMatch [type] [contains|is|regex] *regular expression*" 4 .IX Item "FileMatch [type] [contains|is|regex] *regular expression*" .PD FileRules and FileMatch are used to, respectively, exclude and include files and directories to index. Since, by default, Swish-e indexes all files and recurses all directories (but see also \f(CW\*(C`FollowSymLinks\*(C'\fR) you will typically only use \f(CW\*(C`FileRules\*(C'\fR to exclude files or directories. \f(CW\*(C`FileMatch\*(C'\fR is useful in a few cases, for example, to override the behavior of \f(CW\*(C`IndexOnly\*(C'\fR. Some examples are included below. .Sp Except for \f(CW\*(C`FileRules title ...\*(C'\fR, this feature is only available for file access method (\-S fs), which is the default indexing mode. Also, any pathname modification with \f(CW\*(C`ReplaceRules\*(C'\fR happens after the check for \f(CW\*(C`FileRules\*(C'\fR. (It's unlikely that you would exclude files with \f(CW\*(C`FileRules\*(C'\fR based on text you added with \f(CW\*(C`ReplaceRules\*(C'\fR!) .Sp The regular expression is a C regex.h extended regular expression. You may supply more than one regular expression per line, or use separate directives. Preceding the regular expression with the word \&\*(L"not\*(R" negates the match. .Sp The regular expression is compared against \fB[type]\fR as described below. .Sp For historical reasons, you can specify \f(CW\*(C`contains\*(C'\fR or \f(CW\*(C`is\*(C'\fR. \f(CW\*(C`is\*(C'\fR simply forces the regular expression to match at the start and end of the string (by internally prepending \*(L"^\*(R" and appending \*(L"$\*(R" to the regular expression). 
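The difference between \f(CW\*(C`contains\*(C'\fR and \f(CW\*(C`is\*(C'\fR can be sketched in Python. This is an illustration only: build_pattern and rule_matches are names invented for this sketch, and Python's re module stands in for the C regex.h extended regular expressions that Swish-e uses, so some syntax details differ.

```python
import re


def build_pattern(mode, pattern):
    """Return the effective regular expression for a 'contains' or 'is' rule."""
    if mode == "contains":
        return pattern
    if mode == "is":
        # 'is' anchors the pattern at both ends, as described above.
        return "^" + pattern + "$"
    raise ValueError("mode must be 'contains' or 'is'")


def rule_matches(mode, pattern, text):
    """Hypothetical helper: test text against the effective pattern."""
    return re.search(build_pattern(mode, pattern), text) is not None


print(rule_matches("contains", "hello", "say hello there"))  # True
print(rule_matches("is", "hello", "say hello there"))        # False
print(rule_matches("is", "hello", "hello"))                  # True
```

Seen this way, \f(CW\*(C`is hello\*(C'\fR and \f(CW\*(C`contains ^hello$\*(C'\fR produce the same effective pattern.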
.Sp The \f(CW\*(C`regex\*(C'\fR option requires delimiter characters: .Sp .Vb 1 \& FileRules title regex /^private/i .Ve .Sp The only advantage of \f(CW\*(C`regex\*(C'\fR is if you want to do case insensitive matches, or simply like your regular expressions to look like perl regular expressions. You must use matching delimiters; (), {}, and [], are not currently supported for no good reason other than laziness. .Sp Use quotes (" or ') around a pattern if it contains any white space. Note that the backslash character becomes the escape character within quotes. .Sp For example, these sets generate the same regular expressions. .Sp .Vb 3 \& FileRules title is hello \& FileRules title contains ^hello$ \& FileRules title regex /^hello$/ .Ve .Sp These all need quotes due to the included space character .Sp .Vb 3 \& FileRules title is "hello there" \& FileRules title contains "^hello there$" \& FileRules title regex "!^hello there$!" .Ve .Sp These show how the backslash must be doubled inside of quotes. Swish-e converts a double-backslash into a single backslash, and then passes that single onto the regular expression compiler. .Sp .Vb 2 \& FileRules filename regex /\e.pdf/ \& FileRules filename regex "/\e\e.pdf/" .Ve .Sp .Vb 2 \& FileRules filename regex !hello\e\ethere! # need double for real backslash \& FileRules filename regex "!hello\e\e\e\ethere!" # need double-double inside of quotes .Ve .Sp \&\fBMatching Types\fR .Sp The following types of match strings may be supplied: .Sp .Vb 5 \& FileRules pathname \& FileRules dirname \& FileRules filename \& FileRules directory \& FileRules title .Ve .Sp .Vb 4 \& FileMatch pathname \& FileMatch filename \& FileMatch dirname \& FileMatch directory .Ve .Sp \&\fBpathname\fR matches the regular expression against the current pathname. The pathname may or may not be absolute depending on what you supplied to \&\f(CW\*(C`IndexDir\*(C'\fR.
.Sp Example: .Sp .Vb 2 \& # Don't index paths that contain private or hidden \& FileRules pathname contains (private|hidden) .Ve .Sp .Vb 2 \& # Same thing \& FileRules pathname regex /(private|hidden)/ .Ve .Sp .Vb 2 \& # Don't index exe files \& FileRules pathname contains \e.exe$ .Ve .Sp \&\fBdirname\fR and \fBfilename\fR split the path name by the last delimiter character into a directory name, and a file name. Then these are compared against the patterns supplied. Directory names do \fBnot\fR have a trailing slash. All path names use the forward slash as a delimiter within Swish\-e. .Sp Example: .Sp .Vb 2 \& # Same as last example - don't index *.exe files. \& FileRules filename contains \e.exe$ .Ve .Sp .Vb 2 \& # Don't index any file called test.html \& FileRules filename contains ^test\e.html$ .Ve .Sp .Vb 2 \& # Same thing \& FileRules filename is test\e.html .Ve .Sp .Vb 2 \& # Don't index any directories that contain "old" (/usr/local/myold/docs) \& FileRules dirname contains old .Ve .Sp .Vb 2 \& # Don't index any directories that contain the path segment "old" (/usr/local/old/foo) \& FileRules dirname contains /old/ .Ve .Sp .Vb 3 \& # Index only .htm, .html, plus any all-digit file names \& IndexOnly .htm .html \& FileMatch filename contains ^\ed+$ .Ve .Sp .Vb 3 \& # Same as previous, but maybe a little slower \& FileRules filename regex not !\e.(htm|html)$! \& FileMatch filename contains ^\ed+$ .Ve .Sp Swish-e checks these settings in the order of \f(CW\*(C`pathname\*(C'\fR, \f(CW\*(C`dirname\*(C'\fR, and \&\f(CW\*(C`filename\*(C'\fR, and \f(CW\*(C`FileMatch\*(C'\fR patterns are checked before \f(CW\*(C`FileRules\*(C'\fR, in general. This allows you to exclude most files with \f(CW\*(C`FileRules\*(C'\fR, yet allow in a few special cases with \f(CW\*(C`FileMatch\*(C'\fR.
For example: .Sp .Vb 4 \& # Exclude all files of .exe, .bin, and .bat \& FileRules filename contains \e.(exe|bin|bat)$ \& # But, let these two in \& FileMatch filename is baseball\e.bat incoming_mail\e.bin .Ve .Sp .Vb 2 \& # Same, but as a single pattern \& FileMatch filename is (baseball\e.bat|incoming_mail\e.bin) .Ve .Sp The \f(CW\*(C`directory\*(C'\fR type is somewhat unique. When Swish-e recurses into a directory it will compare all the \fIfiles\fR in the directory with the pattern and then decide if that entire directory should or should not be indexed (or recursed). Note that you are matching against file names in a directory \*(-- and some of those names may be directory names. .Sp A \f(CW\*(C`FileRules directory\*(C'\fR match will cause Swish-e to ignore all files and sub-directories in the current directory. .Sp Warning: A match with \f(CW\*(C`FileMatch directory\*(C'\fR says to index \fBeverything\fR in the *current* directory and \fBignore\fR any FileRules for this directory. .Sp Example: .Sp .Vb 3 \& # Don't index any directories (and sub directories) that contain \& # a file (or sub-directory) called "index.skip" \& FileRules directory contains ^index\e.skip$ .Ve .Sp .Vb 2 \& # Don't index directories that contain a .htaccess file. \& FileRules directory contains ^\e.htaccess .Ve .Sp Note: While \fIprocessing\fR directories, Swish-e will ignore any files or directories that begin with a dot (\*(L".\*(R"). You may index files or directories that begin with a dot by specifying their name with \f(CW\*(C`IndexDir\*(C'\fR or \f(CW\*(C`\-i\*(C'\fR. .Sp \&\f(CW\*(C`title\*(C'\fR checks for a pattern match in an \s-1HTML\s0 title. 
.Sp Example: .Sp .Vb 1 \& FileRules title contains construction example pointers .Ve .Sp .Vb 2 \& # This example says to ignore case \& FileRules title regex "/^Internal document/i" .Ve .Sp Note: \f(CW\*(C`FileRules title\*(C'\fR works for any input method (fs, prog, or http) that is parsed as \s-1HTML\s0, and where a title was found in the document. .Sp In case all this seems a bit confusing, processing a directory happens in the following order. .Sp First the directory name is checked: .Sp .Vb 1 \& FileRules dirname - reject entire directory if matches .Ve .Sp Next the directory is scanned and each file name (which might be the name of a sub\-directory) is checked: .Sp .Vb 2 \& FileRules directory - reject entire dir if *any* files match \& FileMatch directory - accept entire dir if *any* files match .Ve .Sp Then, unless \f(CW\*(C`FileMatch directory\*(C'\fR matched, each file is tested with FileMatch. A match says to index the file without further testing (i.e. overrides FileRules and IndexOnly): .Sp .Vb 3 \& FileMatch pathname \e \& FileMatch dirname - file is accepted if any match \& FileMatch filename / .Ve .Sp otherwise .Sp .Vb 1 \& IndexOnly - file is checked for the correct file extension .Ve .Sp .Vb 3 \& FileRules pathname \e \& FileRules dirname - file is rejected if any match \& FileRules filename / .Ve .Sp finally, the file is indexed. .Sp Files (not directories) listed with \f(CW\*(C`IndexDir\*(C'\fR or \f(CW\*(C`\-i\*(C'\fR are processed in a similar way: .Sp .Vb 3 \& FileMatch pathname \e \& FileMatch dirname - file is accepted if any match \& FileMatch filename / .Ve .Sp otherwise, the file is rejected if it doesn't have the correct extension or a FileRules matches. 
.Sp .Vb 1 \& IndexOnly - file is checked for the correct file extension .Ve .Sp .Vb 3 \& FileRules pathname \e \& FileRules dirname - file is rejected if any match \& FileRules filename / .Ve .Sp Note: If things are not indexing as you expect, create a directory with some test files and use the \f(CW\*(C`\-T regex\*(C'\fR trace option to see how file names are checked. Start with very simple tests! .Sh "Directives for the \s-1HTTP\s0 Access Method Only" .IX Subsection "Directives for the HTTP Access Method Only" The \s-1HTTP\s0 Access method is enabled by the \*(L"\-S http\*(R" switch when indexing. It works by running a Perl program called SwishSpider which fetches documents from a web server. .PP Only text files (content\-type of \*(L"text/*\*(R") are indexed with the \s-1HTTP\s0 Access Method. Other document types (e.g. \s-1PDF\s0 or MSWord) may be indexed as well. The SwishSpider will attempt to make use of the SWISH::Filter module (included with the Swish-e distribution) to convert documents into a format that Swish-e can index. .PP Note: The \-S prog method of spidering (using spider.pl) can be a replacement for the \-S http method. It offers more configuration options and better spidering speed. .PP These directives below are available when using the \s-1HTTP\s0 Access Method of indexing. .IP "MaxDepth *integer*" 4 .IX Item "MaxDepth *integer*" MaxDepth defines how many links the spider should follow before stopping. A value of 0 configures the spider to traverse all links. The default is MaxDepth 0. .Sp .Vb 1 \& MaxDepth 5 .Ve .Sp Note: The default was changed from 5 to 0 in release 2.4.0 .IP "Delay *seconds*" 4 .IX Item "Delay *seconds*" The number of seconds to wait between issuing requests to a server. This setting allows for more friendly spidering of remote sites. The default is 5 seconds. 
.Sp .Vb 1 \& Delay 1 .Ve .Sp Note: The default was changed from 60 to 5 seconds in release 2.4.0 .IP "TmpDir *path*" 4 .IX Item "TmpDir *path*" The location of a writable temp directory on your system. The \s-1HTTP\s0 access method tells the Perl helper to place its files in this location, and the \f(CW\*(C`\-e\*(C'\fR switch causes Swish-e to use this directory while indexing. There is no default. .Sp .Vb 1 \& TmpDir /tmp/swish .Ve .Sp If this directory does not exist or is not writable Swish-e will fail with an error during indexing. .Sp Note, the environment variables of \f(CW\*(C`TMPDIR\*(C'\fR, \f(CW\*(C`TMP\*(C'\fR, and \f(CW\*(C`TEMP\*(C'\fR (in that order) will \fBoverride\fR this setting. .IP "SpiderDirectory *path*" 4 .IX Item "SpiderDirectory *path*" The location of the Perl helper script called \fIswishspider\fR. If you use a relative directory, it is relative to your directory when you run Swish\-e, not to the directory that Swish-e is in. The default is the location swishspider was installed. Normally this does not need to be set. .Sp .Vb 1 \& SpiderDirectory /usr/local/swish .Ve .IP "EquivalentServer *server alias*" 4 .IX Item "EquivalentServer *server alias*" Often times the same site may be referred to by different names. A common example is that often http://www.some\-server.com and http://some\-server.com are the same. Each line should have a list of all the method/names that should be considered equivalent. Multiple EquivalentServer directives may be used. Each directive defines its own set of equivalent servers. .Sp .Vb 2 \& EquivalentServer http://library.berkeley.edu http://www.lib.berkeley.edu \& EquivalentServer http://sunsite.berkeley.edu:2000 http://sunsite.berkeley.edu .Ve .Sh "Directives for the prog Access Method Only" .IX Subsection "Directives for the prog Access Method Only" This section details the directives that are only available for the \&\*(L"prog\*(R" document source feature of Swish\-e. 
The \*(L"prog\*(R" access method runs an external program that \*(L"feeds\*(R" documents to Swish\-e. This allows indexing and filtering of documents from any source. .PP See prog \- general purpose access method in the SWISH-RUN man page for more information. .PP A number of example programs for use with the \*(L"prog\*(R" access method are provided in the \fIprog-bin\fR directory. Please see those examples if you have questions about implementing a \*(L"prog\*(R" input program. .IP "SwishProgParameters *list of parameters*" 4 .IX Item "SwishProgParameters *list of parameters*" This is a list of parameters that will be sent to the external program when running with the \*(L"prog\*(R" document source method. .Sp .Vb 2 \& SwishProgParameters /path/to/config hello there \& IndexDir /path/to/program.pl .Ve .Sp Then running: .Sp .Vb 1 \& swish-e -c config -S prog .Ve .Sp Swish-e will execute \f(CW\*(C`/path/to/program.pl\*(C'\fR and pass \f(CW\*(C`/path/to/config hello there\*(C'\fR as three command line arguments to the program. This directive makes it easy to pass settings from the Swish-e configuration file to the external program. .Sp For example, the \f(CW\*(C`spider.pl\*(C'\fR program (included in the \f(CW\*(C`prog\-bin\*(C'\fR directory) uses the \f(CW\*(C`SwishProgParameters\*(C'\fR to specify what file to read for configuration information. .Sp .Vb 2 \& SwishProgParameters spider.config \& IndexDir ./spider.pl .Ve .Sp The \f(CW\*(C`spider.pl\*(C'\fR program also has a default action so you can avoid using a configuration file: .Sp .Vb 2 \& SwishProgParameters default http://www.swishe.org/ http://some.other.site/ \& IndexDir ./spider.pl .Ve .Sp And the spider program will use default settings for spidering those sites.
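To make the parameter passing concrete, here is a rough Python sketch of an external program that receives the SwishProgParameters values as ordinary command line arguments and emits one document per argument. It is an assumption-laden illustration, not one of the programs shipped in prog-bin: the function names are invented, the document bodies are placeholders, and the Path-Name and Content-Length headers follow the "prog" description in SWISH-RUN, which should be consulted for the full list of headers.

```python
import sys


def format_document(path_name, content):
    """Return one document in the header-plus-body form fed to Swish-e.

    Hypothetical helper. Content-Length counts bytes, not characters.
    """
    body = content.encode("utf-8")
    header = "Path-Name: {}\nContent-Length: {}\n\n".format(path_name, len(body))
    return header.encode("utf-8") + body


def main(argv):
    # argv[1:] holds whatever SwishProgParameters listed in the config file.
    for arg in argv[1:]:
        text = "placeholder text for " + arg + "\n"
        sys.stdout.buffer.write(format_document(arg, text))


if __name__ == "__main__":
    main(sys.argv)
```

With a configuration of "SwishProgParameters one two" and IndexDir pointing at such a script, the script would see "one" and "two" in its argument list.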
.Sp Swish-e can read documents from standard input, so another way to run an external program with parameters is: .Sp .Vb 1 \& ./spider.pl spider.conf | ./swish-e -S prog -i stdin .Ve .PP \&\fBNotes when using \s-1MS\s0 Windows\fR .PP You should use unix style path separators to specify your external program. Swish will convert forward slashes to backslashes before calling the external program. This is only true for the program name specified with \f(CW\*(C`IndexDir\*(C'\fR or the \f(CW\*(C`\-i\*(C'\fR command line option. .PP In addition, Swish-e will make sure the program specified actually exists, which means you need to use the full name of the program. .PP For example, to run the perl spider program \fIspider.pl\fR you would need a Swish-e configuration file such as: .PP .Vb 2 \& IndexDir e:/perl/bin/perl.exe \& SwishProgParameters prog-bin/spider.pl default http://swish-e.org .Ve .PP and run indexing with the command: .PP .Vb 1 \& swish-e -c swish.cfg -S prog -v 9 .Ve .PP The \f(CW\*(C`IndexDir\*(C'\fR command tells Swish-e the name of the program to run. Under unix you can just specify the name of the script, since unix will figure out the program from the first line of the script. .PP The \f(CW\*(C`SwishProgParameters\*(C'\fR are the parameters passed to the program specified by \f(CW\*(C`IndexDir\*(C'\fR (perl.exe in this case). The first parameter is the perl script to run (\fIprog\-bin/spider.pl\fR). Perl passes the rest of the parameters directly to the perl script. The second parameter \fIdefault\fR tells the \&\fIspider.pl\fR program to use default settings for spidering (or you could specify a spider config file \*(-- see \f(CW\*(C`perldoc spider.pl\*(C'\fR for details), and lastly, the \s-1URL\s0 is passed into the spider program. .Sh "Document Filter Directives" .IX Subsection "Document Filter Directives" Internally, Swish-e knows how to parse only text, \s-1HTML\s0, and \s-1XML\s0 documents. 
With \*(L"filters\*(R" you can index other types of documents. For example, if all your web pages are in gzip format a filter can uncompress these on the fly for indexing. .PP You may wish to read the Swish-e \s-1FAQ\s0 question on filtering before continuing here. How Do I filter documents? .PP There are two suggested methods for filtering. .PP \fIFiltering with SWISH::Filter\fR .IX Subsection "Filtering with SWISH::Filter" .PP The Swish-e distribution includes a Perl module called SWISH::Filter and individual filters located in the \fIfilters\fR directory. This system uses plug-in filters to extend the types of documents that Swish-e can index. The plug-in filters do not actually do the filtering, but rather provide a standard interface for accessing programs that can filter or convert documents. The programs that do the filtering are not part of the Swish-e distribution; they must be downloaded and installed separately. .PP The advantage of this method is that new filtering methods can be installed easily. .PP This system is designed to work with the \-S http and \-S prog methods, but may also be used with the \f(CW\*(C`FileFilter\*(C'\fR feature and \-S fs indexing method. See \&\fI$prefix/share/doc/swish\-e/examples/filter\-bin/swish_filter.pl\fR for an example. .PP See the \fIfilters/README\fR file for more information. .PP \fIFiltering with the FileFilter feature\fR .IX Subsection "Filtering with the FileFilter feature" .PP A filter is an external program that Swish-e executes while processing a document of a given type. Swish-e will execute the filter program for each file that matches the file suffix (extension) set in the \&\fBFileFilter\fR or \fBFileFilterMatch\fR directives. \fBFileFilterMatch\fR matches using regular expressions and is described below. .PP Filters may be used with any type of input method (i.e. \-S fs, \-S http, or \-S prog).
.PP Swish-e calls the external program, passing as \fBdefault\fR arguments: .IP "$0" 4 .IX Item "$0" the name of the filter program .IP "$1" 4 .IX Item "$1" the physical path name of the file to read. This may be a temporary file location if indexing by the http method. .IP "$2" 4 .IX Item "$2" When indexing under the file system this will be the same as \f(CW$1\fR (the path to the source file), but when indexing under the http method this will be the \s-1URL\s0 of the source document. .PP Swish-e can also pass other parameters to the filter program. These parameters can be defined using the \fBFileFilter\fR or \fBFileFilterMatch\fR directives. See Filter Options below. .PP The filter program must open the file, process its contents, and return it to Swish-e by printing to \s-1STDOUT\s0. .PP Note that this can add a significant amount of time to the indexing process if your external program is a perl or shell script. If you have many files to filter you should consider writing your filter in C instead of a shell or perl script, or using the \*(L"prog\*(R" Access Method along with SWISH::Filter. .IP "FilterDir *path\-to\-directory*" 4 .IX Item "FilterDir *path-to-directory*" Deprecated. .Sp This is the path to a directory where the filter programs are stored. Swish-e looks in this directory to find the filter specified in the \&\fBFileFilter\fR directive. .Sp This directive is not needed if the filter program can be found in your system's path. Even if your filter is not in your system's path you can specify the full path to the filter in the FileFilter or FileFilterMatch directives. .Sp Example: .Sp .Vb 1 \& FilterDir /usr/local/swish/filters .Ve .ie n .IP "FileFilter *suffix* ""filter\-prog"" [""filter\-options""]" 4 .el .IP "FileFilter *suffix* ``filter\-prog'' [``filter\-options'']" 4 .IX Item "FileFilter *suffix* filter-prog [filter-options]" This maps file suffix (extension) to a filter program.
If \fIfilter-prog\fR starts with a directory delimiter (absolute path), Swish-e doesn't use the FilterDir settings, but uses the given \fIfilter-prog\fR path directly. .Sp On systems that have a working \fIfork\fR\|(2) system call the filter program is run by forking swish then executing the filter. This means the shell is not used for running the filter and no arguments are passed through the shell. .Sp On other systems (e.g. Windows) the arguments are double-quoted and \&\fIpopen\fR\|(3) is used to run the program. This does pass arguments through the shell and may be a security concern depending on the abilities of the shell. .Sp Filter options: .Sp Filter options are a string passed as arguments to the \fIfilter-prog\fR. Filter options can contain variables, replaced by Swish\-e. If you omit \&\fIfilter-options\fR Swish-e will use default parameters for the options listed above. .Sp .Vb 2 \& Default: %p %P \& Which means: pass "workfile path" and "documentfile path" to filter. .Ve .Sp Variables in filter options: .Sp .Vb 7 \& %% = % \& %P = Full document pathname (e.g. URL, or path on filesystem) \& %p = Full pathname to work file (maybe a tmpfile or the real document path on filesystem) \& %F = Filename stripped from full document pathname \& %f = Filename stripped from "work" pathname \& %D = Directoryname stripped from full document pathname \& %d = Directoryname stripped from full "work" pathname .Ve .Sp Examples of strings passed: .Sp .Vb 6 \& %P = document pathname: http://myserver/path1/mydoc.txt \& %p = work pathname: /tmp/tmp.1234.mydoc.txt \& %F = mydoc.txt \& %f = tmp.1234.mydoc.txt \& %D = http://myserver/path1 \& %d = /tmp .Ve .Sp \&\fBNotes when using \s-1MS\s0 Windows\fR .Sp Windows uses double quotes to escape shell metacharacters, so if you need to use quotes then use single quotes around the entire option string.
.Sp .Vb 1 \& FileFilter .mydoc mydocfilter.exe '--title "text with spaces"' .Ve .Sp You can specify the filter program using forward slashes (unix style). Swish will convert the slashes to backslashes before running your program. .Sp .Vb 1 \& FileFilter .mydoc c:/some/path/mydocfilter.exe '-d "%d" -example -url "%P" "%f"' .Ve .Sp Examples of filters: .Sp .Vb 4 \& FileFilter .doc /usr/local/bin/catdoc "-s8859-1 -d8859-1 %p" \& FileFilter .pdf pdftotext "%p -" \& FileFilter .html.gz gzip "-c %p" \& FileFilter .mydoc "/some/path/mydocfilter" "-d %d -example -url %P %f" .Ve .Sp The above examples are running a \fIbinary\fR filter program. For more complicated filtering needs you may use a scripting language such as Perl or a shell script. Here are some examples of calling a shell and perl script: .Sp .Vb 2 \& FileFilter .pdf pdf2html.sh \& FileFilter .ps ghostscript-filter.pl .Ve .Sp Using a scripting language (or any language that has a large startup cost) can \&\fBgreatly increase the indexing time\fR. For small indexing jobs, this may not be an issue, but for large collections of files that require processing by a scripting language, you may be better off using the \f(CW\*(C`\-S prog\*(C'\fR access method where the script will only be compiled once, instead of for each document. .Sp Filters are probably easier to write than a \f(CW\*(C`\-S prog\*(C'\fR program. Which you decide to use depends on your requirements. Examples of filter scripts can be found in the \fIfilter-bin\fR directory, and examples of \f(CW\*(C`\-S prog\*(C'\fR programs can be found in the \fIprog-bin\fR directory. .IP "FileFilterMatch *filter\-prog* *filter\-options* *regex* [*regex* ...]" 4 .IX Item "FileFilterMatch *filter-prog* *filter-options* *regex* [*regex* ...]" This is similar to \f(CW\*(C`FileFilter\*(C'\fR except that it uses regular expressions to match against the file name. *filter\-prog* is the path to the program.
Unlike \&\f(CW\*(C`FileFilter\*(C'\fR this does \fBnot\fR use the \f(CW\*(C`FilterDir\*(C'\fR option. Also unlike \&\f(CW\*(C`FileFilter\*(C'\fR you \fBmust\fR specify the *filter\-options*. .Sp Examples: .Sp .Vb 1 \& FileFilterMatch ./pdftotext "%p -" /\e.pdf$/ .Ve .Sp Note that this will also match a file called \*(L".pdf\*(R", so you may want to use something that requires a filename that has more than just an extension. For example: .Sp .Vb 1 \& FileFilterMatch ./pdftotext "%p -" /.\e.pdf$/ .Ve .Sp To specify more than one extension: .Sp .Vb 1 \& FileFilterMatch ./check_title.pl "%p" /\e.html$/ /\e.htm$/ .Ve .Sp Or a few ways to do the same thing: .Sp .Vb 2 \& FileFilterMatch ./check_title.pl %p /\e.(htm|html)$/ \& FileFilterMatch ./check_title.pl %p /\e.html?$/ .Ve .Sp And to ignore case: .Sp .Vb 1 \& FileFilterMatch ./check_title.pl %p /\e.html?$/i .Ve .Sp You may also precede an expression with \*(L"not\*(R" to negate the regular expressions that follow. For example, to match files that do not have an extension: .Sp .Vb 1 \& FileFilterMatch ./convert "%p %P" not /\e..+$/ .Ve .SH "Document Info" .IX Header "Document Info" $Id: \s-1SWISH\-CONFIG\s0.pod 1846 2006\-10\-20 20:18:30Z whmoseley $ .PP \&. swish-e-2.4.7/man/Makefile.in0000664000077100017500000003001611166010112012635 00000000000000# Makefile.in generated by automake 1.9.6 from Makefile.am. # @configure_input@ # Copyright (C) 1994, 1995, 1996, 1997, 1998, 1999, 2000, 2001, 2002, # 2003, 2004, 2005 Free Software Foundation, Inc.
# This Makefile.in is free software; the Free Software Foundation # gives unlimited permission to copy and/or distribute it, # with or without modifications, as long as this notice is preserved. # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY, to the extent permitted by law; without # even the implied warranty of MERCHANTABILITY or FITNESS FOR A # PARTICULAR PURPOSE. @SET_MAKE@ srcdir = @srcdir@ top_srcdir = @top_srcdir@ VPATH = @srcdir@ pkgdatadir = $(datadir)/@PACKAGE@ pkglibdir = $(libdir)/@PACKAGE@ pkgincludedir = $(includedir)/@PACKAGE@ top_builddir = .. am__cd = CDPATH="$${ZSH_VERSION+.}$(PATH_SEPARATOR)" && cd INSTALL = @INSTALL@ install_sh_DATA = $(install_sh) -c -m 644 install_sh_PROGRAM = $(install_sh) -c install_sh_SCRIPT = $(install_sh) -c INSTALL_HEADER = $(INSTALL_DATA) transform = $(program_transform_name) NORMAL_INSTALL = : PRE_INSTALL = : POST_INSTALL = : NORMAL_UNINSTALL = : PRE_UNINSTALL = : POST_UNINSTALL = : build_triplet = @build@ host_triplet = @host@ subdir = man DIST_COMMON = $(srcdir)/Makefile.am $(srcdir)/Makefile.in ACLOCAL_M4 = $(top_srcdir)/aclocal.m4 am__aclocal_m4_deps = $(top_srcdir)/config/acinclude.m4 \ $(top_srcdir)/configure.in am__configure_deps = $(am__aclocal_m4_deps) $(CONFIGURE_DEPENDENCIES) \ $(ACLOCAL_M4) mkinstalldirs = $(SHELL) $(top_srcdir)/config/mkinstalldirs CONFIG_HEADER = $(top_builddir)/src/acconfig.h CONFIG_CLEAN_FILES = SOURCES = DIST_SOURCES = man1dir = $(mandir)/man1 am__installdirs = "$(DESTDIR)$(man1dir)" NROFF = nroff MANS = $(man_MANS) DISTFILES = $(DIST_COMMON) $(DIST_SOURCES) $(TEXINFOS) $(EXTRA_DIST) ACLOCAL = @ACLOCAL@ ALLOCA = @ALLOCA@ AMDEP_FALSE = @AMDEP_FALSE@ AMDEP_TRUE = @AMDEP_TRUE@ AMTAR = @AMTAR@ AR = @AR@ AS = @AS@ AUTOCONF = @AUTOCONF@ AUTOHEADER = @AUTOHEADER@ AUTOMAKE = @AUTOMAKE@ AWK = @AWK@ BTREE_OBJS = @BTREE_OBJS@ BUILDDOCS_FALSE = @BUILDDOCS_FALSE@ BUILDDOCS_TRUE = @BUILDDOCS_TRUE@ CC = @CC@ CCDEPMODE = @CCDEPMODE@ CFLAGS = @CFLAGS@ CPP = 
@CPP@ CPPFLAGS = @CPPFLAGS@ CXX = @CXX@ CXXCPP = @CXXCPP@ CXXDEPMODE = @CXXDEPMODE@ CXXFLAGS = @CXXFLAGS@ CYGPATH_W = @CYGPATH_W@ DEFS = @DEFS@ DEPDIR = @DEPDIR@ DLLTOOL = @DLLTOOL@ ECHO = @ECHO@ ECHO_C = @ECHO_C@ ECHO_N = @ECHO_N@ ECHO_T = @ECHO_T@ EGREP = @EGREP@ EXEEXT = @EXEEXT@ F77 = @F77@ FFLAGS = @FFLAGS@ INSTALLDOCS_FALSE = @INSTALLDOCS_FALSE@ INSTALLDOCS_TRUE = @INSTALLDOCS_TRUE@ INSTALL_DATA = @INSTALL_DATA@ INSTALL_PROGRAM = @INSTALL_PROGRAM@ INSTALL_SCRIPT = @INSTALL_SCRIPT@ INSTALL_STRIP_PROGRAM = @INSTALL_STRIP_PROGRAM@ LARGEFILES_MACROS = @LARGEFILES_MACROS@ LDFLAGS = @LDFLAGS@ LIBOBJS = @LIBOBJS@ LIBS = @LIBS@ LIBTOOL = @LIBTOOL@ LIBXML2_CFLAGS = @LIBXML2_CFLAGS@ LIBXML2_LIB = @LIBXML2_LIB@ LIBXML2_OBJS = @LIBXML2_OBJS@ LIBXML_REQUIRED_VERSION = @LIBXML_REQUIRED_VERSION@ LN_S = @LN_S@ LTLIBOBJS = @LTLIBOBJS@ MAINT = @MAINT@ MAINTAINER_MODE_FALSE = @MAINTAINER_MODE_FALSE@ MAINTAINER_MODE_TRUE = @MAINTAINER_MODE_TRUE@ MAKEINFO = @MAKEINFO@ OBJDUMP = @OBJDUMP@ OBJEXT = @OBJEXT@ PACKAGE = @PACKAGE@ PACKAGE_BUGREPORT = @PACKAGE_BUGREPORT@ PACKAGE_NAME = @PACKAGE_NAME@ PACKAGE_STRING = @PACKAGE_STRING@ PACKAGE_TARNAME = @PACKAGE_TARNAME@ PACKAGE_VERSION = @PACKAGE_VERSION@ PATH_SEPARATOR = @PATH_SEPARATOR@ PCRE_CFLAGS = @PCRE_CFLAGS@ PCRE_CONFIG = @PCRE_CONFIG@ PCRE_LIBS = @PCRE_LIBS@ PCRE_REQUIRED_VERSION = @PCRE_REQUIRED_VERSION@ PERL = @PERL@ POD2MAN = @POD2MAN@ RANLIB = @RANLIB@ SET_MAKE = @SET_MAKE@ SHELL = @SHELL@ STRIP = @STRIP@ SWISH_WEB = @SWISH_WEB@ VERSION = @VERSION@ XML2_CONFIG = @XML2_CONFIG@ Z_CFLAGS = @Z_CFLAGS@ Z_LIBS = @Z_LIBS@ ac_ct_AR = @ac_ct_AR@ ac_ct_AS = @ac_ct_AS@ ac_ct_CC = @ac_ct_CC@ ac_ct_CXX = @ac_ct_CXX@ ac_ct_DLLTOOL = @ac_ct_DLLTOOL@ ac_ct_F77 = @ac_ct_F77@ ac_ct_OBJDUMP = @ac_ct_OBJDUMP@ ac_ct_RANLIB = @ac_ct_RANLIB@ ac_ct_STRIP = @ac_ct_STRIP@ am__fastdepCC_FALSE = @am__fastdepCC_FALSE@ am__fastdepCC_TRUE = @am__fastdepCC_TRUE@ am__fastdepCXX_FALSE = @am__fastdepCXX_FALSE@ am__fastdepCXX_TRUE = @am__fastdepCXX_TRUE@ 
am__include = @am__include@ am__leading_dot = @am__leading_dot@ am__quote = @am__quote@ am__tar = @am__tar@ am__untar = @am__untar@ bindir = @bindir@ build = @build@ build_alias = @build_alias@ build_cpu = @build_cpu@ build_os = @build_os@ build_vendor = @build_vendor@ datadir = @datadir@ exec_prefix = @exec_prefix@ host = @host@ host_alias = @host_alias@ host_cpu = @host_cpu@ host_os = @host_os@ host_vendor = @host_vendor@ includedir = @includedir@ infodir = @infodir@ install_sh = @install_sh@ libdir = @libdir@ libexecdir = @libexecdir@ localstatedir = @localstatedir@ mandir = @mandir@ mkdir_p = @mkdir_p@ oldincludedir = @oldincludedir@ prefix = @prefix@ program_transform_name = @program_transform_name@ sbindir = @sbindir@ sharedstatedir = @sharedstatedir@ sysconfdir = @sysconfdir@ target_alias = @target_alias@ #if BUILDDOCS man_MANS = \ $(srcdir)/swish-e.1 \ $(srcdir)/SWISH-CONFIG.1 \ $(srcdir)/SWISH-FAQ.1 \ $(srcdir)/SWISH-LIBRARY.1 \ $(srcdir)/SWISH-RUN.1 #endif EXTRA_DIST = $(man_MANS) all: all-am .SUFFIXES: $(srcdir)/Makefile.in: @MAINTAINER_MODE_TRUE@ $(srcdir)/Makefile.am $(am__configure_deps) @for dep in $?; do \ case '$(am__configure_deps)' in \ *$$dep*) \ cd $(top_builddir) && $(MAKE) $(AM_MAKEFLAGS) am--refresh \ && exit 0; \ exit 1;; \ esac; \ done; \ echo ' cd $(top_srcdir) && $(AUTOMAKE) --foreign man/Makefile'; \ cd $(top_srcdir) && \ $(AUTOMAKE) --foreign man/Makefile .PRECIOUS: Makefile Makefile: $(srcdir)/Makefile.in $(top_builddir)/config.status @case '$?' 
in \ *config.status*) \ cd $(top_builddir) && $(MAKE) $(AM_MAKEFLAGS) am--refresh;; \ *) \ echo ' cd $(top_builddir) && $(SHELL) ./config.status $(subdir)/$@ $(am__depfiles_maybe)'; \ cd $(top_builddir) && $(SHELL) ./config.status $(subdir)/$@ $(am__depfiles_maybe);; \ esac; $(top_builddir)/config.status: $(top_srcdir)/configure $(CONFIG_STATUS_DEPENDENCIES) cd $(top_builddir) && $(MAKE) $(AM_MAKEFLAGS) am--refresh $(top_srcdir)/configure: @MAINTAINER_MODE_TRUE@ $(am__configure_deps) cd $(top_builddir) && $(MAKE) $(AM_MAKEFLAGS) am--refresh $(ACLOCAL_M4): @MAINTAINER_MODE_TRUE@ $(am__aclocal_m4_deps) cd $(top_builddir) && $(MAKE) $(AM_MAKEFLAGS) am--refresh mostlyclean-libtool: -rm -f *.lo clean-libtool: -rm -rf .libs _libs distclean-libtool: -rm -f libtool uninstall-info-am: install-man1: $(man1_MANS) $(man_MANS) @$(NORMAL_INSTALL) test -z "$(man1dir)" || $(mkdir_p) "$(DESTDIR)$(man1dir)" @list='$(man1_MANS) $(dist_man1_MANS) $(nodist_man1_MANS)'; \ l2='$(man_MANS) $(dist_man_MANS) $(nodist_man_MANS)'; \ for i in $$l2; do \ case "$$i" in \ *.1*) list="$$list $$i" ;; \ esac; \ done; \ for i in $$list; do \ if test -f $(srcdir)/$$i; then file=$(srcdir)/$$i; \ else file=$$i; fi; \ ext=`echo $$i | sed -e 's/^.*\\.//'`; \ case "$$ext" in \ 1*) ;; \ *) ext='1' ;; \ esac; \ inst=`echo $$i | sed -e 's/\\.[0-9a-z]*$$//'`; \ inst=`echo $$inst | sed -e 's/^.*\///'`; \ inst=`echo $$inst | sed '$(transform)'`.$$ext; \ echo " $(INSTALL_DATA) '$$file' '$(DESTDIR)$(man1dir)/$$inst'"; \ $(INSTALL_DATA) "$$file" "$(DESTDIR)$(man1dir)/$$inst"; \ done uninstall-man1: @$(NORMAL_UNINSTALL) @list='$(man1_MANS) $(dist_man1_MANS) $(nodist_man1_MANS)'; \ l2='$(man_MANS) $(dist_man_MANS) $(nodist_man_MANS)'; \ for i in $$l2; do \ case "$$i" in \ *.1*) list="$$list $$i" ;; \ esac; \ done; \ for i in $$list; do \ ext=`echo $$i | sed -e 's/^.*\\.//'`; \ case "$$ext" in \ 1*) ;; \ *) ext='1' ;; \ esac; \ inst=`echo $$i | sed -e 's/\\.[0-9a-z]*$$//'`; \ inst=`echo $$inst | sed -e 's/^.*\///'`; \ 
inst=`echo $$inst | sed '$(transform)'`.$$ext; \ echo " rm -f '$(DESTDIR)$(man1dir)/$$inst'"; \ rm -f "$(DESTDIR)$(man1dir)/$$inst"; \ done tags: TAGS TAGS: ctags: CTAGS CTAGS: distdir: $(DISTFILES) $(mkdir_p) $(distdir)/$(srcdir) @srcdirstrip=`echo "$(srcdir)" | sed 's|.|.|g'`; \ topsrcdirstrip=`echo "$(top_srcdir)" | sed 's|.|.|g'`; \ list='$(DISTFILES)'; for file in $$list; do \ case $$file in \ $(srcdir)/*) file=`echo "$$file" | sed "s|^$$srcdirstrip/||"`;; \ $(top_srcdir)/*) file=`echo "$$file" | sed "s|^$$topsrcdirstrip/|$(top_builddir)/|"`;; \ esac; \ if test -f $$file || test -d $$file; then d=.; else d=$(srcdir); fi; \ dir=`echo "$$file" | sed -e 's,/[^/]*$$,,'`; \ if test "$$dir" != "$$file" && test "$$dir" != "."; then \ dir="/$$dir"; \ $(mkdir_p) "$(distdir)$$dir"; \ else \ dir=''; \ fi; \ if test -d $$d/$$file; then \ if test -d $(srcdir)/$$file && test $$d != $(srcdir); then \ cp -pR $(srcdir)/$$file $(distdir)$$dir || exit 1; \ fi; \ cp -pR $$d/$$file $(distdir)$$dir || exit 1; \ else \ test -f $(distdir)/$$file \ || cp -p $$d/$$file $(distdir)/$$file \ || exit 1; \ fi; \ done check-am: all-am check: check-am all-am: Makefile $(MANS) installdirs: for dir in "$(DESTDIR)$(man1dir)"; do \ test -z "$$dir" || $(mkdir_p) "$$dir"; \ done install: install-am install-exec: install-exec-am install-data: install-data-am uninstall: uninstall-am install-am: all-am @$(MAKE) $(AM_MAKEFLAGS) install-exec-am install-data-am installcheck: installcheck-am install-strip: $(MAKE) $(AM_MAKEFLAGS) INSTALL_PROGRAM="$(INSTALL_STRIP_PROGRAM)" \ install_sh_PROGRAM="$(INSTALL_STRIP_PROGRAM)" INSTALL_STRIP_FLAG=-s \ `test -z '$(STRIP)' || \ echo "INSTALL_PROGRAM_ENV=STRIPPROG='$(STRIP)'"` install mostlyclean-generic: clean-generic: distclean-generic: -test -z "$(CONFIG_CLEAN_FILES)" || rm -f $(CONFIG_CLEAN_FILES) maintainer-clean-generic: @echo "This command is intended for maintainers to use" @echo "it deletes files that may require special tools to rebuild." 
clean: clean-am clean-am: clean-generic clean-libtool mostlyclean-am distclean: distclean-am -rm -f Makefile distclean-am: clean-am distclean-generic distclean-libtool dvi: dvi-am dvi-am: html: html-am info: info-am info-am: install-data-am: install-man install-exec-am: install-info: install-info-am install-man: install-man1 installcheck-am: maintainer-clean: maintainer-clean-am -rm -f Makefile maintainer-clean-am: distclean-am maintainer-clean-generic mostlyclean: mostlyclean-am mostlyclean-am: mostlyclean-generic mostlyclean-libtool pdf: pdf-am pdf-am: ps: ps-am ps-am: uninstall-am: uninstall-info-am uninstall-man uninstall-man: uninstall-man1 .PHONY: all all-am check check-am clean clean-generic clean-libtool \ distclean distclean-generic distclean-libtool distdir dvi \ dvi-am html html-am info info-am install install-am \ install-data install-data-am install-exec install-exec-am \ install-info install-info-am install-man install-man1 \ install-strip installcheck installcheck-am installdirs \ maintainer-clean maintainer-clean-generic mostlyclean \ mostlyclean-generic mostlyclean-libtool pdf pdf-am ps ps-am \ uninstall uninstall-am uninstall-info-am uninstall-man \ uninstall-man1 $(srcdir)/swish-e.1 : $(top_srcdir)/pod/swish-e.pod -rm -f $@ -pod2man --center="SWISH-E Documentation" --lax --release='$(VERSION)' $(top_srcdir)/pod/swish-e.pod > $@ $(srcdir)/SWISH-CONFIG.1 : $(top_srcdir)/pod/SWISH-CONFIG.pod -rm -f $@ -pod2man --center="SWISH-E Documentation" --lax --release='$(VERSION)' $(top_srcdir)/pod/SWISH-CONFIG.pod > $@ $(srcdir)/SWISH-FAQ.1 : $(top_srcdir)/pod/SWISH-FAQ.pod -rm -f $@ -pod2man --center="SWISH-E Documentation" --lax --release='$(VERSION)' $(top_srcdir)/pod/SWISH-FAQ.pod > $@ $(srcdir)/SWISH-LIBRARY.1 : $(top_srcdir)/pod/SWISH-LIBRARY.pod -rm -f $@ -pod2man --center="SWISH-E Documentation" --lax --release='$(VERSION)' $(top_srcdir)/pod/SWISH-LIBRARY.pod > $@ $(srcdir)/SWISH-RUN.1 : $(top_srcdir)/pod/SWISH-RUN.pod -rm -f $@ -pod2man 
--center="SWISH-E Documentation" --lax --release='$(VERSION)' $(top_srcdir)/pod/SWISH-RUN.pod > $@ # Tell versions [3.59,3.63) of GNU make to not export all variables. # Otherwise a system limit (for SysV at least) may be exceeded. .NOEXPORT: swish-e-2.4.7/man/SWISH-RUN.1 0000664 0000771 0001750 00000137024 11166010473 012232 0000000 0000000 .\" Automatically generated by Pod::Man v1.37, Pod::Parser v1.14 .\" .\" Standard preamble: .\" ======================================================================== .de Sh \" Subsection heading .br .if t .Sp .ne 5 .PP \fB\\$1\fR .PP .. .de Sp \" Vertical space (when we can't use .PP) .if t .sp .5v .if n .sp .. .de Vb \" Begin verbatim text .ft CW .nf .ne \\$1 .. .de Ve \" End verbatim text .ft R .fi .. .\" Set up some character translations and predefined strings. \*(-- will .\" give an unbreakable dash, \*(PI will give pi, \*(L" will give a left .\" double quote, and \*(R" will give a right double quote. | will give a .\" real vertical bar. \*(C+ will give a nicer C++. 
Capital omega is used to .\" do unbreakable dashes and therefore won't be available. \*(C` and \*(C' .\" expand to `' in nroff, nothing in troff, for use with C<>. .tr \(*W-|\(bv\*(Tr .ds C+ C\v'-.1v'\h'-1p'\s-2+\h'-1p'+\s0\v'.1v'\h'-1p' .ie n \{\ . ds -- \(*W- . ds PI pi . if (\n(.H=4u)&(1m=24u) .ds -- \(*W\h'-12u'\(*W\h'-12u'-\" diablo 10 pitch . if (\n(.H=4u)&(1m=20u) .ds -- \(*W\h'-12u'\(*W\h'-8u'-\" diablo 12 pitch . ds L" "" . ds R" "" . ds C` "" . ds C' "" 'br\} .el\{\ . ds -- \|\(em\| . ds PI \(*p . ds L" `` . ds R" '' 'br\} .\" .\" If the F register is turned on, we'll generate index entries on stderr for .\" titles (.TH), headers (.SH), subsections (.Sh), items (.Ip), and index .\" entries marked with X<> in POD. Of course, you'll have to process the .\" output yourself in some meaningful fashion. .if \nF \{\ . de IX . tm Index:\\$1\t\\n%\t"\\$2" .. . nr % 0 . rr F .\} .\" .\" For nroff, turn off justification. Always turn off hyphenation; it makes .\" way too many mistakes in technical documents. .hy 0 .if n .na .\" .\" Accent mark definitions (@(#)ms.acc 1.5 88/02/08 SMI; from UCB 4.2). .\" Fear. Run. Save yourself. No user-serviceable parts. . \" fudge factors for nroff and troff .if n \{\ . ds #H 0 . ds #V .8m . ds #F .3m . ds #[ \f1 . ds #] \fP .\} .if t \{\ . ds #H ((1u-(\\\\n(.fu%2u))*.13m) . ds #V .6m . ds #F 0 . ds #[ \& . ds #] \& .\} . \" simple accents for nroff and troff .if n \{\ . ds ' \& . ds ` \& . ds ^ \& . ds , \& . ds ~ ~ . ds / .\} .if t \{\ . ds ' \\k:\h'-(\\n(.wu*8/10-\*(#H)'\'\h"|\\n:u" . ds ` \\k:\h'-(\\n(.wu*8/10-\*(#H)'\`\h'|\\n:u' . ds ^ \\k:\h'-(\\n(.wu*10/11-\*(#H)'^\h'|\\n:u' . ds , \\k:\h'-(\\n(.wu*8/10)',\h'|\\n:u' . ds ~ \\k:\h'-(\\n(.wu-\*(#H-.1m)'~\h'|\\n:u' . ds / \\k:\h'-(\\n(.wu*8/10-\*(#H)'\z\(sl\h'|\\n:u' .\} . 
\" troff and (daisy-wheel) nroff accents .ds : \\k:\h'-(\\n(.wu*8/10-\*(#H+.1m+\*(#F)'\v'-\*(#V'\z.\h'.2m+\*(#F'.\h'|\\n:u'\v'\*(#V' .ds 8 \h'\*(#H'\(*b\h'-\*(#H' .ds o \\k:\h'-(\\n(.wu+\w'\(de'u-\*(#H)/2u'\v'-.3n'\*(#[\z\(de\v'.3n'\h'|\\n:u'\*(#] .ds d- \h'\*(#H'\(pd\h'-\w'~'u'\v'-.25m'\f2\(hy\fP\v'.25m'\h'-\*(#H' .ds D- D\\k:\h'-\w'D'u'\v'-.11m'\z\(hy\v'.11m'\h'|\\n:u' .ds th \*(#[\v'.3m'\s+1I\s-1\v'-.3m'\h'-(\w'I'u*2/3)'\s-1o\s+1\*(#] .ds Th \*(#[\s+2I\s-2\h'-\w'I'u*3/5'\v'-.3m'o\v'.3m'\*(#] .ds ae a\h'-(\w'a'u*4/10)'e .ds Ae A\h'-(\w'A'u*4/10)'E . \" corrections for vroff .if v .ds ~ \\k:\h'-(\\n(.wu*9/10-\*(#H)'\s-2\u~\d\s+2\h'|\\n:u' .if v .ds ^ \\k:\h'-(\\n(.wu*10/11-\*(#H)'\v'-.4m'^\v'.4m'\h'|\\n:u' . \" for low resolution devices (crt and lpr) .if \n(.H>23 .if \n(.V>19 \ \{\ . ds : e . ds 8 ss . ds o a . ds d- d\h'-1'\(ga . ds D- D\h'-1'\(hy . ds th \o'bp' . ds Th \o'LP' . ds ae ae . ds Ae AE .\} .rm #[ #] #H #V #F C .\" ======================================================================== .\" .IX Title "SWISH-RUN 1" .TH SWISH-RUN 1 "2009-04-04" "2.4.7" "SWISH-E Documentation" .SH "NAME" SWISH\-RUN \- Running Swish\-e and Command Line Switches .SH "OVERVIEW" .IX Header "OVERVIEW" The Swish-e program is controlled by command line arguments (called \&\fIswitches\fR). Often, it is run manually from a shell (command prompt), or from a program such as a \s-1CGI\s0 script that passes the command line arguments to swish. .PP Note: A number of the command line switches may be specified in the Swish-e configuration file specified with the \f(CW\*(C`\-c\*(C'\fR command line argument. Please see SWISH-CONFIG for a complete description of available configuration file directives. .PP There are two basic operating modes of Swish\-e: indexing and searching. There are command line arguments that are unique to each mode, and others that apply to both (yet may have different meaning depending on the operating mode). 
These command line arguments are listed below, grouped by: .PP \&\s-1INDEXING\s0 \*(-- describes the command line arguments used while indexing. .PP \&\s-1SEARCHING\s0 \*(-- lists the command line arguments used while searching. .PP \&\s-1OTHER\s0 \s-1SWITCHES\s0 \*(-- lists switches that don't apply to searching or indexing. .PP Beginning with Swish-e version 2.1, you may embed its search engine into your applications. Please see SWISH-LIBRARY. .SH "INDEXING" .IX Header "INDEXING" Swish-e indexing is initiated by passing \fIcommand line arguments\fR to swish. The command line arguments used for \fIsearching\fR are described in \s-1SEARCHING\s0. Also, see SWISH-SEARCH for examples of searching with Swish\-e. .PP Swish-e usage: .PP .Vb 2 \& swish-e [-i dir file ... ] [-c file] [-f file] [-l] \e \& [-v (num)] [-S method(fs|http|prog)] [-N path] .Ve .PP The \f(CW\*(C`\-h\*(C'\fR switch (help) will list the available Swish-e command line arguments: .PP .Vb 1 \& swish-e -h .Ve .PP Typically, most if not all indexing settings are placed in a configuration file (specified with the \f(CW\*(C`\-c\*(C'\fR switch). Once the configuration file is setup indexing is initiated as: .PP .Vb 1 \& swish-e -c /path/to/config/file .Ve .PP See SWISH-CONFIG for information on the configuration file. .PP Security Note: If the swish binary is named \fIswish-search\fR then swish will not allow any operation that would cause swish to write to the index file. .PP When indexing it may be advisable to index to a temporary file, and then after indexing has successfully completed rename the file to the final location. This is especially important when replacing an index that is currently in use. 
.PP .Vb 3 \& swish-e -c swish.config -f index.tmp \& [check return code from swish or look for err: output] \& mv index.tmp index.swish-e .Ve .Sh "Indexing Command Line Arguments" .IX Subsection "Indexing Command Line Arguments" .IP "\-i *directories and/or files* (input file)" 4 .IX Item "-i *directories and/or files* (input file)" This specifies the directories and/or files to index. Directories will be indexed recursively. This is typically specified in the configuration file with the \fBIndexDir\fR directive instead of on the command line. Use of this switch overrides the configuration file settings. .IP "\-S [fs|http|prog] (document source/access mode)" 4 .IX Item "-S [fs|http|prog] (document source/access mode)" This specifies the method to use for accessing documents to index. Can be either \f(CW\*(C`fs\*(C'\fR for local indexing via the file system (the default), \&\f(CW\*(C`http\*(C'\fR for spidering, or \f(CW\*(C`prog\*(C'\fR for reading documents from an external program. .Sp Located in the \f(CW\*(C`conf\*(C'\fR directory are example configuration files that demonstrate indexing with the different document source methods. .Sp See the SWISH-FAQ for a discussion on the different indexing methods, and the difference between spidering with the http method vs. using the file system method. .RS 4 .IP "fs \- file system" 4 .IX Item "fs - file system" The \f(CW\*(C`fs\*(C'\fR method simply reads files from a local (or networked) drive. This is the default method if the \f(CW\*(C`\-S\*(C'\fR switch is not specified. See SWISH-CONFIG for configuration directives specific to the \f(CW\*(C`fs\*(C'\fR method. .IP "http \- spider a web server" 4 .IX Item "http - spider a web server" The \f(CW\*(C`http\*(C'\fR method is used to spider web servers. It uses an included helper program called \fIswishspider\fR. See SWISH-CONFIG for configuration directives specific to the \f(CW\*(C`http\*(C'\fR method. 
.Sp Security Note: Under Windows swish passes the URLs fetched from remote documents through the shell (swish uses the \fIsystem()\fR command for running \fIswishspider\fR under Windows), and this may be considered an additional security risk. .Sp The \f(CW\*(C`http\*(C'\fR method is deprecated (or at least not very well appreciated). Consider using the \f(CW\*(C`prog\*(C'\fR method described below for spidering. There's a spider program available in the \&\fIprog-bin\fR directory for use with the \f(CW\*(C`prog\*(C'\fR method. Here are a number of limitations of this method that are solved by the \f(CW\*(C`prog\*(C'\fR method: .RS 4 .IP "*" 4 swishspider only spiders standard <a href=\*(L"...\*(R"> links. Frames and other links are not followed. .IP "*" 4 By default, this method of spidering only indexes files that have a content type of \*(L"text/*\*(R" (e.g. text/plain, text/html, text/xml). You should use \f(CW\*(C`DefaultContents\*(C'\fR and \f(CW\*(C`IndexContents\*(C'\fR to map file extensions to parsers used by swish (e.g. \f(CW\*(C`IndexContents HTML* .html .htm\*(C'\fR), but this will fail where a document does not have a file extension. .IP "*" 4 Swish\-e's \f(CW\*(C`FileFilter\*(C'\fR directive can be used with the \f(CW\*(C`http\*(C'\fR access method, although it requires a separate process (in addition to the swishspider process) for each document filtered. .IP "*" 4 The SWISH::Filter modules can be used with the swishspider program. SWISH::Filter provides a general purpose filtering system (see SWISH::Filter documentation). To use SWISH::Filter set \s-1PERL5LIB\s0 to point to the location of the \s-1SWISH\s0 module name space (typically /usr/local/lib/swish\-e under Unix).
For example: .Sp .Vb 2 \& export PERL5LIB=/usr/local/lib/swish-e # bash, bourne shells \& setenv PERL5LIB /usr/local/lib/swish-e # csh, tcsh .Ve .Sp or under Windows .Sp .Vb 1 \& set PERL5LIB=c:\eprogram files\eswish-e2.4\elib\eswish-e .Ve .Sp SWISH::Filter is not enabled by default due to the overhead of loading the modules for every document fetched. .Sp The Swish-e distribution includes perl modules in the SWISH::Filters::* namespace to make it easy to convert non-text documents into a format that Swish-e can parse. As mentioned above, the helper script \&\fIswishspider\fR will use these modules if they can be found via \s-1PERL5LIB\s0. These modules only provide an interface to programs that do the conversion. For example, you will need to download and install the \&\*(L"catdoc\*(R" program to convert MSWord documents into text for indexing. Please see \&\fIfilters/README\fR to see how to use this filter system. .RE .RS 4 .RE .IP "prog \- general purpose access method" 4 .IX Item "prog - general purpose access method" The \f(CW\*(C`prog\*(C'\fR method is new to Swish-e version 2.2. It's designed as a general purpose method to feed documents to swish from an external program. .Sp For example, the external program can read a database (e.g. MySQL), spider a web server, or convert documents from one format to another (e.g. pdf to html). Or, you can simply use it to read the files of the file system (like \f(CW\*(C`\-S fs\*(C'\fR), yet have full control over which files are indexed. .Sp The external program name to run is passed to swish either by the IndexDir directive, or via the \f(CW\*(C`\-i\*(C'\fR option. .Sp The program specified should be an absolute path as swish-e will attempt to \fIstat()\fR the program to make sure it exists. Swish does this to help in error reporting. .Sp If the program specified with \-i or IndexDir is not an absolute path (i.e.
does not include \*(L"/\*(R" ) then swish-e will prepend the \*(L"libexecdir\*(R" directory defined during configuration. Typically, libexecdir is set to \&\*(L"$prefix/lib/swish\-e\*(R" (/usr/local/lib/swish\-e), but is platform and installation dependent. Running swish-e \-h will report the directory. .Sp For example, \*(L"spider.pl\*(R" is a Perl helper program for use with \-S prog that is installed in libexecdir. Specify: .Sp .Vb 2 \& IndexDir spider.pl \& SwishProgParameters default http://localhost/index.html .Ve .Sp and swish-e will find spider.pl in libexecdir. .Sp Additional parameters may be passed to the external program via the SwishProgParameters directive. In the example above swish-e will pass two parameters to spider.pl, \*(L"default\*(R" and \&\*(L"http://localhost/index.html\*(R". .Sp A special name \*(L"stdin\*(R" may be used with \f(CW\*(C`\-i\*(C'\fR or IndexDir which tells swish to read from standard input instead of from an external program. See example below. .Sp The external program prints to standard output (which swish captures) a set of headers followed by the content of the file to index. The output looks similar to an email message or an \s-1HTTP\s0 document returned by a web server in that it includes name/value pairs of headers, a blank line, and the content. .Sp The content length is determined by a content-length header supplied to swish by the program; there is no \*(L"end of record\*(R" character or flag sent between documents. Therefore, it is critical that the content-length header is correct. This is a common source of errors. .Sp One advantage of this method (over using filters, for example) is that the external program is run only once for the entire indexing job, instead of once for every document. This avoids forking and creating a new process for every document, and makes a huge difference when your external program is something like perl that has a large startup cost.
.Sp Here's a simple example written in Perl: .Sp .Vb 2 \& #!/usr/local/bin/perl -w \& use strict; .Ve .Sp .Vb 11 \& # Build a document \& my $doc = <<EOF; \& <html> \& <head> \& <title>Document Title</title> \& </head> \& <body> \& This is the text. \& </body> \& </html> \& EOF .Ve .Sp .Vb 4 \& # Prepare the headers for swish \& my $path = 'Example.file'; \& my $size = length $doc; \& my $mtime = time; .Ve .Sp .Vb 6 \& # Output the document (to swish) \& print <content_type =~ m!text/html!' .Ve .Sp This header is not required. .IP "Update\-Mode:" 4 .IX Item "Update-Mode:" When updating an incremental index this header can be used to select the mode for updating the index. There are three possible values: .Sp .Vb 3 \& Update \& Remove \& Index .Ve .Sp \&\*(L"Update\*(R" will update the index with the given file if the date of the given file is newer than the date of the file already in the index. Setting to \*(L"Update\*(R" is the same as using \-u on the command line. .Sp \&\*(L"Remove\*(R" mode will remove the file specified by the Path-Name header. Setting \*(L"Remove\*(R" is the same as using \-r on the command line. .Sp \&\*(L"Index\*(R" will add the file to the index. \s-1NOTE:\s0 swish-e will not check to see if the file already exists. .Sp If this header is not specified, the default is the mode specified on the command line (\-u, \-r, or none). .Sp This option is still experimental and is subject to change in the future. Ask on the Swish-e list before using. .RE .RS 4 .Sp The above example program only returns one document and exits, which is not very useful. Normally, your program would read data from some source, such as files or a database, format it as \&\s-1XML\s0, \s-1HTML\s0, or text, and pass the documents to swish, one after another. The \f(CW\*(C`Content\-Length:\*(C'\fR header tells swish where each document ends \*(-- there is no special \*(L"end of record\*(R" character or marker.
.Sp To index with the above example you need to make sure that the program is executable (and that the path to perl is correct), and then call swish, telling it to run in \f(CW\*(C`prog\*(C'\fR mode and giving the name of the program to use for input. .Sp .Vb 2 \& % chmod 755 example.pl \& % ./swish-e -S prog -i ./example.pl .Ve .Sp Programs can and should be tested prior to running swish. For example: .Sp .Vb 1 \& % ./example.pl > test.out .Ve .Sp A few more useful example programs are provided in the swish-e distribution, located in the \fIprog-bin\fR directory. Some include documentation: .Sp .Vb 2 \& % cd prog-bin \& % perldoc spider.pl .Ve .Sp Others are small examples that include comments: .Sp .Vb 2 \& % cd prog-bin \& % less DirTree.pl .Ve .Sp The \fIspider.pl\fR program can be used as a replacement for the \fI\-S http\fR method. It is far more feature-rich and offers much more control over indexing. .Sp If you use the special program name \*(L"stdin\*(R" with \f(CW\*(C`\-i\*(C'\fR or IndexDir then swish-e will read from standard input instead of from a program. For example: .Sp .Vb 1 \& % ./example.pl --count=1000 /path/to/data | ./swish-e -S prog -i stdin .Ve .Sp This is basically the same as putting .Sp .Vb 2 \& SwishProgParameters --count=1000 /path/to/data \& IndexDir ./example.pl .Ve .Sp in a configuration file and running .Sp .Vb 1 \& % ./swish-e -S prog -c swish.conf .Ve .Sp This gives an easy way to run swish without a configuration file with a \f(CW\*(C`\-S prog\*(C'\fR program that requires parameters. It also means you can capture data to a file and then index more than once with the same data: .Sp .Vb 3 \& % ./example.pl /path/to/data --count=1000 > docs.txt \& % cat docs.txt | ./swish-e -S prog -i stdin -c normal_index \& % cat docs.txt | ./swish-e -S prog -i stdin -c fuzzy_index .Ve .Sp Using \*(L"stdin\*(R" might also be useful for programs that call swish (instead of swish calling the program).
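.Sp
As a rough sketch of a program that calls swish, a small shell wrapper can build the records itself and pipe them to swish. The script name \fIfeed.sh\fR and the file names are hypothetical, and the sketch assumes that a \f(CW\*(C`Path\-Name:\*(C'\fR and a \f(CW\*(C`Content\-Length:\*(C'\fR header are all your setup needs:
.Sp
.Vb 8
\&    #!/bin/sh
\&    # feed.sh (hypothetical name): write one record per file given as an argument
\&    for f in "$@"; do
\&        size=`wc -c < "$f" | tr -d ' '`            # content length in bytes
\&        printf 'Path-Name: %s\en' "$f"
\&        printf 'Content-Length: %s\en\en' "$size"   # blank line ends the headers
\&        cat "$f"                                   # the content itself
\&    done
.Ve
.Sp
.Vb 1
\&    % ./feed.sh docs/*.html | ./swish-e -S prog -i stdin -f index.tmp
.Ve
.Sp
Because swish counts bytes rather than looking for an end marker, the size is taken with \fIwc\fR\|(1) from the same file that is then copied to standard output.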
.Sp (\*(L"stdin\*(R" is used instead of the more common \*(L"\-\*(R" dash because of the rotten way swish parses the command line. This should be fixed in the future.) .Sp The \f(CW\*(C`prog\*(C'\fR method bypasses some of the configuration parameters available to the file system method \*(-- settings such as \&\f(CW\*(C`IndexOnly\*(C'\fR, \f(CW\*(C`FileRules\*(C'\fR, \f(CW\*(C`FileMatch\*(C'\fR and \f(CW\*(C`FollowSymLinks\*(C'\fR are ignored when using the \f(CW\*(C`prog\*(C'\fR method. It's expected that these operations are better accomplished in the external program before passing the document on to swish. In other words, when using the \f(CW\*(C`prog\*(C'\fR method, only send the documents to swish that you want indexed. .Sp You may use swish's filter feature with the \f(CW\*(C`prog\*(C'\fR method, but performance will be better if you run filtering programs from within your external program. See also \fIfilters/README\fR for an example of how to easily add document conversion and filtering into your Perl-based programs. .Sp \&\fBNotes when using \-S prog on \s-1MS\s0 Windows\fR .Sp Windows does not use the shebang (#!) line of a program to determine the program to run. So, when running, for example, a perl program you may need to specify the perl.exe binary as the program, and use the \&\f(CW\*(C`SwishProgParameters\*(C'\fR to name the file. .Sp .Vb 2 \& IndexDir e:/perl/bin/perl.exe \& SwishProgParameters read_database.pl .Ve .Sp Swish will replace the forward slashes with backslashes before running the command specified with \&\f(CW\*(C`IndexDir\*(C'\fR. Swish uses the \fIpopen\fR\|(3) function, which passes the command through the shell. .RE .RE .RS 4 .RE .IP "\-f *indexfile* (index file)" 4 .IX Item "-f *indexfile* (index file)" If you are indexing, this specifies the file to save the generated index in, and you can only specify one file. See also \fBIndexFile\fR in the configuration file.
.Sp If you are searching, this specifies the index files (one or more) to search from. The default index file is index.swish\-e in the current directory. .IP "\-c *file ...* (configuration files)" 4 .IX Item "-c *file ...* (configuration files)" Specify the configuration file(s) to use for indexing. This file contains many directives that control how Swish-e proceeds. See SWISH-CONFIG for a complete listing of configuration file directives. .Sp Example: .Sp .Vb 1 \& swish-e -c docs.conf .Ve .Sp If you specify a directory to index, an index file, or the verbose option on the command\-line, these values will override any specified in the configuration file. .Sp You can specify multiple configuration files. For example, you may have one configuration file that has common site-wide settings, and another for a specific index. .Sp Examples: .Sp .Vb 3 \& 1) swish-e -c swish-e.conf \& 2) swish-e -i /usr/local/www -f index.swish-e -v -c swish-e.conf \& 3) swish-e -c swish-e.conf stopwords.conf .Ve .RS 4 .IP "1" 3 .IX Item "1" The settings in the configuration file will be used to index a site. .IP "2" 3 .IX Item "2" These command-line options will override anything in the configuration file. .IP "3" 3 .IX Item "3" The variables in swish\-e.conf will be read, then the variable in stopwords.conf will be read. Note that if the same variables occur in both files, older values may be written over. .RE .RS 4 .RE .IP "\-e (economy mode)" 4 .IX Item "-e (economy mode)" For large sites indexing may require more \s-1RAM\s0 than is available. The \f(CW\*(C`\-e\*(C'\fR switch tells swish to use disk space to store data structures while indexing, saving memory. This option is recommended if swish uses so much \s-1RAM\s0 that the computer begins to swap excessively, and you cannot increase available memory. The trade-off is slightly longer indexing times, and a busy disk drive. 
.IP "\-l (symbolic links)" 4 .IX Item "-l (symbolic links)" Specifying this option tells swish to follow symbolic links when indexing. The configuration file value \fBFollowSymLinks\fR will override the command-line value. .Sp The default is not to follow symlinks. A small improvement in indexing time may result from enabling FollowSymLinks since swish then does not need to lstat every directory and file processed to determine if it is a symbolic link. .IP "\-N path (index only newer files)" 4 .IX Item "-N path (index only newer files)" The \f(CW\*(C`\-N\*(C'\fR option takes a path to a file, and only files \fInewer\fR than the specified file will be indexed. This is helpful for creating incremental indexes \*(-- that is, indexes that contain just the files added since the last full index of all files was created. .Sp Example (bad example) .Sp .Vb 1 \& swish-e -c config.file -N index.swish-e -f index.new .Ve .Sp This will index as normal, but only files with a modified date newer than \fIindex.swish\-e\fR will be indexed. .Sp This is a bad example because it uses the modification time of \fIindex.swish\-e\fR, which one might assume marks the date of last indexing. The problem is that files might have been added between the time indexing read the directory and when the \fIindex.swish\-e\fR file was created \*(-- which can be quite a bit of time for very large indexing jobs. .Sp The only solution is to prevent any new file additions while full indexing is running.
If this is impossible then it will be slightly better to do this: .Sp Full indexing: .Sp .Vb 3 \& touch indexing_time.file \& swish-e -c config.file -f index.tmp \& mv index.tmp index.full .Ve .Sp Incremental indexing: .Sp .Vb 2 \& swish-e -c config.file -N indexing_time.file -f index.tmp \& mv index.tmp index.incremental .Ve .Sp Then search with .Sp .Vb 1 \& swish-e -w foo -f index.full index.incremental .Ve .Sp or merge the indexes .Sp .Vb 3 \& swish-e -M index.full index.incremental index.tmp \& mv index.tmp index.swish-e \& swish-e -w foo .Ve .IP "\-r" 4 .IX Item "-r" \&\fB**incremental index format only**\fR The \f(CW\*(C`\-r\*(C'\fR option puts swish-e into \*(L"removal\*(R" mode. Any input files (given with \f(CW\*(C`\-i\*(C'\fR or the \f(CW\*(C`IndexDir\*(C'\fR parameter) are removed from an existing index. .Sp Example: .Sp .Vb 1 \& swish-e -r -i file.html .Ve .Sp would remove \fIfile.html\fR from the existing index. .IP "\-u" 4 .IX Item "-u" \&\fB**incremental index format only**\fR The \f(CW\*(C`\-u\*(C'\fR option puts swish-e into \*(L"update\*(R" mode. The timestamp of each input file is compared against the corresponding file in the existing index. If swish-e encounters an input file that either does not exist yet in the index or exists with a timestamp older than the input file, the input file is updated in the index. Any words in the input file that have been added or removed are reflected as such in the index. .Sp Example: .Sp .Vb 1 \& swish-e -i file.html -u .Ve .Sp would update the index.swish\-e index with the contents of file.html. If file.html was new, it would be added. If file.html already existed in the index, its contents would be updated in the index. .IP "\-v [0|1|2|3] (verbosity level)" 4 .IX Item "-v [0|1|2|3] (verbosity level)" The \f(CW\*(C`\-v\*(C'\fR option can take a numerical value from 0 to 3. Specify 0 for completely silent operation and 3 for detailed reports. .Sp If no value is given then 1 is assumed. 
See also \fBIndexReport\fR in the configuration file. .Sp Warnings and errors are reported regardless of the verbosity level. In addition, all errors and warnings are written to standard out. This is for historical reasons (many scripts exist that parse standard out for error messages). .IP "\-W (0|1|2|3) (parser warning level)" 4 .IX Item "-W (0|1|2|3) (parser warning level)" If using the libxml2 parser, the default parser warning level is set at \f(CW2\fR. Use the \f(CW\*(C`\-W\*(C'\fR option to override that default. Most often, you might want to turn it off altogether: .Sp .Vb 1 \& swish-e -W0 -i path/to/files .Ve .Sp would fail silently if the parser encountered any errors. .SH "SEARCHING" .IX Header "SEARCHING" The following command line arguments are available when searching with Swish\-e. These switches are used to select the index to search, what fields to search, and how and what to print as results. .PP This section just lists the available command line arguments and their usage. Please see SWISH-SEARCH for detailed searching instructions. .PP \&\fBWarning\fR: If using Swish-e via a \s-1CGI\s0 interface, please see \s-1CGI\s0 Danger! .PP Security Note: If the swish binary is named \fIswish-search\fR then swish will not allow any operation that would cause swish to write to the index file. .Sh "Searching Command Line Arguments" .IX Subsection "Searching Command Line Arguments" .IP "\-w *word1 word2 ...* (query words)" 4 .IX Item "-w *word1 word2 ...* (query words)" This performs a case-insensitive search using a number of keywords. If no index file to search is specified (via the \f(CW\*(C`\-f\*(C'\fR switch), swish-e will try to search a file called index.swish\-e in the current directory. .Sp .Vb 1 \& swish-e -w word .Ve .Sp Phrase searching is accomplished by placing the quote delimiter (a double-quote by default) around the search phrase. .Sp .Vb 1 \& swish-e -w 'word or "this phrase"' .Ve .Sp Search words should be protected from the shell by quotes.
Typically, this is single quotes when running under Unix. .Sp Under Windows \fIcommand.com\fR you may not need to use quotes, but you will need to backslash the quotes used to delimit phrases: .Sp .Vb 1 \& swish-e -w \e"a phrase\e" .Ve .Sp The phrase delimiter can be set with the \f(CW\*(C`\-P\*(C'\fR switch. .Sp The search may be limited to a \fIMetaName\fR. For example: .Sp .Vb 1 \& swish-e -w meta1=(foo or baz) .Ve .Sp will only search within the \fBmeta1\fR tag. .Sp Please see SWISH-SEARCH for a description of MetaNames. .IP "\-f *file1 file2 ...* (index files)" 4 .IX Item "-f *file1 file2 ...* (index files)" Specifies the index file(s) used while searching. More than one file may be listed, and each file will be searched. If no \f(CW\*(C`\-f\*(C'\fR switch is specified then the file \fIindex.swish\-e\fR in the current directory will be used as the index file. .IP "\-m *number* (max results)" 4 .IX Item "-m *number* (max results)" While searching, this specifies the maximum number of results to return. The default is to return all results. .Sp This switch is often used in conjunction with the \f(CW\*(C`\-b\*(C'\fR switch to return results one page at a time (strongly recommended for large indexes). .IP "\-b *number* (beginning result)" 4 .IX Item "-b *number* (beginning result)" Sets the \fIbeginning\fR search result to return (records are numbered from 1). This switch can be used with the \f(CW\*(C`\-m\*(C'\fR switch to return results in groups or pages. .Sp Example: .Sp .Vb 2 \& swish-e -w 'word' -b 1 -m 20 # first 'page' \& swish-e -w 'word' -b 21 -m 20 # second 'page' .Ve .IP "\-t HBthec (context searching)" 4 .IX Item "-t HBthec (context searching)" The \f(CW\*(C`\-t\*(C'\fR option allows you to search for words that exist only in specific \s-1HTML\s0 tags. Each character in the string you specify in the argument to this option represents a different tag in which to search for the word.
H means all \s-1HEAD\s0 tags, B stands for \s-1BODY\s0 tags, t is all \s-1TITLE\s0 tags, h is H1 to H6 (header) tags, e is emphasized tags (this may be B, I, \&\s-1EM\s0, or \s-1STRONG\s0), and c is \s-1HTML\s0 comment tags. .Sp Search only in header (H1 to H6) tags: .Sp .Vb 1 \& swish-e -w word -t h .Ve .IP "\-d *string* (delimiter)" 4 .IX Item "-d *string* (delimiter)" Set the delimiter used when printing results. By default, Swish-e separates the output fields by a space, and places double-quotes around the document title. This output may be hard to parse, so it is recommended to use \f(CW\*(C`\-d\*(C'\fR to specify a character or string used as a separator between fields. .Sp The string \f(CW\*(C`dq\*(C'\fR means \*(L"double\-quotes\*(R". .Sp .Vb 5 \& swish-e -w word -d , # single char \& swish-e -w word -d :: # string \& swish-e -w word -d '"' # double quotes under Unix \& swish-e -w word -d \e" # double quotes under Windows \& swish-e -w word -d dq # double quotes .Ve .Sp The following control characters may also be specified: \f(CW\*(C`\et \er \en \ef\*(C'\fR. .Sp Warning: This string is passed directly to \fIsprintf()\fR and therefore exposes a security hole. Do not allow user data to set \-d format strings directly. .IP "\-P *character*" 4 .IX Item "-P *character*" Sets the delimiter used for phrase searches. The default is double quotes \f(CW\*(C`"\*(C'\fR. .Sp Some examples under bash (be careful about your shell metacharacters): .Sp .Vb 2 \& swish-e -P ^ -w 'title=^words in a phrase^' \& swish-e -P \e' -w "title='words in a phrase'" .Ve .IP "\-p *property1 property2 ...* (display properties)" 4 .IX Item "-p *property1 property2 ...* (display properties)" This causes swish to print the listed property in the search results. The properties are returned in the order they are listed in the \f(CW\*(C`\-p\*(C'\fR argument.
.Sp Properties are defined by the \fBPropertyNames\fR directive in the configuration file (see SWISH-CONFIG) and properties must also be defined in \fBMetaNames\fR. Swish stores the text of the meta name as a \fIproperty\fR, and then will return this text while searching if this option is used. .Sp Properties are very useful for returning data included in a source document without having to re-read the source document while searching. For example, this could be used to return a short document description. See also \fBDocument Summaries\fR and PropertyNames in SWISH-CONFIG. .Sp To return the subject and category properties while searching: .Sp .Vb 1 \& swish-e -w word -p subject category .Ve .Sp Properties are returned in double quotes. If a property contains a double quote it is \s-1HTML\s0 escaped (&quot;). See the \f(CW\*(C`\-x\*(C'\fR switch for a more advanced method of returning a list of properties. .Sp \&\s-1NOTE:\s0 it is necessary to have indexed with the proper PropertyNames directive in the user config file in order to use this option. .IP "\-s *property [asc|desc] ...* (sort)" 4 .IX Item "-s *property [asc|desc] ...* (sort)" Normally, search results are printed out in order of relevancy, with the most relevant listed first. The \f(CW\*(C`\-s\*(C'\fR sort switch allows you to sort results in order of a specified \fIproperty\fR, where a \fIproperty\fR was defined using the \fBMetaNames\fR and \fBPropertyNames\fR directives during indexing (see SWISH-CONFIG). .Sp The string passed can include the strings \f(CW\*(C`asc\*(C'\fR and \f(CW\*(C`desc\*(C'\fR to specify the sort order, and more than one property may be specified to sort on more than one key. .Sp Examples: .Sp Sort by the title property in ascending order: .Sp .Vb 1 \& -s title .Ve .Sp Sort descending by title, then ascending by name: .Sp .Vb 1 \& -s title desc name asc .Ve .Sp Note: Swish limits sort keys to 100 characters.
This limit can be changed by changing \s-1MAX_SORT_STRING_LEN\s0 in src/config.h and rebuilding swish\-e. .IP "\-L limit to a range of property values (Limit)" 4 .IX Item "-L limit to a range of property values (Limit)" \&\fBThis is an experimental feature!\fR .Sp The \f(CW\*(C`\-L\*(C'\fR switch can be used to limit search results to a range of property values. .Sp Example: .Sp .Vb 1 \& swish-e -w foo -L swishtitle a m .Ve .Sp finds all documents that contain the word \f(CW\*(C`foo\*(C'\fR, and where the document's title is in the range of \f(CW\*(C`a\*(C'\fR to \f(CW\*(C`m\*(C'\fR, inclusive. By default, the case of the property is ignored, but this can be changed by using the PropertyNamesCompareCase configuration directive. .Sp Limiting may be done with user-defined properties, as well. .Sp For example, if you indexed documents that contain a created timestamp in a meta tag: .Sp .Vb 1 \& .Ve .Sp Then you tell Swish that you have a property called \f(CW\*(C`created_on\*(C'\fR, and that it's a timestamp. .Sp .Vb 1 \& PropertyNamesDate created_on .Ve .Sp After indexing you will be able to limit documents to a range of timestamps: .Sp .Vb 1 \& -w foo -L created_on 946684800 949363199 .Ve .Sp will find documents containing the word foo and that have a created_on date from the start of Jan 1, 2000 to the end of Jan 31, 2000. .Sp Note: swish currently does not parse dates; Unix timestamps must be used. .Sp Two special formats can be used: .Sp .Vb 2 \& -L swishtitle <= m \& -L swishtitle >= m .Ve .Sp These find titles less than or equal to, or greater than or equal to, the letter \f(CW\*(C`m\*(C'\fR. .Sp This feature will not work with \f(CW\*(C`swishrank\*(C'\fR or \f(CW\*(C`swishdbfile\*(C'\fR properties. .Sp This feature takes advantage of the pre-sorted tables built by swish during indexing to make limiting fast while searching. You should see in the indexing output a line such as: .Sp .Vb 1 \& 6 properties sorted.
.Ve .Sp That indicates that six pre-sorted tables were built during indexing. By default, all properties are presorted while indexing. What properties are pre-sorted can be controlled by the configuration parameter \f(CW\*(C`PreSortedIndex\*(C'\fR. .Sp Using the \f(CW\*(C`\-L\*(C'\fR switch on a property that was not pre-sorted will still work, but may be \fImuch\fR slower during searching. .Sp Note that the PropertyNamesSortKeyLength setting is used for sorting properties. Using too small a PropertyNamesSortKeyLength could result in \-L selecting the wrong properties due to incomplete sorting. .Sp This is an experimental feature, and its use and interface are subject to change. .IP "\-x formatstring (extended output format)" 4 .IX Item "-x formatstring (extended output format)" The \f(CW\*(C`\-x\*(C'\fR switch defines the output format string. The format string can contain plain text and property names (including swish-defined internal property names) and is used to generate the output for every result. In addition, the output format of the property name can be controlled with C\-like printf format strings. This feature overrides the cmdline switches \f(CW\*(C`\-d\*(C'\fR and \&\f(CW\*(C`\-p\*(C'\fR, and a warning will be generated if \f(CW\*(C`\-d\*(C'\fR or \f(CW\*(C`\-p\*(C'\fR are used with \f(CW\*(C`\-x\*(C'\fR. .Sp Warning: The format string (fmt) is passed directly to \fIsprintf()\fR and therefore exposes a security hole. Do not allow user data to set \-x format strings directly. .Sp For example, to return just the title, one per line, in the search results: .Sp .Vb 1 \& swish-e -w ... -x '<swishtitle>\en' ... .Ve .Sp Note: the \f(CW\*(C`\en\*(C'\fR may need to be protected from your shell. .Sp See also ResultExtFormatName for a way to define \fInamed\fR format strings in the swish configuration file. .Sp \&\fBFormat of \*(L"formatstring\*(R":\fR .Sp .Vb 1 \& "texttexttext..."
.Ve .Sp Where \fBpropertyname\fR is: .RS 4 .IP "*" 4 the name of a user property as specified with the config file directive \*(L"PropertyNames\*(R" .IP "*" 4 the name of a swish Auto property (see below). These properties are defined automatically by swish \*(-- you do not need to specify them with PropertyNames directive. (This may change in the future.) .RE .RS 4 .Sp propertynames must be placed within "<\*(L" and \*(R">". .Sp \&\fBUser properties:\fR .Sp Swish-e allows you to specify certain \s-1META\s0 tags within your documents that can be used as \fBdocument properties\fR. The contents of any \s-1META\s0 tag that has been identified as a document property can be returned as part of the search results. Document properties must be defined while indexing using the \fBPropertyNames\fR configuration directive (see SWISH-CONFIG). .Sp Examples of user-defined PropertyNames: .Sp .Vb 5 \& \& \& \& \& .Ve .Sp \&\fBAuto properties:\fR .Sp Swish defines a number of \*(L"Auto\*(R" properties for each document indexed. These are available for output when using the \f(CW\*(C`\-x\*(C'\fR format.
.Sp
.Vb 10
\&  Name               Type     Contents
\&  -----------------  -------  ----------------------------------------------
\&  swishreccount      Integer  Result record counter
\&  swishtitle         String   Document title
\&  swishrank          Integer  Result rank for this hit
\&  swishdocpath       String   URL or filepath to document
\&  swishdocsize       Integer  Document size in bytes
\&  swishlastmodified  Date     Last modified date of document
\&  swishdescription   String   Description of document (see: StoreDescription)
\&  swishdbfile        String   Path of swish database indexfile
.Ve
.Sp
The Auto properties can also be specified using shortcuts:
.Sp
.Vb 10
\&  Shortcut  Property Name
\&  --------  -----------------
\&  %c        swishreccount
\&  %d        swishdescription
\&  %D        swishlastmodified
\&  %I        swishdbfile
\&  %p        swishdocpath
\&  %r        swishrank
\&  %l        swishdocsize
\&  %t        swishtitle
.Ve
.Sp
For example, these are equivalent:
.Sp
.Vb 2
\&  -x '<swishrank>:<swishdocpath>:<swishtitle>\en'
\&  -x '%r:%p:%t\en'
.Ve
.Sp Use a double percent sign \*(L"%%\*(R" to enter a literal percent sign in the output. .Sp \&\fBFormatstrings of properties:\fR .Sp Properties listed in an \f(CW\*(C`\-x\*(C'\fR format string can include format control strings. These \*(L"propertyformats\*(R" are used to control how the contents of the associated property are printed. Property formats are used like C\-language printf formats. The property format is specified by including the attribute \*(L"fmt\*(R" within the property tag. .Sp Format strings cannot be used with the \*(L"%\*(R" shortcuts described above. .Sp General syntax: .Sp .Vb 1 \& -x '<propertyname fmt="subfmt">' .Ve .Sp where \f(CW\*(C`subfmt\*(C'\fR controls the output format of \f(CW\*(C`propertyname\*(C'\fR. .Sp Examples of property format strings: .Sp .Vb 3 \& date type: \& string type: \& integer type: .Ve .Sp Please see the manual pages for \fIstrftime\fR\|(3) and \fIsprintf\fR\|(3) for an explanation of format strings.
Note: some versions of strftime do not offer the \f(CW%s\fR format string (number of seconds since the Epoch), so swish provides a special format string \*(L"%ld\*(R" to display the number of seconds since the Epoch. .Sp The first character of a property format string defines the delimiter for the format string. For example, .Sp .Vb 3 \& -x " ...\en" \& -x " ...\en" \& -x " ...\en" .Ve .Sp \&\fBStandard predefined formats:\fR .Sp If you omit the sub\-format, the following formats are used: .Sp .Vb 4 \& String type: "%s" (like printf char *) \& Integer type: "%d" (like printf int) \& Float type: "%f" (like printf double) \& Date type: "%Y-%m-%d %H:%M:%S" (like strftime) .Ve .Sp \&\fBText in \*(L"formatstring\*(R" or \*(L"propfmtstr\*(R":\fR .Sp Text will be output as-is in format strings (and property format strings). Special characters can be escaped with a backslash. To get a new line for each result hit, you have to include the Newline-Character \*(L"\en\*(R" at the end of \*(L"fmtstr\*(R". .Sp .Vb 5 \& -x "||\en" \& -x "Count=, Rank=\en" \& -x "Title=\e\e" \& -x 'Date: \en' \& -x 'Date in seconds: \en' .Ve .Sp \&\fBControl/Escape characters:\fR .Sp You can use C\-like control escapes in the format string: .Sp .Vb 3 \& known controls: \ea, \eb, \ef, \en, \er, \et, \ev, \& digit escapes: \exhexdigits \e0octaldigits \& character escapes: \eanychar .Ve .Sp Example, .Sp .Vb 1 \& swish -x "%c\et%r\et%p\et\e"\e"\en" .Ve .Sp \&\fBExamples of \-x format strings:\fR .Sp .Vb 5 \& -x "%c|%r|%p|%t|%D|%d\en" \& -x "%c|%r|%p|%t||%d\en" \& -x "\et\et\et\en \& -x "xml_out: \e\e>\e\en" \& -x "xml_out: \en" .Ve .RE .IP "\-H [0|1|2|3|] (header output verbosity)" 4 .IX Item "-H [0|1|2|3|] (header output verbosity)" The \f(CW\*(C`\-H n\*(C'\fR switch generates extended \fIheader\fR output. This is most useful when searching more than one index file at a time by specifying more than one index file with the \f(CW\*(C`\-f\*(C'\fR switch.
\&\f(CW\*(C`\-H 2\*(C'\fR will generate a set of headers specific to each index file. This gives access to the settings used to generate each index file. .Sp Even when searching a single index file, \f(CW\*(C`\-H n\*(C'\fR will provide additional information about the index file, how it was indexed, and how swish is interpreting the query.
.Sp
.Vb 5
\&  -H 0 : print no header information, output only search result entries.
\&  -H 1 : print standard result header (default).
\&  -H 2 : print additional header information for each searched index file.
\&  -H 3 : enhanced header output (e.g. print stopwords).
\&  -H 9 : print diagnostic information in the header of the results (changed from: -v 4)
.Ve
.IP "\-R [0|1] (Ranking Scheme)" 4 .IX Item "-R [0|1] (Ranking Scheme)" \&\fBThis is an experimental feature!\fR .Sp The default ranking scheme in SWISH-E evaluates each word in a query in terms of its frequency and position in each document. The default scheme is 0. .Sp New in version 2.4.3 you may optionally select an experimental ranking scheme that, in addition to document frequency and position, uses Inverse Document Frequency (\s-1IDF\s0), or the relative frequency of each word across all the indexes being searched, and Relative Density, or the normalization of the frequency of a word in relationship to the number of words in the document. .Sp \&\fB\s-1NOTE:\s0\fR IgnoreTotalWordCountWhenRanking must be set to \fBno\fR or \fB0\fR in your index(es) for \-R 1 to work. .Sp Specify \-R 1 to turn on \s-1IDF\s0 ranking. See the \s-1API\s0 documentation for how to set the ranking scheme in your Perl or C program. .SH "OTHER SWITCHES" .IX Header "OTHER SWITCHES" .IP "\-V (version)" 4 .IX Item "-V (version)" Print the current version. .IP "\-k *letter* (print out keywords)" 4 .IX Item "-k *letter* (print out keywords)" The \f(CW\*(C`\-k\*(C'\fR switch is used for testing and will cause swish to print out all keywords in the index beginning with that letter.
You may enter \f(CW\*(C`\-k '*'\*(C'\fR to generate a list of all words indexed by swish. .IP "\-D *index file* (debug index)" 4 .IX Item "-D *index file* (debug index)" The \-D option is no longer supported in version 2.2. .IP "\-T *options* (trace/debug swish)" 4 .IX Item "-T *options* (trace/debug swish)" The \-T option is used to print out information that may be helpful when debugging swish\-e's operation. This option replaced the \f(CW\*(C`\-D\*(C'\fR option of previous versions. .Sp Running \f(CW\*(C`\-T help\*(C'\fR will print out a list of available *options* .SH "Merging Index Files" .IX Header "Merging Index Files" In previous versions of Swish-e indexing would require a very large amount of memory and the indexing process could be very slow. Merging provided a way to index in chunks and then combine the indexes together into a single index. .PP Indexing is much faster now and uses much less memory, and with the \f(CW\*(C`\-e\*(C'\fR switch very little memory is needed to index a large site. .PP Still, at times it can be useful to merge different index files into one file for searching. This could be because you want to keep separate site indexes and a common one for a global search, or you have separate collections of documents that you wish to search all at one time, but manage separately. .IP "\-M *index1 index2 ... indexN out_index" 4 .IX Item "-M *index1 index2 ... indexN out_index" Merges the indexes specified on the command line \*(-- the last file name entered is the output file. The output index must not exist (otherwise merge will not proceed). .Sp Only indexes that were indexed with common settings may be merged. (e.g. don't mix stemming and non-stemming indexes, or indexes with different WordCharacter settings, etc.). .Sp Use the \f(CW\*(C`\-e\*(C'\fR switch while merging to reduce memory usage. .Sp Merge generates progress messages regardless of the setting of \f(CW\*(C`\-v\*(C'\fR. 
.IP "\-c *configuration file*" 4 .IX Item "-c *configuration file*" Specify a configuration file while indexing to add administrative information to the output index file. .SH "Document Info" .IX Header "Document Info" $Id: \s-1SWISH\-RUN\s0.pod 1741 2005\-05\-17 02:22:40Z karman $ .PP \&.
.\" Automatically generated by Pod::Man v1.37, Pod::Parser v1.14 .\" .\" Standard preamble: .\" ======================================================================== .de Sh \" Subsection heading .br .if t .Sp .ne 5 .PP \fB\\$1\fR .PP .. .de Sp \" Vertical space (when we can't use .PP) .if t .sp .5v .if n .sp .. .de Vb \" Begin verbatim text .ft CW .nf .ne \\$1 .. .de Ve \" End verbatim text .ft R .fi .. .\" Set up some character translations and predefined strings. \*(-- will .\" give an unbreakable dash, \*(PI will give pi, \*(L" will give a left .\" double quote, and \*(R" will give a right double quote. | will give a .\" real vertical bar. \*(C+ will give a nicer C++. Capital omega is used to .\" do unbreakable dashes and therefore won't be available. \*(C` and \*(C' .\" expand to `' in nroff, nothing in troff, for use with C<>. .tr \(*W-|\(bv\*(Tr .ds C+ C\v'-.1v'\h'-1p'\s-2+\h'-1p'+\s0\v'.1v'\h'-1p' .ie n \{\ . ds -- \(*W- . ds PI pi . if (\n(.H=4u)&(1m=24u) .ds -- \(*W\h'-12u'\(*W\h'-12u'-\" diablo 10 pitch . if (\n(.H=4u)&(1m=20u) .ds -- \(*W\h'-12u'\(*W\h'-8u'-\" diablo 12 pitch . ds L" "" . ds R" "" . ds C` "" . ds C' "" 'br\} .el\{\ . ds -- \|\(em\| . ds PI \(*p . ds L" `` . ds R" '' 'br\} .\" .\" If the F register is turned on, we'll generate index entries on stderr for .\" titles (.TH), headers (.SH), subsections (.Sh), items (.Ip), and index .\" entries marked with X<> in POD. Of course, you'll have to process the .\" output yourself in some meaningful fashion. .if \nF \{\ . de IX . tm Index:\\$1\t\\n%\t"\\$2" .. . nr % 0 .
rr F .\} .\" .\" For nroff, turn off justification. Always turn off hyphenation; it makes .\" way too many mistakes in technical documents. .hy 0 .if n .na .\" .\" Accent mark definitions (@(#)ms.acc 1.5 88/02/08 SMI; from UCB 4.2). .\" Fear. Run. Save yourself. No user-serviceable parts. . \" fudge factors for nroff and troff .if n \{\ . ds #H 0 . ds #V .8m . ds #F .3m . ds #[ \f1 . ds #] \fP .\} .if t \{\ . ds #H ((1u-(\\\\n(.fu%2u))*.13m) . ds #V .6m . ds #F 0 . ds #[ \& . ds #] \& .\} . \" simple accents for nroff and troff .if n \{\ . ds ' \& . ds ` \& . ds ^ \& . ds , \& . ds ~ ~ . ds / .\} .if t \{\ . ds ' \\k:\h'-(\\n(.wu*8/10-\*(#H)'\'\h"|\\n:u" . ds ` \\k:\h'-(\\n(.wu*8/10-\*(#H)'\`\h'|\\n:u' . ds ^ \\k:\h'-(\\n(.wu*10/11-\*(#H)'^\h'|\\n:u' . ds , \\k:\h'-(\\n(.wu*8/10)',\h'|\\n:u' . ds ~ \\k:\h'-(\\n(.wu-\*(#H-.1m)'~\h'|\\n:u' . ds / \\k:\h'-(\\n(.wu*8/10-\*(#H)'\z\(sl\h'|\\n:u' .\} . \" troff and (daisy-wheel) nroff accents .ds : \\k:\h'-(\\n(.wu*8/10-\*(#H+.1m+\*(#F)'\v'-\*(#V'\z.\h'.2m+\*(#F'.\h'|\\n:u'\v'\*(#V' .ds 8 \h'\*(#H'\(*b\h'-\*(#H' .ds o \\k:\h'-(\\n(.wu+\w'\(de'u-\*(#H)/2u'\v'-.3n'\*(#[\z\(de\v'.3n'\h'|\\n:u'\*(#] .ds d- \h'\*(#H'\(pd\h'-\w'~'u'\v'-.25m'\f2\(hy\fP\v'.25m'\h'-\*(#H' .ds D- D\\k:\h'-\w'D'u'\v'-.11m'\z\(hy\v'.11m'\h'|\\n:u' .ds th \*(#[\v'.3m'\s+1I\s-1\v'-.3m'\h'-(\w'I'u*2/3)'\s-1o\s+1\*(#] .ds Th \*(#[\s+2I\s-2\h'-\w'I'u*3/5'\v'-.3m'o\v'.3m'\*(#] .ds ae a\h'-(\w'a'u*4/10)'e .ds Ae A\h'-(\w'A'u*4/10)'E . \" corrections for vroff .if v .ds ~ \\k:\h'-(\\n(.wu*9/10-\*(#H)'\s-2\u~\d\s+2\h'|\\n:u' .if v .ds ^ \\k:\h'-(\\n(.wu*10/11-\*(#H)'\v'-.4m'^\v'.4m'\h'|\\n:u' . \" for low resolution devices (crt and lpr) .if \n(.H>23 .if \n(.V>19 \ \{\ . ds : e . ds 8 ss . ds o a . ds d- d\h'-1'\(ga . ds D- D\h'-1'\(hy . ds th \o'bp' . ds Th \o'LP' . ds ae ae . 
ds Ae AE .\} .rm #[ #] #H #V #F C .\" ======================================================================== .\" .IX Title "SWISH-LIBRARY 1" .TH SWISH-LIBRARY 1 "2009-04-04" "2.4.7" "SWISH-E Documentation" .SH "NAME" SWISH\-LIBRARY \- Interface to the Swish\-e C library .SH "OVERVIEW" .IX Header "OVERVIEW" The C library is an interface to the Swish-e search code. It provides a way to embed Swish-e into your applications. This \s-1API\s0 is based on Swish-e version 2.3. .PP \&\fBNote:\fR This is a \s-1NEW\s0 \s-1API\s0 as of Swish-e version 2.3. The C language interface has changed as has the perl interface to Swish\-e. The new Perl interface is the \s-1SWISH::API\s0 module and is included with the Swish-e distribution. The old \s-1SWISHE\s0 perl module has been rewritten to work with the new \s-1API\s0. The \s-1SWISHE\s0 perl module is no longer included with the Swish-e distribution, but can be downloaded from the Swish-e web site. .PP The advantage of the library is that the index file or files can be opened one time and many queries made on the open index. This saves the startup time required to fork and run the swish-e binary, and the expensive time of opening up the index file. Some benchmarks have shown a threefold increase in speed. .PP The downside is that your program now has more code and data in it (the index tables can use quite a bit of memory), and if a fatal error happens in swish it will bring down your program. These are things to think about, especially if embedding swish into a web server such as Apache where there are many processes serving requests. .PP The best way to learn about the library is to look at two files included with the Swish-e distribution that make use of the library. .IP "src/libtest.c" 4 .IX Item "src/libtest.c" This file gives a basic overview of linking a C program with the Swish-e library. Not all available functions are used in that example, but it should give you a good overview of building a C program with swish\-e.
.Sp To build and run libtest chdir to the src directory and run the commands: .Sp .Vb 2 \& $ make libtest \& $ ./libtest [optional name of index file] .Ve .Sp You will be prompted for the search words. The default index used is \fIindex.swish\-e\fR. This can be overridden by placing a list of index files in a quote-protected string. .Sp .Vb 1 \& $ ./libtest 'index1 index2 index3' .Ve .IP "perl/API.xs" 4 .IX Item "perl/API.xs" The \fI\s-1API\s0.xs\fR file is a Perl \*(L"xsub\*(R" interface to the C library and is part of the \&\s-1SWISH::API\s0 Perl module. This is an object-oriented interface to the Swish-e library and demonstrates how the various search \*(L"objects\*(R" are created by C calls and how they are destroyed when no longer needed. .SH "Installing the Swish-e library" .IX Header "Installing the Swish-e library" The Swish-e library is installed when you run \*(L"make install\*(R" when building Swish\-e. No extra installation steps are required. .PP The library consists of a header file \*(L"swish\-e.h\*(R" and a library \&\*(L"libswish\-e.*\*(R" that can either be a static or shared library depending on your platform. .SH "Library Overview" .IX Header "Library Overview" When you first attach to an index file (or index files) you are returned a \*(L"swish handle\*(R". From the handle you create one or more \*(L"search objects\*(R" which hold the parameters to query the index, such as the query string, sort order, search phrase delimiter, limit parameters and \s-1HTML\s0 structure bits. The \*(L"object\*(R" is really just a pointer to a C structure, but it's helpful to think of it as an object that has data and functionality associated with it. .PP The search object is used to query the index. A query returns a \*(L"results object\*(R". The results object holds the number of hits, the parsed query per index, and the result set. The results object keeps track of the current position in the result set.
You may \*(L"seek\*(R" to a specific record within the result set (useful for displaying a page of results). .PP Finally, a result object represents a single result from the result list. A result object provides access to the result's properties (such as file name, rank, etc.). .PP In addition to results, there are functions available to access the header values stored in the index file, functions to check and report errors, and a few utility functions. .SH "Available Functions" .IX Header "Available Functions" Below is the list of available functions included in the Swish-e C language \s-1API\s0. .PP These functions (and typedefs) are defined in the \fIswish\-e.h\fR header file. The common objects (e.g. structures) used are: .PP .Vb 5 \& SW_HANDLE - swish handle that associates with an index file \& SW_SEARCH - search "object" that holds search parameters \& SW_RESULTS - results "object" that holds a result set \& SW_RESULT - a single result used for accessing the result's properties \& SW_FUZZYWORD - used for fuzzy (stemming) word conversion .Ve .Sh "Searching" .IX Subsection "Searching" .IP "\s-1SW_HANDLE\s0 SwishInit(char *IndexFiles);" 4 .IX Item "SW_HANDLE SwishInit(char *IndexFiles);" This function opens and reads the header info of the index files included in the IndexFiles string. The string should contain a space-separated list of index files. .Sp .Vb 2 \& SW_HANDLE myhandle; \& myhandle = SwishInit("file1.idx"); .Ve .Sp Typically you will open a handle at the beginning of your program and use it to make multiple queries on an index. .Sp This function will always return a swish handle. You must check for errors, and on error free the memory used by the handle, or abort.
.Sp Here's an example of aborting: .Sp .Vb 4 \& SW_HANDLE swish_handle; \& swish_handle = SwishInit("file1.idx file2.idx"); \& if ( SwishError( swish_handle ) ) \& SwishAbortLastError( swish_handle ); .Ve .Sp And here's an example of catching the error: .Sp .Vb 8 \& SW_HANDLE swish_handle; \& swish_handle = SwishInit("file1.idx file2.idx"); \& if ( SwishError( swish_handle ) ) \& { \& printf("Failed to connect to swish. %s\en", SwishErrorString( swish_handle ) ); \& SwishClose( swish_handle ); /* free the memory used */ \& return 0; \& } .Ve .Sp You may have more than one handle active at a time. .Sp Swish-e will not tell you if the index file changes on disk (such as after reindexing). In a persistent environment (e.g. mod_perl) the calling program should check to see if the index file has changed on disk. A common way to do this is to store the inode number before opening the index file(s), and then stat the file name every so often and reopen the index files if the inode number changes. .IP "void SwishClose(\s-1SW_HANDLE\s0 handle);" 4 .IX Item "void SwishClose(SW_HANDLE handle);" This function closes and frees the memory of a Swish handle. Every swish handle should be freed when done searching the index. Failing to close the handle will result in a memory leak. .IP "\s-1SW_SEARCH\s0 New_Search_Object(\s-1SW_HANDLE\s0 handle, const char *query);" 4 .IX Item "SW_SEARCH New_Search_Object(SW_HANDLE handle, const char *query);" Returns a new search \*(L"object\*(R". The search object holds the parameters used for searching an index. A single search object can be used to query the index multiple times. The available settings listed below are \*(L"sticky\*(R" in that they remain set on the search object until changed. .IP "int SwishGetStructure( \s-1SW_SEARCH\s0 srch );" 4 .IX Item "int SwishGetStructure( SW_SEARCH srch );" Returns the \*(L"structure\*(R" flag of the search object passed or 0 if the search object is \s-1NULL\s0.
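.Sp
The inode check suggested above for \fISwishInit()\fR could be sketched roughly as follows. This is only an illustration using plain \fIstat\fR\|(2). The names index_watch, index_watch_init and index_watch_changed are hypothetical helpers and are not part of the Swish-e \s-1API\s0. When the check reports a change, the caller would be expected to close the old handle and open a new one.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>

/* Hypothetical helper (not part of the Swish-e API): remembers the inode
 * of an index file so a later call can tell whether it was replaced. */
typedef struct {
    const char *path;   /* index file name, as passed to SwishInit() */
    ino_t       inode;  /* inode seen when the index was opened */
} index_watch;

/* Record the current inode. Returns 0 on success, -1 if stat() fails. */
int index_watch_init(index_watch *w, const char *path)
{
    struct stat st;

    assert(w != NULL && path != NULL);
    if (stat(path, &st) != 0)
        return -1;
    w->path = path;
    w->inode = st.st_ino;
    return 0;
}

/* Returns 1 if the file now has a different inode (it was replaced on
 * disk), 0 if it is unchanged, and -1 if it can no longer be examined. */
int index_watch_changed(const index_watch *w)
{
    struct stat st;

    assert(w != NULL);
    if (stat(w->path, &st) != 0)
        return -1;
    return st.st_ino != w->inode;
}
```

A long-running program might call index_watch_init() just before \fISwishInit()\fR and index_watch_changed() every so often between queries. Note that this only detects an index that was replaced (for example by renaming a new file into place), not one that was rewritten in place.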
.IP "void SwishPhraseDelimiter( \s-1SW_SEARCH\s0 srch, char delimiter );" 4 .IX Item "void SwishPhraseDelimiter( SW_SEARCH srch, char delimiter );" Sets the phrase delimiter character. The default is double\-quotes. .IP "char SwishGetPhraseDelimiter( \s-1SW_SEARCH\s0 srch );" 4 .IX Item "char SwishGetPhraseDelimiter( SW_SEARCH srch );" Returns the phrase delimiter character used in the search object or 0 if the search object is \s-1NULL\s0. .IP "void SwishSetStructure( \s-1SW_SEARCH\s0 srch, int structure );" 4 .IX Item "void SwishSetStructure( SW_SEARCH srch, int structure );" Sets the \*(L"structure\*(R" flag in the search object. The structure flag is used to limit searches to parts of \s-1HTML\s0 files (such as to the title or headers). The default is to not limit. This provides the functionality of the \-t command line switch. .IP "void SwishSetSort( \s-1SW_SEARCH\s0 srch, char *sort );" 4 .IX Item "void SwishSetSort( SW_SEARCH srch, char *sort );" Sets the sort order of the results. This is the same as the \-s switch used with the swish-e binary. .IP "void SwishSetQuery( \s-1SW_SEARCH\s0 srch, char *query );" 4 .IX Item "void SwishSetQuery( SW_SEARCH srch, char *query );" Sets the query string in the search object. This typically is not needed since it can be set when creating the search object or when executing a query. .IP "void SwishSetSearchLimit( \s-1SW_SEARCH\s0 srch, char *propertyname, char *low, char *hi);" 4 .IX Item "void SwishSetSearchLimit( SW_SEARCH srch, char *propertyname, char *low, char *hi);" Sets the limit parameters for a search. Provides the same functionality as the \-L command line switch. You may specify a range of property values that search results must be within.
You may call \fISwishSetSearchLimit()\fR only one time for each property (but can set limits on more than one property at a time).
.Sp
Unlike the other settings on the search object, once you run a query on the search object you must call \fISwishResetSearchLimit()\fR to change or clear the limit parameters.
.IP "void SwishResetSearchLimit( \s-1SW_SEARCH\s0 srch );" 4
.IX Item "void SwishResetSearchLimit( SW_SEARCH srch );"
Resets the limits set on a search object by \fISwishSetSearchLimit()\fR.
.IP "void Free_Search_Object( \s-1SW_SEARCH\s0 srch );" 4
.IX Item "void Free_Search_Object( SW_SEARCH srch );"
Frees the search object. This must be called when done with the search object. Generally, you can reuse a search object for multiple queries so typically you would call this right before calling \fISwishClose()\fR.
.Sp
You may free the search object before freeing any generated results objects.
.IP "\s-1SW_RESULTS\s0 SwishExecute( \s-1SW_SEARCH\s0 search, const char *query);" 4
.IX Item "SW_RESULTS SwishExecute( SW_SEARCH search, const char *query);"
Searches the index or indexes based on the parameters in the search object. Returns a results object. See below for functions to access the data stored in the results object.
.Sp
You should always check for errors after calling \fISwishExecute()\fR.
.IP "\s-1SW_RESULTS\s0 SwishQuery(\s-1SW_HANDLE\s0, const char *words );" 4
.IX Item "SW_RESULTS SwishQuery(SW_HANDLE, const char *words );"
This is a short-cut function that bypasses the creation of a search object (actually, bypasses the need to create and free a search object). This only allows passing in a query string; other search parameters cannot be set. The results are sorted by rank.
.Sp
You should always check for errors after calling \fISwishQuery()\fR.
.Sh "Reading Results"
.IX Subsection "Reading Results"
.IP "int SwishHits( \s-1SW_RESULTS\s0 results );" 4
.IX Item "int SwishHits( SW_RESULTS results );"
Returns the number of results in the results object.
.IP "\s-1SWISH_HEADER_VALUE\s0 SwishParsedWords( \s-1SW_RESULTS\s0, const char *index_name );" 4
.IX Item "SWISH_HEADER_VALUE SwishParsedWords( SW_RESULTS, const char *index_name );"
Returns the tokenized query. Words are split by WordCharacters and stopwords are removed. The parsed words are useful for highlighting search terms in your program.
.Sp
The \*(L"index_name\*(R" is the name of the index supplied in the \fISwishInit()\fR function call.
.Sp
Returns a \s-1SWISH_HEADER_VALUE\s0 union of type \s-1SWISH_LIST\s0 which is a char **. See src/libtest.c for an example of accessing the strings in this list, but in general you may cast this to a (char **).
.IP "\s-1SWISH_HEADER_VALUE\s0 SwishRemovedStopwords( \s-1SW_RESULTS\s0, const char *index_name );" 4
.IX Item "SWISH_HEADER_VALUE SwishRemovedStopwords( SW_RESULTS, const char *index_name );"
Returns a list of stopwords removed from the input query.
.Sp
Returns a \s-1SWISH_HEADER_VALUE\s0 union of type \s-1SWISH_LIST\s0 which is a char **. See src/libtest.c for an example of accessing the strings in this list, but in general you may cast this to a (char **).
.IP "int SwishSeekResult( \s-1SW_RESULTS\s0, int position );" 4
.IX Item "int SwishSeekResult( SW_RESULTS, int position );"
Sets the current seek position in the list of results, with position zero being the first record (unlike \-b where one is the first result).
.Sp
Returns the position or a negative number on error.
.IP "\s-1SW_RESULT\s0 SwishNextResult( \s-1SW_RESULTS\s0 );" 4
.IX Item "SW_RESULT SwishNextResult( SW_RESULTS );"
Returns the next result, or \s-1NULL\s0 if no more results are available.
.Sp
The result object returned does not need to be freed after use (unlike the swish handle, search object, and results object).
.IP "const char *SwishResultPropertyStr(\s-1SW_RESULT\s0, char *propertyname);" 4
.IX Item "const char *SwishResultPropertyStr(SW_RESULT, char *propertyname);"
This function is mostly useful for testing as it returns odd results on errors.
.Sp
Aborts if called with a \s-1NULL\s0 \s-1SW_RESULT\s0 object.
.Sp
Returns a string value of the specified property.
.Sp
Returns the empty string "" if the current result does not have the specified property assigned.
.Sp
Returns the string \*(L"(null)\*(R" on invalid property name (i.e. property name is not defined in the index) and sets an error (see below) indicating the invalid property name.
.Sp
The string returned does not need to be freed, but is only valid for the current result. If you wish to save the string you must copy it locally.
.Sp
Dates are formatted using the hard-coded format string: \*(L"%Y\-%m\-%d \f(CW%H:\fR%M:%S\*(R" in localtime.
.IP "unsigned long SwishResultPropertyULong(\s-1SW_RESULT\s0 r, char *propertyname);" 4
.IX Item "unsigned long SwishResultPropertyULong(SW_RESULT r, char *propertyname);"
Returns a numeric property as an unsigned long. Numeric properties are used for both PropertyNamesNumeric and PropertyNamesDate types of properties. Dates are returned as a unix timestamp as reported by the system when the index was created.
.Sp
Swish-e will abort if called with a \s-1NULL\s0 \s-1SW_RESULT\s0 object. Without the \s-1SW_RESULT\s0 object swish-e cannot set any error codes.
.Sp
On error returns \s-1ULONG_MAX\s0. This is commonly defined in limits.h. Check \fISwishError()\fR (see below) for the type of error.
.Sp
If \fISwishError()\fR returns false (zero) then it simply means that this result does not have any data for the specified property.
.Sp
If \fISwishError()\fR returns true (non\-zero) then either the propertyname specified is invalid, or the property requested is not a numeric (or date) property (e.g. it's a string property).
.Sp
See below on how to fetch the specific error message when \fISwishError()\fR is true.
.IP "PropValue *getResultPropValue (\s-1SW_RESULT\s0 r, char *propertyname, int \s-1ID\s0 );" 4
.IX Item "PropValue *getResultPropValue (SW_RESULT r, char *propertyname, int ID );"
This is a low-level function to fetch a property regardless of type. This is likely the best function for accessing properties.
.Sp
Swish-e will abort if called with a \s-1NULL\s0 \s-1SW_RESULT\s0 object. Propertyname is the name of the property. \s-1ID\s0 is the id number of the property, if known. \s-1ID\s0 is not normally used in the \s-1API\s0, but its purpose is to avoid looking up the property \s-1ID\s0 for every result displayed.
.Sp
The returned PropValue is a structure that contains a flag to indicate the type, and a union that holds the property value. The flags and structure are defined in swish\-e.h.
.Sp
The property must be copied locally and the returned \*(L"PropValue\*(R" value must be freed by calling \fIfreeResultPropValue()\fR to avoid a memory leak.
.Sp
On error returns \s-1NULL\s0. Check \fISwishError()\fR (see below) for the type of error.
.Sp
If it returns \s-1NULL\s0 but \fISwishError()\fR returns false (zero) then it simply means that this result does not have any data for the specified property.
.Sp
If \fISwishError()\fR returns true (non\-zero) then the property name specified is invalid (i.e. not defined for the index).
.Sp
See below on how to fetch the specific error message when \fISwishError()\fR is true.
.Sp
See perl/API.xs for an example on using this function.
.IP "void freeResultPropValue(void)" 4
.IX Item "void freeResultPropValue(void)"
Frees the \*(L"PropValue\*(R" returned after calling \fIgetResultPropValue()\fR.
.IP "void Free_Results_Object( \s-1SW_RESULTS\s0 results );" 4
.IX Item "void Free_Results_Object( SW_RESULTS results );"
Frees the results object (frees the result set).
This must be called when done reading the results and before calling \fISwishClose()\fR. .Sh "Accessing the Index Header Values" .IX Subsection "Accessing the Index Header Values" Each index file has associated header values that describe the index. These functions provide access to this data. The header data is returned as a union \s-1SWISH_HEADER_VALUE\s0, and a pointer to a \s-1SWISH_HEADER_TYPE\s0 is passed in and the returned value indicates the type of data that is returned. See src/libtest.c and perl/API.xs for examples. .IP "const char **SwishHeaderNames( \s-1SW_HANDLE\s0 );" 4 .IX Item "const char **SwishHeaderNames( SW_HANDLE );" Returns the list of possible header names. This list is the same for all index files of a given version of Swish\-e. It provides a way to gain access to all headers without having to list them in your program. .IP "const char **SwishIndexNames( \s-1SW_HANDLE\s0 );" 4 .IX Item "const char **SwishIndexNames( SW_HANDLE );" Returns a list of index files opened. This is just the list of index files specified in the \fISwishInit()\fR call. You need the name of the index file to access a specific index's header values. .IP "\s-1SWISH_HEADER_VALUE\s0 SwishHeaderValue( \s-1SW_HANDLE\s0, const char *index_name, const char *cur_header, \s-1SWISH_HEADER_TYPE\s0 *type );" 4 .IX Item "SWISH_HEADER_VALUE SwishHeaderValue( SW_HANDLE, const char *index_name, const char *cur_header, SWISH_HEADER_TYPE *type );" Fetches the header value for the given index file, and the header name. The call sets the \*(L"type\*(R" passed in to the type of value returned. .Sp See src/libtest.c and perl/API.xs for examples. 
.IP "\s-1SWISH_HEADER_VALUE\s0 SwishResultIndexValue( \s-1SW_RESULT\s0, const char *name, \s-1SWISH_HEADER_TYPE\s0 *type );" 4
.IX Item "SWISH_HEADER_VALUE SwishResultIndexValue( SW_RESULT, const char *name, SWISH_HEADER_TYPE *type );"
This is like \fISwishHeaderValue()\fR above, but instead of supplying an index file name and a swish handle, supply a result object and the header value is fetched from the result's related index file.
.Sh "Accessing Property Meta Data"
.IX Subsection "Accessing Property Meta Data"
In addition to the pre-defined standard properties, you have the option of adding additional \*(L"meta\*(R" properties to be indexed and/or added to the list of properties returned with each result. Consult the sections on the MetaNames and PropertyNames directives in the \s-1CONFIGURATION\s0 \s-1FILE\s0 for an explanation of how to do this.
.PP
These functions provide access to the meta data stored in an index. You can use them to determine what meta/property information is available for an index including all the pre-defined standard properties. See libtest.c for an example.
.IP "\s-1SWISH_META_LIST\s0 SwishMetaList( \s-1SW_HANDLE\s0, const char *index_name );" 4
.IX Item "SWISH_META_LIST SwishMetaList( SW_HANDLE, const char *index_name );"
Returns the list of meta entries for the given index file as a null-terminated array of \s-1SW_META\s0 objects. Use the functions below to extract specific fields from the \s-1SW_META\s0 structure. Metas are distinct from properties.
.IP "\s-1SWISH_META_LIST\s0 SwishPropertyList( \s-1SW_HANDLE\s0, const char *index_name );" 4
.IX Item "SWISH_META_LIST SwishPropertyList( SW_HANDLE, const char *index_name );"
This function is the same as \fISwishMetaList()\fR but it returns an array of properties as opposed to meta objects. Property attributes can be extracted in the same way as meta objects using the functions below.
.IP "\s-1SWISH_META_LIST\s0 SwishResultMetaList( \s-1SW_RESULT\s0 );" 4
.IX Item "SWISH_META_LIST SwishResultMetaList( SW_RESULT );"
This is like \fISwishMetaList()\fR above but determines the index to use from a result object.
.IP "\s-1SWISH_META_LIST\s0 SwishResultPropertyList( \s-1SW_RESULT\s0 );" 4
.IX Item "SWISH_META_LIST SwishResultPropertyList( SW_RESULT );"
This is like \fISwishPropertyList()\fR above but like \fISwishResultMetaList()\fR uses a result object instead of an index name.
.IP "const char *SwishMetaName( \s-1SW_META\s0 );" 4
.IX Item "const char *SwishMetaName( SW_META );"
Given a \s-1SW_META\s0 object returned by one of the above, this function will return the meta/property's name. You can use this name to access a property's value for a given result as described above.
.IP "int SwishMetaType( \s-1SW_META\s0 );" 4
.IX Item "int SwishMetaType( SW_META );"
Get the data type for the given meta/property. Known types are listed in swish\-e.h.
.IP "SwishMetaID( \s-1SW_META\s0 );" 4
.IX Item "SwishMetaID( SW_META );"
Get the internal \s-1ID\s0 number for the given meta/property. These ids are unique per index file but are not unique per results.
.Sh "Checking for Errors"
.IX Subsection "Checking for Errors"
You should check for errors after all calls. The last error is stored in the swish handle object, and is only valid until the next operation (which resets the error flags).
.PP
Currently, some errors are flagged as \*(L"critical\*(R" errors. In these cases you should destroy the current swish handle (by calling the \fISwishClose()\fR function). If you have other objects in scope (e.g. a search object or results object) destroy those first.
.PP
The types of errors that are critical can be seen in src/error.c.
Currently the list includes:
.PP
.Vb 6
\&    Could not open index file
\&    Unknown index file format
\&    Index file(s) is empty
\&    Index file error
\&    Invalid swish handle
\&    Invalid results object
.Ve
.IP "int SwishError( \s-1SW_HANDLE\s0 );" 4
.IX Item "int SwishError( SW_HANDLE );"
This returns true if an error condition exists. It returns the error number, which is an integer less than zero on error. This should be checked before calling any of the other error functions below.
.IP "const char *SwishErrorString( \s-1SW_HANDLE\s0 );" 4
.IX Item "const char *SwishErrorString( SW_HANDLE );"
This returns a general text description of the current error.
.IP "const char *SwishLastErrorMsg( \s-1SW_HANDLE\s0 );" 4
.IX Item "const char *SwishLastErrorMsg( SW_HANDLE );"
In some cases this will return a string with specifics about the current error. For example, \fISwishErrorString()\fR may return \*(L"Unknown metaname\*(R", but \fISwishLastErrorMsg()\fR will return a string with the name of the unknown metaname.
.IP "int SwishCriticalError( \s-1SW_HANDLE\s0 );" 4
.IX Item "int SwishCriticalError( SW_HANDLE );"
Returns true if the current error condition is a critical error. On critical errors you should free up any current objects and call \fISwishClose()\fR as swish may be in an unstable state.
.IP "void SwishAbortLastError( \s-1SW_HANDLE\s0 );" 4
.IX Item "void SwishAbortLastError( SW_HANDLE );"
This is a convenience function that will format and print the last error message, and then abort the program.
.IP "void set_error_handle( \s-1FILE\s0 *where );" 4
.IX Item "void set_error_handle( FILE *where );"
Sets where errors and warnings are printed (when printed by swish). For historical reasons, when swish-e first starts up errors and warnings are sent to stdout.
.IP "void SwishErrorsToStderr( void );" 4
.IX Item "void SwishErrorsToStderr( void );"
A convenience method to send errors to stderr instead of stdout.
.Sh "Utility Functions"
.IX Subsection "Utility Functions"
.IP "const char *SwishWordsByLetter(\s-1SWISH\s0 * sw, char *indexname, char c);" 4
.IX Item "const char *SwishWordsByLetter(SWISH * sw, char *indexname, char c);"
Returns all the words in the index \*(L"indexname\*(R" that begin with the letter passed in. Returns \s-1NULL\s0 if the name of the index file is invalid.
.Sp
This function may change in the future since only 8\-bit chars can currently be used.
.IP "char * SwishStemWord( \s-1SW_HANDLE\s0 sw, char *in_word );" 4
.IX Item "char * SwishStemWord( SW_HANDLE sw, char *in_word );"
Deprecated
.Sp
This can be used to convert a word to its stem. It uses only the original Porter Stemmer.
.IP "\s-1SW_FUZZYWORD\s0 SwishFuzzyWord( \s-1SW_RESULT\s0 r, char *word );" 4
.IX Item "SW_FUZZYWORD SwishFuzzyWord( SW_RESULT r, char *word );"
Stems \*(L"word\*(R" based on the fuzzy mode selected during indexing.
.Sp
The fuzzy mode used during indexing is stored in the index file. Since each result is linked to a given index file this method allows stemming a word based on its index file.
.Sp
One possible use for this is to highlight search terms in a document summary, which would be based on a given result.
.Sp
The methods below can be used to access the data returned. The \&\s-1SW_FUZZYWORD\s0 object must be freed when done to avoid a memory leak.
.IP "const char **SwishFuzzyWordList( \s-1SW_FUZZYWORD\s0 fw );" 4
.IX Item "const char **SwishFuzzyWordList( SW_FUZZYWORD fw );"
Returns a null terminated list of strings returned by the stemmer. In most cases this will be a single string.
.Sp
Here's an example:
.Sp
.Vb 8
\&    SW_FUZZYWORD fuzzy_word = SwishFuzzyWord( result, word );
\&    const char **word_list = SwishFuzzyWordList( fuzzy_word );
\&    while ( *word_list )
\&    {
\&        printf("%s\en", *word_list );
\&        word_list++;
\&    }
\&    SwishFuzzyWordFree( fuzzy_word );
.Ve
.Sp
If the stemmer does not convert the string (for example attempting to stem numeric data) the word_list will contain the original word. To tell if the stemmer actually stemmed the word check the return value with \&\fISwishFuzzyWordError()\fR.
.IP "int SwishFuzzyWordError( \s-1SW_FUZZYWORD\s0 fw );" 4
.IX Item "int SwishFuzzyWordError( SW_FUZZYWORD fw );"
This returns zero if the stemming operation was successful, otherwise it returns a value indicating the reason the word was not stemmed. The return values are defined in the swish-e src/stemmer.h file.
.Sp
Not all stemmers set this value correctly. But since \fISwishFuzzyWordList()\fR will return a valid string regardless of the return value, you can often just ignore this setting. That's what I do.
.IP "int SwishFuzzyWordCount( \s-1SW_FUZZYWORD\s0 fw );" 4
.IX Item "int SwishFuzzyWordCount( SW_FUZZYWORD fw );"
Returns the count of strings in the word list available by calling \&\fISwishFuzzyWordList()\fR.
.Sp
This is normally just one, but in the case of DoubleMetaphone it can be one or two (i.e. DoubleMetaphone can return one or two strings).
.IP "const char *SwishFuzzyMode( \s-1SW_RESULT\s0 r );" 4
.IX Item "const char *SwishFuzzyMode( SW_RESULT r );"
Returns the name of the stemmer used for the given result (which is related to an index).
.IP "void SwishFuzzyWordFree( \s-1SW_FUZZYWORD\s0 fw );" 4
.IX Item "void SwishFuzzyWordFree( SW_FUZZYWORD fw );"
Frees the memory used by the \s-1SW_FUZZYWORD\s0.
.SH "Bug-Reports"
.IX Header "Bug-Reports"
Please send bug reports to the Swish-e discussion group. Also feel free to improve or enhance this feature.
.SH "Author" .IX Header "Author" Original interface: Aug 2000 Jose Ruiz jmruiz@boe.es .PP Updated: Aug 22, 2002 \- Bill Moseley .PP Interface redesigned for Swish-e version 2.3 Oct 17, 2002 \- Bill Moseley .SH "Document Info" .IX Header "Document Info" $Id: \s-1SWISH\-LIBRARY\s0.pod 1906 2007\-02\-07 19:25:16Z moseley $ .PP \&. swish-e-2.4.7/configure0000775000077100017500000325620611166010113011743 00000000000000#! /bin/sh # Guess values for system-dependent variables and create Makefiles. # Generated by GNU Autoconf 2.59. # # Copyright (C) 2003 Free Software Foundation, Inc. # This configure script is free software; the Free Software Foundation # gives unlimited permission to copy, distribute and modify it. ## --------------------- ## ## M4sh Initialization. ## ## --------------------- ## # Be Bourne compatible if test -n "${ZSH_VERSION+set}" && (emulate sh) >/dev/null 2>&1; then emulate sh NULLCMD=: # Zsh 3.x and 4.x performs word splitting on ${1+"$@"}, which # is contrary to our usage. Disable this feature. alias -g '${1+"$@"}'='"$@"' elif test -n "${BASH_VERSION+set}" && (set -o posix) >/dev/null 2>&1; then set -o posix fi DUALCASE=1; export DUALCASE # for MKS sh # Support unset when possible. if ( (MAIL=60; unset MAIL) || exit) >/dev/null 2>&1; then as_unset=unset else as_unset=false fi # Work around bugs in pre-3.0 UWIN ksh. $as_unset ENV MAIL MAILPATH PS1='$ ' PS2='> ' PS4='+ ' # NLS nuisances. for as_var in \ LANG LANGUAGE LC_ADDRESS LC_ALL LC_COLLATE LC_CTYPE LC_IDENTIFICATION \ LC_MEASUREMENT LC_MESSAGES LC_MONETARY LC_NAME LC_NUMERIC LC_PAPER \ LC_TELEPHONE LC_TIME do if (set +x; test -z "`(eval $as_var=C; export $as_var) 2>&1`"); then eval $as_var=C; export $as_var else $as_unset $as_var fi done # Required to use basename. if expr a : '\(a\)' >/dev/null 2>&1; then as_expr=expr else as_expr=false fi if (basename /) >/dev/null 2>&1 && test "X`basename / 2>&1`" = "X/"; then as_basename=basename else as_basename=false fi # Name of the executable. 
as_me=`$as_basename "$0" || $as_expr X/"$0" : '.*/\([^/][^/]*\)/*$' \| \ X"$0" : 'X\(//\)$' \| \ X"$0" : 'X\(/\)$' \| \ . : '\(.\)' 2>/dev/null || echo X/"$0" | sed '/^.*\/\([^/][^/]*\)\/*$/{ s//\1/; q; } /^X\/\(\/\/\)$/{ s//\1/; q; } /^X\/\(\/\).*/{ s//\1/; q; } s/.*/./; q'` # PATH needs CR, and LINENO needs CR and PATH. # Avoid depending upon Character Ranges. as_cr_letters='abcdefghijklmnopqrstuvwxyz' as_cr_LETTERS='ABCDEFGHIJKLMNOPQRSTUVWXYZ' as_cr_Letters=$as_cr_letters$as_cr_LETTERS as_cr_digits='0123456789' as_cr_alnum=$as_cr_Letters$as_cr_digits # The user is always right. if test "${PATH_SEPARATOR+set}" != set; then echo "#! /bin/sh" >conf$$.sh echo "exit 0" >>conf$$.sh chmod +x conf$$.sh if (PATH="/nonexistent;."; conf$$.sh) >/dev/null 2>&1; then PATH_SEPARATOR=';' else PATH_SEPARATOR=: fi rm -f conf$$.sh fi as_lineno_1=$LINENO as_lineno_2=$LINENO as_lineno_3=`(expr $as_lineno_1 + 1) 2>/dev/null` test "x$as_lineno_1" != "x$as_lineno_2" && test "x$as_lineno_3" = "x$as_lineno_2" || { # Find who we are. Look in the path if we contain no path at all # relative or not. case $0 in *[\\/]* ) as_myself=$0 ;; *) as_save_IFS=$IFS; IFS=$PATH_SEPARATOR for as_dir in $PATH do IFS=$as_save_IFS test -z "$as_dir" && as_dir=. test -r "$as_dir/$0" && as_myself=$as_dir/$0 && break done ;; esac # We did not find ourselves, most probably we were run as `sh COMMAND' # in which case we are not to be found in the path. if test "x$as_myself" = x; then as_myself=$0 fi if test ! -f "$as_myself"; then { echo "$as_me: error: cannot find myself; rerun with an absolute path" >&2 { (exit 1); exit 1; }; } fi case $CONFIG_SHELL in '') as_save_IFS=$IFS; IFS=$PATH_SEPARATOR for as_dir in /bin$PATH_SEPARATOR/usr/bin$PATH_SEPARATOR$PATH do IFS=$as_save_IFS test -z "$as_dir" && as_dir=. 
for as_base in sh bash ksh sh5; do case $as_dir in /*) if ("$as_dir/$as_base" -c ' as_lineno_1=$LINENO as_lineno_2=$LINENO as_lineno_3=`(expr $as_lineno_1 + 1) 2>/dev/null` test "x$as_lineno_1" != "x$as_lineno_2" && test "x$as_lineno_3" = "x$as_lineno_2" ') 2>/dev/null; then $as_unset BASH_ENV || test "${BASH_ENV+set}" != set || { BASH_ENV=; export BASH_ENV; } $as_unset ENV || test "${ENV+set}" != set || { ENV=; export ENV; } CONFIG_SHELL=$as_dir/$as_base export CONFIG_SHELL exec "$CONFIG_SHELL" "$0" ${1+"$@"} fi;; esac done done ;; esac # Create $as_me.lineno as a copy of $as_myself, but with $LINENO # uniformly replaced by the line number. The first 'sed' inserts a # line-number line before each line; the second 'sed' does the real # work. The second script uses 'N' to pair each line-number line # with the numbered line, and appends trailing '-' during # substitution so that $LINENO is not a special case at line end. # (Raja R Harinath suggested sed '=', and Paul Eggert wrote the # second 'sed' script. Blame Lee E. McMahon for sed's syntax. :-) sed '=' <$as_myself | sed ' N s,$,-, : loop s,^\(['$as_cr_digits']*\)\(.*\)[$]LINENO\([^'$as_cr_alnum'_]\),\1\2\1\3, t loop s,-$,, s,^['$as_cr_digits']*\n,, ' >$as_me.lineno && chmod +x $as_me.lineno || { echo "$as_me: error: cannot create $as_me.lineno; rerun with a POSIX shell" >&2 { (exit 1); exit 1; }; } # Don't try to exec as it changes $[0], causing all sort of problems # (the dirname of $[0] is not the place where we might find the # original and so on. Autoconf is especially sensible to this). . ./$as_me.lineno # Exit status is that of the last command. 
exit } case `echo "testing\c"; echo 1,2,3`,`echo -n testing; echo 1,2,3` in *c*,-n*) ECHO_N= ECHO_C=' ' ECHO_T=' ' ;; *c*,* ) ECHO_N=-n ECHO_C= ECHO_T= ;; *) ECHO_N= ECHO_C='\c' ECHO_T= ;; esac if expr a : '\(a\)' >/dev/null 2>&1; then as_expr=expr else as_expr=false fi rm -f conf$$ conf$$.exe conf$$.file echo >conf$$.file if ln -s conf$$.file conf$$ 2>/dev/null; then # We could just check for DJGPP; but this test a) works b) is more generic # and c) will remain valid once DJGPP supports symlinks (DJGPP 2.04). if test -f conf$$.exe; then # Don't use ln at all; we don't have any links as_ln_s='cp -p' else as_ln_s='ln -s' fi elif ln conf$$.file conf$$ 2>/dev/null; then as_ln_s=ln else as_ln_s='cp -p' fi rm -f conf$$ conf$$.exe conf$$.file if mkdir -p . 2>/dev/null; then as_mkdir_p=: else test -d ./-p && rmdir ./-p as_mkdir_p=false fi as_executable_p="test -f" # Sed expression to map a string onto a valid CPP name. as_tr_cpp="eval sed 'y%*$as_cr_letters%P$as_cr_LETTERS%;s%[^_$as_cr_alnum]%_%g'" # Sed expression to map a string onto a valid variable name. as_tr_sh="eval sed 'y%*+%pp%;s%[^_$as_cr_alnum]%_%g'" # IFS # We need space, tab and new line, in precisely that order. as_nl=' ' IFS=" $as_nl" # CDPATH. $as_unset CDPATH # Check that we are running under the correct shell. SHELL=${CONFIG_SHELL-/bin/sh} case X$ECHO in X*--fallback-echo) # Remove one level of quotation (which was required for Make). ECHO=`echo "$ECHO" | sed 's,\\\\\$\\$0,'$0','` ;; esac echo=${ECHO-echo} if test "X$1" = X--no-reexec; then # Discard the --no-reexec flag, and continue. shift elif test "X$1" = X--fallback-echo; then # Avoid inline document here, it may be left over : elif test "X`($echo '\t') 2>/dev/null`" = 'X\t' ; then # Yippee, $echo works! : else # Restart under the correct shell. 
exec $SHELL "$0" --no-reexec ${1+"$@"} fi if test "X$1" = X--fallback-echo; then # used as fallback echo shift cat </dev/null 2>&1 && unset CDPATH if test -z "$ECHO"; then if test "X${echo_test_string+set}" != Xset; then # find a string as large as possible, as long as the shell can cope with it for cmd in 'sed 50q "$0"' 'sed 20q "$0"' 'sed 10q "$0"' 'sed 2q "$0"' 'echo test'; do # expected sizes: less than 2Kb, 1Kb, 512 bytes, 16 bytes, ... if (echo_test_string=`eval $cmd`) 2>/dev/null && echo_test_string=`eval $cmd` && (test "X$echo_test_string" = "X$echo_test_string") 2>/dev/null then break fi done fi if test "X`($echo '\t') 2>/dev/null`" = 'X\t' && echo_testing_string=`($echo "$echo_test_string") 2>/dev/null` && test "X$echo_testing_string" = "X$echo_test_string"; then : else # The Solaris, AIX, and Digital Unix default echo programs unquote # backslashes. This makes it impossible to quote backslashes using # echo "$something" | sed 's/\\/\\\\/g' # # So, first we look for a working echo in the user's PATH. lt_save_ifs="$IFS"; IFS=$PATH_SEPARATOR for dir in $PATH /usr/ucb; do IFS="$lt_save_ifs" if (test -f $dir/echo || test -f $dir/echo$ac_exeext) && test "X`($dir/echo '\t') 2>/dev/null`" = 'X\t' && echo_testing_string=`($dir/echo "$echo_test_string") 2>/dev/null` && test "X$echo_testing_string" = "X$echo_test_string"; then echo="$dir/echo" break fi done IFS="$lt_save_ifs" if test "X$echo" = Xecho; then # We didn't find a better echo, so look for alternatives. if test "X`(print -r '\t') 2>/dev/null`" = 'X\t' && echo_testing_string=`(print -r "$echo_test_string") 2>/dev/null` && test "X$echo_testing_string" = "X$echo_test_string"; then # This shell has a builtin print -r that does the trick. echo='print -r' elif (test -f /bin/ksh || test -f /bin/ksh$ac_exeext) && test "X$CONFIG_SHELL" != X/bin/ksh; then # If we have ksh, try running configure again with it. 
ORIGINAL_CONFIG_SHELL=${CONFIG_SHELL-/bin/sh} export ORIGINAL_CONFIG_SHELL CONFIG_SHELL=/bin/ksh export CONFIG_SHELL exec $CONFIG_SHELL "$0" --no-reexec ${1+"$@"} else # Try using printf. echo='printf %s\n' if test "X`($echo '\t') 2>/dev/null`" = 'X\t' && echo_testing_string=`($echo "$echo_test_string") 2>/dev/null` && test "X$echo_testing_string" = "X$echo_test_string"; then # Cool, printf works : elif echo_testing_string=`($ORIGINAL_CONFIG_SHELL "$0" --fallback-echo '\t') 2>/dev/null` && test "X$echo_testing_string" = 'X\t' && echo_testing_string=`($ORIGINAL_CONFIG_SHELL "$0" --fallback-echo "$echo_test_string") 2>/dev/null` && test "X$echo_testing_string" = "X$echo_test_string"; then CONFIG_SHELL=$ORIGINAL_CONFIG_SHELL export CONFIG_SHELL SHELL="$CONFIG_SHELL" export SHELL echo="$CONFIG_SHELL $0 --fallback-echo" elif echo_testing_string=`($CONFIG_SHELL "$0" --fallback-echo '\t') 2>/dev/null` && test "X$echo_testing_string" = 'X\t' && echo_testing_string=`($CONFIG_SHELL "$0" --fallback-echo "$echo_test_string") 2>/dev/null` && test "X$echo_testing_string" = "X$echo_test_string"; then echo="$CONFIG_SHELL $0 --fallback-echo" else # maybe with a smaller string... prev=: for cmd in 'echo test' 'sed 2q "$0"' 'sed 10q "$0"' 'sed 20q "$0"' 'sed 50q "$0"'; do if (test "X$echo_test_string" = "X`eval $cmd`") 2>/dev/null then break fi prev="$cmd" done if test "$prev" != 'sed 50q "$0"'; then echo_test_string=`eval $prev` export echo_test_string exec ${ORIGINAL_CONFIG_SHELL-${CONFIG_SHELL-/bin/sh}} "$0" ${1+"$@"} else # Oops. We lost completely, so just stick with echo. echo=echo fi fi fi fi fi fi # Copy echo and quote the copy suitably for passing to libtool from # the Makefile, instead of quoting the original, which is used later. ECHO=$echo if test "X$ECHO" = "X$CONFIG_SHELL $0 --fallback-echo"; then ECHO="$CONFIG_SHELL \\\$\$0 --fallback-echo" fi tagnames=${tagnames+${tagnames},}CXX tagnames=${tagnames+${tagnames},}F77 # Name of the host. 
# hostname on some systems (SVR3.2, Linux) returns a bogus exit status,
# so uname gets run too.
ac_hostname=`(hostname || uname -n) 2>/dev/null | sed 1q`

exec 6>&1

#
# Initializations.
#
ac_default_prefix=/usr/local
ac_config_libobj_dir=.
cross_compiling=no
subdirs=
MFLAGS=
MAKEFLAGS=
SHELL=${CONFIG_SHELL-/bin/sh}

# Maximum number of lines to put in a shell here document.
# This variable seems obsolete. It should probably be removed, and
# only ac_max_sed_lines should be used.
: ${ac_max_here_lines=38}

# Identity of this package.
PACKAGE_NAME=
PACKAGE_TARNAME=
PACKAGE_VERSION=
PACKAGE_STRING=
PACKAGE_BUGREPORT=

ac_unique_file="src/swish.c"
# Factoring default headers for most tests.
ac_includes_default="\
#include <stdio.h>
#if HAVE_SYS_TYPES_H
# include <sys/types.h>
#endif
#if HAVE_SYS_STAT_H
# include <sys/stat.h>
#endif
#if STDC_HEADERS
# include <stdlib.h>
# include <stddef.h>
#else
# if HAVE_STDLIB_H
#  include <stdlib.h>
# endif
#endif
#if HAVE_STRING_H
# if !STDC_HEADERS && HAVE_MEMORY_H
#  include <memory.h>
# endif
# include <string.h>
#endif
#if HAVE_STRINGS_H
# include <strings.h>
#endif
#if HAVE_INTTYPES_H
# include <inttypes.h>
#else
# if HAVE_STDINT_H
#  include <stdint.h>
# endif
#endif
#if HAVE_UNISTD_H
# include <unistd.h>
#endif"

ac_subst_vars='SHELL PATH_SEPARATOR PACKAGE_NAME PACKAGE_TARNAME PACKAGE_VERSION PACKAGE_STRING PACKAGE_BUGREPORT exec_prefix prefix program_transform_name bindir sbindir libexecdir datadir sysconfdir sharedstatedir localstatedir libdir includedir oldincludedir infodir mandir build_alias host_alias target_alias DEFS ECHO_C ECHO_N ECHO_T LIBS BUILDDOCS_TRUE BUILDDOCS_FALSE INSTALLDOCS_TRUE INSTALLDOCS_FALSE SWISH_WEB INSTALL_PROGRAM INSTALL_SCRIPT INSTALL_DATA CYGPATH_W PACKAGE VERSION ACLOCAL AUTOCONF AUTOMAKE AUTOHEADER MAKEINFO install_sh STRIP ac_ct_STRIP INSTALL_STRIP_PROGRAM mkdir_p AWK SET_MAKE am__leading_dot AMTAR am__tar am__untar CC CFLAGS LDFLAGS CPPFLAGS ac_ct_CC EXEEXT OBJEXT DEPDIR am__include am__quote AMDEP_TRUE AMDEP_FALSE AMDEPBACKSLASH CCDEPMODE am__fastdepCC_TRUE am__fastdepCC_FALSE build build_cpu build_vendor build_os host host_cpu
host_vendor host_os EGREP LN_S ECHO AR ac_ct_AR RANLIB ac_ct_RANLIB DLLTOOL ac_ct_DLLTOOL AS ac_ct_AS OBJDUMP ac_ct_OBJDUMP CPP CXX CXXFLAGS ac_ct_CXX CXXDEPMODE am__fastdepCXX_TRUE am__fastdepCXX_FALSE CXXCPP F77 FFLAGS ac_ct_F77 LIBTOOL MAINTAINER_MODE_TRUE MAINTAINER_MODE_FALSE MAINT PERL POD2MAN ALLOCA LIBOBJS XML2_CONFIG LIBXML_REQUIRED_VERSION LIBXML2_OBJS LIBXML2_LIB LIBXML2_CFLAGS BTREE_OBJS Z_CFLAGS Z_LIBS PCRE_CONFIG PCRE_REQUIRED_VERSION PCRE_CFLAGS PCRE_LIBS LARGEFILES_MACROS LTLIBOBJS' ac_subst_files='' # Initialize some variables set by options. ac_init_help= ac_init_version=false # The variables have the same names as the options, with # dashes changed to underlines. cache_file=/dev/null exec_prefix=NONE no_create= no_recursion= prefix=NONE program_prefix=NONE program_suffix=NONE program_transform_name=s,x,x, silent= site= srcdir= verbose= x_includes=NONE x_libraries=NONE # Installation directory options. # These are left unexpanded so users can "make install exec_prefix=/foo" # and all the variables that are supposed to be based on exec_prefix # by default will actually change. # Use braces instead of parens because sh, perl, etc. also accept them. bindir='${exec_prefix}/bin' sbindir='${exec_prefix}/sbin' libexecdir='${exec_prefix}/libexec' datadir='${prefix}/share' sysconfdir='${prefix}/etc' sharedstatedir='${prefix}/com' localstatedir='${prefix}/var' libdir='${exec_prefix}/lib' includedir='${prefix}/include' oldincludedir='/usr/include' infodir='${prefix}/info' mandir='${prefix}/man' ac_prev= for ac_option do # If the previous option needs an argument, assign it. if test -n "$ac_prev"; then eval "$ac_prev=\$ac_option" ac_prev= continue fi ac_optarg=`expr "x$ac_option" : 'x[^=]*=\(.*\)'` # Accept the important Cygnus configure options, so we can diagnose typos. 
case $ac_option in -bindir | --bindir | --bindi | --bind | --bin | --bi) ac_prev=bindir ;; -bindir=* | --bindir=* | --bindi=* | --bind=* | --bin=* | --bi=*) bindir=$ac_optarg ;; -build | --build | --buil | --bui | --bu) ac_prev=build_alias ;; -build=* | --build=* | --buil=* | --bui=* | --bu=*) build_alias=$ac_optarg ;; -cache-file | --cache-file | --cache-fil | --cache-fi \ | --cache-f | --cache- | --cache | --cach | --cac | --ca | --c) ac_prev=cache_file ;; -cache-file=* | --cache-file=* | --cache-fil=* | --cache-fi=* \ | --cache-f=* | --cache-=* | --cache=* | --cach=* | --cac=* | --ca=* | --c=*) cache_file=$ac_optarg ;; --config-cache | -C) cache_file=config.cache ;; -datadir | --datadir | --datadi | --datad | --data | --dat | --da) ac_prev=datadir ;; -datadir=* | --datadir=* | --datadi=* | --datad=* | --data=* | --dat=* \ | --da=*) datadir=$ac_optarg ;; -disable-* | --disable-*) ac_feature=`expr "x$ac_option" : 'x-*disable-\(.*\)'` # Reject names that are not valid shell variable names. expr "x$ac_feature" : ".*[^-_$as_cr_alnum]" >/dev/null && { echo "$as_me: error: invalid feature name: $ac_feature" >&2 { (exit 1); exit 1; }; } ac_feature=`echo $ac_feature | sed 's/-/_/g'` eval "enable_$ac_feature=no" ;; -enable-* | --enable-*) ac_feature=`expr "x$ac_option" : 'x-*enable-\([^=]*\)'` # Reject names that are not valid shell variable names. 
expr "x$ac_feature" : ".*[^-_$as_cr_alnum]" >/dev/null && { echo "$as_me: error: invalid feature name: $ac_feature" >&2 { (exit 1); exit 1; }; } ac_feature=`echo $ac_feature | sed 's/-/_/g'` case $ac_option in *=*) ac_optarg=`echo "$ac_optarg" | sed "s/'/'\\\\\\\\''/g"`;; *) ac_optarg=yes ;; esac eval "enable_$ac_feature='$ac_optarg'" ;; -exec-prefix | --exec_prefix | --exec-prefix | --exec-prefi \ | --exec-pref | --exec-pre | --exec-pr | --exec-p | --exec- \ | --exec | --exe | --ex) ac_prev=exec_prefix ;; -exec-prefix=* | --exec_prefix=* | --exec-prefix=* | --exec-prefi=* \ | --exec-pref=* | --exec-pre=* | --exec-pr=* | --exec-p=* | --exec-=* \ | --exec=* | --exe=* | --ex=*) exec_prefix=$ac_optarg ;; -gas | --gas | --ga | --g) # Obsolete; use --with-gas. with_gas=yes ;; -help | --help | --hel | --he | -h) ac_init_help=long ;; -help=r* | --help=r* | --hel=r* | --he=r* | -hr*) ac_init_help=recursive ;; -help=s* | --help=s* | --hel=s* | --he=s* | -hs*) ac_init_help=short ;; -host | --host | --hos | --ho) ac_prev=host_alias ;; -host=* | --host=* | --hos=* | --ho=*) host_alias=$ac_optarg ;; -includedir | --includedir | --includedi | --included | --include \ | --includ | --inclu | --incl | --inc) ac_prev=includedir ;; -includedir=* | --includedir=* | --includedi=* | --included=* | --include=* \ | --includ=* | --inclu=* | --incl=* | --inc=*) includedir=$ac_optarg ;; -infodir | --infodir | --infodi | --infod | --info | --inf) ac_prev=infodir ;; -infodir=* | --infodir=* | --infodi=* | --infod=* | --info=* | --inf=*) infodir=$ac_optarg ;; -libdir | --libdir | --libdi | --libd) ac_prev=libdir ;; -libdir=* | --libdir=* | --libdi=* | --libd=*) libdir=$ac_optarg ;; -libexecdir | --libexecdir | --libexecdi | --libexecd | --libexec \ | --libexe | --libex | --libe) ac_prev=libexecdir ;; -libexecdir=* | --libexecdir=* | --libexecdi=* | --libexecd=* | --libexec=* \ | --libexe=* | --libex=* | --libe=*) libexecdir=$ac_optarg ;; -localstatedir | --localstatedir | --localstatedi | 
--localstated \ | --localstate | --localstat | --localsta | --localst \ | --locals | --local | --loca | --loc | --lo) ac_prev=localstatedir ;; -localstatedir=* | --localstatedir=* | --localstatedi=* | --localstated=* \ | --localstate=* | --localstat=* | --localsta=* | --localst=* \ | --locals=* | --local=* | --loca=* | --loc=* | --lo=*) localstatedir=$ac_optarg ;; -mandir | --mandir | --mandi | --mand | --man | --ma | --m) ac_prev=mandir ;; -mandir=* | --mandir=* | --mandi=* | --mand=* | --man=* | --ma=* | --m=*) mandir=$ac_optarg ;; -nfp | --nfp | --nf) # Obsolete; use --without-fp. with_fp=no ;; -no-create | --no-create | --no-creat | --no-crea | --no-cre \ | --no-cr | --no-c | -n) no_create=yes ;; -no-recursion | --no-recursion | --no-recursio | --no-recursi \ | --no-recurs | --no-recur | --no-recu | --no-rec | --no-re | --no-r) no_recursion=yes ;; -oldincludedir | --oldincludedir | --oldincludedi | --oldincluded \ | --oldinclude | --oldinclud | --oldinclu | --oldincl | --oldinc \ | --oldin | --oldi | --old | --ol | --o) ac_prev=oldincludedir ;; -oldincludedir=* | --oldincludedir=* | --oldincludedi=* | --oldincluded=* \ | --oldinclude=* | --oldinclud=* | --oldinclu=* | --oldincl=* | --oldinc=* \ | --oldin=* | --oldi=* | --old=* | --ol=* | --o=*) oldincludedir=$ac_optarg ;; -prefix | --prefix | --prefi | --pref | --pre | --pr | --p) ac_prev=prefix ;; -prefix=* | --prefix=* | --prefi=* | --pref=* | --pre=* | --pr=* | --p=*) prefix=$ac_optarg ;; -program-prefix | --program-prefix | --program-prefi | --program-pref \ | --program-pre | --program-pr | --program-p) ac_prev=program_prefix ;; -program-prefix=* | --program-prefix=* | --program-prefi=* \ | --program-pref=* | --program-pre=* | --program-pr=* | --program-p=*) program_prefix=$ac_optarg ;; -program-suffix | --program-suffix | --program-suffi | --program-suff \ | --program-suf | --program-su | --program-s) ac_prev=program_suffix ;; -program-suffix=* | --program-suffix=* | --program-suffi=* \ | --program-suff=* 
| --program-suf=* | --program-su=* | --program-s=*) program_suffix=$ac_optarg ;; -program-transform-name | --program-transform-name \ | --program-transform-nam | --program-transform-na \ | --program-transform-n | --program-transform- \ | --program-transform | --program-transfor \ | --program-transfo | --program-transf \ | --program-trans | --program-tran \ | --progr-tra | --program-tr | --program-t) ac_prev=program_transform_name ;; -program-transform-name=* | --program-transform-name=* \ | --program-transform-nam=* | --program-transform-na=* \ | --program-transform-n=* | --program-transform-=* \ | --program-transform=* | --program-transfor=* \ | --program-transfo=* | --program-transf=* \ | --program-trans=* | --program-tran=* \ | --progr-tra=* | --program-tr=* | --program-t=*) program_transform_name=$ac_optarg ;; -q | -quiet | --quiet | --quie | --qui | --qu | --q \ | -silent | --silent | --silen | --sile | --sil) silent=yes ;; -sbindir | --sbindir | --sbindi | --sbind | --sbin | --sbi | --sb) ac_prev=sbindir ;; -sbindir=* | --sbindir=* | --sbindi=* | --sbind=* | --sbin=* \ | --sbi=* | --sb=*) sbindir=$ac_optarg ;; -sharedstatedir | --sharedstatedir | --sharedstatedi \ | --sharedstated | --sharedstate | --sharedstat | --sharedsta \ | --sharedst | --shareds | --shared | --share | --shar \ | --sha | --sh) ac_prev=sharedstatedir ;; -sharedstatedir=* | --sharedstatedir=* | --sharedstatedi=* \ | --sharedstated=* | --sharedstate=* | --sharedstat=* | --sharedsta=* \ | --sharedst=* | --shareds=* | --shared=* | --share=* | --shar=* \ | --sha=* | --sh=*) sharedstatedir=$ac_optarg ;; -site | --site | --sit) ac_prev=site ;; -site=* | --site=* | --sit=*) site=$ac_optarg ;; -srcdir | --srcdir | --srcdi | --srcd | --src | --sr) ac_prev=srcdir ;; -srcdir=* | --srcdir=* | --srcdi=* | --srcd=* | --src=* | --sr=*) srcdir=$ac_optarg ;; -sysconfdir | --sysconfdir | --sysconfdi | --sysconfd | --sysconf \ | --syscon | --sysco | --sysc | --sys | --sy) ac_prev=sysconfdir ;; -sysconfdir=* 
| --sysconfdir=* | --sysconfdi=* | --sysconfd=* | --sysconf=* \ | --syscon=* | --sysco=* | --sysc=* | --sys=* | --sy=*) sysconfdir=$ac_optarg ;; -target | --target | --targe | --targ | --tar | --ta | --t) ac_prev=target_alias ;; -target=* | --target=* | --targe=* | --targ=* | --tar=* | --ta=* | --t=*) target_alias=$ac_optarg ;; -v | -verbose | --verbose | --verbos | --verbo | --verb) verbose=yes ;; -version | --version | --versio | --versi | --vers | -V) ac_init_version=: ;; -with-* | --with-*) ac_package=`expr "x$ac_option" : 'x-*with-\([^=]*\)'` # Reject names that are not valid shell variable names. expr "x$ac_package" : ".*[^-_$as_cr_alnum]" >/dev/null && { echo "$as_me: error: invalid package name: $ac_package" >&2 { (exit 1); exit 1; }; } ac_package=`echo $ac_package| sed 's/-/_/g'` case $ac_option in *=*) ac_optarg=`echo "$ac_optarg" | sed "s/'/'\\\\\\\\''/g"`;; *) ac_optarg=yes ;; esac eval "with_$ac_package='$ac_optarg'" ;; -without-* | --without-*) ac_package=`expr "x$ac_option" : 'x-*without-\(.*\)'` # Reject names that are not valid shell variable names. expr "x$ac_package" : ".*[^-_$as_cr_alnum]" >/dev/null && { echo "$as_me: error: invalid package name: $ac_package" >&2 { (exit 1); exit 1; }; } ac_package=`echo $ac_package | sed 's/-/_/g'` eval "with_$ac_package=no" ;; --x) # Obsolete; use --with-x. 
with_x=yes ;; -x-includes | --x-includes | --x-include | --x-includ | --x-inclu \ | --x-incl | --x-inc | --x-in | --x-i) ac_prev=x_includes ;; -x-includes=* | --x-includes=* | --x-include=* | --x-includ=* | --x-inclu=* \ | --x-incl=* | --x-inc=* | --x-in=* | --x-i=*) x_includes=$ac_optarg ;; -x-libraries | --x-libraries | --x-librarie | --x-librari \ | --x-librar | --x-libra | --x-libr | --x-lib | --x-li | --x-l) ac_prev=x_libraries ;; -x-libraries=* | --x-libraries=* | --x-librarie=* | --x-librari=* \ | --x-librar=* | --x-libra=* | --x-libr=* | --x-lib=* | --x-li=* | --x-l=*) x_libraries=$ac_optarg ;; -*) { echo "$as_me: error: unrecognized option: $ac_option Try \`$0 --help' for more information." >&2 { (exit 1); exit 1; }; } ;; *=*) ac_envvar=`expr "x$ac_option" : 'x\([^=]*\)='` # Reject names that are not valid shell variable names. expr "x$ac_envvar" : ".*[^_$as_cr_alnum]" >/dev/null && { echo "$as_me: error: invalid variable name: $ac_envvar" >&2 { (exit 1); exit 1; }; } ac_optarg=`echo "$ac_optarg" | sed "s/'/'\\\\\\\\''/g"` eval "$ac_envvar='$ac_optarg'" export $ac_envvar ;; *) # FIXME: should be removed in autoconf 3.0. echo "$as_me: WARNING: you should use --build, --host, --target" >&2 expr "x$ac_option" : ".*[^-._$as_cr_alnum]" >/dev/null && echo "$as_me: WARNING: invalid host type: $ac_option" >&2 : ${build_alias=$ac_option} ${host_alias=$ac_option} ${target_alias=$ac_option} ;; esac done if test -n "$ac_prev"; then ac_option=--`echo $ac_prev | sed 's/_/-/g'` { echo "$as_me: error: missing argument to $ac_option" >&2 { (exit 1); exit 1; }; } fi # Be sure to have absolute paths. for ac_var in exec_prefix prefix do eval ac_val=$`echo $ac_var` case $ac_val in [\\/$]* | ?:[\\/]* | NONE | '' ) ;; *) { echo "$as_me: error: expected an absolute directory name for --$ac_var: $ac_val" >&2 { (exit 1); exit 1; }; };; esac done # Be sure to have absolute paths. 
for ac_var in bindir sbindir libexecdir datadir sysconfdir sharedstatedir \ localstatedir libdir includedir oldincludedir infodir mandir do eval ac_val=$`echo $ac_var` case $ac_val in [\\/$]* | ?:[\\/]* ) ;; *) { echo "$as_me: error: expected an absolute directory name for --$ac_var: $ac_val" >&2 { (exit 1); exit 1; }; };; esac done # There might be people who depend on the old broken behavior: `$host' # used to hold the argument of --host etc. # FIXME: To remove some day. build=$build_alias host=$host_alias target=$target_alias # FIXME: To remove some day. if test "x$host_alias" != x; then if test "x$build_alias" = x; then cross_compiling=maybe echo "$as_me: WARNING: If you wanted to set the --build type, don't use --host. If a cross compiler is detected then cross compile mode will be used." >&2 elif test "x$build_alias" != "x$host_alias"; then cross_compiling=yes fi fi ac_tool_prefix= test -n "$host_alias" && ac_tool_prefix=$host_alias- test "$silent" = yes && exec 6>/dev/null # Find the source files, if location was not specified. if test -z "$srcdir"; then ac_srcdir_defaulted=yes # Try the directory containing this script, then its parent. ac_confdir=`(dirname "$0") 2>/dev/null || $as_expr X"$0" : 'X\(.*[^/]\)//*[^/][^/]*/*$' \| \ X"$0" : 'X\(//\)[^/]' \| \ X"$0" : 'X\(//\)$' \| \ X"$0" : 'X\(/\)' \| \ . : '\(.\)' 2>/dev/null || echo X"$0" | sed '/^X\(.*[^/]\)\/\/*[^/][^/]*\/*$/{ s//\1/; q; } /^X\(\/\/\)[^/].*/{ s//\1/; q; } /^X\(\/\/\)$/{ s//\1/; q; } /^X\(\/\).*/{ s//\1/; q; } s/.*/./; q'` srcdir=$ac_confdir if test ! -r $srcdir/$ac_unique_file; then srcdir=.. fi else ac_srcdir_defaulted=no fi if test ! -r $srcdir/$ac_unique_file; then if test "$ac_srcdir_defaulted" = yes; then { echo "$as_me: error: cannot find sources ($ac_unique_file) in $ac_confdir or .." 
>&2 { (exit 1); exit 1; }; } else { echo "$as_me: error: cannot find sources ($ac_unique_file) in $srcdir" >&2 { (exit 1); exit 1; }; } fi fi (cd $srcdir && test -r ./$ac_unique_file) 2>/dev/null || { echo "$as_me: error: sources are in $srcdir, but \`cd $srcdir' does not work" >&2 { (exit 1); exit 1; }; } srcdir=`echo "$srcdir" | sed 's%\([^\\/]\)[\\/]*$%\1%'` ac_env_build_alias_set=${build_alias+set} ac_env_build_alias_value=$build_alias ac_cv_env_build_alias_set=${build_alias+set} ac_cv_env_build_alias_value=$build_alias ac_env_host_alias_set=${host_alias+set} ac_env_host_alias_value=$host_alias ac_cv_env_host_alias_set=${host_alias+set} ac_cv_env_host_alias_value=$host_alias ac_env_target_alias_set=${target_alias+set} ac_env_target_alias_value=$target_alias ac_cv_env_target_alias_set=${target_alias+set} ac_cv_env_target_alias_value=$target_alias ac_env_CC_set=${CC+set} ac_env_CC_value=$CC ac_cv_env_CC_set=${CC+set} ac_cv_env_CC_value=$CC ac_env_CFLAGS_set=${CFLAGS+set} ac_env_CFLAGS_value=$CFLAGS ac_cv_env_CFLAGS_set=${CFLAGS+set} ac_cv_env_CFLAGS_value=$CFLAGS ac_env_LDFLAGS_set=${LDFLAGS+set} ac_env_LDFLAGS_value=$LDFLAGS ac_cv_env_LDFLAGS_set=${LDFLAGS+set} ac_cv_env_LDFLAGS_value=$LDFLAGS ac_env_CPPFLAGS_set=${CPPFLAGS+set} ac_env_CPPFLAGS_value=$CPPFLAGS ac_cv_env_CPPFLAGS_set=${CPPFLAGS+set} ac_cv_env_CPPFLAGS_value=$CPPFLAGS ac_env_CPP_set=${CPP+set} ac_env_CPP_value=$CPP ac_cv_env_CPP_set=${CPP+set} ac_cv_env_CPP_value=$CPP ac_env_CXX_set=${CXX+set} ac_env_CXX_value=$CXX ac_cv_env_CXX_set=${CXX+set} ac_cv_env_CXX_value=$CXX ac_env_CXXFLAGS_set=${CXXFLAGS+set} ac_env_CXXFLAGS_value=$CXXFLAGS ac_cv_env_CXXFLAGS_set=${CXXFLAGS+set} ac_cv_env_CXXFLAGS_value=$CXXFLAGS ac_env_CXXCPP_set=${CXXCPP+set} ac_env_CXXCPP_value=$CXXCPP ac_cv_env_CXXCPP_set=${CXXCPP+set} ac_cv_env_CXXCPP_value=$CXXCPP ac_env_F77_set=${F77+set} ac_env_F77_value=$F77 ac_cv_env_F77_set=${F77+set} ac_cv_env_F77_value=$F77 ac_env_FFLAGS_set=${FFLAGS+set} ac_env_FFLAGS_value=$FFLAGS 
ac_cv_env_FFLAGS_set=${FFLAGS+set} ac_cv_env_FFLAGS_value=$FFLAGS # # Report the --help message. # if test "$ac_init_help" = "long"; then # Omit some internal or obsolete options to make the list less imposing. # This message is too long to be a string in the A/UX 3.1 sh. cat <<_ACEOF \`configure' configures this package to adapt to many kinds of systems. Usage: $0 [OPTION]... [VAR=VALUE]... To assign environment variables (e.g., CC, CFLAGS...), specify them as VAR=VALUE. See below for descriptions of some of the useful variables. Defaults for the options are specified in brackets. Configuration: -h, --help display this help and exit --help=short display options specific to this package --help=recursive display the short help of all the included packages -V, --version display version information and exit -q, --quiet, --silent do not print \`checking...' messages --cache-file=FILE cache test results in FILE [disabled] -C, --config-cache alias for \`--cache-file=config.cache' -n, --no-create do not create output files --srcdir=DIR find the sources in DIR [configure dir or \`..'] _ACEOF cat <<_ACEOF Installation directories: --prefix=PREFIX install architecture-independent files in PREFIX [$ac_default_prefix] --exec-prefix=EPREFIX install architecture-dependent files in EPREFIX [PREFIX] By default, \`make install' will install all the files in \`$ac_default_prefix/bin', \`$ac_default_prefix/lib' etc. You can specify an installation prefix other than \`$ac_default_prefix' using \`--prefix', for instance \`--prefix=\$HOME'. For better control, use the options below. 
Fine tuning of the installation directories: --bindir=DIR user executables [EPREFIX/bin] --sbindir=DIR system admin executables [EPREFIX/sbin] --libexecdir=DIR program executables [EPREFIX/libexec] --datadir=DIR read-only architecture-independent data [PREFIX/share] --sysconfdir=DIR read-only single-machine data [PREFIX/etc] --sharedstatedir=DIR modifiable architecture-independent data [PREFIX/com] --localstatedir=DIR modifiable single-machine data [PREFIX/var] --libdir=DIR object code libraries [EPREFIX/lib] --includedir=DIR C header files [PREFIX/include] --oldincludedir=DIR C header files for non-gcc [/usr/include] --infodir=DIR info documentation [PREFIX/info] --mandir=DIR man documentation [PREFIX/man] _ACEOF cat <<\_ACEOF Program names: --program-prefix=PREFIX prepend PREFIX to installed program names --program-suffix=SUFFIX append SUFFIX to installed program names --program-transform-name=PROGRAM run sed PROGRAM on installed program names System types: --build=BUILD configure for building on BUILD [guessed] --host=HOST cross-compile to build programs to run on HOST [BUILD] _ACEOF fi if test -n "$ac_init_help"; then cat <<\_ACEOF Optional Features: --disable-FEATURE do not include FEATURE (same as --enable-FEATURE=no) --enable-FEATURE[=ARG] include FEATURE [ARG=yes] --enable-daystamp Adds today's date to version --disable-dependency-tracking speeds up one-time build --enable-dependency-tracking do not reject slow dependency extractors --enable-shared[=PKGS] build shared libraries [default=yes] --enable-static[=PKGS] build static libraries [default=yes] --enable-fast-install[=PKGS] optimize for fast installation [default=yes] --disable-libtool-lock avoid locking (might break parallel builds) --enable-maintainer-mode enable make rules and dependencies not useful (and sometimes confusing) to the casual installer --enable-incremental ** developer use only ** --enable-psortarray ** and use ARRAY persort arrays (if incremental) --disable-largefile omit support for 
large files
  --enable-memdebug       (developers only) checks for memory consistency on alloc/free using guards
  --enable-memtrace       (developers only) checks for unfreed memory, and where it is allocated
  --enable-memstats       (developers only) gives memory statistics (bytes allocated, calls, etc)

Optional Packages:
  --with-PACKAGE[=ARG]    use PACKAGE [ARG=yes]
  --without-PACKAGE       do not use PACKAGE (same as --with-PACKAGE=no)
  --with-website=DIR      use swish-e.org website src in DIR (YES if found)
  --with-gnu-ld           assume the C compiler uses GNU ld [default=no]
  --with-pic              try to use only PIC/non-PIC objects [default=use both]
  --with-tags[=TAGS]      include additional configurations [automatic]
  --with-libxml2=DIR      use libxml2 in DIR (YES if found)
  --with-zlib[=DIR]       use libz in DIR
  --with-pcre=DIR         use pcre in DIR (YES if found)

Some influential environment variables:
  CC          C compiler command
  CFLAGS      C compiler flags
  LDFLAGS     linker flags, e.g. -L<lib dir> if you have libraries in a
              nonstandard directory <lib dir>
  CPPFLAGS    C/C++ preprocessor flags, e.g. -I<include dir> if you have
              headers in a nonstandard directory <include dir>
  CPP         C preprocessor
  CXX         C++ compiler command
  CXXFLAGS    C++ compiler flags
  CXXCPP      C++ preprocessor
  F77         Fortran 77 compiler command
  FFLAGS      Fortran 77 compiler flags

Use these variables to override the choices made by `configure' or to help
it to find libraries and programs with nonstandard names/locations.

_ACEOF
fi

if test "$ac_init_help" = "recursive"; then
  # If there are subdirs, report their specific --help.
  ac_popdir=`pwd`
  for ac_dir in : $ac_subdirs_all; do test "x$ac_dir" = x: && continue
    test -d $ac_dir || continue
    ac_builddir=.

if test "$ac_dir" != .; then
  ac_dir_suffix=/`echo "$ac_dir" | sed 's,^\.[\\/],,'`
  # A "../" for each directory in $ac_dir_suffix.
  ac_top_builddir=`echo "$ac_dir_suffix" | sed 's,/[^\\/]*,../,g'`
else
  ac_dir_suffix= ac_top_builddir=
fi

case $srcdir in
  .)  # No --srcdir option.  We are building in place.
    ac_srcdir=.
    if test -z "$ac_top_builddir"; then
       ac_top_srcdir=.
else ac_top_srcdir=`echo $ac_top_builddir | sed 's,/$,,'` fi ;; [\\/]* | ?:[\\/]* ) # Absolute path. ac_srcdir=$srcdir$ac_dir_suffix; ac_top_srcdir=$srcdir ;; *) # Relative path. ac_srcdir=$ac_top_builddir$srcdir$ac_dir_suffix ac_top_srcdir=$ac_top_builddir$srcdir ;; esac # Do not use `cd foo && pwd` to compute absolute paths, because # the directories may not exist. case `pwd` in .) ac_abs_builddir="$ac_dir";; *) case "$ac_dir" in .) ac_abs_builddir=`pwd`;; [\\/]* | ?:[\\/]* ) ac_abs_builddir="$ac_dir";; *) ac_abs_builddir=`pwd`/"$ac_dir";; esac;; esac case $ac_abs_builddir in .) ac_abs_top_builddir=${ac_top_builddir}.;; *) case ${ac_top_builddir}. in .) ac_abs_top_builddir=$ac_abs_builddir;; [\\/]* | ?:[\\/]* ) ac_abs_top_builddir=${ac_top_builddir}.;; *) ac_abs_top_builddir=$ac_abs_builddir/${ac_top_builddir}.;; esac;; esac case $ac_abs_builddir in .) ac_abs_srcdir=$ac_srcdir;; *) case $ac_srcdir in .) ac_abs_srcdir=$ac_abs_builddir;; [\\/]* | ?:[\\/]* ) ac_abs_srcdir=$ac_srcdir;; *) ac_abs_srcdir=$ac_abs_builddir/$ac_srcdir;; esac;; esac case $ac_abs_builddir in .) ac_abs_top_srcdir=$ac_top_srcdir;; *) case $ac_top_srcdir in .) ac_abs_top_srcdir=$ac_abs_builddir;; [\\/]* | ?:[\\/]* ) ac_abs_top_srcdir=$ac_top_srcdir;; *) ac_abs_top_srcdir=$ac_abs_builddir/$ac_top_srcdir;; esac;; esac cd $ac_dir # Check for guested configure; otherwise get Cygnus style configure. if test -f $ac_srcdir/configure.gnu; then echo $SHELL $ac_srcdir/configure.gnu --help=recursive elif test -f $ac_srcdir/configure; then echo $SHELL $ac_srcdir/configure --help=recursive elif test -f $ac_srcdir/configure.ac || test -f $ac_srcdir/configure.in; then echo $ac_configure --help else echo "$as_me: WARNING: no configuration information is in $ac_dir" >&2 fi cd $ac_popdir done fi test -n "$ac_init_help" && exit 0 if $ac_init_version; then cat <<\_ACEOF Copyright (C) 2003 Free Software Foundation, Inc. 
This configure script is free software; the Free Software Foundation gives unlimited permission to copy, distribute and modify it. _ACEOF exit 0 fi exec 5>config.log cat >&5 <<_ACEOF This file contains any messages produced by compilers while running configure, to aid debugging if configure makes a mistake. It was created by $as_me, which was generated by GNU Autoconf 2.59. Invocation command line was $ $0 $@ _ACEOF { cat <<_ASUNAME ## --------- ## ## Platform. ## ## --------- ## hostname = `(hostname || uname -n) 2>/dev/null | sed 1q` uname -m = `(uname -m) 2>/dev/null || echo unknown` uname -r = `(uname -r) 2>/dev/null || echo unknown` uname -s = `(uname -s) 2>/dev/null || echo unknown` uname -v = `(uname -v) 2>/dev/null || echo unknown` /usr/bin/uname -p = `(/usr/bin/uname -p) 2>/dev/null || echo unknown` /bin/uname -X = `(/bin/uname -X) 2>/dev/null || echo unknown` /bin/arch = `(/bin/arch) 2>/dev/null || echo unknown` /usr/bin/arch -k = `(/usr/bin/arch -k) 2>/dev/null || echo unknown` /usr/convex/getsysinfo = `(/usr/convex/getsysinfo) 2>/dev/null || echo unknown` hostinfo = `(hostinfo) 2>/dev/null || echo unknown` /bin/machine = `(/bin/machine) 2>/dev/null || echo unknown` /usr/bin/oslevel = `(/usr/bin/oslevel) 2>/dev/null || echo unknown` /bin/universe = `(/bin/universe) 2>/dev/null || echo unknown` _ASUNAME as_save_IFS=$IFS; IFS=$PATH_SEPARATOR for as_dir in $PATH do IFS=$as_save_IFS test -z "$as_dir" && as_dir=. echo "PATH: $as_dir" done } >&5 cat >&5 <<_ACEOF ## ----------- ## ## Core tests. ## ## ----------- ## _ACEOF # Keep a trace of the command line. # Strip out --no-create and --no-recursion so they do not pile up. # Strip out --silent because we don't want to record it for future runs. # Also quote any args containing shell meta-characters. # Make two passes to allow for proper duplicate-argument suppression. 
ac_configure_args= ac_configure_args0= ac_configure_args1= ac_sep= ac_must_keep_next=false for ac_pass in 1 2 do for ac_arg do case $ac_arg in -no-create | --no-c* | -n | -no-recursion | --no-r*) continue ;; -q | -quiet | --quiet | --quie | --qui | --qu | --q \ | -silent | --silent | --silen | --sile | --sil) continue ;; *" "*|*" "*|*[\[\]\~\#\$\^\&\*\(\)\{\}\\\|\;\<\>\?\"\']*) ac_arg=`echo "$ac_arg" | sed "s/'/'\\\\\\\\''/g"` ;; esac case $ac_pass in 1) ac_configure_args0="$ac_configure_args0 '$ac_arg'" ;; 2) ac_configure_args1="$ac_configure_args1 '$ac_arg'" if test $ac_must_keep_next = true; then ac_must_keep_next=false # Got value, back to normal. else case $ac_arg in *=* | --config-cache | -C | -disable-* | --disable-* \ | -enable-* | --enable-* | -gas | --g* | -nfp | --nf* \ | -q | -quiet | --q* | -silent | --sil* | -v | -verb* \ | -with-* | --with-* | -without-* | --without-* | --x) case "$ac_configure_args0 " in "$ac_configure_args1"*" '$ac_arg' "* ) continue ;; esac ;; -* ) ac_must_keep_next=true ;; esac fi ac_configure_args="$ac_configure_args$ac_sep'$ac_arg'" # Get rid of the leading space. ac_sep=" " ;; esac done done $as_unset ac_configure_args0 || test "${ac_configure_args0+set}" != set || { ac_configure_args0=; export ac_configure_args0; } $as_unset ac_configure_args1 || test "${ac_configure_args1+set}" != set || { ac_configure_args1=; export ac_configure_args1; } # When interrupted or exit'd, cleanup temporary files, and complete # config.log. We remove comments because anyway the quotes in there # would cause problems or look ugly. # WARNING: Be sure not to use single quotes in there, as some shells, # such as our DU 5.0 friend, will then `close' the trap. trap 'exit_status=$? # Save into config.log some information that might help in debugging. { echo cat <<\_ASBOX ## ---------------- ## ## Cache variables. 
## ## ---------------- ## _ASBOX echo # The following way of writing the cache mishandles newlines in values, { (set) 2>&1 | case `(ac_space='"'"' '"'"'; set | grep ac_space) 2>&1` in *ac_space=\ *) sed -n \ "s/'"'"'/'"'"'\\\\'"'"''"'"'/g; s/^\\([_$as_cr_alnum]*_cv_[_$as_cr_alnum]*\\)=\\(.*\\)/\\1='"'"'\\2'"'"'/p" ;; *) sed -n \ "s/^\\([_$as_cr_alnum]*_cv_[_$as_cr_alnum]*\\)=\\(.*\\)/\\1=\\2/p" ;; esac; } echo cat <<\_ASBOX ## ----------------- ## ## Output variables. ## ## ----------------- ## _ASBOX echo for ac_var in $ac_subst_vars do eval ac_val=$`echo $ac_var` echo "$ac_var='"'"'$ac_val'"'"'" done | sort echo if test -n "$ac_subst_files"; then cat <<\_ASBOX ## ------------- ## ## Output files. ## ## ------------- ## _ASBOX echo for ac_var in $ac_subst_files do eval ac_val=$`echo $ac_var` echo "$ac_var='"'"'$ac_val'"'"'" done | sort echo fi if test -s confdefs.h; then cat <<\_ASBOX ## ----------- ## ## confdefs.h. ## ## ----------- ## _ASBOX echo sed "/^$/d" confdefs.h | sort echo fi test "$ac_signal" != 0 && echo "$as_me: caught signal $ac_signal" echo "$as_me: exit $exit_status" } >&5 rm -f core *.core && rm -rf conftest* confdefs* conf$$* $ac_clean_files && exit $exit_status ' 0 for ac_signal in 1 2 13 15; do trap 'ac_signal='$ac_signal'; { (exit 1); exit 1; }' $ac_signal done ac_signal=0 # confdefs.h avoids OS command line length limits that DEFS can exceed. rm -rf conftest* confdefs.h # AIX cpp loses on an empty file, so make sure it contains at least a newline. echo >confdefs.h # Predefined preprocessor variables. cat >>confdefs.h <<_ACEOF #define PACKAGE_NAME "$PACKAGE_NAME" _ACEOF cat >>confdefs.h <<_ACEOF #define PACKAGE_TARNAME "$PACKAGE_TARNAME" _ACEOF cat >>confdefs.h <<_ACEOF #define PACKAGE_VERSION "$PACKAGE_VERSION" _ACEOF cat >>confdefs.h <<_ACEOF #define PACKAGE_STRING "$PACKAGE_STRING" _ACEOF cat >>confdefs.h <<_ACEOF #define PACKAGE_BUGREPORT "$PACKAGE_BUGREPORT" _ACEOF # Let the site file select an alternate cache file if it wants to. 
# Prefer explicitly selected file to automatically selected ones. if test -z "$CONFIG_SITE"; then if test "x$prefix" != xNONE; then CONFIG_SITE="$prefix/share/config.site $prefix/etc/config.site" else CONFIG_SITE="$ac_default_prefix/share/config.site $ac_default_prefix/etc/config.site" fi fi for ac_site_file in $CONFIG_SITE; do if test -r "$ac_site_file"; then { echo "$as_me:$LINENO: loading site script $ac_site_file" >&5 echo "$as_me: loading site script $ac_site_file" >&6;} sed 's/^/| /' "$ac_site_file" >&5 . "$ac_site_file" fi done if test -r "$cache_file"; then # Some versions of bash will fail to source /dev/null (special # files actually), so we avoid doing that. if test -f "$cache_file"; then { echo "$as_me:$LINENO: loading cache $cache_file" >&5 echo "$as_me: loading cache $cache_file" >&6;} case $cache_file in [\\/]* | ?:[\\/]* ) . $cache_file;; *) . ./$cache_file;; esac fi else { echo "$as_me:$LINENO: creating cache $cache_file" >&5 echo "$as_me: creating cache $cache_file" >&6;} >$cache_file fi # Check that the precious variables saved in the cache have kept the same # value. 
ac_cache_corrupted=false for ac_var in `(set) 2>&1 | sed -n 's/^ac_env_\([a-zA-Z_0-9]*\)_set=.*/\1/p'`; do eval ac_old_set=\$ac_cv_env_${ac_var}_set eval ac_new_set=\$ac_env_${ac_var}_set eval ac_old_val="\$ac_cv_env_${ac_var}_value" eval ac_new_val="\$ac_env_${ac_var}_value" case $ac_old_set,$ac_new_set in set,) { echo "$as_me:$LINENO: error: \`$ac_var' was set to \`$ac_old_val' in the previous run" >&5 echo "$as_me: error: \`$ac_var' was set to \`$ac_old_val' in the previous run" >&2;} ac_cache_corrupted=: ;; ,set) { echo "$as_me:$LINENO: error: \`$ac_var' was not set in the previous run" >&5 echo "$as_me: error: \`$ac_var' was not set in the previous run" >&2;} ac_cache_corrupted=: ;; ,);; *) if test "x$ac_old_val" != "x$ac_new_val"; then { echo "$as_me:$LINENO: error: \`$ac_var' has changed since the previous run:" >&5 echo "$as_me: error: \`$ac_var' has changed since the previous run:" >&2;} { echo "$as_me:$LINENO: former value: $ac_old_val" >&5 echo "$as_me: former value: $ac_old_val" >&2;} { echo "$as_me:$LINENO: current value: $ac_new_val" >&5 echo "$as_me: current value: $ac_new_val" >&2;} ac_cache_corrupted=: fi;; esac # Pass precious variables to config.status. if test "$ac_new_set" = set; then case $ac_new_val in *" "*|*" "*|*[\[\]\~\#\$\^\&\*\(\)\{\}\\\|\;\<\>\?\"\']*) ac_arg=$ac_var=`echo "$ac_new_val" | sed "s/'/'\\\\\\\\''/g"` ;; *) ac_arg=$ac_var=$ac_new_val ;; esac case " $ac_configure_args " in *" '$ac_arg' "*) ;; # Avoid dups. Use of quotes ensures accuracy. 
*) ac_configure_args="$ac_configure_args '$ac_arg'" ;; esac fi done if $ac_cache_corrupted; then { echo "$as_me:$LINENO: error: changes in the environment can compromise the build" >&5 echo "$as_me: error: changes in the environment can compromise the build" >&2;} { { echo "$as_me:$LINENO: error: run \`make distclean' and/or \`rm $cache_file' and start over" >&5 echo "$as_me: error: run \`make distclean' and/or \`rm $cache_file' and start over" >&2;} { (exit 1); exit 1; }; } fi ac_ext=c ac_cpp='$CPP $CPPFLAGS' ac_compile='$CC -c $CFLAGS $CPPFLAGS conftest.$ac_ext >&5' ac_link='$CC -o conftest$ac_exeext $CFLAGS $CPPFLAGS $LDFLAGS conftest.$ac_ext $LIBS >&5' ac_compiler_gnu=$ac_cv_c_compiler_gnu ac_aux_dir= for ac_dir in config $srcdir/config; do if test -f $ac_dir/install-sh; then ac_aux_dir=$ac_dir ac_install_sh="$ac_aux_dir/install-sh -c" break elif test -f $ac_dir/install.sh; then ac_aux_dir=$ac_dir ac_install_sh="$ac_aux_dir/install.sh -c" break elif test -f $ac_dir/shtool; then ac_aux_dir=$ac_dir ac_install_sh="$ac_aux_dir/shtool install -c" break fi done if test -z "$ac_aux_dir"; then { { echo "$as_me:$LINENO: error: cannot find install-sh or install.sh in config $srcdir/config" >&5 echo "$as_me: error: cannot find install-sh or install.sh in config $srcdir/config" >&2;} { (exit 1); exit 1; }; } fi ac_config_guess="$SHELL $ac_aux_dir/config.guess" ac_config_sub="$SHELL $ac_aux_dir/config.sub" ac_configure="$SHELL $ac_aux_dir/configure" # This should be Cygnus configure. PACKAGE=swish-e MAJOR_VERSION=2 MINOR_VERSION=4 MICRO_VERSION=7 INTERFACE_AGE=0 BINARY_AGE=0 VERSION=$MAJOR_VERSION.$MINOR_VERSION.$MICRO_VERSION SWISH_WEB="" if false ; then BUILDDOCS_TRUE= BUILDDOCS_FALSE='#' else BUILDDOCS_TRUE='#' BUILDDOCS_FALSE= fi if false ; then INSTALLDOCS_TRUE= INSTALLDOCS_FALSE='#' else INSTALLDOCS_TRUE='#' INSTALLDOCS_FALSE= fi # Check whether --with-website or --without-website was given. 
if test "${with_website+set}" = set; then withval="$with_website" else withval=no fi; if test "x$withval" != "xno"; then SWISH_WEB="$withval/bin/build" if test ! -f "$SWISH_WEB"; then { { echo "$as_me:$LINENO: error: Failed to find program to build swish-e html docs \"$SWISH_WEB\"" >&5 echo "$as_me: error: Failed to find program to build swish-e html docs \"$SWISH_WEB\"" >&2;} { (exit 1); exit 1; }; } fi else # Extract the first word of "build-swish-docs", so it can be a program name with args. set dummy build-swish-docs; ac_word=$2 echo "$as_me:$LINENO: checking for $ac_word" >&5 echo $ECHO_N "checking for $ac_word... $ECHO_C" >&6 if test "${ac_cv_path_SWISH_WEB+set}" = set; then echo $ECHO_N "(cached) $ECHO_C" >&6 else case $SWISH_WEB in [\\/]* | ?:[\\/]*) ac_cv_path_SWISH_WEB="$SWISH_WEB" # Let the user override the test with a path. ;; *) as_save_IFS=$IFS; IFS=$PATH_SEPARATOR for as_dir in $PATH do IFS=$as_save_IFS test -z "$as_dir" && as_dir=. for ac_exec_ext in '' $ac_executable_extensions; do if $as_executable_p "$as_dir/$ac_word$ac_exec_ext"; then ac_cv_path_SWISH_WEB="$as_dir/$ac_word$ac_exec_ext" echo "$as_me:$LINENO: found $as_dir/$ac_word$ac_exec_ext" >&5 break 2 fi done done ;; esac fi SWISH_WEB=$ac_cv_path_SWISH_WEB if test -n "$SWISH_WEB"; then echo "$as_me:$LINENO: result: $SWISH_WEB" >&5 echo "${ECHO_T}$SWISH_WEB" >&6 else echo "$as_me:$LINENO: result: no" >&5 echo "${ECHO_T}no" >&6 fi fi if test -n "$SWISH_WEB"; then SWISH_WEB_CHK=`$SWISH_WEB -check` if test "x$SWISH_WEB_CHK" = xa-ok; then echo "$as_me:$LINENO: result: Building html docs with $SWISH_WEB" >&5 echo "${ECHO_T}Building html docs with $SWISH_WEB" >&6 if true ; then BUILDDOCS_TRUE= BUILDDOCS_FALSE='#' else BUILDDOCS_TRUE='#' BUILDDOCS_FALSE= fi if true ; then INSTALLDOCS_TRUE= INSTALLDOCS_FALSE='#' else INSTALLDOCS_TRUE='#' INSTALLDOCS_FALSE= fi else { { echo "$as_me:$LINENO: error: problem running '$SWISH_WEB -check'. 
Returned '$SWISH_WEB_CHK'" >&5 echo "$as_me: error: problem running '$SWISH_WEB -check'. Returned '$SWISH_WEB_CHK'" >&2;} { (exit 1); exit 1; }; } fi else if test -f "$srcdir/html/readme.html"; then if true; then INSTALLDOCS_TRUE= INSTALLDOCS_FALSE='#' else INSTALLDOCS_TRUE='#' INSTALLDOCS_FALSE= fi else { echo "$as_me:$LINENO: WARNING: ** Not installing HTML docs. \"$srcdir/html/readme.html\" not found **" >&5 echo "$as_me: WARNING: ** Not installing HTML docs. \"$srcdir/html/readme.html\" not found **" >&2;} fi fi # Check whether --enable-daystamp or --disable-daystamp was given. if test "${enable_daystamp+set}" = set; then enableval="$enable_daystamp" daystamp=yes fi; if test x$daystamp = xyes; then TODAY=`/bin/date +%Y-%m-%d` VERSION="$VERSION-$TODAY" fi ac_config_headers="$ac_config_headers src/acconfig.h" am__api_version="1.9" # Find a good install program. We prefer a C program (faster), # so one script is as good as another. But avoid the broken or # incompatible versions: # SysV /etc/install, /usr/sbin/install # SunOS /usr/etc/install # IRIX /sbin/install # AIX /bin/install # AmigaOS /C/install, which installs bootblocks on floppy discs # AIX 4 /usr/bin/installbsd, which doesn't work without a -g flag # AFS /usr/afsws/bin/install, which mishandles nonexistent args # SVR4 /usr/ucb/install, which tries to use the nonexistent group "staff" # OS/2's system install, which has a completely different semantic # ./install, which can be erroneously created by make from ./install.sh. echo "$as_me:$LINENO: checking for a BSD-compatible install" >&5 echo $ECHO_N "checking for a BSD-compatible install... $ECHO_C" >&6 if test -z "$INSTALL"; then if test "${ac_cv_path_install+set}" = set; then echo $ECHO_N "(cached) $ECHO_C" >&6 else as_save_IFS=$IFS; IFS=$PATH_SEPARATOR for as_dir in $PATH do IFS=$as_save_IFS test -z "$as_dir" && as_dir=. # Account for people who put trailing slashes in PATH elements. 
case $as_dir/ in ./ | .// | /[cC]/* | \ /etc/* | /usr/sbin/* | /usr/etc/* | /sbin/* | /usr/afsws/bin/* | \ ?:\\/os2\\/install\\/* | ?:\\/OS2\\/INSTALL\\/* | \ /usr/ucb/* ) ;; *) # OSF1 and SCO ODT 3.0 have their own names for install. # Don't use installbsd from OSF since it installs stuff as root # by default. for ac_prog in ginstall scoinst install; do for ac_exec_ext in '' $ac_executable_extensions; do if $as_executable_p "$as_dir/$ac_prog$ac_exec_ext"; then if test $ac_prog = install && grep dspmsg "$as_dir/$ac_prog$ac_exec_ext" >/dev/null 2>&1; then # AIX install. It has an incompatible calling convention. : elif test $ac_prog = install && grep pwplus "$as_dir/$ac_prog$ac_exec_ext" >/dev/null 2>&1; then # program-specific install script used by HP pwplus--don't use. : else ac_cv_path_install="$as_dir/$ac_prog$ac_exec_ext -c" break 3 fi fi done done ;; esac done fi if test "${ac_cv_path_install+set}" = set; then INSTALL=$ac_cv_path_install else # As a last resort, use the slow shell script. We don't cache a # path for INSTALL within a source directory, because that will # break other packages using the cache if that directory is # removed, or if the path is relative. INSTALL=$ac_install_sh fi fi echo "$as_me:$LINENO: result: $INSTALL" >&5 echo "${ECHO_T}$INSTALL" >&6 # Use test -z because SunOS4 sh mishandles braces in ${var-val}. # It thinks the first close brace ends the variable substitution. test -z "$INSTALL_PROGRAM" && INSTALL_PROGRAM='${INSTALL}' test -z "$INSTALL_SCRIPT" && INSTALL_SCRIPT='${INSTALL}' test -z "$INSTALL_DATA" && INSTALL_DATA='${INSTALL} -m 644' echo "$as_me:$LINENO: checking whether build environment is sane" >&5 echo $ECHO_N "checking whether build environment is sane... $ECHO_C" >&6 # Just in case sleep 1 echo timestamp > conftest.file # Do `set' in a subshell so we don't clobber the current shell's # arguments. 
Must try -L first in case configure is actually a # symlink; some systems play weird games with the mod time of symlinks # (eg FreeBSD returns the mod time of the symlink's containing # directory). if ( set X `ls -Lt $srcdir/configure conftest.file 2> /dev/null` if test "$*" = "X"; then # -L didn't work. set X `ls -t $srcdir/configure conftest.file` fi rm -f conftest.file if test "$*" != "X $srcdir/configure conftest.file" \ && test "$*" != "X conftest.file $srcdir/configure"; then # If neither matched, then we have a broken ls. This can happen # if, for instance, CONFIG_SHELL is bash and it inherits a # broken ls alias from the environment. This has actually # happened. Such a system could not be considered "sane". { { echo "$as_me:$LINENO: error: ls -t appears to fail. Make sure there is not a broken alias in your environment" >&5 echo "$as_me: error: ls -t appears to fail. Make sure there is not a broken alias in your environment" >&2;} { (exit 1); exit 1; }; } fi test "$2" = conftest.file ) then # Ok. : else { { echo "$as_me:$LINENO: error: newly created file is older than distributed files! Check your system clock" >&5 echo "$as_me: error: newly created file is older than distributed files! Check your system clock" >&2;} { (exit 1); exit 1; }; } fi echo "$as_me:$LINENO: result: yes" >&5 echo "${ECHO_T}yes" >&6 test "$program_prefix" != NONE && program_transform_name="s,^,$program_prefix,;$program_transform_name" # Use a double $ so make ignores it. test "$program_suffix" != NONE && program_transform_name="s,\$,$program_suffix,;$program_transform_name" # Double any \ or $. echo might interpret backslashes. # By default was `s,x,x', remove it if useless. 
cat <<\_ACEOF >conftest.sed s/[\\$]/&&/g;s/;s,x,x,$// _ACEOF program_transform_name=`echo $program_transform_name | sed -f conftest.sed` rm conftest.sed # expand $ac_aux_dir to an absolute path am_aux_dir=`cd $ac_aux_dir && pwd` test x"${MISSING+set}" = xset || MISSING="\${SHELL} $am_aux_dir/missing" # Use eval to expand $SHELL if eval "$MISSING --run true"; then am_missing_run="$MISSING --run " else am_missing_run= { echo "$as_me:$LINENO: WARNING: \`missing' script is too old or missing" >&5 echo "$as_me: WARNING: \`missing' script is too old or missing" >&2;} fi if mkdir -p --version . >/dev/null 2>&1 && test ! -d ./--version; then # We used to keep `.' as the first argument, in order to # allow $(mkdir_p) to be used without an argument, as in # $(mkdir_p) $(somedir) # where $(somedir) is conditionally defined. However, this is wrong # for two reasons: # 1. if the package is installed by a user who cannot write `.' # make install will fail, # 2. the above comment should most certainly read # $(mkdir_p) $(DESTDIR)$(somedir) # so it does not work when $(somedir) is undefined and # $(DESTDIR) is not. # To support the latter case, we have to write # test -z "$(somedir)" || $(mkdir_p) $(DESTDIR)$(somedir), # so the `.' trick is pointless. mkdir_p='mkdir -p --' else # On NextStep and OpenStep, the `mkdir' command does not # recognize any option. It will interpret all options as # directories to create, and then abort because `.' already # exists. for d in ./-p ./--version; do test -d $d && rmdir $d done # $(mkinstalldirs) is defined by Automake if mkinstalldirs exists. if test -f "$ac_aux_dir/mkinstalldirs"; then mkdir_p='$(mkinstalldirs)' else mkdir_p='$(install_sh) -d' fi fi for ac_prog in gawk mawk nawk awk do # Extract the first word of "$ac_prog", so it can be a program name with args. set dummy $ac_prog; ac_word=$2 echo "$as_me:$LINENO: checking for $ac_word" >&5 echo $ECHO_N "checking for $ac_word... 
$ECHO_C" >&6 if test "${ac_cv_prog_AWK+set}" = set; then echo $ECHO_N "(cached) $ECHO_C" >&6 else if test -n "$AWK"; then ac_cv_prog_AWK="$AWK" # Let the user override the test. else as_save_IFS=$IFS; IFS=$PATH_SEPARATOR for as_dir in $PATH do IFS=$as_save_IFS test -z "$as_dir" && as_dir=. for ac_exec_ext in '' $ac_executable_extensions; do if $as_executable_p "$as_dir/$ac_word$ac_exec_ext"; then ac_cv_prog_AWK="$ac_prog" echo "$as_me:$LINENO: found $as_dir/$ac_word$ac_exec_ext" >&5 break 2 fi done done fi fi AWK=$ac_cv_prog_AWK if test -n "$AWK"; then echo "$as_me:$LINENO: result: $AWK" >&5 echo "${ECHO_T}$AWK" >&6 else echo "$as_me:$LINENO: result: no" >&5 echo "${ECHO_T}no" >&6 fi test -n "$AWK" && break done echo "$as_me:$LINENO: checking whether ${MAKE-make} sets \$(MAKE)" >&5 echo $ECHO_N "checking whether ${MAKE-make} sets \$(MAKE)... $ECHO_C" >&6 set dummy ${MAKE-make}; ac_make=`echo "$2" | sed 'y,:./+-,___p_,'` if eval "test \"\${ac_cv_prog_make_${ac_make}_set+set}\" = set"; then echo $ECHO_N "(cached) $ECHO_C" >&6 else cat >conftest.make <<\_ACEOF all: @echo 'ac_maketemp="$(MAKE)"' _ACEOF # GNU make sometimes prints "make[1]: Entering...", which would confuse us. eval `${MAKE-make} -f conftest.make 2>/dev/null | grep temp=` if test -n "$ac_maketemp"; then eval ac_cv_prog_make_${ac_make}_set=yes else eval ac_cv_prog_make_${ac_make}_set=no fi rm -f conftest.make fi if eval "test \"`echo '$ac_cv_prog_make_'${ac_make}_set`\" = yes"; then echo "$as_me:$LINENO: result: yes" >&5 echo "${ECHO_T}yes" >&6 SET_MAKE= else echo "$as_me:$LINENO: result: no" >&5 echo "${ECHO_T}no" >&6 SET_MAKE="MAKE=${MAKE-make}" fi rm -rf .tst 2>/dev/null mkdir .tst 2>/dev/null if test -d .tst; then am__leading_dot=. 
else am__leading_dot=_ fi rmdir .tst 2>/dev/null # test to see if srcdir already configured if test "`cd $srcdir && pwd`" != "`pwd`" && test -f $srcdir/config.status; then { { echo "$as_me:$LINENO: error: source directory already configured; run \"make distclean\" there first" >&5 echo "$as_me: error: source directory already configured; run \"make distclean\" there first" >&2;} { (exit 1); exit 1; }; } fi # test whether we have cygpath if test -z "$CYGPATH_W"; then if (cygpath --version) >/dev/null 2>/dev/null; then CYGPATH_W='cygpath -w' else CYGPATH_W=echo fi fi # Define the identity of the package. PACKAGE=$PACKAGE VERSION=$VERSION cat >>confdefs.h <<_ACEOF #define PACKAGE "$PACKAGE" _ACEOF cat >>confdefs.h <<_ACEOF #define VERSION "$VERSION" _ACEOF # Some tools Automake needs. ACLOCAL=${ACLOCAL-"${am_missing_run}aclocal-${am__api_version}"} AUTOCONF=${AUTOCONF-"${am_missing_run}autoconf"} AUTOMAKE=${AUTOMAKE-"${am_missing_run}automake-${am__api_version}"} AUTOHEADER=${AUTOHEADER-"${am_missing_run}autoheader"} MAKEINFO=${MAKEINFO-"${am_missing_run}makeinfo"} install_sh=${install_sh-"$am_aux_dir/install-sh"} # Installed binaries are usually stripped using `strip' when the user # runs `make install-strip'. However, `strip' might not be the right # tool to use in cross-compilation environments; therefore Automake # will honor the `STRIP' environment variable to overrule this program. if test "$cross_compiling" != no; then if test -n "$ac_tool_prefix"; then # Extract the first word of "${ac_tool_prefix}strip", so it can be a program name with args. set dummy ${ac_tool_prefix}strip; ac_word=$2 echo "$as_me:$LINENO: checking for $ac_word" >&5 echo $ECHO_N "checking for $ac_word... $ECHO_C" >&6 if test "${ac_cv_prog_STRIP+set}" = set; then echo $ECHO_N "(cached) $ECHO_C" >&6 else if test -n "$STRIP"; then ac_cv_prog_STRIP="$STRIP" # Let the user override the test. 
else as_save_IFS=$IFS; IFS=$PATH_SEPARATOR for as_dir in $PATH do IFS=$as_save_IFS test -z "$as_dir" && as_dir=. for ac_exec_ext in '' $ac_executable_extensions; do if $as_executable_p "$as_dir/$ac_word$ac_exec_ext"; then ac_cv_prog_STRIP="${ac_tool_prefix}strip" echo "$as_me:$LINENO: found $as_dir/$ac_word$ac_exec_ext" >&5 break 2 fi done done fi fi STRIP=$ac_cv_prog_STRIP if test -n "$STRIP"; then echo "$as_me:$LINENO: result: $STRIP" >&5 echo "${ECHO_T}$STRIP" >&6 else echo "$as_me:$LINENO: result: no" >&5 echo "${ECHO_T}no" >&6 fi fi if test -z "$ac_cv_prog_STRIP"; then ac_ct_STRIP=$STRIP # Extract the first word of "strip", so it can be a program name with args. set dummy strip; ac_word=$2 echo "$as_me:$LINENO: checking for $ac_word" >&5 echo $ECHO_N "checking for $ac_word... $ECHO_C" >&6 if test "${ac_cv_prog_ac_ct_STRIP+set}" = set; then echo $ECHO_N "(cached) $ECHO_C" >&6 else if test -n "$ac_ct_STRIP"; then ac_cv_prog_ac_ct_STRIP="$ac_ct_STRIP" # Let the user override the test. else as_save_IFS=$IFS; IFS=$PATH_SEPARATOR for as_dir in $PATH do IFS=$as_save_IFS test -z "$as_dir" && as_dir=. for ac_exec_ext in '' $ac_executable_extensions; do if $as_executable_p "$as_dir/$ac_word$ac_exec_ext"; then ac_cv_prog_ac_ct_STRIP="strip" echo "$as_me:$LINENO: found $as_dir/$ac_word$ac_exec_ext" >&5 break 2 fi done done test -z "$ac_cv_prog_ac_ct_STRIP" && ac_cv_prog_ac_ct_STRIP=":" fi fi ac_ct_STRIP=$ac_cv_prog_ac_ct_STRIP if test -n "$ac_ct_STRIP"; then echo "$as_me:$LINENO: result: $ac_ct_STRIP" >&5 echo "${ECHO_T}$ac_ct_STRIP" >&6 else echo "$as_me:$LINENO: result: no" >&5 echo "${ECHO_T}no" >&6 fi STRIP=$ac_ct_STRIP else STRIP="$ac_cv_prog_STRIP" fi fi INSTALL_STRIP_PROGRAM="\${SHELL} \$(install_sh) -c -s" # We need awk for the "check" target. The system "awk" is bad on # some platforms. # Always define AMTAR for backward compatibility. 
AMTAR=${AMTAR-"${am_missing_run}tar"} am__tar='${AMTAR} chof - "$$tardir"'; am__untar='${AMTAR} xf -' ac_ext=c ac_cpp='$CPP $CPPFLAGS' ac_compile='$CC -c $CFLAGS $CPPFLAGS conftest.$ac_ext >&5' ac_link='$CC -o conftest$ac_exeext $CFLAGS $CPPFLAGS $LDFLAGS conftest.$ac_ext $LIBS >&5' ac_compiler_gnu=$ac_cv_c_compiler_gnu if test -n "$ac_tool_prefix"; then # Extract the first word of "${ac_tool_prefix}gcc", so it can be a program name with args. set dummy ${ac_tool_prefix}gcc; ac_word=$2 echo "$as_me:$LINENO: checking for $ac_word" >&5 echo $ECHO_N "checking for $ac_word... $ECHO_C" >&6 if test "${ac_cv_prog_CC+set}" = set; then echo $ECHO_N "(cached) $ECHO_C" >&6 else if test -n "$CC"; then ac_cv_prog_CC="$CC" # Let the user override the test. else as_save_IFS=$IFS; IFS=$PATH_SEPARATOR for as_dir in $PATH do IFS=$as_save_IFS test -z "$as_dir" && as_dir=. for ac_exec_ext in '' $ac_executable_extensions; do if $as_executable_p "$as_dir/$ac_word$ac_exec_ext"; then ac_cv_prog_CC="${ac_tool_prefix}gcc" echo "$as_me:$LINENO: found $as_dir/$ac_word$ac_exec_ext" >&5 break 2 fi done done fi fi CC=$ac_cv_prog_CC if test -n "$CC"; then echo "$as_me:$LINENO: result: $CC" >&5 echo "${ECHO_T}$CC" >&6 else echo "$as_me:$LINENO: result: no" >&5 echo "${ECHO_T}no" >&6 fi fi if test -z "$ac_cv_prog_CC"; then ac_ct_CC=$CC # Extract the first word of "gcc", so it can be a program name with args. set dummy gcc; ac_word=$2 echo "$as_me:$LINENO: checking for $ac_word" >&5 echo $ECHO_N "checking for $ac_word... $ECHO_C" >&6 if test "${ac_cv_prog_ac_ct_CC+set}" = set; then echo $ECHO_N "(cached) $ECHO_C" >&6 else if test -n "$ac_ct_CC"; then ac_cv_prog_ac_ct_CC="$ac_ct_CC" # Let the user override the test. else as_save_IFS=$IFS; IFS=$PATH_SEPARATOR for as_dir in $PATH do IFS=$as_save_IFS test -z "$as_dir" && as_dir=. 
for ac_exec_ext in '' $ac_executable_extensions; do if $as_executable_p "$as_dir/$ac_word$ac_exec_ext"; then ac_cv_prog_ac_ct_CC="gcc" echo "$as_me:$LINENO: found $as_dir/$ac_word$ac_exec_ext" >&5 break 2 fi done done fi fi ac_ct_CC=$ac_cv_prog_ac_ct_CC if test -n "$ac_ct_CC"; then echo "$as_me:$LINENO: result: $ac_ct_CC" >&5 echo "${ECHO_T}$ac_ct_CC" >&6 else echo "$as_me:$LINENO: result: no" >&5 echo "${ECHO_T}no" >&6 fi CC=$ac_ct_CC else CC="$ac_cv_prog_CC" fi if test -z "$CC"; then if test -n "$ac_tool_prefix"; then # Extract the first word of "${ac_tool_prefix}cc", so it can be a program name with args. set dummy ${ac_tool_prefix}cc; ac_word=$2 echo "$as_me:$LINENO: checking for $ac_word" >&5 echo $ECHO_N "checking for $ac_word... $ECHO_C" >&6 if test "${ac_cv_prog_CC+set}" = set; then echo $ECHO_N "(cached) $ECHO_C" >&6 else if test -n "$CC"; then ac_cv_prog_CC="$CC" # Let the user override the test. else as_save_IFS=$IFS; IFS=$PATH_SEPARATOR for as_dir in $PATH do IFS=$as_save_IFS test -z "$as_dir" && as_dir=. for ac_exec_ext in '' $ac_executable_extensions; do if $as_executable_p "$as_dir/$ac_word$ac_exec_ext"; then ac_cv_prog_CC="${ac_tool_prefix}cc" echo "$as_me:$LINENO: found $as_dir/$ac_word$ac_exec_ext" >&5 break 2 fi done done fi fi CC=$ac_cv_prog_CC if test -n "$CC"; then echo "$as_me:$LINENO: result: $CC" >&5 echo "${ECHO_T}$CC" >&6 else echo "$as_me:$LINENO: result: no" >&5 echo "${ECHO_T}no" >&6 fi fi if test -z "$ac_cv_prog_CC"; then ac_ct_CC=$CC # Extract the first word of "cc", so it can be a program name with args. set dummy cc; ac_word=$2 echo "$as_me:$LINENO: checking for $ac_word" >&5 echo $ECHO_N "checking for $ac_word... $ECHO_C" >&6 if test "${ac_cv_prog_ac_ct_CC+set}" = set; then echo $ECHO_N "(cached) $ECHO_C" >&6 else if test -n "$ac_ct_CC"; then ac_cv_prog_ac_ct_CC="$ac_ct_CC" # Let the user override the test. else as_save_IFS=$IFS; IFS=$PATH_SEPARATOR for as_dir in $PATH do IFS=$as_save_IFS test -z "$as_dir" && as_dir=. 
for ac_exec_ext in '' $ac_executable_extensions; do if $as_executable_p "$as_dir/$ac_word$ac_exec_ext"; then ac_cv_prog_ac_ct_CC="cc" echo "$as_me:$LINENO: found $as_dir/$ac_word$ac_exec_ext" >&5 break 2 fi done done fi fi ac_ct_CC=$ac_cv_prog_ac_ct_CC if test -n "$ac_ct_CC"; then echo "$as_me:$LINENO: result: $ac_ct_CC" >&5 echo "${ECHO_T}$ac_ct_CC" >&6 else echo "$as_me:$LINENO: result: no" >&5 echo "${ECHO_T}no" >&6 fi CC=$ac_ct_CC else CC="$ac_cv_prog_CC" fi fi if test -z "$CC"; then # Extract the first word of "cc", so it can be a program name with args. set dummy cc; ac_word=$2 echo "$as_me:$LINENO: checking for $ac_word" >&5 echo $ECHO_N "checking for $ac_word... $ECHO_C" >&6 if test "${ac_cv_prog_CC+set}" = set; then echo $ECHO_N "(cached) $ECHO_C" >&6 else if test -n "$CC"; then ac_cv_prog_CC="$CC" # Let the user override the test. else ac_prog_rejected=no as_save_IFS=$IFS; IFS=$PATH_SEPARATOR for as_dir in $PATH do IFS=$as_save_IFS test -z "$as_dir" && as_dir=. for ac_exec_ext in '' $ac_executable_extensions; do if $as_executable_p "$as_dir/$ac_word$ac_exec_ext"; then if test "$as_dir/$ac_word$ac_exec_ext" = "/usr/ucb/cc"; then ac_prog_rejected=yes continue fi ac_cv_prog_CC="cc" echo "$as_me:$LINENO: found $as_dir/$ac_word$ac_exec_ext" >&5 break 2 fi done done if test $ac_prog_rejected = yes; then # We found a bogon in the path, so make sure we never use it. set dummy $ac_cv_prog_CC shift if test $# != 0; then # We chose a different compiler from the bogus one. # However, it has the same basename, so the bogon will be chosen # first if we set CC to just the basename; use the full file name. 
shift ac_cv_prog_CC="$as_dir/$ac_word${1+' '}$@" fi fi fi fi CC=$ac_cv_prog_CC if test -n "$CC"; then echo "$as_me:$LINENO: result: $CC" >&5 echo "${ECHO_T}$CC" >&6 else echo "$as_me:$LINENO: result: no" >&5 echo "${ECHO_T}no" >&6 fi fi if test -z "$CC"; then if test -n "$ac_tool_prefix"; then for ac_prog in cl do # Extract the first word of "$ac_tool_prefix$ac_prog", so it can be a program name with args. set dummy $ac_tool_prefix$ac_prog; ac_word=$2 echo "$as_me:$LINENO: checking for $ac_word" >&5 echo $ECHO_N "checking for $ac_word... $ECHO_C" >&6 if test "${ac_cv_prog_CC+set}" = set; then echo $ECHO_N "(cached) $ECHO_C" >&6 else if test -n "$CC"; then ac_cv_prog_CC="$CC" # Let the user override the test. else as_save_IFS=$IFS; IFS=$PATH_SEPARATOR for as_dir in $PATH do IFS=$as_save_IFS test -z "$as_dir" && as_dir=. for ac_exec_ext in '' $ac_executable_extensions; do if $as_executable_p "$as_dir/$ac_word$ac_exec_ext"; then ac_cv_prog_CC="$ac_tool_prefix$ac_prog" echo "$as_me:$LINENO: found $as_dir/$ac_word$ac_exec_ext" >&5 break 2 fi done done fi fi CC=$ac_cv_prog_CC if test -n "$CC"; then echo "$as_me:$LINENO: result: $CC" >&5 echo "${ECHO_T}$CC" >&6 else echo "$as_me:$LINENO: result: no" >&5 echo "${ECHO_T}no" >&6 fi test -n "$CC" && break done fi if test -z "$CC"; then ac_ct_CC=$CC for ac_prog in cl do # Extract the first word of "$ac_prog", so it can be a program name with args. set dummy $ac_prog; ac_word=$2 echo "$as_me:$LINENO: checking for $ac_word" >&5 echo $ECHO_N "checking for $ac_word... $ECHO_C" >&6 if test "${ac_cv_prog_ac_ct_CC+set}" = set; then echo $ECHO_N "(cached) $ECHO_C" >&6 else if test -n "$ac_ct_CC"; then ac_cv_prog_ac_ct_CC="$ac_ct_CC" # Let the user override the test. else as_save_IFS=$IFS; IFS=$PATH_SEPARATOR for as_dir in $PATH do IFS=$as_save_IFS test -z "$as_dir" && as_dir=. 
for ac_exec_ext in '' $ac_executable_extensions; do if $as_executable_p "$as_dir/$ac_word$ac_exec_ext"; then ac_cv_prog_ac_ct_CC="$ac_prog" echo "$as_me:$LINENO: found $as_dir/$ac_word$ac_exec_ext" >&5 break 2 fi done done fi fi ac_ct_CC=$ac_cv_prog_ac_ct_CC if test -n "$ac_ct_CC"; then echo "$as_me:$LINENO: result: $ac_ct_CC" >&5 echo "${ECHO_T}$ac_ct_CC" >&6 else echo "$as_me:$LINENO: result: no" >&5 echo "${ECHO_T}no" >&6 fi test -n "$ac_ct_CC" && break done CC=$ac_ct_CC fi fi test -z "$CC" && { { echo "$as_me:$LINENO: error: no acceptable C compiler found in \$PATH See \`config.log' for more details." >&5 echo "$as_me: error: no acceptable C compiler found in \$PATH See \`config.log' for more details." >&2;} { (exit 1); exit 1; }; } # Provide some information about the compiler. echo "$as_me:$LINENO:" \ "checking for C compiler version" >&5 ac_compiler=`set X $ac_compile; echo $2` { (eval echo "$as_me:$LINENO: \"$ac_compiler --version </dev/null >&5\"") >&5 (eval $ac_compiler --version </dev/null >&5) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); } { (eval echo "$as_me:$LINENO: \"$ac_compiler -v </dev/null >&5\"") >&5 (eval $ac_compiler -v </dev/null >&5) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); } { (eval echo "$as_me:$LINENO: \"$ac_compiler -V </dev/null >&5\"") >&5 (eval $ac_compiler -V </dev/null >&5) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); } cat >conftest.$ac_ext <<_ACEOF /* confdefs.h. */ _ACEOF cat confdefs.h >>conftest.$ac_ext cat >>conftest.$ac_ext <<_ACEOF /* end confdefs.h. */ int main () { ; return 0; } _ACEOF ac_clean_files_save=$ac_clean_files ac_clean_files="$ac_clean_files a.out a.exe b.out" # Try to create an executable without -o first, disregard a.out. # It will help us diagnose broken compilers, and give us a hint # about exeext. echo "$as_me:$LINENO: checking for C compiler default output file name" >&5 echo $ECHO_N "checking for C compiler default output file name... 
$ECHO_C" >&6 ac_link_default=`echo "$ac_link" | sed 's/ -o *conftest[^ ]*//'` if { (eval echo "$as_me:$LINENO: \"$ac_link_default\"") >&5 (eval $ac_link_default) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); }; then # Find the output, starting from the most likely. This scheme is # not robust to junk in `.', hence go to wildcards (a.*) only as a last # resort. # Be careful to initialize this variable, since it used to be cached. # Otherwise an old cache value of `no' led to `EXEEXT = no' in a Makefile. ac_cv_exeext= # b.out is created by i960 compilers. for ac_file in a_out.exe a.exe conftest.exe a.out conftest a.* conftest.* b.out do test -f "$ac_file" || continue case $ac_file in *.$ac_ext | *.xcoff | *.tds | *.d | *.pdb | *.xSYM | *.bb | *.bbg | *.o | *.obj ) ;; conftest.$ac_ext ) # This is the source file. ;; [ab].out ) # We found the default executable, but exeext='' is most # certainly right. break;; *.* ) ac_cv_exeext=`expr "$ac_file" : '[^.]*\(\..*\)'` # FIXME: I believe we export ac_cv_exeext for Libtool, # but it would be cool to find out if it's true. Does anybody # maintain Libtool? --akim. export ac_cv_exeext break;; * ) break;; esac done else echo "$as_me: failed program was:" >&5 sed 's/^/| /' conftest.$ac_ext >&5 { { echo "$as_me:$LINENO: error: C compiler cannot create executables See \`config.log' for more details." >&5 echo "$as_me: error: C compiler cannot create executables See \`config.log' for more details." >&2;} { (exit 77); exit 77; }; } fi ac_exeext=$ac_cv_exeext echo "$as_me:$LINENO: result: $ac_file" >&5 echo "${ECHO_T}$ac_file" >&6 # Check the compiler produces executables we can run. If not, either # the compiler is broken, or we cross compile. echo "$as_me:$LINENO: checking whether the C compiler works" >&5 echo $ECHO_N "checking whether the C compiler works... 
$ECHO_C" >&6 # FIXME: These cross compiler hacks should be removed for Autoconf 3.0 # If not cross compiling, check that we can run a simple program. if test "$cross_compiling" != yes; then if { ac_try='./$ac_file' { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 (eval $ac_try) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); }; }; then cross_compiling=no else if test "$cross_compiling" = maybe; then cross_compiling=yes else { { echo "$as_me:$LINENO: error: cannot run C compiled programs. If you meant to cross compile, use \`--host'. See \`config.log' for more details." >&5 echo "$as_me: error: cannot run C compiled programs. If you meant to cross compile, use \`--host'. See \`config.log' for more details." >&2;} { (exit 1); exit 1; }; } fi fi fi echo "$as_me:$LINENO: result: yes" >&5 echo "${ECHO_T}yes" >&6 rm -f a.out a.exe conftest$ac_cv_exeext b.out ac_clean_files=$ac_clean_files_save # Check the compiler produces executables we can run. If not, either # the compiler is broken, or we cross compile. echo "$as_me:$LINENO: checking whether we are cross compiling" >&5 echo $ECHO_N "checking whether we are cross compiling... $ECHO_C" >&6 echo "$as_me:$LINENO: result: $cross_compiling" >&5 echo "${ECHO_T}$cross_compiling" >&6 echo "$as_me:$LINENO: checking for suffix of executables" >&5 echo $ECHO_N "checking for suffix of executables... $ECHO_C" >&6 if { (eval echo "$as_me:$LINENO: \"$ac_link\"") >&5 (eval $ac_link) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); }; then # If both `conftest.exe' and `conftest' are `present' (well, observable) # catch `conftest.exe'. For instance with Cygwin, `ls conftest' will # work properly (i.e., refer to `conftest.exe'), while it won't with # `rm'. 
for ac_file in conftest.exe conftest conftest.*; do test -f "$ac_file" || continue case $ac_file in *.$ac_ext | *.xcoff | *.tds | *.d | *.pdb | *.xSYM | *.bb | *.bbg | *.o | *.obj ) ;; *.* ) ac_cv_exeext=`expr "$ac_file" : '[^.]*\(\..*\)'` export ac_cv_exeext break;; * ) break;; esac done else { { echo "$as_me:$LINENO: error: cannot compute suffix of executables: cannot compile and link See \`config.log' for more details." >&5 echo "$as_me: error: cannot compute suffix of executables: cannot compile and link See \`config.log' for more details." >&2;} { (exit 1); exit 1; }; } fi rm -f conftest$ac_cv_exeext echo "$as_me:$LINENO: result: $ac_cv_exeext" >&5 echo "${ECHO_T}$ac_cv_exeext" >&6 rm -f conftest.$ac_ext EXEEXT=$ac_cv_exeext ac_exeext=$EXEEXT echo "$as_me:$LINENO: checking for suffix of object files" >&5 echo $ECHO_N "checking for suffix of object files... $ECHO_C" >&6 if test "${ac_cv_objext+set}" = set; then echo $ECHO_N "(cached) $ECHO_C" >&6 else cat >conftest.$ac_ext <<_ACEOF /* confdefs.h. */ _ACEOF cat confdefs.h >>conftest.$ac_ext cat >>conftest.$ac_ext <<_ACEOF /* end confdefs.h. */ int main () { ; return 0; } _ACEOF rm -f conftest.o conftest.obj if { (eval echo "$as_me:$LINENO: \"$ac_compile\"") >&5 (eval $ac_compile) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); }; then for ac_file in `(ls conftest.o conftest.obj; ls conftest.*) 2>/dev/null`; do case $ac_file in *.$ac_ext | *.xcoff | *.tds | *.d | *.pdb | *.xSYM | *.bb | *.bbg ) ;; *) ac_cv_objext=`expr "$ac_file" : '.*\.\(.*\)'` break;; esac done else echo "$as_me: failed program was:" >&5 sed 's/^/| /' conftest.$ac_ext >&5 { { echo "$as_me:$LINENO: error: cannot compute suffix of object files: cannot compile See \`config.log' for more details." >&5 echo "$as_me: error: cannot compute suffix of object files: cannot compile See \`config.log' for more details." 
>&2;} { (exit 1); exit 1; }; } fi rm -f conftest.$ac_cv_objext conftest.$ac_ext fi echo "$as_me:$LINENO: result: $ac_cv_objext" >&5 echo "${ECHO_T}$ac_cv_objext" >&6 OBJEXT=$ac_cv_objext ac_objext=$OBJEXT echo "$as_me:$LINENO: checking whether we are using the GNU C compiler" >&5 echo $ECHO_N "checking whether we are using the GNU C compiler... $ECHO_C" >&6 if test "${ac_cv_c_compiler_gnu+set}" = set; then echo $ECHO_N "(cached) $ECHO_C" >&6 else cat >conftest.$ac_ext <<_ACEOF /* confdefs.h. */ _ACEOF cat confdefs.h >>conftest.$ac_ext cat >>conftest.$ac_ext <<_ACEOF /* end confdefs.h. */ int main () { #ifndef __GNUC__ choke me #endif ; return 0; } _ACEOF rm -f conftest.$ac_objext if { (eval echo "$as_me:$LINENO: \"$ac_compile\"") >&5 (eval $ac_compile) 2>conftest.er1 ac_status=$? grep -v '^ *+' conftest.er1 >conftest.err rm -f conftest.er1 cat conftest.err >&5 echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); } && { ac_try='test -z "$ac_c_werror_flag" || test ! -s conftest.err' { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 (eval $ac_try) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); }; } && { ac_try='test -s conftest.$ac_objext' { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 (eval $ac_try) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); }; }; then ac_compiler_gnu=yes else echo "$as_me: failed program was:" >&5 sed 's/^/| /' conftest.$ac_ext >&5 ac_compiler_gnu=no fi rm -f conftest.err conftest.$ac_objext conftest.$ac_ext ac_cv_c_compiler_gnu=$ac_compiler_gnu fi echo "$as_me:$LINENO: result: $ac_cv_c_compiler_gnu" >&5 echo "${ECHO_T}$ac_cv_c_compiler_gnu" >&6 GCC=`test $ac_compiler_gnu = yes && echo yes` ac_test_CFLAGS=${CFLAGS+set} ac_save_CFLAGS=$CFLAGS CFLAGS="-g" echo "$as_me:$LINENO: checking whether $CC accepts -g" >&5 echo $ECHO_N "checking whether $CC accepts -g... 
$ECHO_C" >&6 if test "${ac_cv_prog_cc_g+set}" = set; then echo $ECHO_N "(cached) $ECHO_C" >&6 else cat >conftest.$ac_ext <<_ACEOF /* confdefs.h. */ _ACEOF cat confdefs.h >>conftest.$ac_ext cat >>conftest.$ac_ext <<_ACEOF /* end confdefs.h. */ int main () { ; return 0; } _ACEOF rm -f conftest.$ac_objext if { (eval echo "$as_me:$LINENO: \"$ac_compile\"") >&5 (eval $ac_compile) 2>conftest.er1 ac_status=$? grep -v '^ *+' conftest.er1 >conftest.err rm -f conftest.er1 cat conftest.err >&5 echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); } && { ac_try='test -z "$ac_c_werror_flag" || test ! -s conftest.err' { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 (eval $ac_try) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); }; } && { ac_try='test -s conftest.$ac_objext' { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 (eval $ac_try) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); }; }; then ac_cv_prog_cc_g=yes else echo "$as_me: failed program was:" >&5 sed 's/^/| /' conftest.$ac_ext >&5 ac_cv_prog_cc_g=no fi rm -f conftest.err conftest.$ac_objext conftest.$ac_ext fi echo "$as_me:$LINENO: result: $ac_cv_prog_cc_g" >&5 echo "${ECHO_T}$ac_cv_prog_cc_g" >&6 if test "$ac_test_CFLAGS" = set; then CFLAGS=$ac_save_CFLAGS elif test $ac_cv_prog_cc_g = yes; then if test "$GCC" = yes; then CFLAGS="-g -O2" else CFLAGS="-g" fi else if test "$GCC" = yes; then CFLAGS="-O2" else CFLAGS= fi fi echo "$as_me:$LINENO: checking for $CC option to accept ANSI C" >&5 echo $ECHO_N "checking for $CC option to accept ANSI C... $ECHO_C" >&6 if test "${ac_cv_prog_cc_stdc+set}" = set; then echo $ECHO_N "(cached) $ECHO_C" >&6 else ac_cv_prog_cc_stdc=no ac_save_CC=$CC cat >conftest.$ac_ext <<_ACEOF /* confdefs.h. */ _ACEOF cat confdefs.h >>conftest.$ac_ext cat >>conftest.$ac_ext <<_ACEOF /* end confdefs.h. */ #include <stdarg.h> #include <stdio.h> #include <sys/types.h> #include <sys/stat.h> /* Most of the following tests are stolen from RCS 5.7's src/conf.sh. 
*/ struct buf { int x; }; FILE * (*rcsopen) (struct buf *, struct stat *, int); static char *e (p, i) char **p; int i; { return p[i]; } static char *f (char * (*g) (char **, int), char **p, ...) { char *s; va_list v; va_start (v,p); s = g (p, va_arg (v,int)); va_end (v); return s; } /* OSF 4.0 Compaq cc is some sort of almost-ANSI by default. It has function prototypes and stuff, but not '\xHH' hex character constants. These don't provoke an error unfortunately, instead are silently treated as 'x'. The following induces an error, until -std1 is added to get proper ANSI mode. Curiously '\x00'!='x' always comes out true, for an array size at least. It's necessary to write '\x00'==0 to get something that's true only with -std1. */ int osf4_cc_array ['\x00' == 0 ? 1 : -1]; int test (int i, double x); struct s1 {int (*f) (int a);}; struct s2 {int (*f) (double a);}; int pairnames (int, char **, FILE *(*)(struct buf *, struct stat *, int), int, int); int argc; char **argv; int main () { return f (e, argv, 0) != argv[0] || f (e, argv, 1) != argv[1]; ; return 0; } _ACEOF # Don't try gcc -ansi; that turns off useful extensions and # breaks some systems' header files. # AIX -qlanglvl=ansi # Ultrix and OSF/1 -std1 # HP-UX 10.20 and later -Ae # HP-UX older versions -Aa -D_HPUX_SOURCE # SVR4 -Xc -D__EXTENSIONS__ for ac_arg in "" -qlanglvl=ansi -std1 -Ae "-Aa -D_HPUX_SOURCE" "-Xc -D__EXTENSIONS__" do CC="$ac_save_CC $ac_arg" rm -f conftest.$ac_objext if { (eval echo "$as_me:$LINENO: \"$ac_compile\"") >&5 (eval $ac_compile) 2>conftest.er1 ac_status=$? grep -v '^ *+' conftest.er1 >conftest.err rm -f conftest.er1 cat conftest.err >&5 echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); } && { ac_try='test -z "$ac_c_werror_flag" || test ! -s conftest.err' { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 (eval $ac_try) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? 
= $ac_status" >&5 (exit $ac_status); }; } && { ac_try='test -s conftest.$ac_objext' { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 (eval $ac_try) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); }; }; then ac_cv_prog_cc_stdc=$ac_arg break else echo "$as_me: failed program was:" >&5 sed 's/^/| /' conftest.$ac_ext >&5 fi rm -f conftest.err conftest.$ac_objext done rm -f conftest.$ac_ext conftest.$ac_objext CC=$ac_save_CC fi case "x$ac_cv_prog_cc_stdc" in x|xno) echo "$as_me:$LINENO: result: none needed" >&5 echo "${ECHO_T}none needed" >&6 ;; *) echo "$as_me:$LINENO: result: $ac_cv_prog_cc_stdc" >&5 echo "${ECHO_T}$ac_cv_prog_cc_stdc" >&6 CC="$CC $ac_cv_prog_cc_stdc" ;; esac # Some people use a C++ compiler to compile C. Since we use `exit', # in C++ we need to declare it. In case someone uses the same compiler # for both compiling C and C++ we need to have the C++ compiler decide # the declaration of exit, since it's the most demanding environment. cat >conftest.$ac_ext <<_ACEOF #ifndef __cplusplus choke me #endif _ACEOF rm -f conftest.$ac_objext if { (eval echo "$as_me:$LINENO: \"$ac_compile\"") >&5 (eval $ac_compile) 2>conftest.er1 ac_status=$? grep -v '^ *+' conftest.er1 >conftest.err rm -f conftest.er1 cat conftest.err >&5 echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); } && { ac_try='test -z "$ac_c_werror_flag" || test ! -s conftest.err' { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 (eval $ac_try) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); }; } && { ac_try='test -s conftest.$ac_objext' { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 (eval $ac_try) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? 
= $ac_status" >&5 (exit $ac_status); }; }; then for ac_declaration in \ '' \ 'extern "C" void std::exit (int) throw (); using std::exit;' \ 'extern "C" void std::exit (int); using std::exit;' \ 'extern "C" void exit (int) throw ();' \ 'extern "C" void exit (int);' \ 'void exit (int);' do cat >conftest.$ac_ext <<_ACEOF /* confdefs.h. */ _ACEOF cat confdefs.h >>conftest.$ac_ext cat >>conftest.$ac_ext <<_ACEOF /* end confdefs.h. */ $ac_declaration #include <stdlib.h> int main () { exit (42); ; return 0; } _ACEOF rm -f conftest.$ac_objext if { (eval echo "$as_me:$LINENO: \"$ac_compile\"") >&5 (eval $ac_compile) 2>conftest.er1 ac_status=$? grep -v '^ *+' conftest.er1 >conftest.err rm -f conftest.er1 cat conftest.err >&5 echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); } && { ac_try='test -z "$ac_c_werror_flag" || test ! -s conftest.err' { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 (eval $ac_try) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); }; } && { ac_try='test -s conftest.$ac_objext' { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 (eval $ac_try) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); }; }; then : else echo "$as_me: failed program was:" >&5 sed 's/^/| /' conftest.$ac_ext >&5 continue fi rm -f conftest.err conftest.$ac_objext conftest.$ac_ext cat >conftest.$ac_ext <<_ACEOF /* confdefs.h. */ _ACEOF cat confdefs.h >>conftest.$ac_ext cat >>conftest.$ac_ext <<_ACEOF /* end confdefs.h. */ $ac_declaration int main () { exit (42); ; return 0; } _ACEOF rm -f conftest.$ac_objext if { (eval echo "$as_me:$LINENO: \"$ac_compile\"") >&5 (eval $ac_compile) 2>conftest.er1 ac_status=$? grep -v '^ *+' conftest.er1 >conftest.err rm -f conftest.er1 cat conftest.err >&5 echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); } && { ac_try='test -z "$ac_c_werror_flag" || test ! -s conftest.err' { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 (eval $ac_try) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? 
= $ac_status" >&5 (exit $ac_status); }; } && { ac_try='test -s conftest.$ac_objext' { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 (eval $ac_try) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); }; }; then break else echo "$as_me: failed program was:" >&5 sed 's/^/| /' conftest.$ac_ext >&5 fi rm -f conftest.err conftest.$ac_objext conftest.$ac_ext done rm -f conftest* if test -n "$ac_declaration"; then echo '#ifdef __cplusplus' >>confdefs.h echo $ac_declaration >>confdefs.h echo '#endif' >>confdefs.h fi else echo "$as_me: failed program was:" >&5 sed 's/^/| /' conftest.$ac_ext >&5 fi rm -f conftest.err conftest.$ac_objext conftest.$ac_ext ac_ext=c ac_cpp='$CPP $CPPFLAGS' ac_compile='$CC -c $CFLAGS $CPPFLAGS conftest.$ac_ext >&5' ac_link='$CC -o conftest$ac_exeext $CFLAGS $CPPFLAGS $LDFLAGS conftest.$ac_ext $LIBS >&5' ac_compiler_gnu=$ac_cv_c_compiler_gnu DEPDIR="${am__leading_dot}deps" ac_config_commands="$ac_config_commands depfiles" am_make=${MAKE-make} cat > confinc << 'END' am__doit: @echo done .PHONY: am__doit END # If we don't find an include directive, just comment out the code. echo "$as_me:$LINENO: checking for style of include used by $am_make" >&5 echo $ECHO_N "checking for style of include used by $am_make... $ECHO_C" >&6 am__include="#" am__quote= _am_result=none # First try GNU make style include. echo "include confinc" > confmf # We grep out `Entering directory' and `Leaving directory' # messages which can occur if `w' ends up in MAKEFLAGS. # In particular we don't look at `^make:' because GNU make might # be invoked under some other name (usually "gmake"), in which # case it prints its new name instead of `make'. if test "`$am_make -s -f confmf 2> /dev/null | grep -v 'ing directory'`" = "done"; then am__include=include am__quote= _am_result=GNU fi # Now try BSD make style include. 
if test "$am__include" = "#"; then echo '.include "confinc"' > confmf if test "`$am_make -s -f confmf 2> /dev/null`" = "done"; then am__include=.include am__quote="\"" _am_result=BSD fi fi echo "$as_me:$LINENO: result: $_am_result" >&5 echo "${ECHO_T}$_am_result" >&6 rm -f confinc confmf # Check whether --enable-dependency-tracking or --disable-dependency-tracking was given. if test "${enable_dependency_tracking+set}" = set; then enableval="$enable_dependency_tracking" fi; if test "x$enable_dependency_tracking" != xno; then am_depcomp="$ac_aux_dir/depcomp" AMDEPBACKSLASH='\' fi if test "x$enable_dependency_tracking" != xno; then AMDEP_TRUE= AMDEP_FALSE='#' else AMDEP_TRUE='#' AMDEP_FALSE= fi depcc="$CC" am_compiler_list= echo "$as_me:$LINENO: checking dependency style of $depcc" >&5 echo $ECHO_N "checking dependency style of $depcc... $ECHO_C" >&6 if test "${am_cv_CC_dependencies_compiler_type+set}" = set; then echo $ECHO_N "(cached) $ECHO_C" >&6 else if test -z "$AMDEP_TRUE" && test -f "$am_depcomp"; then # We make a subdir and do the tests there. Otherwise we can end up # making bogus files that we don't know about and never remove. For # instance it was reported that on HP-UX the gcc test will end up # making a dummy file named `D' -- because `-MD' means `put the output # in D'. mkdir conftest.dir # Copy depcomp to subdir because otherwise we won't find it if we're # using a relative directory. cp "$am_depcomp" conftest.dir cd conftest.dir # We will build objects and dependencies in a subdirectory because # it helps to detect inapplicable dependency modes. For instance # both Tru64's cc and ICC support -MD to output dependencies as a # side effect of compilation, but ICC will put the dependencies in # the current directory while Tru64 will put them in the object # directory. 
mkdir sub am_cv_CC_dependencies_compiler_type=none if test "$am_compiler_list" = ""; then am_compiler_list=`sed -n 's/^#*\([a-zA-Z0-9]*\))$/\1/p' < ./depcomp` fi for depmode in $am_compiler_list; do # Setup a source with many dependencies, because some compilers # like to wrap large dependency lists on column 80 (with \), and # we should not choose a depcomp mode which is confused by this. # # We need to recreate these files for each test, as the compiler may # overwrite some of them when testing with obscure command lines. # This happens at least with the AIX C compiler. : > sub/conftest.c for i in 1 2 3 4 5 6; do echo '#include "conftst'$i'.h"' >> sub/conftest.c # Using `: > sub/conftst$i.h' creates only sub/conftst1.h with # Solaris 8's {/usr,}/bin/sh. touch sub/conftst$i.h done echo "${am__include} ${am__quote}sub/conftest.Po${am__quote}" > confmf case $depmode in nosideeffect) # after this tag, mechanisms are not by side-effect, so they'll # only be used when explicitly requested if test "x$enable_dependency_tracking" = xyes; then continue else break fi ;; none) break ;; esac # We check with `-c' and `-o' for the sake of the "dashmstdout" # mode. It turns out that the SunPro C++ compiler does not properly # handle `-M -o', and we need to detect this. if depmode=$depmode \ source=sub/conftest.c object=sub/conftest.${OBJEXT-o} \ depfile=sub/conftest.Po tmpdepfile=sub/conftest.TPo \ $SHELL ./depcomp $depcc -c -o sub/conftest.${OBJEXT-o} sub/conftest.c \ >/dev/null 2>conftest.err && grep sub/conftst6.h sub/conftest.Po > /dev/null 2>&1 && grep sub/conftest.${OBJEXT-o} sub/conftest.Po > /dev/null 2>&1 && ${MAKE-make} -s -f confmf > /dev/null 2>&1; then # icc doesn't choke on unknown options, it will just issue warnings # or remarks (even with -Werror). So we grep stderr for any message # that says an option was ignored or not supported. 
# When given -MP, icc 7.0 and 7.1 complain thusly: # icc: Command line warning: ignoring option '-M'; no argument required # The diagnosis changed in icc 8.0: # icc: Command line remark: option '-MP' not supported if (grep 'ignoring option' conftest.err || grep 'not supported' conftest.err) >/dev/null 2>&1; then :; else am_cv_CC_dependencies_compiler_type=$depmode break fi fi done cd .. rm -rf conftest.dir else am_cv_CC_dependencies_compiler_type=none fi fi echo "$as_me:$LINENO: result: $am_cv_CC_dependencies_compiler_type" >&5 echo "${ECHO_T}$am_cv_CC_dependencies_compiler_type" >&6 CCDEPMODE=depmode=$am_cv_CC_dependencies_compiler_type if test "x$enable_dependency_tracking" != xno \ && test "$am_cv_CC_dependencies_compiler_type" = gcc3; then am__fastdepCC_TRUE= am__fastdepCC_FALSE='#' else am__fastdepCC_TRUE='#' am__fastdepCC_FALSE= fi ac_ext=c ac_cpp='$CPP $CPPFLAGS' ac_compile='$CC -c $CFLAGS $CPPFLAGS conftest.$ac_ext >&5' ac_link='$CC -o conftest$ac_exeext $CFLAGS $CPPFLAGS $LDFLAGS conftest.$ac_ext $LIBS >&5' ac_compiler_gnu=$ac_cv_c_compiler_gnu if test -n "$ac_tool_prefix"; then # Extract the first word of "${ac_tool_prefix}gcc", so it can be a program name with args. set dummy ${ac_tool_prefix}gcc; ac_word=$2 echo "$as_me:$LINENO: checking for $ac_word" >&5 echo $ECHO_N "checking for $ac_word... $ECHO_C" >&6 if test "${ac_cv_prog_CC+set}" = set; then echo $ECHO_N "(cached) $ECHO_C" >&6 else if test -n "$CC"; then ac_cv_prog_CC="$CC" # Let the user override the test. else as_save_IFS=$IFS; IFS=$PATH_SEPARATOR for as_dir in $PATH do IFS=$as_save_IFS test -z "$as_dir" && as_dir=. 
for ac_exec_ext in '' $ac_executable_extensions; do if $as_executable_p "$as_dir/$ac_word$ac_exec_ext"; then ac_cv_prog_CC="${ac_tool_prefix}gcc" echo "$as_me:$LINENO: found $as_dir/$ac_word$ac_exec_ext" >&5 break 2 fi done done fi fi CC=$ac_cv_prog_CC if test -n "$CC"; then echo "$as_me:$LINENO: result: $CC" >&5 echo "${ECHO_T}$CC" >&6 else echo "$as_me:$LINENO: result: no" >&5 echo "${ECHO_T}no" >&6 fi fi if test -z "$ac_cv_prog_CC"; then ac_ct_CC=$CC # Extract the first word of "gcc", so it can be a program name with args. set dummy gcc; ac_word=$2 echo "$as_me:$LINENO: checking for $ac_word" >&5 echo $ECHO_N "checking for $ac_word... $ECHO_C" >&6 if test "${ac_cv_prog_ac_ct_CC+set}" = set; then echo $ECHO_N "(cached) $ECHO_C" >&6 else if test -n "$ac_ct_CC"; then ac_cv_prog_ac_ct_CC="$ac_ct_CC" # Let the user override the test. else as_save_IFS=$IFS; IFS=$PATH_SEPARATOR for as_dir in $PATH do IFS=$as_save_IFS test -z "$as_dir" && as_dir=. for ac_exec_ext in '' $ac_executable_extensions; do if $as_executable_p "$as_dir/$ac_word$ac_exec_ext"; then ac_cv_prog_ac_ct_CC="gcc" echo "$as_me:$LINENO: found $as_dir/$ac_word$ac_exec_ext" >&5 break 2 fi done done fi fi ac_ct_CC=$ac_cv_prog_ac_ct_CC if test -n "$ac_ct_CC"; then echo "$as_me:$LINENO: result: $ac_ct_CC" >&5 echo "${ECHO_T}$ac_ct_CC" >&6 else echo "$as_me:$LINENO: result: no" >&5 echo "${ECHO_T}no" >&6 fi CC=$ac_ct_CC else CC="$ac_cv_prog_CC" fi if test -z "$CC"; then if test -n "$ac_tool_prefix"; then # Extract the first word of "${ac_tool_prefix}cc", so it can be a program name with args. set dummy ${ac_tool_prefix}cc; ac_word=$2 echo "$as_me:$LINENO: checking for $ac_word" >&5 echo $ECHO_N "checking for $ac_word... $ECHO_C" >&6 if test "${ac_cv_prog_CC+set}" = set; then echo $ECHO_N "(cached) $ECHO_C" >&6 else if test -n "$CC"; then ac_cv_prog_CC="$CC" # Let the user override the test. else as_save_IFS=$IFS; IFS=$PATH_SEPARATOR for as_dir in $PATH do IFS=$as_save_IFS test -z "$as_dir" && as_dir=. 
for ac_exec_ext in '' $ac_executable_extensions; do if $as_executable_p "$as_dir/$ac_word$ac_exec_ext"; then ac_cv_prog_CC="${ac_tool_prefix}cc" echo "$as_me:$LINENO: found $as_dir/$ac_word$ac_exec_ext" >&5 break 2 fi done done fi fi CC=$ac_cv_prog_CC if test -n "$CC"; then echo "$as_me:$LINENO: result: $CC" >&5 echo "${ECHO_T}$CC" >&6 else echo "$as_me:$LINENO: result: no" >&5 echo "${ECHO_T}no" >&6 fi fi if test -z "$ac_cv_prog_CC"; then ac_ct_CC=$CC # Extract the first word of "cc", so it can be a program name with args. set dummy cc; ac_word=$2 echo "$as_me:$LINENO: checking for $ac_word" >&5 echo $ECHO_N "checking for $ac_word... $ECHO_C" >&6 if test "${ac_cv_prog_ac_ct_CC+set}" = set; then echo $ECHO_N "(cached) $ECHO_C" >&6 else if test -n "$ac_ct_CC"; then ac_cv_prog_ac_ct_CC="$ac_ct_CC" # Let the user override the test. else as_save_IFS=$IFS; IFS=$PATH_SEPARATOR for as_dir in $PATH do IFS=$as_save_IFS test -z "$as_dir" && as_dir=. for ac_exec_ext in '' $ac_executable_extensions; do if $as_executable_p "$as_dir/$ac_word$ac_exec_ext"; then ac_cv_prog_ac_ct_CC="cc" echo "$as_me:$LINENO: found $as_dir/$ac_word$ac_exec_ext" >&5 break 2 fi done done fi fi ac_ct_CC=$ac_cv_prog_ac_ct_CC if test -n "$ac_ct_CC"; then echo "$as_me:$LINENO: result: $ac_ct_CC" >&5 echo "${ECHO_T}$ac_ct_CC" >&6 else echo "$as_me:$LINENO: result: no" >&5 echo "${ECHO_T}no" >&6 fi CC=$ac_ct_CC else CC="$ac_cv_prog_CC" fi fi if test -z "$CC"; then # Extract the first word of "cc", so it can be a program name with args. set dummy cc; ac_word=$2 echo "$as_me:$LINENO: checking for $ac_word" >&5 echo $ECHO_N "checking for $ac_word... $ECHO_C" >&6 if test "${ac_cv_prog_CC+set}" = set; then echo $ECHO_N "(cached) $ECHO_C" >&6 else if test -n "$CC"; then ac_cv_prog_CC="$CC" # Let the user override the test. else ac_prog_rejected=no as_save_IFS=$IFS; IFS=$PATH_SEPARATOR for as_dir in $PATH do IFS=$as_save_IFS test -z "$as_dir" && as_dir=. 
for ac_exec_ext in '' $ac_executable_extensions; do if $as_executable_p "$as_dir/$ac_word$ac_exec_ext"; then if test "$as_dir/$ac_word$ac_exec_ext" = "/usr/ucb/cc"; then ac_prog_rejected=yes continue fi ac_cv_prog_CC="cc" echo "$as_me:$LINENO: found $as_dir/$ac_word$ac_exec_ext" >&5 break 2 fi done done if test $ac_prog_rejected = yes; then # We found a bogon in the path, so make sure we never use it. set dummy $ac_cv_prog_CC shift if test $# != 0; then # We chose a different compiler from the bogus one. # However, it has the same basename, so the bogon will be chosen # first if we set CC to just the basename; use the full file name. shift ac_cv_prog_CC="$as_dir/$ac_word${1+' '}$@" fi fi fi fi CC=$ac_cv_prog_CC if test -n "$CC"; then echo "$as_me:$LINENO: result: $CC" >&5 echo "${ECHO_T}$CC" >&6 else echo "$as_me:$LINENO: result: no" >&5 echo "${ECHO_T}no" >&6 fi fi if test -z "$CC"; then if test -n "$ac_tool_prefix"; then for ac_prog in cl do # Extract the first word of "$ac_tool_prefix$ac_prog", so it can be a program name with args. set dummy $ac_tool_prefix$ac_prog; ac_word=$2 echo "$as_me:$LINENO: checking for $ac_word" >&5 echo $ECHO_N "checking for $ac_word... $ECHO_C" >&6 if test "${ac_cv_prog_CC+set}" = set; then echo $ECHO_N "(cached) $ECHO_C" >&6 else if test -n "$CC"; then ac_cv_prog_CC="$CC" # Let the user override the test. else as_save_IFS=$IFS; IFS=$PATH_SEPARATOR for as_dir in $PATH do IFS=$as_save_IFS test -z "$as_dir" && as_dir=. 
for ac_exec_ext in '' $ac_executable_extensions; do if $as_executable_p "$as_dir/$ac_word$ac_exec_ext"; then ac_cv_prog_CC="$ac_tool_prefix$ac_prog" echo "$as_me:$LINENO: found $as_dir/$ac_word$ac_exec_ext" >&5 break 2 fi done done fi fi CC=$ac_cv_prog_CC if test -n "$CC"; then echo "$as_me:$LINENO: result: $CC" >&5 echo "${ECHO_T}$CC" >&6 else echo "$as_me:$LINENO: result: no" >&5 echo "${ECHO_T}no" >&6 fi test -n "$CC" && break done fi if test -z "$CC"; then ac_ct_CC=$CC for ac_prog in cl do # Extract the first word of "$ac_prog", so it can be a program name with args. set dummy $ac_prog; ac_word=$2 echo "$as_me:$LINENO: checking for $ac_word" >&5 echo $ECHO_N "checking for $ac_word... $ECHO_C" >&6 if test "${ac_cv_prog_ac_ct_CC+set}" = set; then echo $ECHO_N "(cached) $ECHO_C" >&6 else if test -n "$ac_ct_CC"; then ac_cv_prog_ac_ct_CC="$ac_ct_CC" # Let the user override the test. else as_save_IFS=$IFS; IFS=$PATH_SEPARATOR for as_dir in $PATH do IFS=$as_save_IFS test -z "$as_dir" && as_dir=. for ac_exec_ext in '' $ac_executable_extensions; do if $as_executable_p "$as_dir/$ac_word$ac_exec_ext"; then ac_cv_prog_ac_ct_CC="$ac_prog" echo "$as_me:$LINENO: found $as_dir/$ac_word$ac_exec_ext" >&5 break 2 fi done done fi fi ac_ct_CC=$ac_cv_prog_ac_ct_CC if test -n "$ac_ct_CC"; then echo "$as_me:$LINENO: result: $ac_ct_CC" >&5 echo "${ECHO_T}$ac_ct_CC" >&6 else echo "$as_me:$LINENO: result: no" >&5 echo "${ECHO_T}no" >&6 fi test -n "$ac_ct_CC" && break done CC=$ac_ct_CC fi fi test -z "$CC" && { { echo "$as_me:$LINENO: error: no acceptable C compiler found in \$PATH See \`config.log' for more details." >&5 echo "$as_me: error: no acceptable C compiler found in \$PATH See \`config.log' for more details." >&2;} { (exit 1); exit 1; }; } # Provide some information about the compiler. 
echo "$as_me:$LINENO:" \ "checking for C compiler version" >&5 ac_compiler=`set X $ac_compile; echo $2` { (eval echo "$as_me:$LINENO: \"$ac_compiler --version </dev/null >&5\"") >&5 (eval $ac_compiler --version </dev/null >&5) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); } { (eval echo "$as_me:$LINENO: \"$ac_compiler -v </dev/null >&5\"") >&5 (eval $ac_compiler -v </dev/null >&5) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); } { (eval echo "$as_me:$LINENO: \"$ac_compiler -V </dev/null >&5\"") >&5 (eval $ac_compiler -V </dev/null >&5) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); } echo "$as_me:$LINENO: checking whether we are using the GNU C compiler" >&5 echo $ECHO_N "checking whether we are using the GNU C compiler... $ECHO_C" >&6 if test "${ac_cv_c_compiler_gnu+set}" = set; then echo $ECHO_N "(cached) $ECHO_C" >&6 else cat >conftest.$ac_ext <<_ACEOF /* confdefs.h. */ _ACEOF cat confdefs.h >>conftest.$ac_ext cat >>conftest.$ac_ext <<_ACEOF /* end confdefs.h. */ int main () { #ifndef __GNUC__ choke me #endif ; return 0; } _ACEOF rm -f conftest.$ac_objext if { (eval echo "$as_me:$LINENO: \"$ac_compile\"") >&5 (eval $ac_compile) 2>conftest.er1 ac_status=$? grep -v '^ *+' conftest.er1 >conftest.err rm -f conftest.er1 cat conftest.err >&5 echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); } && { ac_try='test -z "$ac_c_werror_flag" || test ! -s conftest.err' { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 (eval $ac_try) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); }; } && { ac_try='test -s conftest.$ac_objext' { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 (eval $ac_try) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? 
= $ac_status" >&5 (exit $ac_status); }; }; then ac_compiler_gnu=yes else echo "$as_me: failed program was:" >&5 sed 's/^/| /' conftest.$ac_ext >&5 ac_compiler_gnu=no fi rm -f conftest.err conftest.$ac_objext conftest.$ac_ext ac_cv_c_compiler_gnu=$ac_compiler_gnu fi echo "$as_me:$LINENO: result: $ac_cv_c_compiler_gnu" >&5 echo "${ECHO_T}$ac_cv_c_compiler_gnu" >&6 GCC=`test $ac_compiler_gnu = yes && echo yes` ac_test_CFLAGS=${CFLAGS+set} ac_save_CFLAGS=$CFLAGS CFLAGS="-g" echo "$as_me:$LINENO: checking whether $CC accepts -g" >&5 echo $ECHO_N "checking whether $CC accepts -g... $ECHO_C" >&6 if test "${ac_cv_prog_cc_g+set}" = set; then echo $ECHO_N "(cached) $ECHO_C" >&6 else cat >conftest.$ac_ext <<_ACEOF /* confdefs.h. */ _ACEOF cat confdefs.h >>conftest.$ac_ext cat >>conftest.$ac_ext <<_ACEOF /* end confdefs.h. */ int main () { ; return 0; } _ACEOF rm -f conftest.$ac_objext if { (eval echo "$as_me:$LINENO: \"$ac_compile\"") >&5 (eval $ac_compile) 2>conftest.er1 ac_status=$? grep -v '^ *+' conftest.er1 >conftest.err rm -f conftest.er1 cat conftest.err >&5 echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); } && { ac_try='test -z "$ac_c_werror_flag" || test ! -s conftest.err' { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 (eval $ac_try) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); }; } && { ac_try='test -s conftest.$ac_objext' { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 (eval $ac_try) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? 
= $ac_status" >&5 (exit $ac_status); }; }; then ac_cv_prog_cc_g=yes else echo "$as_me: failed program was:" >&5 sed 's/^/| /' conftest.$ac_ext >&5 ac_cv_prog_cc_g=no fi rm -f conftest.err conftest.$ac_objext conftest.$ac_ext fi echo "$as_me:$LINENO: result: $ac_cv_prog_cc_g" >&5 echo "${ECHO_T}$ac_cv_prog_cc_g" >&6 if test "$ac_test_CFLAGS" = set; then CFLAGS=$ac_save_CFLAGS elif test $ac_cv_prog_cc_g = yes; then if test "$GCC" = yes; then CFLAGS="-g -O2" else CFLAGS="-g" fi else if test "$GCC" = yes; then CFLAGS="-O2" else CFLAGS= fi fi echo "$as_me:$LINENO: checking for $CC option to accept ANSI C" >&5 echo $ECHO_N "checking for $CC option to accept ANSI C... $ECHO_C" >&6 if test "${ac_cv_prog_cc_stdc+set}" = set; then echo $ECHO_N "(cached) $ECHO_C" >&6 else ac_cv_prog_cc_stdc=no ac_save_CC=$CC cat >conftest.$ac_ext <<_ACEOF /* confdefs.h. */ _ACEOF cat confdefs.h >>conftest.$ac_ext cat >>conftest.$ac_ext <<_ACEOF /* end confdefs.h. */ #include <stdarg.h> #include <stdio.h> #include <sys/types.h> #include <sys/stat.h> /* Most of the following tests are stolen from RCS 5.7's src/conf.sh. */ struct buf { int x; }; FILE * (*rcsopen) (struct buf *, struct stat *, int); static char *e (p, i) char **p; int i; { return p[i]; } static char *f (char * (*g) (char **, int), char **p, ...) { char *s; va_list v; va_start (v,p); s = g (p, va_arg (v,int)); va_end (v); return s; } /* OSF 4.0 Compaq cc is some sort of almost-ANSI by default. It has function prototypes and stuff, but not '\xHH' hex character constants. These don't provoke an error unfortunately, instead are silently treated as 'x'. The following induces an error, until -std1 is added to get proper ANSI mode. Curiously '\x00'!='x' always comes out true, for an array size at least. It's necessary to write '\x00'==0 to get something that's true only with -std1. */ int osf4_cc_array ['\x00' == 0 ? 
1 : -1]; int test (int i, double x); struct s1 {int (*f) (int a);}; struct s2 {int (*f) (double a);}; int pairnames (int, char **, FILE *(*)(struct buf *, struct stat *, int), int, int); int argc; char **argv; int main () { return f (e, argv, 0) != argv[0] || f (e, argv, 1) != argv[1]; ; return 0; } _ACEOF # Don't try gcc -ansi; that turns off useful extensions and # breaks some systems' header files. # AIX -qlanglvl=ansi # Ultrix and OSF/1 -std1 # HP-UX 10.20 and later -Ae # HP-UX older versions -Aa -D_HPUX_SOURCE # SVR4 -Xc -D__EXTENSIONS__ for ac_arg in "" -qlanglvl=ansi -std1 -Ae "-Aa -D_HPUX_SOURCE" "-Xc -D__EXTENSIONS__" do CC="$ac_save_CC $ac_arg" rm -f conftest.$ac_objext if { (eval echo "$as_me:$LINENO: \"$ac_compile\"") >&5 (eval $ac_compile) 2>conftest.er1 ac_status=$? grep -v '^ *+' conftest.er1 >conftest.err rm -f conftest.er1 cat conftest.err >&5 echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); } && { ac_try='test -z "$ac_c_werror_flag" || test ! -s conftest.err' { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 (eval $ac_try) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); }; } && { ac_try='test -s conftest.$ac_objext' { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 (eval $ac_try) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); }; }; then ac_cv_prog_cc_stdc=$ac_arg break else echo "$as_me: failed program was:" >&5 sed 's/^/| /' conftest.$ac_ext >&5 fi rm -f conftest.err conftest.$ac_objext done rm -f conftest.$ac_ext conftest.$ac_objext CC=$ac_save_CC fi case "x$ac_cv_prog_cc_stdc" in x|xno) echo "$as_me:$LINENO: result: none needed" >&5 echo "${ECHO_T}none needed" >&6 ;; *) echo "$as_me:$LINENO: result: $ac_cv_prog_cc_stdc" >&5 echo "${ECHO_T}$ac_cv_prog_cc_stdc" >&6 CC="$CC $ac_cv_prog_cc_stdc" ;; esac # Some people use a C++ compiler to compile C. Since we use `exit', # in C++ we need to declare it. 
In case someone uses the same compiler # for both compiling C and C++ we need to have the C++ compiler decide # the declaration of exit, since it's the most demanding environment. cat >conftest.$ac_ext <<_ACEOF #ifndef __cplusplus choke me #endif _ACEOF rm -f conftest.$ac_objext if { (eval echo "$as_me:$LINENO: \"$ac_compile\"") >&5 (eval $ac_compile) 2>conftest.er1 ac_status=$? grep -v '^ *+' conftest.er1 >conftest.err rm -f conftest.er1 cat conftest.err >&5 echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); } && { ac_try='test -z "$ac_c_werror_flag" || test ! -s conftest.err' { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 (eval $ac_try) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); }; } && { ac_try='test -s conftest.$ac_objext' { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 (eval $ac_try) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); }; }; then for ac_declaration in \ '' \ 'extern "C" void std::exit (int) throw (); using std::exit;' \ 'extern "C" void std::exit (int); using std::exit;' \ 'extern "C" void exit (int) throw ();' \ 'extern "C" void exit (int);' \ 'void exit (int);' do cat >conftest.$ac_ext <<_ACEOF /* confdefs.h. */ _ACEOF cat confdefs.h >>conftest.$ac_ext cat >>conftest.$ac_ext <<_ACEOF /* end confdefs.h. */ $ac_declaration #include <stdlib.h> int main () { exit (42); ; return 0; } _ACEOF rm -f conftest.$ac_objext if { (eval echo "$as_me:$LINENO: \"$ac_compile\"") >&5 (eval $ac_compile) 2>conftest.er1 ac_status=$? grep -v '^ *+' conftest.er1 >conftest.err rm -f conftest.er1 cat conftest.err >&5 echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); } && { ac_try='test -z "$ac_c_werror_flag" || test ! -s conftest.err' { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 (eval $ac_try) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? 
= $ac_status" >&5 (exit $ac_status); }; } && { ac_try='test -s conftest.$ac_objext' { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 (eval $ac_try) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); }; }; then : else echo "$as_me: failed program was:" >&5 sed 's/^/| /' conftest.$ac_ext >&5 continue fi rm -f conftest.err conftest.$ac_objext conftest.$ac_ext cat >conftest.$ac_ext <<_ACEOF /* confdefs.h. */ _ACEOF cat confdefs.h >>conftest.$ac_ext cat >>conftest.$ac_ext <<_ACEOF /* end confdefs.h. */ $ac_declaration int main () { exit (42); ; return 0; } _ACEOF rm -f conftest.$ac_objext if { (eval echo "$as_me:$LINENO: \"$ac_compile\"") >&5 (eval $ac_compile) 2>conftest.er1 ac_status=$? grep -v '^ *+' conftest.er1 >conftest.err rm -f conftest.er1 cat conftest.err >&5 echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); } && { ac_try='test -z "$ac_c_werror_flag" || test ! -s conftest.err' { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 (eval $ac_try) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); }; } && { ac_try='test -s conftest.$ac_objext' { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 (eval $ac_try) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? 
= $ac_status" >&5 (exit $ac_status); }; }; then break else echo "$as_me: failed program was:" >&5 sed 's/^/| /' conftest.$ac_ext >&5 fi rm -f conftest.err conftest.$ac_objext conftest.$ac_ext done rm -f conftest* if test -n "$ac_declaration"; then echo '#ifdef __cplusplus' >>confdefs.h echo $ac_declaration >>confdefs.h echo '#endif' >>confdefs.h fi else echo "$as_me: failed program was:" >&5 sed 's/^/| /' conftest.$ac_ext >&5 fi rm -f conftest.err conftest.$ac_objext conftest.$ac_ext ac_ext=c ac_cpp='$CPP $CPPFLAGS' ac_compile='$CC -c $CFLAGS $CPPFLAGS conftest.$ac_ext >&5' ac_link='$CC -o conftest$ac_exeext $CFLAGS $CPPFLAGS $LDFLAGS conftest.$ac_ext $LIBS >&5' ac_compiler_gnu=$ac_cv_c_compiler_gnu depcc="$CC" am_compiler_list= echo "$as_me:$LINENO: checking dependency style of $depcc" >&5 echo $ECHO_N "checking dependency style of $depcc... $ECHO_C" >&6 if test "${am_cv_CC_dependencies_compiler_type+set}" = set; then echo $ECHO_N "(cached) $ECHO_C" >&6 else if test -z "$AMDEP_TRUE" && test -f "$am_depcomp"; then # We make a subdir and do the tests there. Otherwise we can end up # making bogus files that we don't know about and never remove. For # instance it was reported that on HP-UX the gcc test will end up # making a dummy file named `D' -- because `-MD' means `put the output # in D'. mkdir conftest.dir # Copy depcomp to subdir because otherwise we won't find it if we're # using a relative directory. cp "$am_depcomp" conftest.dir cd conftest.dir # We will build objects and dependencies in a subdirectory because # it helps to detect inapplicable dependency modes. For instance # both Tru64's cc and ICC support -MD to output dependencies as a # side effect of compilation, but ICC will put the dependencies in # the current directory while Tru64 will put them in the object # directory. 
mkdir sub am_cv_CC_dependencies_compiler_type=none if test "$am_compiler_list" = ""; then am_compiler_list=`sed -n 's/^#*\([a-zA-Z0-9]*\))$/\1/p' < ./depcomp` fi for depmode in $am_compiler_list; do # Setup a source with many dependencies, because some compilers # like to wrap large dependency lists on column 80 (with \), and # we should not choose a depcomp mode which is confused by this. # # We need to recreate these files for each test, as the compiler may # overwrite some of them when testing with obscure command lines. # This happens at least with the AIX C compiler. : > sub/conftest.c for i in 1 2 3 4 5 6; do echo '#include "conftst'$i'.h"' >> sub/conftest.c # Using `: > sub/conftst$i.h' creates only sub/conftst1.h with # Solaris 8's {/usr,}/bin/sh. touch sub/conftst$i.h done echo "${am__include} ${am__quote}sub/conftest.Po${am__quote}" > confmf case $depmode in nosideeffect) # after this tag, mechanisms are not by side-effect, so they'll # only be used when explicitly requested if test "x$enable_dependency_tracking" = xyes; then continue else break fi ;; none) break ;; esac # We check with `-c' and `-o' for the sake of the "dashmstdout" # mode. It turns out that the SunPro C++ compiler does not properly # handle `-M -o', and we need to detect this. if depmode=$depmode \ source=sub/conftest.c object=sub/conftest.${OBJEXT-o} \ depfile=sub/conftest.Po tmpdepfile=sub/conftest.TPo \ $SHELL ./depcomp $depcc -c -o sub/conftest.${OBJEXT-o} sub/conftest.c \ >/dev/null 2>conftest.err && grep sub/conftst6.h sub/conftest.Po > /dev/null 2>&1 && grep sub/conftest.${OBJEXT-o} sub/conftest.Po > /dev/null 2>&1 && ${MAKE-make} -s -f confmf > /dev/null 2>&1; then # icc doesn't choke on unknown options, it will just issue warnings # or remarks (even with -Werror). So we grep stderr for any message # that says an option was ignored or not supported. 
# When given -MP, icc 7.0 and 7.1 complain thusly: # icc: Command line warning: ignoring option '-M'; no argument required # The diagnosis changed in icc 8.0: # icc: Command line remark: option '-MP' not supported if (grep 'ignoring option' conftest.err || grep 'not supported' conftest.err) >/dev/null 2>&1; then :; else am_cv_CC_dependencies_compiler_type=$depmode break fi fi done cd .. rm -rf conftest.dir else am_cv_CC_dependencies_compiler_type=none fi fi echo "$as_me:$LINENO: result: $am_cv_CC_dependencies_compiler_type" >&5 echo "${ECHO_T}$am_cv_CC_dependencies_compiler_type" >&6 CCDEPMODE=depmode=$am_cv_CC_dependencies_compiler_type if test "x$enable_dependency_tracking" != xno \ && test "$am_cv_CC_dependencies_compiler_type" = gcc3; then am__fastdepCC_TRUE= am__fastdepCC_FALSE='#' else am__fastdepCC_TRUE='#' am__fastdepCC_FALSE= fi am_cv_prog_cc_stdc=$ac_cv_prog_cc_stdc echo "$as_me:$LINENO: checking for an ANSI C-conforming const" >&5 echo $ECHO_N "checking for an ANSI C-conforming const... $ECHO_C" >&6 if test "${ac_cv_c_const+set}" = set; then echo $ECHO_N "(cached) $ECHO_C" >&6 else cat >conftest.$ac_ext <<_ACEOF /* confdefs.h. */ _ACEOF cat confdefs.h >>conftest.$ac_ext cat >>conftest.$ac_ext <<_ACEOF /* end confdefs.h. */ int main () { /* FIXME: Include the comments suggested by Paul. */ #ifndef __cplusplus /* Ultrix mips cc rejects this. */ typedef int charset[2]; const charset x; /* SunOS 4.1.1 cc rejects this. */ char const *const *ccp; char **p; /* NEC SVR4.0.2 mips cc rejects this. */ struct point {int x, y;}; static struct point const zero = {0,0}; /* AIX XL C 1.02.0.0 rejects this. It does not let you subtract one const X* pointer from another in an arm of an if-expression whose if-part is not a constant expression */ const char *g = "string"; ccp = &g + (g ? g-g : 0); /* HPUX 7.0 cc rejects these. */ ++ccp; p = (char**) ccp; ccp = (char const *const *) p; { /* SCO 3.2v4 cc rejects this. */ char *t; char const *s = 0 ? 
(char *) 0 : (char const *) 0; *t++ = 0; } { /* Someone thinks the Sun supposedly-ANSI compiler will reject this. */ int x[] = {25, 17}; const int *foo = &x[0]; ++foo; } { /* Sun SC1.0 ANSI compiler rejects this -- but not the above. */ typedef const int *iptr; iptr p = 0; ++p; } { /* AIX XL C 1.02.0.0 rejects this saying "k.c", line 2.27: 1506-025 (S) Operand must be a modifiable lvalue. */ struct s { int j; const int *ap[3]; }; struct s *b; b->j = 5; } { /* ULTRIX-32 V3.1 (Rev 9) vcc rejects this */ const int foo = 10; } #endif ; return 0; } _ACEOF rm -f conftest.$ac_objext if { (eval echo "$as_me:$LINENO: \"$ac_compile\"") >&5 (eval $ac_compile) 2>conftest.er1 ac_status=$? grep -v '^ *+' conftest.er1 >conftest.err rm -f conftest.er1 cat conftest.err >&5 echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); } && { ac_try='test -z "$ac_c_werror_flag" || test ! -s conftest.err' { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 (eval $ac_try) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); }; } && { ac_try='test -s conftest.$ac_objext' { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 (eval $ac_try) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); }; }; then ac_cv_c_const=yes else echo "$as_me: failed program was:" >&5 sed 's/^/| /' conftest.$ac_ext >&5 ac_cv_c_const=no fi rm -f conftest.err conftest.$ac_objext conftest.$ac_ext fi echo "$as_me:$LINENO: result: $ac_cv_c_const" >&5 echo "${ECHO_T}$ac_cv_c_const" >&6 if test $ac_cv_c_const = no; then cat >>confdefs.h <<\_ACEOF #define const _ACEOF fi # Check whether --enable-shared or --disable-shared was given. if test "${enable_shared+set}" = set; then enableval="$enable_shared" p=${PACKAGE-default} case $enableval in yes) enable_shared=yes ;; no) enable_shared=no ;; *) enable_shared=no # Look at the argument we got. We use all the common list separators. 
lt_save_ifs="$IFS"; IFS="${IFS}$PATH_SEPARATOR," for pkg in $enableval; do IFS="$lt_save_ifs" if test "X$pkg" = "X$p"; then enable_shared=yes fi done IFS="$lt_save_ifs" ;; esac else enable_shared=yes fi; # Check whether --enable-static or --disable-static was given. if test "${enable_static+set}" = set; then enableval="$enable_static" p=${PACKAGE-default} case $enableval in yes) enable_static=yes ;; no) enable_static=no ;; *) enable_static=no # Look at the argument we got. We use all the common list separators. lt_save_ifs="$IFS"; IFS="${IFS}$PATH_SEPARATOR," for pkg in $enableval; do IFS="$lt_save_ifs" if test "X$pkg" = "X$p"; then enable_static=yes fi done IFS="$lt_save_ifs" ;; esac else enable_static=yes fi; # Check whether --enable-fast-install or --disable-fast-install was given. if test "${enable_fast_install+set}" = set; then enableval="$enable_fast_install" p=${PACKAGE-default} case $enableval in yes) enable_fast_install=yes ;; no) enable_fast_install=no ;; *) enable_fast_install=no # Look at the argument we got. We use all the common list separators. lt_save_ifs="$IFS"; IFS="${IFS}$PATH_SEPARATOR," for pkg in $enableval; do IFS="$lt_save_ifs" if test "X$pkg" = "X$p"; then enable_fast_install=yes fi done IFS="$lt_save_ifs" ;; esac else enable_fast_install=yes fi; # Make sure we can run config.sub. $ac_config_sub sun4 >/dev/null 2>&1 || { { echo "$as_me:$LINENO: error: cannot run $ac_config_sub" >&5 echo "$as_me: error: cannot run $ac_config_sub" >&2;} { (exit 1); exit 1; }; } echo "$as_me:$LINENO: checking build system type" >&5 echo $ECHO_N "checking build system type... 
$ECHO_C" >&6 if test "${ac_cv_build+set}" = set; then echo $ECHO_N "(cached) $ECHO_C" >&6 else ac_cv_build_alias=$build_alias test -z "$ac_cv_build_alias" && ac_cv_build_alias=`$ac_config_guess` test -z "$ac_cv_build_alias" && { { echo "$as_me:$LINENO: error: cannot guess build type; you must specify one" >&5 echo "$as_me: error: cannot guess build type; you must specify one" >&2;} { (exit 1); exit 1; }; } ac_cv_build=`$ac_config_sub $ac_cv_build_alias` || { { echo "$as_me:$LINENO: error: $ac_config_sub $ac_cv_build_alias failed" >&5 echo "$as_me: error: $ac_config_sub $ac_cv_build_alias failed" >&2;} { (exit 1); exit 1; }; } fi echo "$as_me:$LINENO: result: $ac_cv_build" >&5 echo "${ECHO_T}$ac_cv_build" >&6 build=$ac_cv_build build_cpu=`echo $ac_cv_build | sed 's/^\([^-]*\)-\([^-]*\)-\(.*\)$/\1/'` build_vendor=`echo $ac_cv_build | sed 's/^\([^-]*\)-\([^-]*\)-\(.*\)$/\2/'` build_os=`echo $ac_cv_build | sed 's/^\([^-]*\)-\([^-]*\)-\(.*\)$/\3/'` echo "$as_me:$LINENO: checking host system type" >&5 echo $ECHO_N "checking host system type... $ECHO_C" >&6 if test "${ac_cv_host+set}" = set; then echo $ECHO_N "(cached) $ECHO_C" >&6 else ac_cv_host_alias=$host_alias test -z "$ac_cv_host_alias" && ac_cv_host_alias=$ac_cv_build_alias ac_cv_host=`$ac_config_sub $ac_cv_host_alias` || { { echo "$as_me:$LINENO: error: $ac_config_sub $ac_cv_host_alias failed" >&5 echo "$as_me: error: $ac_config_sub $ac_cv_host_alias failed" >&2;} { (exit 1); exit 1; }; } fi echo "$as_me:$LINENO: result: $ac_cv_host" >&5 echo "${ECHO_T}$ac_cv_host" >&6 host=$ac_cv_host host_cpu=`echo $ac_cv_host | sed 's/^\([^-]*\)-\([^-]*\)-\(.*\)$/\1/'` host_vendor=`echo $ac_cv_host | sed 's/^\([^-]*\)-\([^-]*\)-\(.*\)$/\2/'` host_os=`echo $ac_cv_host | sed 's/^\([^-]*\)-\([^-]*\)-\(.*\)$/\3/'` echo "$as_me:$LINENO: checking for a sed that does not truncate output" >&5 echo $ECHO_N "checking for a sed that does not truncate output... 
$ECHO_C" >&6 if test "${lt_cv_path_SED+set}" = set; then echo $ECHO_N "(cached) $ECHO_C" >&6 else # Loop through the user's path and test for sed and gsed. # Then use that list of sed's as ones to test for truncation. as_save_IFS=$IFS; IFS=$PATH_SEPARATOR for as_dir in $PATH do IFS=$as_save_IFS test -z "$as_dir" && as_dir=. for lt_ac_prog in sed gsed; do for ac_exec_ext in '' $ac_executable_extensions; do if $as_executable_p "$as_dir/$lt_ac_prog$ac_exec_ext"; then lt_ac_sed_list="$lt_ac_sed_list $as_dir/$lt_ac_prog$ac_exec_ext" fi done done done lt_ac_max=0 lt_ac_count=0 # Add /usr/xpg4/bin/sed as it is typically found on Solaris # along with /bin/sed that truncates output. for lt_ac_sed in $lt_ac_sed_list /usr/xpg4/bin/sed; do test ! -f $lt_ac_sed && continue cat /dev/null > conftest.in lt_ac_count=0 echo $ECHO_N "0123456789$ECHO_C" >conftest.in # Check for GNU sed and select it if it is found. if "$lt_ac_sed" --version 2>&1 < /dev/null | grep 'GNU' > /dev/null; then lt_cv_path_SED=$lt_ac_sed break fi while true; do cat conftest.in conftest.in >conftest.tmp mv conftest.tmp conftest.in cp conftest.in conftest.nl echo >>conftest.nl $lt_ac_sed -e 's/a$//' < conftest.nl >conftest.out || break cmp -s conftest.out conftest.nl || break # 10000 chars as input seems more than enough test $lt_ac_count -gt 10 && break lt_ac_count=`expr $lt_ac_count + 1` if test $lt_ac_count -gt $lt_ac_max; then lt_ac_max=$lt_ac_count lt_cv_path_SED=$lt_ac_sed fi done done fi SED=$lt_cv_path_SED echo "$as_me:$LINENO: result: $SED" >&5 echo "${ECHO_T}$SED" >&6 echo "$as_me:$LINENO: checking for egrep" >&5 echo $ECHO_N "checking for egrep... 
$ECHO_C" >&6 if test "${ac_cv_prog_egrep+set}" = set; then echo $ECHO_N "(cached) $ECHO_C" >&6 else if echo a | (grep -E '(a|b)') >/dev/null 2>&1 then ac_cv_prog_egrep='grep -E' else ac_cv_prog_egrep='egrep' fi fi echo "$as_me:$LINENO: result: $ac_cv_prog_egrep" >&5 echo "${ECHO_T}$ac_cv_prog_egrep" >&6 EGREP=$ac_cv_prog_egrep # Check whether --with-gnu-ld or --without-gnu-ld was given. if test "${with_gnu_ld+set}" = set; then withval="$with_gnu_ld" test "$withval" = no || with_gnu_ld=yes else with_gnu_ld=no fi; ac_prog=ld if test "$GCC" = yes; then # Check if gcc -print-prog-name=ld gives a path. echo "$as_me:$LINENO: checking for ld used by $CC" >&5 echo $ECHO_N "checking for ld used by $CC... $ECHO_C" >&6 case $host in *-*-mingw*) # gcc leaves a trailing carriage return which upsets mingw ac_prog=`($CC -print-prog-name=ld) 2>&5 | tr -d '\015'` ;; *) ac_prog=`($CC -print-prog-name=ld) 2>&5` ;; esac case $ac_prog in # Accept absolute paths. [\\/]* | ?:[\\/]*) re_direlt='/[^/][^/]*/\.\./' # Canonicalize the pathname of ld ac_prog=`echo $ac_prog| $SED 's%\\\\%/%g'` while echo $ac_prog | grep "$re_direlt" > /dev/null 2>&1; do ac_prog=`echo $ac_prog| $SED "s%$re_direlt%/%"` done test -z "$LD" && LD="$ac_prog" ;; "") # If it fails, then pretend we aren't using GCC. ac_prog=ld ;; *) # If it is relative, then search for the first ld in PATH. with_gnu_ld=unknown ;; esac elif test "$with_gnu_ld" = yes; then echo "$as_me:$LINENO: checking for GNU ld" >&5 echo $ECHO_N "checking for GNU ld... $ECHO_C" >&6 else echo "$as_me:$LINENO: checking for non-GNU ld" >&5 echo $ECHO_N "checking for non-GNU ld... $ECHO_C" >&6 fi if test "${lt_cv_path_LD+set}" = set; then echo $ECHO_N "(cached) $ECHO_C" >&6 else if test -z "$LD"; then lt_save_ifs="$IFS"; IFS=$PATH_SEPARATOR for ac_dir in $PATH; do IFS="$lt_save_ifs" test -z "$ac_dir" && ac_dir=. 
if test -f "$ac_dir/$ac_prog" || test -f "$ac_dir/$ac_prog$ac_exeext"; then lt_cv_path_LD="$ac_dir/$ac_prog" # Check to see if the program is GNU ld. I'd rather use --version, # but apparently some variants of GNU ld only accept -v. # Break only if it was the GNU/non-GNU ld that we prefer. case `"$lt_cv_path_LD" -v 2>&1 </dev/null` in *GNU* | *'with BFD'*) test "$with_gnu_ld" != no && break ;; *) test "$with_gnu_ld" != yes && break ;; esac fi done IFS="$lt_save_ifs" else lt_cv_path_LD="$LD" # Let the user override the test with a path. fi fi LD="$lt_cv_path_LD" if test -n "$LD"; then echo "$as_me:$LINENO: result: $LD" >&5 echo "${ECHO_T}$LD" >&6 else echo "$as_me:$LINENO: result: no" >&5 echo "${ECHO_T}no" >&6 fi test -z "$LD" && { { echo "$as_me:$LINENO: error: no acceptable ld found in \$PATH" >&5 echo "$as_me: error: no acceptable ld found in \$PATH" >&2;} { (exit 1); exit 1; }; } echo "$as_me:$LINENO: checking if the linker ($LD) is GNU ld" >&5 echo $ECHO_N "checking if the linker ($LD) is GNU ld... $ECHO_C" >&6 if test "${lt_cv_prog_gnu_ld+set}" = set; then echo $ECHO_N "(cached) $ECHO_C" >&6 else # I'd rather use --version here, but apparently some GNU lds only accept -v. case `$LD -v 2>&1 </dev/null` in *GNU* | *'with BFD'*) lt_cv_prog_gnu_ld=yes ;; *) lt_cv_prog_gnu_ld=no ;; esac fi echo "$as_me:$LINENO: result: $lt_cv_prog_gnu_ld" >&5 echo "${ECHO_T}$lt_cv_prog_gnu_ld" >&6 with_gnu_ld=$lt_cv_prog_gnu_ld echo "$as_me:$LINENO: checking for $LD option to reload object files" >&5 echo $ECHO_N "checking for $LD option to reload object files... $ECHO_C" >&6 if test "${lt_cv_ld_reload_flag+set}" = set; then echo $ECHO_N "(cached) $ECHO_C" >&6 else lt_cv_ld_reload_flag='-r' fi echo "$as_me:$LINENO: result: $lt_cv_ld_reload_flag" >&5 echo "${ECHO_T}$lt_cv_ld_reload_flag" >&6 reload_flag=$lt_cv_ld_reload_flag case $reload_flag in "" | " "*) ;; *) reload_flag=" $reload_flag" ;; esac reload_cmds='$LD$reload_flag -o $output$reload_objs' case $host_os in darwin*) if test "$GCC" = yes; then reload_cmds='$LTCC $LTCFLAGS -nostdlib ${wl}-r -o $output$reload_objs' else reload_cmds='$LD$reload_flag -o $output$reload_objs' fi ;; esac echo "$as_me:$LINENO: checking for BSD-compatible nm" >&5 echo $ECHO_N "checking for BSD-compatible nm... $ECHO_C" >&6 if test "${lt_cv_path_NM+set}" = set; then echo $ECHO_N "(cached) $ECHO_C" >&6 else if test -n "$NM"; then # Let the user override the test.
lt_cv_path_NM="$NM" else lt_nm_to_check="${ac_tool_prefix}nm" if test -n "$ac_tool_prefix" && test "$build" = "$host"; then lt_nm_to_check="$lt_nm_to_check nm" fi for lt_tmp_nm in $lt_nm_to_check; do lt_save_ifs="$IFS"; IFS=$PATH_SEPARATOR for ac_dir in $PATH /usr/ccs/bin/elf /usr/ccs/bin /usr/ucb /bin; do IFS="$lt_save_ifs" test -z "$ac_dir" && ac_dir=. tmp_nm="$ac_dir/$lt_tmp_nm" if test -f "$tmp_nm" || test -f "$tmp_nm$ac_exeext" ; then # Check to see if the nm accepts a BSD-compat flag. # Adding the `sed 1q' prevents false positives on HP-UX, which says: # nm: unknown option "B" ignored # Tru64's nm complains that /dev/null is an invalid object file case `"$tmp_nm" -B /dev/null 2>&1 | sed '1q'` in */dev/null* | *'Invalid file or object type'*) lt_cv_path_NM="$tmp_nm -B" break ;; *) case `"$tmp_nm" -p /dev/null 2>&1 | sed '1q'` in */dev/null*) lt_cv_path_NM="$tmp_nm -p" break ;; *) lt_cv_path_NM=${lt_cv_path_NM="$tmp_nm"} # keep the first match, but continue # so that we can try to find one that supports BSD flags ;; esac ;; esac fi done IFS="$lt_save_ifs" done test -z "$lt_cv_path_NM" && lt_cv_path_NM=nm fi fi echo "$as_me:$LINENO: result: $lt_cv_path_NM" >&5 echo "${ECHO_T}$lt_cv_path_NM" >&6 NM="$lt_cv_path_NM" echo "$as_me:$LINENO: checking whether ln -s works" >&5 echo $ECHO_N "checking whether ln -s works... $ECHO_C" >&6 LN_S=$as_ln_s if test "$LN_S" = "ln -s"; then echo "$as_me:$LINENO: result: yes" >&5 echo "${ECHO_T}yes" >&6 else echo "$as_me:$LINENO: result: no, using $LN_S" >&5 echo "${ECHO_T}no, using $LN_S" >&6 fi echo "$as_me:$LINENO: checking how to recognise dependent libraries" >&5 echo $ECHO_N "checking how to recognise dependent libraries... $ECHO_C" >&6 if test "${lt_cv_deplibs_check_method+set}" = set; then echo $ECHO_N "(cached) $ECHO_C" >&6 else lt_cv_file_magic_cmd='$MAGIC_CMD' lt_cv_file_magic_test_file= lt_cv_deplibs_check_method='unknown' # Need to set the preceding variable on all platforms that support # interlibrary dependencies. 
# 'none' -- dependencies not supported. # `unknown' -- same as none, but documents that we really don't know. # 'pass_all' -- all dependencies passed with no checks. # 'test_compile' -- check by making test program. # 'file_magic [[regex]]' -- check by looking for files in library path # which responds to the $file_magic_cmd with a given extended regex. # If you have `file' or equivalent on your system and you're not sure # whether `pass_all' will *always* work, you probably want this one. case $host_os in aix4* | aix5*) lt_cv_deplibs_check_method=pass_all ;; beos*) lt_cv_deplibs_check_method=pass_all ;; bsdi[45]*) lt_cv_deplibs_check_method='file_magic ELF [0-9][0-9]*-bit [ML]SB (shared object|dynamic lib)' lt_cv_file_magic_cmd='/usr/bin/file -L' lt_cv_file_magic_test_file=/shlib/libc.so ;; cygwin*) # func_win32_libid is a shell function defined in ltmain.sh lt_cv_deplibs_check_method='file_magic ^x86 archive import|^x86 DLL' lt_cv_file_magic_cmd='func_win32_libid' ;; mingw* | pw32*) # Base MSYS/MinGW do not provide the 'file' command needed by # func_win32_libid shell function, so use a weaker test based on 'objdump'. lt_cv_deplibs_check_method='file_magic file format pei*-i386(.*architecture: i386)?' lt_cv_file_magic_cmd='$OBJDUMP -f' ;; darwin* | rhapsody*) lt_cv_deplibs_check_method=pass_all ;; freebsd* | kfreebsd*-gnu | dragonfly*) if echo __ELF__ | $CC -E - | grep __ELF__ > /dev/null; then case $host_cpu in i*86 ) # Not sure whether the presence of OpenBSD here was a mistake. # Let's accept both of them until this is cleared up. 
lt_cv_deplibs_check_method='file_magic (FreeBSD|OpenBSD|DragonFly)/i[3-9]86 (compact )?demand paged shared library' lt_cv_file_magic_cmd=/usr/bin/file lt_cv_file_magic_test_file=`echo /usr/lib/libc.so.*` ;; esac else lt_cv_deplibs_check_method=pass_all fi ;; gnu*) lt_cv_deplibs_check_method=pass_all ;; hpux10.20* | hpux11*) lt_cv_file_magic_cmd=/usr/bin/file case $host_cpu in ia64*) lt_cv_deplibs_check_method='file_magic (s[0-9][0-9][0-9]|ELF-[0-9][0-9]) shared object file - IA64' lt_cv_file_magic_test_file=/usr/lib/hpux32/libc.so ;; hppa*64*) lt_cv_deplibs_check_method='file_magic (s[0-9][0-9][0-9]|ELF-[0-9][0-9]) shared object file - PA-RISC [0-9].[0-9]' lt_cv_file_magic_test_file=/usr/lib/pa20_64/libc.sl ;; *) lt_cv_deplibs_check_method='file_magic (s[0-9][0-9][0-9]|PA-RISC[0-9].[0-9]) shared library' lt_cv_file_magic_test_file=/usr/lib/libc.sl ;; esac ;; interix3*) # PIC code is broken on Interix 3.x, that's why |\.a not |_pic\.a here lt_cv_deplibs_check_method='match_pattern /lib[^/]+(\.so|\.a)$' ;; irix5* | irix6* | nonstopux*) case $LD in *-32|*"-32 ") libmagic=32-bit;; *-n32|*"-n32 ") libmagic=N32;; *-64|*"-64 ") libmagic=64-bit;; *) libmagic=never-match;; esac lt_cv_deplibs_check_method=pass_all ;; # This must be Linux ELF. 
linux*) lt_cv_deplibs_check_method=pass_all ;; netbsd*) if echo __ELF__ | $CC -E - | grep __ELF__ > /dev/null; then lt_cv_deplibs_check_method='match_pattern /lib[^/]+(\.so\.[0-9]+\.[0-9]+|_pic\.a)$' else lt_cv_deplibs_check_method='match_pattern /lib[^/]+(\.so|_pic\.a)$' fi ;; newos6*) lt_cv_deplibs_check_method='file_magic ELF [0-9][0-9]*-bit [ML]SB (executable|dynamic lib)' lt_cv_file_magic_cmd=/usr/bin/file lt_cv_file_magic_test_file=/usr/lib/libnls.so ;; nto-qnx*) lt_cv_deplibs_check_method=unknown ;; openbsd*) if test -z "`echo __ELF__ | $CC -E - | grep __ELF__`" || test "$host_os-$host_cpu" = "openbsd2.8-powerpc"; then lt_cv_deplibs_check_method='match_pattern /lib[^/]+(\.so\.[0-9]+\.[0-9]+|\.so|_pic\.a)$' else lt_cv_deplibs_check_method='match_pattern /lib[^/]+(\.so\.[0-9]+\.[0-9]+|_pic\.a)$' fi ;; osf3* | osf4* | osf5*) lt_cv_deplibs_check_method=pass_all ;; solaris*) lt_cv_deplibs_check_method=pass_all ;; sysv4 | sysv4.3*) case $host_vendor in motorola) lt_cv_deplibs_check_method='file_magic ELF [0-9][0-9]*-bit [ML]SB (shared object|dynamic lib) M[0-9][0-9]* Version [0-9]' lt_cv_file_magic_test_file=`echo /usr/lib/libc.so*` ;; ncr) lt_cv_deplibs_check_method=pass_all ;; sequent) lt_cv_file_magic_cmd='/bin/file' lt_cv_deplibs_check_method='file_magic ELF [0-9][0-9]*-bit [LM]SB (shared object|dynamic lib )' ;; sni) lt_cv_file_magic_cmd='/bin/file' lt_cv_deplibs_check_method="file_magic ELF [0-9][0-9]*-bit [LM]SB dynamic lib" lt_cv_file_magic_test_file=/lib/libc.so ;; siemens) lt_cv_deplibs_check_method=pass_all ;; pc) lt_cv_deplibs_check_method=pass_all ;; esac ;; sysv5* | sco3.2v5* | sco5v6* | unixware* | OpenUNIX* | sysv4*uw2*) lt_cv_deplibs_check_method=pass_all ;; esac fi echo "$as_me:$LINENO: result: $lt_cv_deplibs_check_method" >&5 echo "${ECHO_T}$lt_cv_deplibs_check_method" >&6 file_magic_cmd=$lt_cv_file_magic_cmd deplibs_check_method=$lt_cv_deplibs_check_method test -z "$deplibs_check_method" && deplibs_check_method=unknown # If no C compiler was 
specified, use CC. LTCC=${LTCC-"$CC"} # If no C compiler flags were specified, use CFLAGS. LTCFLAGS=${LTCFLAGS-"$CFLAGS"} # Allow CC to be a program name with arguments. compiler=$CC # Check whether --enable-libtool-lock or --disable-libtool-lock was given. if test "${enable_libtool_lock+set}" = set; then enableval="$enable_libtool_lock" fi; test "x$enable_libtool_lock" != xno && enable_libtool_lock=yes # Some flags need to be propagated to the compiler or linker for good # libtool support. case $host in ia64-*-hpux*) # Find out which ABI we are using. echo 'int i;' > conftest.$ac_ext if { (eval echo "$as_me:$LINENO: \"$ac_compile\"") >&5 (eval $ac_compile) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); }; then case `/usr/bin/file conftest.$ac_objext` in *ELF-32*) HPUX_IA64_MODE="32" ;; *ELF-64*) HPUX_IA64_MODE="64" ;; esac fi rm -rf conftest* ;; *-*-irix6*) # Find out which ABI we are using. echo '#line 4794 "configure"' > conftest.$ac_ext if { (eval echo "$as_me:$LINENO: \"$ac_compile\"") >&5 (eval $ac_compile) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); }; then if test "$lt_cv_prog_gnu_ld" = yes; then case `/usr/bin/file conftest.$ac_objext` in *32-bit*) LD="${LD-ld} -melf32bsmip" ;; *N32*) LD="${LD-ld} -melf32bmipn32" ;; *64-bit*) LD="${LD-ld} -melf64bmip" ;; esac else case `/usr/bin/file conftest.$ac_objext` in *32-bit*) LD="${LD-ld} -32" ;; *N32*) LD="${LD-ld} -n32" ;; *64-bit*) LD="${LD-ld} -64" ;; esac fi fi rm -rf conftest* ;; x86_64-*linux*|ppc*-*linux*|powerpc*-*linux*|s390*-*linux*|sparc*-*linux*) # Find out which ABI we are using. echo 'int i;' > conftest.$ac_ext if { (eval echo "$as_me:$LINENO: \"$ac_compile\"") >&5 (eval $ac_compile) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? 
= $ac_status" >&5 (exit $ac_status); }; then case `/usr/bin/file conftest.o` in *32-bit*) case $host in x86_64-*linux*) LD="${LD-ld} -m elf_i386" ;; ppc64-*linux*|powerpc64-*linux*) LD="${LD-ld} -m elf32ppclinux" ;; s390x-*linux*) LD="${LD-ld} -m elf_s390" ;; sparc64-*linux*) LD="${LD-ld} -m elf32_sparc" ;; esac ;; *64-bit*) case $host in x86_64-*linux*) LD="${LD-ld} -m elf_x86_64" ;; ppc*-*linux*|powerpc*-*linux*) LD="${LD-ld} -m elf64ppc" ;; s390*-*linux*) LD="${LD-ld} -m elf64_s390" ;; sparc*-*linux*) LD="${LD-ld} -m elf64_sparc" ;; esac ;; esac fi rm -rf conftest* ;; *-*-sco3.2v5*) # On SCO OpenServer 5, we need -belf to get full-featured binaries. SAVE_CFLAGS="$CFLAGS" CFLAGS="$CFLAGS -belf" echo "$as_me:$LINENO: checking whether the C compiler needs -belf" >&5 echo $ECHO_N "checking whether the C compiler needs -belf... $ECHO_C" >&6 if test "${lt_cv_cc_needs_belf+set}" = set; then echo $ECHO_N "(cached) $ECHO_C" >&6 else ac_ext=c ac_cpp='$CPP $CPPFLAGS' ac_compile='$CC -c $CFLAGS $CPPFLAGS conftest.$ac_ext >&5' ac_link='$CC -o conftest$ac_exeext $CFLAGS $CPPFLAGS $LDFLAGS conftest.$ac_ext $LIBS >&5' ac_compiler_gnu=$ac_cv_c_compiler_gnu cat >conftest.$ac_ext <<_ACEOF /* confdefs.h. */ _ACEOF cat confdefs.h >>conftest.$ac_ext cat >>conftest.$ac_ext <<_ACEOF /* end confdefs.h. */ int main () { ; return 0; } _ACEOF rm -f conftest.$ac_objext conftest$ac_exeext if { (eval echo "$as_me:$LINENO: \"$ac_link\"") >&5 (eval $ac_link) 2>conftest.er1 ac_status=$? grep -v '^ *+' conftest.er1 >conftest.err rm -f conftest.er1 cat conftest.err >&5 echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); } && { ac_try='test -z "$ac_c_werror_flag" || test ! -s conftest.err' { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 (eval $ac_try) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); }; } && { ac_try='test -s conftest$ac_exeext' { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 (eval $ac_try) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? 
= $ac_status" >&5 (exit $ac_status); }; }; then lt_cv_cc_needs_belf=yes else echo "$as_me: failed program was:" >&5 sed 's/^/| /' conftest.$ac_ext >&5 lt_cv_cc_needs_belf=no fi rm -f conftest.err conftest.$ac_objext \ conftest$ac_exeext conftest.$ac_ext ac_ext=c ac_cpp='$CPP $CPPFLAGS' ac_compile='$CC -c $CFLAGS $CPPFLAGS conftest.$ac_ext >&5' ac_link='$CC -o conftest$ac_exeext $CFLAGS $CPPFLAGS $LDFLAGS conftest.$ac_ext $LIBS >&5' ac_compiler_gnu=$ac_cv_c_compiler_gnu fi echo "$as_me:$LINENO: result: $lt_cv_cc_needs_belf" >&5 echo "${ECHO_T}$lt_cv_cc_needs_belf" >&6 if test x"$lt_cv_cc_needs_belf" != x"yes"; then # this is probably gcc 2.8.0, egcs 1.0 or newer; no need for -belf CFLAGS="$SAVE_CFLAGS" fi ;; sparc*-*solaris*) # Find out which ABI we are using. echo 'int i;' > conftest.$ac_ext if { (eval echo "$as_me:$LINENO: \"$ac_compile\"") >&5 (eval $ac_compile) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); }; then case `/usr/bin/file conftest.o` in *64-bit*) case $lt_cv_prog_gnu_ld in yes*) LD="${LD-ld} -m elf64_sparc" ;; *) LD="${LD-ld} -64" ;; esac ;; esac fi rm -rf conftest* ;; *-*-cygwin* | *-*-mingw* | *-*-pw32*) if test -n "$ac_tool_prefix"; then # Extract the first word of "${ac_tool_prefix}dlltool", so it can be a program name with args. set dummy ${ac_tool_prefix}dlltool; ac_word=$2 echo "$as_me:$LINENO: checking for $ac_word" >&5 echo $ECHO_N "checking for $ac_word... $ECHO_C" >&6 if test "${ac_cv_prog_DLLTOOL+set}" = set; then echo $ECHO_N "(cached) $ECHO_C" >&6 else if test -n "$DLLTOOL"; then ac_cv_prog_DLLTOOL="$DLLTOOL" # Let the user override the test. else as_save_IFS=$IFS; IFS=$PATH_SEPARATOR for as_dir in $PATH do IFS=$as_save_IFS test -z "$as_dir" && as_dir=. 
for ac_exec_ext in '' $ac_executable_extensions; do if $as_executable_p "$as_dir/$ac_word$ac_exec_ext"; then ac_cv_prog_DLLTOOL="${ac_tool_prefix}dlltool" echo "$as_me:$LINENO: found $as_dir/$ac_word$ac_exec_ext" >&5 break 2 fi done done fi fi DLLTOOL=$ac_cv_prog_DLLTOOL if test -n "$DLLTOOL"; then echo "$as_me:$LINENO: result: $DLLTOOL" >&5 echo "${ECHO_T}$DLLTOOL" >&6 else echo "$as_me:$LINENO: result: no" >&5 echo "${ECHO_T}no" >&6 fi fi if test -z "$ac_cv_prog_DLLTOOL"; then ac_ct_DLLTOOL=$DLLTOOL # Extract the first word of "dlltool", so it can be a program name with args. set dummy dlltool; ac_word=$2 echo "$as_me:$LINENO: checking for $ac_word" >&5 echo $ECHO_N "checking for $ac_word... $ECHO_C" >&6 if test "${ac_cv_prog_ac_ct_DLLTOOL+set}" = set; then echo $ECHO_N "(cached) $ECHO_C" >&6 else if test -n "$ac_ct_DLLTOOL"; then ac_cv_prog_ac_ct_DLLTOOL="$ac_ct_DLLTOOL" # Let the user override the test. else as_save_IFS=$IFS; IFS=$PATH_SEPARATOR for as_dir in $PATH do IFS=$as_save_IFS test -z "$as_dir" && as_dir=. for ac_exec_ext in '' $ac_executable_extensions; do if $as_executable_p "$as_dir/$ac_word$ac_exec_ext"; then ac_cv_prog_ac_ct_DLLTOOL="dlltool" echo "$as_me:$LINENO: found $as_dir/$ac_word$ac_exec_ext" >&5 break 2 fi done done test -z "$ac_cv_prog_ac_ct_DLLTOOL" && ac_cv_prog_ac_ct_DLLTOOL="false" fi fi ac_ct_DLLTOOL=$ac_cv_prog_ac_ct_DLLTOOL if test -n "$ac_ct_DLLTOOL"; then echo "$as_me:$LINENO: result: $ac_ct_DLLTOOL" >&5 echo "${ECHO_T}$ac_ct_DLLTOOL" >&6 else echo "$as_me:$LINENO: result: no" >&5 echo "${ECHO_T}no" >&6 fi DLLTOOL=$ac_ct_DLLTOOL else DLLTOOL="$ac_cv_prog_DLLTOOL" fi if test -n "$ac_tool_prefix"; then # Extract the first word of "${ac_tool_prefix}as", so it can be a program name with args. set dummy ${ac_tool_prefix}as; ac_word=$2 echo "$as_me:$LINENO: checking for $ac_word" >&5 echo $ECHO_N "checking for $ac_word... 
$ECHO_C" >&6 if test "${ac_cv_prog_AS+set}" = set; then echo $ECHO_N "(cached) $ECHO_C" >&6 else if test -n "$AS"; then ac_cv_prog_AS="$AS" # Let the user override the test. else as_save_IFS=$IFS; IFS=$PATH_SEPARATOR for as_dir in $PATH do IFS=$as_save_IFS test -z "$as_dir" && as_dir=. for ac_exec_ext in '' $ac_executable_extensions; do if $as_executable_p "$as_dir/$ac_word$ac_exec_ext"; then ac_cv_prog_AS="${ac_tool_prefix}as" echo "$as_me:$LINENO: found $as_dir/$ac_word$ac_exec_ext" >&5 break 2 fi done done fi fi AS=$ac_cv_prog_AS if test -n "$AS"; then echo "$as_me:$LINENO: result: $AS" >&5 echo "${ECHO_T}$AS" >&6 else echo "$as_me:$LINENO: result: no" >&5 echo "${ECHO_T}no" >&6 fi fi if test -z "$ac_cv_prog_AS"; then ac_ct_AS=$AS # Extract the first word of "as", so it can be a program name with args. set dummy as; ac_word=$2 echo "$as_me:$LINENO: checking for $ac_word" >&5 echo $ECHO_N "checking for $ac_word... $ECHO_C" >&6 if test "${ac_cv_prog_ac_ct_AS+set}" = set; then echo $ECHO_N "(cached) $ECHO_C" >&6 else if test -n "$ac_ct_AS"; then ac_cv_prog_ac_ct_AS="$ac_ct_AS" # Let the user override the test. else as_save_IFS=$IFS; IFS=$PATH_SEPARATOR for as_dir in $PATH do IFS=$as_save_IFS test -z "$as_dir" && as_dir=. for ac_exec_ext in '' $ac_executable_extensions; do if $as_executable_p "$as_dir/$ac_word$ac_exec_ext"; then ac_cv_prog_ac_ct_AS="as" echo "$as_me:$LINENO: found $as_dir/$ac_word$ac_exec_ext" >&5 break 2 fi done done test -z "$ac_cv_prog_ac_ct_AS" && ac_cv_prog_ac_ct_AS="false" fi fi ac_ct_AS=$ac_cv_prog_ac_ct_AS if test -n "$ac_ct_AS"; then echo "$as_me:$LINENO: result: $ac_ct_AS" >&5 echo "${ECHO_T}$ac_ct_AS" >&6 else echo "$as_me:$LINENO: result: no" >&5 echo "${ECHO_T}no" >&6 fi AS=$ac_ct_AS else AS="$ac_cv_prog_AS" fi if test -n "$ac_tool_prefix"; then # Extract the first word of "${ac_tool_prefix}objdump", so it can be a program name with args. 
set dummy ${ac_tool_prefix}objdump; ac_word=$2 echo "$as_me:$LINENO: checking for $ac_word" >&5 echo $ECHO_N "checking for $ac_word... $ECHO_C" >&6 if test "${ac_cv_prog_OBJDUMP+set}" = set; then echo $ECHO_N "(cached) $ECHO_C" >&6 else if test -n "$OBJDUMP"; then ac_cv_prog_OBJDUMP="$OBJDUMP" # Let the user override the test. else as_save_IFS=$IFS; IFS=$PATH_SEPARATOR for as_dir in $PATH do IFS=$as_save_IFS test -z "$as_dir" && as_dir=. for ac_exec_ext in '' $ac_executable_extensions; do if $as_executable_p "$as_dir/$ac_word$ac_exec_ext"; then ac_cv_prog_OBJDUMP="${ac_tool_prefix}objdump" echo "$as_me:$LINENO: found $as_dir/$ac_word$ac_exec_ext" >&5 break 2 fi done done fi fi OBJDUMP=$ac_cv_prog_OBJDUMP if test -n "$OBJDUMP"; then echo "$as_me:$LINENO: result: $OBJDUMP" >&5 echo "${ECHO_T}$OBJDUMP" >&6 else echo "$as_me:$LINENO: result: no" >&5 echo "${ECHO_T}no" >&6 fi fi if test -z "$ac_cv_prog_OBJDUMP"; then ac_ct_OBJDUMP=$OBJDUMP # Extract the first word of "objdump", so it can be a program name with args. set dummy objdump; ac_word=$2 echo "$as_me:$LINENO: checking for $ac_word" >&5 echo $ECHO_N "checking for $ac_word... $ECHO_C" >&6 if test "${ac_cv_prog_ac_ct_OBJDUMP+set}" = set; then echo $ECHO_N "(cached) $ECHO_C" >&6 else if test -n "$ac_ct_OBJDUMP"; then ac_cv_prog_ac_ct_OBJDUMP="$ac_ct_OBJDUMP" # Let the user override the test. else as_save_IFS=$IFS; IFS=$PATH_SEPARATOR for as_dir in $PATH do IFS=$as_save_IFS test -z "$as_dir" && as_dir=. 
for ac_exec_ext in '' $ac_executable_extensions; do if $as_executable_p "$as_dir/$ac_word$ac_exec_ext"; then ac_cv_prog_ac_ct_OBJDUMP="objdump" echo "$as_me:$LINENO: found $as_dir/$ac_word$ac_exec_ext" >&5 break 2 fi done done test -z "$ac_cv_prog_ac_ct_OBJDUMP" && ac_cv_prog_ac_ct_OBJDUMP="false" fi fi ac_ct_OBJDUMP=$ac_cv_prog_ac_ct_OBJDUMP if test -n "$ac_ct_OBJDUMP"; then echo "$as_me:$LINENO: result: $ac_ct_OBJDUMP" >&5 echo "${ECHO_T}$ac_ct_OBJDUMP" >&6 else echo "$as_me:$LINENO: result: no" >&5 echo "${ECHO_T}no" >&6 fi OBJDUMP=$ac_ct_OBJDUMP else OBJDUMP="$ac_cv_prog_OBJDUMP" fi ;; esac need_locks="$enable_libtool_lock" ac_ext=c ac_cpp='$CPP $CPPFLAGS' ac_compile='$CC -c $CFLAGS $CPPFLAGS conftest.$ac_ext >&5' ac_link='$CC -o conftest$ac_exeext $CFLAGS $CPPFLAGS $LDFLAGS conftest.$ac_ext $LIBS >&5' ac_compiler_gnu=$ac_cv_c_compiler_gnu echo "$as_me:$LINENO: checking how to run the C preprocessor" >&5 echo $ECHO_N "checking how to run the C preprocessor... $ECHO_C" >&6 # On Suns, sometimes $CPP names a directory. if test -n "$CPP" && test -d "$CPP"; then CPP= fi if test -z "$CPP"; then if test "${ac_cv_prog_CPP+set}" = set; then echo $ECHO_N "(cached) $ECHO_C" >&6 else # Double quotes because CPP needs to be expanded for CPP in "$CC -E" "$CC -E -traditional-cpp" "/lib/cpp" do ac_preproc_ok=false for ac_c_preproc_warn_flag in '' yes do # Use a header file that comes with gcc, so configuring glibc # with a fresh cross-compiler works. # Prefer <limits.h> to <assert.h> if __STDC__ is defined, since # <limits.h> exists even on freestanding compilers. # On the NeXT, cc -E runs the code through the compiler's parser, # not just through cpp. "Syntax error" is here to catch this case. cat >conftest.$ac_ext <<_ACEOF /* confdefs.h. */ _ACEOF cat confdefs.h >>conftest.$ac_ext cat >>conftest.$ac_ext <<_ACEOF /* end confdefs.h.
*/ #ifdef __STDC__ # include <limits.h> #else # include <assert.h> #endif Syntax error _ACEOF if { (eval echo "$as_me:$LINENO: \"$ac_cpp conftest.$ac_ext\"") >&5 (eval $ac_cpp conftest.$ac_ext) 2>conftest.er1 ac_status=$? grep -v '^ *+' conftest.er1 >conftest.err rm -f conftest.er1 cat conftest.err >&5 echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); } >/dev/null; then if test -s conftest.err; then ac_cpp_err=$ac_c_preproc_warn_flag ac_cpp_err=$ac_cpp_err$ac_c_werror_flag else ac_cpp_err= fi else ac_cpp_err=yes fi if test -z "$ac_cpp_err"; then : else echo "$as_me: failed program was:" >&5 sed 's/^/| /' conftest.$ac_ext >&5 # Broken: fails on valid input. continue fi rm -f conftest.err conftest.$ac_ext # OK, works on sane cases. Now check whether non-existent headers # can be detected and how. cat >conftest.$ac_ext <<_ACEOF /* confdefs.h. */ _ACEOF cat confdefs.h >>conftest.$ac_ext cat >>conftest.$ac_ext <<_ACEOF /* end confdefs.h. */ #include <ac_nonexistent.h> _ACEOF if { (eval echo "$as_me:$LINENO: \"$ac_cpp conftest.$ac_ext\"") >&5 (eval $ac_cpp conftest.$ac_ext) 2>conftest.er1 ac_status=$? grep -v '^ *+' conftest.er1 >conftest.err rm -f conftest.er1 cat conftest.err >&5 echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); } >/dev/null; then if test -s conftest.err; then ac_cpp_err=$ac_c_preproc_warn_flag ac_cpp_err=$ac_cpp_err$ac_c_werror_flag else ac_cpp_err= fi else ac_cpp_err=yes fi if test -z "$ac_cpp_err"; then # Broken: success on invalid input. continue else echo "$as_me: failed program was:" >&5 sed 's/^/| /' conftest.$ac_ext >&5 # Passes both tests. ac_preproc_ok=: break fi rm -f conftest.err conftest.$ac_ext done # Because of `break', _AC_PREPROC_IFELSE's cleaning code was skipped.
rm -f conftest.err conftest.$ac_ext if $ac_preproc_ok; then break fi done ac_cv_prog_CPP=$CPP fi CPP=$ac_cv_prog_CPP else ac_cv_prog_CPP=$CPP fi echo "$as_me:$LINENO: result: $CPP" >&5 echo "${ECHO_T}$CPP" >&6 ac_preproc_ok=false for ac_c_preproc_warn_flag in '' yes do # Use a header file that comes with gcc, so configuring glibc # with a fresh cross-compiler works. # Prefer <limits.h> to <assert.h> if __STDC__ is defined, since # <limits.h> exists even on freestanding compilers. # On the NeXT, cc -E runs the code through the compiler's parser, # not just through cpp. "Syntax error" is here to catch this case. cat >conftest.$ac_ext <<_ACEOF /* confdefs.h. */ _ACEOF cat confdefs.h >>conftest.$ac_ext cat >>conftest.$ac_ext <<_ACEOF /* end confdefs.h. */ #ifdef __STDC__ # include <limits.h> #else # include <assert.h> #endif Syntax error _ACEOF if { (eval echo "$as_me:$LINENO: \"$ac_cpp conftest.$ac_ext\"") >&5 (eval $ac_cpp conftest.$ac_ext) 2>conftest.er1 ac_status=$? grep -v '^ *+' conftest.er1 >conftest.err rm -f conftest.er1 cat conftest.err >&5 echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); } >/dev/null; then if test -s conftest.err; then ac_cpp_err=$ac_c_preproc_warn_flag ac_cpp_err=$ac_cpp_err$ac_c_werror_flag else ac_cpp_err= fi else ac_cpp_err=yes fi if test -z "$ac_cpp_err"; then : else echo "$as_me: failed program was:" >&5 sed 's/^/| /' conftest.$ac_ext >&5 # Broken: fails on valid input. continue fi rm -f conftest.err conftest.$ac_ext # OK, works on sane cases. Now check whether non-existent headers # can be detected and how. cat >conftest.$ac_ext <<_ACEOF /* confdefs.h. */ _ACEOF cat confdefs.h >>conftest.$ac_ext cat >>conftest.$ac_ext <<_ACEOF /* end confdefs.h. */ #include <ac_nonexistent.h> _ACEOF if { (eval echo "$as_me:$LINENO: \"$ac_cpp conftest.$ac_ext\"") >&5 (eval $ac_cpp conftest.$ac_ext) 2>conftest.er1 ac_status=$? grep -v '^ *+' conftest.er1 >conftest.err rm -f conftest.er1 cat conftest.err >&5 echo "$as_me:$LINENO: \$?
= $ac_status" >&5 (exit $ac_status); } >/dev/null; then if test -s conftest.err; then ac_cpp_err=$ac_c_preproc_warn_flag ac_cpp_err=$ac_cpp_err$ac_c_werror_flag else ac_cpp_err= fi else ac_cpp_err=yes fi if test -z "$ac_cpp_err"; then # Broken: success on invalid input. continue else echo "$as_me: failed program was:" >&5 sed 's/^/| /' conftest.$ac_ext >&5 # Passes both tests. ac_preproc_ok=: break fi rm -f conftest.err conftest.$ac_ext done # Because of `break', _AC_PREPROC_IFELSE's cleaning code was skipped. rm -f conftest.err conftest.$ac_ext if $ac_preproc_ok; then : else { { echo "$as_me:$LINENO: error: C preprocessor \"$CPP\" fails sanity check See \`config.log' for more details." >&5 echo "$as_me: error: C preprocessor \"$CPP\" fails sanity check See \`config.log' for more details." >&2;} { (exit 1); exit 1; }; } fi ac_ext=c ac_cpp='$CPP $CPPFLAGS' ac_compile='$CC -c $CFLAGS $CPPFLAGS conftest.$ac_ext >&5' ac_link='$CC -o conftest$ac_exeext $CFLAGS $CPPFLAGS $LDFLAGS conftest.$ac_ext $LIBS >&5' ac_compiler_gnu=$ac_cv_c_compiler_gnu echo "$as_me:$LINENO: checking for ANSI C header files" >&5 echo $ECHO_N "checking for ANSI C header files... $ECHO_C" >&6 if test "${ac_cv_header_stdc+set}" = set; then echo $ECHO_N "(cached) $ECHO_C" >&6 else cat >conftest.$ac_ext <<_ACEOF /* confdefs.h. */ _ACEOF cat confdefs.h >>conftest.$ac_ext cat >>conftest.$ac_ext <<_ACEOF /* end confdefs.h. */ #include <stdlib.h> #include <stdarg.h> #include <string.h> #include <float.h> int main () { ; return 0; } _ACEOF rm -f conftest.$ac_objext if { (eval echo "$as_me:$LINENO: \"$ac_compile\"") >&5 (eval $ac_compile) 2>conftest.er1 ac_status=$? grep -v '^ *+' conftest.er1 >conftest.err rm -f conftest.er1 cat conftest.err >&5 echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); } && { ac_try='test -z "$ac_c_werror_flag" || test ! -s conftest.err' { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 (eval $ac_try) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$?
= $ac_status" >&5 (exit $ac_status); }; } && { ac_try='test -s conftest.$ac_objext' { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 (eval $ac_try) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); }; }; then ac_cv_header_stdc=yes else echo "$as_me: failed program was:" >&5 sed 's/^/| /' conftest.$ac_ext >&5 ac_cv_header_stdc=no fi rm -f conftest.err conftest.$ac_objext conftest.$ac_ext if test $ac_cv_header_stdc = yes; then # SunOS 4.x string.h does not declare mem*, contrary to ANSI. cat >conftest.$ac_ext <<_ACEOF /* confdefs.h. */ _ACEOF cat confdefs.h >>conftest.$ac_ext cat >>conftest.$ac_ext <<_ACEOF /* end confdefs.h. */ #include <string.h> _ACEOF if (eval "$ac_cpp conftest.$ac_ext") 2>&5 | $EGREP "memchr" >/dev/null 2>&1; then : else ac_cv_header_stdc=no fi rm -f conftest* fi if test $ac_cv_header_stdc = yes; then # ISC 2.0.2 stdlib.h does not declare free, contrary to ANSI. cat >conftest.$ac_ext <<_ACEOF /* confdefs.h. */ _ACEOF cat confdefs.h >>conftest.$ac_ext cat >>conftest.$ac_ext <<_ACEOF /* end confdefs.h. */ #include <stdlib.h> _ACEOF if (eval "$ac_cpp conftest.$ac_ext") 2>&5 | $EGREP "free" >/dev/null 2>&1; then : else ac_cv_header_stdc=no fi rm -f conftest* fi if test $ac_cv_header_stdc = yes; then # /bin/cc in Irix-4.0.5 gets non-ANSI ctype macros unless using -ansi. if test "$cross_compiling" = yes; then : else cat >conftest.$ac_ext <<_ACEOF /* confdefs.h. */ _ACEOF cat confdefs.h >>conftest.$ac_ext cat >>conftest.$ac_ext <<_ACEOF /* end confdefs.h. */ #include <ctype.h> #if ((' ' & 0x0FF) == 0x020) # define ISLOWER(c) ('a' <= (c) && (c) <= 'z') # define TOUPPER(c) (ISLOWER(c) ? 'A' + ((c) - 'a') : (c)) #else # define ISLOWER(c) \ (('a' <= (c) && (c) <= 'i') \ || ('j' <= (c) && (c) <= 'r') \ || ('s' <= (c) && (c) <= 'z')) # define TOUPPER(c) (ISLOWER(c) ?
((c) | 0x40) : (c)) #endif #define XOR(e, f) (((e) && !(f)) || (!(e) && (f))) int main () { int i; for (i = 0; i < 256; i++) if (XOR (islower (i), ISLOWER (i)) || toupper (i) != TOUPPER (i)) exit(2); exit (0); } _ACEOF rm -f conftest$ac_exeext if { (eval echo "$as_me:$LINENO: \"$ac_link\"") >&5 (eval $ac_link) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); } && { ac_try='./conftest$ac_exeext' { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 (eval $ac_try) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); }; }; then : else echo "$as_me: program exited with status $ac_status" >&5 echo "$as_me: failed program was:" >&5 sed 's/^/| /' conftest.$ac_ext >&5 ( exit $ac_status ) ac_cv_header_stdc=no fi rm -f core *.core gmon.out bb.out conftest$ac_exeext conftest.$ac_objext conftest.$ac_ext fi fi fi echo "$as_me:$LINENO: result: $ac_cv_header_stdc" >&5 echo "${ECHO_T}$ac_cv_header_stdc" >&6 if test $ac_cv_header_stdc = yes; then cat >>confdefs.h <<\_ACEOF #define STDC_HEADERS 1 _ACEOF fi # On IRIX 5.3, sys/types and inttypes.h are conflicting. for ac_header in sys/types.h sys/stat.h stdlib.h string.h memory.h strings.h \ inttypes.h stdint.h unistd.h do as_ac_Header=`echo "ac_cv_header_$ac_header" | $as_tr_sh` echo "$as_me:$LINENO: checking for $ac_header" >&5 echo $ECHO_N "checking for $ac_header... $ECHO_C" >&6 if eval "test \"\${$as_ac_Header+set}\" = set"; then echo $ECHO_N "(cached) $ECHO_C" >&6 else cat >conftest.$ac_ext <<_ACEOF /* confdefs.h. */ _ACEOF cat confdefs.h >>conftest.$ac_ext cat >>conftest.$ac_ext <<_ACEOF /* end confdefs.h. */ $ac_includes_default #include <$ac_header> _ACEOF rm -f conftest.$ac_objext if { (eval echo "$as_me:$LINENO: \"$ac_compile\"") >&5 (eval $ac_compile) 2>conftest.er1 ac_status=$? grep -v '^ *+' conftest.er1 >conftest.err rm -f conftest.er1 cat conftest.err >&5 echo "$as_me:$LINENO: \$? 
= $ac_status" >&5 (exit $ac_status); } && { ac_try='test -z "$ac_c_werror_flag" || test ! -s conftest.err' { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 (eval $ac_try) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); }; } && { ac_try='test -s conftest.$ac_objext' { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 (eval $ac_try) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); }; }; then eval "$as_ac_Header=yes" else echo "$as_me: failed program was:" >&5 sed 's/^/| /' conftest.$ac_ext >&5 eval "$as_ac_Header=no" fi rm -f conftest.err conftest.$ac_objext conftest.$ac_ext fi echo "$as_me:$LINENO: result: `eval echo '${'$as_ac_Header'}'`" >&5 echo "${ECHO_T}`eval echo '${'$as_ac_Header'}'`" >&6 if test `eval echo '${'$as_ac_Header'}'` = yes; then cat >>confdefs.h <<_ACEOF #define `echo "HAVE_$ac_header" | $as_tr_cpp` 1 _ACEOF fi done for ac_header in dlfcn.h do as_ac_Header=`echo "ac_cv_header_$ac_header" | $as_tr_sh` if eval "test \"\${$as_ac_Header+set}\" = set"; then echo "$as_me:$LINENO: checking for $ac_header" >&5 echo $ECHO_N "checking for $ac_header... $ECHO_C" >&6 if eval "test \"\${$as_ac_Header+set}\" = set"; then echo $ECHO_N "(cached) $ECHO_C" >&6 fi echo "$as_me:$LINENO: result: `eval echo '${'$as_ac_Header'}'`" >&5 echo "${ECHO_T}`eval echo '${'$as_ac_Header'}'`" >&6 else # Is the header compilable? echo "$as_me:$LINENO: checking $ac_header usability" >&5 echo $ECHO_N "checking $ac_header usability... $ECHO_C" >&6 cat >conftest.$ac_ext <<_ACEOF /* confdefs.h. */ _ACEOF cat confdefs.h >>conftest.$ac_ext cat >>conftest.$ac_ext <<_ACEOF /* end confdefs.h. */ $ac_includes_default #include <$ac_header> _ACEOF rm -f conftest.$ac_objext if { (eval echo "$as_me:$LINENO: \"$ac_compile\"") >&5 (eval $ac_compile) 2>conftest.er1 ac_status=$? grep -v '^ *+' conftest.er1 >conftest.err rm -f conftest.er1 cat conftest.err >&5 echo "$as_me:$LINENO: \$? 
= $ac_status" >&5 (exit $ac_status); } && { ac_try='test -z "$ac_c_werror_flag" || test ! -s conftest.err' { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 (eval $ac_try) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); }; } && { ac_try='test -s conftest.$ac_objext' { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 (eval $ac_try) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); }; }; then ac_header_compiler=yes else echo "$as_me: failed program was:" >&5 sed 's/^/| /' conftest.$ac_ext >&5 ac_header_compiler=no fi rm -f conftest.err conftest.$ac_objext conftest.$ac_ext echo "$as_me:$LINENO: result: $ac_header_compiler" >&5 echo "${ECHO_T}$ac_header_compiler" >&6 # Is the header present? echo "$as_me:$LINENO: checking $ac_header presence" >&5 echo $ECHO_N "checking $ac_header presence... $ECHO_C" >&6 cat >conftest.$ac_ext <<_ACEOF /* confdefs.h. */ _ACEOF cat confdefs.h >>conftest.$ac_ext cat >>conftest.$ac_ext <<_ACEOF /* end confdefs.h. */ #include <$ac_header> _ACEOF if { (eval echo "$as_me:$LINENO: \"$ac_cpp conftest.$ac_ext\"") >&5 (eval $ac_cpp conftest.$ac_ext) 2>conftest.er1 ac_status=$? grep -v '^ *+' conftest.er1 >conftest.err rm -f conftest.er1 cat conftest.err >&5 echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); } >/dev/null; then if test -s conftest.err; then ac_cpp_err=$ac_c_preproc_warn_flag ac_cpp_err=$ac_cpp_err$ac_c_werror_flag else ac_cpp_err= fi else ac_cpp_err=yes fi if test -z "$ac_cpp_err"; then ac_header_preproc=yes else echo "$as_me: failed program was:" >&5 sed 's/^/| /' conftest.$ac_ext >&5 ac_header_preproc=no fi rm -f conftest.err conftest.$ac_ext echo "$as_me:$LINENO: result: $ac_header_preproc" >&5 echo "${ECHO_T}$ac_header_preproc" >&6 # So? What about this header? case $ac_header_compiler:$ac_header_preproc:$ac_c_preproc_warn_flag in yes:no: ) { echo "$as_me:$LINENO: WARNING: $ac_header: accepted by the compiler, rejected by the preprocessor!" 
>&5 echo "$as_me: WARNING: $ac_header: accepted by the compiler, rejected by the preprocessor!" >&2;} { echo "$as_me:$LINENO: WARNING: $ac_header: proceeding with the compiler's result" >&5 echo "$as_me: WARNING: $ac_header: proceeding with the compiler's result" >&2;} ac_header_preproc=yes ;; no:yes:* ) { echo "$as_me:$LINENO: WARNING: $ac_header: present but cannot be compiled" >&5 echo "$as_me: WARNING: $ac_header: present but cannot be compiled" >&2;} { echo "$as_me:$LINENO: WARNING: $ac_header: check for missing prerequisite headers?" >&5 echo "$as_me: WARNING: $ac_header: check for missing prerequisite headers?" >&2;} { echo "$as_me:$LINENO: WARNING: $ac_header: see the Autoconf documentation" >&5 echo "$as_me: WARNING: $ac_header: see the Autoconf documentation" >&2;} { echo "$as_me:$LINENO: WARNING: $ac_header: section \"Present But Cannot Be Compiled\"" >&5 echo "$as_me: WARNING: $ac_header: section \"Present But Cannot Be Compiled\"" >&2;} { echo "$as_me:$LINENO: WARNING: $ac_header: proceeding with the preprocessor's result" >&5 echo "$as_me: WARNING: $ac_header: proceeding with the preprocessor's result" >&2;} { echo "$as_me:$LINENO: WARNING: $ac_header: in the future, the compiler will take precedence" >&5 echo "$as_me: WARNING: $ac_header: in the future, the compiler will take precedence" >&2;} ( cat <<\_ASBOX ## ------------------------------------------ ## ## Report this to the AC_PACKAGE_NAME lists. ## ## ------------------------------------------ ## _ASBOX ) | sed "s/^/$as_me: WARNING: /" >&2 ;; esac echo "$as_me:$LINENO: checking for $ac_header" >&5 echo $ECHO_N "checking for $ac_header... 
$ECHO_C" >&6 if eval "test \"\${$as_ac_Header+set}\" = set"; then echo $ECHO_N "(cached) $ECHO_C" >&6 else eval "$as_ac_Header=\$ac_header_preproc" fi echo "$as_me:$LINENO: result: `eval echo '${'$as_ac_Header'}'`" >&5 echo "${ECHO_T}`eval echo '${'$as_ac_Header'}'`" >&6 fi if test `eval echo '${'$as_ac_Header'}'` = yes; then cat >>confdefs.h <<_ACEOF #define `echo "HAVE_$ac_header" | $as_tr_cpp` 1 _ACEOF fi done ac_ext=cc ac_cpp='$CXXCPP $CPPFLAGS' ac_compile='$CXX -c $CXXFLAGS $CPPFLAGS conftest.$ac_ext >&5' ac_link='$CXX -o conftest$ac_exeext $CXXFLAGS $CPPFLAGS $LDFLAGS conftest.$ac_ext $LIBS >&5' ac_compiler_gnu=$ac_cv_cxx_compiler_gnu if test -n "$ac_tool_prefix"; then for ac_prog in $CCC g++ c++ gpp aCC CC cxx cc++ cl FCC KCC RCC xlC_r xlC do # Extract the first word of "$ac_tool_prefix$ac_prog", so it can be a program name with args. set dummy $ac_tool_prefix$ac_prog; ac_word=$2 echo "$as_me:$LINENO: checking for $ac_word" >&5 echo $ECHO_N "checking for $ac_word... $ECHO_C" >&6 if test "${ac_cv_prog_CXX+set}" = set; then echo $ECHO_N "(cached) $ECHO_C" >&6 else if test -n "$CXX"; then ac_cv_prog_CXX="$CXX" # Let the user override the test. else as_save_IFS=$IFS; IFS=$PATH_SEPARATOR for as_dir in $PATH do IFS=$as_save_IFS test -z "$as_dir" && as_dir=. for ac_exec_ext in '' $ac_executable_extensions; do if $as_executable_p "$as_dir/$ac_word$ac_exec_ext"; then ac_cv_prog_CXX="$ac_tool_prefix$ac_prog" echo "$as_me:$LINENO: found $as_dir/$ac_word$ac_exec_ext" >&5 break 2 fi done done fi fi CXX=$ac_cv_prog_CXX if test -n "$CXX"; then echo "$as_me:$LINENO: result: $CXX" >&5 echo "${ECHO_T}$CXX" >&6 else echo "$as_me:$LINENO: result: no" >&5 echo "${ECHO_T}no" >&6 fi test -n "$CXX" && break done fi if test -z "$CXX"; then ac_ct_CXX=$CXX for ac_prog in $CCC g++ c++ gpp aCC CC cxx cc++ cl FCC KCC RCC xlC_r xlC do # Extract the first word of "$ac_prog", so it can be a program name with args. 
set dummy $ac_prog; ac_word=$2 echo "$as_me:$LINENO: checking for $ac_word" >&5 echo $ECHO_N "checking for $ac_word... $ECHO_C" >&6 if test "${ac_cv_prog_ac_ct_CXX+set}" = set; then echo $ECHO_N "(cached) $ECHO_C" >&6 else if test -n "$ac_ct_CXX"; then ac_cv_prog_ac_ct_CXX="$ac_ct_CXX" # Let the user override the test. else as_save_IFS=$IFS; IFS=$PATH_SEPARATOR for as_dir in $PATH do IFS=$as_save_IFS test -z "$as_dir" && as_dir=. for ac_exec_ext in '' $ac_executable_extensions; do if $as_executable_p "$as_dir/$ac_word$ac_exec_ext"; then ac_cv_prog_ac_ct_CXX="$ac_prog" echo "$as_me:$LINENO: found $as_dir/$ac_word$ac_exec_ext" >&5 break 2 fi done done fi fi ac_ct_CXX=$ac_cv_prog_ac_ct_CXX if test -n "$ac_ct_CXX"; then echo "$as_me:$LINENO: result: $ac_ct_CXX" >&5 echo "${ECHO_T}$ac_ct_CXX" >&6 else echo "$as_me:$LINENO: result: no" >&5 echo "${ECHO_T}no" >&6 fi test -n "$ac_ct_CXX" && break done test -n "$ac_ct_CXX" || ac_ct_CXX="g++" CXX=$ac_ct_CXX fi # Provide some information about the compiler. echo "$as_me:$LINENO:" \ "checking for C++ compiler version" >&5 ac_compiler=`set X $ac_compile; echo $2` { (eval echo "$as_me:$LINENO: \"$ac_compiler --version </dev/null >&5\"") >&5 (eval $ac_compiler --version </dev/null >&5) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); } { (eval echo "$as_me:$LINENO: \"$ac_compiler -v </dev/null >&5\"") >&5 (eval $ac_compiler -v </dev/null >&5) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); } { (eval echo "$as_me:$LINENO: \"$ac_compiler -V </dev/null >&5\"") >&5 (eval $ac_compiler -V </dev/null >&5) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); } echo "$as_me:$LINENO: checking whether we are using the GNU C++ compiler" >&5 echo $ECHO_N "checking whether we are using the GNU C++ compiler... $ECHO_C" >&6 if test "${ac_cv_cxx_compiler_gnu+set}" = set; then echo $ECHO_N "(cached) $ECHO_C" >&6 else cat >conftest.$ac_ext <<_ACEOF /* confdefs.h.
*/ _ACEOF cat confdefs.h >>conftest.$ac_ext cat >>conftest.$ac_ext <<_ACEOF /* end confdefs.h. */ int main () { #ifndef __GNUC__ choke me #endif ; return 0; } _ACEOF rm -f conftest.$ac_objext if { (eval echo "$as_me:$LINENO: \"$ac_compile\"") >&5 (eval $ac_compile) 2>conftest.er1 ac_status=$? grep -v '^ *+' conftest.er1 >conftest.err rm -f conftest.er1 cat conftest.err >&5 echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); } && { ac_try='test -z "$ac_cxx_werror_flag" || test ! -s conftest.err' { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 (eval $ac_try) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); }; } && { ac_try='test -s conftest.$ac_objext' { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 (eval $ac_try) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); }; }; then ac_compiler_gnu=yes else echo "$as_me: failed program was:" >&5 sed 's/^/| /' conftest.$ac_ext >&5 ac_compiler_gnu=no fi rm -f conftest.err conftest.$ac_objext conftest.$ac_ext ac_cv_cxx_compiler_gnu=$ac_compiler_gnu fi echo "$as_me:$LINENO: result: $ac_cv_cxx_compiler_gnu" >&5 echo "${ECHO_T}$ac_cv_cxx_compiler_gnu" >&6 GXX=`test $ac_compiler_gnu = yes && echo yes` ac_test_CXXFLAGS=${CXXFLAGS+set} ac_save_CXXFLAGS=$CXXFLAGS CXXFLAGS="-g" echo "$as_me:$LINENO: checking whether $CXX accepts -g" >&5 echo $ECHO_N "checking whether $CXX accepts -g... $ECHO_C" >&6 if test "${ac_cv_prog_cxx_g+set}" = set; then echo $ECHO_N "(cached) $ECHO_C" >&6 else cat >conftest.$ac_ext <<_ACEOF /* confdefs.h. */ _ACEOF cat confdefs.h >>conftest.$ac_ext cat >>conftest.$ac_ext <<_ACEOF /* end confdefs.h. */ int main () { ; return 0; } _ACEOF rm -f conftest.$ac_objext if { (eval echo "$as_me:$LINENO: \"$ac_compile\"") >&5 (eval $ac_compile) 2>conftest.er1 ac_status=$? grep -v '^ *+' conftest.er1 >conftest.err rm -f conftest.er1 cat conftest.err >&5 echo "$as_me:$LINENO: \$? 
= $ac_status" >&5 (exit $ac_status); } && { ac_try='test -z "$ac_cxx_werror_flag" || test ! -s conftest.err' { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 (eval $ac_try) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); }; } && { ac_try='test -s conftest.$ac_objext' { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 (eval $ac_try) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); }; }; then ac_cv_prog_cxx_g=yes else echo "$as_me: failed program was:" >&5 sed 's/^/| /' conftest.$ac_ext >&5 ac_cv_prog_cxx_g=no fi rm -f conftest.err conftest.$ac_objext conftest.$ac_ext fi echo "$as_me:$LINENO: result: $ac_cv_prog_cxx_g" >&5 echo "${ECHO_T}$ac_cv_prog_cxx_g" >&6 if test "$ac_test_CXXFLAGS" = set; then CXXFLAGS=$ac_save_CXXFLAGS elif test $ac_cv_prog_cxx_g = yes; then if test "$GXX" = yes; then CXXFLAGS="-g -O2" else CXXFLAGS="-g" fi else if test "$GXX" = yes; then CXXFLAGS="-O2" else CXXFLAGS= fi fi for ac_declaration in \ '' \ 'extern "C" void std::exit (int) throw (); using std::exit;' \ 'extern "C" void std::exit (int); using std::exit;' \ 'extern "C" void exit (int) throw ();' \ 'extern "C" void exit (int);' \ 'void exit (int);' do cat >conftest.$ac_ext <<_ACEOF /* confdefs.h. */ _ACEOF cat confdefs.h >>conftest.$ac_ext cat >>conftest.$ac_ext <<_ACEOF /* end confdefs.h. */ $ac_declaration #include <stdlib.h> int main () { exit (42); ; return 0; } _ACEOF rm -f conftest.$ac_objext if { (eval echo "$as_me:$LINENO: \"$ac_compile\"") >&5 (eval $ac_compile) 2>conftest.er1 ac_status=$? grep -v '^ *+' conftest.er1 >conftest.err rm -f conftest.er1 cat conftest.err >&5 echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); } && { ac_try='test -z "$ac_cxx_werror_flag" || test ! -s conftest.err' { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 (eval $ac_try) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$?
= $ac_status" >&5 (exit $ac_status); }; } && { ac_try='test -s conftest.$ac_objext' { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 (eval $ac_try) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); }; }; then : else echo "$as_me: failed program was:" >&5 sed 's/^/| /' conftest.$ac_ext >&5 continue fi rm -f conftest.err conftest.$ac_objext conftest.$ac_ext cat >conftest.$ac_ext <<_ACEOF /* confdefs.h. */ _ACEOF cat confdefs.h >>conftest.$ac_ext cat >>conftest.$ac_ext <<_ACEOF /* end confdefs.h. */ $ac_declaration int main () { exit (42); ; return 0; } _ACEOF rm -f conftest.$ac_objext if { (eval echo "$as_me:$LINENO: \"$ac_compile\"") >&5 (eval $ac_compile) 2>conftest.er1 ac_status=$? grep -v '^ *+' conftest.er1 >conftest.err rm -f conftest.er1 cat conftest.err >&5 echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); } && { ac_try='test -z "$ac_cxx_werror_flag" || test ! -s conftest.err' { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 (eval $ac_try) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); }; } && { ac_try='test -s conftest.$ac_objext' { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 (eval $ac_try) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); }; }; then break else echo "$as_me: failed program was:" >&5 sed 's/^/| /' conftest.$ac_ext >&5 fi rm -f conftest.err conftest.$ac_objext conftest.$ac_ext done rm -f conftest* if test -n "$ac_declaration"; then echo '#ifdef __cplusplus' >>confdefs.h echo $ac_declaration >>confdefs.h echo '#endif' >>confdefs.h fi ac_ext=cc ac_cpp='$CXXCPP $CPPFLAGS' ac_compile='$CXX -c $CXXFLAGS $CPPFLAGS conftest.$ac_ext >&5' ac_link='$CXX -o conftest$ac_exeext $CXXFLAGS $CPPFLAGS $LDFLAGS conftest.$ac_ext $LIBS >&5' ac_compiler_gnu=$ac_cv_cxx_compiler_gnu depcc="$CXX" am_compiler_list= echo "$as_me:$LINENO: checking dependency style of $depcc" >&5 echo $ECHO_N "checking dependency style of $depcc... 
$ECHO_C" >&6 if test "${am_cv_CXX_dependencies_compiler_type+set}" = set; then echo $ECHO_N "(cached) $ECHO_C" >&6 else if test -z "$AMDEP_TRUE" && test -f "$am_depcomp"; then # We make a subdir and do the tests there. Otherwise we can end up # making bogus files that we don't know about and never remove. For # instance it was reported that on HP-UX the gcc test will end up # making a dummy file named `D' -- because `-MD' means `put the output # in D'. mkdir conftest.dir # Copy depcomp to subdir because otherwise we won't find it if we're # using a relative directory. cp "$am_depcomp" conftest.dir cd conftest.dir # We will build objects and dependencies in a subdirectory because # it helps to detect inapplicable dependency modes. For instance # both Tru64's cc and ICC support -MD to output dependencies as a # side effect of compilation, but ICC will put the dependencies in # the current directory while Tru64 will put them in the object # directory. mkdir sub am_cv_CXX_dependencies_compiler_type=none if test "$am_compiler_list" = ""; then am_compiler_list=`sed -n 's/^#*\([a-zA-Z0-9]*\))$/\1/p' < ./depcomp` fi for depmode in $am_compiler_list; do # Setup a source with many dependencies, because some compilers # like to wrap large dependency lists on column 80 (with \), and # we should not choose a depcomp mode which is confused by this. # # We need to recreate these files for each test, as the compiler may # overwrite some of them when testing with obscure command lines. # This happens at least with the AIX C compiler. : > sub/conftest.c for i in 1 2 3 4 5 6; do echo '#include "conftst'$i'.h"' >> sub/conftest.c # Using `: > sub/conftst$i.h' creates only sub/conftst1.h with # Solaris 8's {/usr,}/bin/sh. 
touch sub/conftst$i.h done echo "${am__include} ${am__quote}sub/conftest.Po${am__quote}" > confmf case $depmode in nosideeffect) # after this tag, mechanisms are not by side-effect, so they'll # only be used when explicitly requested if test "x$enable_dependency_tracking" = xyes; then continue else break fi ;; none) break ;; esac # We check with `-c' and `-o' for the sake of the "dashmstdout" # mode. It turns out that the SunPro C++ compiler does not properly # handle `-M -o', and we need to detect this. if depmode=$depmode \ source=sub/conftest.c object=sub/conftest.${OBJEXT-o} \ depfile=sub/conftest.Po tmpdepfile=sub/conftest.TPo \ $SHELL ./depcomp $depcc -c -o sub/conftest.${OBJEXT-o} sub/conftest.c \ >/dev/null 2>conftest.err && grep sub/conftst6.h sub/conftest.Po > /dev/null 2>&1 && grep sub/conftest.${OBJEXT-o} sub/conftest.Po > /dev/null 2>&1 && ${MAKE-make} -s -f confmf > /dev/null 2>&1; then # icc doesn't choke on unknown options, it will just issue warnings # or remarks (even with -Werror). So we grep stderr for any message # that says an option was ignored or not supported. # When given -MP, icc 7.0 and 7.1 complain thusly: # icc: Command line warning: ignoring option '-M'; no argument required # The diagnosis changed in icc 8.0: # icc: Command line remark: option '-MP' not supported if (grep 'ignoring option' conftest.err || grep 'not supported' conftest.err) >/dev/null 2>&1; then :; else am_cv_CXX_dependencies_compiler_type=$depmode break fi fi done cd .. 
rm -rf conftest.dir else am_cv_CXX_dependencies_compiler_type=none fi fi echo "$as_me:$LINENO: result: $am_cv_CXX_dependencies_compiler_type" >&5 echo "${ECHO_T}$am_cv_CXX_dependencies_compiler_type" >&6 CXXDEPMODE=depmode=$am_cv_CXX_dependencies_compiler_type if test "x$enable_dependency_tracking" != xno \ && test "$am_cv_CXX_dependencies_compiler_type" = gcc3; then am__fastdepCXX_TRUE= am__fastdepCXX_FALSE='#' else am__fastdepCXX_TRUE='#' am__fastdepCXX_FALSE= fi if test -n "$CXX" && ( test "X$CXX" != "Xno" && ( (test "X$CXX" = "Xg++" && `g++ -v >/dev/null 2>&1` ) || (test "X$CXX" != "Xg++"))) ; then ac_ext=cc ac_cpp='$CXXCPP $CPPFLAGS' ac_compile='$CXX -c $CXXFLAGS $CPPFLAGS conftest.$ac_ext >&5' ac_link='$CXX -o conftest$ac_exeext $CXXFLAGS $CPPFLAGS $LDFLAGS conftest.$ac_ext $LIBS >&5' ac_compiler_gnu=$ac_cv_cxx_compiler_gnu echo "$as_me:$LINENO: checking how to run the C++ preprocessor" >&5 echo $ECHO_N "checking how to run the C++ preprocessor... $ECHO_C" >&6 if test -z "$CXXCPP"; then if test "${ac_cv_prog_CXXCPP+set}" = set; then echo $ECHO_N "(cached) $ECHO_C" >&6 else # Double quotes because CXXCPP needs to be expanded for CXXCPP in "$CXX -E" "/lib/cpp" do ac_preproc_ok=false for ac_cxx_preproc_warn_flag in '' yes do # Use a header file that comes with gcc, so configuring glibc # with a fresh cross-compiler works. # Prefer <limits.h> to <assert.h> if __STDC__ is defined, since # <limits.h> exists even on freestanding compilers. # On the NeXT, cc -E runs the code through the compiler's parser, # not just through cpp. "Syntax error" is here to catch this case. cat >conftest.$ac_ext <<_ACEOF /* confdefs.h. */ _ACEOF cat confdefs.h >>conftest.$ac_ext cat >>conftest.$ac_ext <<_ACEOF /* end confdefs.h. */ #ifdef __STDC__ # include <limits.h> #else # include <assert.h> #endif Syntax error _ACEOF if { (eval echo "$as_me:$LINENO: \"$ac_cpp conftest.$ac_ext\"") >&5 (eval $ac_cpp conftest.$ac_ext) 2>conftest.er1 ac_status=$?
grep -v '^ *+' conftest.er1 >conftest.err rm -f conftest.er1 cat conftest.err >&5 echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); } >/dev/null; then if test -s conftest.err; then ac_cpp_err=$ac_cxx_preproc_warn_flag ac_cpp_err=$ac_cpp_err$ac_cxx_werror_flag else ac_cpp_err= fi else ac_cpp_err=yes fi if test -z "$ac_cpp_err"; then : else echo "$as_me: failed program was:" >&5 sed 's/^/| /' conftest.$ac_ext >&5 # Broken: fails on valid input. continue fi rm -f conftest.err conftest.$ac_ext # OK, works on sane cases. Now check whether non-existent headers # can be detected and how. cat >conftest.$ac_ext <<_ACEOF /* confdefs.h. */ _ACEOF cat confdefs.h >>conftest.$ac_ext cat >>conftest.$ac_ext <<_ACEOF /* end confdefs.h. */ #include <ac_nonexistent.h> _ACEOF if { (eval echo "$as_me:$LINENO: \"$ac_cpp conftest.$ac_ext\"") >&5 (eval $ac_cpp conftest.$ac_ext) 2>conftest.er1 ac_status=$? grep -v '^ *+' conftest.er1 >conftest.err rm -f conftest.er1 cat conftest.err >&5 echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); } >/dev/null; then if test -s conftest.err; then ac_cpp_err=$ac_cxx_preproc_warn_flag ac_cpp_err=$ac_cpp_err$ac_cxx_werror_flag else ac_cpp_err= fi else ac_cpp_err=yes fi if test -z "$ac_cpp_err"; then # Broken: success on invalid input. continue else echo "$as_me: failed program was:" >&5 sed 's/^/| /' conftest.$ac_ext >&5 # Passes both tests. ac_preproc_ok=: break fi rm -f conftest.err conftest.$ac_ext done # Because of `break', _AC_PREPROC_IFELSE's cleaning code was skipped. rm -f conftest.err conftest.$ac_ext if $ac_preproc_ok; then break fi done ac_cv_prog_CXXCPP=$CXXCPP fi CXXCPP=$ac_cv_prog_CXXCPP else ac_cv_prog_CXXCPP=$CXXCPP fi echo "$as_me:$LINENO: result: $CXXCPP" >&5 echo "${ECHO_T}$CXXCPP" >&6 ac_preproc_ok=false for ac_cxx_preproc_warn_flag in '' yes do # Use a header file that comes with gcc, so configuring glibc # with a fresh cross-compiler works. # Prefer <limits.h> to <assert.h> if __STDC__ is defined, since # <limits.h> exists even on freestanding compilers.
# On the NeXT, cc -E runs the code through the compiler's parser, # not just through cpp. "Syntax error" is here to catch this case. cat >conftest.$ac_ext <<_ACEOF /* confdefs.h. */ _ACEOF cat confdefs.h >>conftest.$ac_ext cat >>conftest.$ac_ext <<_ACEOF /* end confdefs.h. */ #ifdef __STDC__ # include <limits.h> #else # include <assert.h> #endif Syntax error _ACEOF if { (eval echo "$as_me:$LINENO: \"$ac_cpp conftest.$ac_ext\"") >&5 (eval $ac_cpp conftest.$ac_ext) 2>conftest.er1 ac_status=$? grep -v '^ *+' conftest.er1 >conftest.err rm -f conftest.er1 cat conftest.err >&5 echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); } >/dev/null; then if test -s conftest.err; then ac_cpp_err=$ac_cxx_preproc_warn_flag ac_cpp_err=$ac_cpp_err$ac_cxx_werror_flag else ac_cpp_err= fi else ac_cpp_err=yes fi if test -z "$ac_cpp_err"; then : else echo "$as_me: failed program was:" >&5 sed 's/^/| /' conftest.$ac_ext >&5 # Broken: fails on valid input. continue fi rm -f conftest.err conftest.$ac_ext # OK, works on sane cases. Now check whether non-existent headers # can be detected and how. cat >conftest.$ac_ext <<_ACEOF /* confdefs.h. */ _ACEOF cat confdefs.h >>conftest.$ac_ext cat >>conftest.$ac_ext <<_ACEOF /* end confdefs.h. */ #include <ac_nonexistent.h> _ACEOF if { (eval echo "$as_me:$LINENO: \"$ac_cpp conftest.$ac_ext\"") >&5 (eval $ac_cpp conftest.$ac_ext) 2>conftest.er1 ac_status=$? grep -v '^ *+' conftest.er1 >conftest.err rm -f conftest.er1 cat conftest.err >&5 echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); } >/dev/null; then if test -s conftest.err; then ac_cpp_err=$ac_cxx_preproc_warn_flag ac_cpp_err=$ac_cpp_err$ac_cxx_werror_flag else ac_cpp_err= fi else ac_cpp_err=yes fi if test -z "$ac_cpp_err"; then # Broken: success on invalid input. continue else echo "$as_me: failed program was:" >&5 sed 's/^/| /' conftest.$ac_ext >&5 # Passes both tests. ac_preproc_ok=: break fi rm -f conftest.err conftest.$ac_ext done # Because of `break', _AC_PREPROC_IFELSE's cleaning code was skipped.
rm -f conftest.err conftest.$ac_ext if $ac_preproc_ok; then : else { { echo "$as_me:$LINENO: error: C++ preprocessor \"$CXXCPP\" fails sanity check See \`config.log' for more details." >&5 echo "$as_me: error: C++ preprocessor \"$CXXCPP\" fails sanity check See \`config.log' for more details." >&2;} { (exit 1); exit 1; }; } fi ac_ext=cc ac_cpp='$CXXCPP $CPPFLAGS' ac_compile='$CXX -c $CXXFLAGS $CPPFLAGS conftest.$ac_ext >&5' ac_link='$CXX -o conftest$ac_exeext $CXXFLAGS $CPPFLAGS $LDFLAGS conftest.$ac_ext $LIBS >&5' ac_compiler_gnu=$ac_cv_cxx_compiler_gnu fi ac_ext=f ac_compile='$F77 -c $FFLAGS conftest.$ac_ext >&5' ac_link='$F77 -o conftest$ac_exeext $FFLAGS $LDFLAGS conftest.$ac_ext $LIBS >&5' ac_compiler_gnu=$ac_cv_f77_compiler_gnu if test -n "$ac_tool_prefix"; then for ac_prog in g77 f77 xlf frt pgf77 fort77 fl32 af77 f90 xlf90 pgf90 epcf90 f95 fort xlf95 ifc efc pgf95 lf95 gfortran do # Extract the first word of "$ac_tool_prefix$ac_prog", so it can be a program name with args. set dummy $ac_tool_prefix$ac_prog; ac_word=$2 echo "$as_me:$LINENO: checking for $ac_word" >&5 echo $ECHO_N "checking for $ac_word... $ECHO_C" >&6 if test "${ac_cv_prog_F77+set}" = set; then echo $ECHO_N "(cached) $ECHO_C" >&6 else if test -n "$F77"; then ac_cv_prog_F77="$F77" # Let the user override the test. else as_save_IFS=$IFS; IFS=$PATH_SEPARATOR for as_dir in $PATH do IFS=$as_save_IFS test -z "$as_dir" && as_dir=. 
for ac_exec_ext in '' $ac_executable_extensions; do if $as_executable_p "$as_dir/$ac_word$ac_exec_ext"; then ac_cv_prog_F77="$ac_tool_prefix$ac_prog" echo "$as_me:$LINENO: found $as_dir/$ac_word$ac_exec_ext" >&5 break 2 fi done done fi fi F77=$ac_cv_prog_F77 if test -n "$F77"; then echo "$as_me:$LINENO: result: $F77" >&5 echo "${ECHO_T}$F77" >&6 else echo "$as_me:$LINENO: result: no" >&5 echo "${ECHO_T}no" >&6 fi test -n "$F77" && break done fi if test -z "$F77"; then ac_ct_F77=$F77 for ac_prog in g77 f77 xlf frt pgf77 fort77 fl32 af77 f90 xlf90 pgf90 epcf90 f95 fort xlf95 ifc efc pgf95 lf95 gfortran do # Extract the first word of "$ac_prog", so it can be a program name with args. set dummy $ac_prog; ac_word=$2 echo "$as_me:$LINENO: checking for $ac_word" >&5 echo $ECHO_N "checking for $ac_word... $ECHO_C" >&6 if test "${ac_cv_prog_ac_ct_F77+set}" = set; then echo $ECHO_N "(cached) $ECHO_C" >&6 else if test -n "$ac_ct_F77"; then ac_cv_prog_ac_ct_F77="$ac_ct_F77" # Let the user override the test. else as_save_IFS=$IFS; IFS=$PATH_SEPARATOR for as_dir in $PATH do IFS=$as_save_IFS test -z "$as_dir" && as_dir=. for ac_exec_ext in '' $ac_executable_extensions; do if $as_executable_p "$as_dir/$ac_word$ac_exec_ext"; then ac_cv_prog_ac_ct_F77="$ac_prog" echo "$as_me:$LINENO: found $as_dir/$ac_word$ac_exec_ext" >&5 break 2 fi done done fi fi ac_ct_F77=$ac_cv_prog_ac_ct_F77 if test -n "$ac_ct_F77"; then echo "$as_me:$LINENO: result: $ac_ct_F77" >&5 echo "${ECHO_T}$ac_ct_F77" >&6 else echo "$as_me:$LINENO: result: no" >&5 echo "${ECHO_T}no" >&6 fi test -n "$ac_ct_F77" && break done F77=$ac_ct_F77 fi # Provide some information about the compiler. echo "$as_me:6635:" \ "checking for Fortran 77 compiler version" >&5 ac_compiler=`set X $ac_compile; echo $2` { (eval echo "$as_me:$LINENO: \"$ac_compiler --version &5\"") >&5 (eval $ac_compiler --version &5) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? 
= $ac_status" >&5 (exit $ac_status); } { (eval echo "$as_me:$LINENO: \"$ac_compiler -v &5\"") >&5 (eval $ac_compiler -v &5) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); } { (eval echo "$as_me:$LINENO: \"$ac_compiler -V &5\"") >&5 (eval $ac_compiler -V &5) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); } rm -f a.out # If we don't use `.F' as extension, the preprocessor is not run on the # input file. (Note that this only needs to work for GNU compilers.) ac_save_ext=$ac_ext ac_ext=F echo "$as_me:$LINENO: checking whether we are using the GNU Fortran 77 compiler" >&5 echo $ECHO_N "checking whether we are using the GNU Fortran 77 compiler... $ECHO_C" >&6 if test "${ac_cv_f77_compiler_gnu+set}" = set; then echo $ECHO_N "(cached) $ECHO_C" >&6 else cat >conftest.$ac_ext <<_ACEOF program main #ifndef __GNUC__ choke me #endif end _ACEOF rm -f conftest.$ac_objext if { (eval echo "$as_me:$LINENO: \"$ac_compile\"") >&5 (eval $ac_compile) 2>conftest.er1 ac_status=$? grep -v '^ *+' conftest.er1 >conftest.err rm -f conftest.er1 cat conftest.err >&5 echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); } && { ac_try='test -z "$ac_f77_werror_flag" || test ! -s conftest.err' { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 (eval $ac_try) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); }; } && { ac_try='test -s conftest.$ac_objext' { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 (eval $ac_try) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? 
= $ac_status" >&5 (exit $ac_status); }; }; then ac_compiler_gnu=yes else echo "$as_me: failed program was:" >&5 sed 's/^/| /' conftest.$ac_ext >&5 ac_compiler_gnu=no fi rm -f conftest.err conftest.$ac_objext conftest.$ac_ext ac_cv_f77_compiler_gnu=$ac_compiler_gnu fi echo "$as_me:$LINENO: result: $ac_cv_f77_compiler_gnu" >&5 echo "${ECHO_T}$ac_cv_f77_compiler_gnu" >&6 ac_ext=$ac_save_ext ac_test_FFLAGS=${FFLAGS+set} ac_save_FFLAGS=$FFLAGS FFLAGS= echo "$as_me:$LINENO: checking whether $F77 accepts -g" >&5 echo $ECHO_N "checking whether $F77 accepts -g... $ECHO_C" >&6 if test "${ac_cv_prog_f77_g+set}" = set; then echo $ECHO_N "(cached) $ECHO_C" >&6 else FFLAGS=-g cat >conftest.$ac_ext <<_ACEOF program main end _ACEOF rm -f conftest.$ac_objext if { (eval echo "$as_me:$LINENO: \"$ac_compile\"") >&5 (eval $ac_compile) 2>conftest.er1 ac_status=$? grep -v '^ *+' conftest.er1 >conftest.err rm -f conftest.er1 cat conftest.err >&5 echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); } && { ac_try='test -z "$ac_f77_werror_flag" || test ! -s conftest.err' { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 (eval $ac_try) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); }; } && { ac_try='test -s conftest.$ac_objext' { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 (eval $ac_try) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? 
= $ac_status" >&5 (exit $ac_status); }; }; then ac_cv_prog_f77_g=yes else echo "$as_me: failed program was:" >&5 sed 's/^/| /' conftest.$ac_ext >&5 ac_cv_prog_f77_g=no fi rm -f conftest.err conftest.$ac_objext conftest.$ac_ext fi echo "$as_me:$LINENO: result: $ac_cv_prog_f77_g" >&5 echo "${ECHO_T}$ac_cv_prog_f77_g" >&6 if test "$ac_test_FFLAGS" = set; then FFLAGS=$ac_save_FFLAGS elif test $ac_cv_prog_f77_g = yes; then if test "x$ac_cv_f77_compiler_gnu" = xyes; then FFLAGS="-g -O2" else FFLAGS="-g" fi else if test "x$ac_cv_f77_compiler_gnu" = xyes; then FFLAGS="-O2" else FFLAGS= fi fi G77=`test $ac_compiler_gnu = yes && echo yes` ac_ext=c ac_cpp='$CPP $CPPFLAGS' ac_compile='$CC -c $CFLAGS $CPPFLAGS conftest.$ac_ext >&5' ac_link='$CC -o conftest$ac_exeext $CFLAGS $CPPFLAGS $LDFLAGS conftest.$ac_ext $LIBS >&5' ac_compiler_gnu=$ac_cv_c_compiler_gnu # Autoconf 2.13's AC_OBJEXT and AC_EXEEXT macros only works for C compilers! # find the maximum length of command line arguments echo "$as_me:$LINENO: checking the maximum length of command line arguments" >&5 echo $ECHO_N "checking the maximum length of command line arguments... $ECHO_C" >&6 if test "${lt_cv_sys_max_cmd_len+set}" = set; then echo $ECHO_N "(cached) $ECHO_C" >&6 else i=0 teststring="ABCD" case $build_os in msdosdjgpp*) # On DJGPP, this test can blow up pretty badly due to problems in libc # (any single argument exceeding 2000 bytes causes a buffer overrun # during glob expansion). Even if it were fixed, the result of this # check would be larger than it should be. lt_cv_sys_max_cmd_len=12288; # 12K is about right ;; gnu*) # Under GNU Hurd, this test is not required because there is # no limit to the length of command line arguments. # Libtool will interpret -1 as no limit whatsoever lt_cv_sys_max_cmd_len=-1; ;; cygwin* | mingw*) # On Win9x/ME, this test blows up -- it succeeds, but takes # about 5 minutes as the teststring grows exponentially. 
# Worse, since 9x/ME are not pre-emptively multitasking, # you end up with a "frozen" computer, even though with patience # the test eventually succeeds (with a max line length of 256k). # Instead, let's just punt: use the minimum linelength reported by # all of the supported platforms: 8192 (on NT/2K/XP). lt_cv_sys_max_cmd_len=8192; ;; amigaos*) # On AmigaOS with pdksh, this test takes hours, literally. # So we just punt and use a minimum line length of 8192. lt_cv_sys_max_cmd_len=8192; ;; netbsd* | freebsd* | openbsd* | darwin* | dragonfly*) # This has been around since 386BSD, at least. Likely further. if test -x /sbin/sysctl; then lt_cv_sys_max_cmd_len=`/sbin/sysctl -n kern.argmax` elif test -x /usr/sbin/sysctl; then lt_cv_sys_max_cmd_len=`/usr/sbin/sysctl -n kern.argmax` else lt_cv_sys_max_cmd_len=65536 # usable default for all BSDs fi # And add a safety zone lt_cv_sys_max_cmd_len=`expr $lt_cv_sys_max_cmd_len \/ 4` lt_cv_sys_max_cmd_len=`expr $lt_cv_sys_max_cmd_len \* 3` ;; interix*) # We know the value 262144 and hardcode it with a safety zone (like BSD) lt_cv_sys_max_cmd_len=196608 ;; osf*) # Dr. Hans Ekkehard Plesser reports seeing a kernel panic running configure # due to this test when exec_disable_arg_limit is 1 on Tru64. It is not # nice to cause kernel panics so lets avoid the loop below. # First set a reasonable default. lt_cv_sys_max_cmd_len=16384 # if test -x /sbin/sysconfig; then case `/sbin/sysconfig -q proc exec_disable_arg_limit` in *1*) lt_cv_sys_max_cmd_len=-1 ;; esac fi ;; sco3.2v5*) lt_cv_sys_max_cmd_len=102400 ;; sysv5* | sco5v6* | sysv4.2uw2*) kargmax=`grep ARG_MAX /etc/conf/cf.d/stune 2>/dev/null` if test -n "$kargmax"; then lt_cv_sys_max_cmd_len=`echo $kargmax | sed 's/.*[ ]//'` else lt_cv_sys_max_cmd_len=32768 fi ;; *) # If test is not a shell built-in, we'll probably end up computing a # maximum length that is only half of the actual maximum length, but # we can't tell. 
SHELL=${SHELL-${CONFIG_SHELL-/bin/sh}} while (test "X"`$SHELL $0 --fallback-echo "X$teststring" 2>/dev/null` \ = "XX$teststring") >/dev/null 2>&1 && new_result=`expr "X$teststring" : ".*" 2>&1` && lt_cv_sys_max_cmd_len=$new_result && test $i != 17 # 1/2 MB should be enough do i=`expr $i + 1` teststring=$teststring$teststring done teststring= # Add a significant safety factor because C++ compilers can tack on massive # amounts of additional arguments before passing them to the linker. # It appears as though 1/2 is a usable value. lt_cv_sys_max_cmd_len=`expr $lt_cv_sys_max_cmd_len \/ 2` ;; esac fi if test -n $lt_cv_sys_max_cmd_len ; then echo "$as_me:$LINENO: result: $lt_cv_sys_max_cmd_len" >&5 echo "${ECHO_T}$lt_cv_sys_max_cmd_len" >&6 else echo "$as_me:$LINENO: result: none" >&5 echo "${ECHO_T}none" >&6 fi # Check for command to grab the raw symbol name followed by C symbol from nm. echo "$as_me:$LINENO: checking command to parse $NM output from $compiler object" >&5 echo $ECHO_N "checking command to parse $NM output from $compiler object... $ECHO_C" >&6 if test "${lt_cv_sys_global_symbol_pipe+set}" = set; then echo $ECHO_N "(cached) $ECHO_C" >&6 else # These are sane defaults that work on at least a few old systems. # [They come from Ultrix. What could be older than Ultrix?!! ;)] # Character class describing NM global symbol codes. symcode='[BCDEGRST]' # Regexp to match symbols that can be accessed directly from C. sympat='\([_A-Za-z][_A-Za-z0-9]*\)' # Transform an extracted symbol line into a proper C declaration lt_cv_sys_global_symbol_to_cdecl="sed -n -e 's/^. .* \(.*\)$/extern int \1;/p'" # Transform an extracted symbol line into symbol name and symbol address lt_cv_sys_global_symbol_to_c_name_address="sed -n -e 's/^: \([^ ]*\) $/ {\\\"\1\\\", (lt_ptr) 0},/p' -e 's/^$symcode \([^ ]*\) \([^ ]*\)$/ {\"\2\", (lt_ptr) \&\2},/p'" # Define system-specific variables. 
case $host_os in aix*) symcode='[BCDT]' ;; cygwin* | mingw* | pw32*) symcode='[ABCDGISTW]' ;; hpux*) # Its linker distinguishes data from code symbols if test "$host_cpu" = ia64; then symcode='[ABCDEGRST]' fi lt_cv_sys_global_symbol_to_cdecl="sed -n -e 's/^T .* \(.*\)$/extern int \1();/p' -e 's/^$symcode* .* \(.*\)$/extern char \1;/p'" lt_cv_sys_global_symbol_to_c_name_address="sed -n -e 's/^: \([^ ]*\) $/ {\\\"\1\\\", (lt_ptr) 0},/p' -e 's/^$symcode* \([^ ]*\) \([^ ]*\)$/ {\"\2\", (lt_ptr) \&\2},/p'" ;; linux*) if test "$host_cpu" = ia64; then symcode='[ABCDGIRSTW]' lt_cv_sys_global_symbol_to_cdecl="sed -n -e 's/^T .* \(.*\)$/extern int \1();/p' -e 's/^$symcode* .* \(.*\)$/extern char \1;/p'" lt_cv_sys_global_symbol_to_c_name_address="sed -n -e 's/^: \([^ ]*\) $/ {\\\"\1\\\", (lt_ptr) 0},/p' -e 's/^$symcode* \([^ ]*\) \([^ ]*\)$/ {\"\2\", (lt_ptr) \&\2},/p'" fi ;; irix* | nonstopux*) symcode='[BCDEGRST]' ;; osf*) symcode='[BCDEGQRST]' ;; solaris*) symcode='[BDRT]' ;; sco3.2v5*) symcode='[DT]' ;; sysv4.2uw2*) symcode='[DT]' ;; sysv5* | sco5v6* | unixware* | OpenUNIX*) symcode='[ABDT]' ;; sysv4) symcode='[DFNSTU]' ;; esac # Handle CRLF in mingw tool chain opt_cr= case $build_os in mingw*) opt_cr=`echo 'x\{0,1\}' | tr x '\015'` # option cr in regexp ;; esac # If we're using GNU nm, then use its standard symbol codes. case `$NM -V 2>&1` in *GNU* | *'with BFD'*) symcode='[ABCDGIRSTW]' ;; esac # Try without a prefix undercore, then with it. for ac_symprfx in "" "_"; do # Transform symcode, sympat, and symprfx into a raw symbol and a C symbol. symxfrm="\\1 $ac_symprfx\\2 \\2" # Write the raw and C identifiers. lt_cv_sys_global_symbol_pipe="sed -n -e 's/^.*[ ]\($symcode$symcode*\)[ ][ ]*$ac_symprfx$sympat$opt_cr$/$symxfrm/p'" # Check to see that the pipe works correctly. pipe_works=no rm -f conftest* cat > conftest.$ac_ext <&5 (eval $ac_compile) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); }; then # Now try to grab the symbols. 
nlist=conftest.nm if { (eval echo "$as_me:$LINENO: \"$NM conftest.$ac_objext \| $lt_cv_sys_global_symbol_pipe \> $nlist\"") >&5 (eval $NM conftest.$ac_objext \| $lt_cv_sys_global_symbol_pipe \> $nlist) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); } && test -s "$nlist"; then # Try sorting and uniquifying the output. if sort "$nlist" | uniq > "$nlist"T; then mv -f "$nlist"T "$nlist" else rm -f "$nlist"T fi # Make sure that we snagged all the symbols we need. if grep ' nm_test_var$' "$nlist" >/dev/null; then if grep ' nm_test_func$' "$nlist" >/dev/null; then cat < conftest.$ac_ext #ifdef __cplusplus extern "C" { #endif EOF # Now generate the symbol file. eval "$lt_cv_sys_global_symbol_to_cdecl"' < "$nlist" | grep -v main >> conftest.$ac_ext' cat <> conftest.$ac_ext #if defined (__STDC__) && __STDC__ # define lt_ptr_t void * #else # define lt_ptr_t char * # define const #endif /* The mapping between symbol names and symbols. */ const struct { const char *name; lt_ptr_t address; } lt_preloaded_symbols[] = { EOF $SED "s/^$symcode$symcode* \(.*\) \(.*\)$/ {\"\2\", (lt_ptr_t) \&\2},/" < "$nlist" | grep -v main >> conftest.$ac_ext cat <<\EOF >> conftest.$ac_ext {0, (lt_ptr_t) 0} }; #ifdef __cplusplus } #endif EOF # Now try linking the two files. mv conftest.$ac_objext conftstm.$ac_objext lt_save_LIBS="$LIBS" lt_save_CFLAGS="$CFLAGS" LIBS="conftstm.$ac_objext" CFLAGS="$CFLAGS$lt_prog_compiler_no_builtin_flag" if { (eval echo "$as_me:$LINENO: \"$ac_link\"") >&5 (eval $ac_link) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? 
= $ac_status" >&5 (exit $ac_status); } && test -s conftest${ac_exeext}; then pipe_works=yes fi LIBS="$lt_save_LIBS" CFLAGS="$lt_save_CFLAGS" else echo "cannot find nm_test_func in $nlist" >&5 fi else echo "cannot find nm_test_var in $nlist" >&5 fi else echo "cannot run $lt_cv_sys_global_symbol_pipe" >&5 fi else echo "$progname: failed program was:" >&5 cat conftest.$ac_ext >&5 fi rm -f conftest* conftst* # Do not use the global_symbol_pipe unless it works. if test "$pipe_works" = yes; then break else lt_cv_sys_global_symbol_pipe= fi done fi if test -z "$lt_cv_sys_global_symbol_pipe"; then lt_cv_sys_global_symbol_to_cdecl= fi if test -z "$lt_cv_sys_global_symbol_pipe$lt_cv_sys_global_symbol_to_cdecl"; then echo "$as_me:$LINENO: result: failed" >&5 echo "${ECHO_T}failed" >&6 else echo "$as_me:$LINENO: result: ok" >&5 echo "${ECHO_T}ok" >&6 fi echo "$as_me:$LINENO: checking for objdir" >&5 echo $ECHO_N "checking for objdir... $ECHO_C" >&6 if test "${lt_cv_objdir+set}" = set; then echo $ECHO_N "(cached) $ECHO_C" >&6 else rm -f .libs 2>/dev/null mkdir .libs 2>/dev/null if test -d .libs; then lt_cv_objdir=.libs else # MS-DOS does not allow filenames that begin with a dot. lt_cv_objdir=_libs fi rmdir .libs 2>/dev/null fi echo "$as_me:$LINENO: result: $lt_cv_objdir" >&5 echo "${ECHO_T}$lt_cv_objdir" >&6 objdir=$lt_cv_objdir case $host_os in aix3*) # AIX sometimes has problems with the GCC collect2 program. For some # reason, if we set the COLLECT_NAMES environment variable, the problems # vanish in a puff of smoke. if test "X${COLLECT_NAMES+set}" != Xset; then COLLECT_NAMES= export COLLECT_NAMES fi ;; esac # Sed substitution that helps us do robust quoting. It backslashifies # metacharacters that are still active within double-quoted strings. Xsed='sed -e 1s/^X//' sed_quote_subst='s/\([\\"\\`$\\\\]\)/\\\1/g' # Same as above, but do not quote variable references. 
double_quote_subst='s/\([\\"\\`\\\\]\)/\\\1/g' # Sed substitution to delay expansion of an escaped shell variable in a # double_quote_subst'ed string. delay_variable_subst='s/\\\\\\\\\\\$/\\\\\\$/g' # Sed substitution to avoid accidental globbing in evaled expressions no_glob_subst='s/\*/\\\*/g' # Constants: rm="rm -f" # Global variables: default_ofile=libtool can_build_shared=yes # All known linkers require a `.a' archive for static linking (except MSVC, # which needs '.lib'). libext=a ltmain="$ac_aux_dir/ltmain.sh" ofile="$default_ofile" with_gnu_ld="$lt_cv_prog_gnu_ld" if test -n "$ac_tool_prefix"; then # Extract the first word of "${ac_tool_prefix}ar", so it can be a program name with args. set dummy ${ac_tool_prefix}ar; ac_word=$2 echo "$as_me:$LINENO: checking for $ac_word" >&5 echo $ECHO_N "checking for $ac_word... $ECHO_C" >&6 if test "${ac_cv_prog_AR+set}" = set; then echo $ECHO_N "(cached) $ECHO_C" >&6 else if test -n "$AR"; then ac_cv_prog_AR="$AR" # Let the user override the test. else as_save_IFS=$IFS; IFS=$PATH_SEPARATOR for as_dir in $PATH do IFS=$as_save_IFS test -z "$as_dir" && as_dir=. for ac_exec_ext in '' $ac_executable_extensions; do if $as_executable_p "$as_dir/$ac_word$ac_exec_ext"; then ac_cv_prog_AR="${ac_tool_prefix}ar" echo "$as_me:$LINENO: found $as_dir/$ac_word$ac_exec_ext" >&5 break 2 fi done done fi fi AR=$ac_cv_prog_AR if test -n "$AR"; then echo "$as_me:$LINENO: result: $AR" >&5 echo "${ECHO_T}$AR" >&6 else echo "$as_me:$LINENO: result: no" >&5 echo "${ECHO_T}no" >&6 fi fi if test -z "$ac_cv_prog_AR"; then ac_ct_AR=$AR # Extract the first word of "ar", so it can be a program name with args. set dummy ar; ac_word=$2 echo "$as_me:$LINENO: checking for $ac_word" >&5 echo $ECHO_N "checking for $ac_word... $ECHO_C" >&6 if test "${ac_cv_prog_ac_ct_AR+set}" = set; then echo $ECHO_N "(cached) $ECHO_C" >&6 else if test -n "$ac_ct_AR"; then ac_cv_prog_ac_ct_AR="$ac_ct_AR" # Let the user override the test. 
else as_save_IFS=$IFS; IFS=$PATH_SEPARATOR for as_dir in $PATH do IFS=$as_save_IFS test -z "$as_dir" && as_dir=. for ac_exec_ext in '' $ac_executable_extensions; do if $as_executable_p "$as_dir/$ac_word$ac_exec_ext"; then ac_cv_prog_ac_ct_AR="ar" echo "$as_me:$LINENO: found $as_dir/$ac_word$ac_exec_ext" >&5 break 2 fi done done test -z "$ac_cv_prog_ac_ct_AR" && ac_cv_prog_ac_ct_AR="false" fi fi ac_ct_AR=$ac_cv_prog_ac_ct_AR if test -n "$ac_ct_AR"; then echo "$as_me:$LINENO: result: $ac_ct_AR" >&5 echo "${ECHO_T}$ac_ct_AR" >&6 else echo "$as_me:$LINENO: result: no" >&5 echo "${ECHO_T}no" >&6 fi AR=$ac_ct_AR else AR="$ac_cv_prog_AR" fi if test -n "$ac_tool_prefix"; then # Extract the first word of "${ac_tool_prefix}ranlib", so it can be a program name with args. set dummy ${ac_tool_prefix}ranlib; ac_word=$2 echo "$as_me:$LINENO: checking for $ac_word" >&5 echo $ECHO_N "checking for $ac_word... $ECHO_C" >&6 if test "${ac_cv_prog_RANLIB+set}" = set; then echo $ECHO_N "(cached) $ECHO_C" >&6 else if test -n "$RANLIB"; then ac_cv_prog_RANLIB="$RANLIB" # Let the user override the test. else as_save_IFS=$IFS; IFS=$PATH_SEPARATOR for as_dir in $PATH do IFS=$as_save_IFS test -z "$as_dir" && as_dir=. for ac_exec_ext in '' $ac_executable_extensions; do if $as_executable_p "$as_dir/$ac_word$ac_exec_ext"; then ac_cv_prog_RANLIB="${ac_tool_prefix}ranlib" echo "$as_me:$LINENO: found $as_dir/$ac_word$ac_exec_ext" >&5 break 2 fi done done fi fi RANLIB=$ac_cv_prog_RANLIB if test -n "$RANLIB"; then echo "$as_me:$LINENO: result: $RANLIB" >&5 echo "${ECHO_T}$RANLIB" >&6 else echo "$as_me:$LINENO: result: no" >&5 echo "${ECHO_T}no" >&6 fi fi if test -z "$ac_cv_prog_RANLIB"; then ac_ct_RANLIB=$RANLIB # Extract the first word of "ranlib", so it can be a program name with args. set dummy ranlib; ac_word=$2 echo "$as_me:$LINENO: checking for $ac_word" >&5 echo $ECHO_N "checking for $ac_word... 
$ECHO_C" >&6 if test "${ac_cv_prog_ac_ct_RANLIB+set}" = set; then echo $ECHO_N "(cached) $ECHO_C" >&6 else if test -n "$ac_ct_RANLIB"; then ac_cv_prog_ac_ct_RANLIB="$ac_ct_RANLIB" # Let the user override the test. else as_save_IFS=$IFS; IFS=$PATH_SEPARATOR for as_dir in $PATH do IFS=$as_save_IFS test -z "$as_dir" && as_dir=. for ac_exec_ext in '' $ac_executable_extensions; do if $as_executable_p "$as_dir/$ac_word$ac_exec_ext"; then ac_cv_prog_ac_ct_RANLIB="ranlib" echo "$as_me:$LINENO: found $as_dir/$ac_word$ac_exec_ext" >&5 break 2 fi done done test -z "$ac_cv_prog_ac_ct_RANLIB" && ac_cv_prog_ac_ct_RANLIB=":" fi fi ac_ct_RANLIB=$ac_cv_prog_ac_ct_RANLIB if test -n "$ac_ct_RANLIB"; then echo "$as_me:$LINENO: result: $ac_ct_RANLIB" >&5 echo "${ECHO_T}$ac_ct_RANLIB" >&6 else echo "$as_me:$LINENO: result: no" >&5 echo "${ECHO_T}no" >&6 fi RANLIB=$ac_ct_RANLIB else RANLIB="$ac_cv_prog_RANLIB" fi if test -n "$ac_tool_prefix"; then # Extract the first word of "${ac_tool_prefix}strip", so it can be a program name with args. set dummy ${ac_tool_prefix}strip; ac_word=$2 echo "$as_me:$LINENO: checking for $ac_word" >&5 echo $ECHO_N "checking for $ac_word... $ECHO_C" >&6 if test "${ac_cv_prog_STRIP+set}" = set; then echo $ECHO_N "(cached) $ECHO_C" >&6 else if test -n "$STRIP"; then ac_cv_prog_STRIP="$STRIP" # Let the user override the test. else as_save_IFS=$IFS; IFS=$PATH_SEPARATOR for as_dir in $PATH do IFS=$as_save_IFS test -z "$as_dir" && as_dir=. 
for ac_exec_ext in '' $ac_executable_extensions; do if $as_executable_p "$as_dir/$ac_word$ac_exec_ext"; then ac_cv_prog_STRIP="${ac_tool_prefix}strip" echo "$as_me:$LINENO: found $as_dir/$ac_word$ac_exec_ext" >&5 break 2 fi done done fi fi STRIP=$ac_cv_prog_STRIP if test -n "$STRIP"; then echo "$as_me:$LINENO: result: $STRIP" >&5 echo "${ECHO_T}$STRIP" >&6 else echo "$as_me:$LINENO: result: no" >&5 echo "${ECHO_T}no" >&6 fi fi if test -z "$ac_cv_prog_STRIP"; then ac_ct_STRIP=$STRIP # Extract the first word of "strip", so it can be a program name with args. set dummy strip; ac_word=$2 echo "$as_me:$LINENO: checking for $ac_word" >&5 echo $ECHO_N "checking for $ac_word... $ECHO_C" >&6 if test "${ac_cv_prog_ac_ct_STRIP+set}" = set; then echo $ECHO_N "(cached) $ECHO_C" >&6 else if test -n "$ac_ct_STRIP"; then ac_cv_prog_ac_ct_STRIP="$ac_ct_STRIP" # Let the user override the test. else as_save_IFS=$IFS; IFS=$PATH_SEPARATOR for as_dir in $PATH do IFS=$as_save_IFS test -z "$as_dir" && as_dir=. 
for ac_exec_ext in '' $ac_executable_extensions; do if $as_executable_p "$as_dir/$ac_word$ac_exec_ext"; then ac_cv_prog_ac_ct_STRIP="strip" echo "$as_me:$LINENO: found $as_dir/$ac_word$ac_exec_ext" >&5 break 2 fi done done test -z "$ac_cv_prog_ac_ct_STRIP" && ac_cv_prog_ac_ct_STRIP=":" fi fi ac_ct_STRIP=$ac_cv_prog_ac_ct_STRIP if test -n "$ac_ct_STRIP"; then echo "$as_me:$LINENO: result: $ac_ct_STRIP" >&5 echo "${ECHO_T}$ac_ct_STRIP" >&6 else echo "$as_me:$LINENO: result: no" >&5 echo "${ECHO_T}no" >&6 fi STRIP=$ac_ct_STRIP else STRIP="$ac_cv_prog_STRIP" fi old_CC="$CC" old_CFLAGS="$CFLAGS" # Set sane defaults for various variables test -z "$AR" && AR=ar test -z "$AR_FLAGS" && AR_FLAGS=cru test -z "$AS" && AS=as test -z "$CC" && CC=cc test -z "$LTCC" && LTCC=$CC test -z "$LTCFLAGS" && LTCFLAGS=$CFLAGS test -z "$DLLTOOL" && DLLTOOL=dlltool test -z "$LD" && LD=ld test -z "$LN_S" && LN_S="ln -s" test -z "$MAGIC_CMD" && MAGIC_CMD=file test -z "$NM" && NM=nm test -z "$SED" && SED=sed test -z "$OBJDUMP" && OBJDUMP=objdump test -z "$RANLIB" && RANLIB=: test -z "$STRIP" && STRIP=: test -z "$ac_objext" && ac_objext=o # Determine commands to create old-style static archives. 
old_archive_cmds='$AR $AR_FLAGS $oldlib$oldobjs$old_deplibs' old_postinstall_cmds='chmod 644 $oldlib' old_postuninstall_cmds= if test -n "$RANLIB"; then case $host_os in openbsd*) old_postinstall_cmds="$old_postinstall_cmds~\$RANLIB -t \$oldlib" ;; *) old_postinstall_cmds="$old_postinstall_cmds~\$RANLIB \$oldlib" ;; esac old_archive_cmds="$old_archive_cmds~\$RANLIB \$oldlib" fi for cc_temp in $compiler""; do case $cc_temp in compile | *[\\/]compile | ccache | *[\\/]ccache ) ;; distcc | *[\\/]distcc | purify | *[\\/]purify ) ;; \-*) ;; *) break;; esac done cc_basename=`$echo "X$cc_temp" | $Xsed -e 's%.*/%%' -e "s%^$host_alias-%%"` # Only perform the check for file, if the check method requires it case $deplibs_check_method in file_magic*) if test "$file_magic_cmd" = '$MAGIC_CMD'; then echo "$as_me:$LINENO: checking for ${ac_tool_prefix}file" >&5 echo $ECHO_N "checking for ${ac_tool_prefix}file... $ECHO_C" >&6 if test "${lt_cv_path_MAGIC_CMD+set}" = set; then echo $ECHO_N "(cached) $ECHO_C" >&6 else case $MAGIC_CMD in [\\/*] | ?:[\\/]*) lt_cv_path_MAGIC_CMD="$MAGIC_CMD" # Let the user override the test with a path. ;; *) lt_save_MAGIC_CMD="$MAGIC_CMD" lt_save_ifs="$IFS"; IFS=$PATH_SEPARATOR ac_dummy="/usr/bin$PATH_SEPARATOR$PATH" for ac_dir in $ac_dummy; do IFS="$lt_save_ifs" test -z "$ac_dir" && ac_dir=. if test -f $ac_dir/${ac_tool_prefix}file; then lt_cv_path_MAGIC_CMD="$ac_dir/${ac_tool_prefix}file" if test -n "$file_magic_test_file"; then case $deplibs_check_method in "file_magic "*) file_magic_regex=`expr "$deplibs_check_method" : "file_magic \(.*\)"` MAGIC_CMD="$lt_cv_path_MAGIC_CMD" if eval $file_magic_cmd \$file_magic_test_file 2> /dev/null | $EGREP "$file_magic_regex" > /dev/null; then : else cat <&2 *** Warning: the command libtool uses to detect shared libraries, *** $file_magic_cmd, produces output that libtool cannot recognize. *** The result is that libtool may fail to recognize shared libraries *** as such. 
This will affect the creation of libtool libraries that *** depend on shared libraries, but programs linked with such libtool *** libraries will work regardless of this problem. Nevertheless, you *** may want to report the problem to your system manager and/or to *** bug-libtool@gnu.org EOF fi ;; esac fi break fi done IFS="$lt_save_ifs" MAGIC_CMD="$lt_save_MAGIC_CMD" ;; esac fi MAGIC_CMD="$lt_cv_path_MAGIC_CMD" if test -n "$MAGIC_CMD"; then echo "$as_me:$LINENO: result: $MAGIC_CMD" >&5 echo "${ECHO_T}$MAGIC_CMD" >&6 else echo "$as_me:$LINENO: result: no" >&5 echo "${ECHO_T}no" >&6 fi if test -z "$lt_cv_path_MAGIC_CMD"; then if test -n "$ac_tool_prefix"; then echo "$as_me:$LINENO: checking for file" >&5 echo $ECHO_N "checking for file... $ECHO_C" >&6 if test "${lt_cv_path_MAGIC_CMD+set}" = set; then echo $ECHO_N "(cached) $ECHO_C" >&6 else case $MAGIC_CMD in [\\/*] | ?:[\\/]*) lt_cv_path_MAGIC_CMD="$MAGIC_CMD" # Let the user override the test with a path. ;; *) lt_save_MAGIC_CMD="$MAGIC_CMD" lt_save_ifs="$IFS"; IFS=$PATH_SEPARATOR ac_dummy="/usr/bin$PATH_SEPARATOR$PATH" for ac_dir in $ac_dummy; do IFS="$lt_save_ifs" test -z "$ac_dir" && ac_dir=. if test -f $ac_dir/file; then lt_cv_path_MAGIC_CMD="$ac_dir/file" if test -n "$file_magic_test_file"; then case $deplibs_check_method in "file_magic "*) file_magic_regex=`expr "$deplibs_check_method" : "file_magic \(.*\)"` MAGIC_CMD="$lt_cv_path_MAGIC_CMD" if eval $file_magic_cmd \$file_magic_test_file 2> /dev/null | $EGREP "$file_magic_regex" > /dev/null; then : else cat <&2 *** Warning: the command libtool uses to detect shared libraries, *** $file_magic_cmd, produces output that libtool cannot recognize. *** The result is that libtool may fail to recognize shared libraries *** as such. This will affect the creation of libtool libraries that *** depend on shared libraries, but programs linked with such libtool *** libraries will work regardless of this problem. 
Nevertheless, you *** may want to report the problem to your system manager and/or to *** bug-libtool@gnu.org EOF fi ;; esac fi break fi done IFS="$lt_save_ifs" MAGIC_CMD="$lt_save_MAGIC_CMD" ;; esac fi MAGIC_CMD="$lt_cv_path_MAGIC_CMD" if test -n "$MAGIC_CMD"; then echo "$as_me:$LINENO: result: $MAGIC_CMD" >&5 echo "${ECHO_T}$MAGIC_CMD" >&6 else echo "$as_me:$LINENO: result: no" >&5 echo "${ECHO_T}no" >&6 fi else MAGIC_CMD=: fi fi fi ;; esac enable_dlopen=no enable_win32_dll=yes # Check whether --enable-libtool-lock or --disable-libtool-lock was given. if test "${enable_libtool_lock+set}" = set; then enableval="$enable_libtool_lock" fi; test "x$enable_libtool_lock" != xno && enable_libtool_lock=yes # Check whether --with-pic or --without-pic was given. if test "${with_pic+set}" = set; then withval="$with_pic" pic_mode="$withval" else pic_mode=default fi; test -z "$pic_mode" && pic_mode=default # Use C for the default configuration in the libtool script tagname= lt_save_CC="$CC" ac_ext=c ac_cpp='$CPP $CPPFLAGS' ac_compile='$CC -c $CFLAGS $CPPFLAGS conftest.$ac_ext >&5' ac_link='$CC -o conftest$ac_exeext $CFLAGS $CPPFLAGS $LDFLAGS conftest.$ac_ext $LIBS >&5' ac_compiler_gnu=$ac_cv_c_compiler_gnu # Source file extension for C test sources. ac_ext=c # Object file extension for compiled C test sources. objext=o objext=$objext # Code to be used in simple compile tests lt_simple_compile_test_code="int some_variable = 0;\n" # Code to be used in simple link tests lt_simple_link_test_code='int main(){return(0);}\n' # If no C compiler was specified, use CC. LTCC=${LTCC-"$CC"} # If no C compiler flags were specified, use CFLAGS. LTCFLAGS=${LTCFLAGS-"$CFLAGS"} # Allow CC to be a program name with arguments. 
compiler=$CC # save warnings/boilerplate of simple test code ac_outfile=conftest.$ac_objext printf "$lt_simple_compile_test_code" >conftest.$ac_ext eval "$ac_compile" 2>&1 >/dev/null | $SED '/^$/d; /^ *+/d' >conftest.err _lt_compiler_boilerplate=`cat conftest.err` $rm conftest* ac_outfile=conftest.$ac_objext printf "$lt_simple_link_test_code" >conftest.$ac_ext eval "$ac_link" 2>&1 >/dev/null | $SED '/^$/d; /^ *+/d' >conftest.err _lt_linker_boilerplate=`cat conftest.err` $rm conftest* lt_prog_compiler_no_builtin_flag= if test "$GCC" = yes; then lt_prog_compiler_no_builtin_flag=' -fno-builtin' echo "$as_me:$LINENO: checking if $compiler supports -fno-rtti -fno-exceptions" >&5 echo $ECHO_N "checking if $compiler supports -fno-rtti -fno-exceptions... $ECHO_C" >&6 if test "${lt_cv_prog_compiler_rtti_exceptions+set}" = set; then echo $ECHO_N "(cached) $ECHO_C" >&6 else lt_cv_prog_compiler_rtti_exceptions=no ac_outfile=conftest.$ac_objext printf "$lt_simple_compile_test_code" > conftest.$ac_ext lt_compiler_flag="-fno-rtti -fno-exceptions" # Insert the option either (1) after the last *FLAGS variable, or # (2) before a word containing "conftest.", or (3) at the end. # Note that $ac_compile itself does not contain backslashes and begins # with a dollar sign (not a hyphen), so the echo should work correctly. # The option is referenced via a variable to avoid confusing sed. lt_compile=`echo "$ac_compile" | $SED \ -e 's:.*FLAGS}\{0,1\} :&$lt_compiler_flag :; t' \ -e 's: [^ ]*conftest\.: $lt_compiler_flag&:; t' \ -e 's:$: $lt_compiler_flag:'` (eval echo "\"\$as_me:7698: $lt_compile\"" >&5) (eval "$lt_compile" 2>conftest.err) ac_status=$? cat conftest.err >&5 echo "$as_me:7702: \$? = $ac_status" >&5 if (exit $ac_status) && test -s "$ac_outfile"; then # The compiler can only warn and ignore the option if not recognized # So say no if there are warnings other than the usual output. 
$echo "X$_lt_compiler_boilerplate" | $Xsed -e '/^$/d' >conftest.exp $SED '/^$/d; /^ *+/d' conftest.err >conftest.er2 if test ! -s conftest.er2 || diff conftest.exp conftest.er2 >/dev/null; then lt_cv_prog_compiler_rtti_exceptions=yes fi fi $rm conftest* fi echo "$as_me:$LINENO: result: $lt_cv_prog_compiler_rtti_exceptions" >&5 echo "${ECHO_T}$lt_cv_prog_compiler_rtti_exceptions" >&6 if test x"$lt_cv_prog_compiler_rtti_exceptions" = xyes; then lt_prog_compiler_no_builtin_flag="$lt_prog_compiler_no_builtin_flag -fno-rtti -fno-exceptions" else : fi fi lt_prog_compiler_wl= lt_prog_compiler_pic= lt_prog_compiler_static= echo "$as_me:$LINENO: checking for $compiler option to produce PIC" >&5 echo $ECHO_N "checking for $compiler option to produce PIC... $ECHO_C" >&6 if test "$GCC" = yes; then lt_prog_compiler_wl='-Wl,' lt_prog_compiler_static='-static' case $host_os in aix*) # All AIX code is PIC. if test "$host_cpu" = ia64; then # AIX 5 now supports IA64 processor lt_prog_compiler_static='-Bstatic' fi ;; amigaos*) # FIXME: we need at least 68020 code to build shared libraries, but # adding the `-m68020' flag to GCC prevents building anything better, # like `-m68040'. lt_prog_compiler_pic='-m68020 -resident32 -malways-restore-a4' ;; beos* | cygwin* | irix5* | irix6* | nonstopux* | osf3* | osf4* | osf5*) # PIC is the default for these OSes. ;; mingw* | pw32* | os2*) # This hack is so that the source file can tell whether it is being # built for inclusion in a dll (and should export symbols for example). lt_prog_compiler_pic='-DDLL_EXPORT' ;; darwin* | rhapsody*) # PIC is the default on this platform # Common symbols not allowed in MH_DYLIB files lt_prog_compiler_pic='-fno-common' ;; interix3*) # Interix 3.x gcc -fpic/-fPIC options generate broken code. # Instead, we relocate shared libraries at runtime. ;; msdosdjgpp*) # Just because we use GCC doesn't mean we suddenly get shared libraries # on systems that don't support them. 
lt_prog_compiler_can_build_shared=no enable_shared=no ;; sysv4*MP*) if test -d /usr/nec; then lt_prog_compiler_pic=-Kconform_pic fi ;; hpux*) # PIC is the default for IA64 HP-UX and 64-bit HP-UX, but # not for PA HP-UX. case $host_cpu in hppa*64*|ia64*) # +Z the default ;; *) lt_prog_compiler_pic='-fPIC' ;; esac ;; *) lt_prog_compiler_pic='-fPIC' ;; esac else # PORTME Check for flag to pass linker flags through the system compiler. case $host_os in aix*) lt_prog_compiler_wl='-Wl,' if test "$host_cpu" = ia64; then # AIX 5 now supports IA64 processor lt_prog_compiler_static='-Bstatic' else lt_prog_compiler_static='-bnso -bI:/lib/syscalls.exp' fi ;; darwin*) # PIC is the default on this platform # Common symbols not allowed in MH_DYLIB files case $cc_basename in xlc*) lt_prog_compiler_pic='-qnocommon' lt_prog_compiler_wl='-Wl,' ;; esac ;; mingw* | pw32* | os2*) # This hack is so that the source file can tell whether it is being # built for inclusion in a dll (and should export symbols for example). lt_prog_compiler_pic='-DDLL_EXPORT' ;; hpux9* | hpux10* | hpux11*) lt_prog_compiler_wl='-Wl,' # PIC is the default for IA64 HP-UX and 64-bit HP-UX, but # not for PA HP-UX. case $host_cpu in hppa*64*|ia64*) # +Z the default ;; *) lt_prog_compiler_pic='+Z' ;; esac # Is there a better lt_prog_compiler_static that works with the bundled CC? lt_prog_compiler_static='${wl}-a ${wl}archive' ;; irix5* | irix6* | nonstopux*) lt_prog_compiler_wl='-Wl,' # PIC (with -KPIC) is the default. 
lt_prog_compiler_static='-non_shared' ;; newsos6) lt_prog_compiler_pic='-KPIC' lt_prog_compiler_static='-Bstatic' ;; linux*) case $cc_basename in icc* | ecc*) lt_prog_compiler_wl='-Wl,' lt_prog_compiler_pic='-KPIC' lt_prog_compiler_static='-static' ;; pgcc* | pgf77* | pgf90* | pgf95*) # Portland Group compilers (*not* the Pentium gcc compiler, # which looks to be a dead project) lt_prog_compiler_wl='-Wl,' lt_prog_compiler_pic='-fpic' lt_prog_compiler_static='-Bstatic' ;; ccc*) lt_prog_compiler_wl='-Wl,' # All Alpha code is PIC. lt_prog_compiler_static='-non_shared' ;; esac ;; osf3* | osf4* | osf5*) lt_prog_compiler_wl='-Wl,' # All OSF/1 code is PIC. lt_prog_compiler_static='-non_shared' ;; solaris*) lt_prog_compiler_pic='-KPIC' lt_prog_compiler_static='-Bstatic' case $cc_basename in f77* | f90* | f95*) lt_prog_compiler_wl='-Qoption ld ';; *) lt_prog_compiler_wl='-Wl,';; esac ;; sunos4*) lt_prog_compiler_wl='-Qoption ld ' lt_prog_compiler_pic='-PIC' lt_prog_compiler_static='-Bstatic' ;; sysv4 | sysv4.2uw2* | sysv4.3*) lt_prog_compiler_wl='-Wl,' lt_prog_compiler_pic='-KPIC' lt_prog_compiler_static='-Bstatic' ;; sysv4*MP*) if test -d /usr/nec ;then lt_prog_compiler_pic='-Kconform_pic' lt_prog_compiler_static='-Bstatic' fi ;; sysv5* | unixware* | sco3.2v5* | sco5v6* | OpenUNIX*) lt_prog_compiler_wl='-Wl,' lt_prog_compiler_pic='-KPIC' lt_prog_compiler_static='-Bstatic' ;; unicos*) lt_prog_compiler_wl='-Wl,' lt_prog_compiler_can_build_shared=no ;; uts4*) lt_prog_compiler_pic='-pic' lt_prog_compiler_static='-Bstatic' ;; *) lt_prog_compiler_can_build_shared=no ;; esac fi echo "$as_me:$LINENO: result: $lt_prog_compiler_pic" >&5 echo "${ECHO_T}$lt_prog_compiler_pic" >&6 # # Check to make sure the PIC flag actually works. # if test -n "$lt_prog_compiler_pic"; then echo "$as_me:$LINENO: checking if $compiler PIC flag $lt_prog_compiler_pic works" >&5 echo $ECHO_N "checking if $compiler PIC flag $lt_prog_compiler_pic works... 
$ECHO_C" >&6 if test "${lt_prog_compiler_pic_works+set}" = set; then echo $ECHO_N "(cached) $ECHO_C" >&6 else lt_prog_compiler_pic_works=no ac_outfile=conftest.$ac_objext printf "$lt_simple_compile_test_code" > conftest.$ac_ext lt_compiler_flag="$lt_prog_compiler_pic -DPIC" # Insert the option either (1) after the last *FLAGS variable, or # (2) before a word containing "conftest.", or (3) at the end. # Note that $ac_compile itself does not contain backslashes and begins # with a dollar sign (not a hyphen), so the echo should work correctly. # The option is referenced via a variable to avoid confusing sed. lt_compile=`echo "$ac_compile" | $SED \ -e 's:.*FLAGS}\{0,1\} :&$lt_compiler_flag :; t' \ -e 's: [^ ]*conftest\.: $lt_compiler_flag&:; t' \ -e 's:$: $lt_compiler_flag:'` (eval echo "\"\$as_me:7966: $lt_compile\"" >&5) (eval "$lt_compile" 2>conftest.err) ac_status=$? cat conftest.err >&5 echo "$as_me:7970: \$? = $ac_status" >&5 if (exit $ac_status) && test -s "$ac_outfile"; then # The compiler can only warn and ignore the option if not recognized # So say no if there are warnings other than the usual output. $echo "X$_lt_compiler_boilerplate" | $Xsed -e '/^$/d' >conftest.exp $SED '/^$/d; /^ *+/d' conftest.err >conftest.er2 if test ! -s conftest.er2 || diff conftest.exp conftest.er2 >/dev/null; then lt_prog_compiler_pic_works=yes fi fi $rm conftest* fi echo "$as_me:$LINENO: result: $lt_prog_compiler_pic_works" >&5 echo "${ECHO_T}$lt_prog_compiler_pic_works" >&6 if test x"$lt_prog_compiler_pic_works" = xyes; then case $lt_prog_compiler_pic in "" | " "*) ;; *) lt_prog_compiler_pic=" $lt_prog_compiler_pic" ;; esac else lt_prog_compiler_pic= lt_prog_compiler_can_build_shared=no fi fi case $host_os in # For platforms which do not support PIC, -DPIC is meaningless: *djgpp*) lt_prog_compiler_pic= ;; *) lt_prog_compiler_pic="$lt_prog_compiler_pic -DPIC" ;; esac # # Check to make sure the static flag actually works. 
# wl=$lt_prog_compiler_wl eval lt_tmp_static_flag=\"$lt_prog_compiler_static\" echo "$as_me:$LINENO: checking if $compiler static flag $lt_tmp_static_flag works" >&5 echo $ECHO_N "checking if $compiler static flag $lt_tmp_static_flag works... $ECHO_C" >&6 if test "${lt_prog_compiler_static_works+set}" = set; then echo $ECHO_N "(cached) $ECHO_C" >&6 else lt_prog_compiler_static_works=no save_LDFLAGS="$LDFLAGS" LDFLAGS="$LDFLAGS $lt_tmp_static_flag" printf "$lt_simple_link_test_code" > conftest.$ac_ext if (eval $ac_link 2>conftest.err) && test -s conftest$ac_exeext; then # The linker can only warn and ignore the option if not recognized # So say no if there are warnings if test -s conftest.err; then # Append any errors to the config.log. cat conftest.err 1>&5 $echo "X$_lt_linker_boilerplate" | $Xsed -e '/^$/d' > conftest.exp $SED '/^$/d; /^ *+/d' conftest.err >conftest.er2 if diff conftest.exp conftest.er2 >/dev/null; then lt_prog_compiler_static_works=yes fi else lt_prog_compiler_static_works=yes fi fi $rm conftest* LDFLAGS="$save_LDFLAGS" fi echo "$as_me:$LINENO: result: $lt_prog_compiler_static_works" >&5 echo "${ECHO_T}$lt_prog_compiler_static_works" >&6 if test x"$lt_prog_compiler_static_works" = xyes; then : else lt_prog_compiler_static= fi echo "$as_me:$LINENO: checking if $compiler supports -c -o file.$ac_objext" >&5 echo $ECHO_N "checking if $compiler supports -c -o file.$ac_objext... $ECHO_C" >&6 if test "${lt_cv_prog_compiler_c_o+set}" = set; then echo $ECHO_N "(cached) $ECHO_C" >&6 else lt_cv_prog_compiler_c_o=no $rm -r conftest 2>/dev/null mkdir conftest cd conftest mkdir out printf "$lt_simple_compile_test_code" > conftest.$ac_ext lt_compiler_flag="-o out/conftest2.$ac_objext" # Insert the option either (1) after the last *FLAGS variable, or # (2) before a word containing "conftest.", or (3) at the end. # Note that $ac_compile itself does not contain backslashes and begins # with a dollar sign (not a hyphen), so the echo should work correctly. 
lt_compile=`echo "$ac_compile" | $SED \ -e 's:.*FLAGS}\{0,1\} :&$lt_compiler_flag :; t' \ -e 's: [^ ]*conftest\.: $lt_compiler_flag&:; t' \ -e 's:$: $lt_compiler_flag:'` (eval echo "\"\$as_me:8070: $lt_compile\"" >&5) (eval "$lt_compile" 2>out/conftest.err) ac_status=$? cat out/conftest.err >&5 echo "$as_me:8074: \$? = $ac_status" >&5 if (exit $ac_status) && test -s out/conftest2.$ac_objext then # The compiler can only warn and ignore the option if not recognized # So say no if there are warnings $echo "X$_lt_compiler_boilerplate" | $Xsed -e '/^$/d' > out/conftest.exp $SED '/^$/d; /^ *+/d' out/conftest.err >out/conftest.er2 if test ! -s out/conftest.er2 || diff out/conftest.exp out/conftest.er2 >/dev/null; then lt_cv_prog_compiler_c_o=yes fi fi chmod u+w . 2>&5 $rm conftest* # SGI C++ compiler will create directory out/ii_files/ for # template instantiation test -d out/ii_files && $rm out/ii_files/* && rmdir out/ii_files $rm out/* && rmdir out cd .. rmdir conftest $rm conftest* fi echo "$as_me:$LINENO: result: $lt_cv_prog_compiler_c_o" >&5 echo "${ECHO_T}$lt_cv_prog_compiler_c_o" >&6 hard_links="nottested" if test "$lt_cv_prog_compiler_c_o" = no && test "$need_locks" != no; then # do not overwrite the value of need_locks provided by the user echo "$as_me:$LINENO: checking if we can lock with hard links" >&5 echo $ECHO_N "checking if we can lock with hard links... 
$ECHO_C" >&6 hard_links=yes $rm conftest* ln conftest.a conftest.b 2>/dev/null && hard_links=no touch conftest.a ln conftest.a conftest.b 2>&5 || hard_links=no ln conftest.a conftest.b 2>/dev/null && hard_links=no echo "$as_me:$LINENO: result: $hard_links" >&5 echo "${ECHO_T}$hard_links" >&6 if test "$hard_links" = no; then { echo "$as_me:$LINENO: WARNING: \`$CC' does not support \`-c -o', so \`make -j' may be unsafe" >&5 echo "$as_me: WARNING: \`$CC' does not support \`-c -o', so \`make -j' may be unsafe" >&2;} need_locks=warn fi else need_locks=no fi echo "$as_me:$LINENO: checking whether the $compiler linker ($LD) supports shared libraries" >&5 echo $ECHO_N "checking whether the $compiler linker ($LD) supports shared libraries... $ECHO_C" >&6 runpath_var= allow_undefined_flag= enable_shared_with_static_runtimes=no archive_cmds= archive_expsym_cmds= old_archive_From_new_cmds= old_archive_from_expsyms_cmds= export_dynamic_flag_spec= whole_archive_flag_spec= thread_safe_flag_spec= hardcode_libdir_flag_spec= hardcode_libdir_flag_spec_ld= hardcode_libdir_separator= hardcode_direct=no hardcode_minus_L=no hardcode_shlibpath_var=unsupported link_all_deplibs=unknown hardcode_automatic=no module_cmds= module_expsym_cmds= always_export_symbols=no export_symbols_cmds='$NM $libobjs $convenience | $global_symbol_pipe | $SED '\''s/.* //'\'' | sort | uniq > $export_symbols' # include_expsyms should be a list of space-separated symbols to be *always* # included in the symbol list include_expsyms= # exclude_expsyms can be an extended regexp of symbols to exclude # it will be wrapped by ` (' and `)$', so one must not match beginning or # end of line. Example: `a|bc|.*d.*' will exclude the symbols `a' and `bc', # as well as any symbol that contains `d'. exclude_expsyms="_GLOBAL_OFFSET_TABLE_" # Although _GLOBAL_OFFSET_TABLE_ is a valid symbol C name, most a.out # platforms (ab)use it in PIC code, but their linkers get confused if # the symbol is explicitly referenced. 
Since portable code cannot # rely on this symbol name, it's probably fine to never include it in # preloaded symbol tables. extract_expsyms_cmds= # Just being paranoid about ensuring that cc_basename is set. for cc_temp in $compiler""; do case $cc_temp in compile | *[\\/]compile | ccache | *[\\/]ccache ) ;; distcc | *[\\/]distcc | purify | *[\\/]purify ) ;; \-*) ;; *) break;; esac done cc_basename=`$echo "X$cc_temp" | $Xsed -e 's%.*/%%' -e "s%^$host_alias-%%"` case $host_os in cygwin* | mingw* | pw32*) # FIXME: the MSVC++ port hasn't been tested in a loooong time # When not using gcc, we currently assume that we are using # Microsoft Visual C++. if test "$GCC" != yes; then with_gnu_ld=no fi ;; interix*) # we just hope/assume this is gcc and not c89 (= MSVC++) with_gnu_ld=yes ;; openbsd*) with_gnu_ld=no ;; esac ld_shlibs=yes if test "$with_gnu_ld" = yes; then # If archive_cmds runs LD, not CC, wlarc should be empty wlarc='${wl}' # Set some defaults for GNU ld with shared library support. These # are reset later if shared libraries are not supported. Putting them # here allows them to be overridden if necessary. runpath_var=LD_RUN_PATH hardcode_libdir_flag_spec='${wl}--rpath ${wl}$libdir' export_dynamic_flag_spec='${wl}--export-dynamic' # ancient GNU ld didn't support --whole-archive et. al. if $LD --help 2>&1 | grep 'no-whole-archive' > /dev/null; then whole_archive_flag_spec="$wlarc"'--whole-archive$convenience '"$wlarc"'--no-whole-archive' else whole_archive_flag_spec= fi supports_anon_versioning=no case `$LD -v 2>/dev/null` in *\ [01].* | *\ 2.[0-9].* | *\ 2.10.*) ;; # catch versions < 2.11 *\ 2.11.93.0.2\ *) supports_anon_versioning=yes ;; # RH7.3 ... *\ 2.11.92.0.12\ *) supports_anon_versioning=yes ;; # Mandrake 8.2 ... *\ 2.11.*) ;; # other 2.11 versions *) supports_anon_versioning=yes ;; esac # See if GNU ld supports shared libraries. 
case $host_os in aix3* | aix4* | aix5*) # On AIX/PPC, the GNU linker is very broken if test "$host_cpu" != ia64; then ld_shlibs=no cat <<EOF 1>&2 *** Warning: the GNU linker, at least up to release 2.9.1, is reported *** to be unable to reliably create shared libraries on AIX. *** Therefore, libtool is disabling shared libraries support. If you *** really care for shared libraries, you may want to modify your PATH *** so that a non-GNU linker is found, and then restart. EOF fi ;; amigaos*) archive_cmds='$rm $output_objdir/a2ixlibrary.data~$echo "#define NAME $libname" > $output_objdir/a2ixlibrary.data~$echo "#define LIBRARY_ID 1" >> $output_objdir/a2ixlibrary.data~$echo "#define VERSION $major" >> $output_objdir/a2ixlibrary.data~$echo "#define REVISION $revision" >> $output_objdir/a2ixlibrary.data~$AR $AR_FLAGS $lib $libobjs~$RANLIB $lib~(cd $output_objdir && a2ixlibrary -32)' hardcode_libdir_flag_spec='-L$libdir' hardcode_minus_L=yes # Samuel A. Falvo II reports # that the semantics of dynamic libraries on AmigaOS, at least up # to version 4, is to share data among multiple programs linked # with the same dynamic library. Since this doesn't match the # behavior of shared libraries on other platforms, we can't use # them. ld_shlibs=no ;; beos*) if $LD --help 2>&1 | grep ': supported targets:.* elf' > /dev/null; then allow_undefined_flag=unsupported # Joseph Beckenbach says some releases of gcc # support --undefined. This deserves some investigation. FIXME archive_cmds='$CC -nostart $libobjs $deplibs $compiler_flags ${wl}-soname $wl$soname -o $lib' else ld_shlibs=no fi ;; cygwin* | mingw* | pw32*) # _LT_AC_TAGVAR(hardcode_libdir_flag_spec, ) is actually meaningless, # as there is no search path for DLLs.
hardcode_libdir_flag_spec='-L$libdir' allow_undefined_flag=unsupported always_export_symbols=no enable_shared_with_static_runtimes=yes export_symbols_cmds='$NM $libobjs $convenience | $global_symbol_pipe | $SED -e '\''/^[BCDGRS] /s/.* \([^ ]*\)/\1 DATA/'\'' | $SED -e '\''/^[AITW] /s/.* //'\'' | sort | uniq > $export_symbols' if $LD --help 2>&1 | grep 'auto-import' > /dev/null; then archive_cmds='$CC -shared $libobjs $deplibs $compiler_flags -o $output_objdir/$soname ${wl}--enable-auto-image-base -Xlinker --out-implib -Xlinker $lib' # If the export-symbols file already is a .def file (1st line # is EXPORTS), use it as is; otherwise, prepend... archive_expsym_cmds='if test "x`$SED 1q $export_symbols`" = xEXPORTS; then cp $export_symbols $output_objdir/$soname.def; else echo EXPORTS > $output_objdir/$soname.def; cat $export_symbols >> $output_objdir/$soname.def; fi~ $CC -shared $output_objdir/$soname.def $libobjs $deplibs $compiler_flags -o $output_objdir/$soname ${wl}--enable-auto-image-base -Xlinker --out-implib -Xlinker $lib' else ld_shlibs=no fi ;; interix3*) hardcode_direct=no hardcode_shlibpath_var=no hardcode_libdir_flag_spec='${wl}-rpath,$libdir' export_dynamic_flag_spec='${wl}-E' # Hack: On Interix 3.x, we cannot compile PIC because of a broken gcc. # Instead, shared libraries are loaded at an image base (0x10000000 by # default) and relocated if they conflict, which is a slow very memory # consuming and fragmenting process. To avoid this, we pick a random, # 256 KiB-aligned image base between 0x50000000 and 0x6FFC0000 at link # time. Moving up from 0x10000000 also allows more sbrk(2) space. 
archive_cmds='$CC -shared $pic_flag $libobjs $deplibs $compiler_flags ${wl}-h,$soname ${wl}--image-base,`expr ${RANDOM-$$} % 4096 / 2 \* 262144 + 1342177280` -o $lib' archive_expsym_cmds='sed "s,^,_," $export_symbols >$output_objdir/$soname.expsym~$CC -shared $pic_flag $libobjs $deplibs $compiler_flags ${wl}-h,$soname ${wl}--retain-symbols-file,$output_objdir/$soname.expsym ${wl}--image-base,`expr ${RANDOM-$$} % 4096 / 2 \* 262144 + 1342177280` -o $lib' ;; linux*) if $LD --help 2>&1 | grep ': supported targets:.* elf' > /dev/null; then tmp_addflag= case $cc_basename,$host_cpu in pgcc*) # Portland Group C compiler whole_archive_flag_spec='${wl}--whole-archive`for conv in $convenience\"\"; do test -n \"$conv\" && new_convenience=\"$new_convenience,$conv\"; done; $echo \"$new_convenience\"` ${wl}--no-whole-archive' tmp_addflag=' $pic_flag' ;; pgf77* | pgf90* | pgf95*) # Portland Group f77 and f90 compilers whole_archive_flag_spec='${wl}--whole-archive`for conv in $convenience\"\"; do test -n \"$conv\" && new_convenience=\"$new_convenience,$conv\"; done; $echo \"$new_convenience\"` ${wl}--no-whole-archive' tmp_addflag=' $pic_flag -Mnomain' ;; ecc*,ia64* | icc*,ia64*) # Intel C compiler on ia64 tmp_addflag=' -i_dynamic' ;; efc*,ia64* | ifort*,ia64*) # Intel Fortran compiler on ia64 tmp_addflag=' -i_dynamic -nofor_main' ;; ifc* | ifort*) # Intel Fortran compiler tmp_addflag=' -nofor_main' ;; esac archive_cmds='$CC -shared'"$tmp_addflag"' $libobjs $deplibs $compiler_flags ${wl}-soname $wl$soname -o $lib' if test $supports_anon_versioning = yes; then archive_expsym_cmds='$echo "{ global:" > $output_objdir/$libname.ver~ cat $export_symbols | sed -e "s/\(.*\)/\1;/" >> $output_objdir/$libname.ver~ $echo "local: *; };" >> $output_objdir/$libname.ver~ $CC -shared'"$tmp_addflag"' $libobjs $deplibs $compiler_flags ${wl}-soname $wl$soname ${wl}-version-script ${wl}$output_objdir/$libname.ver -o $lib' fi else ld_shlibs=no fi ;; netbsd*) if echo __ELF__ | $CC -E - | grep __ELF__ 
>/dev/null; then archive_cmds='$LD -Bshareable $libobjs $deplibs $linker_flags -o $lib' wlarc= else archive_cmds='$CC -shared $libobjs $deplibs $compiler_flags ${wl}-soname $wl$soname -o $lib' archive_expsym_cmds='$CC -shared $libobjs $deplibs $compiler_flags ${wl}-soname $wl$soname ${wl}-retain-symbols-file $wl$export_symbols -o $lib' fi ;; solaris*) if $LD -v 2>&1 | grep 'BFD 2\.8' > /dev/null; then ld_shlibs=no cat <<EOF 1>&2 *** Warning: The releases 2.8.* of the GNU linker cannot reliably *** create shared libraries on Solaris systems. Therefore, libtool *** is disabling shared libraries support. We urge you to upgrade GNU *** binutils to release 2.9.1 or newer. Another option is to modify *** your PATH or compiler configuration so that the native linker is *** used, and then restart. EOF elif $LD --help 2>&1 | grep ': supported targets:.* elf' > /dev/null; then archive_cmds='$CC -shared $libobjs $deplibs $compiler_flags ${wl}-soname $wl$soname -o $lib' archive_expsym_cmds='$CC -shared $libobjs $deplibs $compiler_flags ${wl}-soname $wl$soname ${wl}-retain-symbols-file $wl$export_symbols -o $lib' else ld_shlibs=no fi ;; sysv5* | sco3.2v5* | sco5v6* | unixware* | OpenUNIX*) case `$LD -v 2>&1` in *\ [01].* | *\ 2.[0-9].* | *\ 2.1[0-5].*) ld_shlibs=no cat <<_LT_EOF 1>&2 *** Warning: Releases of the GNU linker prior to 2.16.91.0.3 can not *** reliably create shared libraries on SCO systems. Therefore, libtool *** is disabling shared libraries support. We urge you to upgrade GNU *** binutils to release 2.16.91.0.3 or newer. Another option is to modify *** your PATH or compiler configuration so that the native linker is *** used, and then restart.
_LT_EOF ;; *) if $LD --help 2>&1 | grep ': supported targets:.* elf' > /dev/null; then hardcode_libdir_flag_spec='`test -z "$SCOABSPATH" && echo ${wl}-rpath,$libdir`' archive_cmds='$CC -shared $libobjs $deplibs $compiler_flags ${wl}-soname,\${SCOABSPATH:+${install_libdir}/}$soname -o $lib' archive_expsym_cmds='$CC -shared $libobjs $deplibs $compiler_flags ${wl}-soname,\${SCOABSPATH:+${install_libdir}/}$soname,-retain-symbols-file,$export_symbols -o $lib' else ld_shlibs=no fi ;; esac ;; sunos4*) archive_cmds='$LD -assert pure-text -Bshareable -o $lib $libobjs $deplibs $linker_flags' wlarc= hardcode_direct=yes hardcode_shlibpath_var=no ;; *) if $LD --help 2>&1 | grep ': supported targets:.* elf' > /dev/null; then archive_cmds='$CC -shared $libobjs $deplibs $compiler_flags ${wl}-soname $wl$soname -o $lib' archive_expsym_cmds='$CC -shared $libobjs $deplibs $compiler_flags ${wl}-soname $wl$soname ${wl}-retain-symbols-file $wl$export_symbols -o $lib' else ld_shlibs=no fi ;; esac if test "$ld_shlibs" = no; then runpath_var= hardcode_libdir_flag_spec= export_dynamic_flag_spec= whole_archive_flag_spec= fi else # PORTME fill in a description of your system's linker (not GNU ld) case $host_os in aix3*) allow_undefined_flag=unsupported always_export_symbols=yes archive_expsym_cmds='$LD -o $output_objdir/$soname $libobjs $deplibs $linker_flags -bE:$export_symbols -T512 -H512 -bM:SRE~$AR $AR_FLAGS $lib $output_objdir/$soname' # Note: this linker hardcodes the directories in LIBPATH if there # are no directories specified by -L. hardcode_minus_L=yes if test "$GCC" = yes && test -z "$lt_prog_compiler_static"; then # Neither direct hardcoding nor static linking is supported with a # broken collect2. hardcode_direct=unsupported fi ;; aix4* | aix5*) if test "$host_cpu" = ia64; then # On IA64, the linker does run time linking by default, so we don't # have to do anything special. 
aix_use_runtimelinking=no exp_sym_flag='-Bexport' no_entry_flag="" else # If we're using GNU nm, then we don't want the "-C" option. # -C means demangle to AIX nm, but means don't demangle with GNU nm if $NM -V 2>&1 | grep 'GNU' > /dev/null; then export_symbols_cmds='$NM -Bpg $libobjs $convenience | awk '\''{ if (((\$2 == "T") || (\$2 == "D") || (\$2 == "B")) && (substr(\$3,1,1) != ".")) { print \$3 } }'\'' | sort -u > $export_symbols' else export_symbols_cmds='$NM -BCpg $libobjs $convenience | awk '\''{ if (((\$2 == "T") || (\$2 == "D") || (\$2 == "B")) && (substr(\$3,1,1) != ".")) { print \$3 } }'\'' | sort -u > $export_symbols' fi aix_use_runtimelinking=no # Test if we are trying to use run time linking or normal # AIX style linking. If -brtl is somewhere in LDFLAGS, we # need to do runtime linking. case $host_os in aix4.[23]|aix4.[23].*|aix5*) for ld_flag in $LDFLAGS; do if (test $ld_flag = "-brtl" || test $ld_flag = "-Wl,-brtl"); then aix_use_runtimelinking=yes break fi done ;; esac exp_sym_flag='-bexport' no_entry_flag='-bnoentry' fi # When large executables or shared objects are built, AIX ld can # have problems creating the table of contents. If linking a library # or program results in "error TOC overflow" add -mminimal-toc to # CXXFLAGS/CFLAGS for g++/gcc. In the cases where that is not # enough to fix the problem, add -Wl,-bbigtoc to LDFLAGS. archive_cmds='' hardcode_direct=yes hardcode_libdir_separator=':' link_all_deplibs=yes if test "$GCC" = yes; then case $host_os in aix4.[012]|aix4.[012].*) # We only want to do this on AIX 4.2 and lower, the check # below for broken collect2 doesn't work under 4.3+ collect2name=`${CC} -print-prog-name=collect2` if test -f "$collect2name" && \ strings "$collect2name" | grep resolve_lib_name >/dev/null then # We have reworked collect2 hardcode_direct=yes else # We have old collect2 hardcode_direct=unsupported # It fails to find uninstalled libraries when the uninstalled # path is not listed in the libpath. 
Setting hardcode_minus_L # to unsupported forces relinking hardcode_minus_L=yes hardcode_libdir_flag_spec='-L$libdir' hardcode_libdir_separator= fi ;; esac shared_flag='-shared' if test "$aix_use_runtimelinking" = yes; then shared_flag="$shared_flag "'${wl}-G' fi else # not using gcc if test "$host_cpu" = ia64; then # VisualAge C++, Version 5.5 for AIX 5L for IA-64, Beta 3 Release # chokes on -Wl,-G. The following line is correct: shared_flag='-G' else if test "$aix_use_runtimelinking" = yes; then shared_flag='${wl}-G' else shared_flag='${wl}-bM:SRE' fi fi fi # It seems that -bexpall does not export symbols beginning with # underscore (_), so it is better to generate a list of symbols to export. always_export_symbols=yes if test "$aix_use_runtimelinking" = yes; then # Warning - without using the other runtime loading flags (-brtl), # -berok will link without error, but may produce a broken library. allow_undefined_flag='-berok' # Determine the default libpath from the value encoded in an empty executable. cat >conftest.$ac_ext <<_ACEOF /* confdefs.h. */ _ACEOF cat confdefs.h >>conftest.$ac_ext cat >>conftest.$ac_ext <<_ACEOF /* end confdefs.h. */ int main () { ; return 0; } _ACEOF rm -f conftest.$ac_objext conftest$ac_exeext if { (eval echo "$as_me:$LINENO: \"$ac_link\"") >&5 (eval $ac_link) 2>conftest.er1 ac_status=$? grep -v '^ *+' conftest.er1 >conftest.err rm -f conftest.er1 cat conftest.err >&5 echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); } && { ac_try='test -z "$ac_c_werror_flag" || test ! -s conftest.err' { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 (eval $ac_try) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); }; } && { ac_try='test -s conftest$ac_exeext' { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 (eval $ac_try) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? 
= $ac_status" >&5 (exit $ac_status); }; }; then aix_libpath=`dump -H conftest$ac_exeext 2>/dev/null | $SED -n -e '/Import File Strings/,/^$/ { /^0/ { s/^0 *\(.*\)$/\1/; p; } }'` # Check for a 64-bit object if we didn't find anything. if test -z "$aix_libpath"; then aix_libpath=`dump -HX64 conftest$ac_exeext 2>/dev/null | $SED -n -e '/Import File Strings/,/^$/ { /^0/ { s/^0 *\(.*\)$/\1/; p; } }'`; fi else echo "$as_me: failed program was:" >&5 sed 's/^/| /' conftest.$ac_ext >&5 fi rm -f conftest.err conftest.$ac_objext \ conftest$ac_exeext conftest.$ac_ext if test -z "$aix_libpath"; then aix_libpath="/usr/lib:/lib"; fi hardcode_libdir_flag_spec='${wl}-blibpath:$libdir:'"$aix_libpath" archive_expsym_cmds="\$CC"' -o $output_objdir/$soname $libobjs $deplibs '"\${wl}$no_entry_flag"' $compiler_flags `if test "x${allow_undefined_flag}" != "x"; then echo "${wl}${allow_undefined_flag}"; else :; fi` '"\${wl}$exp_sym_flag:\$export_symbols $shared_flag" else if test "$host_cpu" = ia64; then hardcode_libdir_flag_spec='${wl}-R $libdir:/usr/lib:/lib' allow_undefined_flag="-z nodefs" archive_expsym_cmds="\$CC $shared_flag"' -o $output_objdir/$soname $libobjs $deplibs '"\${wl}$no_entry_flag"' $compiler_flags ${wl}${allow_undefined_flag} '"\${wl}$exp_sym_flag:\$export_symbols" else # Determine the default libpath from the value encoded in an empty executable. cat >conftest.$ac_ext <<_ACEOF /* confdefs.h. */ _ACEOF cat confdefs.h >>conftest.$ac_ext cat >>conftest.$ac_ext <<_ACEOF /* end confdefs.h. */ int main () { ; return 0; } _ACEOF rm -f conftest.$ac_objext conftest$ac_exeext if { (eval echo "$as_me:$LINENO: \"$ac_link\"") >&5 (eval $ac_link) 2>conftest.er1 ac_status=$? grep -v '^ *+' conftest.er1 >conftest.err rm -f conftest.er1 cat conftest.err >&5 echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); } && { ac_try='test -z "$ac_c_werror_flag" || test ! -s conftest.err' { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 (eval $ac_try) 2>&5 ac_status=$? 
echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); }; } && { ac_try='test -s conftest$ac_exeext' { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 (eval $ac_try) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); }; }; then aix_libpath=`dump -H conftest$ac_exeext 2>/dev/null | $SED -n -e '/Import File Strings/,/^$/ { /^0/ { s/^0 *\(.*\)$/\1/; p; } }'` # Check for a 64-bit object if we didn't find anything. if test -z "$aix_libpath"; then aix_libpath=`dump -HX64 conftest$ac_exeext 2>/dev/null | $SED -n -e '/Import File Strings/,/^$/ { /^0/ { s/^0 *\(.*\)$/\1/; p; } }'`; fi else echo "$as_me: failed program was:" >&5 sed 's/^/| /' conftest.$ac_ext >&5 fi rm -f conftest.err conftest.$ac_objext \ conftest$ac_exeext conftest.$ac_ext if test -z "$aix_libpath"; then aix_libpath="/usr/lib:/lib"; fi hardcode_libdir_flag_spec='${wl}-blibpath:$libdir:'"$aix_libpath" # Warning - without using the other run time loading flags, # -berok will link without error, but may produce a broken library. no_undefined_flag=' ${wl}-bernotok' allow_undefined_flag=' ${wl}-berok' # Exported symbols can be pulled into shared objects from archives whole_archive_flag_spec='$convenience' archive_cmds_need_lc=yes # This is similar to how AIX traditionally builds its shared libraries. 
archive_expsym_cmds="\$CC $shared_flag"' -o $output_objdir/$soname $libobjs $deplibs ${wl}-bnoentry $compiler_flags ${wl}-bE:$export_symbols${allow_undefined_flag}~$AR $AR_FLAGS $output_objdir/$libname$release.a $output_objdir/$soname' fi fi ;; amigaos*) archive_cmds='$rm $output_objdir/a2ixlibrary.data~$echo "#define NAME $libname" > $output_objdir/a2ixlibrary.data~$echo "#define LIBRARY_ID 1" >> $output_objdir/a2ixlibrary.data~$echo "#define VERSION $major" >> $output_objdir/a2ixlibrary.data~$echo "#define REVISION $revision" >> $output_objdir/a2ixlibrary.data~$AR $AR_FLAGS $lib $libobjs~$RANLIB $lib~(cd $output_objdir && a2ixlibrary -32)' hardcode_libdir_flag_spec='-L$libdir' hardcode_minus_L=yes # see comment about different semantics on the GNU ld section ld_shlibs=no ;; bsdi[45]*) export_dynamic_flag_spec=-rdynamic ;; cygwin* | mingw* | pw32*) # When not using gcc, we currently assume that we are using # Microsoft Visual C++. # hardcode_libdir_flag_spec is actually meaningless, as there is # no search path for DLLs. hardcode_libdir_flag_spec=' ' allow_undefined_flag=unsupported # Tell ltmain to make .lib files, not .a files. libext=lib # Tell ltmain to make .dll files, not .so files. shrext_cmds=".dll" # FIXME: Setting linknames here is a bad hack. archive_cmds='$CC -o $lib $libobjs $compiler_flags `echo "$deplibs" | $SED -e '\''s/ -lc$//'\''` -link -dll~linknames=' # The linker will automatically build a .lib file if we build a DLL. old_archive_From_new_cmds='true' # FIXME: Should let the user specify the lib program. 
old_archive_cmds='lib /OUT:$oldlib$oldobjs$old_deplibs' fix_srcfile_path='`cygpath -w "$srcfile"`' enable_shared_with_static_runtimes=yes ;; darwin* | rhapsody*) case $host_os in rhapsody* | darwin1.[012]) allow_undefined_flag='${wl}-undefined ${wl}suppress' ;; *) # Darwin 1.3 on if test -z ${MACOSX_DEPLOYMENT_TARGET} ; then allow_undefined_flag='${wl}-flat_namespace ${wl}-undefined ${wl}suppress' else case ${MACOSX_DEPLOYMENT_TARGET} in 10.[012]) allow_undefined_flag='${wl}-flat_namespace ${wl}-undefined ${wl}suppress' ;; 10.*) allow_undefined_flag='${wl}-undefined ${wl}dynamic_lookup' ;; esac fi ;; esac archive_cmds_need_lc=no hardcode_direct=no hardcode_automatic=yes hardcode_shlibpath_var=unsupported whole_archive_flag_spec='' link_all_deplibs=yes if test "$GCC" = yes ; then output_verbose_link_cmd='echo' archive_cmds='$CC -dynamiclib $allow_undefined_flag -o $lib $libobjs $deplibs $compiler_flags -install_name $rpath/$soname $verstring' module_cmds='$CC $allow_undefined_flag -o $lib -bundle $libobjs $deplibs$compiler_flags' # Don't fix this by using the ld -exported_symbols_list flag, it doesn't exist in older darwin lds archive_expsym_cmds='sed -e "s,#.*,," -e "s,^[ ]*,," -e "s,^\(..*\),_&," < $export_symbols > $output_objdir/${libname}-symbols.expsym~$CC -dynamiclib $allow_undefined_flag -o $lib $libobjs $deplibs $compiler_flags -install_name $rpath/$soname $verstring~nmedit -s $output_objdir/${libname}-symbols.expsym ${lib}' module_expsym_cmds='sed -e "s,#.*,," -e "s,^[ ]*,," -e "s,^\(..*\),_&," < $export_symbols > $output_objdir/${libname}-symbols.expsym~$CC $allow_undefined_flag -o $lib -bundle $libobjs $deplibs$compiler_flags~nmedit -s $output_objdir/${libname}-symbols.expsym ${lib}' else case $cc_basename in xlc*) output_verbose_link_cmd='echo' archive_cmds='$CC -qmkshrobj $allow_undefined_flag -o $lib $libobjs $deplibs $compiler_flags ${wl}-install_name ${wl}`echo $rpath/$soname` $verstring' module_cmds='$CC $allow_undefined_flag -o $lib -bundle 
$libobjs $deplibs$compiler_flags' # Don't fix this by using the ld -exported_symbols_list flag, it doesn't exist in older darwin lds archive_expsym_cmds='sed -e "s,#.*,," -e "s,^[ ]*,," -e "s,^\(..*\),_&," < $export_symbols > $output_objdir/${libname}-symbols.expsym~$CC -qmkshrobj $allow_undefined_flag -o $lib $libobjs $deplibs $compiler_flags ${wl}-install_name ${wl}$rpath/$soname $verstring~nmedit -s $output_objdir/${libname}-symbols.expsym ${lib}' module_expsym_cmds='sed -e "s,#.*,," -e "s,^[ ]*,," -e "s,^\(..*\),_&," < $export_symbols > $output_objdir/${libname}-symbols.expsym~$CC $allow_undefined_flag -o $lib -bundle $libobjs $deplibs$compiler_flags~nmedit -s $output_objdir/${libname}-symbols.expsym ${lib}' ;; *) ld_shlibs=no ;; esac fi ;; dgux*) archive_cmds='$LD -G -h $soname -o $lib $libobjs $deplibs $linker_flags' hardcode_libdir_flag_spec='-L$libdir' hardcode_shlibpath_var=no ;; freebsd1*) ld_shlibs=no ;; # FreeBSD 2.2.[012] allows us to include c++rt0.o to get C++ constructor # support. Future versions do this automatically, but an explicit c++rt0.o # does not break anything, and helps significantly (at the cost of a little # extra space). freebsd2.2*) archive_cmds='$LD -Bshareable -o $lib $libobjs $deplibs $linker_flags /usr/lib/c++rt0.o' hardcode_libdir_flag_spec='-R$libdir' hardcode_direct=yes hardcode_shlibpath_var=no ;; # Unfortunately, older versions of FreeBSD 2 do not have this feature. freebsd2*) archive_cmds='$LD -Bshareable -o $lib $libobjs $deplibs $linker_flags' hardcode_direct=yes hardcode_minus_L=yes hardcode_shlibpath_var=no ;; # FreeBSD 3 and greater uses gcc -shared to do shared libraries. 
    freebsd* | kfreebsd*-gnu | dragonfly*)
      archive_cmds='$CC -shared -o $lib $libobjs $deplibs $compiler_flags'
      hardcode_libdir_flag_spec='-R$libdir'
      hardcode_direct=yes
      hardcode_shlibpath_var=no
      ;;

    hpux9*)
      if test "$GCC" = yes; then
        archive_cmds='$rm $output_objdir/$soname~$CC -shared -fPIC ${wl}+b ${wl}$install_libdir -o $output_objdir/$soname $libobjs $deplibs $compiler_flags~test $output_objdir/$soname = $lib || mv $output_objdir/$soname $lib'
      else
        archive_cmds='$rm $output_objdir/$soname~$LD -b +b $install_libdir -o $output_objdir/$soname $libobjs $deplibs $linker_flags~test $output_objdir/$soname = $lib || mv $output_objdir/$soname $lib'
      fi
      hardcode_libdir_flag_spec='${wl}+b ${wl}$libdir'
      hardcode_libdir_separator=:
      hardcode_direct=yes

      # hardcode_minus_L: Not really in the search PATH,
      # but as the default location of the library.
      hardcode_minus_L=yes
      export_dynamic_flag_spec='${wl}-E'
      ;;

    hpux10*)
      if test "$GCC" = yes -a "$with_gnu_ld" = no; then
        archive_cmds='$CC -shared -fPIC ${wl}+h ${wl}$soname ${wl}+b ${wl}$install_libdir -o $lib $libobjs $deplibs $compiler_flags'
      else
        archive_cmds='$LD -b +h $soname +b $install_libdir -o $lib $libobjs $deplibs $linker_flags'
      fi
      if test "$with_gnu_ld" = no; then
        hardcode_libdir_flag_spec='${wl}+b ${wl}$libdir'
        hardcode_libdir_separator=:

        hardcode_direct=yes
        export_dynamic_flag_spec='${wl}-E'

        # hardcode_minus_L: Not really in the search PATH,
        # but as the default location of the library.
        hardcode_minus_L=yes
      fi
      ;;

    hpux11*)
      if test "$GCC" = yes -a "$with_gnu_ld" = no; then
        case $host_cpu in
        hppa*64*)
          archive_cmds='$CC -shared ${wl}+h ${wl}$soname -o $lib $libobjs $deplibs $compiler_flags'
          ;;
        ia64*)
          archive_cmds='$CC -shared ${wl}+h ${wl}$soname ${wl}+nodefaultrpath -o $lib $libobjs $deplibs $compiler_flags'
          ;;
        *)
          archive_cmds='$CC -shared -fPIC ${wl}+h ${wl}$soname ${wl}+b ${wl}$install_libdir -o $lib $libobjs $deplibs $compiler_flags'
          ;;
        esac
      else
        case $host_cpu in
        hppa*64*)
          archive_cmds='$CC -b ${wl}+h ${wl}$soname -o $lib $libobjs $deplibs $compiler_flags'
          ;;
        ia64*)
          archive_cmds='$CC -b ${wl}+h ${wl}$soname ${wl}+nodefaultrpath -o $lib $libobjs $deplibs $compiler_flags'
          ;;
        *)
          archive_cmds='$CC -b ${wl}+h ${wl}$soname ${wl}+b ${wl}$install_libdir -o $lib $libobjs $deplibs $compiler_flags'
          ;;
        esac
      fi
      if test "$with_gnu_ld" = no; then
        hardcode_libdir_flag_spec='${wl}+b ${wl}$libdir'
        hardcode_libdir_separator=:

        case $host_cpu in
        hppa*64*|ia64*)
          hardcode_libdir_flag_spec_ld='+b $libdir'
          hardcode_direct=no
          hardcode_shlibpath_var=no
          ;;
        *)
          hardcode_direct=yes
          export_dynamic_flag_spec='${wl}-E'

          # hardcode_minus_L: Not really in the search PATH,
          # but as the default location of the library.
hardcode_minus_L=yes ;; esac fi ;; irix5* | irix6* | nonstopux*) if test "$GCC" = yes; then archive_cmds='$CC -shared $libobjs $deplibs $compiler_flags ${wl}-soname ${wl}$soname `test -n "$verstring" && echo ${wl}-set_version ${wl}$verstring` ${wl}-update_registry ${wl}${output_objdir}/so_locations -o $lib' else archive_cmds='$LD -shared $libobjs $deplibs $linker_flags -soname $soname `test -n "$verstring" && echo -set_version $verstring` -update_registry ${output_objdir}/so_locations -o $lib' hardcode_libdir_flag_spec_ld='-rpath $libdir' fi hardcode_libdir_flag_spec='${wl}-rpath ${wl}$libdir' hardcode_libdir_separator=: link_all_deplibs=yes ;; netbsd*) if echo __ELF__ | $CC -E - | grep __ELF__ >/dev/null; then archive_cmds='$LD -Bshareable -o $lib $libobjs $deplibs $linker_flags' # a.out else archive_cmds='$LD -shared -o $lib $libobjs $deplibs $linker_flags' # ELF fi hardcode_libdir_flag_spec='-R$libdir' hardcode_direct=yes hardcode_shlibpath_var=no ;; newsos6) archive_cmds='$LD -G -h $soname -o $lib $libobjs $deplibs $linker_flags' hardcode_direct=yes hardcode_libdir_flag_spec='${wl}-rpath ${wl}$libdir' hardcode_libdir_separator=: hardcode_shlibpath_var=no ;; openbsd*) hardcode_direct=yes hardcode_shlibpath_var=no if test -z "`echo __ELF__ | $CC -E - | grep __ELF__`" || test "$host_os-$host_cpu" = "openbsd2.8-powerpc"; then archive_cmds='$CC -shared $pic_flag -o $lib $libobjs $deplibs $compiler_flags' archive_expsym_cmds='$CC -shared $pic_flag -o $lib $libobjs $deplibs $compiler_flags ${wl}-retain-symbols-file,$export_symbols' hardcode_libdir_flag_spec='${wl}-rpath,$libdir' export_dynamic_flag_spec='${wl}-E' else case $host_os in openbsd[01].* | openbsd2.[0-7] | openbsd2.[0-7].*) archive_cmds='$LD -Bshareable -o $lib $libobjs $deplibs $linker_flags' hardcode_libdir_flag_spec='-R$libdir' ;; *) archive_cmds='$CC -shared $pic_flag -o $lib $libobjs $deplibs $compiler_flags' hardcode_libdir_flag_spec='${wl}-rpath,$libdir' ;; esac fi ;; os2*) 
hardcode_libdir_flag_spec='-L$libdir' hardcode_minus_L=yes allow_undefined_flag=unsupported archive_cmds='$echo "LIBRARY $libname INITINSTANCE" > $output_objdir/$libname.def~$echo "DESCRIPTION \"$libname\"" >> $output_objdir/$libname.def~$echo DATA >> $output_objdir/$libname.def~$echo " SINGLE NONSHARED" >> $output_objdir/$libname.def~$echo EXPORTS >> $output_objdir/$libname.def~emxexp $libobjs >> $output_objdir/$libname.def~$CC -Zdll -Zcrtdll -o $lib $libobjs $deplibs $compiler_flags $output_objdir/$libname.def' old_archive_From_new_cmds='emximp -o $output_objdir/$libname.a $output_objdir/$libname.def' ;; osf3*) if test "$GCC" = yes; then allow_undefined_flag=' ${wl}-expect_unresolved ${wl}\*' archive_cmds='$CC -shared${allow_undefined_flag} $libobjs $deplibs $compiler_flags ${wl}-soname ${wl}$soname `test -n "$verstring" && echo ${wl}-set_version ${wl}$verstring` ${wl}-update_registry ${wl}${output_objdir}/so_locations -o $lib' else allow_undefined_flag=' -expect_unresolved \*' archive_cmds='$LD -shared${allow_undefined_flag} $libobjs $deplibs $linker_flags -soname $soname `test -n "$verstring" && echo -set_version $verstring` -update_registry ${output_objdir}/so_locations -o $lib' fi hardcode_libdir_flag_spec='${wl}-rpath ${wl}$libdir' hardcode_libdir_separator=: ;; osf4* | osf5*) # as osf3* with the addition of -msym flag if test "$GCC" = yes; then allow_undefined_flag=' ${wl}-expect_unresolved ${wl}\*' archive_cmds='$CC -shared${allow_undefined_flag} $libobjs $deplibs $compiler_flags ${wl}-msym ${wl}-soname ${wl}$soname `test -n "$verstring" && echo ${wl}-set_version ${wl}$verstring` ${wl}-update_registry ${wl}${output_objdir}/so_locations -o $lib' hardcode_libdir_flag_spec='${wl}-rpath ${wl}$libdir' else allow_undefined_flag=' -expect_unresolved \*' archive_cmds='$LD -shared${allow_undefined_flag} $libobjs $deplibs $linker_flags -msym -soname $soname `test -n "$verstring" && echo -set_version $verstring` -update_registry ${output_objdir}/so_locations -o $lib' 
        archive_expsym_cmds='for i in `cat $export_symbols`; do printf "%s %s\\n" -exported_symbol "\$i" >> $lib.exp; done; echo "-hidden">> $lib.exp~ $LD -shared${allow_undefined_flag} -input $lib.exp $linker_flags $libobjs $deplibs -soname $soname `test -n "$verstring" && echo -set_version $verstring` -update_registry ${output_objdir}/so_locations -o $lib~$rm $lib.exp'

        # Both c and cxx compiler support -rpath directly
        hardcode_libdir_flag_spec='-rpath $libdir'
      fi
      hardcode_libdir_separator=:
      ;;

    solaris*)
      no_undefined_flag=' -z text'
      if test "$GCC" = yes; then
        wlarc='${wl}'
        archive_cmds='$CC -shared ${wl}-h ${wl}$soname -o $lib $libobjs $deplibs $compiler_flags'
        archive_expsym_cmds='$echo "{ global:" > $lib.exp~cat $export_symbols | $SED -e "s/\(.*\)/\1;/" >> $lib.exp~$echo "local: *; };" >> $lib.exp~ $CC -shared ${wl}-M ${wl}$lib.exp ${wl}-h ${wl}$soname -o $lib $libobjs $deplibs $compiler_flags~$rm $lib.exp'
      else
        wlarc=''
        archive_cmds='$LD -G${allow_undefined_flag} -h $soname -o $lib $libobjs $deplibs $linker_flags'
        archive_expsym_cmds='$echo "{ global:" > $lib.exp~cat $export_symbols | $SED -e "s/\(.*\)/\1;/" >> $lib.exp~$echo "local: *; };" >> $lib.exp~ $LD -G${allow_undefined_flag} -M $lib.exp -h $soname -o $lib $libobjs $deplibs $linker_flags~$rm $lib.exp'
      fi
      hardcode_libdir_flag_spec='-R$libdir'
      hardcode_shlibpath_var=no
      case $host_os in
      solaris2.[0-5] | solaris2.[0-5].*) ;;
      *)
        # The compiler driver will combine linker options so we
        # cannot just pass the convenience library names through
        # without $wl, iff we do not link with $LD.
        # Luckily, gcc supports the same syntax we need for Sun Studio.
        # Supported since Solaris 2.6 (maybe 2.5.1?)
case $wlarc in '') whole_archive_flag_spec='-z allextract$convenience -z defaultextract' ;; *) whole_archive_flag_spec='${wl}-z ${wl}allextract`for conv in $convenience\"\"; do test -n \"$conv\" && new_convenience=\"$new_convenience,$conv\"; done; $echo \"$new_convenience\"` ${wl}-z ${wl}defaultextract' ;; esac ;; esac link_all_deplibs=yes ;; sunos4*) if test "x$host_vendor" = xsequent; then # Use $CC to link under sequent, because it throws in some extra .o # files that make .init and .fini sections work. archive_cmds='$CC -G ${wl}-h $soname -o $lib $libobjs $deplibs $compiler_flags' else archive_cmds='$LD -assert pure-text -Bstatic -o $lib $libobjs $deplibs $linker_flags' fi hardcode_libdir_flag_spec='-L$libdir' hardcode_direct=yes hardcode_minus_L=yes hardcode_shlibpath_var=no ;; sysv4) case $host_vendor in sni) archive_cmds='$LD -G -h $soname -o $lib $libobjs $deplibs $linker_flags' hardcode_direct=yes # is this really true??? ;; siemens) ## LD is ld it makes a PLAMLIB ## CC just makes a GrossModule. 
archive_cmds='$LD -G -o $lib $libobjs $deplibs $linker_flags' reload_cmds='$CC -r -o $output$reload_objs' hardcode_direct=no ;; motorola) archive_cmds='$LD -G -h $soname -o $lib $libobjs $deplibs $linker_flags' hardcode_direct=no #Motorola manual says yes, but my tests say they lie ;; esac runpath_var='LD_RUN_PATH' hardcode_shlibpath_var=no ;; sysv4.3*) archive_cmds='$LD -G -h $soname -o $lib $libobjs $deplibs $linker_flags' hardcode_shlibpath_var=no export_dynamic_flag_spec='-Bexport' ;; sysv4*MP*) if test -d /usr/nec; then archive_cmds='$LD -G -h $soname -o $lib $libobjs $deplibs $linker_flags' hardcode_shlibpath_var=no runpath_var=LD_RUN_PATH hardcode_runpath_var=yes ld_shlibs=yes fi ;; sysv4*uw2* | sysv5OpenUNIX* | sysv5UnixWare7.[01].[10]* | unixware7*) no_undefined_flag='${wl}-z,text' archive_cmds_need_lc=no hardcode_shlibpath_var=no runpath_var='LD_RUN_PATH' if test "$GCC" = yes; then archive_cmds='$CC -shared ${wl}-h,$soname -o $lib $libobjs $deplibs $compiler_flags' archive_expsym_cmds='$CC -shared ${wl}-Bexport:$export_symbols ${wl}-h,$soname -o $lib $libobjs $deplibs $compiler_flags' else archive_cmds='$CC -G ${wl}-h,$soname -o $lib $libobjs $deplibs $compiler_flags' archive_expsym_cmds='$CC -G ${wl}-Bexport:$export_symbols ${wl}-h,$soname -o $lib $libobjs $deplibs $compiler_flags' fi ;; sysv5* | sco3.2v5* | sco5v6*) # Note: We can NOT use -z defs as we might desire, because we do not # link with -lc, and that would cause any symbols used from libc to # always be unresolved, which means just about no library would # ever link correctly. If we're not using GNU ld we use -z text # though, which does catch some bad symbols but isn't as heavy-handed # as -z defs. 
no_undefined_flag='${wl}-z,text' allow_undefined_flag='${wl}-z,nodefs' archive_cmds_need_lc=no hardcode_shlibpath_var=no hardcode_libdir_flag_spec='`test -z "$SCOABSPATH" && echo ${wl}-R,$libdir`' hardcode_libdir_separator=':' link_all_deplibs=yes export_dynamic_flag_spec='${wl}-Bexport' runpath_var='LD_RUN_PATH' if test "$GCC" = yes; then archive_cmds='$CC -shared ${wl}-h,\${SCOABSPATH:+${install_libdir}/}$soname -o $lib $libobjs $deplibs $compiler_flags' archive_expsym_cmds='$CC -shared ${wl}-Bexport:$export_symbols ${wl}-h,\${SCOABSPATH:+${install_libdir}/}$soname -o $lib $libobjs $deplibs $compiler_flags' else archive_cmds='$CC -G ${wl}-h,\${SCOABSPATH:+${install_libdir}/}$soname -o $lib $libobjs $deplibs $compiler_flags' archive_expsym_cmds='$CC -G ${wl}-Bexport:$export_symbols ${wl}-h,\${SCOABSPATH:+${install_libdir}/}$soname -o $lib $libobjs $deplibs $compiler_flags' fi ;; uts4*) archive_cmds='$LD -G -h $soname -o $lib $libobjs $deplibs $linker_flags' hardcode_libdir_flag_spec='-L$libdir' hardcode_shlibpath_var=no ;; *) ld_shlibs=no ;; esac fi echo "$as_me:$LINENO: result: $ld_shlibs" >&5 echo "${ECHO_T}$ld_shlibs" >&6 test "$ld_shlibs" = no && can_build_shared=no # # Do we need to explicitly link libc? # case "x$archive_cmds_need_lc" in x|xyes) # Assume -lc should be added archive_cmds_need_lc=yes if test "$enable_shared" = yes && test "$GCC" = yes; then case $archive_cmds in *'~'*) # FIXME: we may have to deal with multi-command sequences. ;; '$CC '*) # Test whether the compiler implicitly links with -lc since on some # systems, -lgcc has to come before -lc. If gcc already passes -lc # to ld, don't add -lc before -lgcc. echo "$as_me:$LINENO: checking whether -lc should be explicitly linked in" >&5 echo $ECHO_N "checking whether -lc should be explicitly linked in... $ECHO_C" >&6 $rm conftest* printf "$lt_simple_compile_test_code" > conftest.$ac_ext if { (eval echo "$as_me:$LINENO: \"$ac_compile\"") >&5 (eval $ac_compile) 2>&5 ac_status=$? 
  echo "$as_me:$LINENO: \$? = $ac_status" >&5
  (exit $ac_status); } 2>conftest.err; then
        soname=conftest
        lib=conftest
        libobjs=conftest.$ac_objext
        deplibs=
        wl=$lt_prog_compiler_wl
        pic_flag=$lt_prog_compiler_pic
        compiler_flags=-v
        linker_flags=-v
        verstring=
        output_objdir=.
        libname=conftest
        lt_save_allow_undefined_flag=$allow_undefined_flag
        allow_undefined_flag=
        if { (eval echo "$as_me:$LINENO: \"$archive_cmds 2\>\&1 \| grep \" -lc \" \>/dev/null 2\>\&1\"") >&5
  (eval $archive_cmds 2\>\&1 \| grep \" -lc \" \>/dev/null 2\>\&1) 2>&5
  ac_status=$?
  echo "$as_me:$LINENO: \$? = $ac_status" >&5
  (exit $ac_status); }
        then
          archive_cmds_need_lc=no
        else
          archive_cmds_need_lc=yes
        fi
        allow_undefined_flag=$lt_save_allow_undefined_flag
      else
        cat conftest.err 1>&5
      fi
      $rm conftest*
      echo "$as_me:$LINENO: result: $archive_cmds_need_lc" >&5
echo "${ECHO_T}$archive_cmds_need_lc" >&6
      ;;
    esac
  fi
  ;;
esac

echo "$as_me:$LINENO: checking dynamic linker characteristics" >&5
echo $ECHO_N "checking dynamic linker characteristics... $ECHO_C" >&6
library_names_spec=
libname_spec='lib$name'
soname_spec=
shrext_cmds=".so"
postinstall_cmds=
postuninstall_cmds=
finish_cmds=
finish_eval=
shlibpath_var=
shlibpath_overrides_runpath=unknown
version_type=none
dynamic_linker="$host_os ld.so"
sys_lib_dlsearch_path_spec="/lib /usr/lib"
if test "$GCC" = yes; then
  sys_lib_search_path_spec=`$CC -print-search-dirs | grep "^libraries:" | $SED -e "s/^libraries://" -e "s,=/,/,g"`
  if echo "$sys_lib_search_path_spec" | grep ';' >/dev/null ; then
    # if the path contains ";" then we assume it to be the separator
    # otherwise default to the standard path separator (i.e. ":") - it is
    # assumed that no part of a normal pathname contains ";" but that should
    # be okay in the real world where ";" in dirpaths is itself problematic.
sys_lib_search_path_spec=`echo "$sys_lib_search_path_spec" | $SED -e 's/;/ /g'` else sys_lib_search_path_spec=`echo "$sys_lib_search_path_spec" | $SED -e "s/$PATH_SEPARATOR/ /g"` fi else sys_lib_search_path_spec="/lib /usr/lib /usr/local/lib" fi need_lib_prefix=unknown hardcode_into_libs=no # when you set need_version to no, make sure it does not cause -set_version # flags to be left without arguments need_version=unknown case $host_os in aix3*) version_type=linux library_names_spec='${libname}${release}${shared_ext}$versuffix $libname.a' shlibpath_var=LIBPATH # AIX 3 has no versioning support, so we append a major version to the name. soname_spec='${libname}${release}${shared_ext}$major' ;; aix4* | aix5*) version_type=linux need_lib_prefix=no need_version=no hardcode_into_libs=yes if test "$host_cpu" = ia64; then # AIX 5 supports IA64 library_names_spec='${libname}${release}${shared_ext}$major ${libname}${release}${shared_ext}$versuffix $libname${shared_ext}' shlibpath_var=LD_LIBRARY_PATH else # With GCC up to 2.95.x, collect2 would create an import file # for dependence libraries. The import file would start with # the line `#! .'. This would cause the generated library to # depend on `.', always an invalid library. This was fixed in # development snapshots of GCC prior to 3.0. case $host_os in aix4 | aix4.[01] | aix4.[01].*) if { echo '#if __GNUC__ > 2 || (__GNUC__ == 2 && __GNUC_MINOR__ >= 97)' echo ' yes ' echo '#endif'; } | ${CC} -E - | grep yes > /dev/null; then : else can_build_shared=no fi ;; esac # AIX (on Power*) has no versioning support, so currently we can not hardcode correct # soname into executable. Probably we can add versioning support to # collect2, so additional links can be useful in future. if test "$aix_use_runtimelinking" = yes; then # If using run time linking (on AIX 4.2 or later) use lib.so # instead of lib.a to let people know that these are not # typical AIX shared libraries. 
library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${release}${shared_ext}$major $libname${shared_ext}' else # We preserve .a as extension for shared libraries through AIX4.2 # and later when we are not doing run time linking. library_names_spec='${libname}${release}.a $libname.a' soname_spec='${libname}${release}${shared_ext}$major' fi shlibpath_var=LIBPATH fi ;; amigaos*) library_names_spec='$libname.ixlibrary $libname.a' # Create ${libname}_ixlibrary.a entries in /sys/libs. finish_eval='for lib in `ls $libdir/*.ixlibrary 2>/dev/null`; do libname=`$echo "X$lib" | $Xsed -e '\''s%^.*/\([^/]*\)\.ixlibrary$%\1%'\''`; test $rm /sys/libs/${libname}_ixlibrary.a; $show "cd /sys/libs && $LN_S $lib ${libname}_ixlibrary.a"; cd /sys/libs && $LN_S $lib ${libname}_ixlibrary.a || exit 1; done' ;; beos*) library_names_spec='${libname}${shared_ext}' dynamic_linker="$host_os ld.so" shlibpath_var=LIBRARY_PATH ;; bsdi[45]*) version_type=linux need_version=no library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${release}${shared_ext}$major $libname${shared_ext}' soname_spec='${libname}${release}${shared_ext}$major' finish_cmds='PATH="\$PATH:/sbin" ldconfig $libdir' shlibpath_var=LD_LIBRARY_PATH sys_lib_search_path_spec="/shlib /usr/lib /usr/X11/lib /usr/contrib/lib /lib /usr/local/lib" sys_lib_dlsearch_path_spec="/shlib /usr/lib /usr/local/lib" # the default ld.so.conf also contains /usr/contrib/lib and # /usr/X11R6/lib (/usr/X11 is a link to /usr/X11R6), but let us allow # libtool to hard-code these into programs ;; cygwin* | mingw* | pw32*) version_type=windows shrext_cmds=".dll" need_version=no need_lib_prefix=no case $GCC,$host_os in yes,cygwin* | yes,mingw* | yes,pw32*) library_names_spec='$libname.dll.a' # DLL is installed to $(libdir)/../bin by postinstall_cmds postinstall_cmds='base_file=`basename \${file}`~ dlpath=`$SHELL 2>&1 -c '\''. 
$dir/'\''\${base_file}'\''i;echo \$dlname'\''`~ dldir=$destdir/`dirname \$dlpath`~ test -d \$dldir || mkdir -p \$dldir~ $install_prog $dir/$dlname \$dldir/$dlname~ chmod a+x \$dldir/$dlname' postuninstall_cmds='dldll=`$SHELL 2>&1 -c '\''. $file; echo \$dlname'\''`~ dlpath=$dir/\$dldll~ $rm \$dlpath' shlibpath_overrides_runpath=yes case $host_os in cygwin*) # Cygwin DLLs use 'cyg' prefix rather than 'lib' soname_spec='`echo ${libname} | sed -e 's/^lib/cyg/'``echo ${release} | $SED -e 's/[.]/-/g'`${versuffix}${shared_ext}' sys_lib_search_path_spec="/usr/lib /lib/w32api /lib /usr/local/lib" ;; mingw*) # MinGW DLLs use traditional 'lib' prefix soname_spec='${libname}`echo ${release} | $SED -e 's/[.]/-/g'`${versuffix}${shared_ext}' sys_lib_search_path_spec=`$CC -print-search-dirs | grep "^libraries:" | $SED -e "s/^libraries://" -e "s,=/,/,g"` if echo "$sys_lib_search_path_spec" | grep ';[c-zC-Z]:/' >/dev/null; then # It is most probably a Windows format PATH printed by # mingw gcc, but we are running on Cygwin. Gcc prints its search # path with ; separators, and with drive letters. We can handle the # drive letters (cygwin fileutils understands them), so leave them, # especially as we might pass files found there to a mingw objdump, # which wouldn't understand a cygwinified path. Ahh. sys_lib_search_path_spec=`echo "$sys_lib_search_path_spec" | $SED -e 's/;/ /g'` else sys_lib_search_path_spec=`echo "$sys_lib_search_path_spec" | $SED -e "s/$PATH_SEPARATOR/ /g"` fi ;; pw32*) # pw32 DLLs use 'pw' prefix rather than 'lib' library_names_spec='`echo ${libname} | sed -e 's/^lib/pw/'``echo ${release} | $SED -e 's/[.]/-/g'`${versuffix}${shared_ext}' ;; esac ;; *) library_names_spec='${libname}`echo ${release} | $SED -e 's/[.]/-/g'`${versuffix}${shared_ext} $libname.lib' ;; esac dynamic_linker='Win32 ld.exe' # FIXME: first we should search . 
  # and the directory the executable is in
  shlibpath_var=PATH
  ;;

darwin* | rhapsody*)
  dynamic_linker="$host_os dyld"
  version_type=darwin
  need_lib_prefix=no
  need_version=no
  library_names_spec='${libname}${release}${versuffix}$shared_ext ${libname}${release}${major}$shared_ext ${libname}$shared_ext'
  soname_spec='${libname}${release}${major}$shared_ext'
  shlibpath_overrides_runpath=yes
  shlibpath_var=DYLD_LIBRARY_PATH
  shrext_cmds='`test .$module = .yes && echo .so || echo .dylib`'
  # With Apple's gcc, 'gcc -print-search-dirs' doesn't operate the same.
  if test "$GCC" = yes; then
    sys_lib_search_path_spec=`$CC -print-search-dirs | tr "\n" "$PATH_SEPARATOR" | sed -e 's/libraries:/@libraries:/' | tr "@" "\n" | grep "^libraries:" | sed -e "s/^libraries://" -e "s,=/,/,g" -e "s,$PATH_SEPARATOR, ,g" -e "s,.*,& /lib /usr/lib /usr/local/lib,g"`
  else
    sys_lib_search_path_spec='/lib /usr/lib /usr/local/lib'
  fi
  sys_lib_dlsearch_path_spec='/usr/local/lib /lib /usr/lib'
  ;;

dgux*)
  version_type=linux
  need_lib_prefix=no
  need_version=no
  library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${release}${shared_ext}$major $libname$shared_ext'
  soname_spec='${libname}${release}${shared_ext}$major'
  shlibpath_var=LD_LIBRARY_PATH
  ;;

freebsd1*)
  dynamic_linker=no
  ;;

kfreebsd*-gnu)
  version_type=linux
  need_lib_prefix=no
  need_version=no
  library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${release}${shared_ext}$major ${libname}${shared_ext}'
  soname_spec='${libname}${release}${shared_ext}$major'
  shlibpath_var=LD_LIBRARY_PATH
  shlibpath_overrides_runpath=no
  hardcode_into_libs=yes
  dynamic_linker='GNU ld.so'
  ;;

freebsd* | dragonfly*)
  # DragonFly does not have aout. When/if they implement a new
  # versioning mechanism, adjust this.
if test -x /usr/bin/objformat; then objformat=`/usr/bin/objformat` else case $host_os in freebsd[123]*) objformat=aout ;; *) objformat=elf ;; esac fi version_type=freebsd-$objformat case $version_type in freebsd-elf*) library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${release}${shared_ext} $libname${shared_ext}' need_version=no need_lib_prefix=no ;; freebsd-*) library_names_spec='${libname}${release}${shared_ext}$versuffix $libname${shared_ext}$versuffix' need_version=yes ;; esac shlibpath_var=LD_LIBRARY_PATH case $host_os in freebsd2*) shlibpath_overrides_runpath=yes ;; freebsd3.[01]* | freebsdelf3.[01]*) shlibpath_overrides_runpath=yes hardcode_into_libs=yes ;; freebsd3.[2-9]* | freebsdelf3.[2-9]* | \ freebsd4.[0-5] | freebsdelf4.[0-5] | freebsd4.1.1 | freebsdelf4.1.1) shlibpath_overrides_runpath=no hardcode_into_libs=yes ;; freebsd*) # from 4.6 on shlibpath_overrides_runpath=yes hardcode_into_libs=yes ;; esac ;; gnu*) version_type=linux need_lib_prefix=no need_version=no library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${release}${shared_ext}${major} ${libname}${shared_ext}' soname_spec='${libname}${release}${shared_ext}$major' shlibpath_var=LD_LIBRARY_PATH hardcode_into_libs=yes ;; hpux9* | hpux10* | hpux11*) # Give a soname corresponding to the major version so that dld.sl refuses to # link against other versions. version_type=sunos need_lib_prefix=no need_version=no case $host_cpu in ia64*) shrext_cmds='.so' hardcode_into_libs=yes dynamic_linker="$host_os dld.so" shlibpath_var=LD_LIBRARY_PATH shlibpath_overrides_runpath=yes # Unless +noenvvar is specified. 
library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${release}${shared_ext}$major $libname${shared_ext}' soname_spec='${libname}${release}${shared_ext}$major' if test "X$HPUX_IA64_MODE" = X32; then sys_lib_search_path_spec="/usr/lib/hpux32 /usr/local/lib/hpux32 /usr/local/lib" else sys_lib_search_path_spec="/usr/lib/hpux64 /usr/local/lib/hpux64" fi sys_lib_dlsearch_path_spec=$sys_lib_search_path_spec ;; hppa*64*) shrext_cmds='.sl' hardcode_into_libs=yes dynamic_linker="$host_os dld.sl" shlibpath_var=LD_LIBRARY_PATH # How should we handle SHLIB_PATH shlibpath_overrides_runpath=yes # Unless +noenvvar is specified. library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${release}${shared_ext}$major $libname${shared_ext}' soname_spec='${libname}${release}${shared_ext}$major' sys_lib_search_path_spec="/usr/lib/pa20_64 /usr/ccs/lib/pa20_64" sys_lib_dlsearch_path_spec=$sys_lib_search_path_spec ;; *) shrext_cmds='.sl' dynamic_linker="$host_os dld.sl" shlibpath_var=SHLIB_PATH shlibpath_overrides_runpath=no # +s is required to enable SHLIB_PATH library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${release}${shared_ext}$major $libname${shared_ext}' soname_spec='${libname}${release}${shared_ext}$major' ;; esac # HP-UX runs *really* slowly unless shared libraries are mode 555. 
postinstall_cmds='chmod 555 $lib' ;; interix3*) version_type=linux need_lib_prefix=no need_version=no library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${release}${shared_ext}$major ${libname}${shared_ext}' soname_spec='${libname}${release}${shared_ext}$major' dynamic_linker='Interix 3.x ld.so.1 (PE, like ELF)' shlibpath_var=LD_LIBRARY_PATH shlibpath_overrides_runpath=no hardcode_into_libs=yes ;; irix5* | irix6* | nonstopux*) case $host_os in nonstopux*) version_type=nonstopux ;; *) if test "$lt_cv_prog_gnu_ld" = yes; then version_type=linux else version_type=irix fi ;; esac need_lib_prefix=no need_version=no soname_spec='${libname}${release}${shared_ext}$major' library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${release}${shared_ext}$major ${libname}${release}${shared_ext} $libname${shared_ext}' case $host_os in irix5* | nonstopux*) libsuff= shlibsuff= ;; *) case $LD in # libtool.m4 will add one of these switches to LD *-32|*"-32 "|*-melf32bsmip|*"-melf32bsmip ") libsuff= shlibsuff= libmagic=32-bit;; *-n32|*"-n32 "|*-melf32bmipn32|*"-melf32bmipn32 ") libsuff=32 shlibsuff=N32 libmagic=N32;; *-64|*"-64 "|*-melf64bmip|*"-melf64bmip ") libsuff=64 shlibsuff=64 libmagic=64-bit;; *) libsuff= shlibsuff= libmagic=never-match;; esac ;; esac shlibpath_var=LD_LIBRARY${shlibsuff}_PATH shlibpath_overrides_runpath=no sys_lib_search_path_spec="/usr/lib${libsuff} /lib${libsuff} /usr/local/lib${libsuff}" sys_lib_dlsearch_path_spec="/usr/lib${libsuff} /lib${libsuff}" hardcode_into_libs=yes ;; # No shared lib support for Linux oldld, aout, or coff. linux*oldld* | linux*aout* | linux*coff*) dynamic_linker=no ;; # This must be Linux ELF. 
linux*)
  version_type=linux
  need_lib_prefix=no
  need_version=no
  library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${release}${shared_ext}$major $libname${shared_ext}'
  soname_spec='${libname}${release}${shared_ext}$major'
  finish_cmds='PATH="\$PATH:/sbin" ldconfig -n $libdir'
  shlibpath_var=LD_LIBRARY_PATH
  shlibpath_overrides_runpath=no
  # This implies no fast_install, which is unacceptable.
  # Some rework will be needed to allow for fast_install
  # before this can be enabled.
  hardcode_into_libs=yes

  # Append ld.so.conf contents to the search path
  if test -f /etc/ld.so.conf; then
    lt_ld_extra=`awk '/^include / { system(sprintf("cd /etc; cat %s", \$2)); skip = 1; } { if (!skip) print \$0; skip = 0; }' < /etc/ld.so.conf | $SED -e 's/#.*//;s/[:, ]/ /g;s/=[^=]*$//;s/=[^= ]* / /g;/^$/d' | tr '\n' ' '`
    sys_lib_dlsearch_path_spec="/lib /usr/lib $lt_ld_extra"
  fi

  # We used to test for /lib/ld.so.1 and disable shared libraries on
  # powerpc, because MkLinux only supported shared libraries with the
  # GNU dynamic linker. Since this was broken with cross compilers,
  # most powerpc-linux boxes support dynamic linking these days and
  # people can always --disable-shared, the test was removed, and we
  # assume the GNU/Linux dynamic linker is in use.
dynamic_linker='GNU/Linux ld.so' ;; knetbsd*-gnu) version_type=linux need_lib_prefix=no need_version=no library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${release}${shared_ext}$major ${libname}${shared_ext}' soname_spec='${libname}${release}${shared_ext}$major' shlibpath_var=LD_LIBRARY_PATH shlibpath_overrides_runpath=no hardcode_into_libs=yes dynamic_linker='GNU ld.so' ;; netbsd*) version_type=sunos need_lib_prefix=no need_version=no if echo __ELF__ | $CC -E - | grep __ELF__ >/dev/null; then library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${shared_ext}$versuffix' finish_cmds='PATH="\$PATH:/sbin" ldconfig -m $libdir' dynamic_linker='NetBSD (a.out) ld.so' else library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${release}${shared_ext}$major ${libname}${shared_ext}' soname_spec='${libname}${release}${shared_ext}$major' dynamic_linker='NetBSD ld.elf_so' fi shlibpath_var=LD_LIBRARY_PATH shlibpath_overrides_runpath=yes hardcode_into_libs=yes ;; newsos6) version_type=linux library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${release}${shared_ext}$major $libname${shared_ext}' shlibpath_var=LD_LIBRARY_PATH shlibpath_overrides_runpath=yes ;; nto-qnx*) version_type=linux need_lib_prefix=no need_version=no library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${release}${shared_ext}$major $libname${shared_ext}' soname_spec='${libname}${release}${shared_ext}$major' shlibpath_var=LD_LIBRARY_PATH shlibpath_overrides_runpath=yes ;; openbsd*) version_type=sunos sys_lib_dlsearch_path_spec="/usr/lib" need_lib_prefix=no # Some older versions of OpenBSD (3.3 at least) *do* need versioned libs. 
case $host_os in openbsd3.3 | openbsd3.3.*) need_version=yes ;; *) need_version=no ;; esac library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${shared_ext}$versuffix' finish_cmds='PATH="\$PATH:/sbin" ldconfig -m $libdir' shlibpath_var=LD_LIBRARY_PATH if test -z "`echo __ELF__ | $CC -E - | grep __ELF__`" || test "$host_os-$host_cpu" = "openbsd2.8-powerpc"; then case $host_os in openbsd2.[89] | openbsd2.[89].*) shlibpath_overrides_runpath=no ;; *) shlibpath_overrides_runpath=yes ;; esac else shlibpath_overrides_runpath=yes fi ;; os2*) libname_spec='$name' shrext_cmds=".dll" need_lib_prefix=no library_names_spec='$libname${shared_ext} $libname.a' dynamic_linker='OS/2 ld.exe' shlibpath_var=LIBPATH ;; osf3* | osf4* | osf5*) version_type=osf need_lib_prefix=no need_version=no soname_spec='${libname}${release}${shared_ext}$major' library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${release}${shared_ext}$major $libname${shared_ext}' shlibpath_var=LD_LIBRARY_PATH sys_lib_search_path_spec="/usr/shlib /usr/ccs/lib /usr/lib/cmplrs/cc /usr/lib /usr/local/lib /var/shlib" sys_lib_dlsearch_path_spec="$sys_lib_search_path_spec" ;; solaris*) version_type=linux need_lib_prefix=no need_version=no library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${release}${shared_ext}$major $libname${shared_ext}' soname_spec='${libname}${release}${shared_ext}$major' shlibpath_var=LD_LIBRARY_PATH shlibpath_overrides_runpath=yes hardcode_into_libs=yes # ldd complains unless libraries are executable postinstall_cmds='chmod +x $lib' ;; sunos4*) version_type=sunos library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${shared_ext}$versuffix' finish_cmds='PATH="\$PATH:/usr/etc" ldconfig $libdir' shlibpath_var=LD_LIBRARY_PATH shlibpath_overrides_runpath=yes if test "$with_gnu_ld" = yes; then need_lib_prefix=no fi need_version=yes ;; sysv4 | sysv4.3*) version_type=linux 
library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${release}${shared_ext}$major $libname${shared_ext}' soname_spec='${libname}${release}${shared_ext}$major' shlibpath_var=LD_LIBRARY_PATH case $host_vendor in sni) shlibpath_overrides_runpath=no need_lib_prefix=no export_dynamic_flag_spec='${wl}-Blargedynsym' runpath_var=LD_RUN_PATH ;; siemens) need_lib_prefix=no ;; motorola) need_lib_prefix=no need_version=no shlibpath_overrides_runpath=no sys_lib_search_path_spec='/lib /usr/lib /usr/ccs/lib' ;; esac ;; sysv4*MP*) if test -d /usr/nec ;then version_type=linux library_names_spec='$libname${shared_ext}.$versuffix $libname${shared_ext}.$major $libname${shared_ext}' soname_spec='$libname${shared_ext}.$major' shlibpath_var=LD_LIBRARY_PATH fi ;; sysv5* | sco3.2v5* | sco5v6* | unixware* | OpenUNIX* | sysv4*uw2*) version_type=freebsd-elf need_lib_prefix=no need_version=no library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${release}${shared_ext} $libname${shared_ext}' soname_spec='${libname}${release}${shared_ext}$major' shlibpath_var=LD_LIBRARY_PATH hardcode_into_libs=yes if test "$with_gnu_ld" = yes; then sys_lib_search_path_spec='/usr/local/lib /usr/gnu/lib /usr/ccs/lib /usr/lib /lib' shlibpath_overrides_runpath=no else sys_lib_search_path_spec='/usr/ccs/lib /usr/lib' shlibpath_overrides_runpath=yes case $host_os in sco3.2v5*) sys_lib_search_path_spec="$sys_lib_search_path_spec /lib" ;; esac fi sys_lib_dlsearch_path_spec='/usr/lib' ;; uts4*) version_type=linux library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${release}${shared_ext}$major $libname${shared_ext}' soname_spec='${libname}${release}${shared_ext}$major' shlibpath_var=LD_LIBRARY_PATH ;; *) dynamic_linker=no ;; esac echo "$as_me:$LINENO: result: $dynamic_linker" >&5 echo "${ECHO_T}$dynamic_linker" >&6 test "$dynamic_linker" = no && can_build_shared=no variables_saved_for_relink="PATH $shlibpath_var $runpath_var" if test "$GCC" = yes; then 
variables_saved_for_relink="$variables_saved_for_relink GCC_EXEC_PREFIX COMPILER_PATH LIBRARY_PATH" fi echo "$as_me:$LINENO: checking how to hardcode library paths into programs" >&5 echo $ECHO_N "checking how to hardcode library paths into programs... $ECHO_C" >&6 hardcode_action= if test -n "$hardcode_libdir_flag_spec" || \ test -n "$runpath_var" || \ test "X$hardcode_automatic" = "Xyes" ; then # We can hardcode nonexistent directories. if test "$hardcode_direct" != no && # If the only mechanism to avoid hardcoding is shlibpath_var, we # have to relink, otherwise we might link with an installed library # when we should be linking with a yet-to-be-installed one ## test "$_LT_AC_TAGVAR(hardcode_shlibpath_var, )" != no && test "$hardcode_minus_L" != no; then # Linking always hardcodes the temporary library directory. hardcode_action=relink else # We can link without hardcoding, and we can hardcode nonexisting dirs. hardcode_action=immediate fi else # We cannot hardcode anything, or else we can only hardcode existing # directories. hardcode_action=unsupported fi echo "$as_me:$LINENO: result: $hardcode_action" >&5 echo "${ECHO_T}$hardcode_action" >&6 if test "$hardcode_action" = relink; then # Fast installation is not supported enable_fast_install=no elif test "$shlibpath_overrides_runpath" = yes || test "$enable_shared" = no; then # Fast installation is not necessary enable_fast_install=needless fi striplib= old_striplib= echo "$as_me:$LINENO: checking whether stripping libraries is possible" >&5 echo $ECHO_N "checking whether stripping libraries is possible... 
$ECHO_C" >&6 if test -n "$STRIP" && $STRIP -V 2>&1 | grep "GNU strip" >/dev/null; then test -z "$old_striplib" && old_striplib="$STRIP --strip-debug" test -z "$striplib" && striplib="$STRIP --strip-unneeded" echo "$as_me:$LINENO: result: yes" >&5 echo "${ECHO_T}yes" >&6 else # FIXME - insert some real tests, host_os isn't really good enough case $host_os in darwin*) if test -n "$STRIP" ; then striplib="$STRIP -x" echo "$as_me:$LINENO: result: yes" >&5 echo "${ECHO_T}yes" >&6 else echo "$as_me:$LINENO: result: no" >&5 echo "${ECHO_T}no" >&6 fi ;; *) echo "$as_me:$LINENO: result: no" >&5 echo "${ECHO_T}no" >&6 ;; esac fi if test "x$enable_dlopen" != xyes; then enable_dlopen=unknown enable_dlopen_self=unknown enable_dlopen_self_static=unknown else lt_cv_dlopen=no lt_cv_dlopen_libs= case $host_os in beos*) lt_cv_dlopen="load_add_on" lt_cv_dlopen_libs= lt_cv_dlopen_self=yes ;; mingw* | pw32*) lt_cv_dlopen="LoadLibrary" lt_cv_dlopen_libs= ;; cygwin*) lt_cv_dlopen="dlopen" lt_cv_dlopen_libs= ;; darwin*) # if libdl is installed we need to link against it echo "$as_me:$LINENO: checking for dlopen in -ldl" >&5 echo $ECHO_N "checking for dlopen in -ldl... $ECHO_C" >&6 if test "${ac_cv_lib_dl_dlopen+set}" = set; then echo $ECHO_N "(cached) $ECHO_C" >&6 else ac_check_lib_save_LIBS=$LIBS LIBS="-ldl $LIBS" cat >conftest.$ac_ext <<_ACEOF /* confdefs.h. */ _ACEOF cat confdefs.h >>conftest.$ac_ext cat >>conftest.$ac_ext <<_ACEOF /* end confdefs.h. */ /* Override any gcc2 internal prototype to avoid an error. */ #ifdef __cplusplus extern "C" #endif /* We use char because int might match the return type of a gcc2 builtin and then its argument prototype would still apply. */ char dlopen (); int main () { dlopen (); ; return 0; } _ACEOF rm -f conftest.$ac_objext conftest$ac_exeext if { (eval echo "$as_me:$LINENO: \"$ac_link\"") >&5 (eval $ac_link) 2>conftest.er1 ac_status=$? grep -v '^ *+' conftest.er1 >conftest.err rm -f conftest.er1 cat conftest.err >&5 echo "$as_me:$LINENO: \$? 
= $ac_status" >&5 (exit $ac_status); } && { ac_try='test -z "$ac_c_werror_flag" || test ! -s conftest.err' { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 (eval $ac_try) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); }; } && { ac_try='test -s conftest$ac_exeext' { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 (eval $ac_try) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); }; }; then ac_cv_lib_dl_dlopen=yes else echo "$as_me: failed program was:" >&5 sed 's/^/| /' conftest.$ac_ext >&5 ac_cv_lib_dl_dlopen=no fi rm -f conftest.err conftest.$ac_objext \ conftest$ac_exeext conftest.$ac_ext LIBS=$ac_check_lib_save_LIBS fi echo "$as_me:$LINENO: result: $ac_cv_lib_dl_dlopen" >&5 echo "${ECHO_T}$ac_cv_lib_dl_dlopen" >&6 if test $ac_cv_lib_dl_dlopen = yes; then lt_cv_dlopen="dlopen" lt_cv_dlopen_libs="-ldl" else lt_cv_dlopen="dyld" lt_cv_dlopen_libs= lt_cv_dlopen_self=yes fi ;; *) echo "$as_me:$LINENO: checking for shl_load" >&5 echo $ECHO_N "checking for shl_load... $ECHO_C" >&6 if test "${ac_cv_func_shl_load+set}" = set; then echo $ECHO_N "(cached) $ECHO_C" >&6 else cat >conftest.$ac_ext <<_ACEOF /* confdefs.h. */ _ACEOF cat confdefs.h >>conftest.$ac_ext cat >>conftest.$ac_ext <<_ACEOF /* end confdefs.h. */ /* Define shl_load to an innocuous variant, in case <limits.h> declares shl_load. For example, HP-UX 11i <limits.h> declares gettimeofday. */ #define shl_load innocuous_shl_load /* System header to define __stub macros and hopefully few prototypes, which can conflict with char shl_load (); below. Prefer <limits.h> to <assert.h> if __STDC__ is defined, since <limits.h> exists even on freestanding compilers. */ #ifdef __STDC__ # include <limits.h> #else # include <assert.h> #endif #undef shl_load /* Override any gcc2 internal prototype to avoid an error. */ #ifdef __cplusplus extern "C" { #endif /* We use char because int might match the return type of a gcc2 builtin and then its argument prototype would still apply. 
*/ char shl_load (); /* The GNU C library defines this for functions which it implements to always fail with ENOSYS. Some functions are actually named something starting with __ and the normal name is an alias. */ #if defined (__stub_shl_load) || defined (__stub___shl_load) choke me #else char (*f) () = shl_load; #endif #ifdef __cplusplus } #endif int main () { return f != shl_load; ; return 0; } _ACEOF rm -f conftest.$ac_objext conftest$ac_exeext if { (eval echo "$as_me:$LINENO: \"$ac_link\"") >&5 (eval $ac_link) 2>conftest.er1 ac_status=$? grep -v '^ *+' conftest.er1 >conftest.err rm -f conftest.er1 cat conftest.err >&5 echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); } && { ac_try='test -z "$ac_c_werror_flag" || test ! -s conftest.err' { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 (eval $ac_try) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); }; } && { ac_try='test -s conftest$ac_exeext' { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 (eval $ac_try) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); }; }; then ac_cv_func_shl_load=yes else echo "$as_me: failed program was:" >&5 sed 's/^/| /' conftest.$ac_ext >&5 ac_cv_func_shl_load=no fi rm -f conftest.err conftest.$ac_objext \ conftest$ac_exeext conftest.$ac_ext fi echo "$as_me:$LINENO: result: $ac_cv_func_shl_load" >&5 echo "${ECHO_T}$ac_cv_func_shl_load" >&6 if test $ac_cv_func_shl_load = yes; then lt_cv_dlopen="shl_load" else echo "$as_me:$LINENO: checking for shl_load in -ldld" >&5 echo $ECHO_N "checking for shl_load in -ldld... $ECHO_C" >&6 if test "${ac_cv_lib_dld_shl_load+set}" = set; then echo $ECHO_N "(cached) $ECHO_C" >&6 else ac_check_lib_save_LIBS=$LIBS LIBS="-ldld $LIBS" cat >conftest.$ac_ext <<_ACEOF /* confdefs.h. */ _ACEOF cat confdefs.h >>conftest.$ac_ext cat >>conftest.$ac_ext <<_ACEOF /* end confdefs.h. */ /* Override any gcc2 internal prototype to avoid an error. 
*/ #ifdef __cplusplus extern "C" #endif /* We use char because int might match the return type of a gcc2 builtin and then its argument prototype would still apply. */ char shl_load (); int main () { shl_load (); ; return 0; } _ACEOF rm -f conftest.$ac_objext conftest$ac_exeext if { (eval echo "$as_me:$LINENO: \"$ac_link\"") >&5 (eval $ac_link) 2>conftest.er1 ac_status=$? grep -v '^ *+' conftest.er1 >conftest.err rm -f conftest.er1 cat conftest.err >&5 echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); } && { ac_try='test -z "$ac_c_werror_flag" || test ! -s conftest.err' { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 (eval $ac_try) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); }; } && { ac_try='test -s conftest$ac_exeext' { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 (eval $ac_try) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); }; }; then ac_cv_lib_dld_shl_load=yes else echo "$as_me: failed program was:" >&5 sed 's/^/| /' conftest.$ac_ext >&5 ac_cv_lib_dld_shl_load=no fi rm -f conftest.err conftest.$ac_objext \ conftest$ac_exeext conftest.$ac_ext LIBS=$ac_check_lib_save_LIBS fi echo "$as_me:$LINENO: result: $ac_cv_lib_dld_shl_load" >&5 echo "${ECHO_T}$ac_cv_lib_dld_shl_load" >&6 if test $ac_cv_lib_dld_shl_load = yes; then lt_cv_dlopen="shl_load" lt_cv_dlopen_libs="-ldld" else echo "$as_me:$LINENO: checking for dlopen" >&5 echo $ECHO_N "checking for dlopen... $ECHO_C" >&6 if test "${ac_cv_func_dlopen+set}" = set; then echo $ECHO_N "(cached) $ECHO_C" >&6 else cat >conftest.$ac_ext <<_ACEOF /* confdefs.h. */ _ACEOF cat confdefs.h >>conftest.$ac_ext cat >>conftest.$ac_ext <<_ACEOF /* end confdefs.h. */ /* Define dlopen to an innocuous variant, in case <limits.h> declares dlopen. For example, HP-UX 11i <limits.h> declares gettimeofday. */ #define dlopen innocuous_dlopen /* System header to define __stub macros and hopefully few prototypes, which can conflict with char dlopen (); below. 
Prefer <limits.h> to <assert.h> if __STDC__ is defined, since <limits.h> exists even on freestanding compilers. */ #ifdef __STDC__ # include <limits.h> #else # include <assert.h> #endif #undef dlopen /* Override any gcc2 internal prototype to avoid an error. */ #ifdef __cplusplus extern "C" { #endif /* We use char because int might match the return type of a gcc2 builtin and then its argument prototype would still apply. */ char dlopen (); /* The GNU C library defines this for functions which it implements to always fail with ENOSYS. Some functions are actually named something starting with __ and the normal name is an alias. */ #if defined (__stub_dlopen) || defined (__stub___dlopen) choke me #else char (*f) () = dlopen; #endif #ifdef __cplusplus } #endif int main () { return f != dlopen; ; return 0; } _ACEOF rm -f conftest.$ac_objext conftest$ac_exeext if { (eval echo "$as_me:$LINENO: \"$ac_link\"") >&5 (eval $ac_link) 2>conftest.er1 ac_status=$? grep -v '^ *+' conftest.er1 >conftest.err rm -f conftest.er1 cat conftest.err >&5 echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); } && { ac_try='test -z "$ac_c_werror_flag" || test ! -s conftest.err' { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 (eval $ac_try) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); }; } && { ac_try='test -s conftest$ac_exeext' { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 (eval $ac_try) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); }; }; then ac_cv_func_dlopen=yes else echo "$as_me: failed program was:" >&5 sed 's/^/| /' conftest.$ac_ext >&5 ac_cv_func_dlopen=no fi rm -f conftest.err conftest.$ac_objext \ conftest$ac_exeext conftest.$ac_ext fi echo "$as_me:$LINENO: result: $ac_cv_func_dlopen" >&5 echo "${ECHO_T}$ac_cv_func_dlopen" >&6 if test $ac_cv_func_dlopen = yes; then lt_cv_dlopen="dlopen" else echo "$as_me:$LINENO: checking for dlopen in -ldl" >&5 echo $ECHO_N "checking for dlopen in -ldl... 
$ECHO_C" >&6 if test "${ac_cv_lib_dl_dlopen+set}" = set; then echo $ECHO_N "(cached) $ECHO_C" >&6 else ac_check_lib_save_LIBS=$LIBS LIBS="-ldl $LIBS" cat >conftest.$ac_ext <<_ACEOF /* confdefs.h. */ _ACEOF cat confdefs.h >>conftest.$ac_ext cat >>conftest.$ac_ext <<_ACEOF /* end confdefs.h. */ /* Override any gcc2 internal prototype to avoid an error. */ #ifdef __cplusplus extern "C" #endif /* We use char because int might match the return type of a gcc2 builtin and then its argument prototype would still apply. */ char dlopen (); int main () { dlopen (); ; return 0; } _ACEOF rm -f conftest.$ac_objext conftest$ac_exeext if { (eval echo "$as_me:$LINENO: \"$ac_link\"") >&5 (eval $ac_link) 2>conftest.er1 ac_status=$? grep -v '^ *+' conftest.er1 >conftest.err rm -f conftest.er1 cat conftest.err >&5 echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); } && { ac_try='test -z "$ac_c_werror_flag" || test ! -s conftest.err' { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 (eval $ac_try) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); }; } && { ac_try='test -s conftest$ac_exeext' { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 (eval $ac_try) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); }; }; then ac_cv_lib_dl_dlopen=yes else echo "$as_me: failed program was:" >&5 sed 's/^/| /' conftest.$ac_ext >&5 ac_cv_lib_dl_dlopen=no fi rm -f conftest.err conftest.$ac_objext \ conftest$ac_exeext conftest.$ac_ext LIBS=$ac_check_lib_save_LIBS fi echo "$as_me:$LINENO: result: $ac_cv_lib_dl_dlopen" >&5 echo "${ECHO_T}$ac_cv_lib_dl_dlopen" >&6 if test $ac_cv_lib_dl_dlopen = yes; then lt_cv_dlopen="dlopen" lt_cv_dlopen_libs="-ldl" else echo "$as_me:$LINENO: checking for dlopen in -lsvld" >&5 echo $ECHO_N "checking for dlopen in -lsvld... 
$ECHO_C" >&6 if test "${ac_cv_lib_svld_dlopen+set}" = set; then echo $ECHO_N "(cached) $ECHO_C" >&6 else ac_check_lib_save_LIBS=$LIBS LIBS="-lsvld $LIBS" cat >conftest.$ac_ext <<_ACEOF /* confdefs.h. */ _ACEOF cat confdefs.h >>conftest.$ac_ext cat >>conftest.$ac_ext <<_ACEOF /* end confdefs.h. */ /* Override any gcc2 internal prototype to avoid an error. */ #ifdef __cplusplus extern "C" #endif /* We use char because int might match the return type of a gcc2 builtin and then its argument prototype would still apply. */ char dlopen (); int main () { dlopen (); ; return 0; } _ACEOF rm -f conftest.$ac_objext conftest$ac_exeext if { (eval echo "$as_me:$LINENO: \"$ac_link\"") >&5 (eval $ac_link) 2>conftest.er1 ac_status=$? grep -v '^ *+' conftest.er1 >conftest.err rm -f conftest.er1 cat conftest.err >&5 echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); } && { ac_try='test -z "$ac_c_werror_flag" || test ! -s conftest.err' { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 (eval $ac_try) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); }; } && { ac_try='test -s conftest$ac_exeext' { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 (eval $ac_try) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); }; }; then ac_cv_lib_svld_dlopen=yes else echo "$as_me: failed program was:" >&5 sed 's/^/| /' conftest.$ac_ext >&5 ac_cv_lib_svld_dlopen=no fi rm -f conftest.err conftest.$ac_objext \ conftest$ac_exeext conftest.$ac_ext LIBS=$ac_check_lib_save_LIBS fi echo "$as_me:$LINENO: result: $ac_cv_lib_svld_dlopen" >&5 echo "${ECHO_T}$ac_cv_lib_svld_dlopen" >&6 if test $ac_cv_lib_svld_dlopen = yes; then lt_cv_dlopen="dlopen" lt_cv_dlopen_libs="-lsvld" else echo "$as_me:$LINENO: checking for dld_link in -ldld" >&5 echo $ECHO_N "checking for dld_link in -ldld... 
$ECHO_C" >&6 if test "${ac_cv_lib_dld_dld_link+set}" = set; then echo $ECHO_N "(cached) $ECHO_C" >&6 else ac_check_lib_save_LIBS=$LIBS LIBS="-ldld $LIBS" cat >conftest.$ac_ext <<_ACEOF /* confdefs.h. */ _ACEOF cat confdefs.h >>conftest.$ac_ext cat >>conftest.$ac_ext <<_ACEOF /* end confdefs.h. */ /* Override any gcc2 internal prototype to avoid an error. */ #ifdef __cplusplus extern "C" #endif /* We use char because int might match the return type of a gcc2 builtin and then its argument prototype would still apply. */ char dld_link (); int main () { dld_link (); ; return 0; } _ACEOF rm -f conftest.$ac_objext conftest$ac_exeext if { (eval echo "$as_me:$LINENO: \"$ac_link\"") >&5 (eval $ac_link) 2>conftest.er1 ac_status=$? grep -v '^ *+' conftest.er1 >conftest.err rm -f conftest.er1 cat conftest.err >&5 echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); } && { ac_try='test -z "$ac_c_werror_flag" || test ! -s conftest.err' { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 (eval $ac_try) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); }; } && { ac_try='test -s conftest$ac_exeext' { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 (eval $ac_try) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? 
= $ac_status" >&5 (exit $ac_status); }; }; then ac_cv_lib_dld_dld_link=yes else echo "$as_me: failed program was:" >&5 sed 's/^/| /' conftest.$ac_ext >&5 ac_cv_lib_dld_dld_link=no fi rm -f conftest.err conftest.$ac_objext \ conftest$ac_exeext conftest.$ac_ext LIBS=$ac_check_lib_save_LIBS fi echo "$as_me:$LINENO: result: $ac_cv_lib_dld_dld_link" >&5 echo "${ECHO_T}$ac_cv_lib_dld_dld_link" >&6 if test $ac_cv_lib_dld_dld_link = yes; then lt_cv_dlopen="dld_link" lt_cv_dlopen_libs="-ldld" fi fi fi fi fi fi ;; esac if test "x$lt_cv_dlopen" != xno; then enable_dlopen=yes else enable_dlopen=no fi case $lt_cv_dlopen in dlopen) save_CPPFLAGS="$CPPFLAGS" test "x$ac_cv_header_dlfcn_h" = xyes && CPPFLAGS="$CPPFLAGS -DHAVE_DLFCN_H" save_LDFLAGS="$LDFLAGS" wl=$lt_prog_compiler_wl eval LDFLAGS=\"\$LDFLAGS $export_dynamic_flag_spec\" save_LIBS="$LIBS" LIBS="$lt_cv_dlopen_libs $LIBS" echo "$as_me:$LINENO: checking whether a program can dlopen itself" >&5 echo $ECHO_N "checking whether a program can dlopen itself... $ECHO_C" >&6 if test "${lt_cv_dlopen_self+set}" = set; then echo $ECHO_N "(cached) $ECHO_C" >&6 else if test "$cross_compiling" = yes; then : lt_cv_dlopen_self=cross else lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2 lt_status=$lt_dlunknown cat > conftest.$ac_ext <<EOF #include "confdefs.h" #if HAVE_DLFCN_H #include <dlfcn.h> #endif #include <stdio.h> #ifdef RTLD_GLOBAL # define LT_DLGLOBAL RTLD_GLOBAL #else # ifdef DL_GLOBAL # define LT_DLGLOBAL DL_GLOBAL # else # define LT_DLGLOBAL 0 # endif #endif /* We may have to define LT_DLLAZY_OR_NOW in the command line if we find out it does not work in some platform. 
*/ #ifndef LT_DLLAZY_OR_NOW # ifdef RTLD_LAZY # define LT_DLLAZY_OR_NOW RTLD_LAZY # else # ifdef DL_LAZY # define LT_DLLAZY_OR_NOW DL_LAZY # else # ifdef RTLD_NOW # define LT_DLLAZY_OR_NOW RTLD_NOW # else # ifdef DL_NOW # define LT_DLLAZY_OR_NOW DL_NOW # else # define LT_DLLAZY_OR_NOW 0 # endif # endif # endif # endif #endif #ifdef __cplusplus extern "C" void exit (int); #endif void fnord() { int i=42;} int main () { void *self = dlopen (0, LT_DLGLOBAL|LT_DLLAZY_OR_NOW); int status = $lt_dlunknown; if (self) { if (dlsym (self,"fnord")) status = $lt_dlno_uscore; else if (dlsym( self,"_fnord")) status = $lt_dlneed_uscore; /* dlclose (self); */ } else puts (dlerror ()); exit (status); } EOF if { (eval echo "$as_me:$LINENO: \"$ac_link\"") >&5 (eval $ac_link) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); } && test -s conftest${ac_exeext} 2>/dev/null; then (./conftest; exit; ) >&5 2>/dev/null lt_status=$? case x$lt_status in x$lt_dlno_uscore) lt_cv_dlopen_self=yes ;; x$lt_dlneed_uscore) lt_cv_dlopen_self=yes ;; x$lt_dlunknown|x*) lt_cv_dlopen_self=no ;; esac else : # compilation failed lt_cv_dlopen_self=no fi fi rm -fr conftest* fi echo "$as_me:$LINENO: result: $lt_cv_dlopen_self" >&5 echo "${ECHO_T}$lt_cv_dlopen_self" >&6 if test "x$lt_cv_dlopen_self" = xyes; then wl=$lt_prog_compiler_wl eval LDFLAGS=\"\$LDFLAGS $lt_prog_compiler_static\" echo "$as_me:$LINENO: checking whether a statically linked program can dlopen itself" >&5 echo $ECHO_N "checking whether a statically linked program can dlopen itself... 
$ECHO_C" >&6 if test "${lt_cv_dlopen_self_static+set}" = set; then echo $ECHO_N "(cached) $ECHO_C" >&6 else if test "$cross_compiling" = yes; then : lt_cv_dlopen_self_static=cross else lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2 lt_status=$lt_dlunknown cat > conftest.$ac_ext <<EOF #include "confdefs.h" #if HAVE_DLFCN_H #include <dlfcn.h> #endif #include <stdio.h> #ifdef RTLD_GLOBAL # define LT_DLGLOBAL RTLD_GLOBAL #else # ifdef DL_GLOBAL # define LT_DLGLOBAL DL_GLOBAL # else # define LT_DLGLOBAL 0 # endif #endif /* We may have to define LT_DLLAZY_OR_NOW in the command line if we find out it does not work in some platform. */ #ifndef LT_DLLAZY_OR_NOW # ifdef RTLD_LAZY # define LT_DLLAZY_OR_NOW RTLD_LAZY # else # ifdef DL_LAZY # define LT_DLLAZY_OR_NOW DL_LAZY # else # ifdef RTLD_NOW # define LT_DLLAZY_OR_NOW RTLD_NOW # else # ifdef DL_NOW # define LT_DLLAZY_OR_NOW DL_NOW # else # define LT_DLLAZY_OR_NOW 0 # endif # endif # endif # endif #endif #ifdef __cplusplus extern "C" void exit (int); #endif void fnord() { int i=42;} int main () { void *self = dlopen (0, LT_DLGLOBAL|LT_DLLAZY_OR_NOW); int status = $lt_dlunknown; if (self) { if (dlsym (self,"fnord")) status = $lt_dlno_uscore; else if (dlsym( self,"_fnord")) status = $lt_dlneed_uscore; /* dlclose (self); */ } else puts (dlerror ()); exit (status); } EOF if { (eval echo "$as_me:$LINENO: \"$ac_link\"") >&5 (eval $ac_link) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); } && test -s conftest${ac_exeext} 2>/dev/null; then (./conftest; exit; ) >&5 2>/dev/null lt_status=$? 
case x$lt_status in x$lt_dlno_uscore) lt_cv_dlopen_self_static=yes ;; x$lt_dlneed_uscore) lt_cv_dlopen_self_static=yes ;; x$lt_dlunknown|x*) lt_cv_dlopen_self_static=no ;; esac else : # compilation failed lt_cv_dlopen_self_static=no fi fi rm -fr conftest* fi echo "$as_me:$LINENO: result: $lt_cv_dlopen_self_static" >&5 echo "${ECHO_T}$lt_cv_dlopen_self_static" >&6 fi CPPFLAGS="$save_CPPFLAGS" LDFLAGS="$save_LDFLAGS" LIBS="$save_LIBS" ;; esac case $lt_cv_dlopen_self in yes|no) enable_dlopen_self=$lt_cv_dlopen_self ;; *) enable_dlopen_self=unknown ;; esac case $lt_cv_dlopen_self_static in yes|no) enable_dlopen_self_static=$lt_cv_dlopen_self_static ;; *) enable_dlopen_self_static=unknown ;; esac fi # Report which library types will actually be built echo "$as_me:$LINENO: checking if libtool supports shared libraries" >&5 echo $ECHO_N "checking if libtool supports shared libraries... $ECHO_C" >&6 echo "$as_me:$LINENO: result: $can_build_shared" >&5 echo "${ECHO_T}$can_build_shared" >&6 echo "$as_me:$LINENO: checking whether to build shared libraries" >&5 echo $ECHO_N "checking whether to build shared libraries... $ECHO_C" >&6 test "$can_build_shared" = "no" && enable_shared=no # On AIX, shared libraries and static libraries use the same namespace, and # are all built from PIC. case $host_os in aix3*) test "$enable_shared" = yes && enable_static=no if test -n "$RANLIB"; then archive_cmds="$archive_cmds~\$RANLIB \$lib" postinstall_cmds='$RANLIB $lib' fi ;; aix4* | aix5*) if test "$host_cpu" != ia64 && test "$aix_use_runtimelinking" = no ; then test "$enable_shared" = yes && enable_static=no fi ;; esac echo "$as_me:$LINENO: result: $enable_shared" >&5 echo "${ECHO_T}$enable_shared" >&6 echo "$as_me:$LINENO: checking whether to build static libraries" >&5 echo $ECHO_N "checking whether to build static libraries... $ECHO_C" >&6 # Make sure either enable_shared or enable_static is yes. 
test "$enable_shared" = yes || enable_static=yes echo "$as_me:$LINENO: result: $enable_static" >&5 echo "${ECHO_T}$enable_static" >&6 # The else clause should only fire when bootstrapping the # libtool distribution, otherwise you forgot to ship ltmain.sh # with your package, and you will get complaints that there are # no rules to generate ltmain.sh. if test -f "$ltmain"; then # See if we are running on zsh, and set the options which allow our commands through # without removal of \ escapes. if test -n "${ZSH_VERSION+set}" ; then setopt NO_GLOB_SUBST fi # Now quote all the things that may contain metacharacters while being # careful not to overquote the AC_SUBSTed values. We take copies of the # variables and quote the copies for generation of the libtool script. for var in echo old_CC old_CFLAGS AR AR_FLAGS EGREP RANLIB LN_S LTCC LTCFLAGS NM \ SED SHELL STRIP \ libname_spec library_names_spec soname_spec extract_expsyms_cmds \ old_striplib striplib file_magic_cmd finish_cmds finish_eval \ deplibs_check_method reload_flag reload_cmds need_locks \ lt_cv_sys_global_symbol_pipe lt_cv_sys_global_symbol_to_cdecl \ lt_cv_sys_global_symbol_to_c_name_address \ sys_lib_search_path_spec sys_lib_dlsearch_path_spec \ old_postinstall_cmds old_postuninstall_cmds \ compiler \ CC \ LD \ lt_prog_compiler_wl \ lt_prog_compiler_pic \ lt_prog_compiler_static \ lt_prog_compiler_no_builtin_flag \ export_dynamic_flag_spec \ thread_safe_flag_spec \ whole_archive_flag_spec \ enable_shared_with_static_runtimes \ old_archive_cmds \ old_archive_from_new_cmds \ predep_objects \ postdep_objects \ predeps \ postdeps \ compiler_lib_search_path \ archive_cmds \ archive_expsym_cmds \ postinstall_cmds \ postuninstall_cmds \ old_archive_from_expsyms_cmds \ allow_undefined_flag \ no_undefined_flag \ export_symbols_cmds \ hardcode_libdir_flag_spec \ hardcode_libdir_flag_spec_ld \ hardcode_libdir_separator \ hardcode_automatic \ module_cmds \ module_expsym_cmds \ lt_cv_prog_compiler_c_o \ 
exclude_expsyms \ include_expsyms; do case $var in old_archive_cmds | \ old_archive_from_new_cmds | \ archive_cmds | \ archive_expsym_cmds | \ module_cmds | \ module_expsym_cmds | \ old_archive_from_expsyms_cmds | \ export_symbols_cmds | \ extract_expsyms_cmds | reload_cmds | finish_cmds | \ postinstall_cmds | postuninstall_cmds | \ old_postinstall_cmds | old_postuninstall_cmds | \ sys_lib_search_path_spec | sys_lib_dlsearch_path_spec) # Double-quote double-evaled strings. eval "lt_$var=\\\"\`\$echo \"X\$$var\" | \$Xsed -e \"\$double_quote_subst\" -e \"\$sed_quote_subst\" -e \"\$delay_variable_subst\"\`\\\"" ;; *) eval "lt_$var=\\\"\`\$echo \"X\$$var\" | \$Xsed -e \"\$sed_quote_subst\"\`\\\"" ;; esac done case $lt_echo in *'\$0 --fallback-echo"') lt_echo=`$echo "X$lt_echo" | $Xsed -e 's/\\\\\\\$0 --fallback-echo"$/$0 --fallback-echo"/'` ;; esac cfgfile="${ofile}T" trap "$rm \"$cfgfile\"; exit 1" 1 2 15 $rm -f "$cfgfile" { echo "$as_me:$LINENO: creating $ofile" >&5 echo "$as_me: creating $ofile" >&6;} cat <<__EOF__ >> "$cfgfile" #! $SHELL # `$echo "$cfgfile" | sed 's%^.*/%%'` - Provide generalized library-building support services. # Generated automatically by $PROGRAM (GNU $PACKAGE $VERSION$TIMESTAMP) # NOTE: Changes made to this file will be lost: look at ltmain.sh. # # Copyright (C) 1996, 1997, 1998, 1999, 2000, 2001 # Free Software Foundation, Inc. # # This file is part of GNU Libtool: # Originally by Gordon Matzigkeit , 1996 # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by # the Free Software Foundation; either version 2 of the License, or # (at your option) any later version. # # This program is distributed in the hope that it will be useful, but # WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU # General Public License for more details. 
# # You should have received a copy of the GNU General Public License # along with this program; if not, write to the Free Software # Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA. # # As a special exception to the GNU General Public License, if you # distribute this file as part of a program that contains a # configuration script generated by Autoconf, you may include it under # the same distribution terms that you use for the rest of that program. # A sed program that does not truncate output. SED=$lt_SED # Sed that helps us avoid accidentally triggering echo(1) options like -n. Xsed="$SED -e 1s/^X//" # The HP-UX ksh and POSIX shell print the target directory to stdout # if CDPATH is set. (unset CDPATH) >/dev/null 2>&1 && unset CDPATH # The names of the tagged configurations supported by this script. available_tags= # ### BEGIN LIBTOOL CONFIG # Libtool was configured on host `(hostname || uname -n) 2>/dev/null | sed 1q`: # Shell to use when invoking shell scripts. SHELL=$lt_SHELL # Whether or not to build shared libraries. build_libtool_libs=$enable_shared # Whether or not to build static libraries. build_old_libs=$enable_static # Whether or not to add -lc for building shared libraries. build_libtool_need_lc=$archive_cmds_need_lc # Whether or not to disallow shared libs when runtime libs are static allow_libtool_libs_with_static_runtimes=$enable_shared_with_static_runtimes # Whether or not to optimize for fast installation. fast_install=$enable_fast_install # The host system. host_alias=$host_alias host=$host host_os=$host_os # The build system. build_alias=$build_alias build=$build build_os=$build_os # An echo program that does not interpret backslashes. echo=$lt_echo # The archiver. AR=$lt_AR AR_FLAGS=$lt_AR_FLAGS # A C compiler. LTCC=$lt_LTCC # LTCC compiler flags. LTCFLAGS=$lt_LTCFLAGS # A language-specific compiler. CC=$lt_compiler # Is the compiler the GNU C compiler? with_gcc=$GCC # An ERE matcher. 
EGREP=$lt_EGREP # The linker used to build libraries. LD=$lt_LD # Whether we need hard or soft links. LN_S=$lt_LN_S # A BSD-compatible nm program. NM=$lt_NM # A symbol stripping program STRIP=$lt_STRIP # Used to examine libraries when file_magic_cmd begins "file" MAGIC_CMD=$MAGIC_CMD # Used on cygwin: DLL creation program. DLLTOOL="$DLLTOOL" # Used on cygwin: object dumper. OBJDUMP="$OBJDUMP" # Used on cygwin: assembler. AS="$AS" # The name of the directory that contains temporary libtool files. objdir=$objdir # How to create reloadable object files. reload_flag=$lt_reload_flag reload_cmds=$lt_reload_cmds # How to pass a linker flag through the compiler. wl=$lt_lt_prog_compiler_wl # Object file suffix (normally "o"). objext="$ac_objext" # Old archive suffix (normally "a"). libext="$libext" # Shared library suffix (normally ".so"). shrext_cmds='$shrext_cmds' # Executable file suffix (normally ""). exeext="$exeext" # Additional compiler flags for building library objects. pic_flag=$lt_lt_prog_compiler_pic pic_mode=$pic_mode # What is the maximum length of a command? max_cmd_len=$lt_cv_sys_max_cmd_len # Does compiler simultaneously support -c and -o options? compiler_c_o=$lt_lt_cv_prog_compiler_c_o # Must we lock files when doing compilation? need_locks=$lt_need_locks # Do we need the lib prefix for modules? need_lib_prefix=$need_lib_prefix # Do we need a version for libraries? need_version=$need_version # Whether dlopen is supported. dlopen_support=$enable_dlopen # Whether dlopen of programs is supported. dlopen_self=$enable_dlopen_self # Whether dlopen of statically linked programs is supported. dlopen_self_static=$enable_dlopen_self_static # Compiler flag to prevent dynamic linking. link_static_flag=$lt_lt_prog_compiler_static # Compiler flag to turn off builtin functions. no_builtin_flag=$lt_lt_prog_compiler_no_builtin_flag # Compiler flag to allow reflexive dlopens. 
export_dynamic_flag_spec=$lt_export_dynamic_flag_spec # Compiler flag to generate shared objects directly from archives. whole_archive_flag_spec=$lt_whole_archive_flag_spec # Compiler flag to generate thread-safe objects. thread_safe_flag_spec=$lt_thread_safe_flag_spec # Library versioning type. version_type=$version_type # Format of library name prefix. libname_spec=$lt_libname_spec # List of archive names. First name is the real one, the rest are links. # The last name is the one that the linker finds with -lNAME. library_names_spec=$lt_library_names_spec # The coded name of the library, if different from the real name. soname_spec=$lt_soname_spec # Commands used to build and install an old-style archive. RANLIB=$lt_RANLIB old_archive_cmds=$lt_old_archive_cmds old_postinstall_cmds=$lt_old_postinstall_cmds old_postuninstall_cmds=$lt_old_postuninstall_cmds # Create an old-style archive from a shared archive. old_archive_from_new_cmds=$lt_old_archive_from_new_cmds # Create a temporary old-style archive to link instead of a shared archive. old_archive_from_expsyms_cmds=$lt_old_archive_from_expsyms_cmds # Commands used to build and install a shared archive. archive_cmds=$lt_archive_cmds archive_expsym_cmds=$lt_archive_expsym_cmds postinstall_cmds=$lt_postinstall_cmds postuninstall_cmds=$lt_postuninstall_cmds # Commands used to build a loadable module (assumed same as above if empty) module_cmds=$lt_module_cmds module_expsym_cmds=$lt_module_expsym_cmds # Commands to strip libraries. old_striplib=$lt_old_striplib striplib=$lt_striplib # Dependencies to place before the objects being linked to create a # shared library. predep_objects=$lt_predep_objects # Dependencies to place after the objects being linked to create a # shared library. postdep_objects=$lt_postdep_objects # Dependencies to place before the objects being linked to create a # shared library. predeps=$lt_predeps # Dependencies to place after the objects being linked to create a # shared library. 
postdeps=$lt_postdeps # The library search path used internally by the compiler when linking # a shared library. compiler_lib_search_path=$lt_compiler_lib_search_path # Method to check whether dependent libraries are shared objects. deplibs_check_method=$lt_deplibs_check_method # Command to use when deplibs_check_method == file_magic. file_magic_cmd=$lt_file_magic_cmd # Flag that allows shared libraries with undefined symbols to be built. allow_undefined_flag=$lt_allow_undefined_flag # Flag that forces no undefined symbols. no_undefined_flag=$lt_no_undefined_flag # Commands used to finish a libtool library installation in a directory. finish_cmds=$lt_finish_cmds # Same as above, but a single script fragment to be evaled but not shown. finish_eval=$lt_finish_eval # Take the output of nm and produce a listing of raw symbols and C names. global_symbol_pipe=$lt_lt_cv_sys_global_symbol_pipe # Transform the output of nm in a proper C declaration global_symbol_to_cdecl=$lt_lt_cv_sys_global_symbol_to_cdecl # Transform the output of nm in a C name address pair global_symbol_to_c_name_address=$lt_lt_cv_sys_global_symbol_to_c_name_address # This is the shared library runtime path variable. runpath_var=$runpath_var # This is the shared library path variable. shlibpath_var=$shlibpath_var # Is shlibpath searched before the hard-coded library search path? shlibpath_overrides_runpath=$shlibpath_overrides_runpath # How to hardcode a shared library path into an executable. hardcode_action=$hardcode_action # Whether we should hardcode library paths into libraries. hardcode_into_libs=$hardcode_into_libs # Flag to hardcode \$libdir into a binary during linking. # This must work even if \$libdir does not exist. hardcode_libdir_flag_spec=$lt_hardcode_libdir_flag_spec # If ld is used when linking, flag to hardcode \$libdir into # a binary during linking. This must work even if \$libdir does # not exist. 
hardcode_libdir_flag_spec_ld=$lt_hardcode_libdir_flag_spec_ld # Whether we need a single -rpath flag with a separated argument. hardcode_libdir_separator=$lt_hardcode_libdir_separator # Set to yes if using DIR/libNAME${shared_ext} during linking hardcodes DIR into the # resulting binary. hardcode_direct=$hardcode_direct # Set to yes if using the -LDIR flag during linking hardcodes DIR into the # resulting binary. hardcode_minus_L=$hardcode_minus_L # Set to yes if using SHLIBPATH_VAR=DIR during linking hardcodes DIR into # the resulting binary. hardcode_shlibpath_var=$hardcode_shlibpath_var # Set to yes if building a shared library automatically hardcodes DIR into the library # and all subsequent libraries and executables linked against it. hardcode_automatic=$hardcode_automatic # Variables whose values should be saved in libtool wrapper scripts and # restored at relink time. variables_saved_for_relink="$variables_saved_for_relink" # Whether libtool must link a program against all its dependency libraries. link_all_deplibs=$link_all_deplibs # Compile-time system search path for libraries sys_lib_search_path_spec=$lt_sys_lib_search_path_spec # Run-time system search path for libraries sys_lib_dlsearch_path_spec=$lt_sys_lib_dlsearch_path_spec # Fix the shell variable \$srcfile for the compiler. fix_srcfile_path="$fix_srcfile_path" # Set to yes if exported symbols are required. always_export_symbols=$always_export_symbols # The commands to list exported symbols. export_symbols_cmds=$lt_export_symbols_cmds # The commands to extract the exported symbol list from a shared archive. extract_expsyms_cmds=$lt_extract_expsyms_cmds # Symbols that should not be listed in the preloaded symbols. exclude_expsyms=$lt_exclude_expsyms # Symbols that must always be exported. include_expsyms=$lt_include_expsyms # ### END LIBTOOL CONFIG __EOF__ case $host_os in aix3*) cat <<\EOF >> "$cfgfile" # AIX sometimes has problems with the GCC collect2 program. 
For some # reason, if we set the COLLECT_NAMES environment variable, the problems # vanish in a puff of smoke. if test "X${COLLECT_NAMES+set}" != Xset; then COLLECT_NAMES= export COLLECT_NAMES fi EOF ;; esac # We use sed instead of cat because bash on DJGPP gets confused if # it finds mixed CR/LF and LF-only lines. Since sed operates in # text mode, it properly converts lines to CR/LF. This bash problem # is reportedly fixed, but why not run on old versions too? sed '$q' "$ltmain" >> "$cfgfile" || (rm -f "$cfgfile"; exit 1) mv -f "$cfgfile" "$ofile" || \ (rm -f "$ofile" && cp "$cfgfile" "$ofile" && rm -f "$cfgfile") chmod +x "$ofile" else # If there is no Makefile yet, we rely on a make rule to execute # `config.status --recheck' to rerun these tests and create the # libtool script then. ltmain_in=`echo $ltmain | sed -e 's/\.sh$/.in/'` if test -f "$ltmain_in"; then test -f Makefile && make "$ltmain" fi fi ac_ext=c ac_cpp='$CPP $CPPFLAGS' ac_compile='$CC -c $CFLAGS $CPPFLAGS conftest.$ac_ext >&5' ac_link='$CC -o conftest$ac_exeext $CFLAGS $CPPFLAGS $LDFLAGS conftest.$ac_ext $LIBS >&5' ac_compiler_gnu=$ac_cv_c_compiler_gnu CC="$lt_save_CC" # Check whether --with-tags or --without-tags was given. if test "${with_tags+set}" = set; then withval="$with_tags" tagnames="$withval" fi; if test -f "$ltmain" && test -n "$tagnames"; then if test !
-f "${ofile}"; then { echo "$as_me:$LINENO: WARNING: output file \`$ofile' does not exist" >&5 echo "$as_me: WARNING: output file \`$ofile' does not exist" >&2;} fi if test -z "$LTCC"; then eval "`$SHELL ${ofile} --config | grep '^LTCC='`" if test -z "$LTCC"; then { echo "$as_me:$LINENO: WARNING: output file \`$ofile' does not look like a libtool script" >&5 echo "$as_me: WARNING: output file \`$ofile' does not look like a libtool script" >&2;} else { echo "$as_me:$LINENO: WARNING: using \`LTCC=$LTCC', extracted from \`$ofile'" >&5 echo "$as_me: WARNING: using \`LTCC=$LTCC', extracted from \`$ofile'" >&2;} fi fi if test -z "$LTCFLAGS"; then eval "`$SHELL ${ofile} --config | grep '^LTCFLAGS='`" fi # Extract list of available tagged configurations in $ofile. # Note that this assumes the entire list is on one line. available_tags=`grep "^available_tags=" "${ofile}" | $SED -e 's/available_tags=\(.*$\)/\1/' -e 's/\"//g'` lt_save_ifs="$IFS"; IFS="${IFS}$PATH_SEPARATOR," for tagname in $tagnames; do IFS="$lt_save_ifs" # Check whether tagname contains only valid characters case `$echo "X$tagname" | $Xsed -e 's:[-_ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz1234567890,/]::g'` in "") ;; *) { { echo "$as_me:$LINENO: error: invalid tag name: $tagname" >&5 echo "$as_me: error: invalid tag name: $tagname" >&2;} { (exit 1); exit 1; }; } ;; esac if grep "^# ### BEGIN LIBTOOL TAG CONFIG: $tagname$" < "${ofile}" > /dev/null then { { echo "$as_me:$LINENO: error: tag name \"$tagname\" already exists" >&5 echo "$as_me: error: tag name \"$tagname\" already exists" >&2;} { (exit 1); exit 1; }; } fi # Update the list of available tags. 
if test -n "$tagname"; then echo appending configuration tag \"$tagname\" to $ofile case $tagname in CXX) if test -n "$CXX" && ( test "X$CXX" != "Xno" && ( (test "X$CXX" = "Xg++" && `g++ -v >/dev/null 2>&1` ) || (test "X$CXX" != "Xg++"))) ; then ac_ext=cc ac_cpp='$CXXCPP $CPPFLAGS' ac_compile='$CXX -c $CXXFLAGS $CPPFLAGS conftest.$ac_ext >&5' ac_link='$CXX -o conftest$ac_exeext $CXXFLAGS $CPPFLAGS $LDFLAGS conftest.$ac_ext $LIBS >&5' ac_compiler_gnu=$ac_cv_cxx_compiler_gnu archive_cmds_need_lc_CXX=no allow_undefined_flag_CXX= always_export_symbols_CXX=no archive_expsym_cmds_CXX= export_dynamic_flag_spec_CXX= hardcode_direct_CXX=no hardcode_libdir_flag_spec_CXX= hardcode_libdir_flag_spec_ld_CXX= hardcode_libdir_separator_CXX= hardcode_minus_L_CXX=no hardcode_shlibpath_var_CXX=unsupported hardcode_automatic_CXX=no module_cmds_CXX= module_expsym_cmds_CXX= link_all_deplibs_CXX=unknown old_archive_cmds_CXX=$old_archive_cmds no_undefined_flag_CXX= whole_archive_flag_spec_CXX= enable_shared_with_static_runtimes_CXX=no # Dependencies to place before and after the object being linked: predep_objects_CXX= postdep_objects_CXX= predeps_CXX= postdeps_CXX= compiler_lib_search_path_CXX= # Source file extension for C++ test sources. ac_ext=cpp # Object file extension for compiled C++ test sources. objext=o objext_CXX=$objext # Code to be used in simple compile tests lt_simple_compile_test_code="int some_variable = 0;\n" # Code to be used in simple link tests lt_simple_link_test_code='int main(int, char *[]) { return(0); }\n' # ltmain only uses $CC for tagged configurations so make sure $CC is set. # If no C compiler was specified, use CC. LTCC=${LTCC-"$CC"} # If no C compiler flags were specified, use CFLAGS. LTCFLAGS=${LTCFLAGS-"$CFLAGS"} # Allow CC to be a program name with arguments. 
compiler=$CC # save warnings/boilerplate of simple test code ac_outfile=conftest.$ac_objext printf "$lt_simple_compile_test_code" >conftest.$ac_ext eval "$ac_compile" 2>&1 >/dev/null | $SED '/^$/d; /^ *+/d' >conftest.err _lt_compiler_boilerplate=`cat conftest.err` $rm conftest* ac_outfile=conftest.$ac_objext printf "$lt_simple_link_test_code" >conftest.$ac_ext eval "$ac_link" 2>&1 >/dev/null | $SED '/^$/d; /^ *+/d' >conftest.err _lt_linker_boilerplate=`cat conftest.err` $rm conftest* # Allow CC to be a program name with arguments. lt_save_CC=$CC lt_save_LD=$LD lt_save_GCC=$GCC GCC=$GXX lt_save_with_gnu_ld=$with_gnu_ld lt_save_path_LD=$lt_cv_path_LD if test -n "${lt_cv_prog_gnu_ldcxx+set}"; then lt_cv_prog_gnu_ld=$lt_cv_prog_gnu_ldcxx else $as_unset lt_cv_prog_gnu_ld fi if test -n "${lt_cv_path_LDCXX+set}"; then lt_cv_path_LD=$lt_cv_path_LDCXX else $as_unset lt_cv_path_LD fi test -z "${LDCXX+set}" || LD=$LDCXX CC=${CXX-"c++"} compiler=$CC compiler_CXX=$CC for cc_temp in $compiler""; do case $cc_temp in compile | *[\\/]compile | ccache | *[\\/]ccache ) ;; distcc | *[\\/]distcc | purify | *[\\/]purify ) ;; \-*) ;; *) break;; esac done cc_basename=`$echo "X$cc_temp" | $Xsed -e 's%.*/%%' -e "s%^$host_alias-%%"` # We don't want -fno-exception when compiling C++ code, so set the # no_builtin_flag separately if test "$GXX" = yes; then lt_prog_compiler_no_builtin_flag_CXX=' -fno-builtin' else lt_prog_compiler_no_builtin_flag_CXX= fi if test "$GXX" = yes; then # Set up default GNU C++ configuration # Check whether --with-gnu-ld or --without-gnu-ld was given. if test "${with_gnu_ld+set}" = set; then withval="$with_gnu_ld" test "$withval" = no || with_gnu_ld=yes else with_gnu_ld=no fi; ac_prog=ld if test "$GCC" = yes; then # Check if gcc -print-prog-name=ld gives a path. echo "$as_me:$LINENO: checking for ld used by $CC" >&5 echo $ECHO_N "checking for ld used by $CC...
$ECHO_C" >&6 case $host in *-*-mingw*) # gcc leaves a trailing carriage return which upsets mingw ac_prog=`($CC -print-prog-name=ld) 2>&5 | tr -d '\015'` ;; *) ac_prog=`($CC -print-prog-name=ld) 2>&5` ;; esac case $ac_prog in # Accept absolute paths. [\\/]* | ?:[\\/]*) re_direlt='/[^/][^/]*/\.\./' # Canonicalize the pathname of ld ac_prog=`echo $ac_prog| $SED 's%\\\\%/%g'` while echo $ac_prog | grep "$re_direlt" > /dev/null 2>&1; do ac_prog=`echo $ac_prog| $SED "s%$re_direlt%/%"` done test -z "$LD" && LD="$ac_prog" ;; "") # If it fails, then pretend we aren't using GCC. ac_prog=ld ;; *) # If it is relative, then search for the first ld in PATH. with_gnu_ld=unknown ;; esac elif test "$with_gnu_ld" = yes; then echo "$as_me:$LINENO: checking for GNU ld" >&5 echo $ECHO_N "checking for GNU ld... $ECHO_C" >&6 else echo "$as_me:$LINENO: checking for non-GNU ld" >&5 echo $ECHO_N "checking for non-GNU ld... $ECHO_C" >&6 fi if test "${lt_cv_path_LD+set}" = set; then echo $ECHO_N "(cached) $ECHO_C" >&6 else if test -z "$LD"; then lt_save_ifs="$IFS"; IFS=$PATH_SEPARATOR for ac_dir in $PATH; do IFS="$lt_save_ifs" test -z "$ac_dir" && ac_dir=. if test -f "$ac_dir/$ac_prog" || test -f "$ac_dir/$ac_prog$ac_exeext"; then lt_cv_path_LD="$ac_dir/$ac_prog" # Check to see if the program is GNU ld. I'd rather use --version, # but apparently some variants of GNU ld only accept -v. # Break only if it was the GNU/non-GNU ld that we prefer. case `"$lt_cv_path_LD" -v 2>&1 &5 echo "${ECHO_T}$LD" >&6 else echo "$as_me:$LINENO: result: no" >&5 echo "${ECHO_T}no" >&6 fi test -z "$LD" && { { echo "$as_me:$LINENO: error: no acceptable ld found in \$PATH" >&5 echo "$as_me: error: no acceptable ld found in \$PATH" >&2;} { (exit 1); exit 1; }; } echo "$as_me:$LINENO: checking if the linker ($LD) is GNU ld" >&5 echo $ECHO_N "checking if the linker ($LD) is GNU ld... 
$ECHO_C" >&6 if test "${lt_cv_prog_gnu_ld+set}" = set; then echo $ECHO_N "(cached) $ECHO_C" >&6 else # I'd rather use --version here, but apparently some GNU lds only accept -v. case `$LD -v 2>&1 &5 echo "${ECHO_T}$lt_cv_prog_gnu_ld" >&6 with_gnu_ld=$lt_cv_prog_gnu_ld # Check if GNU C++ uses GNU ld as the underlying linker, since the # archiving commands below assume that GNU ld is being used. if test "$with_gnu_ld" = yes; then archive_cmds_CXX='$CC -shared -nostdlib $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags ${wl}-soname $wl$soname -o $lib' archive_expsym_cmds_CXX='$CC -shared -nostdlib $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags ${wl}-soname $wl$soname ${wl}-retain-symbols-file $wl$export_symbols -o $lib' hardcode_libdir_flag_spec_CXX='${wl}--rpath ${wl}$libdir' export_dynamic_flag_spec_CXX='${wl}--export-dynamic' # If archive_cmds runs LD, not CC, wlarc should be empty # XXX I think wlarc can be eliminated in ltcf-cxx, but I need to # investigate it a little bit more. (MM) wlarc='${wl}' # ancient GNU ld didn't support --whole-archive et al. if eval "`$CC -print-prog-name=ld` --help 2>&1" | \ grep 'no-whole-archive' > /dev/null; then whole_archive_flag_spec_CXX="$wlarc"'--whole-archive$convenience '"$wlarc"'--no-whole-archive' else whole_archive_flag_spec_CXX= fi else with_gnu_ld=no wlarc= # A generic and very simple default shared library creation # command for GNU C++ for the case where it uses the native # linker, instead of GNU ld. If possible, this setting should be # overridden to take advantage of the native linker features on # the platform it is being used on. archive_cmds_CXX='$CC -shared -nostdlib $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags -o $lib' fi # Commands to make compiler produce verbose output that lists # what "hidden" libraries, object files and flags are used when # linking a shared library.
output_verbose_link_cmd='$CC -shared $CFLAGS -v conftest.$objext 2>&1 | grep "\-L"' else GXX=no with_gnu_ld=no wlarc= fi # PORTME: fill in a description of your system's C++ link characteristics echo "$as_me:$LINENO: checking whether the $compiler linker ($LD) supports shared libraries" >&5 echo $ECHO_N "checking whether the $compiler linker ($LD) supports shared libraries... $ECHO_C" >&6 ld_shlibs_CXX=yes case $host_os in aix3*) # FIXME: insert proper C++ library support ld_shlibs_CXX=no ;; aix4* | aix5*) if test "$host_cpu" = ia64; then # On IA64, the linker does run time linking by default, so we don't # have to do anything special. aix_use_runtimelinking=no exp_sym_flag='-Bexport' no_entry_flag="" else aix_use_runtimelinking=no # Test if we are trying to use run time linking or normal # AIX style linking. If -brtl is somewhere in LDFLAGS, we # need to do runtime linking. case $host_os in aix4.[23]|aix4.[23].*|aix5*) for ld_flag in $LDFLAGS; do case $ld_flag in *-brtl*) aix_use_runtimelinking=yes break ;; esac done ;; esac exp_sym_flag='-bexport' no_entry_flag='-bnoentry' fi # When large executables or shared objects are built, AIX ld can # have problems creating the table of contents. If linking a library # or program results in "error TOC overflow" add -mminimal-toc to # CXXFLAGS/CFLAGS for g++/gcc. In the cases where that is not # enough to fix the problem, add -Wl,-bbigtoc to LDFLAGS. 
archive_cmds_CXX='' hardcode_direct_CXX=yes hardcode_libdir_separator_CXX=':' link_all_deplibs_CXX=yes if test "$GXX" = yes; then case $host_os in aix4.[012]|aix4.[012].*) # We only want to do this on AIX 4.2 and lower, the check # below for broken collect2 doesn't work under 4.3+ collect2name=`${CC} -print-prog-name=collect2` if test -f "$collect2name" && \ strings "$collect2name" | grep resolve_lib_name >/dev/null then # We have reworked collect2 hardcode_direct_CXX=yes else # We have old collect2 hardcode_direct_CXX=unsupported # It fails to find uninstalled libraries when the uninstalled # path is not listed in the libpath. Setting hardcode_minus_L # to unsupported forces relinking hardcode_minus_L_CXX=yes hardcode_libdir_flag_spec_CXX='-L$libdir' hardcode_libdir_separator_CXX= fi ;; esac shared_flag='-shared' if test "$aix_use_runtimelinking" = yes; then shared_flag="$shared_flag "'${wl}-G' fi else # not using gcc if test "$host_cpu" = ia64; then # VisualAge C++, Version 5.5 for AIX 5L for IA-64, Beta 3 Release # chokes on -Wl,-G. The following line is correct: shared_flag='-G' else if test "$aix_use_runtimelinking" = yes; then shared_flag='${wl}-G' else shared_flag='${wl}-bM:SRE' fi fi fi # It seems that -bexpall does not export symbols beginning with # underscore (_), so it is better to generate a list of symbols to export. always_export_symbols_CXX=yes if test "$aix_use_runtimelinking" = yes; then # Warning - without using the other runtime loading flags (-brtl), # -berok will link without error, but may produce a broken library. allow_undefined_flag_CXX='-berok' # Determine the default libpath from the value encoded in an empty executable. cat >conftest.$ac_ext <<_ACEOF /* confdefs.h. */ _ACEOF cat confdefs.h >>conftest.$ac_ext cat >>conftest.$ac_ext <<_ACEOF /* end confdefs.h. */ int main () { ; return 0; } _ACEOF rm -f conftest.$ac_objext conftest$ac_exeext if { (eval echo "$as_me:$LINENO: \"$ac_link\"") >&5 (eval $ac_link) 2>conftest.er1 ac_status=$? 
grep -v '^ *+' conftest.er1 >conftest.err rm -f conftest.er1 cat conftest.err >&5 echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); } && { ac_try='test -z "$ac_cxx_werror_flag" || test ! -s conftest.err' { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 (eval $ac_try) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); }; } && { ac_try='test -s conftest$ac_exeext' { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 (eval $ac_try) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); }; }; then aix_libpath=`dump -H conftest$ac_exeext 2>/dev/null | $SED -n -e '/Import File Strings/,/^$/ { /^0/ { s/^0 *\(.*\)$/\1/; p; } }'` # Check for a 64-bit object if we didn't find anything. if test -z "$aix_libpath"; then aix_libpath=`dump -HX64 conftest$ac_exeext 2>/dev/null | $SED -n -e '/Import File Strings/,/^$/ { /^0/ { s/^0 *\(.*\)$/\1/; p; } }'`; fi else echo "$as_me: failed program was:" >&5 sed 's/^/| /' conftest.$ac_ext >&5 fi rm -f conftest.err conftest.$ac_objext \ conftest$ac_exeext conftest.$ac_ext if test -z "$aix_libpath"; then aix_libpath="/usr/lib:/lib"; fi hardcode_libdir_flag_spec_CXX='${wl}-blibpath:$libdir:'"$aix_libpath" archive_expsym_cmds_CXX="\$CC"' -o $output_objdir/$soname $libobjs $deplibs '"\${wl}$no_entry_flag"' $compiler_flags `if test "x${allow_undefined_flag}" != "x"; then echo "${wl}${allow_undefined_flag}"; else :; fi` '"\${wl}$exp_sym_flag:\$export_symbols $shared_flag" else if test "$host_cpu" = ia64; then hardcode_libdir_flag_spec_CXX='${wl}-R $libdir:/usr/lib:/lib' allow_undefined_flag_CXX="-z nodefs" archive_expsym_cmds_CXX="\$CC $shared_flag"' -o $output_objdir/$soname $libobjs $deplibs '"\${wl}$no_entry_flag"' $compiler_flags ${wl}${allow_undefined_flag} '"\${wl}$exp_sym_flag:\$export_symbols" else # Determine the default libpath from the value encoded in an empty executable. cat >conftest.$ac_ext <<_ACEOF /* confdefs.h. 
*/ _ACEOF cat confdefs.h >>conftest.$ac_ext cat >>conftest.$ac_ext <<_ACEOF /* end confdefs.h. */ int main () { ; return 0; } _ACEOF rm -f conftest.$ac_objext conftest$ac_exeext if { (eval echo "$as_me:$LINENO: \"$ac_link\"") >&5 (eval $ac_link) 2>conftest.er1 ac_status=$? grep -v '^ *+' conftest.er1 >conftest.err rm -f conftest.er1 cat conftest.err >&5 echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); } && { ac_try='test -z "$ac_cxx_werror_flag" || test ! -s conftest.err' { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 (eval $ac_try) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); }; } && { ac_try='test -s conftest$ac_exeext' { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 (eval $ac_try) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); }; }; then aix_libpath=`dump -H conftest$ac_exeext 2>/dev/null | $SED -n -e '/Import File Strings/,/^$/ { /^0/ { s/^0 *\(.*\)$/\1/; p; } }'` # Check for a 64-bit object if we didn't find anything. if test -z "$aix_libpath"; then aix_libpath=`dump -HX64 conftest$ac_exeext 2>/dev/null | $SED -n -e '/Import File Strings/,/^$/ { /^0/ { s/^0 *\(.*\)$/\1/; p; } }'`; fi else echo "$as_me: failed program was:" >&5 sed 's/^/| /' conftest.$ac_ext >&5 fi rm -f conftest.err conftest.$ac_objext \ conftest$ac_exeext conftest.$ac_ext if test -z "$aix_libpath"; then aix_libpath="/usr/lib:/lib"; fi hardcode_libdir_flag_spec_CXX='${wl}-blibpath:$libdir:'"$aix_libpath" # Warning - without using the other run time loading flags, # -berok will link without error, but may produce a broken library. no_undefined_flag_CXX=' ${wl}-bernotok' allow_undefined_flag_CXX=' ${wl}-berok' # Exported symbols can be pulled into shared objects from archives whole_archive_flag_spec_CXX='$convenience' archive_cmds_need_lc_CXX=yes # This is similar to how AIX traditionally builds its shared libraries. 
archive_expsym_cmds_CXX="\$CC $shared_flag"' -o $output_objdir/$soname $libobjs $deplibs ${wl}-bnoentry $compiler_flags ${wl}-bE:$export_symbols${allow_undefined_flag}~$AR $AR_FLAGS $output_objdir/$libname$release.a $output_objdir/$soname' fi fi ;; beos*) if $LD --help 2>&1 | grep ': supported targets:.* elf' > /dev/null; then allow_undefined_flag_CXX=unsupported # Joseph Beckenbach says some releases of gcc # support --undefined. This deserves some investigation. FIXME archive_cmds_CXX='$CC -nostart $libobjs $deplibs $compiler_flags ${wl}-soname $wl$soname -o $lib' else ld_shlibs_CXX=no fi ;; chorus*) case $cc_basename in *) # FIXME: insert proper C++ library support ld_shlibs_CXX=no ;; esac ;; cygwin* | mingw* | pw32*) # _LT_AC_TAGVAR(hardcode_libdir_flag_spec, CXX) is actually meaningless, # as there is no search path for DLLs. hardcode_libdir_flag_spec_CXX='-L$libdir' allow_undefined_flag_CXX=unsupported always_export_symbols_CXX=no enable_shared_with_static_runtimes_CXX=yes if $LD --help 2>&1 | grep 'auto-import' > /dev/null; then archive_cmds_CXX='$CC -shared -nostdlib $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags -o $output_objdir/$soname ${wl}--enable-auto-image-base -Xlinker --out-implib -Xlinker $lib' # If the export-symbols file already is a .def file (1st line # is EXPORTS), use it as is; otherwise, prepend... 
archive_expsym_cmds_CXX='if test "x`$SED 1q $export_symbols`" = xEXPORTS; then cp $export_symbols $output_objdir/$soname.def; else echo EXPORTS > $output_objdir/$soname.def; cat $export_symbols >> $output_objdir/$soname.def; fi~ $CC -shared -nostdlib $output_objdir/$soname.def $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags -o $output_objdir/$soname ${wl}--enable-auto-image-base -Xlinker --out-implib -Xlinker $lib' else ld_shlibs_CXX=no fi ;; darwin* | rhapsody*) case $host_os in rhapsody* | darwin1.[012]) allow_undefined_flag_CXX='${wl}-undefined ${wl}suppress' ;; *) # Darwin 1.3 on if test -z ${MACOSX_DEPLOYMENT_TARGET} ; then allow_undefined_flag_CXX='${wl}-flat_namespace ${wl}-undefined ${wl}suppress' else case ${MACOSX_DEPLOYMENT_TARGET} in 10.[012]) allow_undefined_flag_CXX='${wl}-flat_namespace ${wl}-undefined ${wl}suppress' ;; 10.*) allow_undefined_flag_CXX='${wl}-undefined ${wl}dynamic_lookup' ;; esac fi ;; esac archive_cmds_need_lc_CXX=no hardcode_direct_CXX=no hardcode_automatic_CXX=yes hardcode_shlibpath_var_CXX=unsupported whole_archive_flag_spec_CXX='' link_all_deplibs_CXX=yes if test "$GXX" = yes ; then lt_int_apple_cc_single_mod=no output_verbose_link_cmd='echo' if $CC -dumpspecs 2>&1 | $EGREP 'single_module' >/dev/null ; then lt_int_apple_cc_single_mod=yes fi if test "X$lt_int_apple_cc_single_mod" = Xyes ; then archive_cmds_CXX='$CC -dynamiclib -single_module $allow_undefined_flag -o $lib $libobjs $deplibs $compiler_flags -install_name $rpath/$soname $verstring' else archive_cmds_CXX='$CC -r -keep_private_externs -nostdlib -o ${lib}-master.o $libobjs~$CC -dynamiclib $allow_undefined_flag -o $lib ${lib}-master.o $deplibs $compiler_flags -install_name $rpath/$soname $verstring' fi module_cmds_CXX='$CC $allow_undefined_flag -o $lib -bundle $libobjs $deplibs$compiler_flags' # Don't fix this by using the ld -exported_symbols_list flag, it doesn't exist in older darwin lds if test "X$lt_int_apple_cc_single_mod" = Xyes ; then 
archive_expsym_cmds_CXX='sed -e "s,#.*,," -e "s,^[ ]*,," -e "s,^\(..*\),_&," < $export_symbols > $output_objdir/${libname}-symbols.expsym~$CC -dynamiclib -single_module $allow_undefined_flag -o $lib $libobjs $deplibs $compiler_flags -install_name $rpath/$soname $verstring~nmedit -s $output_objdir/${libname}-symbols.expsym ${lib}' else archive_expsym_cmds_CXX='sed -e "s,#.*,," -e "s,^[ ]*,," -e "s,^\(..*\),_&," < $export_symbols > $output_objdir/${libname}-symbols.expsym~$CC -r -keep_private_externs -nostdlib -o ${lib}-master.o $libobjs~$CC -dynamiclib $allow_undefined_flag -o $lib ${lib}-master.o $deplibs $compiler_flags -install_name $rpath/$soname $verstring~nmedit -s $output_objdir/${libname}-symbols.expsym ${lib}' fi module_expsym_cmds_CXX='sed -e "s,#.*,," -e "s,^[ ]*,," -e "s,^\(..*\),_&," < $export_symbols > $output_objdir/${libname}-symbols.expsym~$CC $allow_undefined_flag -o $lib -bundle $libobjs $deplibs$compiler_flags~nmedit -s $output_objdir/${libname}-symbols.expsym ${lib}' else case $cc_basename in xlc*) output_verbose_link_cmd='echo' archive_cmds_CXX='$CC -qmkshrobj ${wl}-single_module $allow_undefined_flag -o $lib $libobjs $deplibs $compiler_flags ${wl}-install_name ${wl}`echo $rpath/$soname` $verstring' module_cmds_CXX='$CC $allow_undefined_flag -o $lib -bundle $libobjs $deplibs$compiler_flags' # Don't fix this by using the ld -exported_symbols_list flag, it doesn't exist in older darwin lds archive_expsym_cmds_CXX='sed -e "s,#.*,," -e "s,^[ ]*,," -e "s,^\(..*\),_&," < $export_symbols > $output_objdir/${libname}-symbols.expsym~$CC -qmkshrobj ${wl}-single_module $allow_undefined_flag -o $lib $libobjs $deplibs $compiler_flags ${wl}-install_name ${wl}$rpath/$soname $verstring~nmedit -s $output_objdir/${libname}-symbols.expsym ${lib}' module_expsym_cmds_CXX='sed -e "s,#.*,," -e "s,^[ ]*,," -e "s,^\(..*\),_&," < $export_symbols > $output_objdir/${libname}-symbols.expsym~$CC $allow_undefined_flag -o $lib -bundle $libobjs $deplibs$compiler_flags~nmedit -s 
$output_objdir/${libname}-symbols.expsym ${lib}' ;; *) ld_shlibs_CXX=no ;; esac fi ;; dgux*) case $cc_basename in ec++*) # FIXME: insert proper C++ library support ld_shlibs_CXX=no ;; ghcx*) # Green Hills C++ Compiler # FIXME: insert proper C++ library support ld_shlibs_CXX=no ;; *) # FIXME: insert proper C++ library support ld_shlibs_CXX=no ;; esac ;; freebsd[12]*) # C++ shared libraries reported to be fairly broken before switch to ELF ld_shlibs_CXX=no ;; freebsd-elf*) archive_cmds_need_lc_CXX=no ;; freebsd* | kfreebsd*-gnu | dragonfly*) # FreeBSD 3 and later use GNU C++ and GNU ld with standard ELF # conventions ld_shlibs_CXX=yes ;; gnu*) ;; hpux9*) hardcode_libdir_flag_spec_CXX='${wl}+b ${wl}$libdir' hardcode_libdir_separator_CXX=: export_dynamic_flag_spec_CXX='${wl}-E' hardcode_direct_CXX=yes hardcode_minus_L_CXX=yes # Not in the search PATH, # but as the default # location of the library. case $cc_basename in CC*) # FIXME: insert proper C++ library support ld_shlibs_CXX=no ;; aCC*) archive_cmds_CXX='$rm $output_objdir/$soname~$CC -b ${wl}+b ${wl}$install_libdir -o $output_objdir/$soname $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags~test $output_objdir/$soname = $lib || mv $output_objdir/$soname $lib' # Commands to make compiler produce verbose output that lists # what "hidden" libraries, object files and flags are used when # linking a shared library. # # There doesn't appear to be a way to prevent this compiler from # explicitly linking system object files so we need to strip them # from the output so that they don't get included in the library # dependencies. 
output_verbose_link_cmd='templist=`($CC -b $CFLAGS -v conftest.$objext 2>&1) | grep "[-]L"`; list=""; for z in $templist; do case $z in conftest.$objext) list="$list $z";; *.$objext);; *) list="$list $z";;esac; done; echo $list' ;; *) if test "$GXX" = yes; then archive_cmds_CXX='$rm $output_objdir/$soname~$CC -shared -nostdlib -fPIC ${wl}+b ${wl}$install_libdir -o $output_objdir/$soname $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags~test $output_objdir/$soname = $lib || mv $output_objdir/$soname $lib' else # FIXME: insert proper C++ library support ld_shlibs_CXX=no fi ;; esac ;; hpux10*|hpux11*) if test $with_gnu_ld = no; then hardcode_libdir_flag_spec_CXX='${wl}+b ${wl}$libdir' hardcode_libdir_separator_CXX=: case $host_cpu in hppa*64*|ia64*) hardcode_libdir_flag_spec_ld_CXX='+b $libdir' ;; *) export_dynamic_flag_spec_CXX='${wl}-E' ;; esac fi case $host_cpu in hppa*64*|ia64*) hardcode_direct_CXX=no hardcode_shlibpath_var_CXX=no ;; *) hardcode_direct_CXX=yes hardcode_minus_L_CXX=yes # Not in the search PATH, # but as the default # location of the library. ;; esac case $cc_basename in CC*) # FIXME: insert proper C++ library support ld_shlibs_CXX=no ;; aCC*) case $host_cpu in hppa*64*) archive_cmds_CXX='$CC -b ${wl}+h ${wl}$soname -o $lib $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags' ;; ia64*) archive_cmds_CXX='$CC -b ${wl}+h ${wl}$soname ${wl}+nodefaultrpath -o $lib $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags' ;; *) archive_cmds_CXX='$CC -b ${wl}+h ${wl}$soname ${wl}+b ${wl}$install_libdir -o $lib $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags' ;; esac # Commands to make compiler produce verbose output that lists # what "hidden" libraries, object files and flags are used when # linking a shared library. 
# # There doesn't appear to be a way to prevent this compiler from # explicitly linking system object files so we need to strip them # from the output so that they don't get included in the library # dependencies. output_verbose_link_cmd='templist=`($CC -b $CFLAGS -v conftest.$objext 2>&1) | grep "\-L"`; list=""; for z in $templist; do case $z in conftest.$objext) list="$list $z";; *.$objext);; *) list="$list $z";;esac; done; echo $list' ;; *) if test "$GXX" = yes; then if test $with_gnu_ld = no; then case $host_cpu in hppa*64*) archive_cmds_CXX='$CC -shared -nostdlib -fPIC ${wl}+h ${wl}$soname -o $lib $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags' ;; ia64*) archive_cmds_CXX='$CC -shared -nostdlib -fPIC ${wl}+h ${wl}$soname ${wl}+nodefaultrpath -o $lib $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags' ;; *) archive_cmds_CXX='$CC -shared -nostdlib -fPIC ${wl}+h ${wl}$soname ${wl}+b ${wl}$install_libdir -o $lib $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags' ;; esac fi else # FIXME: insert proper C++ library support ld_shlibs_CXX=no fi ;; esac ;; interix3*) hardcode_direct_CXX=no hardcode_shlibpath_var_CXX=no hardcode_libdir_flag_spec_CXX='${wl}-rpath,$libdir' export_dynamic_flag_spec_CXX='${wl}-E' # Hack: On Interix 3.x, we cannot compile PIC because of a broken gcc. # Instead, shared libraries are loaded at an image base (0x10000000 by # default) and relocated if they conflict, which is a slow, very # memory-consuming and fragmenting process. To avoid this, we pick a random, # 256 KiB-aligned image base between 0x50000000 and 0x6FFC0000 at link # time. Moving up from 0x10000000 also allows more sbrk(2) space. 
archive_cmds_CXX='$CC -shared $pic_flag $libobjs $deplibs $compiler_flags ${wl}-h,$soname ${wl}--image-base,`expr ${RANDOM-$$} % 4096 / 2 \* 262144 + 1342177280` -o $lib' archive_expsym_cmds_CXX='sed "s,^,_," $export_symbols >$output_objdir/$soname.expsym~$CC -shared $pic_flag $libobjs $deplibs $compiler_flags ${wl}-h,$soname ${wl}--retain-symbols-file,$output_objdir/$soname.expsym ${wl}--image-base,`expr ${RANDOM-$$} % 4096 / 2 \* 262144 + 1342177280` -o $lib' ;; irix5* | irix6*) case $cc_basename in CC*) # SGI C++ archive_cmds_CXX='$CC -shared -all -multigot $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags -soname $soname `test -n "$verstring" && echo -set_version $verstring` -update_registry ${output_objdir}/so_locations -o $lib' # Archives containing C++ object files must be created using # "CC -ar", where "CC" is the IRIX C++ compiler. This is # necessary to make sure instantiated templates are included # in the archive. old_archive_cmds_CXX='$CC -ar -WR,-u -o $oldlib $oldobjs' ;; *) if test "$GXX" = yes; then if test "$with_gnu_ld" = no; then archive_cmds_CXX='$CC -shared -nostdlib $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags ${wl}-soname ${wl}$soname `test -n "$verstring" && echo ${wl}-set_version ${wl}$verstring` ${wl}-update_registry ${wl}${output_objdir}/so_locations -o $lib' else archive_cmds_CXX='$CC -shared -nostdlib $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags ${wl}-soname ${wl}$soname `test -n "$verstring" && echo ${wl}-set_version ${wl}$verstring` -o $lib' fi fi link_all_deplibs_CXX=yes ;; esac hardcode_libdir_flag_spec_CXX='${wl}-rpath ${wl}$libdir' hardcode_libdir_separator_CXX=: ;; linux*) case $cc_basename in KCC*) # Kuck and Associates, Inc. (KAI) C++ Compiler # KCC will only create a shared library if the output file # ends with ".so" (or ".sl" for HP-UX), so rename the library # to its proper name (with version) after linking. 
archive_cmds_CXX='tempext=`echo $shared_ext | $SED -e '\''s/\([^()0-9A-Za-z{}]\)/\\\\\1/g'\''`; templib=`echo $lib | $SED -e "s/\${tempext}\..*/.so/"`; $CC $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags --soname $soname -o \$templib; mv \$templib $lib' archive_expsym_cmds_CXX='tempext=`echo $shared_ext | $SED -e '\''s/\([^()0-9A-Za-z{}]\)/\\\\\1/g'\''`; templib=`echo $lib | $SED -e "s/\${tempext}\..*/.so/"`; $CC $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags --soname $soname -o \$templib ${wl}-retain-symbols-file,$export_symbols; mv \$templib $lib' # Commands to make compiler produce verbose output that lists # what "hidden" libraries, object files and flags are used when # linking a shared library. # # There doesn't appear to be a way to prevent this compiler from # explicitly linking system object files so we need to strip them # from the output so that they don't get included in the library # dependencies. output_verbose_link_cmd='templist=`$CC $CFLAGS -v conftest.$objext -o libconftest$shared_ext 2>&1 | grep "ld"`; rm -f libconftest$shared_ext; list=""; for z in $templist; do case $z in conftest.$objext) list="$list $z";; *.$objext);; *) list="$list $z";;esac; done; echo $list' hardcode_libdir_flag_spec_CXX='${wl}--rpath,$libdir' export_dynamic_flag_spec_CXX='${wl}--export-dynamic' # Archives containing C++ object files must be created using # "CC -Bstatic", where "CC" is the KAI C++ compiler. old_archive_cmds_CXX='$CC -Bstatic -o $oldlib $oldobjs' ;; icpc*) # Intel C++ with_gnu_ld=yes # version 8.0 and above of icpc choke on multiply defined symbols # if we add $predep_objects and $postdep_objects, however 7.1 and # earlier do not add the objects themselves. 
case `$CC -V 2>&1` in *"Version 7."*) archive_cmds_CXX='$CC -shared $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags ${wl}-soname $wl$soname -o $lib' archive_expsym_cmds_CXX='$CC -shared $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags ${wl}-soname $wl$soname ${wl}-retain-symbols-file $wl$export_symbols -o $lib' ;; *) # Version 8.0 or newer tmp_idyn= case $host_cpu in ia64*) tmp_idyn=' -i_dynamic';; esac archive_cmds_CXX='$CC -shared'"$tmp_idyn"' $libobjs $deplibs $compiler_flags ${wl}-soname $wl$soname -o $lib' archive_expsym_cmds_CXX='$CC -shared'"$tmp_idyn"' $libobjs $deplibs $compiler_flags ${wl}-soname $wl$soname ${wl}-retain-symbols-file $wl$export_symbols -o $lib' ;; esac archive_cmds_need_lc_CXX=no hardcode_libdir_flag_spec_CXX='${wl}-rpath,$libdir' export_dynamic_flag_spec_CXX='${wl}--export-dynamic' whole_archive_flag_spec_CXX='${wl}--whole-archive$convenience ${wl}--no-whole-archive' ;; pgCC*) # Portland Group C++ compiler archive_cmds_CXX='$CC -shared $pic_flag $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags ${wl}-soname ${wl}$soname -o $lib' archive_expsym_cmds_CXX='$CC -shared $pic_flag $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags ${wl}-soname ${wl}$soname ${wl}-retain-symbols-file ${wl}$export_symbols -o $lib' hardcode_libdir_flag_spec_CXX='${wl}--rpath ${wl}$libdir' export_dynamic_flag_spec_CXX='${wl}--export-dynamic' whole_archive_flag_spec_CXX='${wl}--whole-archive`for conv in $convenience\"\"; do test -n \"$conv\" && new_convenience=\"$new_convenience,$conv\"; done; $echo \"$new_convenience\"` ${wl}--no-whole-archive' ;; cxx*) # Compaq C++ archive_cmds_CXX='$CC -shared $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags ${wl}-soname $wl$soname -o $lib' archive_expsym_cmds_CXX='$CC -shared $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags ${wl}-soname $wl$soname -o $lib ${wl}-retain-symbols-file $wl$export_symbols' runpath_var=LD_RUN_PATH 
hardcode_libdir_flag_spec_CXX='-rpath $libdir' hardcode_libdir_separator_CXX=: # Commands to make compiler produce verbose output that lists # what "hidden" libraries, object files and flags are used when # linking a shared library. # # There doesn't appear to be a way to prevent this compiler from # explicitly linking system object files so we need to strip them # from the output so that they don't get included in the library # dependencies. output_verbose_link_cmd='templist=`$CC -shared $CFLAGS -v conftest.$objext 2>&1 | grep "ld"`; templist=`echo $templist | $SED "s/\(^.*ld.*\)\( .*ld .*$\)/\1/"`; list=""; for z in $templist; do case $z in conftest.$objext) list="$list $z";; *.$objext);; *) list="$list $z";;esac; done; echo $list' ;; esac ;; lynxos*) # FIXME: insert proper C++ library support ld_shlibs_CXX=no ;; m88k*) # FIXME: insert proper C++ library support ld_shlibs_CXX=no ;; mvs*) case $cc_basename in cxx*) # FIXME: insert proper C++ library support ld_shlibs_CXX=no ;; *) # FIXME: insert proper C++ library support ld_shlibs_CXX=no ;; esac ;; netbsd*) if echo __ELF__ | $CC -E - | grep __ELF__ >/dev/null; then archive_cmds_CXX='$LD -Bshareable -o $lib $predep_objects $libobjs $deplibs $postdep_objects $linker_flags' wlarc= hardcode_libdir_flag_spec_CXX='-R$libdir' hardcode_direct_CXX=yes hardcode_shlibpath_var_CXX=no fi # Workaround some broken pre-1.5 toolchains output_verbose_link_cmd='$CC -shared $CFLAGS -v conftest.$objext 2>&1 | grep conftest.$objext | $SED -e "s:-lgcc -lc -lgcc::"' ;; openbsd2*) # C++ shared libraries are fairly broken ld_shlibs_CXX=no ;; openbsd*) hardcode_direct_CXX=yes hardcode_shlibpath_var_CXX=no archive_cmds_CXX='$CC -shared $pic_flag $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags -o $lib' hardcode_libdir_flag_spec_CXX='${wl}-rpath,$libdir' if test -z "`echo __ELF__ | $CC -E - | grep __ELF__`" || test "$host_os-$host_cpu" = "openbsd2.8-powerpc"; then archive_expsym_cmds_CXX='$CC -shared $pic_flag 
$predep_objects $libobjs $deplibs $postdep_objects $compiler_flags ${wl}-retain-symbols-file,$export_symbols -o $lib' export_dynamic_flag_spec_CXX='${wl}-E' whole_archive_flag_spec_CXX="$wlarc"'--whole-archive$convenience '"$wlarc"'--no-whole-archive' fi output_verbose_link_cmd='echo' ;; osf3*) case $cc_basename in KCC*) # Kuck and Associates, Inc. (KAI) C++ Compiler # KCC will only create a shared library if the output file # ends with ".so" (or ".sl" for HP-UX), so rename the library # to its proper name (with version) after linking. archive_cmds_CXX='tempext=`echo $shared_ext | $SED -e '\''s/\([^()0-9A-Za-z{}]\)/\\\\\1/g'\''`; templib=`echo $lib | $SED -e "s/\${tempext}\..*/.so/"`; $CC $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags --soname $soname -o \$templib; mv \$templib $lib' hardcode_libdir_flag_spec_CXX='${wl}-rpath,$libdir' hardcode_libdir_separator_CXX=: # Archives containing C++ object files must be created using # "CC -Bstatic", where "CC" is the KAI C++ compiler. old_archive_cmds_CXX='$CC -Bstatic -o $oldlib $oldobjs' ;; RCC*) # Rational C++ 2.4.1 # FIXME: insert proper C++ library support ld_shlibs_CXX=no ;; cxx*) allow_undefined_flag_CXX=' ${wl}-expect_unresolved ${wl}\*' archive_cmds_CXX='$CC -shared${allow_undefined_flag} $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags ${wl}-soname $soname `test -n "$verstring" && echo ${wl}-set_version $verstring` -update_registry ${output_objdir}/so_locations -o $lib' hardcode_libdir_flag_spec_CXX='${wl}-rpath ${wl}$libdir' hardcode_libdir_separator_CXX=: # Commands to make compiler produce verbose output that lists # what "hidden" libraries, object files and flags are used when # linking a shared library. # # There doesn't appear to be a way to prevent this compiler from # explicitly linking system object files so we need to strip them # from the output so that they don't get included in the library # dependencies. 
output_verbose_link_cmd='templist=`$CC -shared $CFLAGS -v conftest.$objext 2>&1 | grep "ld" | grep -v "ld:"`; templist=`echo $templist | $SED "s/\(^.*ld.*\)\( .*ld.*$\)/\1/"`; list=""; for z in $templist; do case $z in conftest.$objext) list="$list $z";; *.$objext);; *) list="$list $z";;esac; done; echo $list' ;; *) if test "$GXX" = yes && test "$with_gnu_ld" = no; then allow_undefined_flag_CXX=' ${wl}-expect_unresolved ${wl}\*' archive_cmds_CXX='$CC -shared -nostdlib ${allow_undefined_flag} $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags ${wl}-soname ${wl}$soname `test -n "$verstring" && echo ${wl}-set_version ${wl}$verstring` ${wl}-update_registry ${wl}${output_objdir}/so_locations -o $lib' hardcode_libdir_flag_spec_CXX='${wl}-rpath ${wl}$libdir' hardcode_libdir_separator_CXX=: # Commands to make compiler produce verbose output that lists # what "hidden" libraries, object files and flags are used when # linking a shared library. output_verbose_link_cmd='$CC -shared $CFLAGS -v conftest.$objext 2>&1 | grep "\-L"' else # FIXME: insert proper C++ library support ld_shlibs_CXX=no fi ;; esac ;; osf4* | osf5*) case $cc_basename in KCC*) # Kuck and Associates, Inc. (KAI) C++ Compiler # KCC will only create a shared library if the output file # ends with ".so" (or ".sl" for HP-UX), so rename the library # to its proper name (with version) after linking. archive_cmds_CXX='tempext=`echo $shared_ext | $SED -e '\''s/\([^()0-9A-Za-z{}]\)/\\\\\1/g'\''`; templib=`echo $lib | $SED -e "s/\${tempext}\..*/.so/"`; $CC $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags --soname $soname -o \$templib; mv \$templib $lib' hardcode_libdir_flag_spec_CXX='${wl}-rpath,$libdir' hardcode_libdir_separator_CXX=: # Archives containing C++ object files must be created using # the KAI C++ compiler. 
old_archive_cmds_CXX='$CC -o $oldlib $oldobjs' ;; RCC*) # Rational C++ 2.4.1 # FIXME: insert proper C++ library support ld_shlibs_CXX=no ;; cxx*) allow_undefined_flag_CXX=' -expect_unresolved \*' archive_cmds_CXX='$CC -shared${allow_undefined_flag} $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags -msym -soname $soname `test -n "$verstring" && echo -set_version $verstring` -update_registry ${output_objdir}/so_locations -o $lib' archive_expsym_cmds_CXX='for i in `cat $export_symbols`; do printf "%s %s\\n" -exported_symbol "\$i" >> $lib.exp; done~ echo "-hidden">> $lib.exp~ $CC -shared$allow_undefined_flag $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags -msym -soname $soname -Wl,-input -Wl,$lib.exp `test -n "$verstring" && echo -set_version $verstring` -update_registry ${output_objdir}/so_locations -o $lib~ $rm $lib.exp' hardcode_libdir_flag_spec_CXX='-rpath $libdir' hardcode_libdir_separator_CXX=: # Commands to make compiler produce verbose output that lists # what "hidden" libraries, object files and flags are used when # linking a shared library. # # There doesn't appear to be a way to prevent this compiler from # explicitly linking system object files so we need to strip them # from the output so that they don't get included in the library # dependencies. 
output_verbose_link_cmd='templist=`$CC -shared $CFLAGS -v conftest.$objext 2>&1 | grep "ld" | grep -v "ld:"`; templist=`echo $templist | $SED "s/\(^.*ld.*\)\( .*ld.*$\)/\1/"`; list=""; for z in $templist; do case $z in conftest.$objext) list="$list $z";; *.$objext);; *) list="$list $z";;esac; done; echo $list' ;; *) if test "$GXX" = yes && test "$with_gnu_ld" = no; then allow_undefined_flag_CXX=' ${wl}-expect_unresolved ${wl}\*' archive_cmds_CXX='$CC -shared -nostdlib ${allow_undefined_flag} $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags ${wl}-msym ${wl}-soname ${wl}$soname `test -n "$verstring" && echo ${wl}-set_version ${wl}$verstring` ${wl}-update_registry ${wl}${output_objdir}/so_locations -o $lib' hardcode_libdir_flag_spec_CXX='${wl}-rpath ${wl}$libdir' hardcode_libdir_separator_CXX=: # Commands to make compiler produce verbose output that lists # what "hidden" libraries, object files and flags are used when # linking a shared library. output_verbose_link_cmd='$CC -shared $CFLAGS -v conftest.$objext 2>&1 | grep "\-L"' else # FIXME: insert proper C++ library support ld_shlibs_CXX=no fi ;; esac ;; psos*) # FIXME: insert proper C++ library support ld_shlibs_CXX=no ;; sunos4*) case $cc_basename in CC*) # Sun C++ 4.x # FIXME: insert proper C++ library support ld_shlibs_CXX=no ;; lcc*) # Lucid # FIXME: insert proper C++ library support ld_shlibs_CXX=no ;; *) # FIXME: insert proper C++ library support ld_shlibs_CXX=no ;; esac ;; solaris*) case $cc_basename in CC*) # Sun C++ 4.2, 5.x and Centerline C++ archive_cmds_need_lc_CXX=yes no_undefined_flag_CXX=' -zdefs' archive_cmds_CXX='$CC -G${allow_undefined_flag} -h$soname -o $lib $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags' archive_expsym_cmds_CXX='$echo "{ global:" > $lib.exp~cat $export_symbols | $SED -e "s/\(.*\)/\1;/" >> $lib.exp~$echo "local: *; };" >> $lib.exp~ $CC -G${allow_undefined_flag} ${wl}-M ${wl}$lib.exp -h$soname -o $lib $predep_objects $libobjs $deplibs 
$postdep_objects $compiler_flags~$rm $lib.exp' hardcode_libdir_flag_spec_CXX='-R$libdir' hardcode_shlibpath_var_CXX=no case $host_os in solaris2.[0-5] | solaris2.[0-5].*) ;; *) # The C++ compiler is used as linker so we must use $wl # flag to pass the commands to the underlying system # linker. We must also pass each convenience library through # to the system linker between allextract/defaultextract. # The C++ compiler will combine linker options so we # cannot just pass the convenience library names through # without $wl. # Supported since Solaris 2.6 (maybe 2.5.1?) whole_archive_flag_spec_CXX='${wl}-z ${wl}allextract`for conv in $convenience\"\"; do test -n \"$conv\" && new_convenience=\"$new_convenience,$conv\"; done; $echo \"$new_convenience\"` ${wl}-z ${wl}defaultextract' ;; esac link_all_deplibs_CXX=yes output_verbose_link_cmd='echo' # Archives containing C++ object files must be created using # "CC -xar", where "CC" is the Sun C++ compiler. This is # necessary to make sure instantiated templates are included # in the archive. old_archive_cmds_CXX='$CC -xar -o $oldlib $oldobjs' ;; gcx*) # Green Hills C++ Compiler archive_cmds_CXX='$CC -shared $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags ${wl}-h $wl$soname -o $lib' # The C++ compiler must be used to create the archive. 
old_archive_cmds_CXX='$CC $LDFLAGS -archive -o $oldlib $oldobjs' ;; *) # GNU C++ compiler with Solaris linker if test "$GXX" = yes && test "$with_gnu_ld" = no; then no_undefined_flag_CXX=' ${wl}-z ${wl}defs' if $CC --version | grep -v '^2\.7' > /dev/null; then archive_cmds_CXX='$CC -shared -nostdlib $LDFLAGS $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags ${wl}-h $wl$soname -o $lib' archive_expsym_cmds_CXX='$echo "{ global:" > $lib.exp~cat $export_symbols | $SED -e "s/\(.*\)/\1;/" >> $lib.exp~$echo "local: *; };" >> $lib.exp~ $CC -shared -nostdlib ${wl}-M $wl$lib.exp -o $lib $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags~$rm $lib.exp' # Commands to make compiler produce verbose output that lists # what "hidden" libraries, object files and flags are used when # linking a shared library. output_verbose_link_cmd="$CC -shared $CFLAGS -v conftest.$objext 2>&1 | grep \"\-L\"" else # g++ 2.7 appears to require `-G' NOT `-shared' on this # platform. archive_cmds_CXX='$CC -G -nostdlib $LDFLAGS $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags ${wl}-h $wl$soname -o $lib' archive_expsym_cmds_CXX='$echo "{ global:" > $lib.exp~cat $export_symbols | $SED -e "s/\(.*\)/\1;/" >> $lib.exp~$echo "local: *; };" >> $lib.exp~ $CC -G -nostdlib ${wl}-M $wl$lib.exp -o $lib $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags~$rm $lib.exp' # Commands to make compiler produce verbose output that lists # what "hidden" libraries, object files and flags are used when # linking a shared library. 
output_verbose_link_cmd="$CC -G $CFLAGS -v conftest.$objext 2>&1 | grep \"\-L\"" fi hardcode_libdir_flag_spec_CXX='${wl}-R $wl$libdir' fi ;; esac ;; sysv4*uw2* | sysv5OpenUNIX* | sysv5UnixWare7.[01].[10]* | unixware7* | sco3.2v5.0.[024]*) no_undefined_flag_CXX='${wl}-z,text' archive_cmds_need_lc_CXX=no hardcode_shlibpath_var_CXX=no runpath_var='LD_RUN_PATH' case $cc_basename in CC*) archive_cmds_CXX='$CC -G ${wl}-h,$soname -o $lib $libobjs $deplibs $compiler_flags' archive_expsym_cmds_CXX='$CC -G ${wl}-Bexport:$export_symbols ${wl}-h,$soname -o $lib $libobjs $deplibs $compiler_flags' ;; *) archive_cmds_CXX='$CC -shared ${wl}-h,$soname -o $lib $libobjs $deplibs $compiler_flags' archive_expsym_cmds_CXX='$CC -shared ${wl}-Bexport:$export_symbols ${wl}-h,$soname -o $lib $libobjs $deplibs $compiler_flags' ;; esac ;; sysv5* | sco3.2v5* | sco5v6*) # Note: We can NOT use -z defs as we might desire, because we do not # link with -lc, and that would cause any symbols used from libc to # always be unresolved, which means just about no library would # ever link correctly. If we're not using GNU ld we use -z text # though, which does catch some bad symbols but isn't as heavy-handed # as -z defs. # For security reasons, it is highly recommended that you always # use absolute paths for naming shared libraries, and exclude the # DT_RUNPATH tag from executables and libraries. But doing so # requires that you compile everything twice, which is a pain. # So that behaviour is only enabled if SCOABSPATH is set to a # non-empty value in the environment. Most likely only useful for # creating official distributions of packages. # This is a hack until libtool officially supports absolute path # names for shared libraries. 
no_undefined_flag_CXX='${wl}-z,text' allow_undefined_flag_CXX='${wl}-z,nodefs' archive_cmds_need_lc_CXX=no hardcode_shlibpath_var_CXX=no hardcode_libdir_flag_spec_CXX='`test -z "$SCOABSPATH" && echo ${wl}-R,$libdir`' hardcode_libdir_separator_CXX=':' link_all_deplibs_CXX=yes export_dynamic_flag_spec_CXX='${wl}-Bexport' runpath_var='LD_RUN_PATH' case $cc_basename in CC*) archive_cmds_CXX='$CC -G ${wl}-h,\${SCOABSPATH:+${install_libdir}/}$soname -o $lib $libobjs $deplibs $compiler_flags' archive_expsym_cmds_CXX='$CC -G ${wl}-Bexport:$export_symbols ${wl}-h,\${SCOABSPATH:+${install_libdir}/}$soname -o $lib $libobjs $deplibs $compiler_flags' ;; *) archive_cmds_CXX='$CC -shared ${wl}-h,\${SCOABSPATH:+${install_libdir}/}$soname -o $lib $libobjs $deplibs $compiler_flags' archive_expsym_cmds_CXX='$CC -shared ${wl}-Bexport:$export_symbols ${wl}-h,\${SCOABSPATH:+${install_libdir}/}$soname -o $lib $libobjs $deplibs $compiler_flags' ;; esac ;; tandem*) case $cc_basename in NCC*) # NonStop-UX NCC 3.20 # FIXME: insert proper C++ library support ld_shlibs_CXX=no ;; *) # FIXME: insert proper C++ library support ld_shlibs_CXX=no ;; esac ;; vxworks*) # FIXME: insert proper C++ library support ld_shlibs_CXX=no ;; *) # FIXME: insert proper C++ library support ld_shlibs_CXX=no ;; esac echo "$as_me:$LINENO: result: $ld_shlibs_CXX" >&5 echo "${ECHO_T}$ld_shlibs_CXX" >&6 test "$ld_shlibs_CXX" = no && can_build_shared=no GCC_CXX="$GXX" LD_CXX="$LD" cat > conftest.$ac_ext <&5 (eval $ac_compile) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); }; then # Parse the compiler output and extract the necessary # objects, libraries and library flags. # Sentinel used to keep track of whether or not we are before # the conftest object file. pre_test_object_deps_done=no # The `*' in the case matches for architectures that use `case' in # $output_verbose_cmd can trigger glob expansion during the loop # eval without this substitution. 
output_verbose_link_cmd=`$echo "X$output_verbose_link_cmd" | $Xsed -e "$no_glob_subst"` for p in `eval $output_verbose_link_cmd`; do case $p in -L* | -R* | -l*) # Some compilers place a space between "-{L,R}" and the path. # Remove the space. if test $p = "-L" || test $p = "-R"; then prev=$p continue else prev= fi if test "$pre_test_object_deps_done" = no; then case $p in -L* | -R*) # Internal compiler library paths should come after those # provided by the user. The postdeps already come after the # user supplied libs so there is no need to process them. if test -z "$compiler_lib_search_path_CXX"; then compiler_lib_search_path_CXX="${prev}${p}" else compiler_lib_search_path_CXX="${compiler_lib_search_path_CXX} ${prev}${p}" fi ;; # The "-l" case would never come before the object being # linked, so don't bother handling this case. esac else if test -z "$postdeps_CXX"; then postdeps_CXX="${prev}${p}" else postdeps_CXX="${postdeps_CXX} ${prev}${p}" fi fi ;; *.$objext) # This assumes that the test object file only shows up # once in the compiler output. if test "$p" = "conftest.$objext"; then pre_test_object_deps_done=yes continue fi if test "$pre_test_object_deps_done" = no; then if test -z "$predep_objects_CXX"; then predep_objects_CXX="$p" else predep_objects_CXX="$predep_objects_CXX $p" fi else if test -z "$postdep_objects_CXX"; then postdep_objects_CXX="$p" else postdep_objects_CXX="$postdep_objects_CXX $p" fi fi ;; *) ;; # Ignore the rest. esac done # Clean up. rm -f a.out a.exe else echo "libtool.m4: error: problem compiling CXX test program" fi $rm -f conftest.$objext # PORTME: override above test on systems where it is broken case $host_os in interix3*) # Interix 3.5 installs completely hosed .la files for C++, so rather than # hack all around it, let's just trust "g++" to DTRT. 
predep_objects_CXX= postdep_objects_CXX= postdeps_CXX= ;; solaris*) case $cc_basename in CC*) # Adding this requires a known-good setup of shared libraries for # Sun compiler versions before 5.6, else PIC objects from an old # archive will be linked into the output, leading to subtle bugs. postdeps_CXX='-lCstd -lCrun' ;; esac ;; esac case " $postdeps_CXX " in *" -lc "*) archive_cmds_need_lc_CXX=no ;; esac lt_prog_compiler_wl_CXX= lt_prog_compiler_pic_CXX= lt_prog_compiler_static_CXX= echo "$as_me:$LINENO: checking for $compiler option to produce PIC" >&5 echo $ECHO_N "checking for $compiler option to produce PIC... $ECHO_C" >&6 # C++ specific cases for pic, static, wl, etc. if test "$GXX" = yes; then lt_prog_compiler_wl_CXX='-Wl,' lt_prog_compiler_static_CXX='-static' case $host_os in aix*) # All AIX code is PIC. if test "$host_cpu" = ia64; then # AIX 5 now supports IA64 processor lt_prog_compiler_static_CXX='-Bstatic' fi ;; amigaos*) # FIXME: we need at least 68020 code to build shared libraries, but # adding the `-m68020' flag to GCC prevents building anything better, # like `-m68040'. lt_prog_compiler_pic_CXX='-m68020 -resident32 -malways-restore-a4' ;; beos* | cygwin* | irix5* | irix6* | nonstopux* | osf3* | osf4* | osf5*) # PIC is the default for these OSes. ;; mingw* | os2* | pw32*) # This hack is so that the source file can tell whether it is being # built for inclusion in a dll (and should export symbols for example). lt_prog_compiler_pic_CXX='-DDLL_EXPORT' ;; darwin* | rhapsody*) # PIC is the default on this platform # Common symbols not allowed in MH_DYLIB files lt_prog_compiler_pic_CXX='-fno-common' ;; *djgpp*) # DJGPP does not support shared libraries at all lt_prog_compiler_pic_CXX= ;; interix3*) # Interix 3.x gcc -fpic/-fPIC options generate broken code. # Instead, we relocate shared libraries at runtime. 
;; sysv4*MP*) if test -d /usr/nec; then lt_prog_compiler_pic_CXX=-Kconform_pic fi ;; hpux*) # PIC is the default for IA64 HP-UX and 64-bit HP-UX, but # not for PA HP-UX. case $host_cpu in hppa*64*|ia64*) ;; *) lt_prog_compiler_pic_CXX='-fPIC' ;; esac ;; *) lt_prog_compiler_pic_CXX='-fPIC' ;; esac else case $host_os in aix4* | aix5*) # All AIX code is PIC. if test "$host_cpu" = ia64; then # AIX 5 now supports IA64 processor lt_prog_compiler_static_CXX='-Bstatic' else lt_prog_compiler_static_CXX='-bnso -bI:/lib/syscalls.exp' fi ;; chorus*) case $cc_basename in cxch68*) # Green Hills C++ Compiler # _LT_AC_TAGVAR(lt_prog_compiler_static, CXX)="--no_auto_instantiation -u __main -u __premain -u _abort -r $COOL_DIR/lib/libOrb.a $MVME_DIR/lib/CC/libC.a $MVME_DIR/lib/classix/libcx.s.a" ;; esac ;; darwin*) # PIC is the default on this platform # Common symbols not allowed in MH_DYLIB files case $cc_basename in xlc*) lt_prog_compiler_pic_CXX='-qnocommon' lt_prog_compiler_wl_CXX='-Wl,' ;; esac ;; dgux*) case $cc_basename in ec++*) lt_prog_compiler_pic_CXX='-KPIC' ;; ghcx*) # Green Hills C++ Compiler lt_prog_compiler_pic_CXX='-pic' ;; *) ;; esac ;; freebsd* | kfreebsd*-gnu | dragonfly*) # FreeBSD uses GNU C++ ;; hpux9* | hpux10* | hpux11*) case $cc_basename in CC*) lt_prog_compiler_wl_CXX='-Wl,' lt_prog_compiler_static_CXX='${wl}-a ${wl}archive' if test "$host_cpu" != ia64; then lt_prog_compiler_pic_CXX='+Z' fi ;; aCC*) lt_prog_compiler_wl_CXX='-Wl,' lt_prog_compiler_static_CXX='${wl}-a ${wl}archive' case $host_cpu in hppa*64*|ia64*) # +Z the default ;; *) lt_prog_compiler_pic_CXX='+Z' ;; esac ;; *) ;; esac ;; interix*) # This is c89, which is MS Visual C++ (no shared libs) # Anyone wants to do a port? ;; irix5* | irix6* | nonstopux*) case $cc_basename in CC*) lt_prog_compiler_wl_CXX='-Wl,' lt_prog_compiler_static_CXX='-non_shared' # CC pic flag -KPIC is the default. 
;; *) ;; esac ;; linux*) case $cc_basename in KCC*) # KAI C++ Compiler lt_prog_compiler_wl_CXX='--backend -Wl,' lt_prog_compiler_pic_CXX='-fPIC' ;; icpc* | ecpc*) # Intel C++ lt_prog_compiler_wl_CXX='-Wl,' lt_prog_compiler_pic_CXX='-KPIC' lt_prog_compiler_static_CXX='-static' ;; pgCC*) # Portland Group C++ compiler. lt_prog_compiler_wl_CXX='-Wl,' lt_prog_compiler_pic_CXX='-fpic' lt_prog_compiler_static_CXX='-Bstatic' ;; cxx*) # Compaq C++ # Make sure the PIC flag is empty. It appears that all Alpha # Linux and Compaq Tru64 Unix objects are PIC. lt_prog_compiler_pic_CXX= lt_prog_compiler_static_CXX='-non_shared' ;; *) ;; esac ;; lynxos*) ;; m88k*) ;; mvs*) case $cc_basename in cxx*) lt_prog_compiler_pic_CXX='-W c,exportall' ;; *) ;; esac ;; netbsd*) ;; osf3* | osf4* | osf5*) case $cc_basename in KCC*) lt_prog_compiler_wl_CXX='--backend -Wl,' ;; RCC*) # Rational C++ 2.4.1 lt_prog_compiler_pic_CXX='-pic' ;; cxx*) # Digital/Compaq C++ lt_prog_compiler_wl_CXX='-Wl,' # Make sure the PIC flag is empty. It appears that all Alpha # Linux and Compaq Tru64 Unix objects are PIC. 
lt_prog_compiler_pic_CXX= lt_prog_compiler_static_CXX='-non_shared' ;; *) ;; esac ;; psos*) ;; solaris*) case $cc_basename in CC*) # Sun C++ 4.2, 5.x and Centerline C++ lt_prog_compiler_pic_CXX='-KPIC' lt_prog_compiler_static_CXX='-Bstatic' lt_prog_compiler_wl_CXX='-Qoption ld ' ;; gcx*) # Green Hills C++ Compiler lt_prog_compiler_pic_CXX='-PIC' ;; *) ;; esac ;; sunos4*) case $cc_basename in CC*) # Sun C++ 4.x lt_prog_compiler_pic_CXX='-pic' lt_prog_compiler_static_CXX='-Bstatic' ;; lcc*) # Lucid lt_prog_compiler_pic_CXX='-pic' ;; *) ;; esac ;; tandem*) case $cc_basename in NCC*) # NonStop-UX NCC 3.20 lt_prog_compiler_pic_CXX='-KPIC' ;; *) ;; esac ;; sysv5* | unixware* | sco3.2v5* | sco5v6* | OpenUNIX*) case $cc_basename in CC*) lt_prog_compiler_wl_CXX='-Wl,' lt_prog_compiler_pic_CXX='-KPIC' lt_prog_compiler_static_CXX='-Bstatic' ;; esac ;; vxworks*) ;; *) lt_prog_compiler_can_build_shared_CXX=no ;; esac fi echo "$as_me:$LINENO: result: $lt_prog_compiler_pic_CXX" >&5 echo "${ECHO_T}$lt_prog_compiler_pic_CXX" >&6 # # Check to make sure the PIC flag actually works. # if test -n "$lt_prog_compiler_pic_CXX"; then echo "$as_me:$LINENO: checking if $compiler PIC flag $lt_prog_compiler_pic_CXX works" >&5 echo $ECHO_N "checking if $compiler PIC flag $lt_prog_compiler_pic_CXX works... $ECHO_C" >&6 if test "${lt_prog_compiler_pic_works_CXX+set}" = set; then echo $ECHO_N "(cached) $ECHO_C" >&6 else lt_prog_compiler_pic_works_CXX=no ac_outfile=conftest.$ac_objext printf "$lt_simple_compile_test_code" > conftest.$ac_ext lt_compiler_flag="$lt_prog_compiler_pic_CXX -DPIC" # Insert the option either (1) after the last *FLAGS variable, or # (2) before a word containing "conftest.", or (3) at the end. # Note that $ac_compile itself does not contain backslashes and begins # with a dollar sign (not a hyphen), so the echo should work correctly. # The option is referenced via a variable to avoid confusing sed. 
lt_compile=`echo "$ac_compile" | $SED \ -e 's:.*FLAGS}\{0,1\} :&$lt_compiler_flag :; t' \ -e 's: [^ ]*conftest\.: $lt_compiler_flag&:; t' \ -e 's:$: $lt_compiler_flag:'` (eval echo "\"\$as_me:12855: $lt_compile\"" >&5) (eval "$lt_compile" 2>conftest.err) ac_status=$? cat conftest.err >&5 echo "$as_me:12859: \$? = $ac_status" >&5 if (exit $ac_status) && test -s "$ac_outfile"; then # The compiler can only warn and ignore the option if not recognized # So say no if there are warnings other than the usual output. $echo "X$_lt_compiler_boilerplate" | $Xsed -e '/^$/d' >conftest.exp $SED '/^$/d; /^ *+/d' conftest.err >conftest.er2 if test ! -s conftest.er2 || diff conftest.exp conftest.er2 >/dev/null; then lt_prog_compiler_pic_works_CXX=yes fi fi $rm conftest* fi echo "$as_me:$LINENO: result: $lt_prog_compiler_pic_works_CXX" >&5 echo "${ECHO_T}$lt_prog_compiler_pic_works_CXX" >&6 if test x"$lt_prog_compiler_pic_works_CXX" = xyes; then case $lt_prog_compiler_pic_CXX in "" | " "*) ;; *) lt_prog_compiler_pic_CXX=" $lt_prog_compiler_pic_CXX" ;; esac else lt_prog_compiler_pic_CXX= lt_prog_compiler_can_build_shared_CXX=no fi fi case $host_os in # For platforms which do not support PIC, -DPIC is meaningless: *djgpp*) lt_prog_compiler_pic_CXX= ;; *) lt_prog_compiler_pic_CXX="$lt_prog_compiler_pic_CXX -DPIC" ;; esac # # Check to make sure the static flag actually works. # wl=$lt_prog_compiler_wl_CXX eval lt_tmp_static_flag=\"$lt_prog_compiler_static_CXX\" echo "$as_me:$LINENO: checking if $compiler static flag $lt_tmp_static_flag works" >&5 echo $ECHO_N "checking if $compiler static flag $lt_tmp_static_flag works... 
$ECHO_C" >&6 if test "${lt_prog_compiler_static_works_CXX+set}" = set; then echo $ECHO_N "(cached) $ECHO_C" >&6 else lt_prog_compiler_static_works_CXX=no save_LDFLAGS="$LDFLAGS" LDFLAGS="$LDFLAGS $lt_tmp_static_flag" printf "$lt_simple_link_test_code" > conftest.$ac_ext if (eval $ac_link 2>conftest.err) && test -s conftest$ac_exeext; then # The linker can only warn and ignore the option if not recognized # So say no if there are warnings if test -s conftest.err; then # Append any errors to the config.log. cat conftest.err 1>&5 $echo "X$_lt_linker_boilerplate" | $Xsed -e '/^$/d' > conftest.exp $SED '/^$/d; /^ *+/d' conftest.err >conftest.er2 if diff conftest.exp conftest.er2 >/dev/null; then lt_prog_compiler_static_works_CXX=yes fi else lt_prog_compiler_static_works_CXX=yes fi fi $rm conftest* LDFLAGS="$save_LDFLAGS" fi echo "$as_me:$LINENO: result: $lt_prog_compiler_static_works_CXX" >&5 echo "${ECHO_T}$lt_prog_compiler_static_works_CXX" >&6 if test x"$lt_prog_compiler_static_works_CXX" = xyes; then : else lt_prog_compiler_static_CXX= fi echo "$as_me:$LINENO: checking if $compiler supports -c -o file.$ac_objext" >&5 echo $ECHO_N "checking if $compiler supports -c -o file.$ac_objext... $ECHO_C" >&6 if test "${lt_cv_prog_compiler_c_o_CXX+set}" = set; then echo $ECHO_N "(cached) $ECHO_C" >&6 else lt_cv_prog_compiler_c_o_CXX=no $rm -r conftest 2>/dev/null mkdir conftest cd conftest mkdir out printf "$lt_simple_compile_test_code" > conftest.$ac_ext lt_compiler_flag="-o out/conftest2.$ac_objext" # Insert the option either (1) after the last *FLAGS variable, or # (2) before a word containing "conftest.", or (3) at the end. # Note that $ac_compile itself does not contain backslashes and begins # with a dollar sign (not a hyphen), so the echo should work correctly. 
lt_compile=`echo "$ac_compile" | $SED \ -e 's:.*FLAGS}\{0,1\} :&$lt_compiler_flag :; t' \ -e 's: [^ ]*conftest\.: $lt_compiler_flag&:; t' \ -e 's:$: $lt_compiler_flag:'` (eval echo "\"\$as_me:12959: $lt_compile\"" >&5) (eval "$lt_compile" 2>out/conftest.err) ac_status=$? cat out/conftest.err >&5 echo "$as_me:12963: \$? = $ac_status" >&5 if (exit $ac_status) && test -s out/conftest2.$ac_objext then # The compiler can only warn and ignore the option if not recognized # So say no if there are warnings $echo "X$_lt_compiler_boilerplate" | $Xsed -e '/^$/d' > out/conftest.exp $SED '/^$/d; /^ *+/d' out/conftest.err >out/conftest.er2 if test ! -s out/conftest.er2 || diff out/conftest.exp out/conftest.er2 >/dev/null; then lt_cv_prog_compiler_c_o_CXX=yes fi fi chmod u+w . 2>&5 $rm conftest* # SGI C++ compiler will create directory out/ii_files/ for # template instantiation test -d out/ii_files && $rm out/ii_files/* && rmdir out/ii_files $rm out/* && rmdir out cd .. rmdir conftest $rm conftest* fi echo "$as_me:$LINENO: result: $lt_cv_prog_compiler_c_o_CXX" >&5 echo "${ECHO_T}$lt_cv_prog_compiler_c_o_CXX" >&6 hard_links="nottested" if test "$lt_cv_prog_compiler_c_o_CXX" = no && test "$need_locks" != no; then # do not overwrite the value of need_locks provided by the user echo "$as_me:$LINENO: checking if we can lock with hard links" >&5 echo $ECHO_N "checking if we can lock with hard links... 
$ECHO_C" >&6 hard_links=yes $rm conftest* ln conftest.a conftest.b 2>/dev/null && hard_links=no touch conftest.a ln conftest.a conftest.b 2>&5 || hard_links=no ln conftest.a conftest.b 2>/dev/null && hard_links=no echo "$as_me:$LINENO: result: $hard_links" >&5 echo "${ECHO_T}$hard_links" >&6 if test "$hard_links" = no; then { echo "$as_me:$LINENO: WARNING: \`$CC' does not support \`-c -o', so \`make -j' may be unsafe" >&5 echo "$as_me: WARNING: \`$CC' does not support \`-c -o', so \`make -j' may be unsafe" >&2;} need_locks=warn fi else need_locks=no fi echo "$as_me:$LINENO: checking whether the $compiler linker ($LD) supports shared libraries" >&5 echo $ECHO_N "checking whether the $compiler linker ($LD) supports shared libraries... $ECHO_C" >&6 export_symbols_cmds_CXX='$NM $libobjs $convenience | $global_symbol_pipe | $SED '\''s/.* //'\'' | sort | uniq > $export_symbols' case $host_os in aix4* | aix5*) # If we're using GNU nm, then we don't want the "-C" option. # -C means demangle to AIX nm, but means don't demangle with GNU nm if $NM -V 2>&1 | grep 'GNU' > /dev/null; then export_symbols_cmds_CXX='$NM -Bpg $libobjs $convenience | awk '\''{ if (((\$2 == "T") || (\$2 == "D") || (\$2 == "B")) && (substr(\$3,1,1) != ".")) { print \$3 } }'\'' | sort -u > $export_symbols' else export_symbols_cmds_CXX='$NM -BCpg $libobjs $convenience | awk '\''{ if (((\$2 == "T") || (\$2 == "D") || (\$2 == "B")) && (substr(\$3,1,1) != ".")) { print \$3 } }'\'' | sort -u > $export_symbols' fi ;; pw32*) export_symbols_cmds_CXX="$ltdll_cmds" ;; cygwin* | mingw*) export_symbols_cmds_CXX='$NM $libobjs $convenience | $global_symbol_pipe | $SED -e '\''/^[BCDGRS] /s/.* \([^ ]*\)/\1 DATA/;/^.* __nm__/s/^.* __nm__\([^ ]*\) [^ ]*/\1 DATA/;/^I /d;/^[AITW] /s/.* //'\'' | sort | uniq > $export_symbols' ;; *) export_symbols_cmds_CXX='$NM $libobjs $convenience | $global_symbol_pipe | $SED '\''s/.* //'\'' | sort | uniq > $export_symbols' ;; esac echo "$as_me:$LINENO: result: $ld_shlibs_CXX" >&5 echo 
"${ECHO_T}$ld_shlibs_CXX" >&6 test "$ld_shlibs_CXX" = no && can_build_shared=no # # Do we need to explicitly link libc? # case "x$archive_cmds_need_lc_CXX" in x|xyes) # Assume -lc should be added archive_cmds_need_lc_CXX=yes if test "$enable_shared" = yes && test "$GCC" = yes; then case $archive_cmds_CXX in *'~'*) # FIXME: we may have to deal with multi-command sequences. ;; '$CC '*) # Test whether the compiler implicitly links with -lc since on some # systems, -lgcc has to come before -lc. If gcc already passes -lc # to ld, don't add -lc before -lgcc. echo "$as_me:$LINENO: checking whether -lc should be explicitly linked in" >&5 echo $ECHO_N "checking whether -lc should be explicitly linked in... $ECHO_C" >&6 $rm conftest* printf "$lt_simple_compile_test_code" > conftest.$ac_ext if { (eval echo "$as_me:$LINENO: \"$ac_compile\"") >&5 (eval $ac_compile) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); } 2>conftest.err; then soname=conftest lib=conftest libobjs=conftest.$ac_objext deplibs= wl=$lt_prog_compiler_wl_CXX pic_flag=$lt_prog_compiler_pic_CXX compiler_flags=-v linker_flags=-v verstring= output_objdir=. libname=conftest lt_save_allow_undefined_flag=$allow_undefined_flag_CXX allow_undefined_flag_CXX= if { (eval echo "$as_me:$LINENO: \"$archive_cmds_CXX 2\>\&1 \| grep \" -lc \" \>/dev/null 2\>\&1\"") >&5 (eval $archive_cmds_CXX 2\>\&1 \| grep \" -lc \" \>/dev/null 2\>\&1) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); } then archive_cmds_need_lc_CXX=no else archive_cmds_need_lc_CXX=yes fi allow_undefined_flag_CXX=$lt_save_allow_undefined_flag else cat conftest.err 1>&5 fi $rm conftest* echo "$as_me:$LINENO: result: $archive_cmds_need_lc_CXX" >&5 echo "${ECHO_T}$archive_cmds_need_lc_CXX" >&6 ;; esac fi ;; esac echo "$as_me:$LINENO: checking dynamic linker characteristics" >&5 echo $ECHO_N "checking dynamic linker characteristics... 
$ECHO_C" >&6
library_names_spec=
libname_spec='lib$name'
soname_spec=
shrext_cmds=".so"
postinstall_cmds=
postuninstall_cmds=
finish_cmds=
finish_eval=
shlibpath_var=
shlibpath_overrides_runpath=unknown
version_type=none
dynamic_linker="$host_os ld.so"
sys_lib_dlsearch_path_spec="/lib /usr/lib"
if test "$GCC" = yes; then
  sys_lib_search_path_spec=`$CC -print-search-dirs | grep "^libraries:" | $SED -e "s/^libraries://" -e "s,=/,/,g"`
  if echo "$sys_lib_search_path_spec" | grep ';' >/dev/null ; then
    # if the path contains ";" then we assume it to be the separator
    # otherwise default to the standard path separator (i.e. ":") - it is
    # assumed that no part of a normal pathname contains ";" but that should
    # be okay in the real world where ";" in dirpaths is itself problematic.
    sys_lib_search_path_spec=`echo "$sys_lib_search_path_spec" | $SED -e 's/;/ /g'`
  else
    sys_lib_search_path_spec=`echo "$sys_lib_search_path_spec" | $SED -e "s/$PATH_SEPARATOR/ /g"`
  fi
else
  sys_lib_search_path_spec="/lib /usr/lib /usr/local/lib"
fi
need_lib_prefix=unknown
hardcode_into_libs=no

# when you set need_version to no, make sure it does not cause -set_version
# flags to be left without arguments
need_version=unknown

case $host_os in
aix3*)
  version_type=linux
  library_names_spec='${libname}${release}${shared_ext}$versuffix $libname.a'
  shlibpath_var=LIBPATH

  # AIX 3 has no versioning support, so we append a major version to the name.
  soname_spec='${libname}${release}${shared_ext}$major'
  ;;

aix4* | aix5*)
  version_type=linux
  need_lib_prefix=no
  need_version=no
  hardcode_into_libs=yes
  if test "$host_cpu" = ia64; then
    # AIX 5 supports IA64
    library_names_spec='${libname}${release}${shared_ext}$major ${libname}${release}${shared_ext}$versuffix $libname${shared_ext}'
    shlibpath_var=LD_LIBRARY_PATH
  else
    # With GCC up to 2.95.x, collect2 would create an import file
    # for dependence libraries. The import file would start with
    # the line `#! .'.
    # This would cause the generated library to
    # depend on `.', always an invalid library. This was fixed in
    # development snapshots of GCC prior to 3.0.
    case $host_os in
      aix4 | aix4.[01] | aix4.[01].*)
        if { echo '#if __GNUC__ > 2 || (__GNUC__ == 2 && __GNUC_MINOR__ >= 97)'
             echo ' yes '
             echo '#endif'; } | ${CC} -E - | grep yes > /dev/null; then
          :
        else
          can_build_shared=no
        fi
        ;;
    esac
    # AIX (on Power*) has no versioning support, so currently we can not hardcode correct
    # soname into executable. Probably we can add versioning support to
    # collect2, so additional links can be useful in future.
    if test "$aix_use_runtimelinking" = yes; then
      # If using run time linking (on AIX 4.2 or later) use lib<name>.so
      # instead of lib<name>.a to let people know that these are not
      # typical AIX shared libraries.
      library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${release}${shared_ext}$major $libname${shared_ext}'
    else
      # We preserve .a as extension for shared libraries through AIX4.2
      # and later when we are not doing run time linking.
      library_names_spec='${libname}${release}.a $libname.a'
      soname_spec='${libname}${release}${shared_ext}$major'
    fi
    shlibpath_var=LIBPATH
  fi
  ;;

amigaos*)
  library_names_spec='$libname.ixlibrary $libname.a'
  # Create ${libname}_ixlibrary.a entries in /sys/libs.
finish_eval='for lib in `ls $libdir/*.ixlibrary 2>/dev/null`; do libname=`$echo "X$lib" | $Xsed -e '\''s%^.*/\([^/]*\)\.ixlibrary$%\1%'\''`; test $rm /sys/libs/${libname}_ixlibrary.a; $show "cd /sys/libs && $LN_S $lib ${libname}_ixlibrary.a"; cd /sys/libs && $LN_S $lib ${libname}_ixlibrary.a || exit 1; done' ;; beos*) library_names_spec='${libname}${shared_ext}' dynamic_linker="$host_os ld.so" shlibpath_var=LIBRARY_PATH ;; bsdi[45]*) version_type=linux need_version=no library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${release}${shared_ext}$major $libname${shared_ext}' soname_spec='${libname}${release}${shared_ext}$major' finish_cmds='PATH="\$PATH:/sbin" ldconfig $libdir' shlibpath_var=LD_LIBRARY_PATH sys_lib_search_path_spec="/shlib /usr/lib /usr/X11/lib /usr/contrib/lib /lib /usr/local/lib" sys_lib_dlsearch_path_spec="/shlib /usr/lib /usr/local/lib" # the default ld.so.conf also contains /usr/contrib/lib and # /usr/X11R6/lib (/usr/X11 is a link to /usr/X11R6), but let us allow # libtool to hard-code these into programs ;; cygwin* | mingw* | pw32*) version_type=windows shrext_cmds=".dll" need_version=no need_lib_prefix=no case $GCC,$host_os in yes,cygwin* | yes,mingw* | yes,pw32*) library_names_spec='$libname.dll.a' # DLL is installed to $(libdir)/../bin by postinstall_cmds postinstall_cmds='base_file=`basename \${file}`~ dlpath=`$SHELL 2>&1 -c '\''. $dir/'\''\${base_file}'\''i;echo \$dlname'\''`~ dldir=$destdir/`dirname \$dlpath`~ test -d \$dldir || mkdir -p \$dldir~ $install_prog $dir/$dlname \$dldir/$dlname~ chmod a+x \$dldir/$dlname' postuninstall_cmds='dldll=`$SHELL 2>&1 -c '\''. 
$file; echo \$dlname'\''`~ dlpath=$dir/\$dldll~ $rm \$dlpath' shlibpath_overrides_runpath=yes case $host_os in cygwin*) # Cygwin DLLs use 'cyg' prefix rather than 'lib' soname_spec='`echo ${libname} | sed -e 's/^lib/cyg/'``echo ${release} | $SED -e 's/[.]/-/g'`${versuffix}${shared_ext}' sys_lib_search_path_spec="/usr/lib /lib/w32api /lib /usr/local/lib" ;; mingw*) # MinGW DLLs use traditional 'lib' prefix soname_spec='${libname}`echo ${release} | $SED -e 's/[.]/-/g'`${versuffix}${shared_ext}' sys_lib_search_path_spec=`$CC -print-search-dirs | grep "^libraries:" | $SED -e "s/^libraries://" -e "s,=/,/,g"` if echo "$sys_lib_search_path_spec" | grep ';[c-zC-Z]:/' >/dev/null; then # It is most probably a Windows format PATH printed by # mingw gcc, but we are running on Cygwin. Gcc prints its search # path with ; separators, and with drive letters. We can handle the # drive letters (cygwin fileutils understands them), so leave them, # especially as we might pass files found there to a mingw objdump, # which wouldn't understand a cygwinified path. Ahh. sys_lib_search_path_spec=`echo "$sys_lib_search_path_spec" | $SED -e 's/;/ /g'` else sys_lib_search_path_spec=`echo "$sys_lib_search_path_spec" | $SED -e "s/$PATH_SEPARATOR/ /g"` fi ;; pw32*) # pw32 DLLs use 'pw' prefix rather than 'lib' library_names_spec='`echo ${libname} | sed -e 's/^lib/pw/'``echo ${release} | $SED -e 's/[.]/-/g'`${versuffix}${shared_ext}' ;; esac ;; *) library_names_spec='${libname}`echo ${release} | $SED -e 's/[.]/-/g'`${versuffix}${shared_ext} $libname.lib' ;; esac dynamic_linker='Win32 ld.exe' # FIXME: first we should search . 
  # and the directory the executable is in
  shlibpath_var=PATH
  ;;

darwin* | rhapsody*)
  dynamic_linker="$host_os dyld"
  version_type=darwin
  need_lib_prefix=no
  need_version=no
  library_names_spec='${libname}${release}${versuffix}$shared_ext ${libname}${release}${major}$shared_ext ${libname}$shared_ext'
  soname_spec='${libname}${release}${major}$shared_ext'
  shlibpath_overrides_runpath=yes
  shlibpath_var=DYLD_LIBRARY_PATH
  shrext_cmds='`test .$module = .yes && echo .so || echo .dylib`'
  # Apple's gcc formats its 'gcc -print-search-dirs' output differently.
  if test "$GCC" = yes; then
    sys_lib_search_path_spec=`$CC -print-search-dirs | tr "\n" "$PATH_SEPARATOR" | sed -e 's/libraries:/@libraries:/' | tr "@" "\n" | grep "^libraries:" | sed -e "s/^libraries://" -e "s,=/,/,g" -e "s,$PATH_SEPARATOR, ,g" -e "s,.*,& /lib /usr/lib /usr/local/lib,g"`
  else
    sys_lib_search_path_spec='/lib /usr/lib /usr/local/lib'
  fi
  sys_lib_dlsearch_path_spec='/usr/local/lib /lib /usr/lib'
  ;;

dgux*)
  version_type=linux
  need_lib_prefix=no
  need_version=no
  library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${release}${shared_ext}$major $libname$shared_ext'
  soname_spec='${libname}${release}${shared_ext}$major'
  shlibpath_var=LD_LIBRARY_PATH
  ;;

freebsd1*)
  dynamic_linker=no
  ;;

kfreebsd*-gnu)
  version_type=linux
  need_lib_prefix=no
  need_version=no
  library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${release}${shared_ext}$major ${libname}${shared_ext}'
  soname_spec='${libname}${release}${shared_ext}$major'
  shlibpath_var=LD_LIBRARY_PATH
  shlibpath_overrides_runpath=no
  hardcode_into_libs=yes
  dynamic_linker='GNU ld.so'
  ;;

freebsd* | dragonfly*)
  # DragonFly does not have aout. When/if they implement a new
  # versioning mechanism, adjust this.
if test -x /usr/bin/objformat; then objformat=`/usr/bin/objformat` else case $host_os in freebsd[123]*) objformat=aout ;; *) objformat=elf ;; esac fi version_type=freebsd-$objformat case $version_type in freebsd-elf*) library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${release}${shared_ext} $libname${shared_ext}' need_version=no need_lib_prefix=no ;; freebsd-*) library_names_spec='${libname}${release}${shared_ext}$versuffix $libname${shared_ext}$versuffix' need_version=yes ;; esac shlibpath_var=LD_LIBRARY_PATH case $host_os in freebsd2*) shlibpath_overrides_runpath=yes ;; freebsd3.[01]* | freebsdelf3.[01]*) shlibpath_overrides_runpath=yes hardcode_into_libs=yes ;; freebsd3.[2-9]* | freebsdelf3.[2-9]* | \ freebsd4.[0-5] | freebsdelf4.[0-5] | freebsd4.1.1 | freebsdelf4.1.1) shlibpath_overrides_runpath=no hardcode_into_libs=yes ;; freebsd*) # from 4.6 on shlibpath_overrides_runpath=yes hardcode_into_libs=yes ;; esac ;; gnu*) version_type=linux need_lib_prefix=no need_version=no library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${release}${shared_ext}${major} ${libname}${shared_ext}' soname_spec='${libname}${release}${shared_ext}$major' shlibpath_var=LD_LIBRARY_PATH hardcode_into_libs=yes ;; hpux9* | hpux10* | hpux11*) # Give a soname corresponding to the major version so that dld.sl refuses to # link against other versions. version_type=sunos need_lib_prefix=no need_version=no case $host_cpu in ia64*) shrext_cmds='.so' hardcode_into_libs=yes dynamic_linker="$host_os dld.so" shlibpath_var=LD_LIBRARY_PATH shlibpath_overrides_runpath=yes # Unless +noenvvar is specified. 
library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${release}${shared_ext}$major $libname${shared_ext}' soname_spec='${libname}${release}${shared_ext}$major' if test "X$HPUX_IA64_MODE" = X32; then sys_lib_search_path_spec="/usr/lib/hpux32 /usr/local/lib/hpux32 /usr/local/lib" else sys_lib_search_path_spec="/usr/lib/hpux64 /usr/local/lib/hpux64" fi sys_lib_dlsearch_path_spec=$sys_lib_search_path_spec ;; hppa*64*) shrext_cmds='.sl' hardcode_into_libs=yes dynamic_linker="$host_os dld.sl" shlibpath_var=LD_LIBRARY_PATH # How should we handle SHLIB_PATH shlibpath_overrides_runpath=yes # Unless +noenvvar is specified. library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${release}${shared_ext}$major $libname${shared_ext}' soname_spec='${libname}${release}${shared_ext}$major' sys_lib_search_path_spec="/usr/lib/pa20_64 /usr/ccs/lib/pa20_64" sys_lib_dlsearch_path_spec=$sys_lib_search_path_spec ;; *) shrext_cmds='.sl' dynamic_linker="$host_os dld.sl" shlibpath_var=SHLIB_PATH shlibpath_overrides_runpath=no # +s is required to enable SHLIB_PATH library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${release}${shared_ext}$major $libname${shared_ext}' soname_spec='${libname}${release}${shared_ext}$major' ;; esac # HP-UX runs *really* slowly unless shared libraries are mode 555. 
postinstall_cmds='chmod 555 $lib' ;; interix3*) version_type=linux need_lib_prefix=no need_version=no library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${release}${shared_ext}$major ${libname}${shared_ext}' soname_spec='${libname}${release}${shared_ext}$major' dynamic_linker='Interix 3.x ld.so.1 (PE, like ELF)' shlibpath_var=LD_LIBRARY_PATH shlibpath_overrides_runpath=no hardcode_into_libs=yes ;; irix5* | irix6* | nonstopux*) case $host_os in nonstopux*) version_type=nonstopux ;; *) if test "$lt_cv_prog_gnu_ld" = yes; then version_type=linux else version_type=irix fi ;; esac need_lib_prefix=no need_version=no soname_spec='${libname}${release}${shared_ext}$major' library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${release}${shared_ext}$major ${libname}${release}${shared_ext} $libname${shared_ext}' case $host_os in irix5* | nonstopux*) libsuff= shlibsuff= ;; *) case $LD in # libtool.m4 will add one of these switches to LD *-32|*"-32 "|*-melf32bsmip|*"-melf32bsmip ") libsuff= shlibsuff= libmagic=32-bit;; *-n32|*"-n32 "|*-melf32bmipn32|*"-melf32bmipn32 ") libsuff=32 shlibsuff=N32 libmagic=N32;; *-64|*"-64 "|*-melf64bmip|*"-melf64bmip ") libsuff=64 shlibsuff=64 libmagic=64-bit;; *) libsuff= shlibsuff= libmagic=never-match;; esac ;; esac shlibpath_var=LD_LIBRARY${shlibsuff}_PATH shlibpath_overrides_runpath=no sys_lib_search_path_spec="/usr/lib${libsuff} /lib${libsuff} /usr/local/lib${libsuff}" sys_lib_dlsearch_path_spec="/usr/lib${libsuff} /lib${libsuff}" hardcode_into_libs=yes ;; # No shared lib support for Linux oldld, aout, or coff. linux*oldld* | linux*aout* | linux*coff*) dynamic_linker=no ;; # This must be Linux ELF. 
linux*) version_type=linux need_lib_prefix=no need_version=no library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${release}${shared_ext}$major $libname${shared_ext}' soname_spec='${libname}${release}${shared_ext}$major' finish_cmds='PATH="\$PATH:/sbin" ldconfig -n $libdir' shlibpath_var=LD_LIBRARY_PATH shlibpath_overrides_runpath=no # This implies no fast_install, which is unacceptable. # Some rework will be needed to allow for fast_install # before this can be enabled. hardcode_into_libs=yes # Append ld.so.conf contents to the search path if test -f /etc/ld.so.conf; then lt_ld_extra=`awk '/^include / { system(sprintf("cd /etc; cat %s", \$2)); skip = 1; } { if (!skip) print \$0; skip = 0; }' < /etc/ld.so.conf | $SED -e 's/#.*//;s/[:, ]/ /g;s/=[^=]*$//;s/=[^= ]* / /g;/^$/d' | tr '\n' ' '` sys_lib_dlsearch_path_spec="/lib /usr/lib $lt_ld_extra" fi # We used to test for /lib/ld.so.1 and disable shared libraries on # powerpc, because MkLinux only supported shared libraries with the # GNU dynamic linker. Since this was broken with cross compilers, # most powerpc-linux boxes support dynamic linking these days and # people can always --disable-shared, the test was removed, and we # assume the GNU/Linux dynamic linker is in use. 
dynamic_linker='GNU/Linux ld.so' ;; knetbsd*-gnu) version_type=linux need_lib_prefix=no need_version=no library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${release}${shared_ext}$major ${libname}${shared_ext}' soname_spec='${libname}${release}${shared_ext}$major' shlibpath_var=LD_LIBRARY_PATH shlibpath_overrides_runpath=no hardcode_into_libs=yes dynamic_linker='GNU ld.so' ;; netbsd*) version_type=sunos need_lib_prefix=no need_version=no if echo __ELF__ | $CC -E - | grep __ELF__ >/dev/null; then library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${shared_ext}$versuffix' finish_cmds='PATH="\$PATH:/sbin" ldconfig -m $libdir' dynamic_linker='NetBSD (a.out) ld.so' else library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${release}${shared_ext}$major ${libname}${shared_ext}' soname_spec='${libname}${release}${shared_ext}$major' dynamic_linker='NetBSD ld.elf_so' fi shlibpath_var=LD_LIBRARY_PATH shlibpath_overrides_runpath=yes hardcode_into_libs=yes ;; newsos6) version_type=linux library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${release}${shared_ext}$major $libname${shared_ext}' shlibpath_var=LD_LIBRARY_PATH shlibpath_overrides_runpath=yes ;; nto-qnx*) version_type=linux need_lib_prefix=no need_version=no library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${release}${shared_ext}$major $libname${shared_ext}' soname_spec='${libname}${release}${shared_ext}$major' shlibpath_var=LD_LIBRARY_PATH shlibpath_overrides_runpath=yes ;; openbsd*) version_type=sunos sys_lib_dlsearch_path_spec="/usr/lib" need_lib_prefix=no # Some older versions of OpenBSD (3.3 at least) *do* need versioned libs. 
case $host_os in openbsd3.3 | openbsd3.3.*) need_version=yes ;; *) need_version=no ;; esac library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${shared_ext}$versuffix' finish_cmds='PATH="\$PATH:/sbin" ldconfig -m $libdir' shlibpath_var=LD_LIBRARY_PATH if test -z "`echo __ELF__ | $CC -E - | grep __ELF__`" || test "$host_os-$host_cpu" = "openbsd2.8-powerpc"; then case $host_os in openbsd2.[89] | openbsd2.[89].*) shlibpath_overrides_runpath=no ;; *) shlibpath_overrides_runpath=yes ;; esac else shlibpath_overrides_runpath=yes fi ;; os2*) libname_spec='$name' shrext_cmds=".dll" need_lib_prefix=no library_names_spec='$libname${shared_ext} $libname.a' dynamic_linker='OS/2 ld.exe' shlibpath_var=LIBPATH ;; osf3* | osf4* | osf5*) version_type=osf need_lib_prefix=no need_version=no soname_spec='${libname}${release}${shared_ext}$major' library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${release}${shared_ext}$major $libname${shared_ext}' shlibpath_var=LD_LIBRARY_PATH sys_lib_search_path_spec="/usr/shlib /usr/ccs/lib /usr/lib/cmplrs/cc /usr/lib /usr/local/lib /var/shlib" sys_lib_dlsearch_path_spec="$sys_lib_search_path_spec" ;; solaris*) version_type=linux need_lib_prefix=no need_version=no library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${release}${shared_ext}$major $libname${shared_ext}' soname_spec='${libname}${release}${shared_ext}$major' shlibpath_var=LD_LIBRARY_PATH shlibpath_overrides_runpath=yes hardcode_into_libs=yes # ldd complains unless libraries are executable postinstall_cmds='chmod +x $lib' ;; sunos4*) version_type=sunos library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${shared_ext}$versuffix' finish_cmds='PATH="\$PATH:/usr/etc" ldconfig $libdir' shlibpath_var=LD_LIBRARY_PATH shlibpath_overrides_runpath=yes if test "$with_gnu_ld" = yes; then need_lib_prefix=no fi need_version=yes ;; sysv4 | sysv4.3*) version_type=linux 
library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${release}${shared_ext}$major $libname${shared_ext}' soname_spec='${libname}${release}${shared_ext}$major' shlibpath_var=LD_LIBRARY_PATH case $host_vendor in sni) shlibpath_overrides_runpath=no need_lib_prefix=no export_dynamic_flag_spec='${wl}-Blargedynsym' runpath_var=LD_RUN_PATH ;; siemens) need_lib_prefix=no ;; motorola) need_lib_prefix=no need_version=no shlibpath_overrides_runpath=no sys_lib_search_path_spec='/lib /usr/lib /usr/ccs/lib' ;; esac ;; sysv4*MP*) if test -d /usr/nec ;then version_type=linux library_names_spec='$libname${shared_ext}.$versuffix $libname${shared_ext}.$major $libname${shared_ext}' soname_spec='$libname${shared_ext}.$major' shlibpath_var=LD_LIBRARY_PATH fi ;; sysv5* | sco3.2v5* | sco5v6* | unixware* | OpenUNIX* | sysv4*uw2*) version_type=freebsd-elf need_lib_prefix=no need_version=no library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${release}${shared_ext} $libname${shared_ext}' soname_spec='${libname}${release}${shared_ext}$major' shlibpath_var=LD_LIBRARY_PATH hardcode_into_libs=yes if test "$with_gnu_ld" = yes; then sys_lib_search_path_spec='/usr/local/lib /usr/gnu/lib /usr/ccs/lib /usr/lib /lib' shlibpath_overrides_runpath=no else sys_lib_search_path_spec='/usr/ccs/lib /usr/lib' shlibpath_overrides_runpath=yes case $host_os in sco3.2v5*) sys_lib_search_path_spec="$sys_lib_search_path_spec /lib" ;; esac fi sys_lib_dlsearch_path_spec='/usr/lib' ;; uts4*) version_type=linux library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${release}${shared_ext}$major $libname${shared_ext}' soname_spec='${libname}${release}${shared_ext}$major' shlibpath_var=LD_LIBRARY_PATH ;; *) dynamic_linker=no ;; esac echo "$as_me:$LINENO: result: $dynamic_linker" >&5 echo "${ECHO_T}$dynamic_linker" >&6 test "$dynamic_linker" = no && can_build_shared=no variables_saved_for_relink="PATH $shlibpath_var $runpath_var" if test "$GCC" = yes; then 
  variables_saved_for_relink="$variables_saved_for_relink GCC_EXEC_PREFIX COMPILER_PATH LIBRARY_PATH"
fi

echo "$as_me:$LINENO: checking how to hardcode library paths into programs" >&5
echo $ECHO_N "checking how to hardcode library paths into programs... $ECHO_C" >&6
hardcode_action_CXX=
if test -n "$hardcode_libdir_flag_spec_CXX" || \
   test -n "$runpath_var_CXX" || \
   test "X$hardcode_automatic_CXX" = "Xyes" ; then

  # We can hardcode nonexistent directories.
  if test "$hardcode_direct_CXX" != no &&
     # If the only mechanism to avoid hardcoding is shlibpath_var, we
     # have to relink, otherwise we might link with an installed library
     # when we should be linking with a yet-to-be-installed one
     ## test "$_LT_AC_TAGVAR(hardcode_shlibpath_var, CXX)" != no &&
     test "$hardcode_minus_L_CXX" != no; then
    # Linking always hardcodes the temporary library directory.
    hardcode_action_CXX=relink
  else
    # We can link without hardcoding, and we can hardcode nonexisting dirs.
    hardcode_action_CXX=immediate
  fi
else
  # We cannot hardcode anything, or else we can only hardcode existing
  # directories.
  hardcode_action_CXX=unsupported
fi
echo "$as_me:$LINENO: result: $hardcode_action_CXX" >&5
echo "${ECHO_T}$hardcode_action_CXX" >&6

if test "$hardcode_action_CXX" = relink; then
  # Fast installation is not supported
  enable_fast_install=no
elif test "$shlibpath_overrides_runpath" = yes ||
     test "$enable_shared" = no; then
  # Fast installation is not necessary
  enable_fast_install=needless
fi

# The else clause should only fire when bootstrapping the
# libtool distribution, otherwise you forgot to ship ltmain.sh
# with your package, and you will get complaints that there are
# no rules to generate ltmain.sh.
if test -f "$ltmain"; then
  # See if we are running on zsh, and set the options which allow our commands through
  # without removal of \ escapes.
if test -n "${ZSH_VERSION+set}" ; then setopt NO_GLOB_SUBST fi # Now quote all the things that may contain metacharacters while being # careful not to overquote the AC_SUBSTed values. We take copies of the # variables and quote the copies for generation of the libtool script. for var in echo old_CC old_CFLAGS AR AR_FLAGS EGREP RANLIB LN_S LTCC LTCFLAGS NM \ SED SHELL STRIP \ libname_spec library_names_spec soname_spec extract_expsyms_cmds \ old_striplib striplib file_magic_cmd finish_cmds finish_eval \ deplibs_check_method reload_flag reload_cmds need_locks \ lt_cv_sys_global_symbol_pipe lt_cv_sys_global_symbol_to_cdecl \ lt_cv_sys_global_symbol_to_c_name_address \ sys_lib_search_path_spec sys_lib_dlsearch_path_spec \ old_postinstall_cmds old_postuninstall_cmds \ compiler_CXX \ CC_CXX \ LD_CXX \ lt_prog_compiler_wl_CXX \ lt_prog_compiler_pic_CXX \ lt_prog_compiler_static_CXX \ lt_prog_compiler_no_builtin_flag_CXX \ export_dynamic_flag_spec_CXX \ thread_safe_flag_spec_CXX \ whole_archive_flag_spec_CXX \ enable_shared_with_static_runtimes_CXX \ old_archive_cmds_CXX \ old_archive_from_new_cmds_CXX \ predep_objects_CXX \ postdep_objects_CXX \ predeps_CXX \ postdeps_CXX \ compiler_lib_search_path_CXX \ archive_cmds_CXX \ archive_expsym_cmds_CXX \ postinstall_cmds_CXX \ postuninstall_cmds_CXX \ old_archive_from_expsyms_cmds_CXX \ allow_undefined_flag_CXX \ no_undefined_flag_CXX \ export_symbols_cmds_CXX \ hardcode_libdir_flag_spec_CXX \ hardcode_libdir_flag_spec_ld_CXX \ hardcode_libdir_separator_CXX \ hardcode_automatic_CXX \ module_cmds_CXX \ module_expsym_cmds_CXX \ lt_cv_prog_compiler_c_o_CXX \ exclude_expsyms_CXX \ include_expsyms_CXX; do case $var in old_archive_cmds_CXX | \ old_archive_from_new_cmds_CXX | \ archive_cmds_CXX | \ archive_expsym_cmds_CXX | \ module_cmds_CXX | \ module_expsym_cmds_CXX | \ old_archive_from_expsyms_cmds_CXX | \ export_symbols_cmds_CXX | \ extract_expsyms_cmds | reload_cmds | finish_cmds | \ postinstall_cmds | postuninstall_cmds | \ 
old_postinstall_cmds | old_postuninstall_cmds | \ sys_lib_search_path_spec | sys_lib_dlsearch_path_spec) # Double-quote double-evaled strings. eval "lt_$var=\\\"\`\$echo \"X\$$var\" | \$Xsed -e \"\$double_quote_subst\" -e \"\$sed_quote_subst\" -e \"\$delay_variable_subst\"\`\\\"" ;; *) eval "lt_$var=\\\"\`\$echo \"X\$$var\" | \$Xsed -e \"\$sed_quote_subst\"\`\\\"" ;; esac done case $lt_echo in *'\$0 --fallback-echo"') lt_echo=`$echo "X$lt_echo" | $Xsed -e 's/\\\\\\\$0 --fallback-echo"$/$0 --fallback-echo"/'` ;; esac cfgfile="$ofile" cat <<__EOF__ >> "$cfgfile" # ### BEGIN LIBTOOL TAG CONFIG: $tagname # Libtool was configured on host `(hostname || uname -n) 2>/dev/null | sed 1q`: # Shell to use when invoking shell scripts. SHELL=$lt_SHELL # Whether or not to build shared libraries. build_libtool_libs=$enable_shared # Whether or not to build static libraries. build_old_libs=$enable_static # Whether or not to add -lc for building shared libraries. build_libtool_need_lc=$archive_cmds_need_lc_CXX # Whether or not to disallow shared libs when runtime libs are static allow_libtool_libs_with_static_runtimes=$enable_shared_with_static_runtimes_CXX # Whether or not to optimize for fast installation. fast_install=$enable_fast_install # The host system. host_alias=$host_alias host=$host host_os=$host_os # The build system. build_alias=$build_alias build=$build build_os=$build_os # An echo program that does not interpret backslashes. echo=$lt_echo # The archiver. AR=$lt_AR AR_FLAGS=$lt_AR_FLAGS # A C compiler. LTCC=$lt_LTCC # LTCC compiler flags. LTCFLAGS=$lt_LTCFLAGS # A language-specific compiler. CC=$lt_compiler_CXX # Is the compiler the GNU C compiler? with_gcc=$GCC_CXX # An ERE matcher. EGREP=$lt_EGREP # The linker used to build libraries. LD=$lt_LD_CXX # Whether we need hard or soft links. LN_S=$lt_LN_S # A BSD-compatible nm program. 
NM=$lt_NM # A symbol stripping program STRIP=$lt_STRIP # Used to examine libraries when file_magic_cmd begins "file" MAGIC_CMD=$MAGIC_CMD # Used on cygwin: DLL creation program. DLLTOOL="$DLLTOOL" # Used on cygwin: object dumper. OBJDUMP="$OBJDUMP" # Used on cygwin: assembler. AS="$AS" # The name of the directory that contains temporary libtool files. objdir=$objdir # How to create reloadable object files. reload_flag=$lt_reload_flag reload_cmds=$lt_reload_cmds # How to pass a linker flag through the compiler. wl=$lt_lt_prog_compiler_wl_CXX # Object file suffix (normally "o"). objext="$ac_objext" # Old archive suffix (normally "a"). libext="$libext" # Shared library suffix (normally ".so"). shrext_cmds='$shrext_cmds' # Executable file suffix (normally ""). exeext="$exeext" # Additional compiler flags for building library objects. pic_flag=$lt_lt_prog_compiler_pic_CXX pic_mode=$pic_mode # What is the maximum length of a command? max_cmd_len=$lt_cv_sys_max_cmd_len # Does compiler simultaneously support -c and -o options? compiler_c_o=$lt_lt_cv_prog_compiler_c_o_CXX # Must we lock files when doing compilation? need_locks=$lt_need_locks # Do we need the lib prefix for modules? need_lib_prefix=$need_lib_prefix # Do we need a version for libraries? need_version=$need_version # Whether dlopen is supported. dlopen_support=$enable_dlopen # Whether dlopen of programs is supported. dlopen_self=$enable_dlopen_self # Whether dlopen of statically linked programs is supported. dlopen_self_static=$enable_dlopen_self_static # Compiler flag to prevent dynamic linking. link_static_flag=$lt_lt_prog_compiler_static_CXX # Compiler flag to turn off builtin functions. no_builtin_flag=$lt_lt_prog_compiler_no_builtin_flag_CXX # Compiler flag to allow reflexive dlopens. export_dynamic_flag_spec=$lt_export_dynamic_flag_spec_CXX # Compiler flag to generate shared objects directly from archives. 
whole_archive_flag_spec=$lt_whole_archive_flag_spec_CXX # Compiler flag to generate thread-safe objects. thread_safe_flag_spec=$lt_thread_safe_flag_spec_CXX # Library versioning type. version_type=$version_type # Format of library name prefix. libname_spec=$lt_libname_spec # List of archive names. First name is the real one, the rest are links. # The last name is the one that the linker finds with -lNAME. library_names_spec=$lt_library_names_spec # The coded name of the library, if different from the real name. soname_spec=$lt_soname_spec # Commands used to build and install an old-style archive. RANLIB=$lt_RANLIB old_archive_cmds=$lt_old_archive_cmds_CXX old_postinstall_cmds=$lt_old_postinstall_cmds old_postuninstall_cmds=$lt_old_postuninstall_cmds # Create an old-style archive from a shared archive. old_archive_from_new_cmds=$lt_old_archive_from_new_cmds_CXX # Create a temporary old-style archive to link instead of a shared archive. old_archive_from_expsyms_cmds=$lt_old_archive_from_expsyms_cmds_CXX # Commands used to build and install a shared archive. archive_cmds=$lt_archive_cmds_CXX archive_expsym_cmds=$lt_archive_expsym_cmds_CXX postinstall_cmds=$lt_postinstall_cmds postuninstall_cmds=$lt_postuninstall_cmds # Commands used to build a loadable module (assumed same as above if empty) module_cmds=$lt_module_cmds_CXX module_expsym_cmds=$lt_module_expsym_cmds_CXX # Commands to strip libraries. old_striplib=$lt_old_striplib striplib=$lt_striplib # Dependencies to place before the objects being linked to create a # shared library. predep_objects=$lt_predep_objects_CXX # Dependencies to place after the objects being linked to create a # shared library. postdep_objects=$lt_postdep_objects_CXX # Dependencies to place before the objects being linked to create a # shared library. predeps=$lt_predeps_CXX # Dependencies to place after the objects being linked to create a # shared library. 
postdeps=$lt_postdeps_CXX # The library search path used internally by the compiler when linking # a shared library. compiler_lib_search_path=$lt_compiler_lib_search_path_CXX # Method to check whether dependent libraries are shared objects. deplibs_check_method=$lt_deplibs_check_method # Command to use when deplibs_check_method == file_magic. file_magic_cmd=$lt_file_magic_cmd # Flag that allows shared libraries with undefined symbols to be built. allow_undefined_flag=$lt_allow_undefined_flag_CXX # Flag that forces no undefined symbols. no_undefined_flag=$lt_no_undefined_flag_CXX # Commands used to finish a libtool library installation in a directory. finish_cmds=$lt_finish_cmds # Same as above, but a single script fragment to be evaled but not shown. finish_eval=$lt_finish_eval # Take the output of nm and produce a listing of raw symbols and C names. global_symbol_pipe=$lt_lt_cv_sys_global_symbol_pipe # Transform the output of nm in a proper C declaration global_symbol_to_cdecl=$lt_lt_cv_sys_global_symbol_to_cdecl # Transform the output of nm in a C name address pair global_symbol_to_c_name_address=$lt_lt_cv_sys_global_symbol_to_c_name_address # This is the shared library runtime path variable. runpath_var=$runpath_var # This is the shared library path variable. shlibpath_var=$shlibpath_var # Is shlibpath searched before the hard-coded library search path? shlibpath_overrides_runpath=$shlibpath_overrides_runpath # How to hardcode a shared library path into an executable. hardcode_action=$hardcode_action_CXX # Whether we should hardcode library paths into libraries. hardcode_into_libs=$hardcode_into_libs # Flag to hardcode \$libdir into a binary during linking. # This must work even if \$libdir does not exist. hardcode_libdir_flag_spec=$lt_hardcode_libdir_flag_spec_CXX # If ld is used when linking, flag to hardcode \$libdir into # a binary during linking. This must work even if \$libdir does # not exist. 
hardcode_libdir_flag_spec_ld=$lt_hardcode_libdir_flag_spec_ld_CXX # Whether we need a single -rpath flag with a separated argument. hardcode_libdir_separator=$lt_hardcode_libdir_separator_CXX # Set to yes if using DIR/libNAME${shared_ext} during linking hardcodes DIR into the # resulting binary. hardcode_direct=$hardcode_direct_CXX # Set to yes if using the -LDIR flag during linking hardcodes DIR into the # resulting binary. hardcode_minus_L=$hardcode_minus_L_CXX # Set to yes if using SHLIBPATH_VAR=DIR during linking hardcodes DIR into # the resulting binary. hardcode_shlibpath_var=$hardcode_shlibpath_var_CXX # Set to yes if building a shared library automatically hardcodes DIR into the library # and all subsequent libraries and executables linked against it. hardcode_automatic=$hardcode_automatic_CXX # Variables whose values should be saved in libtool wrapper scripts and # restored at relink time. variables_saved_for_relink="$variables_saved_for_relink" # Whether libtool must link a program against all its dependency libraries. link_all_deplibs=$link_all_deplibs_CXX # Compile-time system search path for libraries sys_lib_search_path_spec=$lt_sys_lib_search_path_spec # Run-time system search path for libraries sys_lib_dlsearch_path_spec=$lt_sys_lib_dlsearch_path_spec # Fix the shell variable \$srcfile for the compiler. fix_srcfile_path="$fix_srcfile_path_CXX" # Set to yes if exported symbols are required. always_export_symbols=$always_export_symbols_CXX # The commands to list exported symbols. export_symbols_cmds=$lt_export_symbols_cmds_CXX # The commands to extract the exported symbol list from a shared archive. extract_expsyms_cmds=$lt_extract_expsyms_cmds # Symbols that should not be listed in the preloaded symbols. exclude_expsyms=$lt_exclude_expsyms_CXX # Symbols that must always be exported. 
include_expsyms=$lt_include_expsyms_CXX # ### END LIBTOOL TAG CONFIG: $tagname __EOF__ else # If there is no Makefile yet, we rely on a make rule to execute # `config.status --recheck' to rerun these tests and create the # libtool script then. ltmain_in=`echo $ltmain | sed -e 's/\.sh$/.in/'` if test -f "$ltmain_in"; then test -f Makefile && make "$ltmain" fi fi ac_ext=c ac_cpp='$CPP $CPPFLAGS' ac_compile='$CC -c $CFLAGS $CPPFLAGS conftest.$ac_ext >&5' ac_link='$CC -o conftest$ac_exeext $CFLAGS $CPPFLAGS $LDFLAGS conftest.$ac_ext $LIBS >&5' ac_compiler_gnu=$ac_cv_c_compiler_gnu CC=$lt_save_CC LDCXX=$LD LD=$lt_save_LD GCC=$lt_save_GCC with_gnu_ldcxx=$with_gnu_ld with_gnu_ld=$lt_save_with_gnu_ld lt_cv_path_LDCXX=$lt_cv_path_LD lt_cv_path_LD=$lt_save_path_LD lt_cv_prog_gnu_ldcxx=$lt_cv_prog_gnu_ld lt_cv_prog_gnu_ld=$lt_save_with_gnu_ld else tagname="" fi ;; F77) if test -n "$F77" && test "X$F77" != "Xno"; then ac_ext=f ac_compile='$F77 -c $FFLAGS conftest.$ac_ext >&5' ac_link='$F77 -o conftest$ac_exeext $FFLAGS $LDFLAGS conftest.$ac_ext $LIBS >&5' ac_compiler_gnu=$ac_cv_f77_compiler_gnu archive_cmds_need_lc_F77=no allow_undefined_flag_F77= always_export_symbols_F77=no archive_expsym_cmds_F77= export_dynamic_flag_spec_F77= hardcode_direct_F77=no hardcode_libdir_flag_spec_F77= hardcode_libdir_flag_spec_ld_F77= hardcode_libdir_separator_F77= hardcode_minus_L_F77=no hardcode_automatic_F77=no module_cmds_F77= module_expsym_cmds_F77= link_all_deplibs_F77=unknown old_archive_cmds_F77=$old_archive_cmds no_undefined_flag_F77= whole_archive_flag_spec_F77= enable_shared_with_static_runtimes_F77=no # Source file extension for f77 test sources. ac_ext=f # Object file extension for compiled f77 test sources. 
objext=o objext_F77=$objext # Code to be used in simple compile tests lt_simple_compile_test_code=" subroutine t\n return\n end\n" # Code to be used in simple link tests lt_simple_link_test_code=" program t\n end\n" # ltmain only uses $CC for tagged configurations so make sure $CC is set. # If no C compiler was specified, use CC. LTCC=${LTCC-"$CC"} # If no C compiler flags were specified, use CFLAGS. LTCFLAGS=${LTCFLAGS-"$CFLAGS"} # Allow CC to be a program name with arguments. compiler=$CC # save warnings/boilerplate of simple test code ac_outfile=conftest.$ac_objext printf "$lt_simple_compile_test_code" >conftest.$ac_ext eval "$ac_compile" 2>&1 >/dev/null | $SED '/^$/d; /^ *+/d' >conftest.err _lt_compiler_boilerplate=`cat conftest.err` $rm conftest* ac_outfile=conftest.$ac_objext printf "$lt_simple_link_test_code" >conftest.$ac_ext eval "$ac_link" 2>&1 >/dev/null | $SED '/^$/d; /^ *+/d' >conftest.err _lt_linker_boilerplate=`cat conftest.err` $rm conftest* # Allow CC to be a program name with arguments. lt_save_CC="$CC" CC=${F77-"f77"} compiler=$CC compiler_F77=$CC for cc_temp in $compiler""; do case $cc_temp in compile | *[\\/]compile | ccache | *[\\/]ccache ) ;; distcc | *[\\/]distcc | purify | *[\\/]purify ) ;; \-*) ;; *) break;; esac done cc_basename=`$echo "X$cc_temp" | $Xsed -e 's%.*/%%' -e "s%^$host_alias-%%"` echo "$as_me:$LINENO: checking if libtool supports shared libraries" >&5 echo $ECHO_N "checking if libtool supports shared libraries... $ECHO_C" >&6 echo "$as_me:$LINENO: result: $can_build_shared" >&5 echo "${ECHO_T}$can_build_shared" >&6 echo "$as_me:$LINENO: checking whether to build shared libraries" >&5 echo $ECHO_N "checking whether to build shared libraries... $ECHO_C" >&6 test "$can_build_shared" = "no" && enable_shared=no # On AIX, shared libraries and static libraries use the same namespace, and # are all built from PIC. 
case $host_os in aix3*) test "$enable_shared" = yes && enable_static=no if test -n "$RANLIB"; then archive_cmds="$archive_cmds~\$RANLIB \$lib" postinstall_cmds='$RANLIB $lib' fi ;; aix4* | aix5*) if test "$host_cpu" != ia64 && test "$aix_use_runtimelinking" = no ; then test "$enable_shared" = yes && enable_static=no fi ;; esac echo "$as_me:$LINENO: result: $enable_shared" >&5 echo "${ECHO_T}$enable_shared" >&6 echo "$as_me:$LINENO: checking whether to build static libraries" >&5 echo $ECHO_N "checking whether to build static libraries... $ECHO_C" >&6 # Make sure either enable_shared or enable_static is yes. test "$enable_shared" = yes || enable_static=yes echo "$as_me:$LINENO: result: $enable_static" >&5 echo "${ECHO_T}$enable_static" >&6 GCC_F77="$G77" LD_F77="$LD" lt_prog_compiler_wl_F77= lt_prog_compiler_pic_F77= lt_prog_compiler_static_F77= echo "$as_me:$LINENO: checking for $compiler option to produce PIC" >&5 echo $ECHO_N "checking for $compiler option to produce PIC... $ECHO_C" >&6 if test "$GCC" = yes; then lt_prog_compiler_wl_F77='-Wl,' lt_prog_compiler_static_F77='-static' case $host_os in aix*) # All AIX code is PIC. if test "$host_cpu" = ia64; then # AIX 5 now supports IA64 processor lt_prog_compiler_static_F77='-Bstatic' fi ;; amigaos*) # FIXME: we need at least 68020 code to build shared libraries, but # adding the `-m68020' flag to GCC prevents building anything better, # like `-m68040'. lt_prog_compiler_pic_F77='-m68020 -resident32 -malways-restore-a4' ;; beos* | cygwin* | irix5* | irix6* | nonstopux* | osf3* | osf4* | osf5*) # PIC is the default for these OSes. ;; mingw* | pw32* | os2*) # This hack is so that the source file can tell whether it is being # built for inclusion in a dll (and should export symbols for example). 
lt_prog_compiler_pic_F77='-DDLL_EXPORT' ;; darwin* | rhapsody*) # PIC is the default on this platform # Common symbols not allowed in MH_DYLIB files lt_prog_compiler_pic_F77='-fno-common' ;; interix3*) # Interix 3.x gcc -fpic/-fPIC options generate broken code. # Instead, we relocate shared libraries at runtime. ;; msdosdjgpp*) # Just because we use GCC doesn't mean we suddenly get shared libraries # on systems that don't support them. lt_prog_compiler_can_build_shared_F77=no enable_shared=no ;; sysv4*MP*) if test -d /usr/nec; then lt_prog_compiler_pic_F77=-Kconform_pic fi ;; hpux*) # PIC is the default for IA64 HP-UX and 64-bit HP-UX, but # not for PA HP-UX. case $host_cpu in hppa*64*|ia64*) # +Z the default ;; *) lt_prog_compiler_pic_F77='-fPIC' ;; esac ;; *) lt_prog_compiler_pic_F77='-fPIC' ;; esac else # PORTME Check for flag to pass linker flags through the system compiler. case $host_os in aix*) lt_prog_compiler_wl_F77='-Wl,' if test "$host_cpu" = ia64; then # AIX 5 now supports IA64 processor lt_prog_compiler_static_F77='-Bstatic' else lt_prog_compiler_static_F77='-bnso -bI:/lib/syscalls.exp' fi ;; darwin*) # PIC is the default on this platform # Common symbols not allowed in MH_DYLIB files case $cc_basename in xlc*) lt_prog_compiler_pic_F77='-qnocommon' lt_prog_compiler_wl_F77='-Wl,' ;; esac ;; mingw* | pw32* | os2*) # This hack is so that the source file can tell whether it is being # built for inclusion in a dll (and should export symbols for example). lt_prog_compiler_pic_F77='-DDLL_EXPORT' ;; hpux9* | hpux10* | hpux11*) lt_prog_compiler_wl_F77='-Wl,' # PIC is the default for IA64 HP-UX and 64-bit HP-UX, but # not for PA HP-UX. case $host_cpu in hppa*64*|ia64*) # +Z the default ;; *) lt_prog_compiler_pic_F77='+Z' ;; esac # Is there a better lt_prog_compiler_static that works with the bundled CC? lt_prog_compiler_static_F77='${wl}-a ${wl}archive' ;; irix5* | irix6* | nonstopux*) lt_prog_compiler_wl_F77='-Wl,' # PIC (with -KPIC) is the default. 
lt_prog_compiler_static_F77='-non_shared' ;; newsos6) lt_prog_compiler_pic_F77='-KPIC' lt_prog_compiler_static_F77='-Bstatic' ;; linux*) case $cc_basename in icc* | ecc*) lt_prog_compiler_wl_F77='-Wl,' lt_prog_compiler_pic_F77='-KPIC' lt_prog_compiler_static_F77='-static' ;; pgcc* | pgf77* | pgf90* | pgf95*) # Portland Group compilers (*not* the Pentium gcc compiler, # which looks to be a dead project) lt_prog_compiler_wl_F77='-Wl,' lt_prog_compiler_pic_F77='-fpic' lt_prog_compiler_static_F77='-Bstatic' ;; ccc*) lt_prog_compiler_wl_F77='-Wl,' # All Alpha code is PIC. lt_prog_compiler_static_F77='-non_shared' ;; esac ;; osf3* | osf4* | osf5*) lt_prog_compiler_wl_F77='-Wl,' # All OSF/1 code is PIC. lt_prog_compiler_static_F77='-non_shared' ;; solaris*) lt_prog_compiler_pic_F77='-KPIC' lt_prog_compiler_static_F77='-Bstatic' case $cc_basename in f77* | f90* | f95*) lt_prog_compiler_wl_F77='-Qoption ld ';; *) lt_prog_compiler_wl_F77='-Wl,';; esac ;; sunos4*) lt_prog_compiler_wl_F77='-Qoption ld ' lt_prog_compiler_pic_F77='-PIC' lt_prog_compiler_static_F77='-Bstatic' ;; sysv4 | sysv4.2uw2* | sysv4.3*) lt_prog_compiler_wl_F77='-Wl,' lt_prog_compiler_pic_F77='-KPIC' lt_prog_compiler_static_F77='-Bstatic' ;; sysv4*MP*) if test -d /usr/nec ;then lt_prog_compiler_pic_F77='-Kconform_pic' lt_prog_compiler_static_F77='-Bstatic' fi ;; sysv5* | unixware* | sco3.2v5* | sco5v6* | OpenUNIX*) lt_prog_compiler_wl_F77='-Wl,' lt_prog_compiler_pic_F77='-KPIC' lt_prog_compiler_static_F77='-Bstatic' ;; unicos*) lt_prog_compiler_wl_F77='-Wl,' lt_prog_compiler_can_build_shared_F77=no ;; uts4*) lt_prog_compiler_pic_F77='-pic' lt_prog_compiler_static_F77='-Bstatic' ;; *) lt_prog_compiler_can_build_shared_F77=no ;; esac fi echo "$as_me:$LINENO: result: $lt_prog_compiler_pic_F77" >&5 echo "${ECHO_T}$lt_prog_compiler_pic_F77" >&6 # # Check to make sure the PIC flag actually works. 
# if test -n "$lt_prog_compiler_pic_F77"; then echo "$as_me:$LINENO: checking if $compiler PIC flag $lt_prog_compiler_pic_F77 works" >&5 echo $ECHO_N "checking if $compiler PIC flag $lt_prog_compiler_pic_F77 works... $ECHO_C" >&6 if test "${lt_prog_compiler_pic_works_F77+set}" = set; then echo $ECHO_N "(cached) $ECHO_C" >&6 else lt_prog_compiler_pic_works_F77=no ac_outfile=conftest.$ac_objext printf "$lt_simple_compile_test_code" > conftest.$ac_ext lt_compiler_flag="$lt_prog_compiler_pic_F77" # Insert the option either (1) after the last *FLAGS variable, or # (2) before a word containing "conftest.", or (3) at the end. # Note that $ac_compile itself does not contain backslashes and begins # with a dollar sign (not a hyphen), so the echo should work correctly. # The option is referenced via a variable to avoid confusing sed. lt_compile=`echo "$ac_compile" | $SED \ -e 's:.*FLAGS}\{0,1\} :&$lt_compiler_flag :; t' \ -e 's: [^ ]*conftest\.: $lt_compiler_flag&:; t' \ -e 's:$: $lt_compiler_flag:'` (eval echo "\"\$as_me:14529: $lt_compile\"" >&5) (eval "$lt_compile" 2>conftest.err) ac_status=$? cat conftest.err >&5 echo "$as_me:14533: \$? = $ac_status" >&5 if (exit $ac_status) && test -s "$ac_outfile"; then # The compiler can only warn and ignore the option if not recognized # So say no if there are warnings other than the usual output. $echo "X$_lt_compiler_boilerplate" | $Xsed -e '/^$/d' >conftest.exp $SED '/^$/d; /^ *+/d' conftest.err >conftest.er2 if test ! 
-s conftest.er2 || diff conftest.exp conftest.er2 >/dev/null; then lt_prog_compiler_pic_works_F77=yes fi fi $rm conftest* fi echo "$as_me:$LINENO: result: $lt_prog_compiler_pic_works_F77" >&5 echo "${ECHO_T}$lt_prog_compiler_pic_works_F77" >&6 if test x"$lt_prog_compiler_pic_works_F77" = xyes; then case $lt_prog_compiler_pic_F77 in "" | " "*) ;; *) lt_prog_compiler_pic_F77=" $lt_prog_compiler_pic_F77" ;; esac else lt_prog_compiler_pic_F77= lt_prog_compiler_can_build_shared_F77=no fi fi case $host_os in # For platforms which do not support PIC, -DPIC is meaningless: *djgpp*) lt_prog_compiler_pic_F77= ;; *) lt_prog_compiler_pic_F77="$lt_prog_compiler_pic_F77" ;; esac # # Check to make sure the static flag actually works. # wl=$lt_prog_compiler_wl_F77 eval lt_tmp_static_flag=\"$lt_prog_compiler_static_F77\" echo "$as_me:$LINENO: checking if $compiler static flag $lt_tmp_static_flag works" >&5 echo $ECHO_N "checking if $compiler static flag $lt_tmp_static_flag works... $ECHO_C" >&6 if test "${lt_prog_compiler_static_works_F77+set}" = set; then echo $ECHO_N "(cached) $ECHO_C" >&6 else lt_prog_compiler_static_works_F77=no save_LDFLAGS="$LDFLAGS" LDFLAGS="$LDFLAGS $lt_tmp_static_flag" printf "$lt_simple_link_test_code" > conftest.$ac_ext if (eval $ac_link 2>conftest.err) && test -s conftest$ac_exeext; then # The linker can only warn and ignore the option if not recognized # So say no if there are warnings if test -s conftest.err; then # Append any errors to the config.log. 
cat conftest.err 1>&5 $echo "X$_lt_linker_boilerplate" | $Xsed -e '/^$/d' > conftest.exp $SED '/^$/d; /^ *+/d' conftest.err >conftest.er2 if diff conftest.exp conftest.er2 >/dev/null; then lt_prog_compiler_static_works_F77=yes fi else lt_prog_compiler_static_works_F77=yes fi fi $rm conftest* LDFLAGS="$save_LDFLAGS" fi echo "$as_me:$LINENO: result: $lt_prog_compiler_static_works_F77" >&5 echo "${ECHO_T}$lt_prog_compiler_static_works_F77" >&6 if test x"$lt_prog_compiler_static_works_F77" = xyes; then : else lt_prog_compiler_static_F77= fi echo "$as_me:$LINENO: checking if $compiler supports -c -o file.$ac_objext" >&5 echo $ECHO_N "checking if $compiler supports -c -o file.$ac_objext... $ECHO_C" >&6 if test "${lt_cv_prog_compiler_c_o_F77+set}" = set; then echo $ECHO_N "(cached) $ECHO_C" >&6 else lt_cv_prog_compiler_c_o_F77=no $rm -r conftest 2>/dev/null mkdir conftest cd conftest mkdir out printf "$lt_simple_compile_test_code" > conftest.$ac_ext lt_compiler_flag="-o out/conftest2.$ac_objext" # Insert the option either (1) after the last *FLAGS variable, or # (2) before a word containing "conftest.", or (3) at the end. # Note that $ac_compile itself does not contain backslashes and begins # with a dollar sign (not a hyphen), so the echo should work correctly. lt_compile=`echo "$ac_compile" | $SED \ -e 's:.*FLAGS}\{0,1\} :&$lt_compiler_flag :; t' \ -e 's: [^ ]*conftest\.: $lt_compiler_flag&:; t' \ -e 's:$: $lt_compiler_flag:'` (eval echo "\"\$as_me:14633: $lt_compile\"" >&5) (eval "$lt_compile" 2>out/conftest.err) ac_status=$? cat out/conftest.err >&5 echo "$as_me:14637: \$? = $ac_status" >&5 if (exit $ac_status) && test -s out/conftest2.$ac_objext then # The compiler can only warn and ignore the option if not recognized # So say no if there are warnings $echo "X$_lt_compiler_boilerplate" | $Xsed -e '/^$/d' > out/conftest.exp $SED '/^$/d; /^ *+/d' out/conftest.err >out/conftest.er2 if test ! 
-s out/conftest.er2 || diff out/conftest.exp out/conftest.er2 >/dev/null; then lt_cv_prog_compiler_c_o_F77=yes fi fi chmod u+w . 2>&5 $rm conftest* # SGI C++ compiler will create directory out/ii_files/ for # template instantiation test -d out/ii_files && $rm out/ii_files/* && rmdir out/ii_files $rm out/* && rmdir out cd .. rmdir conftest $rm conftest* fi echo "$as_me:$LINENO: result: $lt_cv_prog_compiler_c_o_F77" >&5 echo "${ECHO_T}$lt_cv_prog_compiler_c_o_F77" >&6 hard_links="nottested" if test "$lt_cv_prog_compiler_c_o_F77" = no && test "$need_locks" != no; then # do not overwrite the value of need_locks provided by the user echo "$as_me:$LINENO: checking if we can lock with hard links" >&5 echo $ECHO_N "checking if we can lock with hard links... $ECHO_C" >&6 hard_links=yes $rm conftest* ln conftest.a conftest.b 2>/dev/null && hard_links=no touch conftest.a ln conftest.a conftest.b 2>&5 || hard_links=no ln conftest.a conftest.b 2>/dev/null && hard_links=no echo "$as_me:$LINENO: result: $hard_links" >&5 echo "${ECHO_T}$hard_links" >&6 if test "$hard_links" = no; then { echo "$as_me:$LINENO: WARNING: \`$CC' does not support \`-c -o', so \`make -j' may be unsafe" >&5 echo "$as_me: WARNING: \`$CC' does not support \`-c -o', so \`make -j' may be unsafe" >&2;} need_locks=warn fi else need_locks=no fi echo "$as_me:$LINENO: checking whether the $compiler linker ($LD) supports shared libraries" >&5 echo $ECHO_N "checking whether the $compiler linker ($LD) supports shared libraries... 
$ECHO_C" >&6 runpath_var= allow_undefined_flag_F77= enable_shared_with_static_runtimes_F77=no archive_cmds_F77= archive_expsym_cmds_F77= old_archive_from_new_cmds_F77= old_archive_from_expsyms_cmds_F77= export_dynamic_flag_spec_F77= whole_archive_flag_spec_F77= thread_safe_flag_spec_F77= hardcode_libdir_flag_spec_F77= hardcode_libdir_flag_spec_ld_F77= hardcode_libdir_separator_F77= hardcode_direct_F77=no hardcode_minus_L_F77=no hardcode_shlibpath_var_F77=unsupported link_all_deplibs_F77=unknown hardcode_automatic_F77=no module_cmds_F77= module_expsym_cmds_F77= always_export_symbols_F77=no export_symbols_cmds_F77='$NM $libobjs $convenience | $global_symbol_pipe | $SED '\''s/.* //'\'' | sort | uniq > $export_symbols' # include_expsyms should be a list of space-separated symbols to be *always* # included in the symbol list include_expsyms_F77= # exclude_expsyms can be an extended regexp of symbols to exclude # it will be wrapped by ` (' and `)$', so one must not match beginning or # end of line. Example: `a|bc|.*d.*' will exclude the symbols `a' and `bc', # as well as any symbol that contains `d'. exclude_expsyms_F77="_GLOBAL_OFFSET_TABLE_" # Although _GLOBAL_OFFSET_TABLE_ is a valid C symbol name, most a.out # platforms (ab)use it in PIC code, but their linkers get confused if # the symbol is explicitly referenced. Since portable code cannot # rely on this symbol name, it's probably fine to never include it in # preloaded symbol tables. extract_expsyms_cmds= # Just being paranoid about ensuring that cc_basename is set. for cc_temp in $compiler""; do case $cc_temp in compile | *[\\/]compile | ccache | *[\\/]ccache ) ;; distcc | *[\\/]distcc | purify | *[\\/]purify ) ;; \-*) ;; *) break;; esac done cc_basename=`$echo "X$cc_temp" | $Xsed -e 's%.*/%%' -e "s%^$host_alias-%%"` case $host_os in cygwin* | mingw* | pw32*) # FIXME: the MSVC++ port hasn't been tested in a loooong time # When not using gcc, we currently assume that we are using # Microsoft Visual C++.
if test "$GCC" != yes; then with_gnu_ld=no fi ;; interix*) # we just hope/assume this is gcc and not c89 (= MSVC++) with_gnu_ld=yes ;; openbsd*) with_gnu_ld=no ;; esac ld_shlibs_F77=yes if test "$with_gnu_ld" = yes; then # If archive_cmds runs LD, not CC, wlarc should be empty wlarc='${wl}' # Set some defaults for GNU ld with shared library support. These # are reset later if shared libraries are not supported. Putting them # here allows them to be overridden if necessary. runpath_var=LD_RUN_PATH hardcode_libdir_flag_spec_F77='${wl}--rpath ${wl}$libdir' export_dynamic_flag_spec_F77='${wl}--export-dynamic' # ancient GNU ld didn't support --whole-archive et al. if $LD --help 2>&1 | grep 'no-whole-archive' > /dev/null; then whole_archive_flag_spec_F77="$wlarc"'--whole-archive$convenience '"$wlarc"'--no-whole-archive' else whole_archive_flag_spec_F77= fi supports_anon_versioning=no case `$LD -v 2>/dev/null` in *\ [01].* | *\ 2.[0-9].* | *\ 2.10.*) ;; # catch versions < 2.11 *\ 2.11.93.0.2\ *) supports_anon_versioning=yes ;; # RH7.3 ... *\ 2.11.92.0.12\ *) supports_anon_versioning=yes ;; # Mandrake 8.2 ... *\ 2.11.*) ;; # other 2.11 versions *) supports_anon_versioning=yes ;; esac # See if GNU ld supports shared libraries. case $host_os in aix3* | aix4* | aix5*) # On AIX/PPC, the GNU linker is very broken if test "$host_cpu" != ia64; then ld_shlibs_F77=no cat <<EOF 1>&2 *** Warning: the GNU linker, at least up to release 2.9.1, is reported *** to be unable to reliably create shared libraries on AIX. *** Therefore, libtool is disabling shared libraries support. If you *** really care for shared libraries, you may want to modify your PATH *** so that a non-GNU linker is found, and then restart.
EOF fi ;; amigaos*) archive_cmds_F77='$rm $output_objdir/a2ixlibrary.data~$echo "#define NAME $libname" > $output_objdir/a2ixlibrary.data~$echo "#define LIBRARY_ID 1" >> $output_objdir/a2ixlibrary.data~$echo "#define VERSION $major" >> $output_objdir/a2ixlibrary.data~$echo "#define REVISION $revision" >> $output_objdir/a2ixlibrary.data~$AR $AR_FLAGS $lib $libobjs~$RANLIB $lib~(cd $output_objdir && a2ixlibrary -32)' hardcode_libdir_flag_spec_F77='-L$libdir' hardcode_minus_L_F77=yes # Samuel A. Falvo II reports # that the semantics of dynamic libraries on AmigaOS, at least up # to version 4, is to share data among multiple programs linked # with the same dynamic library. Since this doesn't match the # behavior of shared libraries on other platforms, we can't use # them. ld_shlibs_F77=no ;; beos*) if $LD --help 2>&1 | grep ': supported targets:.* elf' > /dev/null; then allow_undefined_flag_F77=unsupported # Joseph Beckenbach says some releases of gcc # support --undefined. This deserves some investigation. FIXME archive_cmds_F77='$CC -nostart $libobjs $deplibs $compiler_flags ${wl}-soname $wl$soname -o $lib' else ld_shlibs_F77=no fi ;; cygwin* | mingw* | pw32*) # _LT_AC_TAGVAR(hardcode_libdir_flag_spec, F77) is actually meaningless, # as there is no search path for DLLs. hardcode_libdir_flag_spec_F77='-L$libdir' allow_undefined_flag_F77=unsupported always_export_symbols_F77=no enable_shared_with_static_runtimes_F77=yes export_symbols_cmds_F77='$NM $libobjs $convenience | $global_symbol_pipe | $SED -e '\''/^[BCDGRS] /s/.* \([^ ]*\)/\1 DATA/'\'' | $SED -e '\''/^[AITW] /s/.* //'\'' | sort | uniq > $export_symbols' if $LD --help 2>&1 | grep 'auto-import' > /dev/null; then archive_cmds_F77='$CC -shared $libobjs $deplibs $compiler_flags -o $output_objdir/$soname ${wl}--enable-auto-image-base -Xlinker --out-implib -Xlinker $lib' # If the export-symbols file already is a .def file (1st line # is EXPORTS), use it as is; otherwise, prepend... 
archive_expsym_cmds_F77='if test "x`$SED 1q $export_symbols`" = xEXPORTS; then cp $export_symbols $output_objdir/$soname.def; else echo EXPORTS > $output_objdir/$soname.def; cat $export_symbols >> $output_objdir/$soname.def; fi~ $CC -shared $output_objdir/$soname.def $libobjs $deplibs $compiler_flags -o $output_objdir/$soname ${wl}--enable-auto-image-base -Xlinker --out-implib -Xlinker $lib' else ld_shlibs_F77=no fi ;; interix3*) hardcode_direct_F77=no hardcode_shlibpath_var_F77=no hardcode_libdir_flag_spec_F77='${wl}-rpath,$libdir' export_dynamic_flag_spec_F77='${wl}-E' # Hack: On Interix 3.x, we cannot compile PIC because of a broken gcc. # Instead, shared libraries are loaded at an image base (0x10000000 by # default) and relocated if they conflict, which is a slow very memory # consuming and fragmenting process. To avoid this, we pick a random, # 256 KiB-aligned image base between 0x50000000 and 0x6FFC0000 at link # time. Moving up from 0x10000000 also allows more sbrk(2) space. archive_cmds_F77='$CC -shared $pic_flag $libobjs $deplibs $compiler_flags ${wl}-h,$soname ${wl}--image-base,`expr ${RANDOM-$$} % 4096 / 2 \* 262144 + 1342177280` -o $lib' archive_expsym_cmds_F77='sed "s,^,_," $export_symbols >$output_objdir/$soname.expsym~$CC -shared $pic_flag $libobjs $deplibs $compiler_flags ${wl}-h,$soname ${wl}--retain-symbols-file,$output_objdir/$soname.expsym ${wl}--image-base,`expr ${RANDOM-$$} % 4096 / 2 \* 262144 + 1342177280` -o $lib' ;; linux*) if $LD --help 2>&1 | grep ': supported targets:.* elf' > /dev/null; then tmp_addflag= case $cc_basename,$host_cpu in pgcc*) # Portland Group C compiler whole_archive_flag_spec_F77='${wl}--whole-archive`for conv in $convenience\"\"; do test -n \"$conv\" && new_convenience=\"$new_convenience,$conv\"; done; $echo \"$new_convenience\"` ${wl}--no-whole-archive' tmp_addflag=' $pic_flag' ;; pgf77* | pgf90* | pgf95*) # Portland Group f77 and f90 compilers whole_archive_flag_spec_F77='${wl}--whole-archive`for conv in 
$convenience\"\"; do test -n \"$conv\" && new_convenience=\"$new_convenience,$conv\"; done; $echo \"$new_convenience\"` ${wl}--no-whole-archive' tmp_addflag=' $pic_flag -Mnomain' ;; ecc*,ia64* | icc*,ia64*) # Intel C compiler on ia64 tmp_addflag=' -i_dynamic' ;; efc*,ia64* | ifort*,ia64*) # Intel Fortran compiler on ia64 tmp_addflag=' -i_dynamic -nofor_main' ;; ifc* | ifort*) # Intel Fortran compiler tmp_addflag=' -nofor_main' ;; esac archive_cmds_F77='$CC -shared'"$tmp_addflag"' $libobjs $deplibs $compiler_flags ${wl}-soname $wl$soname -o $lib' if test $supports_anon_versioning = yes; then archive_expsym_cmds_F77='$echo "{ global:" > $output_objdir/$libname.ver~ cat $export_symbols | sed -e "s/\(.*\)/\1;/" >> $output_objdir/$libname.ver~ $echo "local: *; };" >> $output_objdir/$libname.ver~ $CC -shared'"$tmp_addflag"' $libobjs $deplibs $compiler_flags ${wl}-soname $wl$soname ${wl}-version-script ${wl}$output_objdir/$libname.ver -o $lib' fi else ld_shlibs_F77=no fi ;; netbsd*) if echo __ELF__ | $CC -E - | grep __ELF__ >/dev/null; then archive_cmds_F77='$LD -Bshareable $libobjs $deplibs $linker_flags -o $lib' wlarc= else archive_cmds_F77='$CC -shared $libobjs $deplibs $compiler_flags ${wl}-soname $wl$soname -o $lib' archive_expsym_cmds_F77='$CC -shared $libobjs $deplibs $compiler_flags ${wl}-soname $wl$soname ${wl}-retain-symbols-file $wl$export_symbols -o $lib' fi ;; solaris*) if $LD -v 2>&1 | grep 'BFD 2\.8' > /dev/null; then ld_shlibs_F77=no cat <<EOF 1>&2 *** Warning: The releases 2.8.* of the GNU linker cannot reliably *** create shared libraries on Solaris systems. Therefore, libtool *** is disabling shared libraries support. We urge you to upgrade GNU *** binutils to release 2.9.1 or newer. Another option is to modify *** your PATH or compiler configuration so that the native linker is *** used, and then restart. 
EOF elif $LD --help 2>&1 | grep ': supported targets:.* elf' > /dev/null; then archive_cmds_F77='$CC -shared $libobjs $deplibs $compiler_flags ${wl}-soname $wl$soname -o $lib' archive_expsym_cmds_F77='$CC -shared $libobjs $deplibs $compiler_flags ${wl}-soname $wl$soname ${wl}-retain-symbols-file $wl$export_symbols -o $lib' else ld_shlibs_F77=no fi ;; sysv5* | sco3.2v5* | sco5v6* | unixware* | OpenUNIX*) case `$LD -v 2>&1` in *\ [01].* | *\ 2.[0-9].* | *\ 2.1[0-5].*) ld_shlibs_F77=no cat <<_LT_EOF 1>&2 *** Warning: Releases of the GNU linker prior to 2.16.91.0.3 can not *** reliably create shared libraries on SCO systems. Therefore, libtool *** is disabling shared libraries support. We urge you to upgrade GNU *** binutils to release 2.16.91.0.3 or newer. Another option is to modify *** your PATH or compiler configuration so that the native linker is *** used, and then restart. _LT_EOF ;; *) if $LD --help 2>&1 | grep ': supported targets:.* elf' > /dev/null; then hardcode_libdir_flag_spec_F77='`test -z "$SCOABSPATH" && echo ${wl}-rpath,$libdir`' archive_cmds_F77='$CC -shared $libobjs $deplibs $compiler_flags ${wl}-soname,\${SCOABSPATH:+${install_libdir}/}$soname -o $lib' archive_expsym_cmds_F77='$CC -shared $libobjs $deplibs $compiler_flags ${wl}-soname,\${SCOABSPATH:+${install_libdir}/}$soname,-retain-symbols-file,$export_symbols -o $lib' else ld_shlibs_F77=no fi ;; esac ;; sunos4*) archive_cmds_F77='$LD -assert pure-text -Bshareable -o $lib $libobjs $deplibs $linker_flags' wlarc= hardcode_direct_F77=yes hardcode_shlibpath_var_F77=no ;; *) if $LD --help 2>&1 | grep ': supported targets:.* elf' > /dev/null; then archive_cmds_F77='$CC -shared $libobjs $deplibs $compiler_flags ${wl}-soname $wl$soname -o $lib' archive_expsym_cmds_F77='$CC -shared $libobjs $deplibs $compiler_flags ${wl}-soname $wl$soname ${wl}-retain-symbols-file $wl$export_symbols -o $lib' else ld_shlibs_F77=no fi ;; esac if test "$ld_shlibs_F77" = no; then runpath_var= hardcode_libdir_flag_spec_F77= 
export_dynamic_flag_spec_F77= whole_archive_flag_spec_F77= fi else # PORTME fill in a description of your system's linker (not GNU ld) case $host_os in aix3*) allow_undefined_flag_F77=unsupported always_export_symbols_F77=yes archive_expsym_cmds_F77='$LD -o $output_objdir/$soname $libobjs $deplibs $linker_flags -bE:$export_symbols -T512 -H512 -bM:SRE~$AR $AR_FLAGS $lib $output_objdir/$soname' # Note: this linker hardcodes the directories in LIBPATH if there # are no directories specified by -L. hardcode_minus_L_F77=yes if test "$GCC" = yes && test -z "$lt_prog_compiler_static"; then # Neither direct hardcoding nor static linking is supported with a # broken collect2. hardcode_direct_F77=unsupported fi ;; aix4* | aix5*) if test "$host_cpu" = ia64; then # On IA64, the linker does run time linking by default, so we don't # have to do anything special. aix_use_runtimelinking=no exp_sym_flag='-Bexport' no_entry_flag="" else # If we're using GNU nm, then we don't want the "-C" option. # -C means demangle to AIX nm, but means don't demangle with GNU nm if $NM -V 2>&1 | grep 'GNU' > /dev/null; then export_symbols_cmds_F77='$NM -Bpg $libobjs $convenience | awk '\''{ if (((\$2 == "T") || (\$2 == "D") || (\$2 == "B")) && (substr(\$3,1,1) != ".")) { print \$3 } }'\'' | sort -u > $export_symbols' else export_symbols_cmds_F77='$NM -BCpg $libobjs $convenience | awk '\''{ if (((\$2 == "T") || (\$2 == "D") || (\$2 == "B")) && (substr(\$3,1,1) != ".")) { print \$3 } }'\'' | sort -u > $export_symbols' fi aix_use_runtimelinking=no # Test if we are trying to use run time linking or normal # AIX style linking. If -brtl is somewhere in LDFLAGS, we # need to do runtime linking. 
case $host_os in aix4.[23]|aix4.[23].*|aix5*) for ld_flag in $LDFLAGS; do if (test $ld_flag = "-brtl" || test $ld_flag = "-Wl,-brtl"); then aix_use_runtimelinking=yes break fi done ;; esac exp_sym_flag='-bexport' no_entry_flag='-bnoentry' fi # When large executables or shared objects are built, AIX ld can # have problems creating the table of contents. If linking a library # or program results in "error TOC overflow" add -mminimal-toc to # CXXFLAGS/CFLAGS for g++/gcc. In the cases where that is not # enough to fix the problem, add -Wl,-bbigtoc to LDFLAGS. archive_cmds_F77='' hardcode_direct_F77=yes hardcode_libdir_separator_F77=':' link_all_deplibs_F77=yes if test "$GCC" = yes; then case $host_os in aix4.[012]|aix4.[012].*) # We only want to do this on AIX 4.2 and lower, the check # below for broken collect2 doesn't work under 4.3+ collect2name=`${CC} -print-prog-name=collect2` if test -f "$collect2name" && \ strings "$collect2name" | grep resolve_lib_name >/dev/null then # We have reworked collect2 hardcode_direct_F77=yes else # We have old collect2 hardcode_direct_F77=unsupported # It fails to find uninstalled libraries when the uninstalled # path is not listed in the libpath. Setting hardcode_minus_L # to unsupported forces relinking hardcode_minus_L_F77=yes hardcode_libdir_flag_spec_F77='-L$libdir' hardcode_libdir_separator_F77= fi ;; esac shared_flag='-shared' if test "$aix_use_runtimelinking" = yes; then shared_flag="$shared_flag "'${wl}-G' fi else # not using gcc if test "$host_cpu" = ia64; then # VisualAge C++, Version 5.5 for AIX 5L for IA-64, Beta 3 Release # chokes on -Wl,-G. The following line is correct: shared_flag='-G' else if test "$aix_use_runtimelinking" = yes; then shared_flag='${wl}-G' else shared_flag='${wl}-bM:SRE' fi fi fi # It seems that -bexpall does not export symbols beginning with # underscore (_), so it is better to generate a list of symbols to export. 
always_export_symbols_F77=yes if test "$aix_use_runtimelinking" = yes; then # Warning - without using the other runtime loading flags (-brtl), # -berok will link without error, but may produce a broken library. allow_undefined_flag_F77='-berok' # Determine the default libpath from the value encoded in an empty executable. cat >conftest.$ac_ext <<_ACEOF program main end _ACEOF rm -f conftest.$ac_objext conftest$ac_exeext if { (eval echo "$as_me:$LINENO: \"$ac_link\"") >&5 (eval $ac_link) 2>conftest.er1 ac_status=$? grep -v '^ *+' conftest.er1 >conftest.err rm -f conftest.er1 cat conftest.err >&5 echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); } && { ac_try='test -z "$ac_f77_werror_flag" || test ! -s conftest.err' { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 (eval $ac_try) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); }; } && { ac_try='test -s conftest$ac_exeext' { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 (eval $ac_try) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); }; }; then aix_libpath=`dump -H conftest$ac_exeext 2>/dev/null | $SED -n -e '/Import File Strings/,/^$/ { /^0/ { s/^0 *\(.*\)$/\1/; p; } }'` # Check for a 64-bit object if we didn't find anything. 
if test -z "$aix_libpath"; then aix_libpath=`dump -HX64 conftest$ac_exeext 2>/dev/null | $SED -n -e '/Import File Strings/,/^$/ { /^0/ { s/^0 *\(.*\)$/\1/; p; } }'`; fi else echo "$as_me: failed program was:" >&5 sed 's/^/| /' conftest.$ac_ext >&5 fi rm -f conftest.err conftest.$ac_objext \ conftest$ac_exeext conftest.$ac_ext if test -z "$aix_libpath"; then aix_libpath="/usr/lib:/lib"; fi hardcode_libdir_flag_spec_F77='${wl}-blibpath:$libdir:'"$aix_libpath" archive_expsym_cmds_F77="\$CC"' -o $output_objdir/$soname $libobjs $deplibs '"\${wl}$no_entry_flag"' $compiler_flags `if test "x${allow_undefined_flag}" != "x"; then echo "${wl}${allow_undefined_flag}"; else :; fi` '"\${wl}$exp_sym_flag:\$export_symbols $shared_flag" else if test "$host_cpu" = ia64; then hardcode_libdir_flag_spec_F77='${wl}-R $libdir:/usr/lib:/lib' allow_undefined_flag_F77="-z nodefs" archive_expsym_cmds_F77="\$CC $shared_flag"' -o $output_objdir/$soname $libobjs $deplibs '"\${wl}$no_entry_flag"' $compiler_flags ${wl}${allow_undefined_flag} '"\${wl}$exp_sym_flag:\$export_symbols" else # Determine the default libpath from the value encoded in an empty executable. cat >conftest.$ac_ext <<_ACEOF program main end _ACEOF rm -f conftest.$ac_objext conftest$ac_exeext if { (eval echo "$as_me:$LINENO: \"$ac_link\"") >&5 (eval $ac_link) 2>conftest.er1 ac_status=$? grep -v '^ *+' conftest.er1 >conftest.err rm -f conftest.er1 cat conftest.err >&5 echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); } && { ac_try='test -z "$ac_f77_werror_flag" || test ! -s conftest.err' { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 (eval $ac_try) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); }; } && { ac_try='test -s conftest$ac_exeext' { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 (eval $ac_try) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? 
= $ac_status" >&5 (exit $ac_status); }; }; then aix_libpath=`dump -H conftest$ac_exeext 2>/dev/null | $SED -n -e '/Import File Strings/,/^$/ { /^0/ { s/^0 *\(.*\)$/\1/; p; } }'` # Check for a 64-bit object if we didn't find anything. if test -z "$aix_libpath"; then aix_libpath=`dump -HX64 conftest$ac_exeext 2>/dev/null | $SED -n -e '/Import File Strings/,/^$/ { /^0/ { s/^0 *\(.*\)$/\1/; p; } }'`; fi else echo "$as_me: failed program was:" >&5 sed 's/^/| /' conftest.$ac_ext >&5 fi rm -f conftest.err conftest.$ac_objext \ conftest$ac_exeext conftest.$ac_ext if test -z "$aix_libpath"; then aix_libpath="/usr/lib:/lib"; fi hardcode_libdir_flag_spec_F77='${wl}-blibpath:$libdir:'"$aix_libpath" # Warning - without using the other run time loading flags, # -berok will link without error, but may produce a broken library. no_undefined_flag_F77=' ${wl}-bernotok' allow_undefined_flag_F77=' ${wl}-berok' # Exported symbols can be pulled into shared objects from archives whole_archive_flag_spec_F77='$convenience' archive_cmds_need_lc_F77=yes # This is similar to how AIX traditionally builds its shared libraries. 
archive_expsym_cmds_F77="\$CC $shared_flag"' -o $output_objdir/$soname $libobjs $deplibs ${wl}-bnoentry $compiler_flags ${wl}-bE:$export_symbols${allow_undefined_flag}~$AR $AR_FLAGS $output_objdir/$libname$release.a $output_objdir/$soname' fi fi ;; amigaos*) archive_cmds_F77='$rm $output_objdir/a2ixlibrary.data~$echo "#define NAME $libname" > $output_objdir/a2ixlibrary.data~$echo "#define LIBRARY_ID 1" >> $output_objdir/a2ixlibrary.data~$echo "#define VERSION $major" >> $output_objdir/a2ixlibrary.data~$echo "#define REVISION $revision" >> $output_objdir/a2ixlibrary.data~$AR $AR_FLAGS $lib $libobjs~$RANLIB $lib~(cd $output_objdir && a2ixlibrary -32)' hardcode_libdir_flag_spec_F77='-L$libdir' hardcode_minus_L_F77=yes # see comment about different semantics on the GNU ld section ld_shlibs_F77=no ;; bsdi[45]*) export_dynamic_flag_spec_F77=-rdynamic ;; cygwin* | mingw* | pw32*) # When not using gcc, we currently assume that we are using # Microsoft Visual C++. # hardcode_libdir_flag_spec is actually meaningless, as there is # no search path for DLLs. hardcode_libdir_flag_spec_F77=' ' allow_undefined_flag_F77=unsupported # Tell ltmain to make .lib files, not .a files. libext=lib # Tell ltmain to make .dll files, not .so files. shrext_cmds=".dll" # FIXME: Setting linknames here is a bad hack. archive_cmds_F77='$CC -o $lib $libobjs $compiler_flags `echo "$deplibs" | $SED -e '\''s/ -lc$//'\''` -link -dll~linknames=' # The linker will automatically build a .lib file if we build a DLL. old_archive_From_new_cmds_F77='true' # FIXME: Should let the user specify the lib program. 
old_archive_cmds_F77='lib /OUT:$oldlib$oldobjs$old_deplibs' fix_srcfile_path_F77='`cygpath -w "$srcfile"`' enable_shared_with_static_runtimes_F77=yes ;; darwin* | rhapsody*) case $host_os in rhapsody* | darwin1.[012]) allow_undefined_flag_F77='${wl}-undefined ${wl}suppress' ;; *) # Darwin 1.3 on if test -z ${MACOSX_DEPLOYMENT_TARGET} ; then allow_undefined_flag_F77='${wl}-flat_namespace ${wl}-undefined ${wl}suppress' else case ${MACOSX_DEPLOYMENT_TARGET} in 10.[012]) allow_undefined_flag_F77='${wl}-flat_namespace ${wl}-undefined ${wl}suppress' ;; 10.*) allow_undefined_flag_F77='${wl}-undefined ${wl}dynamic_lookup' ;; esac fi ;; esac archive_cmds_need_lc_F77=no hardcode_direct_F77=no hardcode_automatic_F77=yes hardcode_shlibpath_var_F77=unsupported whole_archive_flag_spec_F77='' link_all_deplibs_F77=yes if test "$GCC" = yes ; then output_verbose_link_cmd='echo' archive_cmds_F77='$CC -dynamiclib $allow_undefined_flag -o $lib $libobjs $deplibs $compiler_flags -install_name $rpath/$soname $verstring' module_cmds_F77='$CC $allow_undefined_flag -o $lib -bundle $libobjs $deplibs$compiler_flags' # Don't fix this by using the ld -exported_symbols_list flag, it doesn't exist in older darwin lds archive_expsym_cmds_F77='sed -e "s,#.*,," -e "s,^[ ]*,," -e "s,^\(..*\),_&," < $export_symbols > $output_objdir/${libname}-symbols.expsym~$CC -dynamiclib $allow_undefined_flag -o $lib $libobjs $deplibs $compiler_flags -install_name $rpath/$soname $verstring~nmedit -s $output_objdir/${libname}-symbols.expsym ${lib}' module_expsym_cmds_F77='sed -e "s,#.*,," -e "s,^[ ]*,," -e "s,^\(..*\),_&," < $export_symbols > $output_objdir/${libname}-symbols.expsym~$CC $allow_undefined_flag -o $lib -bundle $libobjs $deplibs$compiler_flags~nmedit -s $output_objdir/${libname}-symbols.expsym ${lib}' else case $cc_basename in xlc*) output_verbose_link_cmd='echo' archive_cmds_F77='$CC -qmkshrobj $allow_undefined_flag -o $lib $libobjs $deplibs $compiler_flags ${wl}-install_name ${wl}`echo $rpath/$soname` 
$verstring' module_cmds_F77='$CC $allow_undefined_flag -o $lib -bundle $libobjs $deplibs$compiler_flags' # Don't fix this by using the ld -exported_symbols_list flag, it doesn't exist in older darwin lds archive_expsym_cmds_F77='sed -e "s,#.*,," -e "s,^[ ]*,," -e "s,^\(..*\),_&," < $export_symbols > $output_objdir/${libname}-symbols.expsym~$CC -qmkshrobj $allow_undefined_flag -o $lib $libobjs $deplibs $compiler_flags ${wl}-install_name ${wl}$rpath/$soname $verstring~nmedit -s $output_objdir/${libname}-symbols.expsym ${lib}' module_expsym_cmds_F77='sed -e "s,#.*,," -e "s,^[ ]*,," -e "s,^\(..*\),_&," < $export_symbols > $output_objdir/${libname}-symbols.expsym~$CC $allow_undefined_flag -o $lib -bundle $libobjs $deplibs$compiler_flags~nmedit -s $output_objdir/${libname}-symbols.expsym ${lib}' ;; *) ld_shlibs_F77=no ;; esac fi ;; dgux*) archive_cmds_F77='$LD -G -h $soname -o $lib $libobjs $deplibs $linker_flags' hardcode_libdir_flag_spec_F77='-L$libdir' hardcode_shlibpath_var_F77=no ;; freebsd1*) ld_shlibs_F77=no ;; # FreeBSD 2.2.[012] allows us to include c++rt0.o to get C++ constructor # support. Future versions do this automatically, but an explicit c++rt0.o # does not break anything, and helps significantly (at the cost of a little # extra space). freebsd2.2*) archive_cmds_F77='$LD -Bshareable -o $lib $libobjs $deplibs $linker_flags /usr/lib/c++rt0.o' hardcode_libdir_flag_spec_F77='-R$libdir' hardcode_direct_F77=yes hardcode_shlibpath_var_F77=no ;; # Unfortunately, older versions of FreeBSD 2 do not have this feature. freebsd2*) archive_cmds_F77='$LD -Bshareable -o $lib $libobjs $deplibs $linker_flags' hardcode_direct_F77=yes hardcode_minus_L_F77=yes hardcode_shlibpath_var_F77=no ;; # FreeBSD 3 and greater uses gcc -shared to do shared libraries. 
freebsd* | kfreebsd*-gnu | dragonfly*) archive_cmds_F77='$CC -shared -o $lib $libobjs $deplibs $compiler_flags' hardcode_libdir_flag_spec_F77='-R$libdir' hardcode_direct_F77=yes hardcode_shlibpath_var_F77=no ;; hpux9*) if test "$GCC" = yes; then archive_cmds_F77='$rm $output_objdir/$soname~$CC -shared -fPIC ${wl}+b ${wl}$install_libdir -o $output_objdir/$soname $libobjs $deplibs $compiler_flags~test $output_objdir/$soname = $lib || mv $output_objdir/$soname $lib' else archive_cmds_F77='$rm $output_objdir/$soname~$LD -b +b $install_libdir -o $output_objdir/$soname $libobjs $deplibs $linker_flags~test $output_objdir/$soname = $lib || mv $output_objdir/$soname $lib' fi hardcode_libdir_flag_spec_F77='${wl}+b ${wl}$libdir' hardcode_libdir_separator_F77=: hardcode_direct_F77=yes # hardcode_minus_L: Not really in the search PATH, # but as the default location of the library. hardcode_minus_L_F77=yes export_dynamic_flag_spec_F77='${wl}-E' ;; hpux10*) if test "$GCC" = yes -a "$with_gnu_ld" = no; then archive_cmds_F77='$CC -shared -fPIC ${wl}+h ${wl}$soname ${wl}+b ${wl}$install_libdir -o $lib $libobjs $deplibs $compiler_flags' else archive_cmds_F77='$LD -b +h $soname +b $install_libdir -o $lib $libobjs $deplibs $linker_flags' fi if test "$with_gnu_ld" = no; then hardcode_libdir_flag_spec_F77='${wl}+b ${wl}$libdir' hardcode_libdir_separator_F77=: hardcode_direct_F77=yes export_dynamic_flag_spec_F77='${wl}-E' # hardcode_minus_L: Not really in the search PATH, # but as the default location of the library. 
hardcode_minus_L_F77=yes fi ;; hpux11*) if test "$GCC" = yes -a "$with_gnu_ld" = no; then case $host_cpu in hppa*64*) archive_cmds_F77='$CC -shared ${wl}+h ${wl}$soname -o $lib $libobjs $deplibs $compiler_flags' ;; ia64*) archive_cmds_F77='$CC -shared ${wl}+h ${wl}$soname ${wl}+nodefaultrpath -o $lib $libobjs $deplibs $compiler_flags' ;; *) archive_cmds_F77='$CC -shared -fPIC ${wl}+h ${wl}$soname ${wl}+b ${wl}$install_libdir -o $lib $libobjs $deplibs $compiler_flags' ;; esac else case $host_cpu in hppa*64*) archive_cmds_F77='$CC -b ${wl}+h ${wl}$soname -o $lib $libobjs $deplibs $compiler_flags' ;; ia64*) archive_cmds_F77='$CC -b ${wl}+h ${wl}$soname ${wl}+nodefaultrpath -o $lib $libobjs $deplibs $compiler_flags' ;; *) archive_cmds_F77='$CC -b ${wl}+h ${wl}$soname ${wl}+b ${wl}$install_libdir -o $lib $libobjs $deplibs $compiler_flags' ;; esac fi if test "$with_gnu_ld" = no; then hardcode_libdir_flag_spec_F77='${wl}+b ${wl}$libdir' hardcode_libdir_separator_F77=: case $host_cpu in hppa*64*|ia64*) hardcode_libdir_flag_spec_ld_F77='+b $libdir' hardcode_direct_F77=no hardcode_shlibpath_var_F77=no ;; *) hardcode_direct_F77=yes export_dynamic_flag_spec_F77='${wl}-E' # hardcode_minus_L: Not really in the search PATH, # but as the default location of the library. 
hardcode_minus_L_F77=yes ;; esac fi ;; irix5* | irix6* | nonstopux*) if test "$GCC" = yes; then archive_cmds_F77='$CC -shared $libobjs $deplibs $compiler_flags ${wl}-soname ${wl}$soname `test -n "$verstring" && echo ${wl}-set_version ${wl}$verstring` ${wl}-update_registry ${wl}${output_objdir}/so_locations -o $lib' else archive_cmds_F77='$LD -shared $libobjs $deplibs $linker_flags -soname $soname `test -n "$verstring" && echo -set_version $verstring` -update_registry ${output_objdir}/so_locations -o $lib' hardcode_libdir_flag_spec_ld_F77='-rpath $libdir' fi hardcode_libdir_flag_spec_F77='${wl}-rpath ${wl}$libdir' hardcode_libdir_separator_F77=: link_all_deplibs_F77=yes ;; netbsd*) if echo __ELF__ | $CC -E - | grep __ELF__ >/dev/null; then archive_cmds_F77='$LD -Bshareable -o $lib $libobjs $deplibs $linker_flags' # a.out else archive_cmds_F77='$LD -shared -o $lib $libobjs $deplibs $linker_flags' # ELF fi hardcode_libdir_flag_spec_F77='-R$libdir' hardcode_direct_F77=yes hardcode_shlibpath_var_F77=no ;; newsos6) archive_cmds_F77='$LD -G -h $soname -o $lib $libobjs $deplibs $linker_flags' hardcode_direct_F77=yes hardcode_libdir_flag_spec_F77='${wl}-rpath ${wl}$libdir' hardcode_libdir_separator_F77=: hardcode_shlibpath_var_F77=no ;; openbsd*) hardcode_direct_F77=yes hardcode_shlibpath_var_F77=no if test -z "`echo __ELF__ | $CC -E - | grep __ELF__`" || test "$host_os-$host_cpu" = "openbsd2.8-powerpc"; then archive_cmds_F77='$CC -shared $pic_flag -o $lib $libobjs $deplibs $compiler_flags' archive_expsym_cmds_F77='$CC -shared $pic_flag -o $lib $libobjs $deplibs $compiler_flags ${wl}-retain-symbols-file,$export_symbols' hardcode_libdir_flag_spec_F77='${wl}-rpath,$libdir' export_dynamic_flag_spec_F77='${wl}-E' else case $host_os in openbsd[01].* | openbsd2.[0-7] | openbsd2.[0-7].*) archive_cmds_F77='$LD -Bshareable -o $lib $libobjs $deplibs $linker_flags' hardcode_libdir_flag_spec_F77='-R$libdir' ;; *) archive_cmds_F77='$CC -shared $pic_flag -o $lib $libobjs $deplibs 
$compiler_flags' hardcode_libdir_flag_spec_F77='${wl}-rpath,$libdir' ;; esac fi ;; os2*) hardcode_libdir_flag_spec_F77='-L$libdir' hardcode_minus_L_F77=yes allow_undefined_flag_F77=unsupported archive_cmds_F77='$echo "LIBRARY $libname INITINSTANCE" > $output_objdir/$libname.def~$echo "DESCRIPTION \"$libname\"" >> $output_objdir/$libname.def~$echo DATA >> $output_objdir/$libname.def~$echo " SINGLE NONSHARED" >> $output_objdir/$libname.def~$echo EXPORTS >> $output_objdir/$libname.def~emxexp $libobjs >> $output_objdir/$libname.def~$CC -Zdll -Zcrtdll -o $lib $libobjs $deplibs $compiler_flags $output_objdir/$libname.def' old_archive_From_new_cmds_F77='emximp -o $output_objdir/$libname.a $output_objdir/$libname.def' ;; osf3*) if test "$GCC" = yes; then allow_undefined_flag_F77=' ${wl}-expect_unresolved ${wl}\*' archive_cmds_F77='$CC -shared${allow_undefined_flag} $libobjs $deplibs $compiler_flags ${wl}-soname ${wl}$soname `test -n "$verstring" && echo ${wl}-set_version ${wl}$verstring` ${wl}-update_registry ${wl}${output_objdir}/so_locations -o $lib' else allow_undefined_flag_F77=' -expect_unresolved \*' archive_cmds_F77='$LD -shared${allow_undefined_flag} $libobjs $deplibs $linker_flags -soname $soname `test -n "$verstring" && echo -set_version $verstring` -update_registry ${output_objdir}/so_locations -o $lib' fi hardcode_libdir_flag_spec_F77='${wl}-rpath ${wl}$libdir' hardcode_libdir_separator_F77=: ;; osf4* | osf5*) # as osf3* with the addition of -msym flag if test "$GCC" = yes; then allow_undefined_flag_F77=' ${wl}-expect_unresolved ${wl}\*' archive_cmds_F77='$CC -shared${allow_undefined_flag} $libobjs $deplibs $compiler_flags ${wl}-msym ${wl}-soname ${wl}$soname `test -n "$verstring" && echo ${wl}-set_version ${wl}$verstring` ${wl}-update_registry ${wl}${output_objdir}/so_locations -o $lib' hardcode_libdir_flag_spec_F77='${wl}-rpath ${wl}$libdir' else allow_undefined_flag_F77=' -expect_unresolved \*' archive_cmds_F77='$LD -shared${allow_undefined_flag} $libobjs 
$deplibs $linker_flags -msym -soname $soname `test -n "$verstring" && echo -set_version $verstring` -update_registry ${output_objdir}/so_locations -o $lib' archive_expsym_cmds_F77='for i in `cat $export_symbols`; do printf "%s %s\\n" -exported_symbol "\$i" >> $lib.exp; done; echo "-hidden">> $lib.exp~ $LD -shared${allow_undefined_flag} -input $lib.exp $linker_flags $libobjs $deplibs -soname $soname `test -n "$verstring" && echo -set_version $verstring` -update_registry ${output_objdir}/so_locations -o $lib~$rm $lib.exp' # Both c and cxx compiler support -rpath directly hardcode_libdir_flag_spec_F77='-rpath $libdir' fi hardcode_libdir_separator_F77=: ;; solaris*) no_undefined_flag_F77=' -z text' if test "$GCC" = yes; then wlarc='${wl}' archive_cmds_F77='$CC -shared ${wl}-h ${wl}$soname -o $lib $libobjs $deplibs $compiler_flags' archive_expsym_cmds_F77='$echo "{ global:" > $lib.exp~cat $export_symbols | $SED -e "s/\(.*\)/\1;/" >> $lib.exp~$echo "local: *; };" >> $lib.exp~ $CC -shared ${wl}-M ${wl}$lib.exp ${wl}-h ${wl}$soname -o $lib $libobjs $deplibs $compiler_flags~$rm $lib.exp' else wlarc='' archive_cmds_F77='$LD -G${allow_undefined_flag} -h $soname -o $lib $libobjs $deplibs $linker_flags' archive_expsym_cmds_F77='$echo "{ global:" > $lib.exp~cat $export_symbols | $SED -e "s/\(.*\)/\1;/" >> $lib.exp~$echo "local: *; };" >> $lib.exp~ $LD -G${allow_undefined_flag} -M $lib.exp -h $soname -o $lib $libobjs $deplibs $linker_flags~$rm $lib.exp' fi hardcode_libdir_flag_spec_F77='-R$libdir' hardcode_shlibpath_var_F77=no case $host_os in solaris2.[0-5] | solaris2.[0-5].*) ;; *) # The compiler driver will combine linker options so we # cannot just pass the convenience library names through # without $wl, iff we do not link with $LD. # Luckily, gcc supports the same syntax we need for Sun Studio. # Supported since Solaris 2.6 (maybe 2.5.1?) 
case $wlarc in '') whole_archive_flag_spec_F77='-z allextract$convenience -z defaultextract' ;; *) whole_archive_flag_spec_F77='${wl}-z ${wl}allextract`for conv in $convenience\"\"; do test -n \"$conv\" && new_convenience=\"$new_convenience,$conv\"; done; $echo \"$new_convenience\"` ${wl}-z ${wl}defaultextract' ;; esac ;; esac link_all_deplibs_F77=yes ;; sunos4*) if test "x$host_vendor" = xsequent; then # Use $CC to link under sequent, because it throws in some extra .o # files that make .init and .fini sections work. archive_cmds_F77='$CC -G ${wl}-h $soname -o $lib $libobjs $deplibs $compiler_flags' else archive_cmds_F77='$LD -assert pure-text -Bstatic -o $lib $libobjs $deplibs $linker_flags' fi hardcode_libdir_flag_spec_F77='-L$libdir' hardcode_direct_F77=yes hardcode_minus_L_F77=yes hardcode_shlibpath_var_F77=no ;; sysv4) case $host_vendor in sni) archive_cmds_F77='$LD -G -h $soname -o $lib $libobjs $deplibs $linker_flags' hardcode_direct_F77=yes # is this really true??? ;; siemens) ## LD is ld it makes a PLAMLIB ## CC just makes a GrossModule. 
archive_cmds_F77='$LD -G -o $lib $libobjs $deplibs $linker_flags' reload_cmds_F77='$CC -r -o $output$reload_objs' hardcode_direct_F77=no ;; motorola) archive_cmds_F77='$LD -G -h $soname -o $lib $libobjs $deplibs $linker_flags' hardcode_direct_F77=no #Motorola manual says yes, but my tests say they lie ;; esac runpath_var='LD_RUN_PATH' hardcode_shlibpath_var_F77=no ;; sysv4.3*) archive_cmds_F77='$LD -G -h $soname -o $lib $libobjs $deplibs $linker_flags' hardcode_shlibpath_var_F77=no export_dynamic_flag_spec_F77='-Bexport' ;; sysv4*MP*) if test -d /usr/nec; then archive_cmds_F77='$LD -G -h $soname -o $lib $libobjs $deplibs $linker_flags' hardcode_shlibpath_var_F77=no runpath_var=LD_RUN_PATH hardcode_runpath_var=yes ld_shlibs_F77=yes fi ;; sysv4*uw2* | sysv5OpenUNIX* | sysv5UnixWare7.[01].[10]* | unixware7*) no_undefined_flag_F77='${wl}-z,text' archive_cmds_need_lc_F77=no hardcode_shlibpath_var_F77=no runpath_var='LD_RUN_PATH' if test "$GCC" = yes; then archive_cmds_F77='$CC -shared ${wl}-h,$soname -o $lib $libobjs $deplibs $compiler_flags' archive_expsym_cmds_F77='$CC -shared ${wl}-Bexport:$export_symbols ${wl}-h,$soname -o $lib $libobjs $deplibs $compiler_flags' else archive_cmds_F77='$CC -G ${wl}-h,$soname -o $lib $libobjs $deplibs $compiler_flags' archive_expsym_cmds_F77='$CC -G ${wl}-Bexport:$export_symbols ${wl}-h,$soname -o $lib $libobjs $deplibs $compiler_flags' fi ;; sysv5* | sco3.2v5* | sco5v6*) # Note: We can NOT use -z defs as we might desire, because we do not # link with -lc, and that would cause any symbols used from libc to # always be unresolved, which means just about no library would # ever link correctly. If we're not using GNU ld we use -z text # though, which does catch some bad symbols but isn't as heavy-handed # as -z defs. 
no_undefined_flag_F77='${wl}-z,text' allow_undefined_flag_F77='${wl}-z,nodefs' archive_cmds_need_lc_F77=no hardcode_shlibpath_var_F77=no hardcode_libdir_flag_spec_F77='`test -z "$SCOABSPATH" && echo ${wl}-R,$libdir`' hardcode_libdir_separator_F77=':' link_all_deplibs_F77=yes export_dynamic_flag_spec_F77='${wl}-Bexport' runpath_var='LD_RUN_PATH' if test "$GCC" = yes; then archive_cmds_F77='$CC -shared ${wl}-h,\${SCOABSPATH:+${install_libdir}/}$soname -o $lib $libobjs $deplibs $compiler_flags' archive_expsym_cmds_F77='$CC -shared ${wl}-Bexport:$export_symbols ${wl}-h,\${SCOABSPATH:+${install_libdir}/}$soname -o $lib $libobjs $deplibs $compiler_flags' else archive_cmds_F77='$CC -G ${wl}-h,\${SCOABSPATH:+${install_libdir}/}$soname -o $lib $libobjs $deplibs $compiler_flags' archive_expsym_cmds_F77='$CC -G ${wl}-Bexport:$export_symbols ${wl}-h,\${SCOABSPATH:+${install_libdir}/}$soname -o $lib $libobjs $deplibs $compiler_flags' fi ;; uts4*) archive_cmds_F77='$LD -G -h $soname -o $lib $libobjs $deplibs $linker_flags' hardcode_libdir_flag_spec_F77='-L$libdir' hardcode_shlibpath_var_F77=no ;; *) ld_shlibs_F77=no ;; esac fi echo "$as_me:$LINENO: result: $ld_shlibs_F77" >&5 echo "${ECHO_T}$ld_shlibs_F77" >&6 test "$ld_shlibs_F77" = no && can_build_shared=no # # Do we need to explicitly link libc? # case "x$archive_cmds_need_lc_F77" in x|xyes) # Assume -lc should be added archive_cmds_need_lc_F77=yes if test "$enable_shared" = yes && test "$GCC" = yes; then case $archive_cmds_F77 in *'~'*) # FIXME: we may have to deal with multi-command sequences. ;; '$CC '*) # Test whether the compiler implicitly links with -lc since on some # systems, -lgcc has to come before -lc. If gcc already passes -lc # to ld, don't add -lc before -lgcc. echo "$as_me:$LINENO: checking whether -lc should be explicitly linked in" >&5 echo $ECHO_N "checking whether -lc should be explicitly linked in... 
$ECHO_C" >&6 $rm conftest* printf "$lt_simple_compile_test_code" > conftest.$ac_ext if { (eval echo "$as_me:$LINENO: \"$ac_compile\"") >&5 (eval $ac_compile) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); } 2>conftest.err; then soname=conftest lib=conftest libobjs=conftest.$ac_objext deplibs= wl=$lt_prog_compiler_wl_F77 pic_flag=$lt_prog_compiler_pic_F77 compiler_flags=-v linker_flags=-v verstring= output_objdir=. libname=conftest lt_save_allow_undefined_flag=$allow_undefined_flag_F77 allow_undefined_flag_F77= if { (eval echo "$as_me:$LINENO: \"$archive_cmds_F77 2\>\&1 \| grep \" -lc \" \>/dev/null 2\>\&1\"") >&5 (eval $archive_cmds_F77 2\>\&1 \| grep \" -lc \" \>/dev/null 2\>\&1) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); } then archive_cmds_need_lc_F77=no else archive_cmds_need_lc_F77=yes fi allow_undefined_flag_F77=$lt_save_allow_undefined_flag else cat conftest.err 1>&5 fi $rm conftest* echo "$as_me:$LINENO: result: $archive_cmds_need_lc_F77" >&5 echo "${ECHO_T}$archive_cmds_need_lc_F77" >&6 ;; esac fi ;; esac echo "$as_me:$LINENO: checking dynamic linker characteristics" >&5 echo $ECHO_N "checking dynamic linker characteristics... $ECHO_C" >&6 library_names_spec= libname_spec='lib$name' soname_spec= shrext_cmds=".so" postinstall_cmds= postuninstall_cmds= finish_cmds= finish_eval= shlibpath_var= shlibpath_overrides_runpath=unknown version_type=none dynamic_linker="$host_os ld.so" sys_lib_dlsearch_path_spec="/lib /usr/lib" if test "$GCC" = yes; then sys_lib_search_path_spec=`$CC -print-search-dirs | grep "^libraries:" | $SED -e "s/^libraries://" -e "s,=/,/,g"` if echo "$sys_lib_search_path_spec" | grep ';' >/dev/null ; then # if the path contains ";" then we assume it to be the separator # otherwise default to the standard path separator (i.e. 
":") - it is # assumed that no part of a normal pathname contains ";" but that should # okay in the real world where ";" in dirpaths is itself problematic. sys_lib_search_path_spec=`echo "$sys_lib_search_path_spec" | $SED -e 's/;/ /g'` else sys_lib_search_path_spec=`echo "$sys_lib_search_path_spec" | $SED -e "s/$PATH_SEPARATOR/ /g"` fi else sys_lib_search_path_spec="/lib /usr/lib /usr/local/lib" fi need_lib_prefix=unknown hardcode_into_libs=no # when you set need_version to no, make sure it does not cause -set_version # flags to be left without arguments need_version=unknown case $host_os in aix3*) version_type=linux library_names_spec='${libname}${release}${shared_ext}$versuffix $libname.a' shlibpath_var=LIBPATH # AIX 3 has no versioning support, so we append a major version to the name. soname_spec='${libname}${release}${shared_ext}$major' ;; aix4* | aix5*) version_type=linux need_lib_prefix=no need_version=no hardcode_into_libs=yes if test "$host_cpu" = ia64; then # AIX 5 supports IA64 library_names_spec='${libname}${release}${shared_ext}$major ${libname}${release}${shared_ext}$versuffix $libname${shared_ext}' shlibpath_var=LD_LIBRARY_PATH else # With GCC up to 2.95.x, collect2 would create an import file # for dependence libraries. The import file would start with # the line `#! .'. This would cause the generated library to # depend on `.', always an invalid library. This was fixed in # development snapshots of GCC prior to 3.0. case $host_os in aix4 | aix4.[01] | aix4.[01].*) if { echo '#if __GNUC__ > 2 || (__GNUC__ == 2 && __GNUC_MINOR__ >= 97)' echo ' yes ' echo '#endif'; } | ${CC} -E - | grep yes > /dev/null; then : else can_build_shared=no fi ;; esac # AIX (on Power*) has no versioning support, so currently we can not hardcode correct # soname into executable. Probably we can add versioning support to # collect2, so additional links can be useful in future. 
if test "$aix_use_runtimelinking" = yes; then # If using run time linking (on AIX 4.2 or later) use lib.so # instead of lib.a to let people know that these are not # typical AIX shared libraries. library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${release}${shared_ext}$major $libname${shared_ext}' else # We preserve .a as extension for shared libraries through AIX4.2 # and later when we are not doing run time linking. library_names_spec='${libname}${release}.a $libname.a' soname_spec='${libname}${release}${shared_ext}$major' fi shlibpath_var=LIBPATH fi ;; amigaos*) library_names_spec='$libname.ixlibrary $libname.a' # Create ${libname}_ixlibrary.a entries in /sys/libs. finish_eval='for lib in `ls $libdir/*.ixlibrary 2>/dev/null`; do libname=`$echo "X$lib" | $Xsed -e '\''s%^.*/\([^/]*\)\.ixlibrary$%\1%'\''`; test $rm /sys/libs/${libname}_ixlibrary.a; $show "cd /sys/libs && $LN_S $lib ${libname}_ixlibrary.a"; cd /sys/libs && $LN_S $lib ${libname}_ixlibrary.a || exit 1; done' ;; beos*) library_names_spec='${libname}${shared_ext}' dynamic_linker="$host_os ld.so" shlibpath_var=LIBRARY_PATH ;; bsdi[45]*) version_type=linux need_version=no library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${release}${shared_ext}$major $libname${shared_ext}' soname_spec='${libname}${release}${shared_ext}$major' finish_cmds='PATH="\$PATH:/sbin" ldconfig $libdir' shlibpath_var=LD_LIBRARY_PATH sys_lib_search_path_spec="/shlib /usr/lib /usr/X11/lib /usr/contrib/lib /lib /usr/local/lib" sys_lib_dlsearch_path_spec="/shlib /usr/lib /usr/local/lib" # the default ld.so.conf also contains /usr/contrib/lib and # /usr/X11R6/lib (/usr/X11 is a link to /usr/X11R6), but let us allow # libtool to hard-code these into programs ;; cygwin* | mingw* | pw32*) version_type=windows shrext_cmds=".dll" need_version=no need_lib_prefix=no case $GCC,$host_os in yes,cygwin* | yes,mingw* | yes,pw32*) library_names_spec='$libname.dll.a' # DLL is installed to 
$(libdir)/../bin by postinstall_cmds postinstall_cmds='base_file=`basename \${file}`~ dlpath=`$SHELL 2>&1 -c '\''. $dir/'\''\${base_file}'\''i;echo \$dlname'\''`~ dldir=$destdir/`dirname \$dlpath`~ test -d \$dldir || mkdir -p \$dldir~ $install_prog $dir/$dlname \$dldir/$dlname~ chmod a+x \$dldir/$dlname' postuninstall_cmds='dldll=`$SHELL 2>&1 -c '\''. $file; echo \$dlname'\''`~ dlpath=$dir/\$dldll~ $rm \$dlpath' shlibpath_overrides_runpath=yes case $host_os in cygwin*) # Cygwin DLLs use 'cyg' prefix rather than 'lib' soname_spec='`echo ${libname} | sed -e 's/^lib/cyg/'``echo ${release} | $SED -e 's/[.]/-/g'`${versuffix}${shared_ext}' sys_lib_search_path_spec="/usr/lib /lib/w32api /lib /usr/local/lib" ;; mingw*) # MinGW DLLs use traditional 'lib' prefix soname_spec='${libname}`echo ${release} | $SED -e 's/[.]/-/g'`${versuffix}${shared_ext}' sys_lib_search_path_spec=`$CC -print-search-dirs | grep "^libraries:" | $SED -e "s/^libraries://" -e "s,=/,/,g"` if echo "$sys_lib_search_path_spec" | grep ';[c-zC-Z]:/' >/dev/null; then # It is most probably a Windows format PATH printed by # mingw gcc, but we are running on Cygwin. Gcc prints its search # path with ; separators, and with drive letters. We can handle the # drive letters (cygwin fileutils understands them), so leave them, # especially as we might pass files found there to a mingw objdump, # which wouldn't understand a cygwinified path. Ahh. sys_lib_search_path_spec=`echo "$sys_lib_search_path_spec" | $SED -e 's/;/ /g'` else sys_lib_search_path_spec=`echo "$sys_lib_search_path_spec" | $SED -e "s/$PATH_SEPARATOR/ /g"` fi ;; pw32*) # pw32 DLLs use 'pw' prefix rather than 'lib' library_names_spec='`echo ${libname} | sed -e 's/^lib/pw/'``echo ${release} | $SED -e 's/[.]/-/g'`${versuffix}${shared_ext}' ;; esac ;; *) library_names_spec='${libname}`echo ${release} | $SED -e 's/[.]/-/g'`${versuffix}${shared_ext} $libname.lib' ;; esac dynamic_linker='Win32 ld.exe' # FIXME: first we should search . 
and the directory the executable is in shlibpath_var=PATH ;; darwin* | rhapsody*) dynamic_linker="$host_os dyld" version_type=darwin need_lib_prefix=no need_version=no library_names_spec='${libname}${release}${versuffix}$shared_ext ${libname}${release}${major}$shared_ext ${libname}$shared_ext' soname_spec='${libname}${release}${major}$shared_ext' shlibpath_overrides_runpath=yes shlibpath_var=DYLD_LIBRARY_PATH shrext_cmds='`test .$module = .yes && echo .so || echo .dylib`' # Apple's gcc prints 'gcc -print-search-dirs' doesn't operate the same. if test "$GCC" = yes; then sys_lib_search_path_spec=`$CC -print-search-dirs | tr "\n" "$PATH_SEPARATOR" | sed -e 's/libraries:/@libraries:/' | tr "@" "\n" | grep "^libraries:" | sed -e "s/^libraries://" -e "s,=/,/,g" -e "s,$PATH_SEPARATOR, ,g" -e "s,.*,& /lib /usr/lib /usr/local/lib,g"` else sys_lib_search_path_spec='/lib /usr/lib /usr/local/lib' fi sys_lib_dlsearch_path_spec='/usr/local/lib /lib /usr/lib' ;; dgux*) version_type=linux need_lib_prefix=no need_version=no library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${release}${shared_ext}$major $libname$shared_ext' soname_spec='${libname}${release}${shared_ext}$major' shlibpath_var=LD_LIBRARY_PATH ;; freebsd1*) dynamic_linker=no ;; kfreebsd*-gnu) version_type=linux need_lib_prefix=no need_version=no library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${release}${shared_ext}$major ${libname}${shared_ext}' soname_spec='${libname}${release}${shared_ext}$major' shlibpath_var=LD_LIBRARY_PATH shlibpath_overrides_runpath=no hardcode_into_libs=yes dynamic_linker='GNU ld.so' ;; freebsd* | dragonfly*) # DragonFly does not have aout. When/if they implement a new # versioning mechanism, adjust this. 
if test -x /usr/bin/objformat; then objformat=`/usr/bin/objformat` else case $host_os in freebsd[123]*) objformat=aout ;; *) objformat=elf ;; esac fi version_type=freebsd-$objformat case $version_type in freebsd-elf*) library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${release}${shared_ext} $libname${shared_ext}' need_version=no need_lib_prefix=no ;; freebsd-*) library_names_spec='${libname}${release}${shared_ext}$versuffix $libname${shared_ext}$versuffix' need_version=yes ;; esac shlibpath_var=LD_LIBRARY_PATH case $host_os in freebsd2*) shlibpath_overrides_runpath=yes ;; freebsd3.[01]* | freebsdelf3.[01]*) shlibpath_overrides_runpath=yes hardcode_into_libs=yes ;; freebsd3.[2-9]* | freebsdelf3.[2-9]* | \ freebsd4.[0-5] | freebsdelf4.[0-5] | freebsd4.1.1 | freebsdelf4.1.1) shlibpath_overrides_runpath=no hardcode_into_libs=yes ;; freebsd*) # from 4.6 on shlibpath_overrides_runpath=yes hardcode_into_libs=yes ;; esac ;; gnu*) version_type=linux need_lib_prefix=no need_version=no library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${release}${shared_ext}${major} ${libname}${shared_ext}' soname_spec='${libname}${release}${shared_ext}$major' shlibpath_var=LD_LIBRARY_PATH hardcode_into_libs=yes ;; hpux9* | hpux10* | hpux11*) # Give a soname corresponding to the major version so that dld.sl refuses to # link against other versions. version_type=sunos need_lib_prefix=no need_version=no case $host_cpu in ia64*) shrext_cmds='.so' hardcode_into_libs=yes dynamic_linker="$host_os dld.so" shlibpath_var=LD_LIBRARY_PATH shlibpath_overrides_runpath=yes # Unless +noenvvar is specified. 
library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${release}${shared_ext}$major $libname${shared_ext}' soname_spec='${libname}${release}${shared_ext}$major' if test "X$HPUX_IA64_MODE" = X32; then sys_lib_search_path_spec="/usr/lib/hpux32 /usr/local/lib/hpux32 /usr/local/lib" else sys_lib_search_path_spec="/usr/lib/hpux64 /usr/local/lib/hpux64" fi sys_lib_dlsearch_path_spec=$sys_lib_search_path_spec ;; hppa*64*) shrext_cmds='.sl' hardcode_into_libs=yes dynamic_linker="$host_os dld.sl" shlibpath_var=LD_LIBRARY_PATH # How should we handle SHLIB_PATH shlibpath_overrides_runpath=yes # Unless +noenvvar is specified. library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${release}${shared_ext}$major $libname${shared_ext}' soname_spec='${libname}${release}${shared_ext}$major' sys_lib_search_path_spec="/usr/lib/pa20_64 /usr/ccs/lib/pa20_64" sys_lib_dlsearch_path_spec=$sys_lib_search_path_spec ;; *) shrext_cmds='.sl' dynamic_linker="$host_os dld.sl" shlibpath_var=SHLIB_PATH shlibpath_overrides_runpath=no # +s is required to enable SHLIB_PATH library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${release}${shared_ext}$major $libname${shared_ext}' soname_spec='${libname}${release}${shared_ext}$major' ;; esac # HP-UX runs *really* slowly unless shared libraries are mode 555. 
postinstall_cmds='chmod 555 $lib' ;; interix3*) version_type=linux need_lib_prefix=no need_version=no library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${release}${shared_ext}$major ${libname}${shared_ext}' soname_spec='${libname}${release}${shared_ext}$major' dynamic_linker='Interix 3.x ld.so.1 (PE, like ELF)' shlibpath_var=LD_LIBRARY_PATH shlibpath_overrides_runpath=no hardcode_into_libs=yes ;; irix5* | irix6* | nonstopux*) case $host_os in nonstopux*) version_type=nonstopux ;; *) if test "$lt_cv_prog_gnu_ld" = yes; then version_type=linux else version_type=irix fi ;; esac need_lib_prefix=no need_version=no soname_spec='${libname}${release}${shared_ext}$major' library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${release}${shared_ext}$major ${libname}${release}${shared_ext} $libname${shared_ext}' case $host_os in irix5* | nonstopux*) libsuff= shlibsuff= ;; *) case $LD in # libtool.m4 will add one of these switches to LD *-32|*"-32 "|*-melf32bsmip|*"-melf32bsmip ") libsuff= shlibsuff= libmagic=32-bit;; *-n32|*"-n32 "|*-melf32bmipn32|*"-melf32bmipn32 ") libsuff=32 shlibsuff=N32 libmagic=N32;; *-64|*"-64 "|*-melf64bmip|*"-melf64bmip ") libsuff=64 shlibsuff=64 libmagic=64-bit;; *) libsuff= shlibsuff= libmagic=never-match;; esac ;; esac shlibpath_var=LD_LIBRARY${shlibsuff}_PATH shlibpath_overrides_runpath=no sys_lib_search_path_spec="/usr/lib${libsuff} /lib${libsuff} /usr/local/lib${libsuff}" sys_lib_dlsearch_path_spec="/usr/lib${libsuff} /lib${libsuff}" hardcode_into_libs=yes ;; # No shared lib support for Linux oldld, aout, or coff. linux*oldld* | linux*aout* | linux*coff*) dynamic_linker=no ;; # This must be Linux ELF. 
linux*) version_type=linux need_lib_prefix=no need_version=no library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${release}${shared_ext}$major $libname${shared_ext}' soname_spec='${libname}${release}${shared_ext}$major' finish_cmds='PATH="\$PATH:/sbin" ldconfig -n $libdir' shlibpath_var=LD_LIBRARY_PATH shlibpath_overrides_runpath=no # This implies no fast_install, which is unacceptable. # Some rework will be needed to allow for fast_install # before this can be enabled. hardcode_into_libs=yes # Append ld.so.conf contents to the search path if test -f /etc/ld.so.conf; then lt_ld_extra=`awk '/^include / { system(sprintf("cd /etc; cat %s", \$2)); skip = 1; } { if (!skip) print \$0; skip = 0; }' < /etc/ld.so.conf | $SED -e 's/#.*//;s/[:, ]/ /g;s/=[^=]*$//;s/=[^= ]* / /g;/^$/d' | tr '\n' ' '` sys_lib_dlsearch_path_spec="/lib /usr/lib $lt_ld_extra" fi # We used to test for /lib/ld.so.1 and disable shared libraries on # powerpc, because MkLinux only supported shared libraries with the # GNU dynamic linker. Since this was broken with cross compilers, # most powerpc-linux boxes support dynamic linking these days and # people can always --disable-shared, the test was removed, and we # assume the GNU/Linux dynamic linker is in use. 
dynamic_linker='GNU/Linux ld.so' ;; knetbsd*-gnu) version_type=linux need_lib_prefix=no need_version=no library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${release}${shared_ext}$major ${libname}${shared_ext}' soname_spec='${libname}${release}${shared_ext}$major' shlibpath_var=LD_LIBRARY_PATH shlibpath_overrides_runpath=no hardcode_into_libs=yes dynamic_linker='GNU ld.so' ;; netbsd*) version_type=sunos need_lib_prefix=no need_version=no if echo __ELF__ | $CC -E - | grep __ELF__ >/dev/null; then library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${shared_ext}$versuffix' finish_cmds='PATH="\$PATH:/sbin" ldconfig -m $libdir' dynamic_linker='NetBSD (a.out) ld.so' else library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${release}${shared_ext}$major ${libname}${shared_ext}' soname_spec='${libname}${release}${shared_ext}$major' dynamic_linker='NetBSD ld.elf_so' fi shlibpath_var=LD_LIBRARY_PATH shlibpath_overrides_runpath=yes hardcode_into_libs=yes ;; newsos6) version_type=linux library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${release}${shared_ext}$major $libname${shared_ext}' shlibpath_var=LD_LIBRARY_PATH shlibpath_overrides_runpath=yes ;; nto-qnx*) version_type=linux need_lib_prefix=no need_version=no library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${release}${shared_ext}$major $libname${shared_ext}' soname_spec='${libname}${release}${shared_ext}$major' shlibpath_var=LD_LIBRARY_PATH shlibpath_overrides_runpath=yes ;; openbsd*) version_type=sunos sys_lib_dlsearch_path_spec="/usr/lib" need_lib_prefix=no # Some older versions of OpenBSD (3.3 at least) *do* need versioned libs. 
case $host_os in openbsd3.3 | openbsd3.3.*) need_version=yes ;; *) need_version=no ;; esac library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${shared_ext}$versuffix' finish_cmds='PATH="\$PATH:/sbin" ldconfig -m $libdir' shlibpath_var=LD_LIBRARY_PATH if test -z "`echo __ELF__ | $CC -E - | grep __ELF__`" || test "$host_os-$host_cpu" = "openbsd2.8-powerpc"; then case $host_os in openbsd2.[89] | openbsd2.[89].*) shlibpath_overrides_runpath=no ;; *) shlibpath_overrides_runpath=yes ;; esac else shlibpath_overrides_runpath=yes fi ;; os2*) libname_spec='$name' shrext_cmds=".dll" need_lib_prefix=no library_names_spec='$libname${shared_ext} $libname.a' dynamic_linker='OS/2 ld.exe' shlibpath_var=LIBPATH ;; osf3* | osf4* | osf5*) version_type=osf need_lib_prefix=no need_version=no soname_spec='${libname}${release}${shared_ext}$major' library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${release}${shared_ext}$major $libname${shared_ext}' shlibpath_var=LD_LIBRARY_PATH sys_lib_search_path_spec="/usr/shlib /usr/ccs/lib /usr/lib/cmplrs/cc /usr/lib /usr/local/lib /var/shlib" sys_lib_dlsearch_path_spec="$sys_lib_search_path_spec" ;; solaris*) version_type=linux need_lib_prefix=no need_version=no library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${release}${shared_ext}$major $libname${shared_ext}' soname_spec='${libname}${release}${shared_ext}$major' shlibpath_var=LD_LIBRARY_PATH shlibpath_overrides_runpath=yes hardcode_into_libs=yes # ldd complains unless libraries are executable postinstall_cmds='chmod +x $lib' ;; sunos4*) version_type=sunos library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${shared_ext}$versuffix' finish_cmds='PATH="\$PATH:/usr/etc" ldconfig $libdir' shlibpath_var=LD_LIBRARY_PATH shlibpath_overrides_runpath=yes if test "$with_gnu_ld" = yes; then need_lib_prefix=no fi need_version=yes ;; sysv4 | sysv4.3*) version_type=linux 
library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${release}${shared_ext}$major $libname${shared_ext}' soname_spec='${libname}${release}${shared_ext}$major' shlibpath_var=LD_LIBRARY_PATH case $host_vendor in sni) shlibpath_overrides_runpath=no need_lib_prefix=no export_dynamic_flag_spec='${wl}-Blargedynsym' runpath_var=LD_RUN_PATH ;; siemens) need_lib_prefix=no ;; motorola) need_lib_prefix=no need_version=no shlibpath_overrides_runpath=no sys_lib_search_path_spec='/lib /usr/lib /usr/ccs/lib' ;; esac ;; sysv4*MP*) if test -d /usr/nec ;then version_type=linux library_names_spec='$libname${shared_ext}.$versuffix $libname${shared_ext}.$major $libname${shared_ext}' soname_spec='$libname${shared_ext}.$major' shlibpath_var=LD_LIBRARY_PATH fi ;; sysv5* | sco3.2v5* | sco5v6* | unixware* | OpenUNIX* | sysv4*uw2*) version_type=freebsd-elf need_lib_prefix=no need_version=no library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${release}${shared_ext} $libname${shared_ext}' soname_spec='${libname}${release}${shared_ext}$major' shlibpath_var=LD_LIBRARY_PATH hardcode_into_libs=yes if test "$with_gnu_ld" = yes; then sys_lib_search_path_spec='/usr/local/lib /usr/gnu/lib /usr/ccs/lib /usr/lib /lib' shlibpath_overrides_runpath=no else sys_lib_search_path_spec='/usr/ccs/lib /usr/lib' shlibpath_overrides_runpath=yes case $host_os in sco3.2v5*) sys_lib_search_path_spec="$sys_lib_search_path_spec /lib" ;; esac fi sys_lib_dlsearch_path_spec='/usr/lib' ;; uts4*) version_type=linux library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${release}${shared_ext}$major $libname${shared_ext}' soname_spec='${libname}${release}${shared_ext}$major' shlibpath_var=LD_LIBRARY_PATH ;; *) dynamic_linker=no ;; esac echo "$as_me:$LINENO: result: $dynamic_linker" >&5 echo "${ECHO_T}$dynamic_linker" >&6 test "$dynamic_linker" = no && can_build_shared=no variables_saved_for_relink="PATH $shlibpath_var $runpath_var" if test "$GCC" = yes; then 
variables_saved_for_relink="$variables_saved_for_relink GCC_EXEC_PREFIX COMPILER_PATH LIBRARY_PATH" fi echo "$as_me:$LINENO: checking how to hardcode library paths into programs" >&5 echo $ECHO_N "checking how to hardcode library paths into programs... $ECHO_C" >&6 hardcode_action_F77= if test -n "$hardcode_libdir_flag_spec_F77" || \ test -n "$runpath_var_F77" || \ test "X$hardcode_automatic_F77" = "Xyes" ; then # We can hardcode non-existant directories. if test "$hardcode_direct_F77" != no && # If the only mechanism to avoid hardcoding is shlibpath_var, we # have to relink, otherwise we might link with an installed library # when we should be linking with a yet-to-be-installed one ## test "$_LT_AC_TAGVAR(hardcode_shlibpath_var, F77)" != no && test "$hardcode_minus_L_F77" != no; then # Linking always hardcodes the temporary library directory. hardcode_action_F77=relink else # We can link without hardcoding, and we can hardcode nonexisting dirs. hardcode_action_F77=immediate fi else # We cannot hardcode anything, or else we can only hardcode existing # directories. hardcode_action_F77=unsupported fi echo "$as_me:$LINENO: result: $hardcode_action_F77" >&5 echo "${ECHO_T}$hardcode_action_F77" >&6 if test "$hardcode_action_F77" = relink; then # Fast installation is not supported enable_fast_install=no elif test "$shlibpath_overrides_runpath" = yes || test "$enable_shared" = no; then # Fast installation is not necessary enable_fast_install=needless fi # The else clause should only fire when bootstrapping the # libtool distribution, otherwise you forgot to ship ltmain.sh # with your package, and you will get complaints that there are # no rules to generate ltmain.sh. if test -f "$ltmain"; then # See if we are running on zsh, and set the options which allow our commands through # without removal of \ escapes. 
if test -n "${ZSH_VERSION+set}" ; then setopt NO_GLOB_SUBST fi # Now quote all the things that may contain metacharacters while being # careful not to overquote the AC_SUBSTed values. We take copies of the # variables and quote the copies for generation of the libtool script. for var in echo old_CC old_CFLAGS AR AR_FLAGS EGREP RANLIB LN_S LTCC LTCFLAGS NM \ SED SHELL STRIP \ libname_spec library_names_spec soname_spec extract_expsyms_cmds \ old_striplib striplib file_magic_cmd finish_cmds finish_eval \ deplibs_check_method reload_flag reload_cmds need_locks \ lt_cv_sys_global_symbol_pipe lt_cv_sys_global_symbol_to_cdecl \ lt_cv_sys_global_symbol_to_c_name_address \ sys_lib_search_path_spec sys_lib_dlsearch_path_spec \ old_postinstall_cmds old_postuninstall_cmds \ compiler_F77 \ CC_F77 \ LD_F77 \ lt_prog_compiler_wl_F77 \ lt_prog_compiler_pic_F77 \ lt_prog_compiler_static_F77 \ lt_prog_compiler_no_builtin_flag_F77 \ export_dynamic_flag_spec_F77 \ thread_safe_flag_spec_F77 \ whole_archive_flag_spec_F77 \ enable_shared_with_static_runtimes_F77 \ old_archive_cmds_F77 \ old_archive_from_new_cmds_F77 \ predep_objects_F77 \ postdep_objects_F77 \ predeps_F77 \ postdeps_F77 \ compiler_lib_search_path_F77 \ archive_cmds_F77 \ archive_expsym_cmds_F77 \ postinstall_cmds_F77 \ postuninstall_cmds_F77 \ old_archive_from_expsyms_cmds_F77 \ allow_undefined_flag_F77 \ no_undefined_flag_F77 \ export_symbols_cmds_F77 \ hardcode_libdir_flag_spec_F77 \ hardcode_libdir_flag_spec_ld_F77 \ hardcode_libdir_separator_F77 \ hardcode_automatic_F77 \ module_cmds_F77 \ module_expsym_cmds_F77 \ lt_cv_prog_compiler_c_o_F77 \ exclude_expsyms_F77 \ include_expsyms_F77; do case $var in old_archive_cmds_F77 | \ old_archive_from_new_cmds_F77 | \ archive_cmds_F77 | \ archive_expsym_cmds_F77 | \ module_cmds_F77 | \ module_expsym_cmds_F77 | \ old_archive_from_expsyms_cmds_F77 | \ export_symbols_cmds_F77 | \ extract_expsyms_cmds | reload_cmds | finish_cmds | \ postinstall_cmds | postuninstall_cmds | \ 
old_postinstall_cmds | old_postuninstall_cmds | \ sys_lib_search_path_spec | sys_lib_dlsearch_path_spec) # Double-quote double-evaled strings. eval "lt_$var=\\\"\`\$echo \"X\$$var\" | \$Xsed -e \"\$double_quote_subst\" -e \"\$sed_quote_subst\" -e \"\$delay_variable_subst\"\`\\\"" ;; *) eval "lt_$var=\\\"\`\$echo \"X\$$var\" | \$Xsed -e \"\$sed_quote_subst\"\`\\\"" ;; esac done case $lt_echo in *'\$0 --fallback-echo"') lt_echo=`$echo "X$lt_echo" | $Xsed -e 's/\\\\\\\$0 --fallback-echo"$/$0 --fallback-echo"/'` ;; esac cfgfile="$ofile" cat <<__EOF__ >> "$cfgfile" # ### BEGIN LIBTOOL TAG CONFIG: $tagname # Libtool was configured on host `(hostname || uname -n) 2>/dev/null | sed 1q`: # Shell to use when invoking shell scripts. SHELL=$lt_SHELL # Whether or not to build shared libraries. build_libtool_libs=$enable_shared # Whether or not to build static libraries. build_old_libs=$enable_static # Whether or not to add -lc for building shared libraries. build_libtool_need_lc=$archive_cmds_need_lc_F77 # Whether or not to disallow shared libs when runtime libs are static allow_libtool_libs_with_static_runtimes=$enable_shared_with_static_runtimes_F77 # Whether or not to optimize for fast installation. fast_install=$enable_fast_install # The host system. host_alias=$host_alias host=$host host_os=$host_os # The build system. build_alias=$build_alias build=$build build_os=$build_os # An echo program that does not interpret backslashes. echo=$lt_echo # The archiver. AR=$lt_AR AR_FLAGS=$lt_AR_FLAGS # A C compiler. LTCC=$lt_LTCC # LTCC compiler flags. LTCFLAGS=$lt_LTCFLAGS # A language-specific compiler. CC=$lt_compiler_F77 # Is the compiler the GNU C compiler? with_gcc=$GCC_F77 # An ERE matcher. EGREP=$lt_EGREP # The linker used to build libraries. LD=$lt_LD_F77 # Whether we need hard or soft links. LN_S=$lt_LN_S # A BSD-compatible nm program. 
NM=$lt_NM # A symbol stripping program STRIP=$lt_STRIP # Used to examine libraries when file_magic_cmd begins "file" MAGIC_CMD=$MAGIC_CMD # Used on cygwin: DLL creation program. DLLTOOL="$DLLTOOL" # Used on cygwin: object dumper. OBJDUMP="$OBJDUMP" # Used on cygwin: assembler. AS="$AS" # The name of the directory that contains temporary libtool files. objdir=$objdir # How to create reloadable object files. reload_flag=$lt_reload_flag reload_cmds=$lt_reload_cmds # How to pass a linker flag through the compiler. wl=$lt_lt_prog_compiler_wl_F77 # Object file suffix (normally "o"). objext="$ac_objext" # Old archive suffix (normally "a"). libext="$libext" # Shared library suffix (normally ".so"). shrext_cmds='$shrext_cmds' # Executable file suffix (normally ""). exeext="$exeext" # Additional compiler flags for building library objects. pic_flag=$lt_lt_prog_compiler_pic_F77 pic_mode=$pic_mode # What is the maximum length of a command? max_cmd_len=$lt_cv_sys_max_cmd_len # Does compiler simultaneously support -c and -o options? compiler_c_o=$lt_lt_cv_prog_compiler_c_o_F77 # Must we lock files when doing compilation? need_locks=$lt_need_locks # Do we need the lib prefix for modules? need_lib_prefix=$need_lib_prefix # Do we need a version for libraries? need_version=$need_version # Whether dlopen is supported. dlopen_support=$enable_dlopen # Whether dlopen of programs is supported. dlopen_self=$enable_dlopen_self # Whether dlopen of statically linked programs is supported. dlopen_self_static=$enable_dlopen_self_static # Compiler flag to prevent dynamic linking. link_static_flag=$lt_lt_prog_compiler_static_F77 # Compiler flag to turn off builtin functions. no_builtin_flag=$lt_lt_prog_compiler_no_builtin_flag_F77 # Compiler flag to allow reflexive dlopens. export_dynamic_flag_spec=$lt_export_dynamic_flag_spec_F77 # Compiler flag to generate shared objects directly from archives. 
whole_archive_flag_spec=$lt_whole_archive_flag_spec_F77 # Compiler flag to generate thread-safe objects. thread_safe_flag_spec=$lt_thread_safe_flag_spec_F77 # Library versioning type. version_type=$version_type # Format of library name prefix. libname_spec=$lt_libname_spec # List of archive names. First name is the real one, the rest are links. # The last name is the one that the linker finds with -lNAME. library_names_spec=$lt_library_names_spec # The coded name of the library, if different from the real name. soname_spec=$lt_soname_spec # Commands used to build and install an old-style archive. RANLIB=$lt_RANLIB old_archive_cmds=$lt_old_archive_cmds_F77 old_postinstall_cmds=$lt_old_postinstall_cmds old_postuninstall_cmds=$lt_old_postuninstall_cmds # Create an old-style archive from a shared archive. old_archive_from_new_cmds=$lt_old_archive_from_new_cmds_F77 # Create a temporary old-style archive to link instead of a shared archive. old_archive_from_expsyms_cmds=$lt_old_archive_from_expsyms_cmds_F77 # Commands used to build and install a shared archive. archive_cmds=$lt_archive_cmds_F77 archive_expsym_cmds=$lt_archive_expsym_cmds_F77 postinstall_cmds=$lt_postinstall_cmds postuninstall_cmds=$lt_postuninstall_cmds # Commands used to build a loadable module (assumed same as above if empty) module_cmds=$lt_module_cmds_F77 module_expsym_cmds=$lt_module_expsym_cmds_F77 # Commands to strip libraries. old_striplib=$lt_old_striplib striplib=$lt_striplib # Dependencies to place before the objects being linked to create a # shared library. predep_objects=$lt_predep_objects_F77 # Dependencies to place after the objects being linked to create a # shared library. postdep_objects=$lt_postdep_objects_F77 # Dependencies to place before the objects being linked to create a # shared library. predeps=$lt_predeps_F77 # Dependencies to place after the objects being linked to create a # shared library. 
postdeps=$lt_postdeps_F77 # The library search path used internally by the compiler when linking # a shared library. compiler_lib_search_path=$lt_compiler_lib_search_path_F77 # Method to check whether dependent libraries are shared objects. deplibs_check_method=$lt_deplibs_check_method # Command to use when deplibs_check_method == file_magic. file_magic_cmd=$lt_file_magic_cmd # Flag that allows shared libraries with undefined symbols to be built. allow_undefined_flag=$lt_allow_undefined_flag_F77 # Flag that forces no undefined symbols. no_undefined_flag=$lt_no_undefined_flag_F77 # Commands used to finish a libtool library installation in a directory. finish_cmds=$lt_finish_cmds # Same as above, but a single script fragment to be evaled but not shown. finish_eval=$lt_finish_eval # Take the output of nm and produce a listing of raw symbols and C names. global_symbol_pipe=$lt_lt_cv_sys_global_symbol_pipe # Transform the output of nm in a proper C declaration global_symbol_to_cdecl=$lt_lt_cv_sys_global_symbol_to_cdecl # Transform the output of nm in a C name address pair global_symbol_to_c_name_address=$lt_lt_cv_sys_global_symbol_to_c_name_address # This is the shared library runtime path variable. runpath_var=$runpath_var # This is the shared library path variable. shlibpath_var=$shlibpath_var # Is shlibpath searched before the hard-coded library search path? shlibpath_overrides_runpath=$shlibpath_overrides_runpath # How to hardcode a shared library path into an executable. hardcode_action=$hardcode_action_F77 # Whether we should hardcode library paths into libraries. hardcode_into_libs=$hardcode_into_libs # Flag to hardcode \$libdir into a binary during linking. # This must work even if \$libdir does not exist. hardcode_libdir_flag_spec=$lt_hardcode_libdir_flag_spec_F77 # If ld is used when linking, flag to hardcode \$libdir into # a binary during linking. This must work even if \$libdir does # not exist. 
hardcode_libdir_flag_spec_ld=$lt_hardcode_libdir_flag_spec_ld_F77 # Whether we need a single -rpath flag with a separated argument. hardcode_libdir_separator=$lt_hardcode_libdir_separator_F77 # Set to yes if using DIR/libNAME${shared_ext} during linking hardcodes DIR into the # resulting binary. hardcode_direct=$hardcode_direct_F77 # Set to yes if using the -LDIR flag during linking hardcodes DIR into the # resulting binary. hardcode_minus_L=$hardcode_minus_L_F77 # Set to yes if using SHLIBPATH_VAR=DIR during linking hardcodes DIR into # the resulting binary. hardcode_shlibpath_var=$hardcode_shlibpath_var_F77 # Set to yes if building a shared library automatically hardcodes DIR into the library # and all subsequent libraries and executables linked against it. hardcode_automatic=$hardcode_automatic_F77 # Variables whose values should be saved in libtool wrapper scripts and # restored at relink time. variables_saved_for_relink="$variables_saved_for_relink" # Whether libtool must link a program against all its dependency libraries. link_all_deplibs=$link_all_deplibs_F77 # Compile-time system search path for libraries sys_lib_search_path_spec=$lt_sys_lib_search_path_spec # Run-time system search path for libraries sys_lib_dlsearch_path_spec=$lt_sys_lib_dlsearch_path_spec # Fix the shell variable \$srcfile for the compiler. fix_srcfile_path="$fix_srcfile_path_F77" # Set to yes if exported symbols are required. always_export_symbols=$always_export_symbols_F77 # The commands to list exported symbols. export_symbols_cmds=$lt_export_symbols_cmds_F77 # The commands to extract the exported symbol list from a shared archive. extract_expsyms_cmds=$lt_extract_expsyms_cmds # Symbols that should not be listed in the preloaded symbols. exclude_expsyms=$lt_exclude_expsyms_F77 # Symbols that must always be exported. 
include_expsyms=$lt_include_expsyms_F77 # ### END LIBTOOL TAG CONFIG: $tagname __EOF__ else # If there is no Makefile yet, we rely on a make rule to execute # `config.status --recheck' to rerun these tests and create the # libtool script then. ltmain_in=`echo $ltmain | sed -e 's/\.sh$/.in/'` if test -f "$ltmain_in"; then test -f Makefile && make "$ltmain" fi fi ac_ext=c ac_cpp='$CPP $CPPFLAGS' ac_compile='$CC -c $CFLAGS $CPPFLAGS conftest.$ac_ext >&5' ac_link='$CC -o conftest$ac_exeext $CFLAGS $CPPFLAGS $LDFLAGS conftest.$ac_ext $LIBS >&5' ac_compiler_gnu=$ac_cv_c_compiler_gnu CC="$lt_save_CC" else tagname="" fi ;; GCJ) if test -n "$GCJ" && test "X$GCJ" != "Xno"; then # Source file extension for Java test sources. ac_ext=java # Object file extension for compiled Java test sources. objext=o objext_GCJ=$objext # Code to be used in simple compile tests lt_simple_compile_test_code="class foo {}\n" # Code to be used in simple link tests lt_simple_link_test_code='public class conftest { public static void main(String[] argv) {}; }\n' # ltmain only uses $CC for tagged configurations so make sure $CC is set. # If no C compiler was specified, use CC. LTCC=${LTCC-"$CC"} # If no C compiler flags were specified, use CFLAGS. LTCFLAGS=${LTCFLAGS-"$CFLAGS"} # Allow CC to be a program name with arguments. compiler=$CC # save warnings/boilerplate of simple test code ac_outfile=conftest.$ac_objext printf "$lt_simple_compile_test_code" >conftest.$ac_ext eval "$ac_compile" 2>&1 >/dev/null | $SED '/^$/d; /^ *+/d' >conftest.err _lt_compiler_boilerplate=`cat conftest.err` $rm conftest* ac_outfile=conftest.$ac_objext printf "$lt_simple_link_test_code" >conftest.$ac_ext eval "$ac_link" 2>&1 >/dev/null | $SED '/^$/d; /^ *+/d' >conftest.err _lt_linker_boilerplate=`cat conftest.err` $rm conftest* # Allow CC to be a program name with arguments. 
lt_save_CC="$CC" CC=${GCJ-"gcj"} compiler=$CC compiler_GCJ=$CC for cc_temp in $compiler""; do case $cc_temp in compile | *[\\/]compile | ccache | *[\\/]ccache ) ;; distcc | *[\\/]distcc | purify | *[\\/]purify ) ;; \-*) ;; *) break;; esac done cc_basename=`$echo "X$cc_temp" | $Xsed -e 's%.*/%%' -e "s%^$host_alias-%%"` # GCJ did not exist at the time GCC didn't implicitly link libc in. archive_cmds_need_lc_GCJ=no old_archive_cmds_GCJ=$old_archive_cmds lt_prog_compiler_no_builtin_flag_GCJ= if test "$GCC" = yes; then lt_prog_compiler_no_builtin_flag_GCJ=' -fno-builtin' echo "$as_me:$LINENO: checking if $compiler supports -fno-rtti -fno-exceptions" >&5 echo $ECHO_N "checking if $compiler supports -fno-rtti -fno-exceptions... $ECHO_C" >&6 if test "${lt_cv_prog_compiler_rtti_exceptions+set}" = set; then echo $ECHO_N "(cached) $ECHO_C" >&6 else lt_cv_prog_compiler_rtti_exceptions=no ac_outfile=conftest.$ac_objext printf "$lt_simple_compile_test_code" > conftest.$ac_ext lt_compiler_flag="-fno-rtti -fno-exceptions" # Insert the option either (1) after the last *FLAGS variable, or # (2) before a word containing "conftest.", or (3) at the end. # Note that $ac_compile itself does not contain backslashes and begins # with a dollar sign (not a hyphen), so the echo should work correctly. # The option is referenced via a variable to avoid confusing sed. lt_compile=`echo "$ac_compile" | $SED \ -e 's:.*FLAGS}\{0,1\} :&$lt_compiler_flag :; t' \ -e 's: [^ ]*conftest\.: $lt_compiler_flag&:; t' \ -e 's:$: $lt_compiler_flag:'` (eval echo "\"\$as_me:16836: $lt_compile\"" >&5) (eval "$lt_compile" 2>conftest.err) ac_status=$? cat conftest.err >&5 echo "$as_me:16840: \$? = $ac_status" >&5 if (exit $ac_status) && test -s "$ac_outfile"; then # The compiler can only warn and ignore the option if not recognized # So say no if there are warnings other than the usual output. 
$echo "X$_lt_compiler_boilerplate" | $Xsed -e '/^$/d' >conftest.exp $SED '/^$/d; /^ *+/d' conftest.err >conftest.er2 if test ! -s conftest.er2 || diff conftest.exp conftest.er2 >/dev/null; then lt_cv_prog_compiler_rtti_exceptions=yes fi fi $rm conftest* fi echo "$as_me:$LINENO: result: $lt_cv_prog_compiler_rtti_exceptions" >&5 echo "${ECHO_T}$lt_cv_prog_compiler_rtti_exceptions" >&6 if test x"$lt_cv_prog_compiler_rtti_exceptions" = xyes; then lt_prog_compiler_no_builtin_flag_GCJ="$lt_prog_compiler_no_builtin_flag_GCJ -fno-rtti -fno-exceptions" else : fi fi lt_prog_compiler_wl_GCJ= lt_prog_compiler_pic_GCJ= lt_prog_compiler_static_GCJ= echo "$as_me:$LINENO: checking for $compiler option to produce PIC" >&5 echo $ECHO_N "checking for $compiler option to produce PIC... $ECHO_C" >&6 if test "$GCC" = yes; then lt_prog_compiler_wl_GCJ='-Wl,' lt_prog_compiler_static_GCJ='-static' case $host_os in aix*) # All AIX code is PIC. if test "$host_cpu" = ia64; then # AIX 5 now supports IA64 processor lt_prog_compiler_static_GCJ='-Bstatic' fi ;; amigaos*) # FIXME: we need at least 68020 code to build shared libraries, but # adding the `-m68020' flag to GCC prevents building anything better, # like `-m68040'. lt_prog_compiler_pic_GCJ='-m68020 -resident32 -malways-restore-a4' ;; beos* | cygwin* | irix5* | irix6* | nonstopux* | osf3* | osf4* | osf5*) # PIC is the default for these OSes. ;; mingw* | pw32* | os2*) # This hack is so that the source file can tell whether it is being # built for inclusion in a dll (and should export symbols for example). lt_prog_compiler_pic_GCJ='-DDLL_EXPORT' ;; darwin* | rhapsody*) # PIC is the default on this platform # Common symbols not allowed in MH_DYLIB files lt_prog_compiler_pic_GCJ='-fno-common' ;; interix3*) # Interix 3.x gcc -fpic/-fPIC options generate broken code. # Instead, we relocate shared libraries at runtime. ;; msdosdjgpp*) # Just because we use GCC doesn't mean we suddenly get shared libraries # on systems that don't support them. 
lt_prog_compiler_can_build_shared_GCJ=no enable_shared=no ;; sysv4*MP*) if test -d /usr/nec; then lt_prog_compiler_pic_GCJ=-Kconform_pic fi ;; hpux*) # PIC is the default for IA64 HP-UX and 64-bit HP-UX, but # not for PA HP-UX. case $host_cpu in hppa*64*|ia64*) # +Z the default ;; *) lt_prog_compiler_pic_GCJ='-fPIC' ;; esac ;; *) lt_prog_compiler_pic_GCJ='-fPIC' ;; esac else # PORTME Check for flag to pass linker flags through the system compiler. case $host_os in aix*) lt_prog_compiler_wl_GCJ='-Wl,' if test "$host_cpu" = ia64; then # AIX 5 now supports IA64 processor lt_prog_compiler_static_GCJ='-Bstatic' else lt_prog_compiler_static_GCJ='-bnso -bI:/lib/syscalls.exp' fi ;; darwin*) # PIC is the default on this platform # Common symbols not allowed in MH_DYLIB files case $cc_basename in xlc*) lt_prog_compiler_pic_GCJ='-qnocommon' lt_prog_compiler_wl_GCJ='-Wl,' ;; esac ;; mingw* | pw32* | os2*) # This hack is so that the source file can tell whether it is being # built for inclusion in a dll (and should export symbols for example). lt_prog_compiler_pic_GCJ='-DDLL_EXPORT' ;; hpux9* | hpux10* | hpux11*) lt_prog_compiler_wl_GCJ='-Wl,' # PIC is the default for IA64 HP-UX and 64-bit HP-UX, but # not for PA HP-UX. case $host_cpu in hppa*64*|ia64*) # +Z the default ;; *) lt_prog_compiler_pic_GCJ='+Z' ;; esac # Is there a better lt_prog_compiler_static that works with the bundled CC? lt_prog_compiler_static_GCJ='${wl}-a ${wl}archive' ;; irix5* | irix6* | nonstopux*) lt_prog_compiler_wl_GCJ='-Wl,' # PIC (with -KPIC) is the default. 
lt_prog_compiler_static_GCJ='-non_shared' ;; newsos6) lt_prog_compiler_pic_GCJ='-KPIC' lt_prog_compiler_static_GCJ='-Bstatic' ;; linux*) case $cc_basename in icc* | ecc*) lt_prog_compiler_wl_GCJ='-Wl,' lt_prog_compiler_pic_GCJ='-KPIC' lt_prog_compiler_static_GCJ='-static' ;; pgcc* | pgf77* | pgf90* | pgf95*) # Portland Group compilers (*not* the Pentium gcc compiler, # which looks to be a dead project) lt_prog_compiler_wl_GCJ='-Wl,' lt_prog_compiler_pic_GCJ='-fpic' lt_prog_compiler_static_GCJ='-Bstatic' ;; ccc*) lt_prog_compiler_wl_GCJ='-Wl,' # All Alpha code is PIC. lt_prog_compiler_static_GCJ='-non_shared' ;; esac ;; osf3* | osf4* | osf5*) lt_prog_compiler_wl_GCJ='-Wl,' # All OSF/1 code is PIC. lt_prog_compiler_static_GCJ='-non_shared' ;; solaris*) lt_prog_compiler_pic_GCJ='-KPIC' lt_prog_compiler_static_GCJ='-Bstatic' case $cc_basename in f77* | f90* | f95*) lt_prog_compiler_wl_GCJ='-Qoption ld ';; *) lt_prog_compiler_wl_GCJ='-Wl,';; esac ;; sunos4*) lt_prog_compiler_wl_GCJ='-Qoption ld ' lt_prog_compiler_pic_GCJ='-PIC' lt_prog_compiler_static_GCJ='-Bstatic' ;; sysv4 | sysv4.2uw2* | sysv4.3*) lt_prog_compiler_wl_GCJ='-Wl,' lt_prog_compiler_pic_GCJ='-KPIC' lt_prog_compiler_static_GCJ='-Bstatic' ;; sysv4*MP*) if test -d /usr/nec ;then lt_prog_compiler_pic_GCJ='-Kconform_pic' lt_prog_compiler_static_GCJ='-Bstatic' fi ;; sysv5* | unixware* | sco3.2v5* | sco5v6* | OpenUNIX*) lt_prog_compiler_wl_GCJ='-Wl,' lt_prog_compiler_pic_GCJ='-KPIC' lt_prog_compiler_static_GCJ='-Bstatic' ;; unicos*) lt_prog_compiler_wl_GCJ='-Wl,' lt_prog_compiler_can_build_shared_GCJ=no ;; uts4*) lt_prog_compiler_pic_GCJ='-pic' lt_prog_compiler_static_GCJ='-Bstatic' ;; *) lt_prog_compiler_can_build_shared_GCJ=no ;; esac fi echo "$as_me:$LINENO: result: $lt_prog_compiler_pic_GCJ" >&5 echo "${ECHO_T}$lt_prog_compiler_pic_GCJ" >&6 # # Check to make sure the PIC flag actually works. 
# if test -n "$lt_prog_compiler_pic_GCJ"; then echo "$as_me:$LINENO: checking if $compiler PIC flag $lt_prog_compiler_pic_GCJ works" >&5 echo $ECHO_N "checking if $compiler PIC flag $lt_prog_compiler_pic_GCJ works... $ECHO_C" >&6 if test "${lt_prog_compiler_pic_works_GCJ+set}" = set; then echo $ECHO_N "(cached) $ECHO_C" >&6 else lt_prog_compiler_pic_works_GCJ=no ac_outfile=conftest.$ac_objext printf "$lt_simple_compile_test_code" > conftest.$ac_ext lt_compiler_flag="$lt_prog_compiler_pic_GCJ" # Insert the option either (1) after the last *FLAGS variable, or # (2) before a word containing "conftest.", or (3) at the end. # Note that $ac_compile itself does not contain backslashes and begins # with a dollar sign (not a hyphen), so the echo should work correctly. # The option is referenced via a variable to avoid confusing sed. lt_compile=`echo "$ac_compile" | $SED \ -e 's:.*FLAGS}\{0,1\} :&$lt_compiler_flag :; t' \ -e 's: [^ ]*conftest\.: $lt_compiler_flag&:; t' \ -e 's:$: $lt_compiler_flag:'` (eval echo "\"\$as_me:17104: $lt_compile\"" >&5) (eval "$lt_compile" 2>conftest.err) ac_status=$? cat conftest.err >&5 echo "$as_me:17108: \$? = $ac_status" >&5 if (exit $ac_status) && test -s "$ac_outfile"; then # The compiler can only warn and ignore the option if not recognized # So say no if there are warnings other than the usual output. $echo "X$_lt_compiler_boilerplate" | $Xsed -e '/^$/d' >conftest.exp $SED '/^$/d; /^ *+/d' conftest.err >conftest.er2 if test ! 
-s conftest.er2 || diff conftest.exp conftest.er2 >/dev/null; then lt_prog_compiler_pic_works_GCJ=yes fi fi $rm conftest* fi echo "$as_me:$LINENO: result: $lt_prog_compiler_pic_works_GCJ" >&5 echo "${ECHO_T}$lt_prog_compiler_pic_works_GCJ" >&6 if test x"$lt_prog_compiler_pic_works_GCJ" = xyes; then case $lt_prog_compiler_pic_GCJ in "" | " "*) ;; *) lt_prog_compiler_pic_GCJ=" $lt_prog_compiler_pic_GCJ" ;; esac else lt_prog_compiler_pic_GCJ= lt_prog_compiler_can_build_shared_GCJ=no fi fi case $host_os in # For platforms which do not support PIC, -DPIC is meaningless: *djgpp*) lt_prog_compiler_pic_GCJ= ;; *) lt_prog_compiler_pic_GCJ="$lt_prog_compiler_pic_GCJ" ;; esac # # Check to make sure the static flag actually works. # wl=$lt_prog_compiler_wl_GCJ eval lt_tmp_static_flag=\"$lt_prog_compiler_static_GCJ\" echo "$as_me:$LINENO: checking if $compiler static flag $lt_tmp_static_flag works" >&5 echo $ECHO_N "checking if $compiler static flag $lt_tmp_static_flag works... $ECHO_C" >&6 if test "${lt_prog_compiler_static_works_GCJ+set}" = set; then echo $ECHO_N "(cached) $ECHO_C" >&6 else lt_prog_compiler_static_works_GCJ=no save_LDFLAGS="$LDFLAGS" LDFLAGS="$LDFLAGS $lt_tmp_static_flag" printf "$lt_simple_link_test_code" > conftest.$ac_ext if (eval $ac_link 2>conftest.err) && test -s conftest$ac_exeext; then # The linker can only warn and ignore the option if not recognized # So say no if there are warnings if test -s conftest.err; then # Append any errors to the config.log. 
cat conftest.err 1>&5 $echo "X$_lt_linker_boilerplate" | $Xsed -e '/^$/d' > conftest.exp $SED '/^$/d; /^ *+/d' conftest.err >conftest.er2 if diff conftest.exp conftest.er2 >/dev/null; then lt_prog_compiler_static_works_GCJ=yes fi else lt_prog_compiler_static_works_GCJ=yes fi fi $rm conftest* LDFLAGS="$save_LDFLAGS" fi echo "$as_me:$LINENO: result: $lt_prog_compiler_static_works_GCJ" >&5 echo "${ECHO_T}$lt_prog_compiler_static_works_GCJ" >&6 if test x"$lt_prog_compiler_static_works_GCJ" = xyes; then : else lt_prog_compiler_static_GCJ= fi echo "$as_me:$LINENO: checking if $compiler supports -c -o file.$ac_objext" >&5 echo $ECHO_N "checking if $compiler supports -c -o file.$ac_objext... $ECHO_C" >&6 if test "${lt_cv_prog_compiler_c_o_GCJ+set}" = set; then echo $ECHO_N "(cached) $ECHO_C" >&6 else lt_cv_prog_compiler_c_o_GCJ=no $rm -r conftest 2>/dev/null mkdir conftest cd conftest mkdir out printf "$lt_simple_compile_test_code" > conftest.$ac_ext lt_compiler_flag="-o out/conftest2.$ac_objext" # Insert the option either (1) after the last *FLAGS variable, or # (2) before a word containing "conftest.", or (3) at the end. # Note that $ac_compile itself does not contain backslashes and begins # with a dollar sign (not a hyphen), so the echo should work correctly. lt_compile=`echo "$ac_compile" | $SED \ -e 's:.*FLAGS}\{0,1\} :&$lt_compiler_flag :; t' \ -e 's: [^ ]*conftest\.: $lt_compiler_flag&:; t' \ -e 's:$: $lt_compiler_flag:'` (eval echo "\"\$as_me:17208: $lt_compile\"" >&5) (eval "$lt_compile" 2>out/conftest.err) ac_status=$? cat out/conftest.err >&5 echo "$as_me:17212: \$? = $ac_status" >&5 if (exit $ac_status) && test -s out/conftest2.$ac_objext then # The compiler can only warn and ignore the option if not recognized # So say no if there are warnings $echo "X$_lt_compiler_boilerplate" | $Xsed -e '/^$/d' > out/conftest.exp $SED '/^$/d; /^ *+/d' out/conftest.err >out/conftest.er2 if test ! 
-s out/conftest.er2 || diff out/conftest.exp out/conftest.er2 >/dev/null; then lt_cv_prog_compiler_c_o_GCJ=yes fi fi chmod u+w . 2>&5 $rm conftest* # SGI C++ compiler will create directory out/ii_files/ for # template instantiation test -d out/ii_files && $rm out/ii_files/* && rmdir out/ii_files $rm out/* && rmdir out cd .. rmdir conftest $rm conftest* fi echo "$as_me:$LINENO: result: $lt_cv_prog_compiler_c_o_GCJ" >&5 echo "${ECHO_T}$lt_cv_prog_compiler_c_o_GCJ" >&6 hard_links="nottested" if test "$lt_cv_prog_compiler_c_o_GCJ" = no && test "$need_locks" != no; then # do not overwrite the value of need_locks provided by the user echo "$as_me:$LINENO: checking if we can lock with hard links" >&5 echo $ECHO_N "checking if we can lock with hard links... $ECHO_C" >&6 hard_links=yes $rm conftest* ln conftest.a conftest.b 2>/dev/null && hard_links=no touch conftest.a ln conftest.a conftest.b 2>&5 || hard_links=no ln conftest.a conftest.b 2>/dev/null && hard_links=no echo "$as_me:$LINENO: result: $hard_links" >&5 echo "${ECHO_T}$hard_links" >&6 if test "$hard_links" = no; then { echo "$as_me:$LINENO: WARNING: \`$CC' does not support \`-c -o', so \`make -j' may be unsafe" >&5 echo "$as_me: WARNING: \`$CC' does not support \`-c -o', so \`make -j' may be unsafe" >&2;} need_locks=warn fi else need_locks=no fi echo "$as_me:$LINENO: checking whether the $compiler linker ($LD) supports shared libraries" >&5 echo $ECHO_N "checking whether the $compiler linker ($LD) supports shared libraries... 
$ECHO_C" >&6 runpath_var= allow_undefined_flag_GCJ= enable_shared_with_static_runtimes_GCJ=no archive_cmds_GCJ= archive_expsym_cmds_GCJ= old_archive_From_new_cmds_GCJ= old_archive_from_expsyms_cmds_GCJ= export_dynamic_flag_spec_GCJ= whole_archive_flag_spec_GCJ= thread_safe_flag_spec_GCJ= hardcode_libdir_flag_spec_GCJ= hardcode_libdir_flag_spec_ld_GCJ= hardcode_libdir_separator_GCJ= hardcode_direct_GCJ=no hardcode_minus_L_GCJ=no hardcode_shlibpath_var_GCJ=unsupported link_all_deplibs_GCJ=unknown hardcode_automatic_GCJ=no module_cmds_GCJ= module_expsym_cmds_GCJ= always_export_symbols_GCJ=no export_symbols_cmds_GCJ='$NM $libobjs $convenience | $global_symbol_pipe | $SED '\''s/.* //'\'' | sort | uniq > $export_symbols' # include_expsyms should be a list of space-separated symbols to be *always* # included in the symbol list include_expsyms_GCJ= # exclude_expsyms can be an extended regexp of symbols to exclude # it will be wrapped by ` (' and `)$', so one must not match beginning or # end of line. Example: `a|bc|.*d.*' will exclude the symbols `a' and `bc', # as well as any symbol that contains `d'. exclude_expsyms_GCJ="_GLOBAL_OFFSET_TABLE_" # Although _GLOBAL_OFFSET_TABLE_ is a valid symbol C name, most a.out # platforms (ab)use it in PIC code, but their linkers get confused if # the symbol is explicitly referenced. Since portable code cannot # rely on this symbol name, it's probably fine to never include it in # preloaded symbol tables. extract_expsyms_cmds= # Just being paranoid about ensuring that cc_basename is set. for cc_temp in $compiler""; do case $cc_temp in compile | *[\\/]compile | ccache | *[\\/]ccache ) ;; distcc | *[\\/]distcc | purify | *[\\/]purify ) ;; \-*) ;; *) break;; esac done cc_basename=`$echo "X$cc_temp" | $Xsed -e 's%.*/%%' -e "s%^$host_alias-%%"` case $host_os in cygwin* | mingw* | pw32*) # FIXME: the MSVC++ port hasn't been tested in a loooong time # When not using gcc, we currently assume that we are using # Microsoft Visual C++. 
if test "$GCC" != yes; then with_gnu_ld=no fi ;; interix*) # we just hope/assume this is gcc and not c89 (= MSVC++) with_gnu_ld=yes ;; openbsd*) with_gnu_ld=no ;; esac ld_shlibs_GCJ=yes if test "$with_gnu_ld" = yes; then # If archive_cmds runs LD, not CC, wlarc should be empty wlarc='${wl}' # Set some defaults for GNU ld with shared library support. These # are reset later if shared libraries are not supported. Putting them # here allows them to be overridden if necessary. runpath_var=LD_RUN_PATH hardcode_libdir_flag_spec_GCJ='${wl}--rpath ${wl}$libdir' export_dynamic_flag_spec_GCJ='${wl}--export-dynamic' # ancient GNU ld didn't support --whole-archive et. al. if $LD --help 2>&1 | grep 'no-whole-archive' > /dev/null; then whole_archive_flag_spec_GCJ="$wlarc"'--whole-archive$convenience '"$wlarc"'--no-whole-archive' else whole_archive_flag_spec_GCJ= fi supports_anon_versioning=no case `$LD -v 2>/dev/null` in *\ [01].* | *\ 2.[0-9].* | *\ 2.10.*) ;; # catch versions < 2.11 *\ 2.11.93.0.2\ *) supports_anon_versioning=yes ;; # RH7.3 ... *\ 2.11.92.0.12\ *) supports_anon_versioning=yes ;; # Mandrake 8.2 ... *\ 2.11.*) ;; # other 2.11 versions *) supports_anon_versioning=yes ;; esac # See if GNU ld supports shared libraries. case $host_os in aix3* | aix4* | aix5*) # On AIX/PPC, the GNU linker is very broken if test "$host_cpu" != ia64; then ld_shlibs_GCJ=no cat <<EOF 1>&2 *** Warning: the GNU linker, at least up to release 2.9.1, is reported *** to be unable to reliably create shared libraries on AIX. *** Therefore, libtool is disabling shared libraries support. If you *** really care for shared libraries, you may want to modify your PATH *** so that a non-GNU linker is found, and then restart. 
EOF fi ;; amigaos*) archive_cmds_GCJ='$rm $output_objdir/a2ixlibrary.data~$echo "#define NAME $libname" > $output_objdir/a2ixlibrary.data~$echo "#define LIBRARY_ID 1" >> $output_objdir/a2ixlibrary.data~$echo "#define VERSION $major" >> $output_objdir/a2ixlibrary.data~$echo "#define REVISION $revision" >> $output_objdir/a2ixlibrary.data~$AR $AR_FLAGS $lib $libobjs~$RANLIB $lib~(cd $output_objdir && a2ixlibrary -32)' hardcode_libdir_flag_spec_GCJ='-L$libdir' hardcode_minus_L_GCJ=yes # Samuel A. Falvo II reports # that the semantics of dynamic libraries on AmigaOS, at least up # to version 4, is to share data among multiple programs linked # with the same dynamic library. Since this doesn't match the # behavior of shared libraries on other platforms, we can't use # them. ld_shlibs_GCJ=no ;; beos*) if $LD --help 2>&1 | grep ': supported targets:.* elf' > /dev/null; then allow_undefined_flag_GCJ=unsupported # Joseph Beckenbach says some releases of gcc # support --undefined. This deserves some investigation. FIXME archive_cmds_GCJ='$CC -nostart $libobjs $deplibs $compiler_flags ${wl}-soname $wl$soname -o $lib' else ld_shlibs_GCJ=no fi ;; cygwin* | mingw* | pw32*) # _LT_AC_TAGVAR(hardcode_libdir_flag_spec, GCJ) is actually meaningless, # as there is no search path for DLLs. hardcode_libdir_flag_spec_GCJ='-L$libdir' allow_undefined_flag_GCJ=unsupported always_export_symbols_GCJ=no enable_shared_with_static_runtimes_GCJ=yes export_symbols_cmds_GCJ='$NM $libobjs $convenience | $global_symbol_pipe | $SED -e '\''/^[BCDGRS] /s/.* \([^ ]*\)/\1 DATA/'\'' | $SED -e '\''/^[AITW] /s/.* //'\'' | sort | uniq > $export_symbols' if $LD --help 2>&1 | grep 'auto-import' > /dev/null; then archive_cmds_GCJ='$CC -shared $libobjs $deplibs $compiler_flags -o $output_objdir/$soname ${wl}--enable-auto-image-base -Xlinker --out-implib -Xlinker $lib' # If the export-symbols file already is a .def file (1st line # is EXPORTS), use it as is; otherwise, prepend... 
archive_expsym_cmds_GCJ='if test "x`$SED 1q $export_symbols`" = xEXPORTS; then cp $export_symbols $output_objdir/$soname.def; else echo EXPORTS > $output_objdir/$soname.def; cat $export_symbols >> $output_objdir/$soname.def; fi~ $CC -shared $output_objdir/$soname.def $libobjs $deplibs $compiler_flags -o $output_objdir/$soname ${wl}--enable-auto-image-base -Xlinker --out-implib -Xlinker $lib' else ld_shlibs_GCJ=no fi ;; interix3*) hardcode_direct_GCJ=no hardcode_shlibpath_var_GCJ=no hardcode_libdir_flag_spec_GCJ='${wl}-rpath,$libdir' export_dynamic_flag_spec_GCJ='${wl}-E' # Hack: On Interix 3.x, we cannot compile PIC because of a broken gcc. # Instead, shared libraries are loaded at an image base (0x10000000 by # default) and relocated if they conflict, which is a slow very memory # consuming and fragmenting process. To avoid this, we pick a random, # 256 KiB-aligned image base between 0x50000000 and 0x6FFC0000 at link # time. Moving up from 0x10000000 also allows more sbrk(2) space. archive_cmds_GCJ='$CC -shared $pic_flag $libobjs $deplibs $compiler_flags ${wl}-h,$soname ${wl}--image-base,`expr ${RANDOM-$$} % 4096 / 2 \* 262144 + 1342177280` -o $lib' archive_expsym_cmds_GCJ='sed "s,^,_," $export_symbols >$output_objdir/$soname.expsym~$CC -shared $pic_flag $libobjs $deplibs $compiler_flags ${wl}-h,$soname ${wl}--retain-symbols-file,$output_objdir/$soname.expsym ${wl}--image-base,`expr ${RANDOM-$$} % 4096 / 2 \* 262144 + 1342177280` -o $lib' ;; linux*) if $LD --help 2>&1 | grep ': supported targets:.* elf' > /dev/null; then tmp_addflag= case $cc_basename,$host_cpu in pgcc*) # Portland Group C compiler whole_archive_flag_spec_GCJ='${wl}--whole-archive`for conv in $convenience\"\"; do test -n \"$conv\" && new_convenience=\"$new_convenience,$conv\"; done; $echo \"$new_convenience\"` ${wl}--no-whole-archive' tmp_addflag=' $pic_flag' ;; pgf77* | pgf90* | pgf95*) # Portland Group f77 and f90 compilers whole_archive_flag_spec_GCJ='${wl}--whole-archive`for conv in 
$convenience\"\"; do test -n \"$conv\" && new_convenience=\"$new_convenience,$conv\"; done; $echo \"$new_convenience\"` ${wl}--no-whole-archive' tmp_addflag=' $pic_flag -Mnomain' ;; ecc*,ia64* | icc*,ia64*) # Intel C compiler on ia64 tmp_addflag=' -i_dynamic' ;; efc*,ia64* | ifort*,ia64*) # Intel Fortran compiler on ia64 tmp_addflag=' -i_dynamic -nofor_main' ;; ifc* | ifort*) # Intel Fortran compiler tmp_addflag=' -nofor_main' ;; esac archive_cmds_GCJ='$CC -shared'"$tmp_addflag"' $libobjs $deplibs $compiler_flags ${wl}-soname $wl$soname -o $lib' if test $supports_anon_versioning = yes; then archive_expsym_cmds_GCJ='$echo "{ global:" > $output_objdir/$libname.ver~ cat $export_symbols | sed -e "s/\(.*\)/\1;/" >> $output_objdir/$libname.ver~ $echo "local: *; };" >> $output_objdir/$libname.ver~ $CC -shared'"$tmp_addflag"' $libobjs $deplibs $compiler_flags ${wl}-soname $wl$soname ${wl}-version-script ${wl}$output_objdir/$libname.ver -o $lib' fi else ld_shlibs_GCJ=no fi ;; netbsd*) if echo __ELF__ | $CC -E - | grep __ELF__ >/dev/null; then archive_cmds_GCJ='$LD -Bshareable $libobjs $deplibs $linker_flags -o $lib' wlarc= else archive_cmds_GCJ='$CC -shared $libobjs $deplibs $compiler_flags ${wl}-soname $wl$soname -o $lib' archive_expsym_cmds_GCJ='$CC -shared $libobjs $deplibs $compiler_flags ${wl}-soname $wl$soname ${wl}-retain-symbols-file $wl$export_symbols -o $lib' fi ;; solaris*) if $LD -v 2>&1 | grep 'BFD 2\.8' > /dev/null; then ld_shlibs_GCJ=no cat <<EOF 1>&2 *** Warning: The releases 2.8.* of the GNU linker cannot reliably *** create shared libraries on Solaris systems. Therefore, libtool *** is disabling shared libraries support. We urge you to upgrade GNU *** binutils to release 2.9.1 or newer. Another option is to modify *** your PATH or compiler configuration so that the native linker is *** used, and then restart. 
EOF elif $LD --help 2>&1 | grep ': supported targets:.* elf' > /dev/null; then archive_cmds_GCJ='$CC -shared $libobjs $deplibs $compiler_flags ${wl}-soname $wl$soname -o $lib' archive_expsym_cmds_GCJ='$CC -shared $libobjs $deplibs $compiler_flags ${wl}-soname $wl$soname ${wl}-retain-symbols-file $wl$export_symbols -o $lib' else ld_shlibs_GCJ=no fi ;; sysv5* | sco3.2v5* | sco5v6* | unixware* | OpenUNIX*) case `$LD -v 2>&1` in *\ [01].* | *\ 2.[0-9].* | *\ 2.1[0-5].*) ld_shlibs_GCJ=no cat <<_LT_EOF 1>&2 *** Warning: Releases of the GNU linker prior to 2.16.91.0.3 can not *** reliably create shared libraries on SCO systems. Therefore, libtool *** is disabling shared libraries support. We urge you to upgrade GNU *** binutils to release 2.16.91.0.3 or newer. Another option is to modify *** your PATH or compiler configuration so that the native linker is *** used, and then restart. _LT_EOF ;; *) if $LD --help 2>&1 | grep ': supported targets:.* elf' > /dev/null; then hardcode_libdir_flag_spec_GCJ='`test -z "$SCOABSPATH" && echo ${wl}-rpath,$libdir`' archive_cmds_GCJ='$CC -shared $libobjs $deplibs $compiler_flags ${wl}-soname,\${SCOABSPATH:+${install_libdir}/}$soname -o $lib' archive_expsym_cmds_GCJ='$CC -shared $libobjs $deplibs $compiler_flags ${wl}-soname,\${SCOABSPATH:+${install_libdir}/}$soname,-retain-symbols-file,$export_symbols -o $lib' else ld_shlibs_GCJ=no fi ;; esac ;; sunos4*) archive_cmds_GCJ='$LD -assert pure-text -Bshareable -o $lib $libobjs $deplibs $linker_flags' wlarc= hardcode_direct_GCJ=yes hardcode_shlibpath_var_GCJ=no ;; *) if $LD --help 2>&1 | grep ': supported targets:.* elf' > /dev/null; then archive_cmds_GCJ='$CC -shared $libobjs $deplibs $compiler_flags ${wl}-soname $wl$soname -o $lib' archive_expsym_cmds_GCJ='$CC -shared $libobjs $deplibs $compiler_flags ${wl}-soname $wl$soname ${wl}-retain-symbols-file $wl$export_symbols -o $lib' else ld_shlibs_GCJ=no fi ;; esac if test "$ld_shlibs_GCJ" = no; then runpath_var= hardcode_libdir_flag_spec_GCJ= 
export_dynamic_flag_spec_GCJ= whole_archive_flag_spec_GCJ= fi else # PORTME fill in a description of your system's linker (not GNU ld) case $host_os in aix3*) allow_undefined_flag_GCJ=unsupported always_export_symbols_GCJ=yes archive_expsym_cmds_GCJ='$LD -o $output_objdir/$soname $libobjs $deplibs $linker_flags -bE:$export_symbols -T512 -H512 -bM:SRE~$AR $AR_FLAGS $lib $output_objdir/$soname' # Note: this linker hardcodes the directories in LIBPATH if there # are no directories specified by -L. hardcode_minus_L_GCJ=yes if test "$GCC" = yes && test -z "$lt_prog_compiler_static"; then # Neither direct hardcoding nor static linking is supported with a # broken collect2. hardcode_direct_GCJ=unsupported fi ;; aix4* | aix5*) if test "$host_cpu" = ia64; then # On IA64, the linker does run time linking by default, so we don't # have to do anything special. aix_use_runtimelinking=no exp_sym_flag='-Bexport' no_entry_flag="" else # If we're using GNU nm, then we don't want the "-C" option. # -C means demangle to AIX nm, but means don't demangle with GNU nm if $NM -V 2>&1 | grep 'GNU' > /dev/null; then export_symbols_cmds_GCJ='$NM -Bpg $libobjs $convenience | awk '\''{ if (((\$2 == "T") || (\$2 == "D") || (\$2 == "B")) && (substr(\$3,1,1) != ".")) { print \$3 } }'\'' | sort -u > $export_symbols' else export_symbols_cmds_GCJ='$NM -BCpg $libobjs $convenience | awk '\''{ if (((\$2 == "T") || (\$2 == "D") || (\$2 == "B")) && (substr(\$3,1,1) != ".")) { print \$3 } }'\'' | sort -u > $export_symbols' fi aix_use_runtimelinking=no # Test if we are trying to use run time linking or normal # AIX style linking. If -brtl is somewhere in LDFLAGS, we # need to do runtime linking. 
case $host_os in aix4.[23]|aix4.[23].*|aix5*) for ld_flag in $LDFLAGS; do if (test $ld_flag = "-brtl" || test $ld_flag = "-Wl,-brtl"); then aix_use_runtimelinking=yes break fi done ;; esac exp_sym_flag='-bexport' no_entry_flag='-bnoentry' fi # When large executables or shared objects are built, AIX ld can # have problems creating the table of contents. If linking a library # or program results in "error TOC overflow" add -mminimal-toc to # CXXFLAGS/CFLAGS for g++/gcc. In the cases where that is not # enough to fix the problem, add -Wl,-bbigtoc to LDFLAGS. archive_cmds_GCJ='' hardcode_direct_GCJ=yes hardcode_libdir_separator_GCJ=':' link_all_deplibs_GCJ=yes if test "$GCC" = yes; then case $host_os in aix4.[012]|aix4.[012].*) # We only want to do this on AIX 4.2 and lower, the check # below for broken collect2 doesn't work under 4.3+ collect2name=`${CC} -print-prog-name=collect2` if test -f "$collect2name" && \ strings "$collect2name" | grep resolve_lib_name >/dev/null then # We have reworked collect2 hardcode_direct_GCJ=yes else # We have old collect2 hardcode_direct_GCJ=unsupported # It fails to find uninstalled libraries when the uninstalled # path is not listed in the libpath. Setting hardcode_minus_L # to unsupported forces relinking hardcode_minus_L_GCJ=yes hardcode_libdir_flag_spec_GCJ='-L$libdir' hardcode_libdir_separator_GCJ= fi ;; esac shared_flag='-shared' if test "$aix_use_runtimelinking" = yes; then shared_flag="$shared_flag "'${wl}-G' fi else # not using gcc if test "$host_cpu" = ia64; then # VisualAge C++, Version 5.5 for AIX 5L for IA-64, Beta 3 Release # chokes on -Wl,-G. The following line is correct: shared_flag='-G' else if test "$aix_use_runtimelinking" = yes; then shared_flag='${wl}-G' else shared_flag='${wl}-bM:SRE' fi fi fi # It seems that -bexpall does not export symbols beginning with # underscore (_), so it is better to generate a list of symbols to export. 
always_export_symbols_GCJ=yes if test "$aix_use_runtimelinking" = yes; then # Warning - without using the other runtime loading flags (-brtl), # -berok will link without error, but may produce a broken library. allow_undefined_flag_GCJ='-berok' # Determine the default libpath from the value encoded in an empty executable. cat >conftest.$ac_ext <<_ACEOF /* confdefs.h. */ _ACEOF cat confdefs.h >>conftest.$ac_ext cat >>conftest.$ac_ext <<_ACEOF /* end confdefs.h. */ int main () { ; return 0; } _ACEOF rm -f conftest.$ac_objext conftest$ac_exeext if { (eval echo "$as_me:$LINENO: \"$ac_link\"") >&5 (eval $ac_link) 2>conftest.er1 ac_status=$? grep -v '^ *+' conftest.er1 >conftest.err rm -f conftest.er1 cat conftest.err >&5 echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); } && { ac_try='test -z "$ac_c_werror_flag" || test ! -s conftest.err' { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 (eval $ac_try) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); }; } && { ac_try='test -s conftest$ac_exeext' { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 (eval $ac_try) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); }; }; then aix_libpath=`dump -H conftest$ac_exeext 2>/dev/null | $SED -n -e '/Import File Strings/,/^$/ { /^0/ { s/^0 *\(.*\)$/\1/; p; } }'` # Check for a 64-bit object if we didn't find anything. 
if test -z "$aix_libpath"; then aix_libpath=`dump -HX64 conftest$ac_exeext 2>/dev/null | $SED -n -e '/Import File Strings/,/^$/ { /^0/ { s/^0 *\(.*\)$/\1/; p; } }'`; fi else echo "$as_me: failed program was:" >&5 sed 's/^/| /' conftest.$ac_ext >&5 fi rm -f conftest.err conftest.$ac_objext \ conftest$ac_exeext conftest.$ac_ext if test -z "$aix_libpath"; then aix_libpath="/usr/lib:/lib"; fi hardcode_libdir_flag_spec_GCJ='${wl}-blibpath:$libdir:'"$aix_libpath" archive_expsym_cmds_GCJ="\$CC"' -o $output_objdir/$soname $libobjs $deplibs '"\${wl}$no_entry_flag"' $compiler_flags `if test "x${allow_undefined_flag}" != "x"; then echo "${wl}${allow_undefined_flag}"; else :; fi` '"\${wl}$exp_sym_flag:\$export_symbols $shared_flag" else if test "$host_cpu" = ia64; then hardcode_libdir_flag_spec_GCJ='${wl}-R $libdir:/usr/lib:/lib' allow_undefined_flag_GCJ="-z nodefs" archive_expsym_cmds_GCJ="\$CC $shared_flag"' -o $output_objdir/$soname $libobjs $deplibs '"\${wl}$no_entry_flag"' $compiler_flags ${wl}${allow_undefined_flag} '"\${wl}$exp_sym_flag:\$export_symbols" else # Determine the default libpath from the value encoded in an empty executable. cat >conftest.$ac_ext <<_ACEOF /* confdefs.h. */ _ACEOF cat confdefs.h >>conftest.$ac_ext cat >>conftest.$ac_ext <<_ACEOF /* end confdefs.h. */ int main () { ; return 0; } _ACEOF rm -f conftest.$ac_objext conftest$ac_exeext if { (eval echo "$as_me:$LINENO: \"$ac_link\"") >&5 (eval $ac_link) 2>conftest.er1 ac_status=$? grep -v '^ *+' conftest.er1 >conftest.err rm -f conftest.er1 cat conftest.err >&5 echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); } && { ac_try='test -z "$ac_c_werror_flag" || test ! -s conftest.err' { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 (eval $ac_try) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); }; } && { ac_try='test -s conftest$ac_exeext' { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 (eval $ac_try) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? 
= $ac_status" >&5 (exit $ac_status); }; }; then aix_libpath=`dump -H conftest$ac_exeext 2>/dev/null | $SED -n -e '/Import File Strings/,/^$/ { /^0/ { s/^0 *\(.*\)$/\1/; p; } }'` # Check for a 64-bit object if we didn't find anything. if test -z "$aix_libpath"; then aix_libpath=`dump -HX64 conftest$ac_exeext 2>/dev/null | $SED -n -e '/Import File Strings/,/^$/ { /^0/ { s/^0 *\(.*\)$/\1/; p; } }'`; fi else echo "$as_me: failed program was:" >&5 sed 's/^/| /' conftest.$ac_ext >&5 fi rm -f conftest.err conftest.$ac_objext \ conftest$ac_exeext conftest.$ac_ext if test -z "$aix_libpath"; then aix_libpath="/usr/lib:/lib"; fi hardcode_libdir_flag_spec_GCJ='${wl}-blibpath:$libdir:'"$aix_libpath" # Warning - without using the other run time loading flags, # -berok will link without error, but may produce a broken library. no_undefined_flag_GCJ=' ${wl}-bernotok' allow_undefined_flag_GCJ=' ${wl}-berok' # Exported symbols can be pulled into shared objects from archives whole_archive_flag_spec_GCJ='$convenience' archive_cmds_need_lc_GCJ=yes # This is similar to how AIX traditionally builds its shared libraries. 
archive_expsym_cmds_GCJ="\$CC $shared_flag"' -o $output_objdir/$soname $libobjs $deplibs ${wl}-bnoentry $compiler_flags ${wl}-bE:$export_symbols${allow_undefined_flag}~$AR $AR_FLAGS $output_objdir/$libname$release.a $output_objdir/$soname' fi fi ;; amigaos*) archive_cmds_GCJ='$rm $output_objdir/a2ixlibrary.data~$echo "#define NAME $libname" > $output_objdir/a2ixlibrary.data~$echo "#define LIBRARY_ID 1" >> $output_objdir/a2ixlibrary.data~$echo "#define VERSION $major" >> $output_objdir/a2ixlibrary.data~$echo "#define REVISION $revision" >> $output_objdir/a2ixlibrary.data~$AR $AR_FLAGS $lib $libobjs~$RANLIB $lib~(cd $output_objdir && a2ixlibrary -32)' hardcode_libdir_flag_spec_GCJ='-L$libdir' hardcode_minus_L_GCJ=yes # see comment about different semantics on the GNU ld section ld_shlibs_GCJ=no ;; bsdi[45]*) export_dynamic_flag_spec_GCJ=-rdynamic ;; cygwin* | mingw* | pw32*) # When not using gcc, we currently assume that we are using # Microsoft Visual C++. # hardcode_libdir_flag_spec is actually meaningless, as there is # no search path for DLLs. hardcode_libdir_flag_spec_GCJ=' ' allow_undefined_flag_GCJ=unsupported # Tell ltmain to make .lib files, not .a files. libext=lib # Tell ltmain to make .dll files, not .so files. shrext_cmds=".dll" # FIXME: Setting linknames here is a bad hack. archive_cmds_GCJ='$CC -o $lib $libobjs $compiler_flags `echo "$deplibs" | $SED -e '\''s/ -lc$//'\''` -link -dll~linknames=' # The linker will automatically build a .lib file if we build a DLL. old_archive_From_new_cmds_GCJ='true' # FIXME: Should let the user specify the lib program. 
old_archive_cmds_GCJ='lib /OUT:$oldlib$oldobjs$old_deplibs' fix_srcfile_path_GCJ='`cygpath -w "$srcfile"`' enable_shared_with_static_runtimes_GCJ=yes ;; darwin* | rhapsody*) case $host_os in rhapsody* | darwin1.[012]) allow_undefined_flag_GCJ='${wl}-undefined ${wl}suppress' ;; *) # Darwin 1.3 on if test -z ${MACOSX_DEPLOYMENT_TARGET} ; then allow_undefined_flag_GCJ='${wl}-flat_namespace ${wl}-undefined ${wl}suppress' else case ${MACOSX_DEPLOYMENT_TARGET} in 10.[012]) allow_undefined_flag_GCJ='${wl}-flat_namespace ${wl}-undefined ${wl}suppress' ;; 10.*) allow_undefined_flag_GCJ='${wl}-undefined ${wl}dynamic_lookup' ;; esac fi ;; esac archive_cmds_need_lc_GCJ=no hardcode_direct_GCJ=no hardcode_automatic_GCJ=yes hardcode_shlibpath_var_GCJ=unsupported whole_archive_flag_spec_GCJ='' link_all_deplibs_GCJ=yes if test "$GCC" = yes ; then output_verbose_link_cmd='echo' archive_cmds_GCJ='$CC -dynamiclib $allow_undefined_flag -o $lib $libobjs $deplibs $compiler_flags -install_name $rpath/$soname $verstring' module_cmds_GCJ='$CC $allow_undefined_flag -o $lib -bundle $libobjs $deplibs$compiler_flags' # Don't fix this by using the ld -exported_symbols_list flag, it doesn't exist in older darwin lds archive_expsym_cmds_GCJ='sed -e "s,#.*,," -e "s,^[ ]*,," -e "s,^\(..*\),_&," < $export_symbols > $output_objdir/${libname}-symbols.expsym~$CC -dynamiclib $allow_undefined_flag -o $lib $libobjs $deplibs $compiler_flags -install_name $rpath/$soname $verstring~nmedit -s $output_objdir/${libname}-symbols.expsym ${lib}' module_expsym_cmds_GCJ='sed -e "s,#.*,," -e "s,^[ ]*,," -e "s,^\(..*\),_&," < $export_symbols > $output_objdir/${libname}-symbols.expsym~$CC $allow_undefined_flag -o $lib -bundle $libobjs $deplibs$compiler_flags~nmedit -s $output_objdir/${libname}-symbols.expsym ${lib}' else case $cc_basename in xlc*) output_verbose_link_cmd='echo' archive_cmds_GCJ='$CC -qmkshrobj $allow_undefined_flag -o $lib $libobjs $deplibs $compiler_flags ${wl}-install_name ${wl}`echo $rpath/$soname` 
$verstring' module_cmds_GCJ='$CC $allow_undefined_flag -o $lib -bundle $libobjs $deplibs$compiler_flags' # Don't fix this by using the ld -exported_symbols_list flag, it doesn't exist in older darwin lds archive_expsym_cmds_GCJ='sed -e "s,#.*,," -e "s,^[ ]*,," -e "s,^\(..*\),_&," < $export_symbols > $output_objdir/${libname}-symbols.expsym~$CC -qmkshrobj $allow_undefined_flag -o $lib $libobjs $deplibs $compiler_flags ${wl}-install_name ${wl}$rpath/$soname $verstring~nmedit -s $output_objdir/${libname}-symbols.expsym ${lib}' module_expsym_cmds_GCJ='sed -e "s,#.*,," -e "s,^[ ]*,," -e "s,^\(..*\),_&," < $export_symbols > $output_objdir/${libname}-symbols.expsym~$CC $allow_undefined_flag -o $lib -bundle $libobjs $deplibs$compiler_flags~nmedit -s $output_objdir/${libname}-symbols.expsym ${lib}' ;; *) ld_shlibs_GCJ=no ;; esac fi ;; dgux*) archive_cmds_GCJ='$LD -G -h $soname -o $lib $libobjs $deplibs $linker_flags' hardcode_libdir_flag_spec_GCJ='-L$libdir' hardcode_shlibpath_var_GCJ=no ;; freebsd1*) ld_shlibs_GCJ=no ;; # FreeBSD 2.2.[012] allows us to include c++rt0.o to get C++ constructor # support. Future versions do this automatically, but an explicit c++rt0.o # does not break anything, and helps significantly (at the cost of a little # extra space). freebsd2.2*) archive_cmds_GCJ='$LD -Bshareable -o $lib $libobjs $deplibs $linker_flags /usr/lib/c++rt0.o' hardcode_libdir_flag_spec_GCJ='-R$libdir' hardcode_direct_GCJ=yes hardcode_shlibpath_var_GCJ=no ;; # Unfortunately, older versions of FreeBSD 2 do not have this feature. freebsd2*) archive_cmds_GCJ='$LD -Bshareable -o $lib $libobjs $deplibs $linker_flags' hardcode_direct_GCJ=yes hardcode_minus_L_GCJ=yes hardcode_shlibpath_var_GCJ=no ;; # FreeBSD 3 and greater uses gcc -shared to do shared libraries. 
freebsd* | kfreebsd*-gnu | dragonfly*) archive_cmds_GCJ='$CC -shared -o $lib $libobjs $deplibs $compiler_flags' hardcode_libdir_flag_spec_GCJ='-R$libdir' hardcode_direct_GCJ=yes hardcode_shlibpath_var_GCJ=no ;; hpux9*) if test "$GCC" = yes; then archive_cmds_GCJ='$rm $output_objdir/$soname~$CC -shared -fPIC ${wl}+b ${wl}$install_libdir -o $output_objdir/$soname $libobjs $deplibs $compiler_flags~test $output_objdir/$soname = $lib || mv $output_objdir/$soname $lib' else archive_cmds_GCJ='$rm $output_objdir/$soname~$LD -b +b $install_libdir -o $output_objdir/$soname $libobjs $deplibs $linker_flags~test $output_objdir/$soname = $lib || mv $output_objdir/$soname $lib' fi hardcode_libdir_flag_spec_GCJ='${wl}+b ${wl}$libdir' hardcode_libdir_separator_GCJ=: hardcode_direct_GCJ=yes # hardcode_minus_L: Not really in the search PATH, # but as the default location of the library. hardcode_minus_L_GCJ=yes export_dynamic_flag_spec_GCJ='${wl}-E' ;; hpux10*) if test "$GCC" = yes -a "$with_gnu_ld" = no; then archive_cmds_GCJ='$CC -shared -fPIC ${wl}+h ${wl}$soname ${wl}+b ${wl}$install_libdir -o $lib $libobjs $deplibs $compiler_flags' else archive_cmds_GCJ='$LD -b +h $soname +b $install_libdir -o $lib $libobjs $deplibs $linker_flags' fi if test "$with_gnu_ld" = no; then hardcode_libdir_flag_spec_GCJ='${wl}+b ${wl}$libdir' hardcode_libdir_separator_GCJ=: hardcode_direct_GCJ=yes export_dynamic_flag_spec_GCJ='${wl}-E' # hardcode_minus_L: Not really in the search PATH, # but as the default location of the library. 
hardcode_minus_L_GCJ=yes fi ;; hpux11*) if test "$GCC" = yes -a "$with_gnu_ld" = no; then case $host_cpu in hppa*64*) archive_cmds_GCJ='$CC -shared ${wl}+h ${wl}$soname -o $lib $libobjs $deplibs $compiler_flags' ;; ia64*) archive_cmds_GCJ='$CC -shared ${wl}+h ${wl}$soname ${wl}+nodefaultrpath -o $lib $libobjs $deplibs $compiler_flags' ;; *) archive_cmds_GCJ='$CC -shared -fPIC ${wl}+h ${wl}$soname ${wl}+b ${wl}$install_libdir -o $lib $libobjs $deplibs $compiler_flags' ;; esac else case $host_cpu in hppa*64*) archive_cmds_GCJ='$CC -b ${wl}+h ${wl}$soname -o $lib $libobjs $deplibs $compiler_flags' ;; ia64*) archive_cmds_GCJ='$CC -b ${wl}+h ${wl}$soname ${wl}+nodefaultrpath -o $lib $libobjs $deplibs $compiler_flags' ;; *) archive_cmds_GCJ='$CC -b ${wl}+h ${wl}$soname ${wl}+b ${wl}$install_libdir -o $lib $libobjs $deplibs $compiler_flags' ;; esac fi if test "$with_gnu_ld" = no; then hardcode_libdir_flag_spec_GCJ='${wl}+b ${wl}$libdir' hardcode_libdir_separator_GCJ=: case $host_cpu in hppa*64*|ia64*) hardcode_libdir_flag_spec_ld_GCJ='+b $libdir' hardcode_direct_GCJ=no hardcode_shlibpath_var_GCJ=no ;; *) hardcode_direct_GCJ=yes export_dynamic_flag_spec_GCJ='${wl}-E' # hardcode_minus_L: Not really in the search PATH, # but as the default location of the library. 
hardcode_minus_L_GCJ=yes ;; esac fi ;; irix5* | irix6* | nonstopux*) if test "$GCC" = yes; then archive_cmds_GCJ='$CC -shared $libobjs $deplibs $compiler_flags ${wl}-soname ${wl}$soname `test -n "$verstring" && echo ${wl}-set_version ${wl}$verstring` ${wl}-update_registry ${wl}${output_objdir}/so_locations -o $lib' else archive_cmds_GCJ='$LD -shared $libobjs $deplibs $linker_flags -soname $soname `test -n "$verstring" && echo -set_version $verstring` -update_registry ${output_objdir}/so_locations -o $lib' hardcode_libdir_flag_spec_ld_GCJ='-rpath $libdir' fi hardcode_libdir_flag_spec_GCJ='${wl}-rpath ${wl}$libdir' hardcode_libdir_separator_GCJ=: link_all_deplibs_GCJ=yes ;; netbsd*) if echo __ELF__ | $CC -E - | grep __ELF__ >/dev/null; then archive_cmds_GCJ='$LD -Bshareable -o $lib $libobjs $deplibs $linker_flags' # a.out else archive_cmds_GCJ='$LD -shared -o $lib $libobjs $deplibs $linker_flags' # ELF fi hardcode_libdir_flag_spec_GCJ='-R$libdir' hardcode_direct_GCJ=yes hardcode_shlibpath_var_GCJ=no ;; newsos6) archive_cmds_GCJ='$LD -G -h $soname -o $lib $libobjs $deplibs $linker_flags' hardcode_direct_GCJ=yes hardcode_libdir_flag_spec_GCJ='${wl}-rpath ${wl}$libdir' hardcode_libdir_separator_GCJ=: hardcode_shlibpath_var_GCJ=no ;; openbsd*) hardcode_direct_GCJ=yes hardcode_shlibpath_var_GCJ=no if test -z "`echo __ELF__ | $CC -E - | grep __ELF__`" || test "$host_os-$host_cpu" = "openbsd2.8-powerpc"; then archive_cmds_GCJ='$CC -shared $pic_flag -o $lib $libobjs $deplibs $compiler_flags' archive_expsym_cmds_GCJ='$CC -shared $pic_flag -o $lib $libobjs $deplibs $compiler_flags ${wl}-retain-symbols-file,$export_symbols' hardcode_libdir_flag_spec_GCJ='${wl}-rpath,$libdir' export_dynamic_flag_spec_GCJ='${wl}-E' else case $host_os in openbsd[01].* | openbsd2.[0-7] | openbsd2.[0-7].*) archive_cmds_GCJ='$LD -Bshareable -o $lib $libobjs $deplibs $linker_flags' hardcode_libdir_flag_spec_GCJ='-R$libdir' ;; *) archive_cmds_GCJ='$CC -shared $pic_flag -o $lib $libobjs $deplibs 
$compiler_flags' hardcode_libdir_flag_spec_GCJ='${wl}-rpath,$libdir' ;; esac fi ;; os2*) hardcode_libdir_flag_spec_GCJ='-L$libdir' hardcode_minus_L_GCJ=yes allow_undefined_flag_GCJ=unsupported archive_cmds_GCJ='$echo "LIBRARY $libname INITINSTANCE" > $output_objdir/$libname.def~$echo "DESCRIPTION \"$libname\"" >> $output_objdir/$libname.def~$echo DATA >> $output_objdir/$libname.def~$echo " SINGLE NONSHARED" >> $output_objdir/$libname.def~$echo EXPORTS >> $output_objdir/$libname.def~emxexp $libobjs >> $output_objdir/$libname.def~$CC -Zdll -Zcrtdll -o $lib $libobjs $deplibs $compiler_flags $output_objdir/$libname.def' old_archive_From_new_cmds_GCJ='emximp -o $output_objdir/$libname.a $output_objdir/$libname.def' ;; osf3*) if test "$GCC" = yes; then allow_undefined_flag_GCJ=' ${wl}-expect_unresolved ${wl}\*' archive_cmds_GCJ='$CC -shared${allow_undefined_flag} $libobjs $deplibs $compiler_flags ${wl}-soname ${wl}$soname `test -n "$verstring" && echo ${wl}-set_version ${wl}$verstring` ${wl}-update_registry ${wl}${output_objdir}/so_locations -o $lib' else allow_undefined_flag_GCJ=' -expect_unresolved \*' archive_cmds_GCJ='$LD -shared${allow_undefined_flag} $libobjs $deplibs $linker_flags -soname $soname `test -n "$verstring" && echo -set_version $verstring` -update_registry ${output_objdir}/so_locations -o $lib' fi hardcode_libdir_flag_spec_GCJ='${wl}-rpath ${wl}$libdir' hardcode_libdir_separator_GCJ=: ;; osf4* | osf5*) # as osf3* with the addition of -msym flag if test "$GCC" = yes; then allow_undefined_flag_GCJ=' ${wl}-expect_unresolved ${wl}\*' archive_cmds_GCJ='$CC -shared${allow_undefined_flag} $libobjs $deplibs $compiler_flags ${wl}-msym ${wl}-soname ${wl}$soname `test -n "$verstring" && echo ${wl}-set_version ${wl}$verstring` ${wl}-update_registry ${wl}${output_objdir}/so_locations -o $lib' hardcode_libdir_flag_spec_GCJ='${wl}-rpath ${wl}$libdir' else allow_undefined_flag_GCJ=' -expect_unresolved \*' archive_cmds_GCJ='$LD -shared${allow_undefined_flag} $libobjs 
$deplibs $linker_flags -msym -soname $soname `test -n "$verstring" && echo -set_version $verstring` -update_registry ${output_objdir}/so_locations -o $lib' archive_expsym_cmds_GCJ='for i in `cat $export_symbols`; do printf "%s %s\\n" -exported_symbol "\$i" >> $lib.exp; done; echo "-hidden">> $lib.exp~ $LD -shared${allow_undefined_flag} -input $lib.exp $linker_flags $libobjs $deplibs -soname $soname `test -n "$verstring" && echo -set_version $verstring` -update_registry ${output_objdir}/so_locations -o $lib~$rm $lib.exp' # Both c and cxx compiler support -rpath directly hardcode_libdir_flag_spec_GCJ='-rpath $libdir' fi hardcode_libdir_separator_GCJ=: ;; solaris*) no_undefined_flag_GCJ=' -z text' if test "$GCC" = yes; then wlarc='${wl}' archive_cmds_GCJ='$CC -shared ${wl}-h ${wl}$soname -o $lib $libobjs $deplibs $compiler_flags' archive_expsym_cmds_GCJ='$echo "{ global:" > $lib.exp~cat $export_symbols | $SED -e "s/\(.*\)/\1;/" >> $lib.exp~$echo "local: *; };" >> $lib.exp~ $CC -shared ${wl}-M ${wl}$lib.exp ${wl}-h ${wl}$soname -o $lib $libobjs $deplibs $compiler_flags~$rm $lib.exp' else wlarc='' archive_cmds_GCJ='$LD -G${allow_undefined_flag} -h $soname -o $lib $libobjs $deplibs $linker_flags' archive_expsym_cmds_GCJ='$echo "{ global:" > $lib.exp~cat $export_symbols | $SED -e "s/\(.*\)/\1;/" >> $lib.exp~$echo "local: *; };" >> $lib.exp~ $LD -G${allow_undefined_flag} -M $lib.exp -h $soname -o $lib $libobjs $deplibs $linker_flags~$rm $lib.exp' fi hardcode_libdir_flag_spec_GCJ='-R$libdir' hardcode_shlibpath_var_GCJ=no case $host_os in solaris2.[0-5] | solaris2.[0-5].*) ;; *) # The compiler driver will combine linker options so we # cannot just pass the convenience library names through # without $wl, iff we do not link with $LD. # Luckily, gcc supports the same syntax we need for Sun Studio. # Supported since Solaris 2.6 (maybe 2.5.1?)
case $wlarc in '') whole_archive_flag_spec_GCJ='-z allextract$convenience -z defaultextract' ;; *) whole_archive_flag_spec_GCJ='${wl}-z ${wl}allextract`for conv in $convenience\"\"; do test -n \"$conv\" && new_convenience=\"$new_convenience,$conv\"; done; $echo \"$new_convenience\"` ${wl}-z ${wl}defaultextract' ;; esac ;; esac link_all_deplibs_GCJ=yes ;; sunos4*) if test "x$host_vendor" = xsequent; then # Use $CC to link under sequent, because it throws in some extra .o # files that make .init and .fini sections work. archive_cmds_GCJ='$CC -G ${wl}-h $soname -o $lib $libobjs $deplibs $compiler_flags' else archive_cmds_GCJ='$LD -assert pure-text -Bstatic -o $lib $libobjs $deplibs $linker_flags' fi hardcode_libdir_flag_spec_GCJ='-L$libdir' hardcode_direct_GCJ=yes hardcode_minus_L_GCJ=yes hardcode_shlibpath_var_GCJ=no ;; sysv4) case $host_vendor in sni) archive_cmds_GCJ='$LD -G -h $soname -o $lib $libobjs $deplibs $linker_flags' hardcode_direct_GCJ=yes # is this really true??? ;; siemens) ## LD is ld it makes a PLAMLIB ## CC just makes a GrossModule. 
archive_cmds_GCJ='$LD -G -o $lib $libobjs $deplibs $linker_flags' reload_cmds_GCJ='$CC -r -o $output$reload_objs' hardcode_direct_GCJ=no ;; motorola) archive_cmds_GCJ='$LD -G -h $soname -o $lib $libobjs $deplibs $linker_flags' hardcode_direct_GCJ=no #Motorola manual says yes, but my tests say they lie ;; esac runpath_var='LD_RUN_PATH' hardcode_shlibpath_var_GCJ=no ;; sysv4.3*) archive_cmds_GCJ='$LD -G -h $soname -o $lib $libobjs $deplibs $linker_flags' hardcode_shlibpath_var_GCJ=no export_dynamic_flag_spec_GCJ='-Bexport' ;; sysv4*MP*) if test -d /usr/nec; then archive_cmds_GCJ='$LD -G -h $soname -o $lib $libobjs $deplibs $linker_flags' hardcode_shlibpath_var_GCJ=no runpath_var=LD_RUN_PATH hardcode_runpath_var=yes ld_shlibs_GCJ=yes fi ;; sysv4*uw2* | sysv5OpenUNIX* | sysv5UnixWare7.[01].[10]* | unixware7*) no_undefined_flag_GCJ='${wl}-z,text' archive_cmds_need_lc_GCJ=no hardcode_shlibpath_var_GCJ=no runpath_var='LD_RUN_PATH' if test "$GCC" = yes; then archive_cmds_GCJ='$CC -shared ${wl}-h,$soname -o $lib $libobjs $deplibs $compiler_flags' archive_expsym_cmds_GCJ='$CC -shared ${wl}-Bexport:$export_symbols ${wl}-h,$soname -o $lib $libobjs $deplibs $compiler_flags' else archive_cmds_GCJ='$CC -G ${wl}-h,$soname -o $lib $libobjs $deplibs $compiler_flags' archive_expsym_cmds_GCJ='$CC -G ${wl}-Bexport:$export_symbols ${wl}-h,$soname -o $lib $libobjs $deplibs $compiler_flags' fi ;; sysv5* | sco3.2v5* | sco5v6*) # Note: We can NOT use -z defs as we might desire, because we do not # link with -lc, and that would cause any symbols used from libc to # always be unresolved, which means just about no library would # ever link correctly. If we're not using GNU ld we use -z text # though, which does catch some bad symbols but isn't as heavy-handed # as -z defs. 
no_undefined_flag_GCJ='${wl}-z,text' allow_undefined_flag_GCJ='${wl}-z,nodefs' archive_cmds_need_lc_GCJ=no hardcode_shlibpath_var_GCJ=no hardcode_libdir_flag_spec_GCJ='`test -z "$SCOABSPATH" && echo ${wl}-R,$libdir`' hardcode_libdir_separator_GCJ=':' link_all_deplibs_GCJ=yes export_dynamic_flag_spec_GCJ='${wl}-Bexport' runpath_var='LD_RUN_PATH' if test "$GCC" = yes; then archive_cmds_GCJ='$CC -shared ${wl}-h,\${SCOABSPATH:+${install_libdir}/}$soname -o $lib $libobjs $deplibs $compiler_flags' archive_expsym_cmds_GCJ='$CC -shared ${wl}-Bexport:$export_symbols ${wl}-h,\${SCOABSPATH:+${install_libdir}/}$soname -o $lib $libobjs $deplibs $compiler_flags' else archive_cmds_GCJ='$CC -G ${wl}-h,\${SCOABSPATH:+${install_libdir}/}$soname -o $lib $libobjs $deplibs $compiler_flags' archive_expsym_cmds_GCJ='$CC -G ${wl}-Bexport:$export_symbols ${wl}-h,\${SCOABSPATH:+${install_libdir}/}$soname -o $lib $libobjs $deplibs $compiler_flags' fi ;; uts4*) archive_cmds_GCJ='$LD -G -h $soname -o $lib $libobjs $deplibs $linker_flags' hardcode_libdir_flag_spec_GCJ='-L$libdir' hardcode_shlibpath_var_GCJ=no ;; *) ld_shlibs_GCJ=no ;; esac fi echo "$as_me:$LINENO: result: $ld_shlibs_GCJ" >&5 echo "${ECHO_T}$ld_shlibs_GCJ" >&6 test "$ld_shlibs_GCJ" = no && can_build_shared=no # # Do we need to explicitly link libc? # case "x$archive_cmds_need_lc_GCJ" in x|xyes) # Assume -lc should be added archive_cmds_need_lc_GCJ=yes if test "$enable_shared" = yes && test "$GCC" = yes; then case $archive_cmds_GCJ in *'~'*) # FIXME: we may have to deal with multi-command sequences. ;; '$CC '*) # Test whether the compiler implicitly links with -lc since on some # systems, -lgcc has to come before -lc. If gcc already passes -lc # to ld, don't add -lc before -lgcc. echo "$as_me:$LINENO: checking whether -lc should be explicitly linked in" >&5 echo $ECHO_N "checking whether -lc should be explicitly linked in... 
$ECHO_C" >&6 $rm conftest* printf "$lt_simple_compile_test_code" > conftest.$ac_ext if { (eval echo "$as_me:$LINENO: \"$ac_compile\"") >&5 (eval $ac_compile) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); } 2>conftest.err; then soname=conftest lib=conftest libobjs=conftest.$ac_objext deplibs= wl=$lt_prog_compiler_wl_GCJ pic_flag=$lt_prog_compiler_pic_GCJ compiler_flags=-v linker_flags=-v verstring= output_objdir=. libname=conftest lt_save_allow_undefined_flag=$allow_undefined_flag_GCJ allow_undefined_flag_GCJ= if { (eval echo "$as_me:$LINENO: \"$archive_cmds_GCJ 2\>\&1 \| grep \" -lc \" \>/dev/null 2\>\&1\"") >&5 (eval $archive_cmds_GCJ 2\>\&1 \| grep \" -lc \" \>/dev/null 2\>\&1) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); } then archive_cmds_need_lc_GCJ=no else archive_cmds_need_lc_GCJ=yes fi allow_undefined_flag_GCJ=$lt_save_allow_undefined_flag else cat conftest.err 1>&5 fi $rm conftest* echo "$as_me:$LINENO: result: $archive_cmds_need_lc_GCJ" >&5 echo "${ECHO_T}$archive_cmds_need_lc_GCJ" >&6 ;; esac fi ;; esac echo "$as_me:$LINENO: checking dynamic linker characteristics" >&5 echo $ECHO_N "checking dynamic linker characteristics... $ECHO_C" >&6 library_names_spec= libname_spec='lib$name' soname_spec= shrext_cmds=".so" postinstall_cmds= postuninstall_cmds= finish_cmds= finish_eval= shlibpath_var= shlibpath_overrides_runpath=unknown version_type=none dynamic_linker="$host_os ld.so" sys_lib_dlsearch_path_spec="/lib /usr/lib" if test "$GCC" = yes; then sys_lib_search_path_spec=`$CC -print-search-dirs | grep "^libraries:" | $SED -e "s/^libraries://" -e "s,=/,/,g"` if echo "$sys_lib_search_path_spec" | grep ';' >/dev/null ; then # if the path contains ";" then we assume it to be the separator # otherwise default to the standard path separator (i.e. 
":") - it is # assumed that no part of a normal pathname contains ";" but that should # okay in the real world where ";" in dirpaths is itself problematic. sys_lib_search_path_spec=`echo "$sys_lib_search_path_spec" | $SED -e 's/;/ /g'` else sys_lib_search_path_spec=`echo "$sys_lib_search_path_spec" | $SED -e "s/$PATH_SEPARATOR/ /g"` fi else sys_lib_search_path_spec="/lib /usr/lib /usr/local/lib" fi need_lib_prefix=unknown hardcode_into_libs=no # when you set need_version to no, make sure it does not cause -set_version # flags to be left without arguments need_version=unknown case $host_os in aix3*) version_type=linux library_names_spec='${libname}${release}${shared_ext}$versuffix $libname.a' shlibpath_var=LIBPATH # AIX 3 has no versioning support, so we append a major version to the name. soname_spec='${libname}${release}${shared_ext}$major' ;; aix4* | aix5*) version_type=linux need_lib_prefix=no need_version=no hardcode_into_libs=yes if test "$host_cpu" = ia64; then # AIX 5 supports IA64 library_names_spec='${libname}${release}${shared_ext}$major ${libname}${release}${shared_ext}$versuffix $libname${shared_ext}' shlibpath_var=LD_LIBRARY_PATH else # With GCC up to 2.95.x, collect2 would create an import file # for dependence libraries. The import file would start with # the line `#! .'. This would cause the generated library to # depend on `.', always an invalid library. This was fixed in # development snapshots of GCC prior to 3.0. case $host_os in aix4 | aix4.[01] | aix4.[01].*) if { echo '#if __GNUC__ > 2 || (__GNUC__ == 2 && __GNUC_MINOR__ >= 97)' echo ' yes ' echo '#endif'; } | ${CC} -E - | grep yes > /dev/null; then : else can_build_shared=no fi ;; esac # AIX (on Power*) has no versioning support, so currently we can not hardcode correct # soname into executable. Probably we can add versioning support to # collect2, so additional links can be useful in future. 
if test "$aix_use_runtimelinking" = yes; then # If using run time linking (on AIX 4.2 or later) use lib.so # instead of lib.a to let people know that these are not # typical AIX shared libraries. library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${release}${shared_ext}$major $libname${shared_ext}' else # We preserve .a as extension for shared libraries through AIX4.2 # and later when we are not doing run time linking. library_names_spec='${libname}${release}.a $libname.a' soname_spec='${libname}${release}${shared_ext}$major' fi shlibpath_var=LIBPATH fi ;; amigaos*) library_names_spec='$libname.ixlibrary $libname.a' # Create ${libname}_ixlibrary.a entries in /sys/libs. finish_eval='for lib in `ls $libdir/*.ixlibrary 2>/dev/null`; do libname=`$echo "X$lib" | $Xsed -e '\''s%^.*/\([^/]*\)\.ixlibrary$%\1%'\''`; test $rm /sys/libs/${libname}_ixlibrary.a; $show "cd /sys/libs && $LN_S $lib ${libname}_ixlibrary.a"; cd /sys/libs && $LN_S $lib ${libname}_ixlibrary.a || exit 1; done' ;; beos*) library_names_spec='${libname}${shared_ext}' dynamic_linker="$host_os ld.so" shlibpath_var=LIBRARY_PATH ;; bsdi[45]*) version_type=linux need_version=no library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${release}${shared_ext}$major $libname${shared_ext}' soname_spec='${libname}${release}${shared_ext}$major' finish_cmds='PATH="\$PATH:/sbin" ldconfig $libdir' shlibpath_var=LD_LIBRARY_PATH sys_lib_search_path_spec="/shlib /usr/lib /usr/X11/lib /usr/contrib/lib /lib /usr/local/lib" sys_lib_dlsearch_path_spec="/shlib /usr/lib /usr/local/lib" # the default ld.so.conf also contains /usr/contrib/lib and # /usr/X11R6/lib (/usr/X11 is a link to /usr/X11R6), but let us allow # libtool to hard-code these into programs ;; cygwin* | mingw* | pw32*) version_type=windows shrext_cmds=".dll" need_version=no need_lib_prefix=no case $GCC,$host_os in yes,cygwin* | yes,mingw* | yes,pw32*) library_names_spec='$libname.dll.a' # DLL is installed to 
$(libdir)/../bin by postinstall_cmds postinstall_cmds='base_file=`basename \${file}`~ dlpath=`$SHELL 2>&1 -c '\''. $dir/'\''\${base_file}'\''i;echo \$dlname'\''`~ dldir=$destdir/`dirname \$dlpath`~ test -d \$dldir || mkdir -p \$dldir~ $install_prog $dir/$dlname \$dldir/$dlname~ chmod a+x \$dldir/$dlname' postuninstall_cmds='dldll=`$SHELL 2>&1 -c '\''. $file; echo \$dlname'\''`~ dlpath=$dir/\$dldll~ $rm \$dlpath' shlibpath_overrides_runpath=yes case $host_os in cygwin*) # Cygwin DLLs use 'cyg' prefix rather than 'lib' soname_spec='`echo ${libname} | sed -e 's/^lib/cyg/'``echo ${release} | $SED -e 's/[.]/-/g'`${versuffix}${shared_ext}' sys_lib_search_path_spec="/usr/lib /lib/w32api /lib /usr/local/lib" ;; mingw*) # MinGW DLLs use traditional 'lib' prefix soname_spec='${libname}`echo ${release} | $SED -e 's/[.]/-/g'`${versuffix}${shared_ext}' sys_lib_search_path_spec=`$CC -print-search-dirs | grep "^libraries:" | $SED -e "s/^libraries://" -e "s,=/,/,g"` if echo "$sys_lib_search_path_spec" | grep ';[c-zC-Z]:/' >/dev/null; then # It is most probably a Windows format PATH printed by # mingw gcc, but we are running on Cygwin. Gcc prints its search # path with ; separators, and with drive letters. We can handle the # drive letters (cygwin fileutils understands them), so leave them, # especially as we might pass files found there to a mingw objdump, # which wouldn't understand a cygwinified path. Ahh. sys_lib_search_path_spec=`echo "$sys_lib_search_path_spec" | $SED -e 's/;/ /g'` else sys_lib_search_path_spec=`echo "$sys_lib_search_path_spec" | $SED -e "s/$PATH_SEPARATOR/ /g"` fi ;; pw32*) # pw32 DLLs use 'pw' prefix rather than 'lib' library_names_spec='`echo ${libname} | sed -e 's/^lib/pw/'``echo ${release} | $SED -e 's/[.]/-/g'`${versuffix}${shared_ext}' ;; esac ;; *) library_names_spec='${libname}`echo ${release} | $SED -e 's/[.]/-/g'`${versuffix}${shared_ext} $libname.lib' ;; esac dynamic_linker='Win32 ld.exe' # FIXME: first we should search . 
and the directory the executable is in shlibpath_var=PATH ;; darwin* | rhapsody*) dynamic_linker="$host_os dyld" version_type=darwin need_lib_prefix=no need_version=no library_names_spec='${libname}${release}${versuffix}$shared_ext ${libname}${release}${major}$shared_ext ${libname}$shared_ext' soname_spec='${libname}${release}${major}$shared_ext' shlibpath_overrides_runpath=yes shlibpath_var=DYLD_LIBRARY_PATH shrext_cmds='`test .$module = .yes && echo .so || echo .dylib`' # Apple's 'gcc -print-search-dirs' doesn't operate the same. if test "$GCC" = yes; then sys_lib_search_path_spec=`$CC -print-search-dirs | tr "\n" "$PATH_SEPARATOR" | sed -e 's/libraries:/@libraries:/' | tr "@" "\n" | grep "^libraries:" | sed -e "s/^libraries://" -e "s,=/,/,g" -e "s,$PATH_SEPARATOR, ,g" -e "s,.*,& /lib /usr/lib /usr/local/lib,g"` else sys_lib_search_path_spec='/lib /usr/lib /usr/local/lib' fi sys_lib_dlsearch_path_spec='/usr/local/lib /lib /usr/lib' ;; dgux*) version_type=linux need_lib_prefix=no need_version=no library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${release}${shared_ext}$major $libname$shared_ext' soname_spec='${libname}${release}${shared_ext}$major' shlibpath_var=LD_LIBRARY_PATH ;; freebsd1*) dynamic_linker=no ;; kfreebsd*-gnu) version_type=linux need_lib_prefix=no need_version=no library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${release}${shared_ext}$major ${libname}${shared_ext}' soname_spec='${libname}${release}${shared_ext}$major' shlibpath_var=LD_LIBRARY_PATH shlibpath_overrides_runpath=no hardcode_into_libs=yes dynamic_linker='GNU ld.so' ;; freebsd* | dragonfly*) # DragonFly does not have aout. When/if they implement a new # versioning mechanism, adjust this.
if test -x /usr/bin/objformat; then objformat=`/usr/bin/objformat` else case $host_os in freebsd[123]*) objformat=aout ;; *) objformat=elf ;; esac fi version_type=freebsd-$objformat case $version_type in freebsd-elf*) library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${release}${shared_ext} $libname${shared_ext}' need_version=no need_lib_prefix=no ;; freebsd-*) library_names_spec='${libname}${release}${shared_ext}$versuffix $libname${shared_ext}$versuffix' need_version=yes ;; esac shlibpath_var=LD_LIBRARY_PATH case $host_os in freebsd2*) shlibpath_overrides_runpath=yes ;; freebsd3.[01]* | freebsdelf3.[01]*) shlibpath_overrides_runpath=yes hardcode_into_libs=yes ;; freebsd3.[2-9]* | freebsdelf3.[2-9]* | \ freebsd4.[0-5] | freebsdelf4.[0-5] | freebsd4.1.1 | freebsdelf4.1.1) shlibpath_overrides_runpath=no hardcode_into_libs=yes ;; freebsd*) # from 4.6 on shlibpath_overrides_runpath=yes hardcode_into_libs=yes ;; esac ;; gnu*) version_type=linux need_lib_prefix=no need_version=no library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${release}${shared_ext}${major} ${libname}${shared_ext}' soname_spec='${libname}${release}${shared_ext}$major' shlibpath_var=LD_LIBRARY_PATH hardcode_into_libs=yes ;; hpux9* | hpux10* | hpux11*) # Give a soname corresponding to the major version so that dld.sl refuses to # link against other versions. version_type=sunos need_lib_prefix=no need_version=no case $host_cpu in ia64*) shrext_cmds='.so' hardcode_into_libs=yes dynamic_linker="$host_os dld.so" shlibpath_var=LD_LIBRARY_PATH shlibpath_overrides_runpath=yes # Unless +noenvvar is specified. 
    library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${release}${shared_ext}$major $libname${shared_ext}'
    soname_spec='${libname}${release}${shared_ext}$major'
    if test "X$HPUX_IA64_MODE" = X32; then
      sys_lib_search_path_spec="/usr/lib/hpux32 /usr/local/lib/hpux32 /usr/local/lib"
    else
      sys_lib_search_path_spec="/usr/lib/hpux64 /usr/local/lib/hpux64"
    fi
    sys_lib_dlsearch_path_spec=$sys_lib_search_path_spec
    ;;
  hppa*64*)
    shrext_cmds='.sl'
    hardcode_into_libs=yes
    dynamic_linker="$host_os dld.sl"
    shlibpath_var=LD_LIBRARY_PATH # How should we handle SHLIB_PATH
    shlibpath_overrides_runpath=yes # Unless +noenvvar is specified.
    library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${release}${shared_ext}$major $libname${shared_ext}'
    soname_spec='${libname}${release}${shared_ext}$major'
    sys_lib_search_path_spec="/usr/lib/pa20_64 /usr/ccs/lib/pa20_64"
    sys_lib_dlsearch_path_spec=$sys_lib_search_path_spec
    ;;
  *)
    shrext_cmds='.sl'
    dynamic_linker="$host_os dld.sl"
    shlibpath_var=SHLIB_PATH
    shlibpath_overrides_runpath=no # +s is required to enable SHLIB_PATH
    library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${release}${shared_ext}$major $libname${shared_ext}'
    soname_spec='${libname}${release}${shared_ext}$major'
    ;;
  esac
  # HP-UX runs *really* slowly unless shared libraries are mode 555.
  postinstall_cmds='chmod 555 $lib'
  ;;

interix3*)
  version_type=linux
  need_lib_prefix=no
  need_version=no
  library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${release}${shared_ext}$major ${libname}${shared_ext}'
  soname_spec='${libname}${release}${shared_ext}$major'
  dynamic_linker='Interix 3.x ld.so.1 (PE, like ELF)'
  shlibpath_var=LD_LIBRARY_PATH
  shlibpath_overrides_runpath=no
  hardcode_into_libs=yes
  ;;

irix5* | irix6* | nonstopux*)
  case $host_os in
    nonstopux*) version_type=nonstopux ;;
    *)
      if test "$lt_cv_prog_gnu_ld" = yes; then
        version_type=linux
      else
        version_type=irix
      fi ;;
  esac
  need_lib_prefix=no
  need_version=no
  soname_spec='${libname}${release}${shared_ext}$major'
  library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${release}${shared_ext}$major ${libname}${release}${shared_ext} $libname${shared_ext}'
  case $host_os in
  irix5* | nonstopux*)
    libsuff= shlibsuff=
    ;;
  *)
    case $LD in # libtool.m4 will add one of these switches to LD
    *-32|*"-32 "|*-melf32bsmip|*"-melf32bsmip ")
      libsuff= shlibsuff= libmagic=32-bit;;
    *-n32|*"-n32 "|*-melf32bmipn32|*"-melf32bmipn32 ")
      libsuff=32 shlibsuff=N32 libmagic=N32;;
    *-64|*"-64 "|*-melf64bmip|*"-melf64bmip ")
      libsuff=64 shlibsuff=64 libmagic=64-bit;;
    *) libsuff= shlibsuff= libmagic=never-match;;
    esac
    ;;
  esac
  shlibpath_var=LD_LIBRARY${shlibsuff}_PATH
  shlibpath_overrides_runpath=no
  sys_lib_search_path_spec="/usr/lib${libsuff} /lib${libsuff} /usr/local/lib${libsuff}"
  sys_lib_dlsearch_path_spec="/usr/lib${libsuff} /lib${libsuff}"
  hardcode_into_libs=yes
  ;;

# No shared lib support for Linux oldld, aout, or coff.
linux*oldld* | linux*aout* | linux*coff*)
  dynamic_linker=no
  ;;

# This must be Linux ELF.
linux*)
  version_type=linux
  need_lib_prefix=no
  need_version=no
  library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${release}${shared_ext}$major $libname${shared_ext}'
  soname_spec='${libname}${release}${shared_ext}$major'
  finish_cmds='PATH="\$PATH:/sbin" ldconfig -n $libdir'
  shlibpath_var=LD_LIBRARY_PATH
  shlibpath_overrides_runpath=no
  # This implies no fast_install, which is unacceptable.
  # Some rework will be needed to allow for fast_install
  # before this can be enabled.
  hardcode_into_libs=yes

  # Append ld.so.conf contents to the search path
  if test -f /etc/ld.so.conf; then
    lt_ld_extra=`awk '/^include / { system(sprintf("cd /etc; cat %s", \$2)); skip = 1; } { if (!skip) print \$0; skip = 0; }' < /etc/ld.so.conf | $SED -e 's/#.*//;s/[:, ]/ /g;s/=[^=]*$//;s/=[^= ]* / /g;/^$/d' | tr '\n' ' '`
    sys_lib_dlsearch_path_spec="/lib /usr/lib $lt_ld_extra"
  fi

  # We used to test for /lib/ld.so.1 and disable shared libraries on
  # powerpc, because MkLinux only supported shared libraries with the
  # GNU dynamic linker. Since this was broken with cross compilers,
  # most powerpc-linux boxes support dynamic linking these days and
  # people can always --disable-shared, the test was removed, and we
  # assume the GNU/Linux dynamic linker is in use.
  dynamic_linker='GNU/Linux ld.so'
  ;;

knetbsd*-gnu)
  version_type=linux
  need_lib_prefix=no
  need_version=no
  library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${release}${shared_ext}$major ${libname}${shared_ext}'
  soname_spec='${libname}${release}${shared_ext}$major'
  shlibpath_var=LD_LIBRARY_PATH
  shlibpath_overrides_runpath=no
  hardcode_into_libs=yes
  dynamic_linker='GNU ld.so'
  ;;

netbsd*)
  version_type=sunos
  need_lib_prefix=no
  need_version=no
  if echo __ELF__ | $CC -E - | grep __ELF__ >/dev/null; then
    library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${shared_ext}$versuffix'
    finish_cmds='PATH="\$PATH:/sbin" ldconfig -m $libdir'
    dynamic_linker='NetBSD (a.out) ld.so'
  else
    library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${release}${shared_ext}$major ${libname}${shared_ext}'
    soname_spec='${libname}${release}${shared_ext}$major'
    dynamic_linker='NetBSD ld.elf_so'
  fi
  shlibpath_var=LD_LIBRARY_PATH
  shlibpath_overrides_runpath=yes
  hardcode_into_libs=yes
  ;;

newsos6)
  version_type=linux
  library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${release}${shared_ext}$major $libname${shared_ext}'
  shlibpath_var=LD_LIBRARY_PATH
  shlibpath_overrides_runpath=yes
  ;;

nto-qnx*)
  version_type=linux
  need_lib_prefix=no
  need_version=no
  library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${release}${shared_ext}$major $libname${shared_ext}'
  soname_spec='${libname}${release}${shared_ext}$major'
  shlibpath_var=LD_LIBRARY_PATH
  shlibpath_overrides_runpath=yes
  ;;

openbsd*)
  version_type=sunos
  sys_lib_dlsearch_path_spec="/usr/lib"
  need_lib_prefix=no
  # Some older versions of OpenBSD (3.3 at least) *do* need versioned libs.
  case $host_os in
    openbsd3.3 | openbsd3.3.*) need_version=yes ;;
    *) need_version=no ;;
  esac
  library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${shared_ext}$versuffix'
  finish_cmds='PATH="\$PATH:/sbin" ldconfig -m $libdir'
  shlibpath_var=LD_LIBRARY_PATH
  if test -z "`echo __ELF__ | $CC -E - | grep __ELF__`" || test "$host_os-$host_cpu" = "openbsd2.8-powerpc"; then
    case $host_os in
      openbsd2.[89] | openbsd2.[89].*)
        shlibpath_overrides_runpath=no
        ;;
      *)
        shlibpath_overrides_runpath=yes
        ;;
    esac
  else
    shlibpath_overrides_runpath=yes
  fi
  ;;

os2*)
  libname_spec='$name'
  shrext_cmds=".dll"
  need_lib_prefix=no
  library_names_spec='$libname${shared_ext} $libname.a'
  dynamic_linker='OS/2 ld.exe'
  shlibpath_var=LIBPATH
  ;;

osf3* | osf4* | osf5*)
  version_type=osf
  need_lib_prefix=no
  need_version=no
  soname_spec='${libname}${release}${shared_ext}$major'
  library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${release}${shared_ext}$major $libname${shared_ext}'
  shlibpath_var=LD_LIBRARY_PATH
  sys_lib_search_path_spec="/usr/shlib /usr/ccs/lib /usr/lib/cmplrs/cc /usr/lib /usr/local/lib /var/shlib"
  sys_lib_dlsearch_path_spec="$sys_lib_search_path_spec"
  ;;

solaris*)
  version_type=linux
  need_lib_prefix=no
  need_version=no
  library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${release}${shared_ext}$major $libname${shared_ext}'
  soname_spec='${libname}${release}${shared_ext}$major'
  shlibpath_var=LD_LIBRARY_PATH
  shlibpath_overrides_runpath=yes
  hardcode_into_libs=yes
  # ldd complains unless libraries are executable
  postinstall_cmds='chmod +x $lib'
  ;;

sunos4*)
  version_type=sunos
  library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${shared_ext}$versuffix'
  finish_cmds='PATH="\$PATH:/usr/etc" ldconfig $libdir'
  shlibpath_var=LD_LIBRARY_PATH
  shlibpath_overrides_runpath=yes
  if test "$with_gnu_ld" = yes; then
    need_lib_prefix=no
  fi
  need_version=yes
  ;;

sysv4 | sysv4.3*)
  version_type=linux
  library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${release}${shared_ext}$major $libname${shared_ext}'
  soname_spec='${libname}${release}${shared_ext}$major'
  shlibpath_var=LD_LIBRARY_PATH
  case $host_vendor in
    sni)
      shlibpath_overrides_runpath=no
      need_lib_prefix=no
      export_dynamic_flag_spec='${wl}-Blargedynsym'
      runpath_var=LD_RUN_PATH
      ;;
    siemens)
      need_lib_prefix=no
      ;;
    motorola)
      need_lib_prefix=no
      need_version=no
      shlibpath_overrides_runpath=no
      sys_lib_search_path_spec='/lib /usr/lib /usr/ccs/lib'
      ;;
  esac
  ;;

sysv4*MP*)
  if test -d /usr/nec ;then
    version_type=linux
    library_names_spec='$libname${shared_ext}.$versuffix $libname${shared_ext}.$major $libname${shared_ext}'
    soname_spec='$libname${shared_ext}.$major'
    shlibpath_var=LD_LIBRARY_PATH
  fi
  ;;

sysv5* | sco3.2v5* | sco5v6* | unixware* | OpenUNIX* | sysv4*uw2*)
  version_type=freebsd-elf
  need_lib_prefix=no
  need_version=no
  library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${release}${shared_ext} $libname${shared_ext}'
  soname_spec='${libname}${release}${shared_ext}$major'
  shlibpath_var=LD_LIBRARY_PATH
  hardcode_into_libs=yes
  if test "$with_gnu_ld" = yes; then
    sys_lib_search_path_spec='/usr/local/lib /usr/gnu/lib /usr/ccs/lib /usr/lib /lib'
    shlibpath_overrides_runpath=no
  else
    sys_lib_search_path_spec='/usr/ccs/lib /usr/lib'
    shlibpath_overrides_runpath=yes
    case $host_os in
      sco3.2v5*)
        sys_lib_search_path_spec="$sys_lib_search_path_spec /lib"
        ;;
    esac
  fi
  sys_lib_dlsearch_path_spec='/usr/lib'
  ;;

uts4*)
  version_type=linux
  library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${release}${shared_ext}$major $libname${shared_ext}'
  soname_spec='${libname}${release}${shared_ext}$major'
  shlibpath_var=LD_LIBRARY_PATH
  ;;

*)
  dynamic_linker=no
  ;;
esac
echo "$as_me:$LINENO: result: $dynamic_linker" >&5
echo "${ECHO_T}$dynamic_linker" >&6
test "$dynamic_linker" = no && can_build_shared=no

variables_saved_for_relink="PATH $shlibpath_var $runpath_var"
if test "$GCC" = yes; then
  variables_saved_for_relink="$variables_saved_for_relink GCC_EXEC_PREFIX COMPILER_PATH LIBRARY_PATH"
fi

echo "$as_me:$LINENO: checking how to hardcode library paths into programs" >&5
echo $ECHO_N "checking how to hardcode library paths into programs... $ECHO_C" >&6
hardcode_action_GCJ=
if test -n "$hardcode_libdir_flag_spec_GCJ" || \
   test -n "$runpath_var_GCJ" || \
   test "X$hardcode_automatic_GCJ" = "Xyes" ; then

  # We can hardcode nonexistent directories.
  if test "$hardcode_direct_GCJ" != no &&
     # If the only mechanism to avoid hardcoding is shlibpath_var, we
     # have to relink, otherwise we might link with an installed library
     # when we should be linking with a yet-to-be-installed one
     ## test "$_LT_AC_TAGVAR(hardcode_shlibpath_var, GCJ)" != no &&
     test "$hardcode_minus_L_GCJ" != no; then
    # Linking always hardcodes the temporary library directory.
    hardcode_action_GCJ=relink
  else
    # We can link without hardcoding, and we can hardcode nonexisting dirs.
    hardcode_action_GCJ=immediate
  fi
else
  # We cannot hardcode anything, or else we can only hardcode existing
  # directories.
  hardcode_action_GCJ=unsupported
fi
echo "$as_me:$LINENO: result: $hardcode_action_GCJ" >&5
echo "${ECHO_T}$hardcode_action_GCJ" >&6

if test "$hardcode_action_GCJ" = relink; then
  # Fast installation is not supported
  enable_fast_install=no
elif test "$shlibpath_overrides_runpath" = yes ||
     test "$enable_shared" = no; then
  # Fast installation is not necessary
  enable_fast_install=needless
fi

# The else clause should only fire when bootstrapping the
# libtool distribution, otherwise you forgot to ship ltmain.sh
# with your package, and you will get complaints that there are
# no rules to generate ltmain.sh.
if test -f "$ltmain"; then
  # See if we are running on zsh, and set the options which allow our commands through
  # without removal of \ escapes.
  if test -n "${ZSH_VERSION+set}" ; then
    setopt NO_GLOB_SUBST
  fi
  # Now quote all the things that may contain metacharacters while being
  # careful not to overquote the AC_SUBSTed values. We take copies of the
  # variables and quote the copies for generation of the libtool script.
  for var in echo old_CC old_CFLAGS AR AR_FLAGS EGREP RANLIB LN_S LTCC LTCFLAGS NM \
    SED SHELL STRIP \
    libname_spec library_names_spec soname_spec extract_expsyms_cmds \
    old_striplib striplib file_magic_cmd finish_cmds finish_eval \
    deplibs_check_method reload_flag reload_cmds need_locks \
    lt_cv_sys_global_symbol_pipe lt_cv_sys_global_symbol_to_cdecl \
    lt_cv_sys_global_symbol_to_c_name_address \
    sys_lib_search_path_spec sys_lib_dlsearch_path_spec \
    old_postinstall_cmds old_postuninstall_cmds \
    compiler_GCJ \
    CC_GCJ \
    LD_GCJ \
    lt_prog_compiler_wl_GCJ \
    lt_prog_compiler_pic_GCJ \
    lt_prog_compiler_static_GCJ \
    lt_prog_compiler_no_builtin_flag_GCJ \
    export_dynamic_flag_spec_GCJ \
    thread_safe_flag_spec_GCJ \
    whole_archive_flag_spec_GCJ \
    enable_shared_with_static_runtimes_GCJ \
    old_archive_cmds_GCJ \
    old_archive_from_new_cmds_GCJ \
    predep_objects_GCJ \
    postdep_objects_GCJ \
    predeps_GCJ \
    postdeps_GCJ \
    compiler_lib_search_path_GCJ \
    archive_cmds_GCJ \
    archive_expsym_cmds_GCJ \
    postinstall_cmds_GCJ \
    postuninstall_cmds_GCJ \
    old_archive_from_expsyms_cmds_GCJ \
    allow_undefined_flag_GCJ \
    no_undefined_flag_GCJ \
    export_symbols_cmds_GCJ \
    hardcode_libdir_flag_spec_GCJ \
    hardcode_libdir_flag_spec_ld_GCJ \
    hardcode_libdir_separator_GCJ \
    hardcode_automatic_GCJ \
    module_cmds_GCJ \
    module_expsym_cmds_GCJ \
    lt_cv_prog_compiler_c_o_GCJ \
    exclude_expsyms_GCJ \
    include_expsyms_GCJ; do

    case $var in
    old_archive_cmds_GCJ | \
    old_archive_from_new_cmds_GCJ | \
    archive_cmds_GCJ | \
    archive_expsym_cmds_GCJ | \
    module_cmds_GCJ | \
    module_expsym_cmds_GCJ | \
    old_archive_from_expsyms_cmds_GCJ | \
    export_symbols_cmds_GCJ | \
    extract_expsyms_cmds | reload_cmds | finish_cmds | \
    postinstall_cmds | postuninstall_cmds | \
    old_postinstall_cmds | old_postuninstall_cmds | \
    sys_lib_search_path_spec | sys_lib_dlsearch_path_spec)
      # Double-quote double-evaled strings.
      eval "lt_$var=\\\"\`\$echo \"X\$$var\" | \$Xsed -e \"\$double_quote_subst\" -e \"\$sed_quote_subst\" -e \"\$delay_variable_subst\"\`\\\""
      ;;
    *)
      eval "lt_$var=\\\"\`\$echo \"X\$$var\" | \$Xsed -e \"\$sed_quote_subst\"\`\\\""
      ;;
    esac
  done

  case $lt_echo in
  *'\$0 --fallback-echo"')
    lt_echo=`$echo "X$lt_echo" | $Xsed -e 's/\\\\\\\$0 --fallback-echo"$/$0 --fallback-echo"/'`
    ;;
  esac

cfgfile="$ofile"

  cat <<__EOF__ >> "$cfgfile"
# ### BEGIN LIBTOOL TAG CONFIG: $tagname

# Libtool was configured on host `(hostname || uname -n) 2>/dev/null | sed 1q`:

# Shell to use when invoking shell scripts.
SHELL=$lt_SHELL

# Whether or not to build shared libraries.
build_libtool_libs=$enable_shared

# Whether or not to build static libraries.
build_old_libs=$enable_static

# Whether or not to add -lc for building shared libraries.
build_libtool_need_lc=$archive_cmds_need_lc_GCJ

# Whether or not to disallow shared libs when runtime libs are static
allow_libtool_libs_with_static_runtimes=$enable_shared_with_static_runtimes_GCJ

# Whether or not to optimize for fast installation.
fast_install=$enable_fast_install

# The host system.
host_alias=$host_alias
host=$host
host_os=$host_os

# The build system.
build_alias=$build_alias
build=$build
build_os=$build_os

# An echo program that does not interpret backslashes.
echo=$lt_echo

# The archiver.
AR=$lt_AR
AR_FLAGS=$lt_AR_FLAGS

# A C compiler.
LTCC=$lt_LTCC

# LTCC compiler flags.
LTCFLAGS=$lt_LTCFLAGS

# A language-specific compiler.
CC=$lt_compiler_GCJ

# Is the compiler the GNU C compiler?
with_gcc=$GCC_GCJ

# An ERE matcher.
EGREP=$lt_EGREP

# The linker used to build libraries.
LD=$lt_LD_GCJ

# Whether we need hard or soft links.
LN_S=$lt_LN_S

# A BSD-compatible nm program.
NM=$lt_NM

# A symbol stripping program
STRIP=$lt_STRIP

# Used to examine libraries when file_magic_cmd begins "file"
MAGIC_CMD=$MAGIC_CMD

# Used on cygwin: DLL creation program.
DLLTOOL="$DLLTOOL"

# Used on cygwin: object dumper.
OBJDUMP="$OBJDUMP"

# Used on cygwin: assembler.
AS="$AS"

# The name of the directory that contains temporary libtool files.
objdir=$objdir

# How to create reloadable object files.
reload_flag=$lt_reload_flag
reload_cmds=$lt_reload_cmds

# How to pass a linker flag through the compiler.
wl=$lt_lt_prog_compiler_wl_GCJ

# Object file suffix (normally "o").
objext="$ac_objext"

# Old archive suffix (normally "a").
libext="$libext"

# Shared library suffix (normally ".so").
shrext_cmds='$shrext_cmds'

# Executable file suffix (normally "").
exeext="$exeext"

# Additional compiler flags for building library objects.
pic_flag=$lt_lt_prog_compiler_pic_GCJ
pic_mode=$pic_mode

# What is the maximum length of a command?
max_cmd_len=$lt_cv_sys_max_cmd_len

# Does compiler simultaneously support -c and -o options?
compiler_c_o=$lt_lt_cv_prog_compiler_c_o_GCJ

# Must we lock files when doing compilation?
need_locks=$lt_need_locks

# Do we need the lib prefix for modules?
need_lib_prefix=$need_lib_prefix

# Do we need a version for libraries?
need_version=$need_version

# Whether dlopen is supported.
dlopen_support=$enable_dlopen

# Whether dlopen of programs is supported.
dlopen_self=$enable_dlopen_self

# Whether dlopen of statically linked programs is supported.
dlopen_self_static=$enable_dlopen_self_static

# Compiler flag to prevent dynamic linking.
link_static_flag=$lt_lt_prog_compiler_static_GCJ

# Compiler flag to turn off builtin functions.
no_builtin_flag=$lt_lt_prog_compiler_no_builtin_flag_GCJ

# Compiler flag to allow reflexive dlopens.
export_dynamic_flag_spec=$lt_export_dynamic_flag_spec_GCJ

# Compiler flag to generate shared objects directly from archives.
whole_archive_flag_spec=$lt_whole_archive_flag_spec_GCJ

# Compiler flag to generate thread-safe objects.
thread_safe_flag_spec=$lt_thread_safe_flag_spec_GCJ

# Library versioning type.
version_type=$version_type

# Format of library name prefix.
libname_spec=$lt_libname_spec

# List of archive names. First name is the real one, the rest are links.
# The last name is the one that the linker finds with -lNAME.
library_names_spec=$lt_library_names_spec

# The coded name of the library, if different from the real name.
soname_spec=$lt_soname_spec

# Commands used to build and install an old-style archive.
RANLIB=$lt_RANLIB
old_archive_cmds=$lt_old_archive_cmds_GCJ
old_postinstall_cmds=$lt_old_postinstall_cmds
old_postuninstall_cmds=$lt_old_postuninstall_cmds

# Create an old-style archive from a shared archive.
old_archive_from_new_cmds=$lt_old_archive_from_new_cmds_GCJ

# Create a temporary old-style archive to link instead of a shared archive.
old_archive_from_expsyms_cmds=$lt_old_archive_from_expsyms_cmds_GCJ

# Commands used to build and install a shared archive.
archive_cmds=$lt_archive_cmds_GCJ
archive_expsym_cmds=$lt_archive_expsym_cmds_GCJ
postinstall_cmds=$lt_postinstall_cmds
postuninstall_cmds=$lt_postuninstall_cmds

# Commands used to build a loadable module (assumed same as above if empty)
module_cmds=$lt_module_cmds_GCJ
module_expsym_cmds=$lt_module_expsym_cmds_GCJ

# Commands to strip libraries.
old_striplib=$lt_old_striplib
striplib=$lt_striplib

# Dependencies to place before the objects being linked to create a
# shared library.
predep_objects=$lt_predep_objects_GCJ

# Dependencies to place after the objects being linked to create a
# shared library.
postdep_objects=$lt_postdep_objects_GCJ

# Dependencies to place before the objects being linked to create a
# shared library.
predeps=$lt_predeps_GCJ

# Dependencies to place after the objects being linked to create a
# shared library.
postdeps=$lt_postdeps_GCJ

# The library search path used internally by the compiler when linking
# a shared library.
compiler_lib_search_path=$lt_compiler_lib_search_path_GCJ

# Method to check whether dependent libraries are shared objects.
deplibs_check_method=$lt_deplibs_check_method

# Command to use when deplibs_check_method == file_magic.
file_magic_cmd=$lt_file_magic_cmd

# Flag that allows shared libraries with undefined symbols to be built.
allow_undefined_flag=$lt_allow_undefined_flag_GCJ

# Flag that forces no undefined symbols.
no_undefined_flag=$lt_no_undefined_flag_GCJ

# Commands used to finish a libtool library installation in a directory.
finish_cmds=$lt_finish_cmds

# Same as above, but a single script fragment to be evaled but not shown.
finish_eval=$lt_finish_eval

# Take the output of nm and produce a listing of raw symbols and C names.
global_symbol_pipe=$lt_lt_cv_sys_global_symbol_pipe

# Transform the output of nm into a proper C declaration
global_symbol_to_cdecl=$lt_lt_cv_sys_global_symbol_to_cdecl

# Transform the output of nm into a C name address pair
global_symbol_to_c_name_address=$lt_lt_cv_sys_global_symbol_to_c_name_address

# This is the shared library runtime path variable.
runpath_var=$runpath_var

# This is the shared library path variable.
shlibpath_var=$shlibpath_var

# Is shlibpath searched before the hard-coded library search path?
shlibpath_overrides_runpath=$shlibpath_overrides_runpath

# How to hardcode a shared library path into an executable.
hardcode_action=$hardcode_action_GCJ

# Whether we should hardcode library paths into libraries.
hardcode_into_libs=$hardcode_into_libs

# Flag to hardcode \$libdir into a binary during linking.
# This must work even if \$libdir does not exist.
hardcode_libdir_flag_spec=$lt_hardcode_libdir_flag_spec_GCJ

# If ld is used when linking, flag to hardcode \$libdir into
# a binary during linking. This must work even if \$libdir does
# not exist.
hardcode_libdir_flag_spec_ld=$lt_hardcode_libdir_flag_spec_ld_GCJ

# Whether we need a single -rpath flag with a separated argument.
hardcode_libdir_separator=$lt_hardcode_libdir_separator_GCJ

# Set to yes if using DIR/libNAME${shared_ext} during linking hardcodes DIR into the
# resulting binary.
hardcode_direct=$hardcode_direct_GCJ

# Set to yes if using the -LDIR flag during linking hardcodes DIR into the
# resulting binary.
hardcode_minus_L=$hardcode_minus_L_GCJ

# Set to yes if using SHLIBPATH_VAR=DIR during linking hardcodes DIR into
# the resulting binary.
hardcode_shlibpath_var=$hardcode_shlibpath_var_GCJ

# Set to yes if building a shared library automatically hardcodes DIR into the library
# and all subsequent libraries and executables linked against it.
hardcode_automatic=$hardcode_automatic_GCJ

# Variables whose values should be saved in libtool wrapper scripts and
# restored at relink time.
variables_saved_for_relink="$variables_saved_for_relink"

# Whether libtool must link a program against all its dependency libraries.
link_all_deplibs=$link_all_deplibs_GCJ

# Compile-time system search path for libraries
sys_lib_search_path_spec=$lt_sys_lib_search_path_spec

# Run-time system search path for libraries
sys_lib_dlsearch_path_spec=$lt_sys_lib_dlsearch_path_spec

# Fix the shell variable \$srcfile for the compiler.
fix_srcfile_path="$fix_srcfile_path_GCJ"

# Set to yes if exported symbols are required.
always_export_symbols=$always_export_symbols_GCJ

# The commands to list exported symbols.
export_symbols_cmds=$lt_export_symbols_cmds_GCJ

# The commands to extract the exported symbol list from a shared archive.
extract_expsyms_cmds=$lt_extract_expsyms_cmds

# Symbols that should not be listed in the preloaded symbols.
exclude_expsyms=$lt_exclude_expsyms_GCJ

# Symbols that must always be exported.
include_expsyms=$lt_include_expsyms_GCJ

# ### END LIBTOOL TAG CONFIG: $tagname

__EOF__

else
  # If there is no Makefile yet, we rely on a make rule to execute
  # `config.status --recheck' to rerun these tests and create the
  # libtool script then.
  ltmain_in=`echo $ltmain | sed -e 's/\.sh$/.in/'`
  if test -f "$ltmain_in"; then
    test -f Makefile && make "$ltmain"
  fi
fi

ac_ext=c
ac_cpp='$CPP $CPPFLAGS'
ac_compile='$CC -c $CFLAGS $CPPFLAGS conftest.$ac_ext >&5'
ac_link='$CC -o conftest$ac_exeext $CFLAGS $CPPFLAGS $LDFLAGS conftest.$ac_ext $LIBS >&5'
ac_compiler_gnu=$ac_cv_c_compiler_gnu

CC="$lt_save_CC"

        else
          tagname=""
        fi
        ;;

      RC)

# Source file extension for RC test sources.
ac_ext=rc

# Object file extension for compiled RC test sources.
objext=o
objext_RC=$objext

# Code to be used in simple compile tests
lt_simple_compile_test_code='sample MENU { MENUITEM "&Soup", 100, CHECKED }\n'

# Code to be used in simple link tests
lt_simple_link_test_code="$lt_simple_compile_test_code"

# ltmain only uses $CC for tagged configurations so make sure $CC is set.

# If no C compiler was specified, use CC.
LTCC=${LTCC-"$CC"}

# If no C compiler flags were specified, use CFLAGS.
LTCFLAGS=${LTCFLAGS-"$CFLAGS"}

# Allow CC to be a program name with arguments.
compiler=$CC

# save warnings/boilerplate of simple test code
ac_outfile=conftest.$ac_objext
printf "$lt_simple_compile_test_code" >conftest.$ac_ext
eval "$ac_compile" 2>&1 >/dev/null | $SED '/^$/d; /^ *+/d' >conftest.err
_lt_compiler_boilerplate=`cat conftest.err`
$rm conftest*

ac_outfile=conftest.$ac_objext
printf "$lt_simple_link_test_code" >conftest.$ac_ext
eval "$ac_link" 2>&1 >/dev/null | $SED '/^$/d; /^ *+/d' >conftest.err
_lt_linker_boilerplate=`cat conftest.err`
$rm conftest*

# Allow CC to be a program name with arguments.
lt_save_CC="$CC"
CC=${RC-"windres"}
compiler=$CC
compiler_RC=$CC
for cc_temp in $compiler""; do
  case $cc_temp in
    compile | *[\\/]compile | ccache | *[\\/]ccache ) ;;
    distcc | *[\\/]distcc | purify | *[\\/]purify ) ;;
    \-*) ;;
    *) break;;
  esac
done
cc_basename=`$echo "X$cc_temp" | $Xsed -e 's%.*/%%' -e "s%^$host_alias-%%"`

lt_cv_prog_compiler_c_o_RC=yes

# The else clause should only fire when bootstrapping the
# libtool distribution, otherwise you forgot to ship ltmain.sh
# with your package, and you will get complaints that there are
# no rules to generate ltmain.sh.
if test -f "$ltmain"; then
  # See if we are running on zsh, and set the options which allow our commands through
  # without removal of \ escapes.
  if test -n "${ZSH_VERSION+set}" ; then
    setopt NO_GLOB_SUBST
  fi
  # Now quote all the things that may contain metacharacters while being
  # careful not to overquote the AC_SUBSTed values. We take copies of the
  # variables and quote the copies for generation of the libtool script.
  for var in echo old_CC old_CFLAGS AR AR_FLAGS EGREP RANLIB LN_S LTCC LTCFLAGS NM \
    SED SHELL STRIP \
    libname_spec library_names_spec soname_spec extract_expsyms_cmds \
    old_striplib striplib file_magic_cmd finish_cmds finish_eval \
    deplibs_check_method reload_flag reload_cmds need_locks \
    lt_cv_sys_global_symbol_pipe lt_cv_sys_global_symbol_to_cdecl \
    lt_cv_sys_global_symbol_to_c_name_address \
    sys_lib_search_path_spec sys_lib_dlsearch_path_spec \
    old_postinstall_cmds old_postuninstall_cmds \
    compiler_RC \
    CC_RC \
    LD_RC \
    lt_prog_compiler_wl_RC \
    lt_prog_compiler_pic_RC \
    lt_prog_compiler_static_RC \
    lt_prog_compiler_no_builtin_flag_RC \
    export_dynamic_flag_spec_RC \
    thread_safe_flag_spec_RC \
    whole_archive_flag_spec_RC \
    enable_shared_with_static_runtimes_RC \
    old_archive_cmds_RC \
    old_archive_from_new_cmds_RC \
    predep_objects_RC \
    postdep_objects_RC \
    predeps_RC \
    postdeps_RC \
    compiler_lib_search_path_RC \
    archive_cmds_RC \
    archive_expsym_cmds_RC \
    postinstall_cmds_RC \
    postuninstall_cmds_RC \
    old_archive_from_expsyms_cmds_RC \
    allow_undefined_flag_RC \
    no_undefined_flag_RC \
    export_symbols_cmds_RC \
    hardcode_libdir_flag_spec_RC \
    hardcode_libdir_flag_spec_ld_RC \
    hardcode_libdir_separator_RC \
    hardcode_automatic_RC \
    module_cmds_RC \
    module_expsym_cmds_RC \
    lt_cv_prog_compiler_c_o_RC \
    exclude_expsyms_RC \
    include_expsyms_RC; do

    case $var in
    old_archive_cmds_RC | \
    old_archive_from_new_cmds_RC | \
    archive_cmds_RC | \
    archive_expsym_cmds_RC | \
    module_cmds_RC | \
    module_expsym_cmds_RC | \
    old_archive_from_expsyms_cmds_RC | \
    export_symbols_cmds_RC | \
    extract_expsyms_cmds | reload_cmds | finish_cmds | \
    postinstall_cmds | postuninstall_cmds | \
    old_postinstall_cmds | old_postuninstall_cmds | \
    sys_lib_search_path_spec | sys_lib_dlsearch_path_spec)
      # Double-quote double-evaled strings.
      eval "lt_$var=\\\"\`\$echo \"X\$$var\" | \$Xsed -e \"\$double_quote_subst\" -e \"\$sed_quote_subst\" -e \"\$delay_variable_subst\"\`\\\""
      ;;
    *)
      eval "lt_$var=\\\"\`\$echo \"X\$$var\" | \$Xsed -e \"\$sed_quote_subst\"\`\\\""
      ;;
    esac
  done

  case $lt_echo in
  *'\$0 --fallback-echo"')
    lt_echo=`$echo "X$lt_echo" | $Xsed -e 's/\\\\\\\$0 --fallback-echo"$/$0 --fallback-echo"/'`
    ;;
  esac

cfgfile="$ofile"

  cat <<__EOF__ >> "$cfgfile"
# ### BEGIN LIBTOOL TAG CONFIG: $tagname

# Libtool was configured on host `(hostname || uname -n) 2>/dev/null | sed 1q`:

# Shell to use when invoking shell scripts.
SHELL=$lt_SHELL

# Whether or not to build shared libraries.
build_libtool_libs=$enable_shared

# Whether or not to build static libraries.
build_old_libs=$enable_static

# Whether or not to add -lc for building shared libraries.
build_libtool_need_lc=$archive_cmds_need_lc_RC

# Whether or not to disallow shared libs when runtime libs are static
allow_libtool_libs_with_static_runtimes=$enable_shared_with_static_runtimes_RC

# Whether or not to optimize for fast installation.
fast_install=$enable_fast_install

# The host system.
host_alias=$host_alias
host=$host
host_os=$host_os

# The build system.
build_alias=$build_alias
build=$build
build_os=$build_os

# An echo program that does not interpret backslashes.
echo=$lt_echo

# The archiver.
AR=$lt_AR
AR_FLAGS=$lt_AR_FLAGS

# A C compiler.
LTCC=$lt_LTCC

# LTCC compiler flags.
LTCFLAGS=$lt_LTCFLAGS

# A language-specific compiler.
CC=$lt_compiler_RC

# Is the compiler the GNU C compiler?
with_gcc=$GCC_RC

# An ERE matcher.
EGREP=$lt_EGREP

# The linker used to build libraries.
LD=$lt_LD_RC

# Whether we need hard or soft links.
LN_S=$lt_LN_S

# A BSD-compatible nm program.
NM=$lt_NM

# A symbol stripping program
STRIP=$lt_STRIP

# Used to examine libraries when file_magic_cmd begins "file"
MAGIC_CMD=$MAGIC_CMD

# Used on cygwin: DLL creation program.
DLLTOOL="$DLLTOOL"

# Used on cygwin: object dumper.
OBJDUMP="$OBJDUMP"

# Used on cygwin: assembler.
AS="$AS"

# The name of the directory that contains temporary libtool files.
objdir=$objdir

# How to create reloadable object files.
reload_flag=$lt_reload_flag
reload_cmds=$lt_reload_cmds

# How to pass a linker flag through the compiler.
wl=$lt_lt_prog_compiler_wl_RC

# Object file suffix (normally "o").
objext="$ac_objext"

# Old archive suffix (normally "a").
libext="$libext"

# Shared library suffix (normally ".so").
shrext_cmds='$shrext_cmds'

# Executable file suffix (normally "").
exeext="$exeext"

# Additional compiler flags for building library objects.
pic_flag=$lt_lt_prog_compiler_pic_RC
pic_mode=$pic_mode

# What is the maximum length of a command?
max_cmd_len=$lt_cv_sys_max_cmd_len

# Does compiler simultaneously support -c and -o options?
compiler_c_o=$lt_lt_cv_prog_compiler_c_o_RC

# Must we lock files when doing compilation?
need_locks=$lt_need_locks

# Do we need the lib prefix for modules?
need_lib_prefix=$need_lib_prefix

# Do we need a version for libraries?
need_version=$need_version

# Whether dlopen is supported.
dlopen_support=$enable_dlopen

# Whether dlopen of programs is supported.
dlopen_self=$enable_dlopen_self

# Whether dlopen of statically linked programs is supported.
dlopen_self_static=$enable_dlopen_self_static

# Compiler flag to prevent dynamic linking.
link_static_flag=$lt_lt_prog_compiler_static_RC

# Compiler flag to turn off builtin functions.
no_builtin_flag=$lt_lt_prog_compiler_no_builtin_flag_RC

# Compiler flag to allow reflexive dlopens.
export_dynamic_flag_spec=$lt_export_dynamic_flag_spec_RC

# Compiler flag to generate shared objects directly from archives.
whole_archive_flag_spec=$lt_whole_archive_flag_spec_RC

# Compiler flag to generate thread-safe objects.
thread_safe_flag_spec=$lt_thread_safe_flag_spec_RC

# Library versioning type.
version_type=$version_type

# Format of library name prefix.
libname_spec=$lt_libname_spec

# List of archive names. First name is the real one, the rest are links.
# The last name is the one that the linker finds with -lNAME.
library_names_spec=$lt_library_names_spec

# The coded name of the library, if different from the real name.
soname_spec=$lt_soname_spec

# Commands used to build and install an old-style archive.
RANLIB=$lt_RANLIB
old_archive_cmds=$lt_old_archive_cmds_RC
old_postinstall_cmds=$lt_old_postinstall_cmds
old_postuninstall_cmds=$lt_old_postuninstall_cmds

# Create an old-style archive from a shared archive.
old_archive_from_new_cmds=$lt_old_archive_from_new_cmds_RC

# Create a temporary old-style archive to link instead of a shared archive.
old_archive_from_expsyms_cmds=$lt_old_archive_from_expsyms_cmds_RC

# Commands used to build and install a shared archive.
archive_cmds=$lt_archive_cmds_RC
archive_expsym_cmds=$lt_archive_expsym_cmds_RC
postinstall_cmds=$lt_postinstall_cmds
postuninstall_cmds=$lt_postuninstall_cmds

# Commands used to build a loadable module (assumed same as above if empty)
module_cmds=$lt_module_cmds_RC
module_expsym_cmds=$lt_module_expsym_cmds_RC

# Commands to strip libraries.
old_striplib=$lt_old_striplib striplib=$lt_striplib # Dependencies to place before the objects being linked to create a # shared library. predep_objects=$lt_predep_objects_RC # Dependencies to place after the objects being linked to create a # shared library. postdep_objects=$lt_postdep_objects_RC # Dependencies to place before the objects being linked to create a # shared library. predeps=$lt_predeps_RC # Dependencies to place after the objects being linked to create a # shared library. postdeps=$lt_postdeps_RC # The library search path used internally by the compiler when linking # a shared library. compiler_lib_search_path=$lt_compiler_lib_search_path_RC # Method to check whether dependent libraries are shared objects. deplibs_check_method=$lt_deplibs_check_method # Command to use when deplibs_check_method == file_magic. file_magic_cmd=$lt_file_magic_cmd # Flag that allows shared libraries with undefined symbols to be built. allow_undefined_flag=$lt_allow_undefined_flag_RC # Flag that forces no undefined symbols. no_undefined_flag=$lt_no_undefined_flag_RC # Commands used to finish a libtool library installation in a directory. finish_cmds=$lt_finish_cmds # Same as above, but a single script fragment to be evaled but not shown. finish_eval=$lt_finish_eval # Take the output of nm and produce a listing of raw symbols and C names. global_symbol_pipe=$lt_lt_cv_sys_global_symbol_pipe # Transform the output of nm in a proper C declaration global_symbol_to_cdecl=$lt_lt_cv_sys_global_symbol_to_cdecl # Transform the output of nm in a C name address pair global_symbol_to_c_name_address=$lt_lt_cv_sys_global_symbol_to_c_name_address # This is the shared library runtime path variable. runpath_var=$runpath_var # This is the shared library path variable. shlibpath_var=$shlibpath_var # Is shlibpath searched before the hard-coded library search path? shlibpath_overrides_runpath=$shlibpath_overrides_runpath # How to hardcode a shared library path into an executable. 
hardcode_action=$hardcode_action_RC # Whether we should hardcode library paths into libraries. hardcode_into_libs=$hardcode_into_libs # Flag to hardcode \$libdir into a binary during linking. # This must work even if \$libdir does not exist. hardcode_libdir_flag_spec=$lt_hardcode_libdir_flag_spec_RC # If ld is used when linking, flag to hardcode \$libdir into # a binary during linking. This must work even if \$libdir does # not exist. hardcode_libdir_flag_spec_ld=$lt_hardcode_libdir_flag_spec_ld_RC # Whether we need a single -rpath flag with a separated argument. hardcode_libdir_separator=$lt_hardcode_libdir_separator_RC # Set to yes if using DIR/libNAME${shared_ext} during linking hardcodes DIR into the # resulting binary. hardcode_direct=$hardcode_direct_RC # Set to yes if using the -LDIR flag during linking hardcodes DIR into the # resulting binary. hardcode_minus_L=$hardcode_minus_L_RC # Set to yes if using SHLIBPATH_VAR=DIR during linking hardcodes DIR into # the resulting binary. hardcode_shlibpath_var=$hardcode_shlibpath_var_RC # Set to yes if building a shared library automatically hardcodes DIR into the library # and all subsequent libraries and executables linked against it. hardcode_automatic=$hardcode_automatic_RC # Variables whose values should be saved in libtool wrapper scripts and # restored at relink time. variables_saved_for_relink="$variables_saved_for_relink" # Whether libtool must link a program against all its dependency libraries. link_all_deplibs=$link_all_deplibs_RC # Compile-time system search path for libraries sys_lib_search_path_spec=$lt_sys_lib_search_path_spec # Run-time system search path for libraries sys_lib_dlsearch_path_spec=$lt_sys_lib_dlsearch_path_spec # Fix the shell variable \$srcfile for the compiler. fix_srcfile_path="$fix_srcfile_path_RC" # Set to yes if exported symbols are required. always_export_symbols=$always_export_symbols_RC # The commands to list exported symbols. 
export_symbols_cmds=$lt_export_symbols_cmds_RC # The commands to extract the exported symbol list from a shared archive. extract_expsyms_cmds=$lt_extract_expsyms_cmds # Symbols that should not be listed in the preloaded symbols. exclude_expsyms=$lt_exclude_expsyms_RC # Symbols that must always be exported. include_expsyms=$lt_include_expsyms_RC # ### END LIBTOOL TAG CONFIG: $tagname __EOF__ else # If there is no Makefile yet, we rely on a make rule to execute # `config.status --recheck' to rerun these tests and create the # libtool script then. ltmain_in=`echo $ltmain | sed -e 's/\.sh$/.in/'` if test -f "$ltmain_in"; then test -f Makefile && make "$ltmain" fi fi ac_ext=c ac_cpp='$CPP $CPPFLAGS' ac_compile='$CC -c $CFLAGS $CPPFLAGS conftest.$ac_ext >&5' ac_link='$CC -o conftest$ac_exeext $CFLAGS $CPPFLAGS $LDFLAGS conftest.$ac_ext $LIBS >&5' ac_compiler_gnu=$ac_cv_c_compiler_gnu CC="$lt_save_CC" ;; *) { { echo "$as_me:$LINENO: error: Unsupported tag name: $tagname" >&5 echo "$as_me: error: Unsupported tag name: $tagname" >&2;} { (exit 1); exit 1; }; } ;; esac # Append the new tag name to the list of available tags. if test -n "$tagname" ; then available_tags="$available_tags $tagname" fi fi done IFS="$lt_save_ifs" # Now substitute the updated list of available tags. if eval "sed -e 's/^available_tags=.*\$/available_tags=\"$available_tags\"/' \"$ofile\" > \"${ofile}T\""; then mv "${ofile}T" "$ofile" chmod +x "$ofile" else rm -f "${ofile}T" { { echo "$as_me:$LINENO: error: unable to update list of available tagged configurations." >&5 echo "$as_me: error: unable to update list of available tagged configurations." >&2;} { (exit 1); exit 1; }; } fi fi # This can be used to rebuild libtool when needed LIBTOOL_DEPS="$ac_aux_dir/ltmain.sh" # Always use our own libtool. 
LIBTOOL='$(SHELL) $(top_builddir)/libtool' # Prevent multiple expansion echo "$as_me:$LINENO: checking whether to enable maintainer-specific portions of Makefiles" >&5 echo $ECHO_N "checking whether to enable maintainer-specific portions of Makefiles... $ECHO_C" >&6 # Check whether --enable-maintainer-mode or --disable-maintainer-mode was given. if test "${enable_maintainer_mode+set}" = set; then enableval="$enable_maintainer_mode" USE_MAINTAINER_MODE=$enableval else USE_MAINTAINER_MODE=no fi; echo "$as_me:$LINENO: result: $USE_MAINTAINER_MODE" >&5 echo "${ECHO_T}$USE_MAINTAINER_MODE" >&6 if test $USE_MAINTAINER_MODE = yes; then MAINTAINER_MODE_TRUE= MAINTAINER_MODE_FALSE='#' else MAINTAINER_MODE_TRUE='#' MAINTAINER_MODE_FALSE= fi MAINT=$MAINTAINER_MODE_TRUE echo "$as_me:$LINENO: checking for BSDgettimeofday" >&5 echo $ECHO_N "checking for BSDgettimeofday... $ECHO_C" >&6 if test "${ac_cv_func_BSDgettimeofday+set}" = set; then echo $ECHO_N "(cached) $ECHO_C" >&6 else cat >conftest.$ac_ext <<_ACEOF /* confdefs.h. */ _ACEOF cat confdefs.h >>conftest.$ac_ext cat >>conftest.$ac_ext <<_ACEOF /* end confdefs.h. */ /* Define BSDgettimeofday to an innocuous variant, in case <limits.h> declares BSDgettimeofday. For example, HP-UX 11i <limits.h> declares gettimeofday. */ #define BSDgettimeofday innocuous_BSDgettimeofday /* System header to define __stub macros and hopefully few prototypes, which can conflict with char BSDgettimeofday (); below. Prefer <limits.h> to <assert.h> if __STDC__ is defined, since <limits.h> exists even on freestanding compilers. */ #ifdef __STDC__ # include <limits.h> #else # include <assert.h> #endif #undef BSDgettimeofday /* Override any gcc2 internal prototype to avoid an error. */ #ifdef __cplusplus extern "C" { #endif /* We use char because int might match the return type of a gcc2 builtin and then its argument prototype would still apply. */ char BSDgettimeofday (); /* The GNU C library defines this for functions which it implements to always fail with ENOSYS.
Some functions are actually named something starting with __ and the normal name is an alias. */ #if defined (__stub_BSDgettimeofday) || defined (__stub___BSDgettimeofday) choke me #else char (*f) () = BSDgettimeofday; #endif #ifdef __cplusplus } #endif int main () { return f != BSDgettimeofday; ; return 0; } _ACEOF rm -f conftest.$ac_objext conftest$ac_exeext if { (eval echo "$as_me:$LINENO: \"$ac_link\"") >&5 (eval $ac_link) 2>conftest.er1 ac_status=$? grep -v '^ *+' conftest.er1 >conftest.err rm -f conftest.er1 cat conftest.err >&5 echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); } && { ac_try='test -z "$ac_c_werror_flag" || test ! -s conftest.err' { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 (eval $ac_try) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); }; } && { ac_try='test -s conftest$ac_exeext' { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 (eval $ac_try) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); }; }; then ac_cv_func_BSDgettimeofday=yes else echo "$as_me: failed program was:" >&5 sed 's/^/| /' conftest.$ac_ext >&5 ac_cv_func_BSDgettimeofday=no fi rm -f conftest.err conftest.$ac_objext \ conftest$ac_exeext conftest.$ac_ext fi echo "$as_me:$LINENO: result: $ac_cv_func_BSDgettimeofday" >&5 echo "${ECHO_T}$ac_cv_func_BSDgettimeofday" >&6 if test $ac_cv_func_BSDgettimeofday = yes; then cat >>confdefs.h <<\_ACEOF #define HAVE_BSDGETTIMEOFDAY _ACEOF else echo "$as_me:$LINENO: checking for gettimeofday" >&5 echo $ECHO_N "checking for gettimeofday... $ECHO_C" >&6 if test "${ac_cv_func_gettimeofday+set}" = set; then echo $ECHO_N "(cached) $ECHO_C" >&6 else cat >conftest.$ac_ext <<_ACEOF /* confdefs.h. */ _ACEOF cat confdefs.h >>conftest.$ac_ext cat >>conftest.$ac_ext <<_ACEOF /* end confdefs.h. */ /* Define gettimeofday to an innocuous variant, in case <limits.h> declares gettimeofday. For example, HP-UX 11i <limits.h> declares gettimeofday.
*/ #define gettimeofday innocuous_gettimeofday /* System header to define __stub macros and hopefully few prototypes, which can conflict with char gettimeofday (); below. Prefer <limits.h> to <assert.h> if __STDC__ is defined, since <limits.h> exists even on freestanding compilers. */ #ifdef __STDC__ # include <limits.h> #else # include <assert.h> #endif #undef gettimeofday /* Override any gcc2 internal prototype to avoid an error. */ #ifdef __cplusplus extern "C" { #endif /* We use char because int might match the return type of a gcc2 builtin and then its argument prototype would still apply. */ char gettimeofday (); /* The GNU C library defines this for functions which it implements to always fail with ENOSYS. Some functions are actually named something starting with __ and the normal name is an alias. */ #if defined (__stub_gettimeofday) || defined (__stub___gettimeofday) choke me #else char (*f) () = gettimeofday; #endif #ifdef __cplusplus } #endif int main () { return f != gettimeofday; ; return 0; } _ACEOF rm -f conftest.$ac_objext conftest$ac_exeext if { (eval echo "$as_me:$LINENO: \"$ac_link\"") >&5 (eval $ac_link) 2>conftest.er1 ac_status=$? grep -v '^ *+' conftest.er1 >conftest.err rm -f conftest.er1 cat conftest.err >&5 echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); } && { ac_try='test -z "$ac_c_werror_flag" || test ! -s conftest.err' { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 (eval $ac_try) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); }; } && { ac_try='test -s conftest$ac_exeext' { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 (eval $ac_try) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$?
= $ac_status" >&5 (exit $ac_status); }; }; then ac_cv_func_gettimeofday=yes else echo "$as_me: failed program was:" >&5 sed 's/^/| /' conftest.$ac_ext >&5 ac_cv_func_gettimeofday=no fi rm -f conftest.err conftest.$ac_objext \ conftest$ac_exeext conftest.$ac_ext fi echo "$as_me:$LINENO: result: $ac_cv_func_gettimeofday" >&5 echo "${ECHO_T}$ac_cv_func_gettimeofday" >&6 if test $ac_cv_func_gettimeofday = yes; then : else cat >>confdefs.h <<\_ACEOF #define NO_GETTOD _ACEOF fi fi echo "$as_me:$LINENO: checking whether #! works in shell scripts" >&5 echo $ECHO_N "checking whether #! works in shell scripts... $ECHO_C" >&6 if test "${ac_cv_sys_interpreter+set}" = set; then echo $ECHO_N "(cached) $ECHO_C" >&6 else echo '#! /bin/cat exit 69 ' >conftest chmod u+x conftest (SHELL=/bin/sh; export SHELL; ./conftest >/dev/null) if test $? -ne 69; then ac_cv_sys_interpreter=yes else ac_cv_sys_interpreter=no fi rm -f conftest fi echo "$as_me:$LINENO: result: $ac_cv_sys_interpreter" >&5 echo "${ECHO_T}$ac_cv_sys_interpreter" >&6 interpval=$ac_cv_sys_interpreter echo "$as_me:$LINENO: checking whether ${MAKE-make} sets \$(MAKE)" >&5 echo $ECHO_N "checking whether ${MAKE-make} sets \$(MAKE)... $ECHO_C" >&6 set dummy ${MAKE-make}; ac_make=`echo "$2" | sed 'y,:./+-,___p_,'` if eval "test \"\${ac_cv_prog_make_${ac_make}_set+set}\" = set"; then echo $ECHO_N "(cached) $ECHO_C" >&6 else cat >conftest.make <<\_ACEOF all: @echo 'ac_maketemp="$(MAKE)"' _ACEOF # GNU make sometimes prints "make[1]: Entering...", which would confuse us. 
eval `${MAKE-make} -f conftest.make 2>/dev/null | grep temp=` if test -n "$ac_maketemp"; then eval ac_cv_prog_make_${ac_make}_set=yes else eval ac_cv_prog_make_${ac_make}_set=no fi rm -f conftest.make fi if eval "test \"`echo '$ac_cv_prog_make_'${ac_make}_set`\" = yes"; then echo "$as_me:$LINENO: result: yes" >&5 echo "${ECHO_T}yes" >&6 SET_MAKE= else echo "$as_me:$LINENO: result: no" >&5 echo "${ECHO_T}no" >&6 SET_MAKE="MAKE=${MAKE-make}" fi # Extract the first word of "perl", so it can be a program name with args. set dummy perl; ac_word=$2 echo "$as_me:$LINENO: checking for $ac_word" >&5 echo $ECHO_N "checking for $ac_word... $ECHO_C" >&6 if test "${ac_cv_path_PERL+set}" = set; then echo $ECHO_N "(cached) $ECHO_C" >&6 else case $PERL in [\\/]* | ?:[\\/]*) ac_cv_path_PERL="$PERL" # Let the user override the test with a path. ;; *) as_save_IFS=$IFS; IFS=$PATH_SEPARATOR for as_dir in $PATH do IFS=$as_save_IFS test -z "$as_dir" && as_dir=. for ac_exec_ext in '' $ac_executable_extensions; do if $as_executable_p "$as_dir/$ac_word$ac_exec_ext"; then ac_cv_path_PERL="$as_dir/$ac_word$ac_exec_ext" echo "$as_me:$LINENO: found $as_dir/$ac_word$ac_exec_ext" >&5 break 2 fi done done test -z "$ac_cv_path_PERL" && ac_cv_path_PERL="no" ;; esac fi PERL=$ac_cv_path_PERL if test -n "$PERL"; then echo "$as_me:$LINENO: result: $PERL" >&5 echo "${ECHO_T}$PERL" >&6 else echo "$as_me:$LINENO: result: no" >&5 echo "${ECHO_T}no" >&6 fi if test "$PERL" = "no"; then { echo "$as_me:$LINENO: WARNING: perl was not found - needed for script shebang lines" >&5 echo "$as_me: WARNING: perl was not found - needed for script shebang lines" >&2;} fi # Extract the first word of "pod2man", so it can be a program name with args. set dummy pod2man; ac_word=$2 echo "$as_me:$LINENO: checking for $ac_word" >&5 echo $ECHO_N "checking for $ac_word...
$ECHO_C" >&6 if test "${ac_cv_prog_POD2MAN+set}" = set; then echo $ECHO_N "(cached) $ECHO_C" >&6 else if test -n "$POD2MAN"; then ac_cv_prog_POD2MAN="$POD2MAN" # Let the user override the test. else as_save_IFS=$IFS; IFS=$PATH_SEPARATOR for as_dir in $PATH do IFS=$as_save_IFS test -z "$as_dir" && as_dir=. for ac_exec_ext in '' $ac_executable_extensions; do if $as_executable_p "$as_dir/$ac_word$ac_exec_ext"; then ac_cv_prog_POD2MAN="pod2man" echo "$as_me:$LINENO: found $as_dir/$ac_word$ac_exec_ext" >&5 break 2 fi done done test -z "$ac_cv_prog_POD2MAN" && ac_cv_prog_POD2MAN="false" fi fi POD2MAN=$ac_cv_prog_POD2MAN if test -n "$POD2MAN"; then echo "$as_me:$LINENO: result: $POD2MAN" >&5 echo "${ECHO_T}$POD2MAN" >&6 else echo "$as_me:$LINENO: result: no" >&5 echo "${ECHO_T}no" >&6 fi if test "$POD2MAN" = "false"; then { echo "$as_me:$LINENO: WARNING: pod2man was not found - needed for building man pages" >&5 echo "$as_me: WARNING: pod2man was not found - needed for building man pages" >&2;} fi # Find a good install program. We prefer a C program (faster), # so one script is as good as another. But avoid the broken or # incompatible versions: # SysV /etc/install, /usr/sbin/install # SunOS /usr/etc/install # IRIX /sbin/install # AIX /bin/install # AmigaOS /C/install, which installs bootblocks on floppy discs # AIX 4 /usr/bin/installbsd, which doesn't work without a -g flag # AFS /usr/afsws/bin/install, which mishandles nonexistent args # SVR4 /usr/ucb/install, which tries to use the nonexistent group "staff" # OS/2's system install, which has a completely different semantic # ./install, which can be erroneously created by make from ./install.sh. echo "$as_me:$LINENO: checking for a BSD-compatible install" >&5 echo $ECHO_N "checking for a BSD-compatible install... 
$ECHO_C" >&6 if test -z "$INSTALL"; then if test "${ac_cv_path_install+set}" = set; then echo $ECHO_N "(cached) $ECHO_C" >&6 else as_save_IFS=$IFS; IFS=$PATH_SEPARATOR for as_dir in $PATH do IFS=$as_save_IFS test -z "$as_dir" && as_dir=. # Account for people who put trailing slashes in PATH elements. case $as_dir/ in ./ | .// | /cC/* | \ /etc/* | /usr/sbin/* | /usr/etc/* | /sbin/* | /usr/afsws/bin/* | \ ?:\\/os2\\/install\\/* | ?:\\/OS2\\/INSTALL\\/* | \ /usr/ucb/* ) ;; *) # OSF1 and SCO ODT 3.0 have their own names for install. # Don't use installbsd from OSF since it installs stuff as root # by default. for ac_prog in ginstall scoinst install; do for ac_exec_ext in '' $ac_executable_extensions; do if $as_executable_p "$as_dir/$ac_prog$ac_exec_ext"; then if test $ac_prog = install && grep dspmsg "$as_dir/$ac_prog$ac_exec_ext" >/dev/null 2>&1; then # AIX install. It has an incompatible calling convention. : elif test $ac_prog = install && grep pwplus "$as_dir/$ac_prog$ac_exec_ext" >/dev/null 2>&1; then # program-specific install script used by HP pwplus--don't use. : else ac_cv_path_install="$as_dir/$ac_prog$ac_exec_ext -c" break 3 fi fi done done ;; esac done fi if test "${ac_cv_path_install+set}" = set; then INSTALL=$ac_cv_path_install else # As a last resort, use the slow shell script. We don't cache a # path for INSTALL within a source directory, because that will # break other packages using the cache if that directory is # removed, or if the path is relative. INSTALL=$ac_install_sh fi fi echo "$as_me:$LINENO: result: $INSTALL" >&5 echo "${ECHO_T}$INSTALL" >&6 # Use test -z because SunOS4 sh mishandles braces in ${var-val}. # It thinks the first close brace ends the variable substitution. 
test -z "$INSTALL_PROGRAM" && INSTALL_PROGRAM='${INSTALL}' test -z "$INSTALL_SCRIPT" && INSTALL_SCRIPT='${INSTALL}' test -z "$INSTALL_DATA" && INSTALL_DATA='${INSTALL} -m 644' ac_ext=c ac_cpp='$CPP $CPPFLAGS' ac_compile='$CC -c $CFLAGS $CPPFLAGS conftest.$ac_ext >&5' ac_link='$CC -o conftest$ac_exeext $CFLAGS $CPPFLAGS $LDFLAGS conftest.$ac_ext $LIBS >&5' ac_compiler_gnu=$ac_cv_c_compiler_gnu if test -n "$ac_tool_prefix"; then # Extract the first word of "${ac_tool_prefix}gcc", so it can be a program name with args. set dummy ${ac_tool_prefix}gcc; ac_word=$2 echo "$as_me:$LINENO: checking for $ac_word" >&5 echo $ECHO_N "checking for $ac_word... $ECHO_C" >&6 if test "${ac_cv_prog_CC+set}" = set; then echo $ECHO_N "(cached) $ECHO_C" >&6 else if test -n "$CC"; then ac_cv_prog_CC="$CC" # Let the user override the test. else as_save_IFS=$IFS; IFS=$PATH_SEPARATOR for as_dir in $PATH do IFS=$as_save_IFS test -z "$as_dir" && as_dir=. for ac_exec_ext in '' $ac_executable_extensions; do if $as_executable_p "$as_dir/$ac_word$ac_exec_ext"; then ac_cv_prog_CC="${ac_tool_prefix}gcc" echo "$as_me:$LINENO: found $as_dir/$ac_word$ac_exec_ext" >&5 break 2 fi done done fi fi CC=$ac_cv_prog_CC if test -n "$CC"; then echo "$as_me:$LINENO: result: $CC" >&5 echo "${ECHO_T}$CC" >&6 else echo "$as_me:$LINENO: result: no" >&5 echo "${ECHO_T}no" >&6 fi fi if test -z "$ac_cv_prog_CC"; then ac_ct_CC=$CC # Extract the first word of "gcc", so it can be a program name with args. set dummy gcc; ac_word=$2 echo "$as_me:$LINENO: checking for $ac_word" >&5 echo $ECHO_N "checking for $ac_word... $ECHO_C" >&6 if test "${ac_cv_prog_ac_ct_CC+set}" = set; then echo $ECHO_N "(cached) $ECHO_C" >&6 else if test -n "$ac_ct_CC"; then ac_cv_prog_ac_ct_CC="$ac_ct_CC" # Let the user override the test. else as_save_IFS=$IFS; IFS=$PATH_SEPARATOR for as_dir in $PATH do IFS=$as_save_IFS test -z "$as_dir" && as_dir=. 
for ac_exec_ext in '' $ac_executable_extensions; do if $as_executable_p "$as_dir/$ac_word$ac_exec_ext"; then ac_cv_prog_ac_ct_CC="gcc" echo "$as_me:$LINENO: found $as_dir/$ac_word$ac_exec_ext" >&5 break 2 fi done done fi fi ac_ct_CC=$ac_cv_prog_ac_ct_CC if test -n "$ac_ct_CC"; then echo "$as_me:$LINENO: result: $ac_ct_CC" >&5 echo "${ECHO_T}$ac_ct_CC" >&6 else echo "$as_me:$LINENO: result: no" >&5 echo "${ECHO_T}no" >&6 fi CC=$ac_ct_CC else CC="$ac_cv_prog_CC" fi if test -z "$CC"; then if test -n "$ac_tool_prefix"; then # Extract the first word of "${ac_tool_prefix}cc", so it can be a program name with args. set dummy ${ac_tool_prefix}cc; ac_word=$2 echo "$as_me:$LINENO: checking for $ac_word" >&5 echo $ECHO_N "checking for $ac_word... $ECHO_C" >&6 if test "${ac_cv_prog_CC+set}" = set; then echo $ECHO_N "(cached) $ECHO_C" >&6 else if test -n "$CC"; then ac_cv_prog_CC="$CC" # Let the user override the test. else as_save_IFS=$IFS; IFS=$PATH_SEPARATOR for as_dir in $PATH do IFS=$as_save_IFS test -z "$as_dir" && as_dir=. for ac_exec_ext in '' $ac_executable_extensions; do if $as_executable_p "$as_dir/$ac_word$ac_exec_ext"; then ac_cv_prog_CC="${ac_tool_prefix}cc" echo "$as_me:$LINENO: found $as_dir/$ac_word$ac_exec_ext" >&5 break 2 fi done done fi fi CC=$ac_cv_prog_CC if test -n "$CC"; then echo "$as_me:$LINENO: result: $CC" >&5 echo "${ECHO_T}$CC" >&6 else echo "$as_me:$LINENO: result: no" >&5 echo "${ECHO_T}no" >&6 fi fi if test -z "$ac_cv_prog_CC"; then ac_ct_CC=$CC # Extract the first word of "cc", so it can be a program name with args. set dummy cc; ac_word=$2 echo "$as_me:$LINENO: checking for $ac_word" >&5 echo $ECHO_N "checking for $ac_word... $ECHO_C" >&6 if test "${ac_cv_prog_ac_ct_CC+set}" = set; then echo $ECHO_N "(cached) $ECHO_C" >&6 else if test -n "$ac_ct_CC"; then ac_cv_prog_ac_ct_CC="$ac_ct_CC" # Let the user override the test. else as_save_IFS=$IFS; IFS=$PATH_SEPARATOR for as_dir in $PATH do IFS=$as_save_IFS test -z "$as_dir" && as_dir=. 
for ac_exec_ext in '' $ac_executable_extensions; do if $as_executable_p "$as_dir/$ac_word$ac_exec_ext"; then ac_cv_prog_ac_ct_CC="cc" echo "$as_me:$LINENO: found $as_dir/$ac_word$ac_exec_ext" >&5 break 2 fi done done fi fi ac_ct_CC=$ac_cv_prog_ac_ct_CC if test -n "$ac_ct_CC"; then echo "$as_me:$LINENO: result: $ac_ct_CC" >&5 echo "${ECHO_T}$ac_ct_CC" >&6 else echo "$as_me:$LINENO: result: no" >&5 echo "${ECHO_T}no" >&6 fi CC=$ac_ct_CC else CC="$ac_cv_prog_CC" fi fi if test -z "$CC"; then # Extract the first word of "cc", so it can be a program name with args. set dummy cc; ac_word=$2 echo "$as_me:$LINENO: checking for $ac_word" >&5 echo $ECHO_N "checking for $ac_word... $ECHO_C" >&6 if test "${ac_cv_prog_CC+set}" = set; then echo $ECHO_N "(cached) $ECHO_C" >&6 else if test -n "$CC"; then ac_cv_prog_CC="$CC" # Let the user override the test. else ac_prog_rejected=no as_save_IFS=$IFS; IFS=$PATH_SEPARATOR for as_dir in $PATH do IFS=$as_save_IFS test -z "$as_dir" && as_dir=. for ac_exec_ext in '' $ac_executable_extensions; do if $as_executable_p "$as_dir/$ac_word$ac_exec_ext"; then if test "$as_dir/$ac_word$ac_exec_ext" = "/usr/ucb/cc"; then ac_prog_rejected=yes continue fi ac_cv_prog_CC="cc" echo "$as_me:$LINENO: found $as_dir/$ac_word$ac_exec_ext" >&5 break 2 fi done done if test $ac_prog_rejected = yes; then # We found a bogon in the path, so make sure we never use it. set dummy $ac_cv_prog_CC shift if test $# != 0; then # We chose a different compiler from the bogus one. # However, it has the same basename, so the bogon will be chosen # first if we set CC to just the basename; use the full file name. 
shift ac_cv_prog_CC="$as_dir/$ac_word${1+' '}$@" fi fi fi fi CC=$ac_cv_prog_CC if test -n "$CC"; then echo "$as_me:$LINENO: result: $CC" >&5 echo "${ECHO_T}$CC" >&6 else echo "$as_me:$LINENO: result: no" >&5 echo "${ECHO_T}no" >&6 fi fi if test -z "$CC"; then if test -n "$ac_tool_prefix"; then for ac_prog in cl do # Extract the first word of "$ac_tool_prefix$ac_prog", so it can be a program name with args. set dummy $ac_tool_prefix$ac_prog; ac_word=$2 echo "$as_me:$LINENO: checking for $ac_word" >&5 echo $ECHO_N "checking for $ac_word... $ECHO_C" >&6 if test "${ac_cv_prog_CC+set}" = set; then echo $ECHO_N "(cached) $ECHO_C" >&6 else if test -n "$CC"; then ac_cv_prog_CC="$CC" # Let the user override the test. else as_save_IFS=$IFS; IFS=$PATH_SEPARATOR for as_dir in $PATH do IFS=$as_save_IFS test -z "$as_dir" && as_dir=. for ac_exec_ext in '' $ac_executable_extensions; do if $as_executable_p "$as_dir/$ac_word$ac_exec_ext"; then ac_cv_prog_CC="$ac_tool_prefix$ac_prog" echo "$as_me:$LINENO: found $as_dir/$ac_word$ac_exec_ext" >&5 break 2 fi done done fi fi CC=$ac_cv_prog_CC if test -n "$CC"; then echo "$as_me:$LINENO: result: $CC" >&5 echo "${ECHO_T}$CC" >&6 else echo "$as_me:$LINENO: result: no" >&5 echo "${ECHO_T}no" >&6 fi test -n "$CC" && break done fi if test -z "$CC"; then ac_ct_CC=$CC for ac_prog in cl do # Extract the first word of "$ac_prog", so it can be a program name with args. set dummy $ac_prog; ac_word=$2 echo "$as_me:$LINENO: checking for $ac_word" >&5 echo $ECHO_N "checking for $ac_word... $ECHO_C" >&6 if test "${ac_cv_prog_ac_ct_CC+set}" = set; then echo $ECHO_N "(cached) $ECHO_C" >&6 else if test -n "$ac_ct_CC"; then ac_cv_prog_ac_ct_CC="$ac_ct_CC" # Let the user override the test. else as_save_IFS=$IFS; IFS=$PATH_SEPARATOR for as_dir in $PATH do IFS=$as_save_IFS test -z "$as_dir" && as_dir=. 
for ac_exec_ext in '' $ac_executable_extensions; do if $as_executable_p "$as_dir/$ac_word$ac_exec_ext"; then ac_cv_prog_ac_ct_CC="$ac_prog" echo "$as_me:$LINENO: found $as_dir/$ac_word$ac_exec_ext" >&5 break 2 fi done done fi fi ac_ct_CC=$ac_cv_prog_ac_ct_CC if test -n "$ac_ct_CC"; then echo "$as_me:$LINENO: result: $ac_ct_CC" >&5 echo "${ECHO_T}$ac_ct_CC" >&6 else echo "$as_me:$LINENO: result: no" >&5 echo "${ECHO_T}no" >&6 fi test -n "$ac_ct_CC" && break done CC=$ac_ct_CC fi fi test -z "$CC" && { { echo "$as_me:$LINENO: error: no acceptable C compiler found in \$PATH See \`config.log' for more details." >&5 echo "$as_me: error: no acceptable C compiler found in \$PATH See \`config.log' for more details." >&2;} { (exit 1); exit 1; }; } # Provide some information about the compiler. echo "$as_me:$LINENO:" \ "checking for C compiler version" >&5 ac_compiler=`set X $ac_compile; echo $2` { (eval echo "$as_me:$LINENO: \"$ac_compiler --version </dev/null >&5\"") >&5 (eval $ac_compiler --version </dev/null >&5) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); } { (eval echo "$as_me:$LINENO: \"$ac_compiler -v </dev/null >&5\"") >&5 (eval $ac_compiler -v </dev/null >&5) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); } { (eval echo "$as_me:$LINENO: \"$ac_compiler -V </dev/null >&5\"") >&5 (eval $ac_compiler -V </dev/null >&5) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); } echo "$as_me:$LINENO: checking whether we are using the GNU C compiler" >&5 echo $ECHO_N "checking whether we are using the GNU C compiler... $ECHO_C" >&6 if test "${ac_cv_c_compiler_gnu+set}" = set; then echo $ECHO_N "(cached) $ECHO_C" >&6 else cat >conftest.$ac_ext <<_ACEOF /* confdefs.h. */ _ACEOF cat confdefs.h >>conftest.$ac_ext cat >>conftest.$ac_ext <<_ACEOF /* end confdefs.h.
*/ int main () { #ifndef __GNUC__ choke me #endif ; return 0; } _ACEOF rm -f conftest.$ac_objext if { (eval echo "$as_me:$LINENO: \"$ac_compile\"") >&5 (eval $ac_compile) 2>conftest.er1 ac_status=$? grep -v '^ *+' conftest.er1 >conftest.err rm -f conftest.er1 cat conftest.err >&5 echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); } && { ac_try='test -z "$ac_c_werror_flag" || test ! -s conftest.err' { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 (eval $ac_try) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); }; } && { ac_try='test -s conftest.$ac_objext' { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 (eval $ac_try) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); }; }; then ac_compiler_gnu=yes else echo "$as_me: failed program was:" >&5 sed 's/^/| /' conftest.$ac_ext >&5 ac_compiler_gnu=no fi rm -f conftest.err conftest.$ac_objext conftest.$ac_ext ac_cv_c_compiler_gnu=$ac_compiler_gnu fi echo "$as_me:$LINENO: result: $ac_cv_c_compiler_gnu" >&5 echo "${ECHO_T}$ac_cv_c_compiler_gnu" >&6 GCC=`test $ac_compiler_gnu = yes && echo yes` ac_test_CFLAGS=${CFLAGS+set} ac_save_CFLAGS=$CFLAGS CFLAGS="-g" echo "$as_me:$LINENO: checking whether $CC accepts -g" >&5 echo $ECHO_N "checking whether $CC accepts -g... $ECHO_C" >&6 if test "${ac_cv_prog_cc_g+set}" = set; then echo $ECHO_N "(cached) $ECHO_C" >&6 else cat >conftest.$ac_ext <<_ACEOF /* confdefs.h. */ _ACEOF cat confdefs.h >>conftest.$ac_ext cat >>conftest.$ac_ext <<_ACEOF /* end confdefs.h. */ int main () { ; return 0; } _ACEOF rm -f conftest.$ac_objext if { (eval echo "$as_me:$LINENO: \"$ac_compile\"") >&5 (eval $ac_compile) 2>conftest.er1 ac_status=$? grep -v '^ *+' conftest.er1 >conftest.err rm -f conftest.er1 cat conftest.err >&5 echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); } && { ac_try='test -z "$ac_c_werror_flag" || test ! -s conftest.err' { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 (eval $ac_try) 2>&5 ac_status=$? 
echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); }; } && { ac_try='test -s conftest.$ac_objext' { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 (eval $ac_try) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); }; }; then ac_cv_prog_cc_g=yes else echo "$as_me: failed program was:" >&5 sed 's/^/| /' conftest.$ac_ext >&5 ac_cv_prog_cc_g=no fi rm -f conftest.err conftest.$ac_objext conftest.$ac_ext fi echo "$as_me:$LINENO: result: $ac_cv_prog_cc_g" >&5 echo "${ECHO_T}$ac_cv_prog_cc_g" >&6 if test "$ac_test_CFLAGS" = set; then CFLAGS=$ac_save_CFLAGS elif test $ac_cv_prog_cc_g = yes; then if test "$GCC" = yes; then CFLAGS="-g -O2" else CFLAGS="-g" fi else if test "$GCC" = yes; then CFLAGS="-O2" else CFLAGS= fi fi echo "$as_me:$LINENO: checking for $CC option to accept ANSI C" >&5 echo $ECHO_N "checking for $CC option to accept ANSI C... $ECHO_C" >&6 if test "${ac_cv_prog_cc_stdc+set}" = set; then echo $ECHO_N "(cached) $ECHO_C" >&6 else ac_cv_prog_cc_stdc=no ac_save_CC=$CC cat >conftest.$ac_ext <<_ACEOF /* confdefs.h. */ _ACEOF cat confdefs.h >>conftest.$ac_ext cat >>conftest.$ac_ext <<_ACEOF /* end confdefs.h. */ #include <stdarg.h> #include <stdio.h> #include <sys/types.h> #include <sys/stat.h> /* Most of the following tests are stolen from RCS 5.7's src/conf.sh. */ struct buf { int x; }; FILE * (*rcsopen) (struct buf *, struct stat *, int); static char *e (p, i) char **p; int i; { return p[i]; } static char *f (char * (*g) (char **, int), char **p, ...) { char *s; va_list v; va_start (v,p); s = g (p, va_arg (v,int)); va_end (v); return s; } /* OSF 4.0 Compaq cc is some sort of almost-ANSI by default. It has function prototypes and stuff, but not '\xHH' hex character constants. These don't provoke an error unfortunately, instead are silently treated as 'x'. The following induces an error, until -std1 is added to get proper ANSI mode. Curiously '\x00'!='x' always comes out true, for an array size at least.
It's necessary to write '\x00'==0 to get something that's true only with -std1. */ int osf4_cc_array ['\x00' == 0 ? 1 : -1]; int test (int i, double x); struct s1 {int (*f) (int a);}; struct s2 {int (*f) (double a);}; int pairnames (int, char **, FILE *(*)(struct buf *, struct stat *, int), int, int); int argc; char **argv; int main () { return f (e, argv, 0) != argv[0] || f (e, argv, 1) != argv[1]; ; return 0; } _ACEOF # Don't try gcc -ansi; that turns off useful extensions and # breaks some systems' header files. # AIX -qlanglvl=ansi # Ultrix and OSF/1 -std1 # HP-UX 10.20 and later -Ae # HP-UX older versions -Aa -D_HPUX_SOURCE # SVR4 -Xc -D__EXTENSIONS__ for ac_arg in "" -qlanglvl=ansi -std1 -Ae "-Aa -D_HPUX_SOURCE" "-Xc -D__EXTENSIONS__" do CC="$ac_save_CC $ac_arg" rm -f conftest.$ac_objext if { (eval echo "$as_me:$LINENO: \"$ac_compile\"") >&5 (eval $ac_compile) 2>conftest.er1 ac_status=$? grep -v '^ *+' conftest.er1 >conftest.err rm -f conftest.er1 cat conftest.err >&5 echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); } && { ac_try='test -z "$ac_c_werror_flag" || test ! -s conftest.err' { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 (eval $ac_try) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); }; } && { ac_try='test -s conftest.$ac_objext' { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 (eval $ac_try) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); }; }; then ac_cv_prog_cc_stdc=$ac_arg break else echo "$as_me: failed program was:" >&5 sed 's/^/| /' conftest.$ac_ext >&5 fi rm -f conftest.err conftest.$ac_objext done rm -f conftest.$ac_ext conftest.$ac_objext CC=$ac_save_CC fi case "x$ac_cv_prog_cc_stdc" in x|xno) echo "$as_me:$LINENO: result: none needed" >&5 echo "${ECHO_T}none needed" >&6 ;; *) echo "$as_me:$LINENO: result: $ac_cv_prog_cc_stdc" >&5 echo "${ECHO_T}$ac_cv_prog_cc_stdc" >&6 CC="$CC $ac_cv_prog_cc_stdc" ;; esac # Some people use a C++ compiler to compile C. 
Since we use `exit', # in C++ we need to declare it. In case someone uses the same compiler # for both compiling C and C++ we need to have the C++ compiler decide # the declaration of exit, since it's the most demanding environment. cat >conftest.$ac_ext <<_ACEOF #ifndef __cplusplus choke me #endif _ACEOF rm -f conftest.$ac_objext if { (eval echo "$as_me:$LINENO: \"$ac_compile\"") >&5 (eval $ac_compile) 2>conftest.er1 ac_status=$? grep -v '^ *+' conftest.er1 >conftest.err rm -f conftest.er1 cat conftest.err >&5 echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); } && { ac_try='test -z "$ac_c_werror_flag" || test ! -s conftest.err' { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 (eval $ac_try) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); }; } && { ac_try='test -s conftest.$ac_objext' { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 (eval $ac_try) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); }; }; then for ac_declaration in \ '' \ 'extern "C" void std::exit (int) throw (); using std::exit;' \ 'extern "C" void std::exit (int); using std::exit;' \ 'extern "C" void exit (int) throw ();' \ 'extern "C" void exit (int);' \ 'void exit (int);' do cat >conftest.$ac_ext <<_ACEOF /* confdefs.h. */ _ACEOF cat confdefs.h >>conftest.$ac_ext cat >>conftest.$ac_ext <<_ACEOF /* end confdefs.h. */ $ac_declaration #include <stdlib.h> int main () { exit (42); ; return 0; } _ACEOF rm -f conftest.$ac_objext if { (eval echo "$as_me:$LINENO: \"$ac_compile\"") >&5 (eval $ac_compile) 2>conftest.er1 ac_status=$? grep -v '^ *+' conftest.er1 >conftest.err rm -f conftest.er1 cat conftest.err >&5 echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); } && { ac_try='test -z "$ac_c_werror_flag" || test ! -s conftest.err' { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 (eval $ac_try) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$?
= $ac_status" >&5 (exit $ac_status); }; } && { ac_try='test -s conftest.$ac_objext' { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 (eval $ac_try) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); }; }; then : else echo "$as_me: failed program was:" >&5 sed 's/^/| /' conftest.$ac_ext >&5 continue fi rm -f conftest.err conftest.$ac_objext conftest.$ac_ext cat >conftest.$ac_ext <<_ACEOF /* confdefs.h. */ _ACEOF cat confdefs.h >>conftest.$ac_ext cat >>conftest.$ac_ext <<_ACEOF /* end confdefs.h. */ $ac_declaration int main () { exit (42); ; return 0; } _ACEOF rm -f conftest.$ac_objext if { (eval echo "$as_me:$LINENO: \"$ac_compile\"") >&5 (eval $ac_compile) 2>conftest.er1 ac_status=$? grep -v '^ *+' conftest.er1 >conftest.err rm -f conftest.er1 cat conftest.err >&5 echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); } && { ac_try='test -z "$ac_c_werror_flag" || test ! -s conftest.err' { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 (eval $ac_try) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); }; } && { ac_try='test -s conftest.$ac_objext' { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 (eval $ac_try) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? 
= $ac_status" >&5 (exit $ac_status); }; }; then break else echo "$as_me: failed program was:" >&5 sed 's/^/| /' conftest.$ac_ext >&5 fi rm -f conftest.err conftest.$ac_objext conftest.$ac_ext done rm -f conftest* if test -n "$ac_declaration"; then echo '#ifdef __cplusplus' >>confdefs.h echo $ac_declaration >>confdefs.h echo '#endif' >>confdefs.h fi else echo "$as_me: failed program was:" >&5 sed 's/^/| /' conftest.$ac_ext >&5 fi rm -f conftest.err conftest.$ac_objext conftest.$ac_ext ac_ext=c ac_cpp='$CPP $CPPFLAGS' ac_compile='$CC -c $CFLAGS $CPPFLAGS conftest.$ac_ext >&5' ac_link='$CC -o conftest$ac_exeext $CFLAGS $CPPFLAGS $LDFLAGS conftest.$ac_ext $LIBS >&5' ac_compiler_gnu=$ac_cv_c_compiler_gnu depcc="$CC" am_compiler_list= echo "$as_me:$LINENO: checking dependency style of $depcc" >&5 echo $ECHO_N "checking dependency style of $depcc... $ECHO_C" >&6 if test "${am_cv_CC_dependencies_compiler_type+set}" = set; then echo $ECHO_N "(cached) $ECHO_C" >&6 else if test -z "$AMDEP_TRUE" && test -f "$am_depcomp"; then # We make a subdir and do the tests there. Otherwise we can end up # making bogus files that we don't know about and never remove. For # instance it was reported that on HP-UX the gcc test will end up # making a dummy file named `D' -- because `-MD' means `put the output # in D'. mkdir conftest.dir # Copy depcomp to subdir because otherwise we won't find it if we're # using a relative directory. cp "$am_depcomp" conftest.dir cd conftest.dir # We will build objects and dependencies in a subdirectory because # it helps to detect inapplicable dependency modes. For instance # both Tru64's cc and ICC support -MD to output dependencies as a # side effect of compilation, but ICC will put the dependencies in # the current directory while Tru64 will put them in the object # directory. 
mkdir sub am_cv_CC_dependencies_compiler_type=none if test "$am_compiler_list" = ""; then am_compiler_list=`sed -n 's/^#*\([a-zA-Z0-9]*\))$/\1/p' < ./depcomp` fi for depmode in $am_compiler_list; do # Setup a source with many dependencies, because some compilers # like to wrap large dependency lists on column 80 (with \), and # we should not choose a depcomp mode which is confused by this. # # We need to recreate these files for each test, as the compiler may # overwrite some of them when testing with obscure command lines. # This happens at least with the AIX C compiler. : > sub/conftest.c for i in 1 2 3 4 5 6; do echo '#include "conftst'$i'.h"' >> sub/conftest.c # Using `: > sub/conftst$i.h' creates only sub/conftst1.h with # Solaris 8's {/usr,}/bin/sh. touch sub/conftst$i.h done echo "${am__include} ${am__quote}sub/conftest.Po${am__quote}" > confmf case $depmode in nosideeffect) # after this tag, mechanisms are not by side-effect, so they'll # only be used when explicitly requested if test "x$enable_dependency_tracking" = xyes; then continue else break fi ;; none) break ;; esac # We check with `-c' and `-o' for the sake of the "dashmstdout" # mode. It turns out that the SunPro C++ compiler does not properly # handle `-M -o', and we need to detect this. if depmode=$depmode \ source=sub/conftest.c object=sub/conftest.${OBJEXT-o} \ depfile=sub/conftest.Po tmpdepfile=sub/conftest.TPo \ $SHELL ./depcomp $depcc -c -o sub/conftest.${OBJEXT-o} sub/conftest.c \ >/dev/null 2>conftest.err && grep sub/conftst6.h sub/conftest.Po > /dev/null 2>&1 && grep sub/conftest.${OBJEXT-o} sub/conftest.Po > /dev/null 2>&1 && ${MAKE-make} -s -f confmf > /dev/null 2>&1; then # icc doesn't choke on unknown options, it will just issue warnings # or remarks (even with -Werror). So we grep stderr for any message # that says an option was ignored or not supported. 
# When given -MP, icc 7.0 and 7.1 complain thusly: # icc: Command line warning: ignoring option '-M'; no argument required # The diagnosis changed in icc 8.0: # icc: Command line remark: option '-MP' not supported if (grep 'ignoring option' conftest.err || grep 'not supported' conftest.err) >/dev/null 2>&1; then :; else am_cv_CC_dependencies_compiler_type=$depmode break fi fi done cd .. rm -rf conftest.dir else am_cv_CC_dependencies_compiler_type=none fi fi echo "$as_me:$LINENO: result: $am_cv_CC_dependencies_compiler_type" >&5 echo "${ECHO_T}$am_cv_CC_dependencies_compiler_type" >&6 CCDEPMODE=depmode=$am_cv_CC_dependencies_compiler_type if test "x$enable_dependency_tracking" != xno \ && test "$am_cv_CC_dependencies_compiler_type" = gcc3; then am__fastdepCC_TRUE= am__fastdepCC_FALSE='#' else am__fastdepCC_TRUE='#' am__fastdepCC_FALSE= fi echo "$as_me:$LINENO: checking for vsnprintf in -lsnprintf" >&5 echo $ECHO_N "checking for vsnprintf in -lsnprintf... $ECHO_C" >&6 if test "${ac_cv_lib_snprintf_vsnprintf+set}" = set; then echo $ECHO_N "(cached) $ECHO_C" >&6 else ac_check_lib_save_LIBS=$LIBS LIBS="-lsnprintf $LIBS" cat >conftest.$ac_ext <<_ACEOF /* confdefs.h. */ _ACEOF cat confdefs.h >>conftest.$ac_ext cat >>conftest.$ac_ext <<_ACEOF /* end confdefs.h. */ /* Override any gcc2 internal prototype to avoid an error. */ #ifdef __cplusplus extern "C" #endif /* We use char because int might match the return type of a gcc2 builtin and then its argument prototype would still apply. */ char vsnprintf (); int main () { vsnprintf (); ; return 0; } _ACEOF rm -f conftest.$ac_objext conftest$ac_exeext if { (eval echo "$as_me:$LINENO: \"$ac_link\"") >&5 (eval $ac_link) 2>conftest.er1 ac_status=$? grep -v '^ *+' conftest.er1 >conftest.err rm -f conftest.er1 cat conftest.err >&5 echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); } && { ac_try='test -z "$ac_c_werror_flag" || test ! 
-s conftest.err' { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 (eval $ac_try) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); }; } && { ac_try='test -s conftest$ac_exeext' { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 (eval $ac_try) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); }; }; then ac_cv_lib_snprintf_vsnprintf=yes else echo "$as_me: failed program was:" >&5 sed 's/^/| /' conftest.$ac_ext >&5 ac_cv_lib_snprintf_vsnprintf=no fi rm -f conftest.err conftest.$ac_objext \ conftest$ac_exeext conftest.$ac_ext LIBS=$ac_check_lib_save_LIBS fi echo "$as_me:$LINENO: result: $ac_cv_lib_snprintf_vsnprintf" >&5 echo "${ECHO_T}$ac_cv_lib_snprintf_vsnprintf" >&6 if test $ac_cv_lib_snprintf_vsnprintf = yes; then cat >>confdefs.h <<_ACEOF #define HAVE_LIBSNPRINTF 1 _ACEOF LIBS="-lsnprintf $LIBS" fi ac_header_dirent=no for ac_hdr in dirent.h sys/ndir.h sys/dir.h ndir.h; do as_ac_Header=`echo "ac_cv_header_dirent_$ac_hdr" | $as_tr_sh` echo "$as_me:$LINENO: checking for $ac_hdr that defines DIR" >&5 echo $ECHO_N "checking for $ac_hdr that defines DIR... $ECHO_C" >&6 if eval "test \"\${$as_ac_Header+set}\" = set"; then echo $ECHO_N "(cached) $ECHO_C" >&6 else cat >conftest.$ac_ext <<_ACEOF /* confdefs.h. */ _ACEOF cat confdefs.h >>conftest.$ac_ext cat >>conftest.$ac_ext <<_ACEOF /* end confdefs.h. */ #include <sys/types.h> #include <$ac_hdr> int main () { if ((DIR *) 0) return 0; ; return 0; } _ACEOF rm -f conftest.$ac_objext if { (eval echo "$as_me:$LINENO: \"$ac_compile\"") >&5 (eval $ac_compile) 2>conftest.er1 ac_status=$? grep -v '^ *+' conftest.er1 >conftest.err rm -f conftest.er1 cat conftest.err >&5 echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); } && { ac_try='test -z "$ac_c_werror_flag" || test ! -s conftest.err' { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 (eval $ac_try) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$?
= $ac_status" >&5 (exit $ac_status); }; } && { ac_try='test -s conftest.$ac_objext' { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 (eval $ac_try) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); }; }; then eval "$as_ac_Header=yes" else echo "$as_me: failed program was:" >&5 sed 's/^/| /' conftest.$ac_ext >&5 eval "$as_ac_Header=no" fi rm -f conftest.err conftest.$ac_objext conftest.$ac_ext fi echo "$as_me:$LINENO: result: `eval echo '${'$as_ac_Header'}'`" >&5 echo "${ECHO_T}`eval echo '${'$as_ac_Header'}'`" >&6 if test `eval echo '${'$as_ac_Header'}'` = yes; then cat >>confdefs.h <<_ACEOF #define `echo "HAVE_$ac_hdr" | $as_tr_cpp` 1 _ACEOF ac_header_dirent=$ac_hdr; break fi done # Two versions of opendir et al. are in -ldir and -lx on SCO Xenix. if test $ac_header_dirent = dirent.h; then echo "$as_me:$LINENO: checking for library containing opendir" >&5 echo $ECHO_N "checking for library containing opendir... $ECHO_C" >&6 if test "${ac_cv_search_opendir+set}" = set; then echo $ECHO_N "(cached) $ECHO_C" >&6 else ac_func_search_save_LIBS=$LIBS ac_cv_search_opendir=no cat >conftest.$ac_ext <<_ACEOF /* confdefs.h. */ _ACEOF cat confdefs.h >>conftest.$ac_ext cat >>conftest.$ac_ext <<_ACEOF /* end confdefs.h. */ /* Override any gcc2 internal prototype to avoid an error. */ #ifdef __cplusplus extern "C" #endif /* We use char because int might match the return type of a gcc2 builtin and then its argument prototype would still apply. */ char opendir (); int main () { opendir (); ; return 0; } _ACEOF rm -f conftest.$ac_objext conftest$ac_exeext if { (eval echo "$as_me:$LINENO: \"$ac_link\"") >&5 (eval $ac_link) 2>conftest.er1 ac_status=$? grep -v '^ *+' conftest.er1 >conftest.err rm -f conftest.er1 cat conftest.err >&5 echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); } && { ac_try='test -z "$ac_c_werror_flag" || test ! -s conftest.err' { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 (eval $ac_try) 2>&5 ac_status=$? 
echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); }; } && { ac_try='test -s conftest$ac_exeext' { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 (eval $ac_try) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); }; }; then ac_cv_search_opendir="none required" else echo "$as_me: failed program was:" >&5 sed 's/^/| /' conftest.$ac_ext >&5 fi rm -f conftest.err conftest.$ac_objext \ conftest$ac_exeext conftest.$ac_ext if test "$ac_cv_search_opendir" = no; then for ac_lib in dir; do LIBS="-l$ac_lib $ac_func_search_save_LIBS" cat >conftest.$ac_ext <<_ACEOF /* confdefs.h. */ _ACEOF cat confdefs.h >>conftest.$ac_ext cat >>conftest.$ac_ext <<_ACEOF /* end confdefs.h. */ /* Override any gcc2 internal prototype to avoid an error. */ #ifdef __cplusplus extern "C" #endif /* We use char because int might match the return type of a gcc2 builtin and then its argument prototype would still apply. */ char opendir (); int main () { opendir (); ; return 0; } _ACEOF rm -f conftest.$ac_objext conftest$ac_exeext if { (eval echo "$as_me:$LINENO: \"$ac_link\"") >&5 (eval $ac_link) 2>conftest.er1 ac_status=$? grep -v '^ *+' conftest.er1 >conftest.err rm -f conftest.er1 cat conftest.err >&5 echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); } && { ac_try='test -z "$ac_c_werror_flag" || test ! -s conftest.err' { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 (eval $ac_try) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); }; } && { ac_try='test -s conftest$ac_exeext' { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 (eval $ac_try) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? 
= $ac_status" >&5 (exit $ac_status); }; }; then ac_cv_search_opendir="-l$ac_lib" break else echo "$as_me: failed program was:" >&5 sed 's/^/| /' conftest.$ac_ext >&5 fi rm -f conftest.err conftest.$ac_objext \ conftest$ac_exeext conftest.$ac_ext done fi LIBS=$ac_func_search_save_LIBS fi echo "$as_me:$LINENO: result: $ac_cv_search_opendir" >&5 echo "${ECHO_T}$ac_cv_search_opendir" >&6 if test "$ac_cv_search_opendir" != no; then test "$ac_cv_search_opendir" = "none required" || LIBS="$ac_cv_search_opendir $LIBS" fi else echo "$as_me:$LINENO: checking for library containing opendir" >&5 echo $ECHO_N "checking for library containing opendir... $ECHO_C" >&6 if test "${ac_cv_search_opendir+set}" = set; then echo $ECHO_N "(cached) $ECHO_C" >&6 else ac_func_search_save_LIBS=$LIBS ac_cv_search_opendir=no cat >conftest.$ac_ext <<_ACEOF /* confdefs.h. */ _ACEOF cat confdefs.h >>conftest.$ac_ext cat >>conftest.$ac_ext <<_ACEOF /* end confdefs.h. */ /* Override any gcc2 internal prototype to avoid an error. */ #ifdef __cplusplus extern "C" #endif /* We use char because int might match the return type of a gcc2 builtin and then its argument prototype would still apply. */ char opendir (); int main () { opendir (); ; return 0; } _ACEOF rm -f conftest.$ac_objext conftest$ac_exeext if { (eval echo "$as_me:$LINENO: \"$ac_link\"") >&5 (eval $ac_link) 2>conftest.er1 ac_status=$? grep -v '^ *+' conftest.er1 >conftest.err rm -f conftest.er1 cat conftest.err >&5 echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); } && { ac_try='test -z "$ac_c_werror_flag" || test ! -s conftest.err' { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 (eval $ac_try) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); }; } && { ac_try='test -s conftest$ac_exeext' { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 (eval $ac_try) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? 
= $ac_status" >&5 (exit $ac_status); }; }; then ac_cv_search_opendir="none required" else echo "$as_me: failed program was:" >&5 sed 's/^/| /' conftest.$ac_ext >&5 fi rm -f conftest.err conftest.$ac_objext \ conftest$ac_exeext conftest.$ac_ext if test "$ac_cv_search_opendir" = no; then for ac_lib in x; do LIBS="-l$ac_lib $ac_func_search_save_LIBS" cat >conftest.$ac_ext <<_ACEOF /* confdefs.h. */ _ACEOF cat confdefs.h >>conftest.$ac_ext cat >>conftest.$ac_ext <<_ACEOF /* end confdefs.h. */ /* Override any gcc2 internal prototype to avoid an error. */ #ifdef __cplusplus extern "C" #endif /* We use char because int might match the return type of a gcc2 builtin and then its argument prototype would still apply. */ char opendir (); int main () { opendir (); ; return 0; } _ACEOF rm -f conftest.$ac_objext conftest$ac_exeext if { (eval echo "$as_me:$LINENO: \"$ac_link\"") >&5 (eval $ac_link) 2>conftest.er1 ac_status=$? grep -v '^ *+' conftest.er1 >conftest.err rm -f conftest.er1 cat conftest.err >&5 echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); } && { ac_try='test -z "$ac_c_werror_flag" || test ! -s conftest.err' { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 (eval $ac_try) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); }; } && { ac_try='test -s conftest$ac_exeext' { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 (eval $ac_try) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? 
= $ac_status" >&5 (exit $ac_status); }; }; then ac_cv_search_opendir="-l$ac_lib" break else echo "$as_me: failed program was:" >&5 sed 's/^/| /' conftest.$ac_ext >&5 fi rm -f conftest.err conftest.$ac_objext \ conftest$ac_exeext conftest.$ac_ext done fi LIBS=$ac_func_search_save_LIBS fi echo "$as_me:$LINENO: result: $ac_cv_search_opendir" >&5 echo "${ECHO_T}$ac_cv_search_opendir" >&6 if test "$ac_cv_search_opendir" != no; then test "$ac_cv_search_opendir" = "none required" || LIBS="$ac_cv_search_opendir $LIBS" fi fi echo "$as_me:$LINENO: checking whether stat file-mode macros are broken" >&5 echo $ECHO_N "checking whether stat file-mode macros are broken... $ECHO_C" >&6 if test "${ac_cv_header_stat_broken+set}" = set; then echo $ECHO_N "(cached) $ECHO_C" >&6 else cat >conftest.$ac_ext <<_ACEOF /* confdefs.h. */ _ACEOF cat confdefs.h >>conftest.$ac_ext cat >>conftest.$ac_ext <<_ACEOF /* end confdefs.h. */ #include <sys/types.h> #include <sys/stat.h> #if defined(S_ISBLK) && defined(S_IFDIR) # if S_ISBLK (S_IFDIR) You lose. # endif #endif #if defined(S_ISBLK) && defined(S_IFCHR) # if S_ISBLK (S_IFCHR) You lose. # endif #endif #if defined(S_ISLNK) && defined(S_IFREG) # if S_ISLNK (S_IFREG) You lose. # endif #endif #if defined(S_ISSOCK) && defined(S_IFREG) # if S_ISSOCK (S_IFREG) You lose. # endif #endif _ACEOF if (eval "$ac_cpp conftest.$ac_ext") 2>&5 | $EGREP "You lose" >/dev/null 2>&1; then ac_cv_header_stat_broken=yes else ac_cv_header_stat_broken=no fi rm -f conftest* fi echo "$as_me:$LINENO: result: $ac_cv_header_stat_broken" >&5 echo "${ECHO_T}$ac_cv_header_stat_broken" >&6 if test $ac_cv_header_stat_broken = yes; then cat >>confdefs.h <<\_ACEOF #define STAT_MACROS_BROKEN 1 _ACEOF fi echo "$as_me:$LINENO: checking for ANSI C header files" >&5 echo $ECHO_N "checking for ANSI C header files... $ECHO_C" >&6 if test "${ac_cv_header_stdc+set}" = set; then echo $ECHO_N "(cached) $ECHO_C" >&6 else cat >conftest.$ac_ext <<_ACEOF /* confdefs.h.
*/ _ACEOF cat confdefs.h >>conftest.$ac_ext cat >>conftest.$ac_ext <<_ACEOF /* end confdefs.h. */ #include <stdlib.h> #include <stdarg.h> #include <string.h> #include <float.h> int main () { ; return 0; } _ACEOF rm -f conftest.$ac_objext if { (eval echo "$as_me:$LINENO: \"$ac_compile\"") >&5 (eval $ac_compile) 2>conftest.er1 ac_status=$? grep -v '^ *+' conftest.er1 >conftest.err rm -f conftest.er1 cat conftest.err >&5 echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); } && { ac_try='test -z "$ac_c_werror_flag" || test ! -s conftest.err' { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 (eval $ac_try) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); }; } && { ac_try='test -s conftest.$ac_objext' { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 (eval $ac_try) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); }; }; then ac_cv_header_stdc=yes else echo "$as_me: failed program was:" >&5 sed 's/^/| /' conftest.$ac_ext >&5 ac_cv_header_stdc=no fi rm -f conftest.err conftest.$ac_objext conftest.$ac_ext if test $ac_cv_header_stdc = yes; then # SunOS 4.x string.h does not declare mem*, contrary to ANSI. cat >conftest.$ac_ext <<_ACEOF /* confdefs.h. */ _ACEOF cat confdefs.h >>conftest.$ac_ext cat >>conftest.$ac_ext <<_ACEOF /* end confdefs.h. */ #include <string.h> _ACEOF if (eval "$ac_cpp conftest.$ac_ext") 2>&5 | $EGREP "memchr" >/dev/null 2>&1; then : else ac_cv_header_stdc=no fi rm -f conftest* fi if test $ac_cv_header_stdc = yes; then # ISC 2.0.2 stdlib.h does not declare free, contrary to ANSI. cat >conftest.$ac_ext <<_ACEOF /* confdefs.h. */ _ACEOF cat confdefs.h >>conftest.$ac_ext cat >>conftest.$ac_ext <<_ACEOF /* end confdefs.h. */ #include <stdlib.h> _ACEOF if (eval "$ac_cpp conftest.$ac_ext") 2>&5 | $EGREP "free" >/dev/null 2>&1; then : else ac_cv_header_stdc=no fi rm -f conftest* fi if test $ac_cv_header_stdc = yes; then # /bin/cc in Irix-4.0.5 gets non-ANSI ctype macros unless using -ansi.
if test "$cross_compiling" = yes; then : else cat >conftest.$ac_ext <<_ACEOF /* confdefs.h. */ _ACEOF cat confdefs.h >>conftest.$ac_ext cat >>conftest.$ac_ext <<_ACEOF /* end confdefs.h. */ #include <ctype.h> #if ((' ' & 0x0FF) == 0x020) # define ISLOWER(c) ('a' <= (c) && (c) <= 'z') # define TOUPPER(c) (ISLOWER(c) ? 'A' + ((c) - 'a') : (c)) #else # define ISLOWER(c) \ (('a' <= (c) && (c) <= 'i') \ || ('j' <= (c) && (c) <= 'r') \ || ('s' <= (c) && (c) <= 'z')) # define TOUPPER(c) (ISLOWER(c) ? ((c) | 0x40) : (c)) #endif #define XOR(e, f) (((e) && !(f)) || (!(e) && (f))) int main () { int i; for (i = 0; i < 256; i++) if (XOR (islower (i), ISLOWER (i)) || toupper (i) != TOUPPER (i)) exit(2); exit (0); } _ACEOF rm -f conftest$ac_exeext if { (eval echo "$as_me:$LINENO: \"$ac_link\"") >&5 (eval $ac_link) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); } && { ac_try='./conftest$ac_exeext' { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 (eval $ac_try) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); }; }; then : else echo "$as_me: program exited with status $ac_status" >&5 echo "$as_me: failed program was:" >&5 sed 's/^/| /' conftest.$ac_ext >&5 ( exit $ac_status ) ac_cv_header_stdc=no fi rm -f core *.core gmon.out bb.out conftest$ac_exeext conftest.$ac_objext conftest.$ac_ext fi fi fi echo "$as_me:$LINENO: result: $ac_cv_header_stdc" >&5 echo "${ECHO_T}$ac_cv_header_stdc" >&6 if test $ac_cv_header_stdc = yes; then cat >>confdefs.h <<\_ACEOF #define STDC_HEADERS 1 _ACEOF fi for ac_header in unistd.h stdlib.h string.h sys/timeb.h windows.h do as_ac_Header=`echo "ac_cv_header_$ac_header" | $as_tr_sh` if eval "test \"\${$as_ac_Header+set}\" = set"; then echo "$as_me:$LINENO: checking for $ac_header" >&5 echo $ECHO_N "checking for $ac_header...
$ECHO_C" >&6 if eval "test \"\${$as_ac_Header+set}\" = set"; then echo $ECHO_N "(cached) $ECHO_C" >&6 fi echo "$as_me:$LINENO: result: `eval echo '${'$as_ac_Header'}'`" >&5 echo "${ECHO_T}`eval echo '${'$as_ac_Header'}'`" >&6 else # Is the header compilable? echo "$as_me:$LINENO: checking $ac_header usability" >&5 echo $ECHO_N "checking $ac_header usability... $ECHO_C" >&6 cat >conftest.$ac_ext <<_ACEOF /* confdefs.h. */ _ACEOF cat confdefs.h >>conftest.$ac_ext cat >>conftest.$ac_ext <<_ACEOF /* end confdefs.h. */ $ac_includes_default #include <$ac_header> _ACEOF rm -f conftest.$ac_objext if { (eval echo "$as_me:$LINENO: \"$ac_compile\"") >&5 (eval $ac_compile) 2>conftest.er1 ac_status=$? grep -v '^ *+' conftest.er1 >conftest.err rm -f conftest.er1 cat conftest.err >&5 echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); } && { ac_try='test -z "$ac_c_werror_flag" || test ! -s conftest.err' { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 (eval $ac_try) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); }; } && { ac_try='test -s conftest.$ac_objext' { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 (eval $ac_try) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); }; }; then ac_header_compiler=yes else echo "$as_me: failed program was:" >&5 sed 's/^/| /' conftest.$ac_ext >&5 ac_header_compiler=no fi rm -f conftest.err conftest.$ac_objext conftest.$ac_ext echo "$as_me:$LINENO: result: $ac_header_compiler" >&5 echo "${ECHO_T}$ac_header_compiler" >&6 # Is the header present? echo "$as_me:$LINENO: checking $ac_header presence" >&5 echo $ECHO_N "checking $ac_header presence... $ECHO_C" >&6 cat >conftest.$ac_ext <<_ACEOF /* confdefs.h. */ _ACEOF cat confdefs.h >>conftest.$ac_ext cat >>conftest.$ac_ext <<_ACEOF /* end confdefs.h. */ #include <$ac_header> _ACEOF if { (eval echo "$as_me:$LINENO: \"$ac_cpp conftest.$ac_ext\"") >&5 (eval $ac_cpp conftest.$ac_ext) 2>conftest.er1 ac_status=$? 
grep -v '^ *+' conftest.er1 >conftest.err rm -f conftest.er1 cat conftest.err >&5 echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); } >/dev/null; then if test -s conftest.err; then ac_cpp_err=$ac_c_preproc_warn_flag ac_cpp_err=$ac_cpp_err$ac_c_werror_flag else ac_cpp_err= fi else ac_cpp_err=yes fi if test -z "$ac_cpp_err"; then ac_header_preproc=yes else echo "$as_me: failed program was:" >&5 sed 's/^/| /' conftest.$ac_ext >&5 ac_header_preproc=no fi rm -f conftest.err conftest.$ac_ext echo "$as_me:$LINENO: result: $ac_header_preproc" >&5 echo "${ECHO_T}$ac_header_preproc" >&6 # So? What about this header? case $ac_header_compiler:$ac_header_preproc:$ac_c_preproc_warn_flag in yes:no: ) { echo "$as_me:$LINENO: WARNING: $ac_header: accepted by the compiler, rejected by the preprocessor!" >&5 echo "$as_me: WARNING: $ac_header: accepted by the compiler, rejected by the preprocessor!" >&2;} { echo "$as_me:$LINENO: WARNING: $ac_header: proceeding with the compiler's result" >&5 echo "$as_me: WARNING: $ac_header: proceeding with the compiler's result" >&2;} ac_header_preproc=yes ;; no:yes:* ) { echo "$as_me:$LINENO: WARNING: $ac_header: present but cannot be compiled" >&5 echo "$as_me: WARNING: $ac_header: present but cannot be compiled" >&2;} { echo "$as_me:$LINENO: WARNING: $ac_header: check for missing prerequisite headers?" >&5 echo "$as_me: WARNING: $ac_header: check for missing prerequisite headers?" 
>&2;} { echo "$as_me:$LINENO: WARNING: $ac_header: see the Autoconf documentation" >&5 echo "$as_me: WARNING: $ac_header: see the Autoconf documentation" >&2;} { echo "$as_me:$LINENO: WARNING: $ac_header: section \"Present But Cannot Be Compiled\"" >&5 echo "$as_me: WARNING: $ac_header: section \"Present But Cannot Be Compiled\"" >&2;} { echo "$as_me:$LINENO: WARNING: $ac_header: proceeding with the preprocessor's result" >&5 echo "$as_me: WARNING: $ac_header: proceeding with the preprocessor's result" >&2;} { echo "$as_me:$LINENO: WARNING: $ac_header: in the future, the compiler will take precedence" >&5 echo "$as_me: WARNING: $ac_header: in the future, the compiler will take precedence" >&2;} ( cat <<\_ASBOX ## ------------------------------------------ ## ## Report this to the AC_PACKAGE_NAME lists. ## ## ------------------------------------------ ## _ASBOX ) | sed "s/^/$as_me: WARNING: /" >&2 ;; esac echo "$as_me:$LINENO: checking for $ac_header" >&5 echo $ECHO_N "checking for $ac_header... $ECHO_C" >&6 if eval "test \"\${$as_ac_Header+set}\" = set"; then echo $ECHO_N "(cached) $ECHO_C" >&6 else eval "$as_ac_Header=\$ac_header_preproc" fi echo "$as_me:$LINENO: result: `eval echo '${'$as_ac_Header'}'`" >&5 echo "${ECHO_T}`eval echo '${'$as_ac_Header'}'`" >&6 fi if test `eval echo '${'$as_ac_Header'}'` = yes; then cat >>confdefs.h <<_ACEOF #define `echo "HAVE_$ac_header" | $as_tr_cpp` 1 _ACEOF fi done for ac_header in sys/resource.h sys/param.h do as_ac_Header=`echo "ac_cv_header_$ac_header" | $as_tr_sh` if eval "test \"\${$as_ac_Header+set}\" = set"; then echo "$as_me:$LINENO: checking for $ac_header" >&5 echo $ECHO_N "checking for $ac_header... $ECHO_C" >&6 if eval "test \"\${$as_ac_Header+set}\" = set"; then echo $ECHO_N "(cached) $ECHO_C" >&6 fi echo "$as_me:$LINENO: result: `eval echo '${'$as_ac_Header'}'`" >&5 echo "${ECHO_T}`eval echo '${'$as_ac_Header'}'`" >&6 else # Is the header compilable? 
echo "$as_me:$LINENO: checking $ac_header usability" >&5 echo $ECHO_N "checking $ac_header usability... $ECHO_C" >&6 cat >conftest.$ac_ext <<_ACEOF /* confdefs.h. */ _ACEOF cat confdefs.h >>conftest.$ac_ext cat >>conftest.$ac_ext <<_ACEOF /* end confdefs.h. */ $ac_includes_default #include <$ac_header> _ACEOF rm -f conftest.$ac_objext if { (eval echo "$as_me:$LINENO: \"$ac_compile\"") >&5 (eval $ac_compile) 2>conftest.er1 ac_status=$? grep -v '^ *+' conftest.er1 >conftest.err rm -f conftest.er1 cat conftest.err >&5 echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); } && { ac_try='test -z "$ac_c_werror_flag" || test ! -s conftest.err' { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 (eval $ac_try) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); }; } && { ac_try='test -s conftest.$ac_objext' { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 (eval $ac_try) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); }; }; then ac_header_compiler=yes else echo "$as_me: failed program was:" >&5 sed 's/^/| /' conftest.$ac_ext >&5 ac_header_compiler=no fi rm -f conftest.err conftest.$ac_objext conftest.$ac_ext echo "$as_me:$LINENO: result: $ac_header_compiler" >&5 echo "${ECHO_T}$ac_header_compiler" >&6 # Is the header present? echo "$as_me:$LINENO: checking $ac_header presence" >&5 echo $ECHO_N "checking $ac_header presence... $ECHO_C" >&6 cat >conftest.$ac_ext <<_ACEOF /* confdefs.h. */ _ACEOF cat confdefs.h >>conftest.$ac_ext cat >>conftest.$ac_ext <<_ACEOF /* end confdefs.h. */ #include <$ac_header> _ACEOF if { (eval echo "$as_me:$LINENO: \"$ac_cpp conftest.$ac_ext\"") >&5 (eval $ac_cpp conftest.$ac_ext) 2>conftest.er1 ac_status=$? grep -v '^ *+' conftest.er1 >conftest.err rm -f conftest.er1 cat conftest.err >&5 echo "$as_me:$LINENO: \$? 
= $ac_status" >&5 (exit $ac_status); } >/dev/null; then if test -s conftest.err; then ac_cpp_err=$ac_c_preproc_warn_flag ac_cpp_err=$ac_cpp_err$ac_c_werror_flag else ac_cpp_err= fi else ac_cpp_err=yes fi if test -z "$ac_cpp_err"; then ac_header_preproc=yes else echo "$as_me: failed program was:" >&5 sed 's/^/| /' conftest.$ac_ext >&5 ac_header_preproc=no fi rm -f conftest.err conftest.$ac_ext echo "$as_me:$LINENO: result: $ac_header_preproc" >&5 echo "${ECHO_T}$ac_header_preproc" >&6 # So? What about this header? case $ac_header_compiler:$ac_header_preproc:$ac_c_preproc_warn_flag in yes:no: ) { echo "$as_me:$LINENO: WARNING: $ac_header: accepted by the compiler, rejected by the preprocessor!" >&5 echo "$as_me: WARNING: $ac_header: accepted by the compiler, rejected by the preprocessor!" >&2;} { echo "$as_me:$LINENO: WARNING: $ac_header: proceeding with the compiler's result" >&5 echo "$as_me: WARNING: $ac_header: proceeding with the compiler's result" >&2;} ac_header_preproc=yes ;; no:yes:* ) { echo "$as_me:$LINENO: WARNING: $ac_header: present but cannot be compiled" >&5 echo "$as_me: WARNING: $ac_header: present but cannot be compiled" >&2;} { echo "$as_me:$LINENO: WARNING: $ac_header: check for missing prerequisite headers?" >&5 echo "$as_me: WARNING: $ac_header: check for missing prerequisite headers?" 
>&2;} { echo "$as_me:$LINENO: WARNING: $ac_header: see the Autoconf documentation" >&5 echo "$as_me: WARNING: $ac_header: see the Autoconf documentation" >&2;} { echo "$as_me:$LINENO: WARNING: $ac_header: section \"Present But Cannot Be Compiled\"" >&5 echo "$as_me: WARNING: $ac_header: section \"Present But Cannot Be Compiled\"" >&2;} { echo "$as_me:$LINENO: WARNING: $ac_header: proceeding with the preprocessor's result" >&5 echo "$as_me: WARNING: $ac_header: proceeding with the preprocessor's result" >&2;} { echo "$as_me:$LINENO: WARNING: $ac_header: in the future, the compiler will take precedence" >&5 echo "$as_me: WARNING: $ac_header: in the future, the compiler will take precedence" >&2;} ( cat <<\_ASBOX ## ------------------------------------------ ## ## Report this to the AC_PACKAGE_NAME lists. ## ## ------------------------------------------ ## _ASBOX ) | sed "s/^/$as_me: WARNING: /" >&2 ;; esac echo "$as_me:$LINENO: checking for $ac_header" >&5 echo $ECHO_N "checking for $ac_header... $ECHO_C" >&6 if eval "test \"\${$as_ac_Header+set}\" = set"; then echo $ECHO_N "(cached) $ECHO_C" >&6 else eval "$as_ac_Header=\$ac_header_preproc" fi echo "$as_me:$LINENO: result: `eval echo '${'$as_ac_Header'}'`" >&5 echo "${ECHO_T}`eval echo '${'$as_ac_Header'}'`" >&6 fi if test `eval echo '${'$as_ac_Header'}'` = yes; then cat >>confdefs.h <<_ACEOF #define `echo "HAVE_$ac_header" | $as_tr_cpp` 1 _ACEOF fi done echo "$as_me:$LINENO: checking for sys/wait.h that is POSIX.1 compatible" >&5 echo $ECHO_N "checking for sys/wait.h that is POSIX.1 compatible... $ECHO_C" >&6 if test "${ac_cv_header_sys_wait_h+set}" = set; then echo $ECHO_N "(cached) $ECHO_C" >&6 else cat >conftest.$ac_ext <<_ACEOF /* confdefs.h. */ _ACEOF cat confdefs.h >>conftest.$ac_ext cat >>conftest.$ac_ext <<_ACEOF /* end confdefs.h. 
*/ #include <sys/types.h> #include <sys/wait.h> #ifndef WEXITSTATUS # define WEXITSTATUS(stat_val) ((unsigned)(stat_val) >> 8) #endif #ifndef WIFEXITED # define WIFEXITED(stat_val) (((stat_val) & 255) == 0) #endif int main () { int s; wait (&s); s = WIFEXITED (s) ? WEXITSTATUS (s) : 1; ; return 0; } _ACEOF rm -f conftest.$ac_objext if { (eval echo "$as_me:$LINENO: \"$ac_compile\"") >&5 (eval $ac_compile) 2>conftest.er1 ac_status=$? grep -v '^ *+' conftest.er1 >conftest.err rm -f conftest.er1 cat conftest.err >&5 echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); } && { ac_try='test -z "$ac_c_werror_flag" || test ! -s conftest.err' { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 (eval $ac_try) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); }; } && { ac_try='test -s conftest.$ac_objext' { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 (eval $ac_try) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); }; }; then ac_cv_header_sys_wait_h=yes else echo "$as_me: failed program was:" >&5 sed 's/^/| /' conftest.$ac_ext >&5 ac_cv_header_sys_wait_h=no fi rm -f conftest.err conftest.$ac_objext conftest.$ac_ext fi echo "$as_me:$LINENO: result: $ac_cv_header_sys_wait_h" >&5 echo "${ECHO_T}$ac_cv_header_sys_wait_h" >&6 if test $ac_cv_header_sys_wait_h = yes; then cat >>confdefs.h <<\_ACEOF #define HAVE_SYS_WAIT_H 1 _ACEOF fi echo "$as_me:$LINENO: checking for an ANSI C-conforming const" >&5 echo $ECHO_N "checking for an ANSI C-conforming const... $ECHO_C" >&6 if test "${ac_cv_c_const+set}" = set; then echo $ECHO_N "(cached) $ECHO_C" >&6 else cat >conftest.$ac_ext <<_ACEOF /* confdefs.h. */ _ACEOF cat confdefs.h >>conftest.$ac_ext cat >>conftest.$ac_ext <<_ACEOF /* end confdefs.h. */ int main () { /* FIXME: Include the comments suggested by Paul. */ #ifndef __cplusplus /* Ultrix mips cc rejects this. */ typedef int charset[2]; const charset x; /* SunOS 4.1.1 cc rejects this.
*/ char const *const *ccp; char **p; /* NEC SVR4.0.2 mips cc rejects this. */ struct point {int x, y;}; static struct point const zero = {0,0}; /* AIX XL C 1.02.0.0 rejects this. It does not let you subtract one const X* pointer from another in an arm of an if-expression whose if-part is not a constant expression */ const char *g = "string"; ccp = &g + (g ? g-g : 0); /* HPUX 7.0 cc rejects these. */ ++ccp; p = (char**) ccp; ccp = (char const *const *) p; { /* SCO 3.2v4 cc rejects this. */ char *t; char const *s = 0 ? (char *) 0 : (char const *) 0; *t++ = 0; } { /* Someone thinks the Sun supposedly-ANSI compiler will reject this. */ int x[] = {25, 17}; const int *foo = &x[0]; ++foo; } { /* Sun SC1.0 ANSI compiler rejects this -- but not the above. */ typedef const int *iptr; iptr p = 0; ++p; } { /* AIX XL C 1.02.0.0 rejects this saying "k.c", line 2.27: 1506-025 (S) Operand must be a modifiable lvalue. */ struct s { int j; const int *ap[3]; }; struct s *b; b->j = 5; } { /* ULTRIX-32 V3.1 (Rev 9) vcc rejects this */ const int foo = 10; } #endif ; return 0; } _ACEOF rm -f conftest.$ac_objext if { (eval echo "$as_me:$LINENO: \"$ac_compile\"") >&5 (eval $ac_compile) 2>conftest.er1 ac_status=$? grep -v '^ *+' conftest.er1 >conftest.err rm -f conftest.er1 cat conftest.err >&5 echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); } && { ac_try='test -z "$ac_c_werror_flag" || test ! -s conftest.err' { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 (eval $ac_try) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); }; } && { ac_try='test -s conftest.$ac_objext' { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 (eval $ac_try) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? 
= $ac_status" >&5 (exit $ac_status); }; }; then ac_cv_c_const=yes else echo "$as_me: failed program was:" >&5 sed 's/^/| /' conftest.$ac_ext >&5 ac_cv_c_const=no fi rm -f conftest.err conftest.$ac_objext conftest.$ac_ext fi echo "$as_me:$LINENO: result: $ac_cv_c_const" >&5 echo "${ECHO_T}$ac_cv_c_const" >&6 if test $ac_cv_c_const = no; then cat >>confdefs.h <<\_ACEOF #define const _ACEOF fi echo "$as_me:$LINENO: checking for pid_t" >&5 echo $ECHO_N "checking for pid_t... $ECHO_C" >&6 if test "${ac_cv_type_pid_t+set}" = set; then echo $ECHO_N "(cached) $ECHO_C" >&6 else cat >conftest.$ac_ext <<_ACEOF /* confdefs.h. */ _ACEOF cat confdefs.h >>conftest.$ac_ext cat >>conftest.$ac_ext <<_ACEOF /* end confdefs.h. */ $ac_includes_default int main () { if ((pid_t *) 0) return 0; if (sizeof (pid_t)) return 0; ; return 0; } _ACEOF rm -f conftest.$ac_objext if { (eval echo "$as_me:$LINENO: \"$ac_compile\"") >&5 (eval $ac_compile) 2>conftest.er1 ac_status=$? grep -v '^ *+' conftest.er1 >conftest.err rm -f conftest.er1 cat conftest.err >&5 echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); } && { ac_try='test -z "$ac_c_werror_flag" || test ! -s conftest.err' { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 (eval $ac_try) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); }; } && { ac_try='test -s conftest.$ac_objext' { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 (eval $ac_try) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); }; }; then ac_cv_type_pid_t=yes else echo "$as_me: failed program was:" >&5 sed 's/^/| /' conftest.$ac_ext >&5 ac_cv_type_pid_t=no fi rm -f conftest.err conftest.$ac_objext conftest.$ac_ext fi echo "$as_me:$LINENO: result: $ac_cv_type_pid_t" >&5 echo "${ECHO_T}$ac_cv_type_pid_t" >&6 if test $ac_cv_type_pid_t = yes; then : else cat >>confdefs.h <<_ACEOF #define pid_t int _ACEOF fi echo "$as_me:$LINENO: checking for size_t" >&5 echo $ECHO_N "checking for size_t... 
$ECHO_C" >&6 if test "${ac_cv_type_size_t+set}" = set; then echo $ECHO_N "(cached) $ECHO_C" >&6 else cat >conftest.$ac_ext <<_ACEOF /* confdefs.h. */ _ACEOF cat confdefs.h >>conftest.$ac_ext cat >>conftest.$ac_ext <<_ACEOF /* end confdefs.h. */ $ac_includes_default int main () { if ((size_t *) 0) return 0; if (sizeof (size_t)) return 0; ; return 0; } _ACEOF rm -f conftest.$ac_objext if { (eval echo "$as_me:$LINENO: \"$ac_compile\"") >&5 (eval $ac_compile) 2>conftest.er1 ac_status=$? grep -v '^ *+' conftest.er1 >conftest.err rm -f conftest.er1 cat conftest.err >&5 echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); } && { ac_try='test -z "$ac_c_werror_flag" || test ! -s conftest.err' { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 (eval $ac_try) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); }; } && { ac_try='test -s conftest.$ac_objext' { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 (eval $ac_try) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); }; }; then ac_cv_type_size_t=yes else echo "$as_me: failed program was:" >&5 sed 's/^/| /' conftest.$ac_ext >&5 ac_cv_type_size_t=no fi rm -f conftest.err conftest.$ac_objext conftest.$ac_ext fi echo "$as_me:$LINENO: result: $ac_cv_type_size_t" >&5 echo "${ECHO_T}$ac_cv_type_size_t" >&6 if test $ac_cv_type_size_t = yes; then : else cat >>confdefs.h <<_ACEOF #define size_t unsigned _ACEOF fi echo "$as_me:$LINENO: checking whether struct tm is in sys/time.h or time.h" >&5 echo $ECHO_N "checking whether struct tm is in sys/time.h or time.h... $ECHO_C" >&6 if test "${ac_cv_struct_tm+set}" = set; then echo $ECHO_N "(cached) $ECHO_C" >&6 else cat >conftest.$ac_ext <<_ACEOF /* confdefs.h. */ _ACEOF cat confdefs.h >>conftest.$ac_ext cat >>conftest.$ac_ext <<_ACEOF /* end confdefs.h. 
*/ #include <sys/types.h> #include <time.h> int main () { struct tm *tp; tp->tm_sec; ; return 0; } _ACEOF rm -f conftest.$ac_objext if { (eval echo "$as_me:$LINENO: \"$ac_compile\"") >&5 (eval $ac_compile) 2>conftest.er1 ac_status=$? grep -v '^ *+' conftest.er1 >conftest.err rm -f conftest.er1 cat conftest.err >&5 echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); } && { ac_try='test -z "$ac_c_werror_flag" || test ! -s conftest.err' { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 (eval $ac_try) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); }; } && { ac_try='test -s conftest.$ac_objext' { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 (eval $ac_try) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); }; }; then ac_cv_struct_tm=time.h else echo "$as_me: failed program was:" >&5 sed 's/^/| /' conftest.$ac_ext >&5 ac_cv_struct_tm=sys/time.h fi rm -f conftest.err conftest.$ac_objext conftest.$ac_ext fi echo "$as_me:$LINENO: result: $ac_cv_struct_tm" >&5 echo "${ECHO_T}$ac_cv_struct_tm" >&6 if test $ac_cv_struct_tm = sys/time.h; then cat >>confdefs.h <<\_ACEOF #define TM_IN_SYS_TIME 1 _ACEOF fi # The Ultrix 4.2 mips builtin alloca declared by alloca.h only works # for constant arguments. Useless! echo "$as_me:$LINENO: checking for working alloca.h" >&5 echo $ECHO_N "checking for working alloca.h... $ECHO_C" >&6 if test "${ac_cv_working_alloca_h+set}" = set; then echo $ECHO_N "(cached) $ECHO_C" >&6 else cat >conftest.$ac_ext <<_ACEOF /* confdefs.h. */ _ACEOF cat confdefs.h >>conftest.$ac_ext cat >>conftest.$ac_ext <<_ACEOF /* end confdefs.h. */ #include <alloca.h> int main () { char *p = (char *) alloca (2 * sizeof (int)); ; return 0; } _ACEOF rm -f conftest.$ac_objext conftest$ac_exeext if { (eval echo "$as_me:$LINENO: \"$ac_link\"") >&5 (eval $ac_link) 2>conftest.er1 ac_status=$? grep -v '^ *+' conftest.er1 >conftest.err rm -f conftest.er1 cat conftest.err >&5 echo "$as_me:$LINENO: \$?
= $ac_status" >&5 (exit $ac_status); } && { ac_try='test -z "$ac_c_werror_flag" || test ! -s conftest.err' { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 (eval $ac_try) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); }; } && { ac_try='test -s conftest$ac_exeext' { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 (eval $ac_try) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); }; }; then ac_cv_working_alloca_h=yes else echo "$as_me: failed program was:" >&5 sed 's/^/| /' conftest.$ac_ext >&5 ac_cv_working_alloca_h=no fi rm -f conftest.err conftest.$ac_objext \ conftest$ac_exeext conftest.$ac_ext fi echo "$as_me:$LINENO: result: $ac_cv_working_alloca_h" >&5 echo "${ECHO_T}$ac_cv_working_alloca_h" >&6 if test $ac_cv_working_alloca_h = yes; then cat >>confdefs.h <<\_ACEOF #define HAVE_ALLOCA_H 1 _ACEOF fi echo "$as_me:$LINENO: checking for alloca" >&5 echo $ECHO_N "checking for alloca... $ECHO_C" >&6 if test "${ac_cv_func_alloca_works+set}" = set; then echo $ECHO_N "(cached) $ECHO_C" >&6 else cat >conftest.$ac_ext <<_ACEOF /* confdefs.h. */ _ACEOF cat confdefs.h >>conftest.$ac_ext cat >>conftest.$ac_ext <<_ACEOF /* end confdefs.h. */ #ifdef __GNUC__ # define alloca __builtin_alloca #else # ifdef _MSC_VER # include <malloc.h> # define alloca _alloca # else # if HAVE_ALLOCA_H # include <alloca.h> # else # ifdef _AIX #pragma alloca # else # ifndef alloca /* predefined by HP cc +Olibcalls */ char *alloca (); # endif # endif # endif # endif #endif int main () { char *p = (char *) alloca (1); ; return 0; } _ACEOF rm -f conftest.$ac_objext conftest$ac_exeext if { (eval echo "$as_me:$LINENO: \"$ac_link\"") >&5 (eval $ac_link) 2>conftest.er1 ac_status=$? grep -v '^ *+' conftest.er1 >conftest.err rm -f conftest.er1 cat conftest.err >&5 echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); } && { ac_try='test -z "$ac_c_werror_flag" || test !
-s conftest.err' { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 (eval $ac_try) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); }; } && { ac_try='test -s conftest$ac_exeext' { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 (eval $ac_try) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); }; }; then ac_cv_func_alloca_works=yes else echo "$as_me: failed program was:" >&5 sed 's/^/| /' conftest.$ac_ext >&5 ac_cv_func_alloca_works=no fi rm -f conftest.err conftest.$ac_objext \ conftest$ac_exeext conftest.$ac_ext fi echo "$as_me:$LINENO: result: $ac_cv_func_alloca_works" >&5 echo "${ECHO_T}$ac_cv_func_alloca_works" >&6 if test $ac_cv_func_alloca_works = yes; then cat >>confdefs.h <<\_ACEOF #define HAVE_ALLOCA 1 _ACEOF else # The SVR3 libPW and SVR4 libucb both contain incompatible functions # that cause trouble. Some versions do not even contain alloca or # contain a buggy version. If you still want to use their alloca, # use ar to extract alloca.o from them instead of compiling alloca.c. ALLOCA=alloca.$ac_objext cat >>confdefs.h <<\_ACEOF #define C_ALLOCA 1 _ACEOF echo "$as_me:$LINENO: checking whether \`alloca.c' needs Cray hooks" >&5 echo $ECHO_N "checking whether \`alloca.c' needs Cray hooks... $ECHO_C" >&6 if test "${ac_cv_os_cray+set}" = set; then echo $ECHO_N "(cached) $ECHO_C" >&6 else cat >conftest.$ac_ext <<_ACEOF /* confdefs.h. */ _ACEOF cat confdefs.h >>conftest.$ac_ext cat >>conftest.$ac_ext <<_ACEOF /* end confdefs.h. */ #if defined(CRAY) && ! 
defined(CRAY2) webecray #else wenotbecray #endif _ACEOF if (eval "$ac_cpp conftest.$ac_ext") 2>&5 | $EGREP "webecray" >/dev/null 2>&1; then ac_cv_os_cray=yes else ac_cv_os_cray=no fi rm -f conftest* fi echo "$as_me:$LINENO: result: $ac_cv_os_cray" >&5 echo "${ECHO_T}$ac_cv_os_cray" >&6 if test $ac_cv_os_cray = yes; then for ac_func in _getb67 GETB67 getb67; do as_ac_var=`echo "ac_cv_func_$ac_func" | $as_tr_sh` echo "$as_me:$LINENO: checking for $ac_func" >&5 echo $ECHO_N "checking for $ac_func... $ECHO_C" >&6 if eval "test \"\${$as_ac_var+set}\" = set"; then echo $ECHO_N "(cached) $ECHO_C" >&6 else cat >conftest.$ac_ext <<_ACEOF /* confdefs.h. */ _ACEOF cat confdefs.h >>conftest.$ac_ext cat >>conftest.$ac_ext <<_ACEOF /* end confdefs.h. */ /* Define $ac_func to an innocuous variant, in case <limits.h> declares $ac_func. For example, HP-UX 11i <limits.h> declares gettimeofday. */ #define $ac_func innocuous_$ac_func /* System header to define __stub macros and hopefully few prototypes, which can conflict with char $ac_func (); below. Prefer <limits.h> to <assert.h> if __STDC__ is defined, since <limits.h> exists even on freestanding compilers. */ #ifdef __STDC__ # include <limits.h> #else # include <assert.h> #endif #undef $ac_func /* Override any gcc2 internal prototype to avoid an error. */ #ifdef __cplusplus extern "C" { #endif /* We use char because int might match the return type of a gcc2 builtin and then its argument prototype would still apply. */ char $ac_func (); /* The GNU C library defines this for functions which it implements to always fail with ENOSYS. Some functions are actually named something starting with __ and the normal name is an alias. */ #if defined (__stub_$ac_func) || defined (__stub___$ac_func) choke me #else char (*f) () = $ac_func; #endif #ifdef __cplusplus } #endif int main () { return f != $ac_func; ; return 0; } _ACEOF rm -f conftest.$ac_objext conftest$ac_exeext if { (eval echo "$as_me:$LINENO: \"$ac_link\"") >&5 (eval $ac_link) 2>conftest.er1 ac_status=$?
grep -v '^ *+' conftest.er1 >conftest.err rm -f conftest.er1 cat conftest.err >&5 echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); } && { ac_try='test -z "$ac_c_werror_flag" || test ! -s conftest.err' { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 (eval $ac_try) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); }; } && { ac_try='test -s conftest$ac_exeext' { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 (eval $ac_try) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); }; }; then eval "$as_ac_var=yes" else echo "$as_me: failed program was:" >&5 sed 's/^/| /' conftest.$ac_ext >&5 eval "$as_ac_var=no" fi rm -f conftest.err conftest.$ac_objext \ conftest$ac_exeext conftest.$ac_ext fi echo "$as_me:$LINENO: result: `eval echo '${'$as_ac_var'}'`" >&5 echo "${ECHO_T}`eval echo '${'$as_ac_var'}'`" >&6 if test `eval echo '${'$as_ac_var'}'` = yes; then cat >>confdefs.h <<_ACEOF #define CRAY_STACKSEG_END $ac_func _ACEOF break fi done fi echo "$as_me:$LINENO: checking stack direction for C alloca" >&5 echo $ECHO_N "checking stack direction for C alloca... $ECHO_C" >&6 if test "${ac_cv_c_stack_direction+set}" = set; then echo $ECHO_N "(cached) $ECHO_C" >&6 else if test "$cross_compiling" = yes; then ac_cv_c_stack_direction=0 else cat >conftest.$ac_ext <<_ACEOF /* confdefs.h. */ _ACEOF cat confdefs.h >>conftest.$ac_ext cat >>conftest.$ac_ext <<_ACEOF /* end confdefs.h. */ int find_stack_direction () { static char *addr = 0; auto char dummy; if (addr == 0) { addr = &dummy; return find_stack_direction (); } else return (&dummy > addr) ? 1 : -1; } int main () { exit (find_stack_direction () < 0); } _ACEOF rm -f conftest$ac_exeext if { (eval echo "$as_me:$LINENO: \"$ac_link\"") >&5 (eval $ac_link) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); } && { ac_try='./conftest$ac_exeext' { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 (eval $ac_try) 2>&5 ac_status=$? 
echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); }; }; then ac_cv_c_stack_direction=1 else echo "$as_me: program exited with status $ac_status" >&5 echo "$as_me: failed program was:" >&5 sed 's/^/| /' conftest.$ac_ext >&5 ( exit $ac_status ) ac_cv_c_stack_direction=-1 fi rm -f core *.core gmon.out bb.out conftest$ac_exeext conftest.$ac_objext conftest.$ac_ext fi fi echo "$as_me:$LINENO: result: $ac_cv_c_stack_direction" >&5 echo "${ECHO_T}$ac_cv_c_stack_direction" >&6 cat >>confdefs.h <<_ACEOF #define STACK_DIRECTION $ac_cv_c_stack_direction _ACEOF fi for ac_func in strftime do as_ac_var=`echo "ac_cv_func_$ac_func" | $as_tr_sh` echo "$as_me:$LINENO: checking for $ac_func" >&5 echo $ECHO_N "checking for $ac_func... $ECHO_C" >&6 if eval "test \"\${$as_ac_var+set}\" = set"; then echo $ECHO_N "(cached) $ECHO_C" >&6 else cat >conftest.$ac_ext <<_ACEOF /* confdefs.h. */ _ACEOF cat confdefs.h >>conftest.$ac_ext cat >>conftest.$ac_ext <<_ACEOF /* end confdefs.h. */ /* Define $ac_func to an innocuous variant, in case <limits.h> declares $ac_func. For example, HP-UX 11i <limits.h> declares gettimeofday. */ #define $ac_func innocuous_$ac_func /* System header to define __stub macros and hopefully few prototypes, which can conflict with char $ac_func (); below. Prefer <limits.h> to <assert.h> if __STDC__ is defined, since <limits.h> exists even on freestanding compilers. */ #ifdef __STDC__ # include <limits.h> #else # include <assert.h> #endif #undef $ac_func /* Override any gcc2 internal prototype to avoid an error. */ #ifdef __cplusplus extern "C" { #endif /* We use char because int might match the return type of a gcc2 builtin and then its argument prototype would still apply. */ char $ac_func (); /* The GNU C library defines this for functions which it implements to always fail with ENOSYS. Some functions are actually named something starting with __ and the normal name is an alias.
*/ #if defined (__stub_$ac_func) || defined (__stub___$ac_func) choke me #else char (*f) () = $ac_func; #endif #ifdef __cplusplus } #endif int main () { return f != $ac_func; ; return 0; } _ACEOF rm -f conftest.$ac_objext conftest$ac_exeext if { (eval echo "$as_me:$LINENO: \"$ac_link\"") >&5 (eval $ac_link) 2>conftest.er1 ac_status=$? grep -v '^ *+' conftest.er1 >conftest.err rm -f conftest.er1 cat conftest.err >&5 echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); } && { ac_try='test -z "$ac_c_werror_flag" || test ! -s conftest.err' { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 (eval $ac_try) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); }; } && { ac_try='test -s conftest$ac_exeext' { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 (eval $ac_try) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); }; }; then eval "$as_ac_var=yes" else echo "$as_me: failed program was:" >&5 sed 's/^/| /' conftest.$ac_ext >&5 eval "$as_ac_var=no" fi rm -f conftest.err conftest.$ac_objext \ conftest$ac_exeext conftest.$ac_ext fi echo "$as_me:$LINENO: result: `eval echo '${'$as_ac_var'}'`" >&5 echo "${ECHO_T}`eval echo '${'$as_ac_var'}'`" >&6 if test `eval echo '${'$as_ac_var'}'` = yes; then cat >>confdefs.h <<_ACEOF #define `echo "HAVE_$ac_func" | $as_tr_cpp` 1 _ACEOF else # strftime is in -lintl on SCO UNIX. echo "$as_me:$LINENO: checking for strftime in -lintl" >&5 echo $ECHO_N "checking for strftime in -lintl... $ECHO_C" >&6 if test "${ac_cv_lib_intl_strftime+set}" = set; then echo $ECHO_N "(cached) $ECHO_C" >&6 else ac_check_lib_save_LIBS=$LIBS LIBS="-lintl $LIBS" cat >conftest.$ac_ext <<_ACEOF /* confdefs.h. */ _ACEOF cat confdefs.h >>conftest.$ac_ext cat >>conftest.$ac_ext <<_ACEOF /* end confdefs.h. */ /* Override any gcc2 internal prototype to avoid an error. 
*/ #ifdef __cplusplus extern "C" #endif /* We use char because int might match the return type of a gcc2 builtin and then its argument prototype would still apply. */ char strftime (); int main () { strftime (); ; return 0; } _ACEOF rm -f conftest.$ac_objext conftest$ac_exeext if { (eval echo "$as_me:$LINENO: \"$ac_link\"") >&5 (eval $ac_link) 2>conftest.er1 ac_status=$? grep -v '^ *+' conftest.er1 >conftest.err rm -f conftest.er1 cat conftest.err >&5 echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); } && { ac_try='test -z "$ac_c_werror_flag" || test ! -s conftest.err' { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 (eval $ac_try) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); }; } && { ac_try='test -s conftest$ac_exeext' { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 (eval $ac_try) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); }; }; then ac_cv_lib_intl_strftime=yes else echo "$as_me: failed program was:" >&5 sed 's/^/| /' conftest.$ac_ext >&5 ac_cv_lib_intl_strftime=no fi rm -f conftest.err conftest.$ac_objext \ conftest$ac_exeext conftest.$ac_ext LIBS=$ac_check_lib_save_LIBS fi echo "$as_me:$LINENO: result: $ac_cv_lib_intl_strftime" >&5 echo "${ECHO_T}$ac_cv_lib_intl_strftime" >&6 if test $ac_cv_lib_intl_strftime = yes; then cat >>confdefs.h <<\_ACEOF #define HAVE_STRFTIME 1 _ACEOF LIBS="-lintl $LIBS" fi fi done for ac_func in vprintf do as_ac_var=`echo "ac_cv_func_$ac_func" | $as_tr_sh` echo "$as_me:$LINENO: checking for $ac_func" >&5 echo $ECHO_N "checking for $ac_func... $ECHO_C" >&6 if eval "test \"\${$as_ac_var+set}\" = set"; then echo $ECHO_N "(cached) $ECHO_C" >&6 else cat >conftest.$ac_ext <<_ACEOF /* confdefs.h. */ _ACEOF cat confdefs.h >>conftest.$ac_ext cat >>conftest.$ac_ext <<_ACEOF /* end confdefs.h. */ /* Define $ac_func to an innocuous variant, in case <limits.h> declares $ac_func. For example, HP-UX 11i <limits.h> declares gettimeofday.
*/ #define $ac_func innocuous_$ac_func /* System header to define __stub macros and hopefully few prototypes, which can conflict with char $ac_func (); below. Prefer <limits.h> to <assert.h> if __STDC__ is defined, since <limits.h> exists even on freestanding compilers. */ #ifdef __STDC__ # include <limits.h> #else # include <assert.h> #endif #undef $ac_func /* Override any gcc2 internal prototype to avoid an error. */ #ifdef __cplusplus extern "C" { #endif /* We use char because int might match the return type of a gcc2 builtin and then its argument prototype would still apply. */ char $ac_func (); /* The GNU C library defines this for functions which it implements to always fail with ENOSYS. Some functions are actually named something starting with __ and the normal name is an alias. */ #if defined (__stub_$ac_func) || defined (__stub___$ac_func) choke me #else char (*f) () = $ac_func; #endif #ifdef __cplusplus } #endif int main () { return f != $ac_func; ; return 0; } _ACEOF rm -f conftest.$ac_objext conftest$ac_exeext if { (eval echo "$as_me:$LINENO: \"$ac_link\"") >&5 (eval $ac_link) 2>conftest.er1 ac_status=$? grep -v '^ *+' conftest.er1 >conftest.err rm -f conftest.er1 cat conftest.err >&5 echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); } && { ac_try='test -z "$ac_c_werror_flag" || test ! -s conftest.err' { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 (eval $ac_try) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); }; } && { ac_try='test -s conftest$ac_exeext' { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 (eval $ac_try) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$?
= $ac_status" >&5 (exit $ac_status); }; }; then eval "$as_ac_var=yes" else echo "$as_me: failed program was:" >&5 sed 's/^/| /' conftest.$ac_ext >&5 eval "$as_ac_var=no" fi rm -f conftest.err conftest.$ac_objext \ conftest$ac_exeext conftest.$ac_ext fi echo "$as_me:$LINENO: result: `eval echo '${'$as_ac_var'}'`" >&5 echo "${ECHO_T}`eval echo '${'$as_ac_var'}'`" >&6 if test `eval echo '${'$as_ac_var'}'` = yes; then cat >>confdefs.h <<_ACEOF #define `echo "HAVE_$ac_func" | $as_tr_cpp` 1 _ACEOF echo "$as_me:$LINENO: checking for _doprnt" >&5 echo $ECHO_N "checking for _doprnt... $ECHO_C" >&6 if test "${ac_cv_func__doprnt+set}" = set; then echo $ECHO_N "(cached) $ECHO_C" >&6 else cat >conftest.$ac_ext <<_ACEOF /* confdefs.h. */ _ACEOF cat confdefs.h >>conftest.$ac_ext cat >>conftest.$ac_ext <<_ACEOF /* end confdefs.h. */ /* Define _doprnt to an innocuous variant, in case <limits.h> declares _doprnt. For example, HP-UX 11i <limits.h> declares gettimeofday. */ #define _doprnt innocuous__doprnt /* System header to define __stub macros and hopefully few prototypes, which can conflict with char _doprnt (); below. Prefer <limits.h> to <assert.h> if __STDC__ is defined, since <limits.h> exists even on freestanding compilers. */ #ifdef __STDC__ # include <limits.h> #else # include <assert.h> #endif #undef _doprnt /* Override any gcc2 internal prototype to avoid an error. */ #ifdef __cplusplus extern "C" { #endif /* We use char because int might match the return type of a gcc2 builtin and then its argument prototype would still apply. */ char _doprnt (); /* The GNU C library defines this for functions which it implements to always fail with ENOSYS. Some functions are actually named something starting with __ and the normal name is an alias.
*/ #if defined (__stub__doprnt) || defined (__stub____doprnt) choke me #else char (*f) () = _doprnt; #endif #ifdef __cplusplus } #endif int main () { return f != _doprnt; ; return 0; } _ACEOF rm -f conftest.$ac_objext conftest$ac_exeext if { (eval echo "$as_me:$LINENO: \"$ac_link\"") >&5 (eval $ac_link) 2>conftest.er1 ac_status=$? grep -v '^ *+' conftest.er1 >conftest.err rm -f conftest.er1 cat conftest.err >&5 echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); } && { ac_try='test -z "$ac_c_werror_flag" || test ! -s conftest.err' { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 (eval $ac_try) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); }; } && { ac_try='test -s conftest$ac_exeext' { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 (eval $ac_try) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); }; }; then ac_cv_func__doprnt=yes else echo "$as_me: failed program was:" >&5 sed 's/^/| /' conftest.$ac_ext >&5 ac_cv_func__doprnt=no fi rm -f conftest.err conftest.$ac_objext \ conftest$ac_exeext conftest.$ac_ext fi echo "$as_me:$LINENO: result: $ac_cv_func__doprnt" >&5 echo "${ECHO_T}$ac_cv_func__doprnt" >&6 if test $ac_cv_func__doprnt = yes; then cat >>confdefs.h <<\_ACEOF #define HAVE_DOPRNT 1 _ACEOF fi fi done for ac_header in unistd.h vfork.h do as_ac_Header=`echo "ac_cv_header_$ac_header" | $as_tr_sh` if eval "test \"\${$as_ac_Header+set}\" = set"; then echo "$as_me:$LINENO: checking for $ac_header" >&5 echo $ECHO_N "checking for $ac_header... $ECHO_C" >&6 if eval "test \"\${$as_ac_Header+set}\" = set"; then echo $ECHO_N "(cached) $ECHO_C" >&6 fi echo "$as_me:$LINENO: result: `eval echo '${'$as_ac_Header'}'`" >&5 echo "${ECHO_T}`eval echo '${'$as_ac_Header'}'`" >&6 else # Is the header compilable? echo "$as_me:$LINENO: checking $ac_header usability" >&5 echo $ECHO_N "checking $ac_header usability... $ECHO_C" >&6 cat >conftest.$ac_ext <<_ACEOF /* confdefs.h. 
*/ _ACEOF cat confdefs.h >>conftest.$ac_ext cat >>conftest.$ac_ext <<_ACEOF /* end confdefs.h. */ $ac_includes_default #include <$ac_header> _ACEOF rm -f conftest.$ac_objext if { (eval echo "$as_me:$LINENO: \"$ac_compile\"") >&5 (eval $ac_compile) 2>conftest.er1 ac_status=$? grep -v '^ *+' conftest.er1 >conftest.err rm -f conftest.er1 cat conftest.err >&5 echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); } && { ac_try='test -z "$ac_c_werror_flag" || test ! -s conftest.err' { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 (eval $ac_try) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); }; } && { ac_try='test -s conftest.$ac_objext' { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 (eval $ac_try) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); }; }; then ac_header_compiler=yes else echo "$as_me: failed program was:" >&5 sed 's/^/| /' conftest.$ac_ext >&5 ac_header_compiler=no fi rm -f conftest.err conftest.$ac_objext conftest.$ac_ext echo "$as_me:$LINENO: result: $ac_header_compiler" >&5 echo "${ECHO_T}$ac_header_compiler" >&6 # Is the header present? echo "$as_me:$LINENO: checking $ac_header presence" >&5 echo $ECHO_N "checking $ac_header presence... $ECHO_C" >&6 cat >conftest.$ac_ext <<_ACEOF /* confdefs.h. */ _ACEOF cat confdefs.h >>conftest.$ac_ext cat >>conftest.$ac_ext <<_ACEOF /* end confdefs.h. */ #include <$ac_header> _ACEOF if { (eval echo "$as_me:$LINENO: \"$ac_cpp conftest.$ac_ext\"") >&5 (eval $ac_cpp conftest.$ac_ext) 2>conftest.er1 ac_status=$? grep -v '^ *+' conftest.er1 >conftest.err rm -f conftest.er1 cat conftest.err >&5 echo "$as_me:$LINENO: \$? 
= $ac_status" >&5 (exit $ac_status); } >/dev/null; then if test -s conftest.err; then ac_cpp_err=$ac_c_preproc_warn_flag ac_cpp_err=$ac_cpp_err$ac_c_werror_flag else ac_cpp_err= fi else ac_cpp_err=yes fi if test -z "$ac_cpp_err"; then ac_header_preproc=yes else echo "$as_me: failed program was:" >&5 sed 's/^/| /' conftest.$ac_ext >&5 ac_header_preproc=no fi rm -f conftest.err conftest.$ac_ext echo "$as_me:$LINENO: result: $ac_header_preproc" >&5 echo "${ECHO_T}$ac_header_preproc" >&6 # So? What about this header? case $ac_header_compiler:$ac_header_preproc:$ac_c_preproc_warn_flag in yes:no: ) { echo "$as_me:$LINENO: WARNING: $ac_header: accepted by the compiler, rejected by the preprocessor!" >&5 echo "$as_me: WARNING: $ac_header: accepted by the compiler, rejected by the preprocessor!" >&2;} { echo "$as_me:$LINENO: WARNING: $ac_header: proceeding with the compiler's result" >&5 echo "$as_me: WARNING: $ac_header: proceeding with the compiler's result" >&2;} ac_header_preproc=yes ;; no:yes:* ) { echo "$as_me:$LINENO: WARNING: $ac_header: present but cannot be compiled" >&5 echo "$as_me: WARNING: $ac_header: present but cannot be compiled" >&2;} { echo "$as_me:$LINENO: WARNING: $ac_header: check for missing prerequisite headers?" >&5 echo "$as_me: WARNING: $ac_header: check for missing prerequisite headers?" 
>&2;} { echo "$as_me:$LINENO: WARNING: $ac_header: see the Autoconf documentation" >&5 echo "$as_me: WARNING: $ac_header: see the Autoconf documentation" >&2;} { echo "$as_me:$LINENO: WARNING: $ac_header: section \"Present But Cannot Be Compiled\"" >&5 echo "$as_me: WARNING: $ac_header: section \"Present But Cannot Be Compiled\"" >&2;} { echo "$as_me:$LINENO: WARNING: $ac_header: proceeding with the preprocessor's result" >&5 echo "$as_me: WARNING: $ac_header: proceeding with the preprocessor's result" >&2;} { echo "$as_me:$LINENO: WARNING: $ac_header: in the future, the compiler will take precedence" >&5 echo "$as_me: WARNING: $ac_header: in the future, the compiler will take precedence" >&2;} ( cat <<\_ASBOX ## ------------------------------------------ ## ## Report this to the AC_PACKAGE_NAME lists. ## ## ------------------------------------------ ## _ASBOX ) | sed "s/^/$as_me: WARNING: /" >&2 ;; esac echo "$as_me:$LINENO: checking for $ac_header" >&5 echo $ECHO_N "checking for $ac_header... $ECHO_C" >&6 if eval "test \"\${$as_ac_Header+set}\" = set"; then echo $ECHO_N "(cached) $ECHO_C" >&6 else eval "$as_ac_Header=\$ac_header_preproc" fi echo "$as_me:$LINENO: result: `eval echo '${'$as_ac_Header'}'`" >&5 echo "${ECHO_T}`eval echo '${'$as_ac_Header'}'`" >&6 fi if test `eval echo '${'$as_ac_Header'}'` = yes; then cat >>confdefs.h <<_ACEOF #define `echo "HAVE_$ac_header" | $as_tr_cpp` 1 _ACEOF fi done for ac_func in fork vfork do as_ac_var=`echo "ac_cv_func_$ac_func" | $as_tr_sh` echo "$as_me:$LINENO: checking for $ac_func" >&5 echo $ECHO_N "checking for $ac_func... $ECHO_C" >&6 if eval "test \"\${$as_ac_var+set}\" = set"; then echo $ECHO_N "(cached) $ECHO_C" >&6 else cat >conftest.$ac_ext <<_ACEOF /* confdefs.h. */ _ACEOF cat confdefs.h >>conftest.$ac_ext cat >>conftest.$ac_ext <<_ACEOF /* end confdefs.h. */ /* Define $ac_func to an innocuous variant, in case declares $ac_func. For example, HP-UX 11i declares gettimeofday. 
*/ #define $ac_func innocuous_$ac_func /* System header to define __stub macros and hopefully few prototypes, which can conflict with char $ac_func (); below. Prefer <limits.h> to <assert.h> if __STDC__ is defined, since <limits.h> exists even on freestanding compilers. */ #ifdef __STDC__ # include <limits.h> #else # include <assert.h> #endif #undef $ac_func /* Override any gcc2 internal prototype to avoid an error. */ #ifdef __cplusplus extern "C" { #endif /* We use char because int might match the return type of a gcc2 builtin and then its argument prototype would still apply. */ char $ac_func (); /* The GNU C library defines this for functions which it implements to always fail with ENOSYS. Some functions are actually named something starting with __ and the normal name is an alias. */ #if defined (__stub_$ac_func) || defined (__stub___$ac_func) choke me #else char (*f) () = $ac_func; #endif #ifdef __cplusplus } #endif int main () { return f != $ac_func; ; return 0; } _ACEOF rm -f conftest.$ac_objext conftest$ac_exeext if { (eval echo "$as_me:$LINENO: \"$ac_link\"") >&5 (eval $ac_link) 2>conftest.er1 ac_status=$? grep -v '^ *+' conftest.er1 >conftest.err rm -f conftest.er1 cat conftest.err >&5 echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); } && { ac_try='test -z "$ac_c_werror_flag" || test ! -s conftest.err' { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 (eval $ac_try) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); }; } && { ac_try='test -s conftest$ac_exeext' { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 (eval $ac_try) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$?
= $ac_status" >&5 (exit $ac_status); }; }; then eval "$as_ac_var=yes" else echo "$as_me: failed program was:" >&5 sed 's/^/| /' conftest.$ac_ext >&5 eval "$as_ac_var=no" fi rm -f conftest.err conftest.$ac_objext \ conftest$ac_exeext conftest.$ac_ext fi echo "$as_me:$LINENO: result: `eval echo '${'$as_ac_var'}'`" >&5 echo "${ECHO_T}`eval echo '${'$as_ac_var'}'`" >&6 if test `eval echo '${'$as_ac_var'}'` = yes; then cat >>confdefs.h <<_ACEOF #define `echo "HAVE_$ac_func" | $as_tr_cpp` 1 _ACEOF fi done if test "x$ac_cv_func_fork" = xyes; then echo "$as_me:$LINENO: checking for working fork" >&5 echo $ECHO_N "checking for working fork... $ECHO_C" >&6 if test "${ac_cv_func_fork_works+set}" = set; then echo $ECHO_N "(cached) $ECHO_C" >&6 else if test "$cross_compiling" = yes; then ac_cv_func_fork_works=cross else cat >conftest.$ac_ext <<_ACEOF /* By Ruediger Kuhlmann. */ #include <sys/types.h> #if HAVE_UNISTD_H # include <unistd.h> #endif /* Some systems only have a dummy stub for fork() */ int main () { if (fork() < 0) exit (1); exit (0); } _ACEOF rm -f conftest$ac_exeext if { (eval echo "$as_me:$LINENO: \"$ac_link\"") >&5 (eval $ac_link) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); } && { ac_try='./conftest$ac_exeext' { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 (eval $ac_try) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$?
= $ac_status" >&5 (exit $ac_status); }; }; then ac_cv_func_fork_works=yes else echo "$as_me: program exited with status $ac_status" >&5 echo "$as_me: failed program was:" >&5 sed 's/^/| /' conftest.$ac_ext >&5 ( exit $ac_status ) ac_cv_func_fork_works=no fi rm -f core *.core gmon.out bb.out conftest$ac_exeext conftest.$ac_objext conftest.$ac_ext fi fi echo "$as_me:$LINENO: result: $ac_cv_func_fork_works" >&5 echo "${ECHO_T}$ac_cv_func_fork_works" >&6 else ac_cv_func_fork_works=$ac_cv_func_fork fi if test "x$ac_cv_func_fork_works" = xcross; then case $host in *-*-amigaos* | *-*-msdosdjgpp*) # Override, as these systems have only a dummy fork() stub ac_cv_func_fork_works=no ;; *) ac_cv_func_fork_works=yes ;; esac { echo "$as_me:$LINENO: WARNING: result $ac_cv_func_fork_works guessed because of cross compilation" >&5 echo "$as_me: WARNING: result $ac_cv_func_fork_works guessed because of cross compilation" >&2;} fi ac_cv_func_vfork_works=$ac_cv_func_vfork if test "x$ac_cv_func_vfork" = xyes; then echo "$as_me:$LINENO: checking for working vfork" >&5 echo $ECHO_N "checking for working vfork... $ECHO_C" >&6 if test "${ac_cv_func_vfork_works+set}" = set; then echo $ECHO_N "(cached) $ECHO_C" >&6 else if test "$cross_compiling" = yes; then ac_cv_func_vfork_works=cross else cat >conftest.$ac_ext <<_ACEOF /* confdefs.h. */ _ACEOF cat confdefs.h >>conftest.$ac_ext cat >>conftest.$ac_ext <<_ACEOF /* end confdefs.h. */ /* Thanks to Paul Eggert for this test. */ #include <stdio.h> #include <stdlib.h> #include <sys/types.h> #include <sys/stat.h> #include <sys/wait.h> #if HAVE_UNISTD_H # include <unistd.h> #endif #if HAVE_VFORK_H # include <vfork.h> #endif /* On some sparc systems, changes by the child to local and incoming argument registers are propagated back to the parent. The compiler is told about this with #include <vfork.h>, but some compilers (e.g. gcc -O) don't grok <vfork.h>. Test for this by using a static variable whose address is put into a register that is clobbered by the vfork.
*/ static void #ifdef __cplusplus sparc_address_test (int arg) # else sparc_address_test (arg) int arg; #endif { static pid_t child; if (!child) { child = vfork (); if (child < 0) { perror ("vfork"); _exit(2); } if (!child) { arg = getpid(); write(-1, "", 0); _exit (arg); } } } int main () { pid_t parent = getpid (); pid_t child; sparc_address_test (0); child = vfork (); if (child == 0) { /* Here is another test for sparc vfork register problems. This test uses lots of local variables, at least as many local variables as main has allocated so far including compiler temporaries. 4 locals are enough for gcc 1.40.3 on a Solaris 4.1.3 sparc, but we use 8 to be safe. A buggy compiler should reuse the register of parent for one of the local variables, since it will think that parent can't possibly be used any more in this routine. Assigning to the local variable will thus munge parent in the parent process. */ pid_t p = getpid(), p1 = getpid(), p2 = getpid(), p3 = getpid(), p4 = getpid(), p5 = getpid(), p6 = getpid(), p7 = getpid(); /* Convince the compiler that p..p7 are live; otherwise, it might use the same hardware register for all 8 local variables. */ if (p != p1 || p != p2 || p != p3 || p != p4 || p != p5 || p != p6 || p != p7) _exit(1); /* On some systems (e.g. IRIX 3.3), vfork doesn't separate parent from child file descriptors. If the child closes a descriptor before it execs or exits, this munges the parent's descriptor as well. Test for this by closing stdout in the child. */ _exit(close(fileno(stdout)) != 0); } else { int status; struct stat st; while (wait(&status) != child) ; exit( /* Was there some problem with vforking? */ child < 0 /* Did the child fail? (This shouldn't happen.) */ || status /* Did the vfork/compiler bug occur? */ || parent != getpid() /* Did the file descriptor bug occur? */ || fstat(fileno(stdout), &st) != 0 ); } } _ACEOF rm -f conftest$ac_exeext if { (eval echo "$as_me:$LINENO: \"$ac_link\"") >&5 (eval $ac_link) 2>&5 ac_status=$? 
echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); } && { ac_try='./conftest$ac_exeext' { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 (eval $ac_try) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); }; }; then ac_cv_func_vfork_works=yes else echo "$as_me: program exited with status $ac_status" >&5 echo "$as_me: failed program was:" >&5 sed 's/^/| /' conftest.$ac_ext >&5 ( exit $ac_status ) ac_cv_func_vfork_works=no fi rm -f core *.core gmon.out bb.out conftest$ac_exeext conftest.$ac_objext conftest.$ac_ext fi fi echo "$as_me:$LINENO: result: $ac_cv_func_vfork_works" >&5 echo "${ECHO_T}$ac_cv_func_vfork_works" >&6 fi; if test "x$ac_cv_func_fork_works" = xcross; then ac_cv_func_vfork_works=$ac_cv_func_vfork { echo "$as_me:$LINENO: WARNING: result $ac_cv_func_vfork_works guessed because of cross compilation" >&5 echo "$as_me: WARNING: result $ac_cv_func_vfork_works guessed because of cross compilation" >&2;} fi if test "x$ac_cv_func_vfork_works" = xyes; then cat >>confdefs.h <<\_ACEOF #define HAVE_WORKING_VFORK 1 _ACEOF else cat >>confdefs.h <<\_ACEOF #define vfork fork _ACEOF fi if test "x$ac_cv_func_fork_works" = xyes; then cat >>confdefs.h <<\_ACEOF #define HAVE_WORKING_FORK 1 _ACEOF fi for ac_func in waitpid kill do as_ac_var=`echo "ac_cv_func_$ac_func" | $as_tr_sh` echo "$as_me:$LINENO: checking for $ac_func" >&5 echo $ECHO_N "checking for $ac_func... $ECHO_C" >&6 if eval "test \"\${$as_ac_var+set}\" = set"; then echo $ECHO_N "(cached) $ECHO_C" >&6 else cat >conftest.$ac_ext <<_ACEOF /* confdefs.h. */ _ACEOF cat confdefs.h >>conftest.$ac_ext cat >>conftest.$ac_ext <<_ACEOF /* end confdefs.h. */ /* Define $ac_func to an innocuous variant, in case <limits.h> declares $ac_func. For example, HP-UX 11i <limits.h> declares gettimeofday. */ #define $ac_func innocuous_$ac_func /* System header to define __stub macros and hopefully few prototypes, which can conflict with char $ac_func (); below.
Prefer <limits.h> to <assert.h> if __STDC__ is defined, since <limits.h> exists even on freestanding compilers. */ #ifdef __STDC__ # include <limits.h> #else # include <assert.h> #endif #undef $ac_func /* Override any gcc2 internal prototype to avoid an error. */ #ifdef __cplusplus extern "C" { #endif /* We use char because int might match the return type of a gcc2 builtin and then its argument prototype would still apply. */ char $ac_func (); /* The GNU C library defines this for functions which it implements to always fail with ENOSYS. Some functions are actually named something starting with __ and the normal name is an alias. */ #if defined (__stub_$ac_func) || defined (__stub___$ac_func) choke me #else char (*f) () = $ac_func; #endif #ifdef __cplusplus } #endif int main () { return f != $ac_func; ; return 0; } _ACEOF rm -f conftest.$ac_objext conftest$ac_exeext if { (eval echo "$as_me:$LINENO: \"$ac_link\"") >&5 (eval $ac_link) 2>conftest.er1 ac_status=$? grep -v '^ *+' conftest.er1 >conftest.err rm -f conftest.er1 cat conftest.err >&5 echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); } && { ac_try='test -z "$ac_c_werror_flag" || test ! -s conftest.err' { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 (eval $ac_try) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); }; } && { ac_try='test -s conftest$ac_exeext' { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 (eval $ac_try) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$?
= $ac_status" >&5 (exit $ac_status); }; }; then eval "$as_ac_var=yes" else echo "$as_me: failed program was:" >&5 sed 's/^/| /' conftest.$ac_ext >&5 eval "$as_ac_var=no" fi rm -f conftest.err conftest.$ac_objext \ conftest$ac_exeext conftest.$ac_ext fi echo "$as_me:$LINENO: result: `eval echo '${'$as_ac_var'}'`" >&5 echo "${ECHO_T}`eval echo '${'$as_ac_var'}'`" >&6 if test `eval echo '${'$as_ac_var'}'` = yes; then cat >>confdefs.h <<_ACEOF #define `echo "HAVE_$ac_func" | $as_tr_cpp` 1 _ACEOF fi done for ac_func in re_comp regcomp strdup strstr lstat access do as_ac_var=`echo "ac_cv_func_$ac_func" | $as_tr_sh` echo "$as_me:$LINENO: checking for $ac_func" >&5 echo $ECHO_N "checking for $ac_func... $ECHO_C" >&6 if eval "test \"\${$as_ac_var+set}\" = set"; then echo $ECHO_N "(cached) $ECHO_C" >&6 else cat >conftest.$ac_ext <<_ACEOF /* confdefs.h. */ _ACEOF cat confdefs.h >>conftest.$ac_ext cat >>conftest.$ac_ext <<_ACEOF /* end confdefs.h. */ /* Define $ac_func to an innocuous variant, in case <limits.h> declares $ac_func. For example, HP-UX 11i <limits.h> declares gettimeofday. */ #define $ac_func innocuous_$ac_func /* System header to define __stub macros and hopefully few prototypes, which can conflict with char $ac_func (); below. Prefer <limits.h> to <assert.h> if __STDC__ is defined, since <limits.h> exists even on freestanding compilers. */ #ifdef __STDC__ # include <limits.h> #else # include <assert.h> #endif #undef $ac_func /* Override any gcc2 internal prototype to avoid an error. */ #ifdef __cplusplus extern "C" { #endif /* We use char because int might match the return type of a gcc2 builtin and then its argument prototype would still apply. */ char $ac_func (); /* The GNU C library defines this for functions which it implements to always fail with ENOSYS. Some functions are actually named something starting with __ and the normal name is an alias.
*/ #if defined (__stub_$ac_func) || defined (__stub___$ac_func) choke me #else char (*f) () = $ac_func; #endif #ifdef __cplusplus } #endif int main () { return f != $ac_func; ; return 0; } _ACEOF rm -f conftest.$ac_objext conftest$ac_exeext if { (eval echo "$as_me:$LINENO: \"$ac_link\"") >&5 (eval $ac_link) 2>conftest.er1 ac_status=$? grep -v '^ *+' conftest.er1 >conftest.err rm -f conftest.er1 cat conftest.err >&5 echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); } && { ac_try='test -z "$ac_c_werror_flag" || test ! -s conftest.err' { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 (eval $ac_try) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); }; } && { ac_try='test -s conftest$ac_exeext' { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 (eval $ac_try) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); }; }; then eval "$as_ac_var=yes" else echo "$as_me: failed program was:" >&5 sed 's/^/| /' conftest.$ac_ext >&5 eval "$as_ac_var=no" fi rm -f conftest.err conftest.$ac_objext \ conftest$ac_exeext conftest.$ac_ext fi echo "$as_me:$LINENO: result: `eval echo '${'$as_ac_var'}'`" >&5 echo "${ECHO_T}`eval echo '${'$as_ac_var'}'`" >&6 if test `eval echo '${'$as_ac_var'}'` = yes; then cat >>confdefs.h <<_ACEOF #define `echo "HAVE_$ac_func" | $as_tr_cpp` 1 _ACEOF fi done for ac_func in strchr memcpy do as_ac_var=`echo "ac_cv_func_$ac_func" | $as_tr_sh` echo "$as_me:$LINENO: checking for $ac_func" >&5 echo $ECHO_N "checking for $ac_func... $ECHO_C" >&6 if eval "test \"\${$as_ac_var+set}\" = set"; then echo $ECHO_N "(cached) $ECHO_C" >&6 else cat >conftest.$ac_ext <<_ACEOF /* confdefs.h. */ _ACEOF cat confdefs.h >>conftest.$ac_ext cat >>conftest.$ac_ext <<_ACEOF /* end confdefs.h. */ /* Define $ac_func to an innocuous variant, in case <limits.h> declares $ac_func. For example, HP-UX 11i <limits.h> declares gettimeofday.
*/ #define $ac_func innocuous_$ac_func /* System header to define __stub macros and hopefully few prototypes, which can conflict with char $ac_func (); below. Prefer <limits.h> to <assert.h> if __STDC__ is defined, since <limits.h> exists even on freestanding compilers. */ #ifdef __STDC__ # include <limits.h> #else # include <assert.h> #endif #undef $ac_func /* Override any gcc2 internal prototype to avoid an error. */ #ifdef __cplusplus extern "C" { #endif /* We use char because int might match the return type of a gcc2 builtin and then its argument prototype would still apply. */ char $ac_func (); /* The GNU C library defines this for functions which it implements to always fail with ENOSYS. Some functions are actually named something starting with __ and the normal name is an alias. */ #if defined (__stub_$ac_func) || defined (__stub___$ac_func) choke me #else char (*f) () = $ac_func; #endif #ifdef __cplusplus } #endif int main () { return f != $ac_func; ; return 0; } _ACEOF rm -f conftest.$ac_objext conftest$ac_exeext if { (eval echo "$as_me:$LINENO: \"$ac_link\"") >&5 (eval $ac_link) 2>conftest.er1 ac_status=$? grep -v '^ *+' conftest.er1 >conftest.err rm -f conftest.er1 cat conftest.err >&5 echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); } && { ac_try='test -z "$ac_c_werror_flag" || test ! -s conftest.err' { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 (eval $ac_try) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); }; } && { ac_try='test -s conftest$ac_exeext' { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 (eval $ac_try) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$?
= $ac_status" >&5 (exit $ac_status); }; }; then eval "$as_ac_var=yes" else echo "$as_me: failed program was:" >&5 sed 's/^/| /' conftest.$ac_ext >&5 eval "$as_ac_var=no" fi rm -f conftest.err conftest.$ac_objext \ conftest$ac_exeext conftest.$ac_ext fi echo "$as_me:$LINENO: result: `eval echo '${'$as_ac_var'}'`" >&5 echo "${ECHO_T}`eval echo '${'$as_ac_var'}'`" >&6 if test `eval echo '${'$as_ac_var'}'` = yes; then cat >>confdefs.h <<_ACEOF #define `echo "HAVE_$ac_func" | $as_tr_cpp` 1 _ACEOF fi done for ac_func in clock times getrusage do as_ac_var=`echo "ac_cv_func_$ac_func" | $as_tr_sh` echo "$as_me:$LINENO: checking for $ac_func" >&5 echo $ECHO_N "checking for $ac_func... $ECHO_C" >&6 if eval "test \"\${$as_ac_var+set}\" = set"; then echo $ECHO_N "(cached) $ECHO_C" >&6 else cat >conftest.$ac_ext <<_ACEOF /* confdefs.h. */ _ACEOF cat confdefs.h >>conftest.$ac_ext cat >>conftest.$ac_ext <<_ACEOF /* end confdefs.h. */ /* Define $ac_func to an innocuous variant, in case <limits.h> declares $ac_func. For example, HP-UX 11i <limits.h> declares gettimeofday. */ #define $ac_func innocuous_$ac_func /* System header to define __stub macros and hopefully few prototypes, which can conflict with char $ac_func (); below. Prefer <limits.h> to <assert.h> if __STDC__ is defined, since <limits.h> exists even on freestanding compilers. */ #ifdef __STDC__ # include <limits.h> #else # include <assert.h> #endif #undef $ac_func /* Override any gcc2 internal prototype to avoid an error. */ #ifdef __cplusplus extern "C" { #endif /* We use char because int might match the return type of a gcc2 builtin and then its argument prototype would still apply. */ char $ac_func (); /* The GNU C library defines this for functions which it implements to always fail with ENOSYS. Some functions are actually named something starting with __ and the normal name is an alias.
*/ #if defined (__stub_$ac_func) || defined (__stub___$ac_func) choke me #else char (*f) () = $ac_func; #endif #ifdef __cplusplus } #endif int main () { return f != $ac_func; ; return 0; } _ACEOF rm -f conftest.$ac_objext conftest$ac_exeext if { (eval echo "$as_me:$LINENO: \"$ac_link\"") >&5 (eval $ac_link) 2>conftest.er1 ac_status=$? grep -v '^ *+' conftest.er1 >conftest.err rm -f conftest.er1 cat conftest.err >&5 echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); } && { ac_try='test -z "$ac_c_werror_flag" || test ! -s conftest.err' { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 (eval $ac_try) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); }; } && { ac_try='test -s conftest$ac_exeext' { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 (eval $ac_try) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); }; }; then eval "$as_ac_var=yes" else echo "$as_me: failed program was:" >&5 sed 's/^/| /' conftest.$ac_ext >&5 eval "$as_ac_var=no" fi rm -f conftest.err conftest.$ac_objext \ conftest$ac_exeext conftest.$ac_ext fi echo "$as_me:$LINENO: result: `eval echo '${'$as_ac_var'}'`" >&5 echo "${ECHO_T}`eval echo '${'$as_ac_var'}'`" >&6 if test `eval echo '${'$as_ac_var'}'` = yes; then cat >>confdefs.h <<_ACEOF #define `echo "HAVE_$ac_func" | $as_tr_cpp` 1 _ACEOF fi done echo "$as_me:$LINENO: checking for log in -lm" >&5 echo $ECHO_N "checking for log in -lm... $ECHO_C" >&6 if test "${ac_cv_lib_m_log+set}" = set; then echo $ECHO_N "(cached) $ECHO_C" >&6 else ac_check_lib_save_LIBS=$LIBS LIBS="-lm $LIBS" cat >conftest.$ac_ext <<_ACEOF /* confdefs.h. */ _ACEOF cat confdefs.h >>conftest.$ac_ext cat >>conftest.$ac_ext <<_ACEOF /* end confdefs.h. */ /* Override any gcc2 internal prototype to avoid an error. */ #ifdef __cplusplus extern "C" #endif /* We use char because int might match the return type of a gcc2 builtin and then its argument prototype would still apply. 
*/ char log (); int main () { log (); ; return 0; } _ACEOF rm -f conftest.$ac_objext conftest$ac_exeext if { (eval echo "$as_me:$LINENO: \"$ac_link\"") >&5 (eval $ac_link) 2>conftest.er1 ac_status=$? grep -v '^ *+' conftest.er1 >conftest.err rm -f conftest.er1 cat conftest.err >&5 echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); } && { ac_try='test -z "$ac_c_werror_flag" || test ! -s conftest.err' { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 (eval $ac_try) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); }; } && { ac_try='test -s conftest$ac_exeext' { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 (eval $ac_try) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); }; }; then ac_cv_lib_m_log=yes else echo "$as_me: failed program was:" >&5 sed 's/^/| /' conftest.$ac_ext >&5 ac_cv_lib_m_log=no fi rm -f conftest.err conftest.$ac_objext \ conftest$ac_exeext conftest.$ac_ext LIBS=$ac_check_lib_save_LIBS fi echo "$as_me:$LINENO: result: $ac_cv_lib_m_log" >&5 echo "${ECHO_T}$ac_cv_lib_m_log" >&6 if test $ac_cv_lib_m_log = yes; then cat >>confdefs.h <<_ACEOF #define HAVE_LIBM 1 _ACEOF LIBS="-lm $LIBS" fi echo "$as_me:$LINENO: checking for working strcoll" >&5 echo $ECHO_N "checking for working strcoll... $ECHO_C" >&6 if test "${ac_cv_func_strcoll_works+set}" = set; then echo $ECHO_N "(cached) $ECHO_C" >&6 else if test "$cross_compiling" = yes; then ac_cv_func_strcoll_works=no else cat >conftest.$ac_ext <<_ACEOF /* confdefs.h. */ _ACEOF cat confdefs.h >>conftest.$ac_ext cat >>conftest.$ac_ext <<_ACEOF /* end confdefs.h. */ $ac_includes_default int main () { exit (strcoll ("abc", "def") >= 0 || strcoll ("ABC", "DEF") >= 0 || strcoll ("123", "456") >= 0) ; return 0; } _ACEOF rm -f conftest$ac_exeext if { (eval echo "$as_me:$LINENO: \"$ac_link\"") >&5 (eval $ac_link) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? 
= $ac_status" >&5 (exit $ac_status); } && { ac_try='./conftest$ac_exeext' { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 (eval $ac_try) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); }; }; then ac_cv_func_strcoll_works=yes else echo "$as_me: program exited with status $ac_status" >&5 echo "$as_me: failed program was:" >&5 sed 's/^/| /' conftest.$ac_ext >&5 ( exit $ac_status ) ac_cv_func_strcoll_works=no fi rm -f core *.core gmon.out bb.out conftest$ac_exeext conftest.$ac_objext conftest.$ac_ext fi fi echo "$as_me:$LINENO: result: $ac_cv_func_strcoll_works" >&5 echo "${ECHO_T}$ac_cv_func_strcoll_works" >&6 if test $ac_cv_func_strcoll_works = yes; then cat >>confdefs.h <<\_ACEOF #define HAVE_STRCOLL 1 _ACEOF fi echo "$as_me:$LINENO: checking for uid_t in sys/types.h" >&5 echo $ECHO_N "checking for uid_t in sys/types.h... $ECHO_C" >&6 if test "${ac_cv_type_uid_t+set}" = set; then echo $ECHO_N "(cached) $ECHO_C" >&6 else cat >conftest.$ac_ext <<_ACEOF /* confdefs.h. */ _ACEOF cat confdefs.h >>conftest.$ac_ext cat >>conftest.$ac_ext <<_ACEOF /* end confdefs.h. */ #include <sys/types.h> _ACEOF if (eval "$ac_cpp conftest.$ac_ext") 2>&5 | $EGREP "uid_t" >/dev/null 2>&1; then ac_cv_type_uid_t=yes else ac_cv_type_uid_t=no fi rm -f conftest* fi echo "$as_me:$LINENO: result: $ac_cv_type_uid_t" >&5 echo "${ECHO_T}$ac_cv_type_uid_t" >&6 if test $ac_cv_type_uid_t = no; then cat >>confdefs.h <<\_ACEOF #define uid_t int _ACEOF cat >>confdefs.h <<\_ACEOF #define gid_t int _ACEOF fi echo "$as_me:$LINENO: checking type of array argument to getgroups" >&5 echo $ECHO_N "checking type of array argument to getgroups... $ECHO_C" >&6 if test "${ac_cv_type_getgroups+set}" = set; then echo $ECHO_N "(cached) $ECHO_C" >&6 else if test "$cross_compiling" = yes; then ac_cv_type_getgroups=cross else cat >conftest.$ac_ext <<_ACEOF /* confdefs.h. */ _ACEOF cat confdefs.h >>conftest.$ac_ext cat >>conftest.$ac_ext <<_ACEOF /* end confdefs.h.
*/ /* Thanks to Mike Rendell for this test. */ #include <sys/types.h> #define NGID 256 #undef MAX #define MAX(x, y) ((x) > (y) ? (x) : (y)) int main () { gid_t gidset[NGID]; int i, n; union { gid_t gval; long lval; } val; val.lval = -1; for (i = 0; i < NGID; i++) gidset[i] = val.gval; n = getgroups (sizeof (gidset) / MAX (sizeof (int), sizeof (gid_t)) - 1, gidset); /* Exit non-zero if getgroups seems to require an array of ints. This happens when gid_t is short but getgroups modifies an array of ints. */ exit ((n > 0 && gidset[n] != val.gval) ? 1 : 0); } _ACEOF rm -f conftest$ac_exeext if { (eval echo "$as_me:$LINENO: \"$ac_link\"") >&5 (eval $ac_link) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); } && { ac_try='./conftest$ac_exeext' { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 (eval $ac_try) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); }; }; then ac_cv_type_getgroups=gid_t else echo "$as_me: program exited with status $ac_status" >&5 echo "$as_me: failed program was:" >&5 sed 's/^/| /' conftest.$ac_ext >&5 ( exit $ac_status ) ac_cv_type_getgroups=int fi rm -f core *.core gmon.out bb.out conftest$ac_exeext conftest.$ac_objext conftest.$ac_ext fi if test $ac_cv_type_getgroups = cross; then cat >conftest.$ac_ext <<_ACEOF /* confdefs.h. */ _ACEOF cat confdefs.h >>conftest.$ac_ext cat >>conftest.$ac_ext <<_ACEOF /* end confdefs.h. */ #include <unistd.h> _ACEOF if (eval "$ac_cpp conftest.$ac_ext") 2>&5 | $EGREP "getgroups.*int.*gid_t" >/dev/null 2>&1; then ac_cv_type_getgroups=gid_t else ac_cv_type_getgroups=int fi rm -f conftest* fi fi echo "$as_me:$LINENO: result: $ac_cv_type_getgroups" >&5 echo "${ECHO_T}$ac_cv_type_getgroups" >&6 cat >>confdefs.h <<_ACEOF #define GETGROUPS_T $ac_cv_type_getgroups _ACEOF echo "$as_me:$LINENO: checking for getgroups" >&5 echo $ECHO_N "checking for getgroups...
$ECHO_C" >&6 if test "${ac_cv_func_getgroups+set}" = set; then echo $ECHO_N "(cached) $ECHO_C" >&6 else cat >conftest.$ac_ext <<_ACEOF /* confdefs.h. */ _ACEOF cat confdefs.h >>conftest.$ac_ext cat >>conftest.$ac_ext <<_ACEOF /* end confdefs.h. */ /* Define getgroups to an innocuous variant, in case <limits.h> declares getgroups. For example, HP-UX 11i <limits.h> declares gettimeofday. */ #define getgroups innocuous_getgroups /* System header to define __stub macros and hopefully few prototypes, which can conflict with char getgroups (); below. Prefer <limits.h> to <assert.h> if __STDC__ is defined, since <limits.h> exists even on freestanding compilers. */ #ifdef __STDC__ # include <limits.h> #else # include <assert.h> #endif #undef getgroups /* Override any gcc2 internal prototype to avoid an error. */ #ifdef __cplusplus extern "C" { #endif /* We use char because int might match the return type of a gcc2 builtin and then its argument prototype would still apply. */ char getgroups (); /* The GNU C library defines this for functions which it implements to always fail with ENOSYS. Some functions are actually named something starting with __ and the normal name is an alias. */ #if defined (__stub_getgroups) || defined (__stub___getgroups) choke me #else char (*f) () = getgroups; #endif #ifdef __cplusplus } #endif int main () { return f != getgroups; ; return 0; } _ACEOF rm -f conftest.$ac_objext conftest$ac_exeext if { (eval echo "$as_me:$LINENO: \"$ac_link\"") >&5 (eval $ac_link) 2>conftest.er1 ac_status=$? grep -v '^ *+' conftest.er1 >conftest.err rm -f conftest.er1 cat conftest.err >&5 echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); } && { ac_try='test -z "$ac_c_werror_flag" || test ! -s conftest.err' { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 (eval $ac_try) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); }; } && { ac_try='test -s conftest$ac_exeext' { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 (eval $ac_try) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$?
= $ac_status" >&5 (exit $ac_status); }; }; then ac_cv_func_getgroups=yes else echo "$as_me: failed program was:" >&5 sed 's/^/| /' conftest.$ac_ext >&5 ac_cv_func_getgroups=no fi rm -f conftest.err conftest.$ac_objext \ conftest$ac_exeext conftest.$ac_ext fi echo "$as_me:$LINENO: result: $ac_cv_func_getgroups" >&5 echo "${ECHO_T}$ac_cv_func_getgroups" >&6 # If we don't yet have getgroups, see if it's in -lbsd. # This is reported to be necessary on an ITOS 3000WS running SEIUX 3.1. ac_save_LIBS=$LIBS if test $ac_cv_func_getgroups = no; then echo "$as_me:$LINENO: checking for getgroups in -lbsd" >&5 echo $ECHO_N "checking for getgroups in -lbsd... $ECHO_C" >&6 if test "${ac_cv_lib_bsd_getgroups+set}" = set; then echo $ECHO_N "(cached) $ECHO_C" >&6 else ac_check_lib_save_LIBS=$LIBS LIBS="-lbsd $LIBS" cat >conftest.$ac_ext <<_ACEOF /* confdefs.h. */ _ACEOF cat confdefs.h >>conftest.$ac_ext cat >>conftest.$ac_ext <<_ACEOF /* end confdefs.h. */ /* Override any gcc2 internal prototype to avoid an error. */ #ifdef __cplusplus extern "C" #endif /* We use char because int might match the return type of a gcc2 builtin and then its argument prototype would still apply. */ char getgroups (); int main () { getgroups (); ; return 0; } _ACEOF rm -f conftest.$ac_objext conftest$ac_exeext if { (eval echo "$as_me:$LINENO: \"$ac_link\"") >&5 (eval $ac_link) 2>conftest.er1 ac_status=$? grep -v '^ *+' conftest.er1 >conftest.err rm -f conftest.er1 cat conftest.err >&5 echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); } && { ac_try='test -z "$ac_c_werror_flag" || test ! -s conftest.err' { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 (eval $ac_try) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); }; } && { ac_try='test -s conftest$ac_exeext' { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 (eval $ac_try) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? 
= $ac_status" >&5 (exit $ac_status); }; }; then ac_cv_lib_bsd_getgroups=yes else echo "$as_me: failed program was:" >&5 sed 's/^/| /' conftest.$ac_ext >&5 ac_cv_lib_bsd_getgroups=no fi rm -f conftest.err conftest.$ac_objext \ conftest$ac_exeext conftest.$ac_ext LIBS=$ac_check_lib_save_LIBS fi echo "$as_me:$LINENO: result: $ac_cv_lib_bsd_getgroups" >&5 echo "${ECHO_T}$ac_cv_lib_bsd_getgroups" >&6 if test $ac_cv_lib_bsd_getgroups = yes; then GETGROUPS_LIB=-lbsd fi fi # Run the program to test the functionality of the system-supplied # getgroups function only if there is such a function. if test $ac_cv_func_getgroups = yes; then echo "$as_me:$LINENO: checking for working getgroups" >&5 echo $ECHO_N "checking for working getgroups... $ECHO_C" >&6 if test "${ac_cv_func_getgroups_works+set}" = set; then echo $ECHO_N "(cached) $ECHO_C" >&6 else if test "$cross_compiling" = yes; then ac_cv_func_getgroups_works=no else cat >conftest.$ac_ext <<_ACEOF /* confdefs.h. */ _ACEOF cat confdefs.h >>conftest.$ac_ext cat >>conftest.$ac_ext <<_ACEOF /* end confdefs.h. */ $ac_includes_default int main () { /* On Ultrix 4.3, getgroups (0, 0) always fails. */ exit (getgroups (0, 0) == -1 ? 1 : 0); ; return 0; } _ACEOF rm -f conftest$ac_exeext if { (eval echo "$as_me:$LINENO: \"$ac_link\"") >&5 (eval $ac_link) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); } && { ac_try='./conftest$ac_exeext' { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 (eval $ac_try) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? 
= $ac_status" >&5 (exit $ac_status); }; }; then ac_cv_func_getgroups_works=yes else echo "$as_me: program exited with status $ac_status" >&5 echo "$as_me: failed program was:" >&5 sed 's/^/| /' conftest.$ac_ext >&5 ( exit $ac_status ) ac_cv_func_getgroups_works=no fi rm -f core *.core gmon.out bb.out conftest$ac_exeext conftest.$ac_objext conftest.$ac_ext fi fi echo "$as_me:$LINENO: result: $ac_cv_func_getgroups_works" >&5 echo "${ECHO_T}$ac_cv_func_getgroups_works" >&6 if test $ac_cv_func_getgroups_works = yes; then cat >>confdefs.h <<\_ACEOF #define HAVE_GETGROUPS 1 _ACEOF fi fi LIBS=$ac_save_LIBS echo "$as_me:$LINENO: checking type of array argument to getgroups" >&5 echo $ECHO_N "checking type of array argument to getgroups... $ECHO_C" >&6 if test "${ac_cv_type_getgroups+set}" = set; then echo $ECHO_N "(cached) $ECHO_C" >&6 else if test "$cross_compiling" = yes; then ac_cv_type_getgroups=cross else cat >conftest.$ac_ext <<_ACEOF /* confdefs.h. */ _ACEOF cat confdefs.h >>conftest.$ac_ext cat >>conftest.$ac_ext <<_ACEOF /* end confdefs.h. */ /* Thanks to Mike Rendell for this test. */ #include <sys/types.h> #define NGID 256 #undef MAX #define MAX(x, y) ((x) > (y) ? (x) : (y)) int main () { gid_t gidset[NGID]; int i, n; union { gid_t gval; long lval; } val; val.lval = -1; for (i = 0; i < NGID; i++) gidset[i] = val.gval; n = getgroups (sizeof (gidset) / MAX (sizeof (int), sizeof (gid_t)) - 1, gidset); /* Exit non-zero if getgroups seems to require an array of ints. This happens when gid_t is short but getgroups modifies an array of ints. */ exit ((n > 0 && gidset[n] != val.gval) ? 1 : 0); } _ACEOF rm -f conftest$ac_exeext if { (eval echo "$as_me:$LINENO: \"$ac_link\"") >&5 (eval $ac_link) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); } && { ac_try='./conftest$ac_exeext' { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 (eval $ac_try) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$?
= $ac_status" >&5 (exit $ac_status); }; }; then ac_cv_type_getgroups=gid_t else echo "$as_me: program exited with status $ac_status" >&5 echo "$as_me: failed program was:" >&5 sed 's/^/| /' conftest.$ac_ext >&5 ( exit $ac_status ) ac_cv_type_getgroups=int fi rm -f core *.core gmon.out bb.out conftest$ac_exeext conftest.$ac_objext conftest.$ac_ext fi if test $ac_cv_type_getgroups = cross; then cat >conftest.$ac_ext <<_ACEOF /* confdefs.h. */ _ACEOF cat confdefs.h >>conftest.$ac_ext cat >>conftest.$ac_ext <<_ACEOF /* end confdefs.h. */ #include _ACEOF if (eval "$ac_cpp conftest.$ac_ext") 2>&5 | $EGREP "getgroups.*int.*gid_t" >/dev/null 2>&1; then ac_cv_type_getgroups=gid_t else ac_cv_type_getgroups=int fi rm -f conftest* fi fi echo "$as_me:$LINENO: result: $ac_cv_type_getgroups" >&5 echo "${ECHO_T}$ac_cv_type_getgroups" >&6 cat >>confdefs.h <<_ACEOF #define GETGROUPS_T $ac_cv_type_getgroups _ACEOF for ac_func in vsnprintf mkstemp do as_ac_var=`echo "ac_cv_func_$ac_func" | $as_tr_sh` echo "$as_me:$LINENO: checking for $ac_func" >&5 echo $ECHO_N "checking for $ac_func... $ECHO_C" >&6 if eval "test \"\${$as_ac_var+set}\" = set"; then echo $ECHO_N "(cached) $ECHO_C" >&6 else cat >conftest.$ac_ext <<_ACEOF /* confdefs.h. */ _ACEOF cat confdefs.h >>conftest.$ac_ext cat >>conftest.$ac_ext <<_ACEOF /* end confdefs.h. */ /* Define $ac_func to an innocuous variant, in case declares $ac_func. For example, HP-UX 11i declares gettimeofday. */ #define $ac_func innocuous_$ac_func /* System header to define __stub macros and hopefully few prototypes, which can conflict with char $ac_func (); below. Prefer to if __STDC__ is defined, since exists even on freestanding compilers. */ #ifdef __STDC__ # include #else # include #endif #undef $ac_func /* Override any gcc2 internal prototype to avoid an error. */ #ifdef __cplusplus extern "C" { #endif /* We use char because int might match the return type of a gcc2 builtin and then its argument prototype would still apply. 
*/ char $ac_func (); /* The GNU C library defines this for functions which it implements to always fail with ENOSYS. Some functions are actually named something starting with __ and the normal name is an alias. */ #if defined (__stub_$ac_func) || defined (__stub___$ac_func) choke me #else char (*f) () = $ac_func; #endif #ifdef __cplusplus } #endif int main () { return f != $ac_func; ; return 0; } _ACEOF rm -f conftest.$ac_objext conftest$ac_exeext if { (eval echo "$as_me:$LINENO: \"$ac_link\"") >&5 (eval $ac_link) 2>conftest.er1 ac_status=$? grep -v '^ *+' conftest.er1 >conftest.err rm -f conftest.er1 cat conftest.err >&5 echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); } && { ac_try='test -z "$ac_c_werror_flag" || test ! -s conftest.err' { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 (eval $ac_try) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); }; } && { ac_try='test -s conftest$ac_exeext' { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 (eval $ac_try) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); }; }; then eval "$as_ac_var=yes" else echo "$as_me: failed program was:" >&5 sed 's/^/| /' conftest.$ac_ext >&5 eval "$as_ac_var=no" fi rm -f conftest.err conftest.$ac_objext \ conftest$ac_exeext conftest.$ac_ext fi echo "$as_me:$LINENO: result: `eval echo '${'$as_ac_var'}'`" >&5 echo "${ECHO_T}`eval echo '${'$as_ac_var'}'`" >&6 if test `eval echo '${'$as_ac_var'}'` = yes; then cat >>confdefs.h <<_ACEOF #define `echo "HAVE_$ac_func" | $as_tr_cpp` 1 _ACEOF else case $LIBOBJS in "$ac_func.$ac_objext" | \ *" $ac_func.$ac_objext" | \ "$ac_func.$ac_objext "* | \ *" $ac_func.$ac_objext "* ) ;; *) LIBOBJS="$LIBOBJS $ac_func.$ac_objext" ;; esac fi done LIBXML_REQUIRED_VERSION=2.4.3 # Check whether --with-libxml2 or --without-libxml2 was given. 
if test "${with_libxml2+set}" = set; then withval="$with_libxml2" else withval=maybe fi; if test "$withval" != "no"; then XML2_CONFIG="no" if test "$withval" != "yes" && test "$withval" != "maybe" ; then XML2_CONFIG_PATH="$withval/bin" # Extract the first word of "xml2-config", so it can be a program name with args. set dummy xml2-config; ac_word=$2 echo "$as_me:$LINENO: checking for $ac_word" >&5 echo $ECHO_N "checking for $ac_word... $ECHO_C" >&6 if test "${ac_cv_path_XML2_CONFIG+set}" = set; then echo $ECHO_N "(cached) $ECHO_C" >&6 else case $XML2_CONFIG in [\\/]* | ?:[\\/]*) ac_cv_path_XML2_CONFIG="$XML2_CONFIG" # Let the user override the test with a path. ;; *) as_save_IFS=$IFS; IFS=$PATH_SEPARATOR for as_dir in $XML2_CONFIG_PATH do IFS=$as_save_IFS test -z "$as_dir" && as_dir=. for ac_exec_ext in '' $ac_executable_extensions; do if $as_executable_p "$as_dir/$ac_word$ac_exec_ext"; then ac_cv_path_XML2_CONFIG="$as_dir/$ac_word$ac_exec_ext" echo "$as_me:$LINENO: found $as_dir/$ac_word$ac_exec_ext" >&5 break 2 fi done done test -z "$ac_cv_path_XML2_CONFIG" && ac_cv_path_XML2_CONFIG=""no"" ;; esac fi XML2_CONFIG=$ac_cv_path_XML2_CONFIG if test -n "$XML2_CONFIG"; then echo "$as_me:$LINENO: result: $XML2_CONFIG" >&5 echo "${ECHO_T}$XML2_CONFIG" >&6 else echo "$as_me:$LINENO: result: no" >&5 echo "${ECHO_T}no" >&6 fi else XML2_CONFIG_PATH=$PATH # Extract the first word of "xml2-config", so it can be a program name with args. set dummy xml2-config; ac_word=$2 echo "$as_me:$LINENO: checking for $ac_word" >&5 echo $ECHO_N "checking for $ac_word... $ECHO_C" >&6 if test "${ac_cv_path_XML2_CONFIG+set}" = set; then echo $ECHO_N "(cached) $ECHO_C" >&6 else case $XML2_CONFIG in [\\/]* | ?:[\\/]*) ac_cv_path_XML2_CONFIG="$XML2_CONFIG" # Let the user override the test with a path. ;; *) as_save_IFS=$IFS; IFS=$PATH_SEPARATOR for as_dir in $XML2_CONFIG_PATH do IFS=$as_save_IFS test -z "$as_dir" && as_dir=. 
for ac_exec_ext in '' $ac_executable_extensions; do if $as_executable_p "$as_dir/$ac_word$ac_exec_ext"; then ac_cv_path_XML2_CONFIG="$as_dir/$ac_word$ac_exec_ext" echo "$as_me:$LINENO: found $as_dir/$ac_word$ac_exec_ext" >&5 break 2 fi done done test -z "$ac_cv_path_XML2_CONFIG" && ac_cv_path_XML2_CONFIG=""no"" ;; esac fi XML2_CONFIG=$ac_cv_path_XML2_CONFIG if test -n "$XML2_CONFIG"; then echo "$as_me:$LINENO: result: $XML2_CONFIG" >&5 echo "${ECHO_T}$XML2_CONFIG" >&6 else echo "$as_me:$LINENO: result: no" >&5 echo "${ECHO_T}no" >&6 fi fi if test "$XML2_CONFIG" = "no"; then withval="no" else withval=`$XML2_CONFIG --prefix` fi if test "$withval" = "maybe"; then withval="no" fi fi if test "$withval" = "no"; then echo "$as_me:$LINENO: result: Not building with libxml2 - use --with-libxml2 to enable" >&5 echo "${ECHO_T}Not building with libxml2 - use --with-libxml2 to enable" >&6 else echo "$as_me:$LINENO: checking for libxml libraries >= $LIBXML_REQUIRED_VERSION" >&5 echo $ECHO_N "checking for libxml libraries >= $LIBXML_REQUIRED_VERSION... $ECHO_C" >&6 vers=`$XML2_CONFIG --version | sed -e 's/libxml //' | awk 'BEGIN { FS = "."; } { printf "%d", ($1 * 1000 + $2) * 1000 + $3;}'` XML2_VERSION=`$XML2_CONFIG --version` if test "$vers" -ge `echo $LIBXML_REQUIRED_VERSION | sed -e 's/libxml //' | awk 'BEGIN { FS = "."; } { printf "%d", ($1 * 1000 + $2) * 1000 + $3;}'`;then LIBXML2_LIB="`$XML2_CONFIG --libs`" LIBXML2_CFLAGS="`$XML2_CONFIG --cflags`" echo "$as_me:$LINENO: result: found version $XML2_VERSION" >&5 echo "${ECHO_T}found version $XML2_VERSION" >&6 else { { echo "$as_me:$LINENO: error: You need at least libxml2 $LIBXML_REQUIRED_VERSION for this version of swish" >&5 echo "$as_me: error: You need at least libxml2 $LIBXML_REQUIRED_VERSION for this version of swish" >&2;} { (exit 1); exit 1; }; } fi cat >>confdefs.h <<\_ACEOF #define HAVE_LIBXML2 _ACEOF LIBXML2_OBJS="parser.lo" fi # Check whether --enable-incremental or --disable-incremental was given.
if test "${enable_incremental+set}" = set; then enableval="$enable_incremental" btree=yes fi; # Check whether --enable-psortarray or --disable-psortarray was given. if test "${enable_psortarray+set}" = set; then enableval="$enable_psortarray" psortarray=yes fi; if test x$btree = xyes; then { echo "$as_me:$LINENO: WARNING: ** Building with developer only incremental indexing code **" >&5 echo "$as_me: WARNING: ** Building with developer only incremental indexing code **" >&2;} BTREE_OBJS="btree.lo array.lo worddata.lo fhash.lo" cat >>confdefs.h <<\_ACEOF #define USE_BTREE _ACEOF if test "x$psortarray" = xyes; then { echo "$as_me:$LINENO: WARNING: ** And using ARRAY presorted tables **" >&5 echo "$as_me: WARNING: ** And using ARRAY presorted tables **" >&2;} cat >>confdefs.h <<\_ACEOF #define USE_PRESORT_ARRAY _ACEOF fi fi _cppflags="${CPPFLAGS}" _ldflags="${LDFLAGS}" # Check whether --with-zlib or --without-zlib was given. if test "${with_zlib+set}" = set; then withval="$with_zlib" if test "$withval" != "no" -a "$withval" != "yes"; then Z_DIR=$withval CPPFLAGS="${CPPFLAGS} -I$withval/include" LDFLAGS="${LDFLAGS} -L$withval/lib" fi fi; if test "$with_zlib" = "no"; then echo "Disabling compression support" else for ac_header in zlib.h do as_ac_Header=`echo "ac_cv_header_$ac_header" | $as_tr_sh` if eval "test \"\${$as_ac_Header+set}\" = set"; then echo "$as_me:$LINENO: checking for $ac_header" >&5 echo $ECHO_N "checking for $ac_header... $ECHO_C" >&6 if eval "test \"\${$as_ac_Header+set}\" = set"; then echo $ECHO_N "(cached) $ECHO_C" >&6 fi echo "$as_me:$LINENO: result: `eval echo '${'$as_ac_Header'}'`" >&5 echo "${ECHO_T}`eval echo '${'$as_ac_Header'}'`" >&6 else # Is the header compilable? echo "$as_me:$LINENO: checking $ac_header usability" >&5 echo $ECHO_N "checking $ac_header usability... $ECHO_C" >&6 cat >conftest.$ac_ext <<_ACEOF /* confdefs.h. */ _ACEOF cat confdefs.h >>conftest.$ac_ext cat >>conftest.$ac_ext <<_ACEOF /* end confdefs.h.
*/ $ac_includes_default #include <$ac_header> _ACEOF rm -f conftest.$ac_objext if { (eval echo "$as_me:$LINENO: \"$ac_compile\"") >&5 (eval $ac_compile) 2>conftest.er1 ac_status=$? grep -v '^ *+' conftest.er1 >conftest.err rm -f conftest.er1 cat conftest.err >&5 echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); } && { ac_try='test -z "$ac_c_werror_flag" || test ! -s conftest.err' { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 (eval $ac_try) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); }; } && { ac_try='test -s conftest.$ac_objext' { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 (eval $ac_try) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); }; }; then ac_header_compiler=yes else echo "$as_me: failed program was:" >&5 sed 's/^/| /' conftest.$ac_ext >&5 ac_header_compiler=no fi rm -f conftest.err conftest.$ac_objext conftest.$ac_ext echo "$as_me:$LINENO: result: $ac_header_compiler" >&5 echo "${ECHO_T}$ac_header_compiler" >&6 # Is the header present? echo "$as_me:$LINENO: checking $ac_header presence" >&5 echo $ECHO_N "checking $ac_header presence... $ECHO_C" >&6 cat >conftest.$ac_ext <<_ACEOF /* confdefs.h. */ _ACEOF cat confdefs.h >>conftest.$ac_ext cat >>conftest.$ac_ext <<_ACEOF /* end confdefs.h. */ #include <$ac_header> _ACEOF if { (eval echo "$as_me:$LINENO: \"$ac_cpp conftest.$ac_ext\"") >&5 (eval $ac_cpp conftest.$ac_ext) 2>conftest.er1 ac_status=$? grep -v '^ *+' conftest.er1 >conftest.err rm -f conftest.er1 cat conftest.err >&5 echo "$as_me:$LINENO: \$? 
= $ac_status" >&5 (exit $ac_status); } >/dev/null; then if test -s conftest.err; then ac_cpp_err=$ac_c_preproc_warn_flag ac_cpp_err=$ac_cpp_err$ac_c_werror_flag else ac_cpp_err= fi else ac_cpp_err=yes fi if test -z "$ac_cpp_err"; then ac_header_preproc=yes else echo "$as_me: failed program was:" >&5 sed 's/^/| /' conftest.$ac_ext >&5 ac_header_preproc=no fi rm -f conftest.err conftest.$ac_ext echo "$as_me:$LINENO: result: $ac_header_preproc" >&5 echo "${ECHO_T}$ac_header_preproc" >&6 # So? What about this header? case $ac_header_compiler:$ac_header_preproc:$ac_c_preproc_warn_flag in yes:no: ) { echo "$as_me:$LINENO: WARNING: $ac_header: accepted by the compiler, rejected by the preprocessor!" >&5 echo "$as_me: WARNING: $ac_header: accepted by the compiler, rejected by the preprocessor!" >&2;} { echo "$as_me:$LINENO: WARNING: $ac_header: proceeding with the compiler's result" >&5 echo "$as_me: WARNING: $ac_header: proceeding with the compiler's result" >&2;} ac_header_preproc=yes ;; no:yes:* ) { echo "$as_me:$LINENO: WARNING: $ac_header: present but cannot be compiled" >&5 echo "$as_me: WARNING: $ac_header: present but cannot be compiled" >&2;} { echo "$as_me:$LINENO: WARNING: $ac_header: check for missing prerequisite headers?" >&5 echo "$as_me: WARNING: $ac_header: check for missing prerequisite headers?" 
>&2;} { echo "$as_me:$LINENO: WARNING: $ac_header: see the Autoconf documentation" >&5 echo "$as_me: WARNING: $ac_header: see the Autoconf documentation" >&2;} { echo "$as_me:$LINENO: WARNING: $ac_header: section \"Present But Cannot Be Compiled\"" >&5 echo "$as_me: WARNING: $ac_header: section \"Present But Cannot Be Compiled\"" >&2;} { echo "$as_me:$LINENO: WARNING: $ac_header: proceeding with the preprocessor's result" >&5 echo "$as_me: WARNING: $ac_header: proceeding with the preprocessor's result" >&2;} { echo "$as_me:$LINENO: WARNING: $ac_header: in the future, the compiler will take precedence" >&5 echo "$as_me: WARNING: $ac_header: in the future, the compiler will take precedence" >&2;} ( cat <<\_ASBOX ## ------------------------------------------ ## ## Report this to the AC_PACKAGE_NAME lists. ## ## ------------------------------------------ ## _ASBOX ) | sed "s/^/$as_me: WARNING: /" >&2 ;; esac echo "$as_me:$LINENO: checking for $ac_header" >&5 echo $ECHO_N "checking for $ac_header... $ECHO_C" >&6 if eval "test \"\${$as_ac_Header+set}\" = set"; then echo $ECHO_N "(cached) $ECHO_C" >&6 else eval "$as_ac_Header=\$ac_header_preproc" fi echo "$as_me:$LINENO: result: `eval echo '${'$as_ac_Header'}'`" >&5 echo "${ECHO_T}`eval echo '${'$as_ac_Header'}'`" >&6 fi if test `eval echo '${'$as_ac_Header'}'` = yes; then cat >>confdefs.h <<_ACEOF #define `echo "HAVE_$ac_header" | $as_tr_cpp` 1 _ACEOF echo "$as_me:$LINENO: checking for gzread in -lz" >&5 echo $ECHO_N "checking for gzread in -lz... $ECHO_C" >&6 if test "${ac_cv_lib_z_gzread+set}" = set; then echo $ECHO_N "(cached) $ECHO_C" >&6 else ac_check_lib_save_LIBS=$LIBS LIBS="-lz $LIBS" cat >conftest.$ac_ext <<_ACEOF /* confdefs.h. */ _ACEOF cat confdefs.h >>conftest.$ac_ext cat >>conftest.$ac_ext <<_ACEOF /* end confdefs.h. */ /* Override any gcc2 internal prototype to avoid an error. 
*/ #ifdef __cplusplus extern "C" #endif /* We use char because int might match the return type of a gcc2 builtin and then its argument prototype would still apply. */ char gzread (); int main () { gzread (); ; return 0; } _ACEOF rm -f conftest.$ac_objext conftest$ac_exeext if { (eval echo "$as_me:$LINENO: \"$ac_link\"") >&5 (eval $ac_link) 2>conftest.er1 ac_status=$? grep -v '^ *+' conftest.er1 >conftest.err rm -f conftest.er1 cat conftest.err >&5 echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); } && { ac_try='test -z "$ac_c_werror_flag" || test ! -s conftest.err' { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 (eval $ac_try) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); }; } && { ac_try='test -s conftest$ac_exeext' { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 (eval $ac_try) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); }; }; then ac_cv_lib_z_gzread=yes else echo "$as_me: failed program was:" >&5 sed 's/^/| /' conftest.$ac_ext >&5 ac_cv_lib_z_gzread=no fi rm -f conftest.err conftest.$ac_objext \ conftest$ac_exeext conftest.$ac_ext LIBS=$ac_check_lib_save_LIBS fi echo "$as_me:$LINENO: result: $ac_cv_lib_z_gzread" >&5 echo "${ECHO_T}$ac_cv_lib_z_gzread" >&6 if test $ac_cv_lib_z_gzread = yes; then cat >>confdefs.h <<\_ACEOF #define HAVE_ZLIB _ACEOF if test "x${Z_DIR}" != "x"; then Z_CFLAGS="-I${Z_DIR}/include" Z_LIBS="-L${Z_DIR}/lib -lz" case ${host} in *-*-solaris*) Z_LIBS="-L${Z_DIR}/lib -R${Z_DIR}/lib -lz" ;; esac else Z_LIBS="-lz" fi fi fi done fi if test "x${target}" = "xi586-mingw32msvc"; then Z_LIBS="-L${Z_DIR}/lib -R${Z_DIR}/lib -lzdll" fi echo "Z_LIBS = $Z_LIBS" CPPFLAGS=${_cppflags} LDFLAGS=${_ldflags} PCRE_REQUIRED_VERSION=3.4 # Check whether --with-pcre or --without-pcre was given.
if test "${with_pcre+set}" = set; then withval="$with_pcre" else withval=no fi; if test "$withval" != "no"; then PCRE_CONFIG="no" if test "$withval" != "yes" && test "$withval" != "maybe" ; then PCRE_CONFIG_PATH="$withval/bin" # Extract the first word of "pcre-config", so it can be a program name with args. set dummy pcre-config; ac_word=$2 echo "$as_me:$LINENO: checking for $ac_word" >&5 echo $ECHO_N "checking for $ac_word... $ECHO_C" >&6 if test "${ac_cv_path_PCRE_CONFIG+set}" = set; then echo $ECHO_N "(cached) $ECHO_C" >&6 else case $PCRE_CONFIG in [\\/]* | ?:[\\/]*) ac_cv_path_PCRE_CONFIG="$PCRE_CONFIG" # Let the user override the test with a path. ;; *) as_save_IFS=$IFS; IFS=$PATH_SEPARATOR for as_dir in $PCRE_CONFIG_PATH do IFS=$as_save_IFS test -z "$as_dir" && as_dir=. for ac_exec_ext in '' $ac_executable_extensions; do if $as_executable_p "$as_dir/$ac_word$ac_exec_ext"; then ac_cv_path_PCRE_CONFIG="$as_dir/$ac_word$ac_exec_ext" echo "$as_me:$LINENO: found $as_dir/$ac_word$ac_exec_ext" >&5 break 2 fi done done test -z "$ac_cv_path_PCRE_CONFIG" && ac_cv_path_PCRE_CONFIG=""no"" ;; esac fi PCRE_CONFIG=$ac_cv_path_PCRE_CONFIG if test -n "$PCRE_CONFIG"; then echo "$as_me:$LINENO: result: $PCRE_CONFIG" >&5 echo "${ECHO_T}$PCRE_CONFIG" >&6 else echo "$as_me:$LINENO: result: no" >&5 echo "${ECHO_T}no" >&6 fi else PCRE_CONFIG_PATH=$PATH # Extract the first word of "pcre-config", so it can be a program name with args. set dummy pcre-config; ac_word=$2 echo "$as_me:$LINENO: checking for $ac_word" >&5 echo $ECHO_N "checking for $ac_word... $ECHO_C" >&6 if test "${ac_cv_path_PCRE_CONFIG+set}" = set; then echo $ECHO_N "(cached) $ECHO_C" >&6 else case $PCRE_CONFIG in [\\/]* | ?:[\\/]*) ac_cv_path_PCRE_CONFIG="$PCRE_CONFIG" # Let the user override the test with a path. ;; *) as_save_IFS=$IFS; IFS=$PATH_SEPARATOR for as_dir in $PCRE_CONFIG_PATH do IFS=$as_save_IFS test -z "$as_dir" && as_dir=. 
for ac_exec_ext in '' $ac_executable_extensions; do if $as_executable_p "$as_dir/$ac_word$ac_exec_ext"; then ac_cv_path_PCRE_CONFIG="$as_dir/$ac_word$ac_exec_ext" echo "$as_me:$LINENO: found $as_dir/$ac_word$ac_exec_ext" >&5 break 2 fi done done test -z "$ac_cv_path_PCRE_CONFIG" && ac_cv_path_PCRE_CONFIG=""no"" ;; esac fi PCRE_CONFIG=$ac_cv_path_PCRE_CONFIG if test -n "$PCRE_CONFIG"; then echo "$as_me:$LINENO: result: $PCRE_CONFIG" >&5 echo "${ECHO_T}$PCRE_CONFIG" >&6 else echo "$as_me:$LINENO: result: no" >&5 echo "${ECHO_T}no" >&6 fi fi if test "$PCRE_CONFIG" = "no"; then withval="no" else withval=`$PCRE_CONFIG --prefix` fi if test "$withval" = "maybe"; then withval="no" fi fi if test "$withval" != "no"; then echo "$as_me:$LINENO: checking for libpcre libraries >= $PCRE_REQUIRED_VERSION" >&5 echo $ECHO_N "checking for libpcre libraries >= $PCRE_REQUIRED_VERSION... $ECHO_C" >&6 vers=`$PCRE_CONFIG --version | awk 'BEGIN { FS = "."; } { printf "%d", ($1 * 1000 + $2) * 1000 + $3;}'` PCRE_VERSION=`$PCRE_CONFIG --version` if test "$vers" -ge `echo $PCRE_REQUIRED_VERSION | awk 'BEGIN { FS = "."; } { printf "%d", ($1 * 1000 + $2) * 1000 + $3;}'`;then PCRE_LIBS="`$PCRE_CONFIG --libs-posix`" PCRE_CFLAGS="`$PCRE_CONFIG --cflags-posix`" echo "$as_me:$LINENO: result: found version $PCRE_VERSION" >&5 echo "${ECHO_T}found version $PCRE_VERSION" >&6 else { { echo "$as_me:$LINENO: error: You need at least libpcre $PCRE_REQUIRED_VERSION for this version of swish" >&5 echo "$as_me: error: You need at least libpcre $PCRE_REQUIRED_VERSION for this version of swish" >&2;} { (exit 1); exit 1; }; } fi cat >>confdefs.h <<\_ACEOF #define HAVE_PCRE _ACEOF else echo "$as_me:$LINENO: result: Not building with perl compatible regex - use --with-pcre to enable" >&5 echo "${ECHO_T}Not building with perl compatible regex - use --with-pcre to enable" >&6 fi # Check whether --enable-largefile or --disable-largefile was given.
if test "${enable_largefile+set}" = set; then enableval="$enable_largefile" fi; if test "$enable_largefile" != no; then echo "$as_me:$LINENO: checking for special C compiler options needed for large files" >&5 echo $ECHO_N "checking for special C compiler options needed for large files... $ECHO_C" >&6 if test "${ac_cv_sys_largefile_CC+set}" = set; then echo $ECHO_N "(cached) $ECHO_C" >&6 else ac_cv_sys_largefile_CC=no if test "$GCC" != yes; then ac_save_CC=$CC while :; do # IRIX 6.2 and later do not support large files by default, # so use the C compiler's -n32 option if that helps. cat >conftest.$ac_ext <<_ACEOF /* confdefs.h. */ _ACEOF cat confdefs.h >>conftest.$ac_ext cat >>conftest.$ac_ext <<_ACEOF /* end confdefs.h. */ #include /* Check that off_t can represent 2**63 - 1 correctly. We can't simply define LARGE_OFF_T to be 9223372036854775807, since some C++ compilers masquerading as C compilers incorrectly reject 9223372036854775807. */ #define LARGE_OFF_T (((off_t) 1 << 62) - 1 + ((off_t) 1 << 62)) int off_t_is_large[(LARGE_OFF_T % 2147483629 == 721 && LARGE_OFF_T % 2147483647 == 1) ? 1 : -1]; int main () { ; return 0; } _ACEOF rm -f conftest.$ac_objext if { (eval echo "$as_me:$LINENO: \"$ac_compile\"") >&5 (eval $ac_compile) 2>conftest.er1 ac_status=$? grep -v '^ *+' conftest.er1 >conftest.err rm -f conftest.er1 cat conftest.err >&5 echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); } && { ac_try='test -z "$ac_c_werror_flag" || test ! -s conftest.err' { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 (eval $ac_try) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); }; } && { ac_try='test -s conftest.$ac_objext' { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 (eval $ac_try) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? 
= $ac_status" >&5 (exit $ac_status); }; }; then break else echo "$as_me: failed program was:" >&5 sed 's/^/| /' conftest.$ac_ext >&5 fi rm -f conftest.err conftest.$ac_objext CC="$CC -n32" rm -f conftest.$ac_objext if { (eval echo "$as_me:$LINENO: \"$ac_compile\"") >&5 (eval $ac_compile) 2>conftest.er1 ac_status=$? grep -v '^ *+' conftest.er1 >conftest.err rm -f conftest.er1 cat conftest.err >&5 echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); } && { ac_try='test -z "$ac_c_werror_flag" || test ! -s conftest.err' { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 (eval $ac_try) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); }; } && { ac_try='test -s conftest.$ac_objext' { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 (eval $ac_try) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); }; }; then ac_cv_sys_largefile_CC=' -n32'; break else echo "$as_me: failed program was:" >&5 sed 's/^/| /' conftest.$ac_ext >&5 fi rm -f conftest.err conftest.$ac_objext break done CC=$ac_save_CC rm -f conftest.$ac_ext fi fi echo "$as_me:$LINENO: result: $ac_cv_sys_largefile_CC" >&5 echo "${ECHO_T}$ac_cv_sys_largefile_CC" >&6 if test "$ac_cv_sys_largefile_CC" != no; then CC=$CC$ac_cv_sys_largefile_CC fi echo "$as_me:$LINENO: checking for _FILE_OFFSET_BITS value needed for large files" >&5 echo $ECHO_N "checking for _FILE_OFFSET_BITS value needed for large files... $ECHO_C" >&6 if test "${ac_cv_sys_file_offset_bits+set}" = set; then echo $ECHO_N "(cached) $ECHO_C" >&6 else while :; do ac_cv_sys_file_offset_bits=no cat >conftest.$ac_ext <<_ACEOF /* confdefs.h. */ _ACEOF cat confdefs.h >>conftest.$ac_ext cat >>conftest.$ac_ext <<_ACEOF /* end confdefs.h. */ #include /* Check that off_t can represent 2**63 - 1 correctly. We can't simply define LARGE_OFF_T to be 9223372036854775807, since some C++ compilers masquerading as C compilers incorrectly reject 9223372036854775807. 
*/ #define LARGE_OFF_T (((off_t) 1 << 62) - 1 + ((off_t) 1 << 62)) int off_t_is_large[(LARGE_OFF_T % 2147483629 == 721 && LARGE_OFF_T % 2147483647 == 1) ? 1 : -1]; int main () { ; return 0; } _ACEOF rm -f conftest.$ac_objext if { (eval echo "$as_me:$LINENO: \"$ac_compile\"") >&5 (eval $ac_compile) 2>conftest.er1 ac_status=$? grep -v '^ *+' conftest.er1 >conftest.err rm -f conftest.er1 cat conftest.err >&5 echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); } && { ac_try='test -z "$ac_c_werror_flag" || test ! -s conftest.err' { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 (eval $ac_try) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); }; } && { ac_try='test -s conftest.$ac_objext' { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 (eval $ac_try) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); }; }; then break else echo "$as_me: failed program was:" >&5 sed 's/^/| /' conftest.$ac_ext >&5 fi rm -f conftest.err conftest.$ac_objext conftest.$ac_ext cat >conftest.$ac_ext <<_ACEOF /* confdefs.h. */ _ACEOF cat confdefs.h >>conftest.$ac_ext cat >>conftest.$ac_ext <<_ACEOF /* end confdefs.h. */ #define _FILE_OFFSET_BITS 64 #include /* Check that off_t can represent 2**63 - 1 correctly. We can't simply define LARGE_OFF_T to be 9223372036854775807, since some C++ compilers masquerading as C compilers incorrectly reject 9223372036854775807. */ #define LARGE_OFF_T (((off_t) 1 << 62) - 1 + ((off_t) 1 << 62)) int off_t_is_large[(LARGE_OFF_T % 2147483629 == 721 && LARGE_OFF_T % 2147483647 == 1) ? 1 : -1]; int main () { ; return 0; } _ACEOF rm -f conftest.$ac_objext if { (eval echo "$as_me:$LINENO: \"$ac_compile\"") >&5 (eval $ac_compile) 2>conftest.er1 ac_status=$? grep -v '^ *+' conftest.er1 >conftest.err rm -f conftest.er1 cat conftest.err >&5 echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); } && { ac_try='test -z "$ac_c_werror_flag" || test ! 
-s conftest.err' { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 (eval $ac_try) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); }; } && { ac_try='test -s conftest.$ac_objext' { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 (eval $ac_try) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); }; }; then ac_cv_sys_file_offset_bits=64; break else echo "$as_me: failed program was:" >&5 sed 's/^/| /' conftest.$ac_ext >&5 fi rm -f conftest.err conftest.$ac_objext conftest.$ac_ext break done fi echo "$as_me:$LINENO: result: $ac_cv_sys_file_offset_bits" >&5 echo "${ECHO_T}$ac_cv_sys_file_offset_bits" >&6 if test "$ac_cv_sys_file_offset_bits" != no; then cat >>confdefs.h <<_ACEOF #define _FILE_OFFSET_BITS $ac_cv_sys_file_offset_bits _ACEOF fi rm -f conftest* echo "$as_me:$LINENO: checking for _LARGE_FILES value needed for large files" >&5 echo $ECHO_N "checking for _LARGE_FILES value needed for large files... $ECHO_C" >&6 if test "${ac_cv_sys_large_files+set}" = set; then echo $ECHO_N "(cached) $ECHO_C" >&6 else while :; do ac_cv_sys_large_files=no cat >conftest.$ac_ext <<_ACEOF /* confdefs.h. */ _ACEOF cat confdefs.h >>conftest.$ac_ext cat >>conftest.$ac_ext <<_ACEOF /* end confdefs.h. */ #include /* Check that off_t can represent 2**63 - 1 correctly. We can't simply define LARGE_OFF_T to be 9223372036854775807, since some C++ compilers masquerading as C compilers incorrectly reject 9223372036854775807. */ #define LARGE_OFF_T (((off_t) 1 << 62) - 1 + ((off_t) 1 << 62)) int off_t_is_large[(LARGE_OFF_T % 2147483629 == 721 && LARGE_OFF_T % 2147483647 == 1) ? 1 : -1]; int main () { ; return 0; } _ACEOF rm -f conftest.$ac_objext if { (eval echo "$as_me:$LINENO: \"$ac_compile\"") >&5 (eval $ac_compile) 2>conftest.er1 ac_status=$? grep -v '^ *+' conftest.er1 >conftest.err rm -f conftest.er1 cat conftest.err >&5 echo "$as_me:$LINENO: \$? 
= $ac_status" >&5 (exit $ac_status); } && { ac_try='test -z "$ac_c_werror_flag" || test ! -s conftest.err' { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 (eval $ac_try) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); }; } && { ac_try='test -s conftest.$ac_objext' { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 (eval $ac_try) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); }; }; then break else echo "$as_me: failed program was:" >&5 sed 's/^/| /' conftest.$ac_ext >&5 fi rm -f conftest.err conftest.$ac_objext conftest.$ac_ext cat >conftest.$ac_ext <<_ACEOF /* confdefs.h. */ _ACEOF cat confdefs.h >>conftest.$ac_ext cat >>conftest.$ac_ext <<_ACEOF /* end confdefs.h. */ #define _LARGE_FILES 1 #include /* Check that off_t can represent 2**63 - 1 correctly. We can't simply define LARGE_OFF_T to be 9223372036854775807, since some C++ compilers masquerading as C compilers incorrectly reject 9223372036854775807. */ #define LARGE_OFF_T (((off_t) 1 << 62) - 1 + ((off_t) 1 << 62)) int off_t_is_large[(LARGE_OFF_T % 2147483629 == 721 && LARGE_OFF_T % 2147483647 == 1) ? 1 : -1]; int main () { ; return 0; } _ACEOF rm -f conftest.$ac_objext if { (eval echo "$as_me:$LINENO: \"$ac_compile\"") >&5 (eval $ac_compile) 2>conftest.er1 ac_status=$? grep -v '^ *+' conftest.er1 >conftest.err rm -f conftest.er1 cat conftest.err >&5 echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); } && { ac_try='test -z "$ac_c_werror_flag" || test ! -s conftest.err' { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 (eval $ac_try) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); }; } && { ac_try='test -s conftest.$ac_objext' { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 (eval $ac_try) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? 
= $ac_status" >&5 (exit $ac_status); }; }; then ac_cv_sys_large_files=1; break else echo "$as_me: failed program was:" >&5 sed 's/^/| /' conftest.$ac_ext >&5 fi rm -f conftest.err conftest.$ac_objext conftest.$ac_ext break done fi echo "$as_me:$LINENO: result: $ac_cv_sys_large_files" >&5 echo "${ECHO_T}$ac_cv_sys_large_files" >&6 if test "$ac_cv_sys_large_files" != no; then cat >>confdefs.h <<_ACEOF #define _LARGE_FILES $ac_cv_sys_large_files _ACEOF fi rm -f conftest* fi { echo "$as_me:$LINENO: fileoffset bits = ${ac_cv_sys_file_offset_bits}" >&5 echo "$as_me: fileoffset bits = ${ac_cv_sys_file_offset_bits}" >&6;} if test "x${ac_cv_sys_file_offset_bits}" = "x64" ; then LARGEFILES_MACROS="-D_LARGEFILE_SOURCE -D_LARGEFILE64_SOURCE -D_FILE_OFFSET_BITS=$ac_cv_sys_file_offset_bits" fi CPPFLAGS=${_cppflags} LDFLAGS=${_ldflags} libexecdiropt=$(echo $ac_option | grep 'libexecdir=') if test "x$libexecdiropt" = "x"; then libexecdir='${exec_prefix}/lib/${PACKAGE}' { echo "$as_me:$LINENO: Setting libexecdir to \${exec_prefix}/lib/${PACKAGE}" >&5 echo "$as_me: Setting libexecdir to \${exec_prefix}/lib/${PACKAGE}" >&6;} fi echo "$as_me:$LINENO: checking config option memdebug for setting MEM_DEBUG" >&5 echo $ECHO_N "checking config option memdebug for setting MEM_DEBUG... $ECHO_C" >&6 # Check whether --enable-memdebug or --disable-memdebug was given.
if test "${enable_memtrace+set}" = set; then enableval="$enable_memtrace" else enableval=no fi; echo "$as_me:$LINENO: result: $enableval" >&5 echo "${ECHO_T}$enableval" >&6 if test x"$enableval" != xno ; then cat >>confdefs.h <<\_ACEOF #define MEM_TRACE 1 _ACEOF fi echo "$as_me:$LINENO: checking config option memstats for setting MEM_STATISTICS" >&5 echo $ECHO_N "checking config option memstats for setting MEM_STATISTICS... $ECHO_C" >&6 # Check whether --enable-memstats or --disable-memstats was given. if test "${enable_memstats+set}" = set; then enableval="$enable_memstats" else enableval=no fi; echo "$as_me:$LINENO: result: $enableval" >&5 echo "${ECHO_T}$enableval" >&6 if test x"$enableval" != xno ; then cat >>confdefs.h <<\_ACEOF #define MEM_STATISTICS 1 _ACEOF fi ac_config_files="$ac_config_files Makefile html/Makefile pod/Makefile man/Makefile src/Makefile src/expat/Makefile src/replace/Makefile src/snowball/Makefile rpm/swish-e.spec tests/Makefile example/Makefile prog-bin/Makefile filters/Makefile filters/SWISH/Makefile conf/Makefile filter-bin/Makefile swish-e.pc swish-config" cat >confcache <<\_ACEOF # This file is a shell script that caches the results of configure # tests run on this system so they can be shared between configure # scripts and configure runs, see configure's option --config-cache. # It is not useful on other systems. If it contains results you don't # want to keep, you may remove or edit it. # # config.status only pays attention to the cache file if you give it # the --recheck option to rerun configure. # # `ac_cv_env_foo' variables (set or unset) will be overridden when # loading this file, other *unset* `ac_cv_foo' will be assigned the # following values. _ACEOF # The following way of writing the cache mishandles newlines in values, # but we know of no workaround that is simple, portable, and efficient. # So, don't put newlines in cache variables' values. 
# Ultrix sh set writes to stderr and can't be redirected directly, # and sets the high bit in the cache file unless we assign to the vars. { (set) 2>&1 | case `(ac_space=' '; set | grep ac_space) 2>&1` in *ac_space=\ *) # `set' does not quote correctly, so add quotes (double-quote # substitution turns \\\\ into \\, and sed turns \\ into \). sed -n \ "s/'/'\\\\''/g; s/^\\([_$as_cr_alnum]*_cv_[_$as_cr_alnum]*\\)=\\(.*\\)/\\1='\\2'/p" ;; *) # `set' quotes correctly as required by POSIX, so do not add quotes. sed -n \ "s/^\\([_$as_cr_alnum]*_cv_[_$as_cr_alnum]*\\)=\\(.*\\)/\\1=\\2/p" ;; esac; } | sed ' t clear : clear s/^\([^=]*\)=\(.*[{}].*\)$/test "${\1+set}" = set || &/ t end /^ac_cv_env/!s/^\([^=]*\)=\(.*\)$/\1=${\1=\2}/ : end' >>confcache if diff $cache_file confcache >/dev/null 2>&1; then :; else if test -w $cache_file; then test "x$cache_file" != "x/dev/null" && echo "updating cache $cache_file" cat confcache >$cache_file else echo "not updating unwritable cache $cache_file" fi fi rm -f confcache test "x$prefix" = xNONE && prefix=$ac_default_prefix # Let make expand exec_prefix. test "x$exec_prefix" = xNONE && exec_prefix='${prefix}' # VPATH may cause trouble with some makes, so we remove $(srcdir), # ${srcdir} and @srcdir@ from VPATH if srcdir is ".", strip leading and # trailing colons and then remove the whole line if VPATH becomes empty # (actually we leave an empty line to preserve line numbers). if test "x$srcdir" = x.; then ac_vpsub='/^[ ]*VPATH[ ]*=/{ s/:*\$(srcdir):*/:/; s/:*\${srcdir}:*/:/; s/:*@srcdir@:*/:/; s/^\([^=]*=[ ]*\):*/\1/; s/:*$//; s/^[^=]*=[ ]*$//; }' fi DEFS=-DHAVE_CONFIG_H ac_libobjs= ac_ltlibobjs= for ac_i in : $LIBOBJS; do test "x$ac_i" = x: && continue # 1. Remove the extension, and $U if already installed. ac_i=`echo "$ac_i" | sed 's/\$U\././;s/\.o$//;s/\.obj$//'` # 2. Add them. 
ac_libobjs="$ac_libobjs $ac_i\$U.$ac_objext" ac_ltlibobjs="$ac_ltlibobjs $ac_i"'$U.lo' done LIBOBJS=$ac_libobjs LTLIBOBJS=$ac_ltlibobjs if test -z "${BUILDDOCS_TRUE}" && test -z "${BUILDDOCS_FALSE}"; then { { echo "$as_me:$LINENO: error: conditional \"BUILDDOCS\" was never defined. Usually this means the macro was only invoked conditionally." >&5 echo "$as_me: error: conditional \"BUILDDOCS\" was never defined. Usually this means the macro was only invoked conditionally." >&2;} { (exit 1); exit 1; }; } fi if test -z "${INSTALLDOCS_TRUE}" && test -z "${INSTALLDOCS_FALSE}"; then { { echo "$as_me:$LINENO: error: conditional \"INSTALLDOCS\" was never defined. Usually this means the macro was only invoked conditionally." >&5 echo "$as_me: error: conditional \"INSTALLDOCS\" was never defined. Usually this means the macro was only invoked conditionally." >&2;} { (exit 1); exit 1; }; } fi if test -z "${BUILDDOCS_TRUE}" && test -z "${BUILDDOCS_FALSE}"; then { { echo "$as_me:$LINENO: error: conditional \"BUILDDOCS\" was never defined. Usually this means the macro was only invoked conditionally." >&5 echo "$as_me: error: conditional \"BUILDDOCS\" was never defined. Usually this means the macro was only invoked conditionally." >&2;} { (exit 1); exit 1; }; } fi if test -z "${INSTALLDOCS_TRUE}" && test -z "${INSTALLDOCS_FALSE}"; then { { echo "$as_me:$LINENO: error: conditional \"INSTALLDOCS\" was never defined. Usually this means the macro was only invoked conditionally." >&5 echo "$as_me: error: conditional \"INSTALLDOCS\" was never defined. Usually this means the macro was only invoked conditionally." >&2;} { (exit 1); exit 1; }; } fi if test -z "${INSTALLDOCS_TRUE}" && test -z "${INSTALLDOCS_FALSE}"; then { { echo "$as_me:$LINENO: error: conditional \"INSTALLDOCS\" was never defined. Usually this means the macro was only invoked conditionally." >&5 echo "$as_me: error: conditional \"INSTALLDOCS\" was never defined. 
Usually this means the macro was only invoked conditionally." >&2;} { (exit 1); exit 1; }; } fi if test -z "${AMDEP_TRUE}" && test -z "${AMDEP_FALSE}"; then { { echo "$as_me:$LINENO: error: conditional \"AMDEP\" was never defined. Usually this means the macro was only invoked conditionally." >&5 echo "$as_me: error: conditional \"AMDEP\" was never defined. Usually this means the macro was only invoked conditionally." >&2;} { (exit 1); exit 1; }; } fi if test -z "${am__fastdepCC_TRUE}" && test -z "${am__fastdepCC_FALSE}"; then { { echo "$as_me:$LINENO: error: conditional \"am__fastdepCC\" was never defined. Usually this means the macro was only invoked conditionally." >&5 echo "$as_me: error: conditional \"am__fastdepCC\" was never defined. Usually this means the macro was only invoked conditionally." >&2;} { (exit 1); exit 1; }; } fi if test -z "${am__fastdepCC_TRUE}" && test -z "${am__fastdepCC_FALSE}"; then { { echo "$as_me:$LINENO: error: conditional \"am__fastdepCC\" was never defined. Usually this means the macro was only invoked conditionally." >&5 echo "$as_me: error: conditional \"am__fastdepCC\" was never defined. Usually this means the macro was only invoked conditionally." >&2;} { (exit 1); exit 1; }; } fi if test -z "${am__fastdepCXX_TRUE}" && test -z "${am__fastdepCXX_FALSE}"; then { { echo "$as_me:$LINENO: error: conditional \"am__fastdepCXX\" was never defined. Usually this means the macro was only invoked conditionally." >&5 echo "$as_me: error: conditional \"am__fastdepCXX\" was never defined. Usually this means the macro was only invoked conditionally." >&2;} { (exit 1); exit 1; }; } fi if test -z "${MAINTAINER_MODE_TRUE}" && test -z "${MAINTAINER_MODE_FALSE}"; then { { echo "$as_me:$LINENO: error: conditional \"MAINTAINER_MODE\" was never defined. Usually this means the macro was only invoked conditionally." >&5 echo "$as_me: error: conditional \"MAINTAINER_MODE\" was never defined. Usually this means the macro was only invoked conditionally." 
>&2;} { (exit 1); exit 1; }; } fi if test -z "${am__fastdepCC_TRUE}" && test -z "${am__fastdepCC_FALSE}"; then { { echo "$as_me:$LINENO: error: conditional \"am__fastdepCC\" was never defined. Usually this means the macro was only invoked conditionally." >&5 echo "$as_me: error: conditional \"am__fastdepCC\" was never defined. Usually this means the macro was only invoked conditionally." >&2;} { (exit 1); exit 1; }; } fi : ${CONFIG_STATUS=./config.status} ac_clean_files_save=$ac_clean_files ac_clean_files="$ac_clean_files $CONFIG_STATUS" { echo "$as_me:$LINENO: creating $CONFIG_STATUS" >&5 echo "$as_me: creating $CONFIG_STATUS" >&6;} cat >$CONFIG_STATUS <<_ACEOF #! $SHELL # Generated by $as_me. # Run this file to recreate the current configuration. # Compiler output produced by configure, useful for debugging # configure, is in config.log if it exists. debug=false ac_cs_recheck=false ac_cs_silent=false SHELL=\${CONFIG_SHELL-$SHELL} _ACEOF cat >>$CONFIG_STATUS <<\_ACEOF ## --------------------- ## ## M4sh Initialization. ## ## --------------------- ## # Be Bourne compatible if test -n "${ZSH_VERSION+set}" && (emulate sh) >/dev/null 2>&1; then emulate sh NULLCMD=: # Zsh 3.x and 4.x performs word splitting on ${1+"$@"}, which # is contrary to our usage. Disable this feature. alias -g '${1+"$@"}'='"$@"' elif test -n "${BASH_VERSION+set}" && (set -o posix) >/dev/null 2>&1; then set -o posix fi DUALCASE=1; export DUALCASE # for MKS sh # Support unset when possible. if ( (MAIL=60; unset MAIL) || exit) >/dev/null 2>&1; then as_unset=unset else as_unset=false fi # Work around bugs in pre-3.0 UWIN ksh. $as_unset ENV MAIL MAILPATH PS1='$ ' PS2='> ' PS4='+ ' # NLS nuisances. 
for as_var in \ LANG LANGUAGE LC_ADDRESS LC_ALL LC_COLLATE LC_CTYPE LC_IDENTIFICATION \ LC_MEASUREMENT LC_MESSAGES LC_MONETARY LC_NAME LC_NUMERIC LC_PAPER \ LC_TELEPHONE LC_TIME do if (set +x; test -z "`(eval $as_var=C; export $as_var) 2>&1`"); then eval $as_var=C; export $as_var else $as_unset $as_var fi done # Required to use basename. if expr a : '\(a\)' >/dev/null 2>&1; then as_expr=expr else as_expr=false fi if (basename /) >/dev/null 2>&1 && test "X`basename / 2>&1`" = "X/"; then as_basename=basename else as_basename=false fi # Name of the executable. as_me=`$as_basename "$0" || $as_expr X/"$0" : '.*/\([^/][^/]*\)/*$' \| \ X"$0" : 'X\(//\)$' \| \ X"$0" : 'X\(/\)$' \| \ . : '\(.\)' 2>/dev/null || echo X/"$0" | sed '/^.*\/\([^/][^/]*\)\/*$/{ s//\1/; q; } /^X\/\(\/\/\)$/{ s//\1/; q; } /^X\/\(\/\).*/{ s//\1/; q; } s/.*/./; q'` # PATH needs CR, and LINENO needs CR and PATH. # Avoid depending upon Character Ranges. as_cr_letters='abcdefghijklmnopqrstuvwxyz' as_cr_LETTERS='ABCDEFGHIJKLMNOPQRSTUVWXYZ' as_cr_Letters=$as_cr_letters$as_cr_LETTERS as_cr_digits='0123456789' as_cr_alnum=$as_cr_Letters$as_cr_digits # The user is always right. if test "${PATH_SEPARATOR+set}" != set; then echo "#! /bin/sh" >conf$$.sh echo "exit 0" >>conf$$.sh chmod +x conf$$.sh if (PATH="/nonexistent;."; conf$$.sh) >/dev/null 2>&1; then PATH_SEPARATOR=';' else PATH_SEPARATOR=: fi rm -f conf$$.sh fi as_lineno_1=$LINENO as_lineno_2=$LINENO as_lineno_3=`(expr $as_lineno_1 + 1) 2>/dev/null` test "x$as_lineno_1" != "x$as_lineno_2" && test "x$as_lineno_3" = "x$as_lineno_2" || { # Find who we are. Look in the path if we contain no path at all # relative or not. case $0 in *[\\/]* ) as_myself=$0 ;; *) as_save_IFS=$IFS; IFS=$PATH_SEPARATOR for as_dir in $PATH do IFS=$as_save_IFS test -z "$as_dir" && as_dir=. test -r "$as_dir/$0" && as_myself=$as_dir/$0 && break done ;; esac # We did not find ourselves, most probably we were run as `sh COMMAND' # in which case we are not to be found in the path. 
if test "x$as_myself" = x; then as_myself=$0 fi if test ! -f "$as_myself"; then { { echo "$as_me:$LINENO: error: cannot find myself; rerun with an absolute path" >&5 echo "$as_me: error: cannot find myself; rerun with an absolute path" >&2;} { (exit 1); exit 1; }; } fi case $CONFIG_SHELL in '') as_save_IFS=$IFS; IFS=$PATH_SEPARATOR for as_dir in /bin$PATH_SEPARATOR/usr/bin$PATH_SEPARATOR$PATH do IFS=$as_save_IFS test -z "$as_dir" && as_dir=. for as_base in sh bash ksh sh5; do case $as_dir in /*) if ("$as_dir/$as_base" -c ' as_lineno_1=$LINENO as_lineno_2=$LINENO as_lineno_3=`(expr $as_lineno_1 + 1) 2>/dev/null` test "x$as_lineno_1" != "x$as_lineno_2" && test "x$as_lineno_3" = "x$as_lineno_2" ') 2>/dev/null; then $as_unset BASH_ENV || test "${BASH_ENV+set}" != set || { BASH_ENV=; export BASH_ENV; } $as_unset ENV || test "${ENV+set}" != set || { ENV=; export ENV; } CONFIG_SHELL=$as_dir/$as_base export CONFIG_SHELL exec "$CONFIG_SHELL" "$0" ${1+"$@"} fi;; esac done done ;; esac # Create $as_me.lineno as a copy of $as_myself, but with $LINENO # uniformly replaced by the line number. The first 'sed' inserts a # line-number line before each line; the second 'sed' does the real # work. The second script uses 'N' to pair each line-number line # with the numbered line, and appends trailing '-' during # substitution so that $LINENO is not a special case at line end. # (Raja R Harinath suggested sed '=', and Paul Eggert wrote the # second 'sed' script. Blame Lee E. McMahon for sed's syntax. 
:-) sed '=' <$as_myself | sed ' N s,$,-, : loop s,^\(['$as_cr_digits']*\)\(.*\)[$]LINENO\([^'$as_cr_alnum'_]\),\1\2\1\3, t loop s,-$,, s,^['$as_cr_digits']*\n,, ' >$as_me.lineno && chmod +x $as_me.lineno || { { echo "$as_me:$LINENO: error: cannot create $as_me.lineno; rerun with a POSIX shell" >&5 echo "$as_me: error: cannot create $as_me.lineno; rerun with a POSIX shell" >&2;} { (exit 1); exit 1; }; } # Don't try to exec as it changes $[0], causing all sorts of problems # (the dirname of $[0] is not the place where we might find the # original, and so on. Autoconf is especially sensitive to this). . ./$as_me.lineno # Exit status is that of the last command. exit } case `echo "testing\c"; echo 1,2,3`,`echo -n testing; echo 1,2,3` in *c*,-n*) ECHO_N= ECHO_C=' ' ECHO_T=' ' ;; *c*,* ) ECHO_N=-n ECHO_C= ECHO_T= ;; *) ECHO_N= ECHO_C='\c' ECHO_T= ;; esac if expr a : '\(a\)' >/dev/null 2>&1; then as_expr=expr else as_expr=false fi rm -f conf$$ conf$$.exe conf$$.file echo >conf$$.file if ln -s conf$$.file conf$$ 2>/dev/null; then # We could just check for DJGPP; but this test a) works b) is more generic # and c) will remain valid once DJGPP supports symlinks (DJGPP 2.04). if test -f conf$$.exe; then # Don't use ln at all; we don't have any links as_ln_s='cp -p' else as_ln_s='ln -s' fi elif ln conf$$.file conf$$ 2>/dev/null; then as_ln_s=ln else as_ln_s='cp -p' fi rm -f conf$$ conf$$.exe conf$$.file if mkdir -p . 2>/dev/null; then as_mkdir_p=: else test -d ./-p && rmdir ./-p as_mkdir_p=false fi as_executable_p="test -f" # Sed expression to map a string onto a valid CPP name. as_tr_cpp="eval sed 'y%*$as_cr_letters%P$as_cr_LETTERS%;s%[^_$as_cr_alnum]%_%g'" # Sed expression to map a string onto a valid variable name. as_tr_sh="eval sed 'y%*+%pp%;s%[^_$as_cr_alnum]%_%g'" # IFS # We need space, tab and new line, in precisely that order. as_nl=' ' IFS=" $as_nl" # CDPATH.
$as_unset CDPATH exec 6>&1 # Open the log real soon, to keep \$[0] and so on meaningful, and to # report actual input values of CONFIG_FILES etc. instead of their # values after options handling. Logging --version etc. is OK. exec 5>>config.log { echo sed 'h;s/./-/g;s/^.../## /;s/...$/ ##/;p;x;p;x' <<_ASBOX ## Running $as_me. ## _ASBOX } >&5 cat >&5 <<_CSEOF This file was extended by $as_me, which was generated by GNU Autoconf 2.59. Invocation command line was CONFIG_FILES = $CONFIG_FILES CONFIG_HEADERS = $CONFIG_HEADERS CONFIG_LINKS = $CONFIG_LINKS CONFIG_COMMANDS = $CONFIG_COMMANDS $ $0 $@ _CSEOF echo "on `(hostname || uname -n) 2>/dev/null | sed 1q`" >&5 echo >&5 _ACEOF # Files that config.status was made for. if test -n "$ac_config_files"; then echo "config_files=\"$ac_config_files\"" >>$CONFIG_STATUS fi if test -n "$ac_config_headers"; then echo "config_headers=\"$ac_config_headers\"" >>$CONFIG_STATUS fi if test -n "$ac_config_links"; then echo "config_links=\"$ac_config_links\"" >>$CONFIG_STATUS fi if test -n "$ac_config_commands"; then echo "config_commands=\"$ac_config_commands\"" >>$CONFIG_STATUS fi cat >>$CONFIG_STATUS <<\_ACEOF ac_cs_usage="\ \`$as_me' instantiates files from templates according to the current configuration. Usage: $0 [OPTIONS] [FILE]... -h, --help print this help, then exit -V, --version print version number, then exit -q, --quiet do not print progress messages -d, --debug don't remove temporary files --recheck update $as_me by reconfiguring in the same conditions --file=FILE[:TEMPLATE] instantiate the configuration file FILE --header=FILE[:TEMPLATE] instantiate the configuration header FILE Configuration files: $config_files Configuration headers: $config_headers Configuration commands: $config_commands Report bugs to ." 
_ACEOF cat >>$CONFIG_STATUS <<_ACEOF ac_cs_version="\\ config.status configured by $0, generated by GNU Autoconf 2.59, with options \\"`echo "$ac_configure_args" | sed 's/[\\""\`\$]/\\\\&/g'`\\" Copyright (C) 2003 Free Software Foundation, Inc. This config.status script is free software; the Free Software Foundation gives unlimited permission to copy, distribute and modify it." srcdir=$srcdir INSTALL="$INSTALL" _ACEOF cat >>$CONFIG_STATUS <<\_ACEOF # If no files are specified by the user, then we need to provide default # values. But we need to know if files were specified by the user. ac_need_defaults=: while test $# != 0 do case $1 in --*=*) ac_option=`expr "x$1" : 'x\([^=]*\)='` ac_optarg=`expr "x$1" : 'x[^=]*=\(.*\)'` ac_shift=: ;; -*) ac_option=$1 ac_optarg=$2 ac_shift=shift ;; *) # This is not an option, so the user has probably given explicit # arguments. ac_option=$1 ac_need_defaults=false;; esac case $ac_option in # Handling of the options. _ACEOF cat >>$CONFIG_STATUS <<\_ACEOF -recheck | --recheck | --rechec | --reche | --rech | --rec | --re | --r) ac_cs_recheck=: ;; --version | --vers* | -V ) echo "$ac_cs_version"; exit 0 ;; --he | --h) # Conflict between --help and --header { { echo "$as_me:$LINENO: error: ambiguous option: $1 Try \`$0 --help' for more information." >&5 echo "$as_me: error: ambiguous option: $1 Try \`$0 --help' for more information." >&2;} { (exit 1); exit 1; }; };; --help | --hel | -h ) echo "$ac_cs_usage"; exit 0 ;; --debug | --d* | -d ) debug=: ;; --file | --fil | --fi | --f ) $ac_shift CONFIG_FILES="$CONFIG_FILES $ac_optarg" ac_need_defaults=false;; --header | --heade | --head | --hea ) $ac_shift CONFIG_HEADERS="$CONFIG_HEADERS $ac_optarg" ac_need_defaults=false;; -q | -quiet | --quiet | --quie | --qui | --qu | --q \ | -silent | --silent | --silen | --sile | --sil | --si | --s) ac_cs_silent=: ;; # This is an error. -*) { { echo "$as_me:$LINENO: error: unrecognized option: $1 Try \`$0 --help' for more information."
>&5 echo "$as_me: error: unrecognized option: $1 Try \`$0 --help' for more information." >&2;} { (exit 1); exit 1; }; } ;; *) ac_config_targets="$ac_config_targets $1" ;; esac shift done ac_configure_extra_args= if $ac_cs_silent; then exec 6>/dev/null ac_configure_extra_args="$ac_configure_extra_args --silent" fi _ACEOF cat >>$CONFIG_STATUS <<_ACEOF if \$ac_cs_recheck; then echo "running $SHELL $0 " $ac_configure_args \$ac_configure_extra_args " --no-create --no-recursion" >&6 exec $SHELL $0 $ac_configure_args \$ac_configure_extra_args --no-create --no-recursion fi _ACEOF cat >>$CONFIG_STATUS <<_ACEOF # # INIT-COMMANDS section. # AMDEP_TRUE="$AMDEP_TRUE" ac_aux_dir="$ac_aux_dir" _ACEOF cat >>$CONFIG_STATUS <<\_ACEOF for ac_config_target in $ac_config_targets do case "$ac_config_target" in # Handling of arguments. "Makefile" ) CONFIG_FILES="$CONFIG_FILES Makefile" ;; "html/Makefile" ) CONFIG_FILES="$CONFIG_FILES html/Makefile" ;; "pod/Makefile" ) CONFIG_FILES="$CONFIG_FILES pod/Makefile" ;; "man/Makefile" ) CONFIG_FILES="$CONFIG_FILES man/Makefile" ;; "src/Makefile" ) CONFIG_FILES="$CONFIG_FILES src/Makefile" ;; "src/expat/Makefile" ) CONFIG_FILES="$CONFIG_FILES src/expat/Makefile" ;; "src/replace/Makefile" ) CONFIG_FILES="$CONFIG_FILES src/replace/Makefile" ;; "src/snowball/Makefile" ) CONFIG_FILES="$CONFIG_FILES src/snowball/Makefile" ;; "rpm/swish-e.spec" ) CONFIG_FILES="$CONFIG_FILES rpm/swish-e.spec" ;; "tests/Makefile" ) CONFIG_FILES="$CONFIG_FILES tests/Makefile" ;; "example/Makefile" ) CONFIG_FILES="$CONFIG_FILES example/Makefile" ;; "prog-bin/Makefile" ) CONFIG_FILES="$CONFIG_FILES prog-bin/Makefile" ;; "filters/Makefile" ) CONFIG_FILES="$CONFIG_FILES filters/Makefile" ;; "filters/SWISH/Makefile" ) CONFIG_FILES="$CONFIG_FILES filters/SWISH/Makefile" ;; "conf/Makefile" ) CONFIG_FILES="$CONFIG_FILES conf/Makefile" ;; "filter-bin/Makefile" ) CONFIG_FILES="$CONFIG_FILES filter-bin/Makefile" ;; "swish-e.pc" ) CONFIG_FILES="$CONFIG_FILES swish-e.pc" ;; 
"swish-config" ) CONFIG_FILES="$CONFIG_FILES swish-config" ;; "depfiles" ) CONFIG_COMMANDS="$CONFIG_COMMANDS depfiles" ;; "src/acconfig.h" ) CONFIG_HEADERS="$CONFIG_HEADERS src/acconfig.h" ;; *) { { echo "$as_me:$LINENO: error: invalid argument: $ac_config_target" >&5 echo "$as_me: error: invalid argument: $ac_config_target" >&2;} { (exit 1); exit 1; }; };; esac done # If the user did not use the arguments to specify the items to instantiate, # then the envvar interface is used. Set only those that are not. # We use the long form for the default assignment because of an extremely # bizarre bug on SunOS 4.1.3. if $ac_need_defaults; then test "${CONFIG_FILES+set}" = set || CONFIG_FILES=$config_files test "${CONFIG_HEADERS+set}" = set || CONFIG_HEADERS=$config_headers test "${CONFIG_COMMANDS+set}" = set || CONFIG_COMMANDS=$config_commands fi # Have a temporary directory for convenience. Make it in the build tree # simply because there is no reason to put it here, and in addition, # creating and moving files from /tmp can sometimes cause problems. # Create a temporary directory, and hook for its removal unless debugging. $debug || { trap 'exit_status=$?; rm -rf $tmp && exit $exit_status' 0 trap '{ (exit 1); exit 1; }' 1 2 13 15 } # Create a (secure) tmp directory for tmp files. { tmp=`(umask 077 && mktemp -d -q "./confstatXXXXXX") 2>/dev/null` && test -n "$tmp" && test -d "$tmp" } || { tmp=./confstat$$-$RANDOM (umask 077 && mkdir $tmp) } || { echo "$me: cannot create a temporary directory in ." >&2 { (exit 1); exit 1; } } _ACEOF cat >>$CONFIG_STATUS <<_ACEOF # # CONFIG_FILES section. # # No need to generate the scripts if there are no CONFIG_FILES. # This happens for instance when ./config.status config.h if test -n "\$CONFIG_FILES"; then # Protect against being on the right side of a sed subst in config.status. 
sed 's/,@/@@/; s/@,/@@/; s/,;t t\$/@;t t/; /@;t t\$/s/[\\\\&,]/\\\\&/g; s/@@/,@/; s/@@/@,/; s/@;t t\$/,;t t/' >\$tmp/subs.sed <<\\CEOF s,@SHELL@,$SHELL,;t t s,@PATH_SEPARATOR@,$PATH_SEPARATOR,;t t s,@PACKAGE_NAME@,$PACKAGE_NAME,;t t s,@PACKAGE_TARNAME@,$PACKAGE_TARNAME,;t t s,@PACKAGE_VERSION@,$PACKAGE_VERSION,;t t s,@PACKAGE_STRING@,$PACKAGE_STRING,;t t s,@PACKAGE_BUGREPORT@,$PACKAGE_BUGREPORT,;t t s,@exec_prefix@,$exec_prefix,;t t s,@prefix@,$prefix,;t t s,@program_transform_name@,$program_transform_name,;t t s,@bindir@,$bindir,;t t s,@sbindir@,$sbindir,;t t s,@libexecdir@,$libexecdir,;t t s,@datadir@,$datadir,;t t s,@sysconfdir@,$sysconfdir,;t t s,@sharedstatedir@,$sharedstatedir,;t t s,@localstatedir@,$localstatedir,;t t s,@libdir@,$libdir,;t t s,@includedir@,$includedir,;t t s,@oldincludedir@,$oldincludedir,;t t s,@infodir@,$infodir,;t t s,@mandir@,$mandir,;t t s,@build_alias@,$build_alias,;t t s,@host_alias@,$host_alias,;t t s,@target_alias@,$target_alias,;t t s,@DEFS@,$DEFS,;t t s,@ECHO_C@,$ECHO_C,;t t s,@ECHO_N@,$ECHO_N,;t t s,@ECHO_T@,$ECHO_T,;t t s,@LIBS@,$LIBS,;t t s,@BUILDDOCS_TRUE@,$BUILDDOCS_TRUE,;t t s,@BUILDDOCS_FALSE@,$BUILDDOCS_FALSE,;t t s,@INSTALLDOCS_TRUE@,$INSTALLDOCS_TRUE,;t t s,@INSTALLDOCS_FALSE@,$INSTALLDOCS_FALSE,;t t s,@SWISH_WEB@,$SWISH_WEB,;t t s,@INSTALL_PROGRAM@,$INSTALL_PROGRAM,;t t s,@INSTALL_SCRIPT@,$INSTALL_SCRIPT,;t t s,@INSTALL_DATA@,$INSTALL_DATA,;t t s,@CYGPATH_W@,$CYGPATH_W,;t t s,@PACKAGE@,$PACKAGE,;t t s,@VERSION@,$VERSION,;t t s,@ACLOCAL@,$ACLOCAL,;t t s,@AUTOCONF@,$AUTOCONF,;t t s,@AUTOMAKE@,$AUTOMAKE,;t t s,@AUTOHEADER@,$AUTOHEADER,;t t s,@MAKEINFO@,$MAKEINFO,;t t s,@install_sh@,$install_sh,;t t s,@STRIP@,$STRIP,;t t s,@ac_ct_STRIP@,$ac_ct_STRIP,;t t s,@INSTALL_STRIP_PROGRAM@,$INSTALL_STRIP_PROGRAM,;t t s,@mkdir_p@,$mkdir_p,;t t s,@AWK@,$AWK,;t t s,@SET_MAKE@,$SET_MAKE,;t t s,@am__leading_dot@,$am__leading_dot,;t t s,@AMTAR@,$AMTAR,;t t s,@am__tar@,$am__tar,;t t s,@am__untar@,$am__untar,;t t s,@CC@,$CC,;t t 
s,@CFLAGS@,$CFLAGS,;t t s,@LDFLAGS@,$LDFLAGS,;t t s,@CPPFLAGS@,$CPPFLAGS,;t t s,@ac_ct_CC@,$ac_ct_CC,;t t s,@EXEEXT@,$EXEEXT,;t t s,@OBJEXT@,$OBJEXT,;t t s,@DEPDIR@,$DEPDIR,;t t s,@am__include@,$am__include,;t t s,@am__quote@,$am__quote,;t t s,@AMDEP_TRUE@,$AMDEP_TRUE,;t t s,@AMDEP_FALSE@,$AMDEP_FALSE,;t t s,@AMDEPBACKSLASH@,$AMDEPBACKSLASH,;t t s,@CCDEPMODE@,$CCDEPMODE,;t t s,@am__fastdepCC_TRUE@,$am__fastdepCC_TRUE,;t t s,@am__fastdepCC_FALSE@,$am__fastdepCC_FALSE,;t t s,@build@,$build,;t t s,@build_cpu@,$build_cpu,;t t s,@build_vendor@,$build_vendor,;t t s,@build_os@,$build_os,;t t s,@host@,$host,;t t s,@host_cpu@,$host_cpu,;t t s,@host_vendor@,$host_vendor,;t t s,@host_os@,$host_os,;t t s,@EGREP@,$EGREP,;t t s,@LN_S@,$LN_S,;t t s,@ECHO@,$ECHO,;t t s,@AR@,$AR,;t t s,@ac_ct_AR@,$ac_ct_AR,;t t s,@RANLIB@,$RANLIB,;t t s,@ac_ct_RANLIB@,$ac_ct_RANLIB,;t t s,@DLLTOOL@,$DLLTOOL,;t t s,@ac_ct_DLLTOOL@,$ac_ct_DLLTOOL,;t t s,@AS@,$AS,;t t s,@ac_ct_AS@,$ac_ct_AS,;t t s,@OBJDUMP@,$OBJDUMP,;t t s,@ac_ct_OBJDUMP@,$ac_ct_OBJDUMP,;t t s,@CPP@,$CPP,;t t s,@CXX@,$CXX,;t t s,@CXXFLAGS@,$CXXFLAGS,;t t s,@ac_ct_CXX@,$ac_ct_CXX,;t t s,@CXXDEPMODE@,$CXXDEPMODE,;t t s,@am__fastdepCXX_TRUE@,$am__fastdepCXX_TRUE,;t t s,@am__fastdepCXX_FALSE@,$am__fastdepCXX_FALSE,;t t s,@CXXCPP@,$CXXCPP,;t t s,@F77@,$F77,;t t s,@FFLAGS@,$FFLAGS,;t t s,@ac_ct_F77@,$ac_ct_F77,;t t s,@LIBTOOL@,$LIBTOOL,;t t s,@MAINTAINER_MODE_TRUE@,$MAINTAINER_MODE_TRUE,;t t s,@MAINTAINER_MODE_FALSE@,$MAINTAINER_MODE_FALSE,;t t s,@MAINT@,$MAINT,;t t s,@PERL@,$PERL,;t t s,@POD2MAN@,$POD2MAN,;t t s,@ALLOCA@,$ALLOCA,;t t s,@LIBOBJS@,$LIBOBJS,;t t s,@XML2_CONFIG@,$XML2_CONFIG,;t t s,@LIBXML_REQUIRED_VERSION@,$LIBXML_REQUIRED_VERSION,;t t s,@LIBXML2_OBJS@,$LIBXML2_OBJS,;t t s,@LIBXML2_LIB@,$LIBXML2_LIB,;t t s,@LIBXML2_CFLAGS@,$LIBXML2_CFLAGS,;t t s,@BTREE_OBJS@,$BTREE_OBJS,;t t s,@Z_CFLAGS@,$Z_CFLAGS,;t t s,@Z_LIBS@,$Z_LIBS,;t t s,@PCRE_CONFIG@,$PCRE_CONFIG,;t t s,@PCRE_REQUIRED_VERSION@,$PCRE_REQUIRED_VERSION,;t t 
s,@PCRE_CFLAGS@,$PCRE_CFLAGS,;t t s,@PCRE_LIBS@,$PCRE_LIBS,;t t s,@LARGEFILES_MACROS@,$LARGEFILES_MACROS,;t t s,@LTLIBOBJS@,$LTLIBOBJS,;t t CEOF _ACEOF cat >>$CONFIG_STATUS <<\_ACEOF # Split the substitutions into bite-sized pieces for seds with # small command number limits, like on Digital OSF/1 and HP-UX. ac_max_sed_lines=48 ac_sed_frag=1 # Number of current file. ac_beg=1 # First line for current file. ac_end=$ac_max_sed_lines # Line after last line for current file. ac_more_lines=: ac_sed_cmds= while $ac_more_lines; do if test $ac_beg -gt 1; then sed "1,${ac_beg}d; ${ac_end}q" $tmp/subs.sed >$tmp/subs.frag else sed "${ac_end}q" $tmp/subs.sed >$tmp/subs.frag fi if test ! -s $tmp/subs.frag; then ac_more_lines=false else # The purpose of the label and of the branching condition is to # speed up the sed processing (if there are no `@' at all, there # is no need to browse any of the substitutions). # These are the two extra sed commands mentioned above. (echo ':t /@[a-zA-Z_][a-zA-Z_0-9]*@/!b' && cat $tmp/subs.frag) >$tmp/subs-$ac_sed_frag.sed if test -z "$ac_sed_cmds"; then ac_sed_cmds="sed -f $tmp/subs-$ac_sed_frag.sed" else ac_sed_cmds="$ac_sed_cmds | sed -f $tmp/subs-$ac_sed_frag.sed" fi ac_sed_frag=`expr $ac_sed_frag + 1` ac_beg=$ac_end ac_end=`expr $ac_end + $ac_max_sed_lines` fi done if test -z "$ac_sed_cmds"; then ac_sed_cmds=cat fi fi # test -n "$CONFIG_FILES" _ACEOF cat >>$CONFIG_STATUS <<\_ACEOF for ac_file in : $CONFIG_FILES; do test "x$ac_file" = x: && continue # Support "outfile[:infile[:infile...]]", defaulting infile="outfile.in". case $ac_file in - | *:- | *:-:* ) # input from stdin cat >$tmp/stdin ac_file_in=`echo "$ac_file" | sed 's,[^:]*:,,'` ac_file=`echo "$ac_file" | sed 's,:.*,,'` ;; *:* ) ac_file_in=`echo "$ac_file" | sed 's,[^:]*:,,'` ac_file=`echo "$ac_file" | sed 's,:.*,,'` ;; * ) ac_file_in=$ac_file.in ;; esac # Compute @srcdir@, @top_srcdir@, and @INSTALL@ for subdirectories. 
ac_dir=`(dirname "$ac_file") 2>/dev/null || $as_expr X"$ac_file" : 'X\(.*[^/]\)//*[^/][^/]*/*$' \| \ X"$ac_file" : 'X\(//\)[^/]' \| \ X"$ac_file" : 'X\(//\)$' \| \ X"$ac_file" : 'X\(/\)' \| \ . : '\(.\)' 2>/dev/null || echo X"$ac_file" | sed '/^X\(.*[^/]\)\/\/*[^/][^/]*\/*$/{ s//\1/; q; } /^X\(\/\/\)[^/].*/{ s//\1/; q; } /^X\(\/\/\)$/{ s//\1/; q; } /^X\(\/\).*/{ s//\1/; q; } s/.*/./; q'` { if $as_mkdir_p; then mkdir -p "$ac_dir" else as_dir="$ac_dir" as_dirs= while test ! -d "$as_dir"; do as_dirs="$as_dir $as_dirs" as_dir=`(dirname "$as_dir") 2>/dev/null || $as_expr X"$as_dir" : 'X\(.*[^/]\)//*[^/][^/]*/*$' \| \ X"$as_dir" : 'X\(//\)[^/]' \| \ X"$as_dir" : 'X\(//\)$' \| \ X"$as_dir" : 'X\(/\)' \| \ . : '\(.\)' 2>/dev/null || echo X"$as_dir" | sed '/^X\(.*[^/]\)\/\/*[^/][^/]*\/*$/{ s//\1/; q; } /^X\(\/\/\)[^/].*/{ s//\1/; q; } /^X\(\/\/\)$/{ s//\1/; q; } /^X\(\/\).*/{ s//\1/; q; } s/.*/./; q'` done test ! -n "$as_dirs" || mkdir $as_dirs fi || { { echo "$as_me:$LINENO: error: cannot create directory \"$ac_dir\"" >&5 echo "$as_me: error: cannot create directory \"$ac_dir\"" >&2;} { (exit 1); exit 1; }; }; } ac_builddir=. if test "$ac_dir" != .; then ac_dir_suffix=/`echo "$ac_dir" | sed 's,^\.[\\/],,'` # A "../" for each directory in $ac_dir_suffix. ac_top_builddir=`echo "$ac_dir_suffix" | sed 's,/[^\\/]*,../,g'` else ac_dir_suffix= ac_top_builddir= fi case $srcdir in .) # No --srcdir option. We are building in place. ac_srcdir=. if test -z "$ac_top_builddir"; then ac_top_srcdir=. else ac_top_srcdir=`echo $ac_top_builddir | sed 's,/$,,'` fi ;; [\\/]* | ?:[\\/]* ) # Absolute path. ac_srcdir=$srcdir$ac_dir_suffix; ac_top_srcdir=$srcdir ;; *) # Relative path. ac_srcdir=$ac_top_builddir$srcdir$ac_dir_suffix ac_top_srcdir=$ac_top_builddir$srcdir ;; esac # Do not use `cd foo && pwd` to compute absolute paths, because # the directories may not exist. case `pwd` in .) ac_abs_builddir="$ac_dir";; *) case "$ac_dir" in .) 
ac_abs_builddir=`pwd`;; [\\/]* | ?:[\\/]* ) ac_abs_builddir="$ac_dir";; *) ac_abs_builddir=`pwd`/"$ac_dir";; esac;; esac case $ac_abs_builddir in .) ac_abs_top_builddir=${ac_top_builddir}.;; *) case ${ac_top_builddir}. in .) ac_abs_top_builddir=$ac_abs_builddir;; [\\/]* | ?:[\\/]* ) ac_abs_top_builddir=${ac_top_builddir}.;; *) ac_abs_top_builddir=$ac_abs_builddir/${ac_top_builddir}.;; esac;; esac case $ac_abs_builddir in .) ac_abs_srcdir=$ac_srcdir;; *) case $ac_srcdir in .) ac_abs_srcdir=$ac_abs_builddir;; [\\/]* | ?:[\\/]* ) ac_abs_srcdir=$ac_srcdir;; *) ac_abs_srcdir=$ac_abs_builddir/$ac_srcdir;; esac;; esac case $ac_abs_builddir in .) ac_abs_top_srcdir=$ac_top_srcdir;; *) case $ac_top_srcdir in .) ac_abs_top_srcdir=$ac_abs_builddir;; [\\/]* | ?:[\\/]* ) ac_abs_top_srcdir=$ac_top_srcdir;; *) ac_abs_top_srcdir=$ac_abs_builddir/$ac_top_srcdir;; esac;; esac case $INSTALL in [\\/$]* | ?:[\\/]* ) ac_INSTALL=$INSTALL ;; *) ac_INSTALL=$ac_top_builddir$INSTALL ;; esac if test x"$ac_file" != x-; then { echo "$as_me:$LINENO: creating $ac_file" >&5 echo "$as_me: creating $ac_file" >&6;} rm -f "$ac_file" fi # Let's still pretend it is `configure' which instantiates (i.e., don't # use $as_me), people would be surprised to read: # /* config.h. Generated by config.status. */ if test x"$ac_file" = x-; then configure_input= else configure_input="$ac_file. " fi configure_input=$configure_input"Generated from `echo $ac_file_in | sed 's,.*/,,'` by configure." # First look for the input files in the build tree, otherwise in the # src tree. 
ac_file_inputs=`IFS=: for f in $ac_file_in; do case $f in -) echo $tmp/stdin ;; [\\/$]*) # Absolute (can't be DOS-style, as IFS=:) test -f "$f" || { { echo "$as_me:$LINENO: error: cannot find input file: $f" >&5 echo "$as_me: error: cannot find input file: $f" >&2;} { (exit 1); exit 1; }; } echo "$f";; *) # Relative if test -f "$f"; then # Build tree echo "$f" elif test -f "$srcdir/$f"; then # Source tree echo "$srcdir/$f" else # /dev/null tree { { echo "$as_me:$LINENO: error: cannot find input file: $f" >&5 echo "$as_me: error: cannot find input file: $f" >&2;} { (exit 1); exit 1; }; } fi;; esac done` || { (exit 1); exit 1; } _ACEOF cat >>$CONFIG_STATUS <<_ACEOF sed "$ac_vpsub $extrasub _ACEOF cat >>$CONFIG_STATUS <<\_ACEOF :t /@[a-zA-Z_][a-zA-Z_0-9]*@/!b s,@configure_input@,$configure_input,;t t s,@srcdir@,$ac_srcdir,;t t s,@abs_srcdir@,$ac_abs_srcdir,;t t s,@top_srcdir@,$ac_top_srcdir,;t t s,@abs_top_srcdir@,$ac_abs_top_srcdir,;t t s,@builddir@,$ac_builddir,;t t s,@abs_builddir@,$ac_abs_builddir,;t t s,@top_builddir@,$ac_top_builddir,;t t s,@abs_top_builddir@,$ac_abs_top_builddir,;t t s,@INSTALL@,$ac_INSTALL,;t t " $ac_file_inputs | (eval "$ac_sed_cmds") >$tmp/out rm -f $tmp/stdin if test x"$ac_file" != x-; then mv $tmp/out $ac_file else cat $tmp/out rm -f $tmp/out fi done _ACEOF cat >>$CONFIG_STATUS <<\_ACEOF # # CONFIG_HEADER section. # # These sed commands are passed to sed as "A NAME B NAME C VALUE D", where # NAME is the cpp macro being defined and VALUE is the value it is being given. # # ac_d sets the value in "#define NAME VALUE" lines. ac_dA='s,^\([ ]*\)#\([ ]*define[ ][ ]*\)' ac_dB='[ ].*$,\1#\2' ac_dC=' ' ac_dD=',;t' # ac_u turns "#undef NAME" without trailing blanks into "#define NAME VALUE". ac_uA='s,^\([ ]*\)#\([ ]*\)undef\([ ][ ]*\)' ac_uB='$,\1#\2define\3' ac_uC=' ' ac_uD=',;t' for ac_file in : $CONFIG_HEADERS; do test "x$ac_file" = x: && continue # Support "outfile[:infile[:infile...]]", defaulting infile="outfile.in". 
case $ac_file in - | *:- | *:-:* ) # input from stdin cat >$tmp/stdin ac_file_in=`echo "$ac_file" | sed 's,[^:]*:,,'` ac_file=`echo "$ac_file" | sed 's,:.*,,'` ;; *:* ) ac_file_in=`echo "$ac_file" | sed 's,[^:]*:,,'` ac_file=`echo "$ac_file" | sed 's,:.*,,'` ;; * ) ac_file_in=$ac_file.in ;; esac test x"$ac_file" != x- && { echo "$as_me:$LINENO: creating $ac_file" >&5 echo "$as_me: creating $ac_file" >&6;} # First look for the input files in the build tree, otherwise in the # src tree. ac_file_inputs=`IFS=: for f in $ac_file_in; do case $f in -) echo $tmp/stdin ;; [\\/$]*) # Absolute (can't be DOS-style, as IFS=:) test -f "$f" || { { echo "$as_me:$LINENO: error: cannot find input file: $f" >&5 echo "$as_me: error: cannot find input file: $f" >&2;} { (exit 1); exit 1; }; } # Do quote $f, to prevent DOS paths from being IFS'd. echo "$f";; *) # Relative if test -f "$f"; then # Build tree echo "$f" elif test -f "$srcdir/$f"; then # Source tree echo "$srcdir/$f" else # /dev/null tree { { echo "$as_me:$LINENO: error: cannot find input file: $f" >&5 echo "$as_me: error: cannot find input file: $f" >&2;} { (exit 1); exit 1; }; } fi;; esac done` || { (exit 1); exit 1; } # Remove the trailing spaces. sed 's/[ ]*$//' $ac_file_inputs >$tmp/in _ACEOF # Transform confdefs.h into two sed scripts, `conftest.defines' and # `conftest.undefs', that substitutes the proper values into # config.h.in to produce config.h. The first handles `#define' # templates, and the second `#undef' templates. # And first: Protect against being on the right side of a sed subst in # config.status. Protect against being in an unquoted here document # in config.status. rm -f conftest.defines conftest.undefs # Using a here document instead of a string reduces the quoting nightmare. # Putting comments in sed scripts is not portable. # # `end' is used to avoid that the second main sed command (meant for # 0-ary CPP macros) applies to n-ary macro definitions. # See the Autoconf documentation for `clear'. 
cat >confdef2sed.sed <<\_ACEOF s/[\\&,]/\\&/g s,[\\$`],\\&,g t clear : clear s,^[ ]*#[ ]*define[ ][ ]*\([^ (][^ (]*\)\(([^)]*)\)[ ]*\(.*\)$,${ac_dA}\1${ac_dB}\1\2${ac_dC}\3${ac_dD},gp t end s,^[ ]*#[ ]*define[ ][ ]*\([^ ][^ ]*\)[ ]*\(.*\)$,${ac_dA}\1${ac_dB}\1${ac_dC}\2${ac_dD},gp : end _ACEOF # If some macros were called several times there might be several times # the same #defines, which is useless. Nevertheless, we may not want to # sort them, since we want the *last* AC-DEFINE to be honored. uniq confdefs.h | sed -n -f confdef2sed.sed >conftest.defines sed 's/ac_d/ac_u/g' conftest.defines >conftest.undefs rm -f confdef2sed.sed # This sed command replaces #undef with comments. This is necessary, for # example, in the case of _POSIX_SOURCE, which is predefined and required # on some systems where configure will not decide to define it. cat >>conftest.undefs <<\_ACEOF s,^[ ]*#[ ]*undef[ ][ ]*[a-zA-Z_][a-zA-Z_0-9]*,/* & */, _ACEOF # Break up conftest.defines because some shells have a limit on the size # of here documents, and old seds have small limits too (100 cmds). echo ' # Handle all the #define templates only if necessary.' >>$CONFIG_STATUS echo ' if grep "^[ ]*#[ ]*define" $tmp/in >/dev/null; then' >>$CONFIG_STATUS echo ' # If there are no defines, we may have an empty if/fi' >>$CONFIG_STATUS echo ' :' >>$CONFIG_STATUS rm -f conftest.tail while grep . conftest.defines >/dev/null do # Write a limited-size here document to $tmp/defines.sed. echo ' cat >$tmp/defines.sed <<CEOF' >>$CONFIG_STATUS # Speed up: don't consider the non `#define' lines. echo '/^[ ]*#[ ]*define/!b' >>$CONFIG_STATUS # Work around the forget-to-reset-the-flag bug.
echo 't clr' >>$CONFIG_STATUS echo ': clr' >>$CONFIG_STATUS sed ${ac_max_here_lines}q conftest.defines >>$CONFIG_STATUS echo 'CEOF sed -f $tmp/defines.sed $tmp/in >$tmp/out rm -f $tmp/in mv $tmp/out $tmp/in ' >>$CONFIG_STATUS sed 1,${ac_max_here_lines}d conftest.defines >conftest.tail rm -f conftest.defines mv conftest.tail conftest.defines done rm -f conftest.defines echo ' fi # grep' >>$CONFIG_STATUS echo >>$CONFIG_STATUS # Break up conftest.undefs because some shells have a limit on the size # of here documents, and old seds have small limits too (100 cmds). echo ' # Handle all the #undef templates' >>$CONFIG_STATUS rm -f conftest.tail while grep . conftest.undefs >/dev/null do # Write a limited-size here document to $tmp/undefs.sed. echo ' cat >$tmp/undefs.sed <<CEOF' >>$CONFIG_STATUS # Speed up: don't consider the non `#undef' echo '/^[ ]*#[ ]*undef/!b' >>$CONFIG_STATUS # Work around the forget-to-reset-the-flag bug. echo 't clr' >>$CONFIG_STATUS echo ': clr' >>$CONFIG_STATUS sed ${ac_max_here_lines}q conftest.undefs >>$CONFIG_STATUS echo 'CEOF sed -f $tmp/undefs.sed $tmp/in >$tmp/out rm -f $tmp/in mv $tmp/out $tmp/in ' >>$CONFIG_STATUS sed 1,${ac_max_here_lines}d conftest.undefs >conftest.tail rm -f conftest.undefs mv conftest.tail conftest.undefs done rm -f conftest.undefs cat >>$CONFIG_STATUS <<\_ACEOF # Let's still pretend it is `configure' which instantiates (i.e., don't # use $as_me), people would be surprised to read: # /* config.h. Generated by config.status. */ if test x"$ac_file" = x-; then echo "/* Generated by configure. */" >$tmp/config.h else echo "/* $ac_file. Generated by configure.
*/" >$tmp/config.h fi cat $tmp/in >>$tmp/config.h rm -f $tmp/in if test x"$ac_file" != x-; then if diff $ac_file $tmp/config.h >/dev/null 2>&1; then { echo "$as_me:$LINENO: $ac_file is unchanged" >&5 echo "$as_me: $ac_file is unchanged" >&6;} else ac_dir=`(dirname "$ac_file") 2>/dev/null || $as_expr X"$ac_file" : 'X\(.*[^/]\)//*[^/][^/]*/*$' \| \ X"$ac_file" : 'X\(//\)[^/]' \| \ X"$ac_file" : 'X\(//\)$' \| \ X"$ac_file" : 'X\(/\)' \| \ . : '\(.\)' 2>/dev/null || echo X"$ac_file" | sed '/^X\(.*[^/]\)\/\/*[^/][^/]*\/*$/{ s//\1/; q; } /^X\(\/\/\)[^/].*/{ s//\1/; q; } /^X\(\/\/\)$/{ s//\1/; q; } /^X\(\/\).*/{ s//\1/; q; } s/.*/./; q'` { if $as_mkdir_p; then mkdir -p "$ac_dir" else as_dir="$ac_dir" as_dirs= while test ! -d "$as_dir"; do as_dirs="$as_dir $as_dirs" as_dir=`(dirname "$as_dir") 2>/dev/null || $as_expr X"$as_dir" : 'X\(.*[^/]\)//*[^/][^/]*/*$' \| \ X"$as_dir" : 'X\(//\)[^/]' \| \ X"$as_dir" : 'X\(//\)$' \| \ X"$as_dir" : 'X\(/\)' \| \ . : '\(.\)' 2>/dev/null || echo X"$as_dir" | sed '/^X\(.*[^/]\)\/\/*[^/][^/]*\/*$/{ s//\1/; q; } /^X\(\/\/\)[^/].*/{ s//\1/; q; } /^X\(\/\/\)$/{ s//\1/; q; } /^X\(\/\).*/{ s//\1/; q; } s/.*/./; q'` done test ! -n "$as_dirs" || mkdir $as_dirs fi || { { echo "$as_me:$LINENO: error: cannot create directory \"$ac_dir\"" >&5 echo "$as_me: error: cannot create directory \"$ac_dir\"" >&2;} { (exit 1); exit 1; }; }; } rm -f $ac_file mv $tmp/config.h $ac_file fi else cat $tmp/config.h rm -f $tmp/config.h fi # Compute $ac_file's index in $config_headers. _am_stamp_count=1 for _am_header in $config_headers :; do case $_am_header in $ac_file | $ac_file:* ) break ;; * ) _am_stamp_count=`expr $_am_stamp_count + 1` ;; esac done echo "timestamp for $ac_file" >`(dirname $ac_file) 2>/dev/null || $as_expr X$ac_file : 'X\(.*[^/]\)//*[^/][^/]*/*$' \| \ X$ac_file : 'X\(//\)[^/]' \| \ X$ac_file : 'X\(//\)$' \| \ X$ac_file : 'X\(/\)' \| \ . 
: '\(.\)' 2>/dev/null || echo X$ac_file | sed '/^X\(.*[^/]\)\/\/*[^/][^/]*\/*$/{ s//\1/; q; } /^X\(\/\/\)[^/].*/{ s//\1/; q; } /^X\(\/\/\)$/{ s//\1/; q; } /^X\(\/\).*/{ s//\1/; q; } s/.*/./; q'`/stamp-h$_am_stamp_count done _ACEOF cat >>$CONFIG_STATUS <<\_ACEOF # # CONFIG_COMMANDS section. # for ac_file in : $CONFIG_COMMANDS; do test "x$ac_file" = x: && continue ac_dest=`echo "$ac_file" | sed 's,:.*,,'` ac_source=`echo "$ac_file" | sed 's,[^:]*:,,'` ac_dir=`(dirname "$ac_dest") 2>/dev/null || $as_expr X"$ac_dest" : 'X\(.*[^/]\)//*[^/][^/]*/*$' \| \ X"$ac_dest" : 'X\(//\)[^/]' \| \ X"$ac_dest" : 'X\(//\)$' \| \ X"$ac_dest" : 'X\(/\)' \| \ . : '\(.\)' 2>/dev/null || echo X"$ac_dest" | sed '/^X\(.*[^/]\)\/\/*[^/][^/]*\/*$/{ s//\1/; q; } /^X\(\/\/\)[^/].*/{ s//\1/; q; } /^X\(\/\/\)$/{ s//\1/; q; } /^X\(\/\).*/{ s//\1/; q; } s/.*/./; q'` { if $as_mkdir_p; then mkdir -p "$ac_dir" else as_dir="$ac_dir" as_dirs= while test ! -d "$as_dir"; do as_dirs="$as_dir $as_dirs" as_dir=`(dirname "$as_dir") 2>/dev/null || $as_expr X"$as_dir" : 'X\(.*[^/]\)//*[^/][^/]*/*$' \| \ X"$as_dir" : 'X\(//\)[^/]' \| \ X"$as_dir" : 'X\(//\)$' \| \ X"$as_dir" : 'X\(/\)' \| \ . : '\(.\)' 2>/dev/null || echo X"$as_dir" | sed '/^X\(.*[^/]\)\/\/*[^/][^/]*\/*$/{ s//\1/; q; } /^X\(\/\/\)[^/].*/{ s//\1/; q; } /^X\(\/\/\)$/{ s//\1/; q; } /^X\(\/\).*/{ s//\1/; q; } s/.*/./; q'` done test ! -n "$as_dirs" || mkdir $as_dirs fi || { { echo "$as_me:$LINENO: error: cannot create directory \"$ac_dir\"" >&5 echo "$as_me: error: cannot create directory \"$ac_dir\"" >&2;} { (exit 1); exit 1; }; }; } ac_builddir=. if test "$ac_dir" != .; then ac_dir_suffix=/`echo "$ac_dir" | sed 's,^\.[\\/],,'` # A "../" for each directory in $ac_dir_suffix. ac_top_builddir=`echo "$ac_dir_suffix" | sed 's,/[^\\/]*,../,g'` else ac_dir_suffix= ac_top_builddir= fi case $srcdir in .) # No --srcdir option. We are building in place. ac_srcdir=. if test -z "$ac_top_builddir"; then ac_top_srcdir=. 
else ac_top_srcdir=`echo $ac_top_builddir | sed 's,/$,,'` fi ;; [\\/]* | ?:[\\/]* ) # Absolute path. ac_srcdir=$srcdir$ac_dir_suffix; ac_top_srcdir=$srcdir ;; *) # Relative path. ac_srcdir=$ac_top_builddir$srcdir$ac_dir_suffix ac_top_srcdir=$ac_top_builddir$srcdir ;; esac # Do not use `cd foo && pwd` to compute absolute paths, because # the directories may not exist. case `pwd` in .) ac_abs_builddir="$ac_dir";; *) case "$ac_dir" in .) ac_abs_builddir=`pwd`;; [\\/]* | ?:[\\/]* ) ac_abs_builddir="$ac_dir";; *) ac_abs_builddir=`pwd`/"$ac_dir";; esac;; esac case $ac_abs_builddir in .) ac_abs_top_builddir=${ac_top_builddir}.;; *) case ${ac_top_builddir}. in .) ac_abs_top_builddir=$ac_abs_builddir;; [\\/]* | ?:[\\/]* ) ac_abs_top_builddir=${ac_top_builddir}.;; *) ac_abs_top_builddir=$ac_abs_builddir/${ac_top_builddir}.;; esac;; esac case $ac_abs_builddir in .) ac_abs_srcdir=$ac_srcdir;; *) case $ac_srcdir in .) ac_abs_srcdir=$ac_abs_builddir;; [\\/]* | ?:[\\/]* ) ac_abs_srcdir=$ac_srcdir;; *) ac_abs_srcdir=$ac_abs_builddir/$ac_srcdir;; esac;; esac case $ac_abs_builddir in .) ac_abs_top_srcdir=$ac_top_srcdir;; *) case $ac_top_srcdir in .) ac_abs_top_srcdir=$ac_abs_builddir;; [\\/]* | ?:[\\/]* ) ac_abs_top_srcdir=$ac_top_srcdir;; *) ac_abs_top_srcdir=$ac_abs_builddir/$ac_top_srcdir;; esac;; esac { echo "$as_me:$LINENO: executing $ac_dest commands" >&5 echo "$as_me: executing $ac_dest commands" >&6;} case $ac_dest in depfiles ) test x"$AMDEP_TRUE" != x"" || for mf in $CONFIG_FILES; do # Strip MF so we end up with the name of the file. mf=`echo "$mf" | sed -e 's/:.*$//'` # Check whether this is an Automake generated Makefile or not. # We used to match only the files named `Makefile.in', but # some people rename them; so instead we look at the file content. # Grep'ing the first line is not enough: some people post-process # each Makefile.in and add a new line on top of each file to say so. # So let's grep whole file. 
if grep '^#.*generated by automake' $mf > /dev/null 2>&1; then dirpart=`(dirname "$mf") 2>/dev/null || $as_expr X"$mf" : 'X\(.*[^/]\)//*[^/][^/]*/*$' \| \ X"$mf" : 'X\(//\)[^/]' \| \ X"$mf" : 'X\(//\)$' \| \ X"$mf" : 'X\(/\)' \| \ . : '\(.\)' 2>/dev/null || echo X"$mf" | sed '/^X\(.*[^/]\)\/\/*[^/][^/]*\/*$/{ s//\1/; q; } /^X\(\/\/\)[^/].*/{ s//\1/; q; } /^X\(\/\/\)$/{ s//\1/; q; } /^X\(\/\).*/{ s//\1/; q; } s/.*/./; q'` else continue fi # Extract the definition of DEPDIR, am__include, and am__quote # from the Makefile without running `make'. DEPDIR=`sed -n 's/^DEPDIR = //p' < "$mf"` test -z "$DEPDIR" && continue am__include=`sed -n 's/^am__include = //p' < "$mf"` test -z "$am__include" && continue am__quote=`sed -n 's/^am__quote = //p' < "$mf"` # When using ansi2knr, U may be empty or an underscore; expand it U=`sed -n 's/^U = //p' < "$mf"` # Find all dependency output files, they are included files with # $(DEPDIR) in their names. We invoke sed twice because it is the # simplest approach to changing $(DEPDIR) to its actual value in the # expansion. for file in `sed -n " s/^$am__include $am__quote\(.*(DEPDIR).*\)$am__quote"'$/\1/p' <"$mf" | \ sed -e 's/\$(DEPDIR)/'"$DEPDIR"'/g' -e 's/\$U/'"$U"'/g'`; do # Make sure the directory exists. test -f "$dirpart/$file" && continue fdir=`(dirname "$file") 2>/dev/null || $as_expr X"$file" : 'X\(.*[^/]\)//*[^/][^/]*/*$' \| \ X"$file" : 'X\(//\)[^/]' \| \ X"$file" : 'X\(//\)$' \| \ X"$file" : 'X\(/\)' \| \ . : '\(.\)' 2>/dev/null || echo X"$file" | sed '/^X\(.*[^/]\)\/\/*[^/][^/]*\/*$/{ s//\1/; q; } /^X\(\/\/\)[^/].*/{ s//\1/; q; } /^X\(\/\/\)$/{ s//\1/; q; } /^X\(\/\).*/{ s//\1/; q; } s/.*/./; q'` { if $as_mkdir_p; then mkdir -p $dirpart/$fdir else as_dir=$dirpart/$fdir as_dirs= while test ! -d "$as_dir"; do as_dirs="$as_dir $as_dirs" as_dir=`(dirname "$as_dir") 2>/dev/null || $as_expr X"$as_dir" : 'X\(.*[^/]\)//*[^/][^/]*/*$' \| \ X"$as_dir" : 'X\(//\)[^/]' \| \ X"$as_dir" : 'X\(//\)$' \| \ X"$as_dir" : 'X\(/\)' \| \ .
: '\(.\)' 2>/dev/null || echo X"$as_dir" | sed '/^X\(.*[^/]\)\/\/*[^/][^/]*\/*$/{ s//\1/; q; } /^X\(\/\/\)[^/].*/{ s//\1/; q; } /^X\(\/\/\)$/{ s//\1/; q; } /^X\(\/\).*/{ s//\1/; q; } s/.*/./; q'` done test ! -n "$as_dirs" || mkdir $as_dirs fi || { { echo "$as_me:$LINENO: error: cannot create directory $dirpart/$fdir" >&5 echo "$as_me: error: cannot create directory $dirpart/$fdir" >&2;} { (exit 1); exit 1; }; }; } # echo "creating $dirpart/$file" echo '# dummy' > "$dirpart/$file" done done ;; esac done _ACEOF cat >>$CONFIG_STATUS <<\_ACEOF { (exit 0); exit 0; } _ACEOF chmod +x $CONFIG_STATUS ac_clean_files=$ac_clean_files_save # configure is writing to config.log, and then calls config.status. # config.status does its own redirection, appending to config.log. # Unfortunately, on DOS this fails, as config.log is still kept open # by configure, so config.status won't be able to write to it; its # output is simply discarded. So we exec the FD to /dev/null, # effectively closing config.log, so it can be properly (re)opened and # appended to by config.status. When coming back to configure, we # need to make the FD available again. if test "$no_create" != yes; then ac_cs_success=: ac_config_status_args= test "$silent" = yes && ac_config_status_args="$ac_config_status_args --quiet" exec 5>/dev/null $SHELL $CONFIG_STATUS $ac_config_status_args || ac_cs_success=false exec 5>>config.log # Use ||, not &&, to avoid exiting from the if with $? = 1, which # would make configure fail if this is the last instruction. $ac_cs_success || { (exit 1); exit 1; } fi swish-e-2.4.7/swish-config.in0000664000077100017500000000277411166010113012760 00000000000000#! 
/bin/sh prefix=@prefix@ exec_prefix=@exec_prefix@ includedir=@includedir@ libdir=@libdir@ usage() { cat <, Thu, 15 May 2003 12:06:51 -0700 swish-e-2.4.7/debian/swish-e.substvars0000664000077100017500000000011511166010102014570 00000000000000shlibs:Depends=libc6 (>= 2.3.2-1), libxml2 (>= 2.5.7-1), zlib1g (>= 1:1.1.4) swish-e-2.4.7/debian/rules0000775000077100017500000000444011166010102012320 00000000000000#!/usr/bin/make -f # Sample debian/rules that uses debhelper. # GNU copyright 1997 to 1999 by Joey Hess. # Uncomment this to turn on verbose mode. #export DH_VERBOSE=1 # These are used for cross-compiling and for saving the configure script # from having to guess our platform (since we know it already) DEB_HOST_GNU_TYPE ?= $(shell dpkg-architecture -qDEB_HOST_GNU_TYPE) DEB_BUILD_GNU_TYPE ?= $(shell dpkg-architecture -qDEB_BUILD_GNU_TYPE) CFLAGS = -Wall -g ifneq (,$(findstring noopt,$(DEB_BUILD_OPTIONS))) CFLAGS += -O0 else CFLAGS += -O2 endif ifeq (,$(findstring nostrip,$(DEB_BUILD_OPTIONS))) INSTALL_PROGRAM += -s endif config.status: configure dh_testdir # Add here commands to configure the package. ./configure --host=$(DEB_HOST_GNU_TYPE) --build=$(DEB_BUILD_GNU_TYPE) --prefix=/usr --mandir=\$${prefix}/share/man --infodir=\$${prefix}/share/info build: build-stamp build-stamp: config.status dh_testdir # Add here commands to compile the package. $(MAKE) #/usr/bin/docbook-to-man debian/swish-e.sgml > swish-e.1 touch build-stamp clean: dh_testdir dh_testroot rm -f build-stamp # Add here commands to clean up after the build process. -$(MAKE) distclean ifneq "$(wildcard /usr/share/misc/config.sub)" "" cp -f /usr/share/misc/config.sub config.sub endif ifneq "$(wildcard /usr/share/misc/config.guess)" "" cp -f /usr/share/misc/config.guess config.guess endif dh_clean install: build dh_testdir dh_testroot dh_clean -k dh_installdirs # Add here commands to install the package into debian/swish-e. 
$(MAKE) install DESTDIR=$(CURDIR)/debian/swish-e # Build architecture-independent files here. binary-indep: build install # We have nothing to do by default. # Build architecture-dependent files here. binary-arch: build install dh_testdir dh_testroot dh_installchangelogs dh_installdocs dh_installexamples # dh_install # dh_installmenu # dh_installdebconf # dh_installlogrotate # dh_installemacsen # dh_installpam # dh_installmime # dh_installinit # dh_installcron # dh_installinfo dh_installman dh_link dh_strip dh_compress dh_fixperms # dh_perl # dh_python dh_makeshlibs dh_installdeb dh_shlibdeps -ldebian/swish-e/usr/lib dh_gencontrol dh_md5sums dh_builddeb binary: binary-indep binary-arch .PHONY: build clean binary-indep binary-arch binary install swish-e-2.4.7/debian/changelog0000664000077100017500000000024111166010102013105 00000000000000swish-e (2.4.0-0) unstable; urgency=low * Addition of Debian build system into swish-e -- Bill Moseley Thu, 15 May 2003 12:06:51 -0700 swish-e-2.4.7/debian/copyright0000664000077100017500000000052511166010102013173 00000000000000This package was debianized by Bill Moseley on Thu, 15 May 2003 12:06:51 -0700. It was downloaded from http://swish-e.org/ Copyright: SWISH is Copyright (c) 2003, 2002 Free Software Foundation, Inc. SWISH-E is distributed with no warranty under the terms of the GNU Public License. See /usr/share/common-licenses/GPL. swish-e-2.4.7/debian/control0000664000077100017500000000125011166010102012637 00000000000000Source: swish-e Section: web Priority: optional Maintainer: Bill Moseley Standards-Version: 3.5.8 Build-Depends: debhelper (>= 4.0.1), zlib1g-dev, perl, libxml2-dev (>= 2.4.19) Package: swish-e Architecture: any Depends: ${shlibs:Depends} Recommends: libmime-types-perl, libhtml-parser-perl, libwww-perl Description: Simple Web Indexing System for Humans SWISH-Enhanced is a fast, powerful, flexible, and easy to use system for indexing collections of HTML Web pages, XML or other text files. 
Key features include the ability to limit searches to certain HTML tags (META, TITLE, comments, etc.). . See the Swish-e Website for details: http://swish-e.org swish-e-2.4.7/debian/swish-e.doc-base0000664000077100017500000000066111166010102014217 00000000000000Document: swish-e Title: Swish-e documentation Author: Abstract: SWISH-Enhanced is a fast, powerful, flexible, free, and easy to use system for indexing collections of Web pages, XML or other text files. Key features include the ability to limit searches to certain HTML tags (META, TITLE, comments, etc.). Section: Apps/Text Format: HTML Index: /usr/share/doc/swish-e/html/index.html Files: /usr/share/doc/swish-e/html/*.html swish-e-2.4.7/debian/files0000664000077100017500000000004611166010102012263 00000000000000swish-e_2.4.0-0_i386.deb web optional swish-e-2.4.7/debian/compat0000664000077100017500000000000211166010102012434 000000000000004 swish-e-2.4.7/Makefile.in0000664000077100017500000006151711166010113012075 00000000000000# Makefile.in generated by automake 1.9.6 from Makefile.am. # @configure_input@ # Copyright (C) 1994, 1995, 1996, 1997, 1998, 1999, 2000, 2001, 2002, # 2003, 2004, 2005 Free Software Foundation, Inc. # This Makefile.in is free software; the Free Software Foundation # gives unlimited permission to copy and/or distribute it, # with or without modifications, as long as this notice is preserved. # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY, to the extent permitted by law; without # even the implied warranty of MERCHANTABILITY or FITNESS FOR A # PARTICULAR PURPOSE. @SET_MAKE@ srcdir = @srcdir@ top_srcdir = @top_srcdir@ VPATH = @srcdir@ pkgdatadir = $(datadir)/@PACKAGE@ pkglibdir = $(libdir)/@PACKAGE@ pkgincludedir = $(includedir)/@PACKAGE@ top_builddir = . 
am__cd = CDPATH="$${ZSH_VERSION+.}$(PATH_SEPARATOR)" && cd INSTALL = @INSTALL@ install_sh_DATA = $(install_sh) -c -m 644 install_sh_PROGRAM = $(install_sh) -c install_sh_SCRIPT = $(install_sh) -c INSTALL_HEADER = $(INSTALL_DATA) transform = $(program_transform_name) NORMAL_INSTALL = : PRE_INSTALL = : POST_INSTALL = : NORMAL_UNINSTALL = : PRE_UNINSTALL = : POST_UNINSTALL = : build_triplet = @build@ host_triplet = @host@ DIST_COMMON = README $(am__configure_deps) $(srcdir)/Makefile.am \ $(srcdir)/Makefile.in $(srcdir)/swish-config.in \ $(srcdir)/swish-e.pc.in $(top_srcdir)/configure \ $(top_srcdir)/rpm/swish-e.spec.in COPYING INSTALL TODO \ config/compile config/config.guess config/config.sub \ config/depcomp config/install-sh config/ltmain.sh \ config/missing config/mkinstalldirs subdir = . ACLOCAL_M4 = $(top_srcdir)/aclocal.m4 am__aclocal_m4_deps = $(top_srcdir)/config/acinclude.m4 \ $(top_srcdir)/configure.in am__configure_deps = $(am__aclocal_m4_deps) $(CONFIGURE_DEPENDENCIES) \ $(ACLOCAL_M4) am__CONFIG_DISTCLEAN_FILES = config.status config.cache config.log \ configure.lineno configure.status.lineno mkinstalldirs = $(SHELL) $(top_srcdir)/config/mkinstalldirs CONFIG_HEADER = $(top_builddir)/src/acconfig.h CONFIG_CLEAN_FILES = rpm/swish-e.spec swish-e.pc swish-config am__installdirs = "$(DESTDIR)$(bindir)" "$(DESTDIR)$(docdir)" \ "$(DESTDIR)$(pkgconfigdir)" binSCRIPT_INSTALL = $(INSTALL_SCRIPT) SCRIPTS = $(bin_SCRIPTS) SOURCES = DIST_SOURCES = RECURSIVE_TARGETS = all-recursive check-recursive dvi-recursive \ html-recursive info-recursive install-data-recursive \ install-exec-recursive install-info-recursive \ install-recursive installcheck-recursive installdirs-recursive \ pdf-recursive ps-recursive uninstall-info-recursive \ uninstall-recursive am__vpath_adj_setup = srcdirstrip=`echo "$(srcdir)" | sed 's|.|.|g'`; am__vpath_adj = case $$p in \ $(srcdir)/*) f=`echo "$$p" | sed "s|^$$srcdirstrip/||"`;; \ *) f=$$p;; \ esac; am__strip_dir = `echo $$p | sed -e 
's|^.*/||'`; docDATA_INSTALL = $(INSTALL_DATA) pkgconfigDATA_INSTALL = $(INSTALL_DATA) DATA = $(doc_DATA) $(pkgconfig_DATA) ETAGS = etags CTAGS = ctags DIST_SUBDIRS = $(SUBDIRS) DISTFILES = $(DIST_COMMON) $(DIST_SOURCES) $(TEXINFOS) $(EXTRA_DIST) distdir = $(PACKAGE)-$(VERSION) top_distdir = $(distdir) am__remove_distdir = \ { test ! -d $(distdir) \ || { find $(distdir) -type d ! -perm -200 -exec chmod u+w {} ';' \ && rm -fr $(distdir); }; } DIST_ARCHIVES = $(distdir).tar.gz GZIP_ENV = --best distuninstallcheck_listfiles = find . -type f -print distcleancheck_listfiles = find . -type f -print ACLOCAL = @ACLOCAL@ ALLOCA = @ALLOCA@ AMDEP_FALSE = @AMDEP_FALSE@ AMDEP_TRUE = @AMDEP_TRUE@ AMTAR = @AMTAR@ AR = @AR@ AS = @AS@ AUTOCONF = @AUTOCONF@ AUTOHEADER = @AUTOHEADER@ AUTOMAKE = @AUTOMAKE@ AWK = @AWK@ BTREE_OBJS = @BTREE_OBJS@ BUILDDOCS_FALSE = @BUILDDOCS_FALSE@ BUILDDOCS_TRUE = @BUILDDOCS_TRUE@ CC = @CC@ CCDEPMODE = @CCDEPMODE@ CFLAGS = @CFLAGS@ CPP = @CPP@ CPPFLAGS = @CPPFLAGS@ CXX = @CXX@ CXXCPP = @CXXCPP@ CXXDEPMODE = @CXXDEPMODE@ CXXFLAGS = @CXXFLAGS@ CYGPATH_W = @CYGPATH_W@ DEFS = @DEFS@ DEPDIR = @DEPDIR@ DLLTOOL = @DLLTOOL@ ECHO = @ECHO@ ECHO_C = @ECHO_C@ ECHO_N = @ECHO_N@ ECHO_T = @ECHO_T@ EGREP = @EGREP@ EXEEXT = @EXEEXT@ F77 = @F77@ FFLAGS = @FFLAGS@ INSTALLDOCS_FALSE = @INSTALLDOCS_FALSE@ INSTALLDOCS_TRUE = @INSTALLDOCS_TRUE@ INSTALL_DATA = @INSTALL_DATA@ INSTALL_PROGRAM = @INSTALL_PROGRAM@ INSTALL_SCRIPT = @INSTALL_SCRIPT@ INSTALL_STRIP_PROGRAM = @INSTALL_STRIP_PROGRAM@ LARGEFILES_MACROS = @LARGEFILES_MACROS@ LDFLAGS = @LDFLAGS@ LIBOBJS = @LIBOBJS@ LIBS = @LIBS@ LIBTOOL = @LIBTOOL@ LIBXML2_CFLAGS = @LIBXML2_CFLAGS@ LIBXML2_LIB = @LIBXML2_LIB@ LIBXML2_OBJS = @LIBXML2_OBJS@ LIBXML_REQUIRED_VERSION = @LIBXML_REQUIRED_VERSION@ LN_S = @LN_S@ LTLIBOBJS = @LTLIBOBJS@ MAINT = @MAINT@ MAINTAINER_MODE_FALSE = @MAINTAINER_MODE_FALSE@ MAINTAINER_MODE_TRUE = @MAINTAINER_MODE_TRUE@ MAKEINFO = @MAKEINFO@ OBJDUMP = @OBJDUMP@ OBJEXT = @OBJEXT@ PACKAGE = @PACKAGE@ 
PACKAGE_BUGREPORT = @PACKAGE_BUGREPORT@ PACKAGE_NAME = @PACKAGE_NAME@ PACKAGE_STRING = @PACKAGE_STRING@ PACKAGE_TARNAME = @PACKAGE_TARNAME@ PACKAGE_VERSION = @PACKAGE_VERSION@ PATH_SEPARATOR = @PATH_SEPARATOR@ PCRE_CFLAGS = @PCRE_CFLAGS@ PCRE_CONFIG = @PCRE_CONFIG@ PCRE_LIBS = @PCRE_LIBS@ PCRE_REQUIRED_VERSION = @PCRE_REQUIRED_VERSION@ PERL = @PERL@ POD2MAN = @POD2MAN@ RANLIB = @RANLIB@ SET_MAKE = @SET_MAKE@ SHELL = @SHELL@ STRIP = @STRIP@ SWISH_WEB = @SWISH_WEB@ VERSION = @VERSION@ XML2_CONFIG = @XML2_CONFIG@ Z_CFLAGS = @Z_CFLAGS@ Z_LIBS = @Z_LIBS@ ac_ct_AR = @ac_ct_AR@ ac_ct_AS = @ac_ct_AS@ ac_ct_CC = @ac_ct_CC@ ac_ct_CXX = @ac_ct_CXX@ ac_ct_DLLTOOL = @ac_ct_DLLTOOL@ ac_ct_F77 = @ac_ct_F77@ ac_ct_OBJDUMP = @ac_ct_OBJDUMP@ ac_ct_RANLIB = @ac_ct_RANLIB@ ac_ct_STRIP = @ac_ct_STRIP@ am__fastdepCC_FALSE = @am__fastdepCC_FALSE@ am__fastdepCC_TRUE = @am__fastdepCC_TRUE@ am__fastdepCXX_FALSE = @am__fastdepCXX_FALSE@ am__fastdepCXX_TRUE = @am__fastdepCXX_TRUE@ am__include = @am__include@ am__leading_dot = @am__leading_dot@ am__quote = @am__quote@ am__tar = @am__tar@ am__untar = @am__untar@ bindir = @bindir@ build = @build@ build_alias = @build_alias@ build_cpu = @build_cpu@ build_os = @build_os@ build_vendor = @build_vendor@ datadir = @datadir@ exec_prefix = @exec_prefix@ host = @host@ host_alias = @host_alias@ host_cpu = @host_cpu@ host_os = @host_os@ host_vendor = @host_vendor@ includedir = @includedir@ infodir = @infodir@ install_sh = @install_sh@ libdir = @libdir@ libexecdir = @libexecdir@ localstatedir = @localstatedir@ mandir = @mandir@ mkdir_p = @mkdir_p@ oldincludedir = @oldincludedir@ prefix = @prefix@ program_transform_name = @program_transform_name@ sbindir = @sbindir@ sharedstatedir = @sharedstatedir@ sysconfdir = @sysconfdir@ target_alias = @target_alias@ AUTOMAKE_OPTIONS = foreign SUBDIRS = filters prog-bin conf filter-bin example html man src tests pod docdir = $(datadir)/doc/$(PACKAGE) # Install these three in the doc directory # INSTALL and README are 
built at make time from .pod source doc_DATA = \ $(srcdir)/INSTALL \ $(srcdir)/README \ README.cvs config_dir = \ config/config.guess \ config/config.sub \ config/install-sh \ config/ltmain.sh \ config/missing \ config/mkinstalldirs perl_dir = \ perl/Changes \ perl/MANIFEST \ perl/README \ perl/Makefile.PL \ perl/Makefile.mingw \ perl/API.pm \ perl/API.xs \ perl/typemap \ perl/t/test.t \ perl/t/dummy.t \ perl/t/test.conf \ perl/t/first.html \ perl/t/second.html \ perl/t/third.html vms_dir = \ src/vms/acconfig.h_vms \ src/vms/build_swish-e.com \ src/vms/config.h \ src/vms/descrip_axp.mms \ src/vms/descrip_libxml2.mms \ src/vms/descrip_vax.mms \ src/vms/libtest.opt \ src/vms/readme_vms.txt \ src/vms/regex.c \ src/vms/regex.h \ src/vms/regexpr.h \ src/vms/swish.opt win32_dir = \ src/win32/acconfig.h \ src/win32/dirent.c \ src/win32/dirent.h \ src/win32/libswishe.dsp \ src/win32/libswishindex.dsp \ src/win32/swishe.dsp \ src/win32/swishe.dsw \ src/win32/release.nsi \ src/win32/filebase.nsh \ src/win32/fixperl.pl \ src/win32/build-perl.bat \ src/win32/build.sh \ src/win32/dist.sh \ src/worddata.c \ src/worddata.h rpm_dir = \ rpm/swish-e.spec.in \ rpm/swish-e.xpm debian_dir = \ debian/README.Debian \ debian/changelog \ debian/compat \ debian/control \ debian/copyright \ debian/files \ debian/rules \ debian/swish-e.doc-base \ debian/swish-e.substvars bin_SCRIPTS = swish-config pkgconfigdir = $(libdir)/pkgconfig pkgconfig_DATA = swish-e.pc EXTRA_DIST = \ $(config_dir) $(perl_dir) \ $(vms_dir) $(win32_dir) $(rpm_dir) \ $(doc_DATA) swish-config.in swish-e.pc.in\ $(debian_dir) all: all-recursive .SUFFIXES: am--refresh: @: $(srcdir)/Makefile.in: @MAINTAINER_MODE_TRUE@ $(srcdir)/Makefile.am $(am__configure_deps) @for dep in $?; do \ case '$(am__configure_deps)' in \ *$$dep*) \ echo ' cd $(srcdir) && $(AUTOMAKE) --foreign '; \ cd $(srcdir) && $(AUTOMAKE) --foreign \ && exit 0; \ exit 1;; \ esac; \ done; \ echo ' cd $(top_srcdir) && $(AUTOMAKE) --foreign Makefile'; \ cd 
$(top_srcdir) && \ $(AUTOMAKE) --foreign Makefile .PRECIOUS: Makefile Makefile: $(srcdir)/Makefile.in $(top_builddir)/config.status @case '$?' in \ *config.status*) \ echo ' $(SHELL) ./config.status'; \ $(SHELL) ./config.status;; \ *) \ echo ' cd $(top_builddir) && $(SHELL) ./config.status $@ $(am__depfiles_maybe)'; \ cd $(top_builddir) && $(SHELL) ./config.status $@ $(am__depfiles_maybe);; \ esac; $(top_builddir)/config.status: $(top_srcdir)/configure $(CONFIG_STATUS_DEPENDENCIES) $(SHELL) ./config.status --recheck $(top_srcdir)/configure: @MAINTAINER_MODE_TRUE@ $(am__configure_deps) cd $(srcdir) && $(AUTOCONF) $(ACLOCAL_M4): @MAINTAINER_MODE_TRUE@ $(am__aclocal_m4_deps) cd $(srcdir) && $(ACLOCAL) $(ACLOCAL_AMFLAGS) rpm/swish-e.spec: $(top_builddir)/config.status $(top_srcdir)/rpm/swish-e.spec.in cd $(top_builddir) && $(SHELL) ./config.status $@ swish-e.pc: $(top_builddir)/config.status $(srcdir)/swish-e.pc.in cd $(top_builddir) && $(SHELL) ./config.status $@ swish-config: $(top_builddir)/config.status $(srcdir)/swish-config.in cd $(top_builddir) && $(SHELL) ./config.status $@ install-binSCRIPTS: $(bin_SCRIPTS) @$(NORMAL_INSTALL) test -z "$(bindir)" || $(mkdir_p) "$(DESTDIR)$(bindir)" @list='$(bin_SCRIPTS)'; for p in $$list; do \ if test -f "$$p"; then d=; else d="$(srcdir)/"; fi; \ if test -f $$d$$p; then \ f=`echo "$$p" | sed 's|^.*/||;$(transform)'`; \ echo " $(binSCRIPT_INSTALL) '$$d$$p' '$(DESTDIR)$(bindir)/$$f'"; \ $(binSCRIPT_INSTALL) "$$d$$p" "$(DESTDIR)$(bindir)/$$f"; \ else :; fi; \ done uninstall-binSCRIPTS: @$(NORMAL_UNINSTALL) @list='$(bin_SCRIPTS)'; for p in $$list; do \ f=`echo "$$p" | sed 's|^.*/||;$(transform)'`; \ echo " rm -f '$(DESTDIR)$(bindir)/$$f'"; \ rm -f "$(DESTDIR)$(bindir)/$$f"; \ done mostlyclean-libtool: -rm -f *.lo clean-libtool: -rm -rf .libs _libs distclean-libtool: -rm -f libtool uninstall-info-am: install-docDATA: $(doc_DATA) @$(NORMAL_INSTALL) test -z "$(docdir)" || $(mkdir_p) "$(DESTDIR)$(docdir)" @list='$(doc_DATA)'; for p in 
$$list; do \ if test -f "$$p"; then d=; else d="$(srcdir)/"; fi; \ f=$(am__strip_dir) \ echo " $(docDATA_INSTALL) '$$d$$p' '$(DESTDIR)$(docdir)/$$f'"; \ $(docDATA_INSTALL) "$$d$$p" "$(DESTDIR)$(docdir)/$$f"; \ done uninstall-docDATA: @$(NORMAL_UNINSTALL) @list='$(doc_DATA)'; for p in $$list; do \ f=$(am__strip_dir) \ echo " rm -f '$(DESTDIR)$(docdir)/$$f'"; \ rm -f "$(DESTDIR)$(docdir)/$$f"; \ done install-pkgconfigDATA: $(pkgconfig_DATA) @$(NORMAL_INSTALL) test -z "$(pkgconfigdir)" || $(mkdir_p) "$(DESTDIR)$(pkgconfigdir)" @list='$(pkgconfig_DATA)'; for p in $$list; do \ if test -f "$$p"; then d=; else d="$(srcdir)/"; fi; \ f=$(am__strip_dir) \ echo " $(pkgconfigDATA_INSTALL) '$$d$$p' '$(DESTDIR)$(pkgconfigdir)/$$f'"; \ $(pkgconfigDATA_INSTALL) "$$d$$p" "$(DESTDIR)$(pkgconfigdir)/$$f"; \ done uninstall-pkgconfigDATA: @$(NORMAL_UNINSTALL) @list='$(pkgconfig_DATA)'; for p in $$list; do \ f=$(am__strip_dir) \ echo " rm -f '$(DESTDIR)$(pkgconfigdir)/$$f'"; \ rm -f "$(DESTDIR)$(pkgconfigdir)/$$f"; \ done # This directory's subdirectories are mostly independent; you can cd # into them and run `make' without going through this Makefile. # To change the values of `make' variables: instead of editing Makefiles, # (1) if the variable is set in `config.status', edit `config.status' # (which will cause the Makefiles to be regenerated when you run `make'); # (2) otherwise, pass the desired values on the `make' command line. 
$(RECURSIVE_TARGETS): @failcom='exit 1'; \ for f in x $$MAKEFLAGS; do \ case $$f in \ *=* | --[!k]*);; \ *k*) failcom='fail=yes';; \ esac; \ done; \ dot_seen=no; \ target=`echo $@ | sed s/-recursive//`; \ list='$(SUBDIRS)'; for subdir in $$list; do \ echo "Making $$target in $$subdir"; \ if test "$$subdir" = "."; then \ dot_seen=yes; \ local_target="$$target-am"; \ else \ local_target="$$target"; \ fi; \ (cd $$subdir && $(MAKE) $(AM_MAKEFLAGS) $$local_target) \ || eval $$failcom; \ done; \ if test "$$dot_seen" = "no"; then \ $(MAKE) $(AM_MAKEFLAGS) "$$target-am" || exit 1; \ fi; test -z "$$fail" mostlyclean-recursive clean-recursive distclean-recursive \ maintainer-clean-recursive: @failcom='exit 1'; \ for f in x $$MAKEFLAGS; do \ case $$f in \ *=* | --[!k]*);; \ *k*) failcom='fail=yes';; \ esac; \ done; \ dot_seen=no; \ case "$@" in \ distclean-* | maintainer-clean-*) list='$(DIST_SUBDIRS)' ;; \ *) list='$(SUBDIRS)' ;; \ esac; \ rev=''; for subdir in $$list; do \ if test "$$subdir" = "."; then :; else \ rev="$$subdir $$rev"; \ fi; \ done; \ rev="$$rev ."; \ target=`echo $@ | sed s/-recursive//`; \ for subdir in $$rev; do \ echo "Making $$target in $$subdir"; \ if test "$$subdir" = "."; then \ local_target="$$target-am"; \ else \ local_target="$$target"; \ fi; \ (cd $$subdir && $(MAKE) $(AM_MAKEFLAGS) $$local_target) \ || eval $$failcom; \ done && test -z "$$fail" tags-recursive: list='$(SUBDIRS)'; for subdir in $$list; do \ test "$$subdir" = . || (cd $$subdir && $(MAKE) $(AM_MAKEFLAGS) tags); \ done ctags-recursive: list='$(SUBDIRS)'; for subdir in $$list; do \ test "$$subdir" = . 
|| (cd $$subdir && $(MAKE) $(AM_MAKEFLAGS) ctags); \ done ID: $(HEADERS) $(SOURCES) $(LISP) $(TAGS_FILES) list='$(SOURCES) $(HEADERS) $(LISP) $(TAGS_FILES)'; \ unique=`for i in $$list; do \ if test -f "$$i"; then echo $$i; else echo $(srcdir)/$$i; fi; \ done | \ $(AWK) ' { files[$$0] = 1; } \ END { for (i in files) print i; }'`; \ mkid -fID $$unique tags: TAGS TAGS: tags-recursive $(HEADERS) $(SOURCES) $(TAGS_DEPENDENCIES) \ $(TAGS_FILES) $(LISP) tags=; \ here=`pwd`; \ if ($(ETAGS) --etags-include --version) >/dev/null 2>&1; then \ include_option=--etags-include; \ empty_fix=.; \ else \ include_option=--include; \ empty_fix=; \ fi; \ list='$(SUBDIRS)'; for subdir in $$list; do \ if test "$$subdir" = .; then :; else \ test ! -f $$subdir/TAGS || \ tags="$$tags $$include_option=$$here/$$subdir/TAGS"; \ fi; \ done; \ list='$(SOURCES) $(HEADERS) $(LISP) $(TAGS_FILES)'; \ unique=`for i in $$list; do \ if test -f "$$i"; then echo $$i; else echo $(srcdir)/$$i; fi; \ done | \ $(AWK) ' { files[$$0] = 1; } \ END { for (i in files) print i; }'`; \ if test -z "$(ETAGS_ARGS)$$tags$$unique"; then :; else \ test -n "$$unique" || unique=$$empty_fix; \ $(ETAGS) $(ETAGSFLAGS) $(AM_ETAGSFLAGS) $(ETAGS_ARGS) \ $$tags $$unique; \ fi ctags: CTAGS CTAGS: ctags-recursive $(HEADERS) $(SOURCES) $(TAGS_DEPENDENCIES) \ $(TAGS_FILES) $(LISP) tags=; \ here=`pwd`; \ list='$(SOURCES) $(HEADERS) $(LISP) $(TAGS_FILES)'; \ unique=`for i in $$list; do \ if test -f "$$i"; then echo $$i; else echo $(srcdir)/$$i; fi; \ done | \ $(AWK) ' { files[$$0] = 1; } \ END { for (i in files) print i; }'`; \ test -z "$(CTAGS_ARGS)$$tags$$unique" \ || $(CTAGS) $(CTAGSFLAGS) $(AM_CTAGSFLAGS) $(CTAGS_ARGS) \ $$tags $$unique GTAGS: here=`$(am__cd) $(top_builddir) && pwd` \ && cd $(top_srcdir) \ && gtags -i $(GTAGS_ARGS) $$here distclean-tags: -rm -f TAGS ID GTAGS GRTAGS GSYMS GPATH tags distdir: $(DISTFILES) $(am__remove_distdir) mkdir $(distdir) $(mkdir_p) $(distdir)/$(srcdir) $(distdir)/. 
$(distdir)/config $(distdir)/debian $(distdir)/perl $(distdir)/perl/t $(distdir)/rpm $(distdir)/src $(distdir)/src/vms $(distdir)/src/win32 @srcdirstrip=`echo "$(srcdir)" | sed 's|.|.|g'`; \ topsrcdirstrip=`echo "$(top_srcdir)" | sed 's|.|.|g'`; \ list='$(DISTFILES)'; for file in $$list; do \ case $$file in \ $(srcdir)/*) file=`echo "$$file" | sed "s|^$$srcdirstrip/||"`;; \ $(top_srcdir)/*) file=`echo "$$file" | sed "s|^$$topsrcdirstrip/|$(top_builddir)/|"`;; \ esac; \ if test -f $$file || test -d $$file; then d=.; else d=$(srcdir); fi; \ dir=`echo "$$file" | sed -e 's,/[^/]*$$,,'`; \ if test "$$dir" != "$$file" && test "$$dir" != "."; then \ dir="/$$dir"; \ $(mkdir_p) "$(distdir)$$dir"; \ else \ dir=''; \ fi; \ if test -d $$d/$$file; then \ if test -d $(srcdir)/$$file && test $$d != $(srcdir); then \ cp -pR $(srcdir)/$$file $(distdir)$$dir || exit 1; \ fi; \ cp -pR $$d/$$file $(distdir)$$dir || exit 1; \ else \ test -f $(distdir)/$$file \ || cp -p $$d/$$file $(distdir)/$$file \ || exit 1; \ fi; \ done list='$(DIST_SUBDIRS)'; for subdir in $$list; do \ if test "$$subdir" = .; then :; else \ test -d "$(distdir)/$$subdir" \ || $(mkdir_p) "$(distdir)/$$subdir" \ || exit 1; \ distdir=`$(am__cd) $(distdir) && pwd`; \ top_distdir=`$(am__cd) $(top_distdir) && pwd`; \ (cd $$subdir && \ $(MAKE) $(AM_MAKEFLAGS) \ top_distdir="$$top_distdir" \ distdir="$$distdir/$$subdir" \ distdir) \ || exit 1; \ fi; \ done -find $(distdir) -type d ! -perm -777 -exec chmod a+rwx {} \; -o \ ! -type d ! -perm -444 -links 1 -exec chmod a+r {} \; -o \ ! -type d ! -perm -400 -exec chmod a+r {} \; -o \ ! -type d ! 
-perm -444 -exec $(SHELL) $(install_sh) -c -m a+r {} {} \; \ || chmod -R a+r $(distdir) dist-gzip: distdir tardir=$(distdir) && $(am__tar) | GZIP=$(GZIP_ENV) gzip -c >$(distdir).tar.gz $(am__remove_distdir) dist-bzip2: distdir tardir=$(distdir) && $(am__tar) | bzip2 -9 -c >$(distdir).tar.bz2 $(am__remove_distdir) dist-tarZ: distdir tardir=$(distdir) && $(am__tar) | compress -c >$(distdir).tar.Z $(am__remove_distdir) dist-shar: distdir shar $(distdir) | GZIP=$(GZIP_ENV) gzip -c >$(distdir).shar.gz $(am__remove_distdir) dist-zip: distdir -rm -f $(distdir).zip zip -rq $(distdir).zip $(distdir) $(am__remove_distdir) dist dist-all: distdir tardir=$(distdir) && $(am__tar) | GZIP=$(GZIP_ENV) gzip -c >$(distdir).tar.gz $(am__remove_distdir) # This target untars the dist file and tries a VPATH configuration. Then # it guarantees that the distribution is self-contained by making another # tarfile. distcheck: dist case '$(DIST_ARCHIVES)' in \ *.tar.gz*) \ GZIP=$(GZIP_ENV) gunzip -c $(distdir).tar.gz | $(am__untar) ;;\ *.tar.bz2*) \ bunzip2 -c $(distdir).tar.bz2 | $(am__untar) ;;\ *.tar.Z*) \ uncompress -c $(distdir).tar.Z | $(am__untar) ;;\ *.shar.gz*) \ GZIP=$(GZIP_ENV) gunzip -c $(distdir).shar.gz | unshar ;;\ *.zip*) \ unzip $(distdir).zip ;;\ esac chmod -R a-w $(distdir); chmod a+w $(distdir) mkdir $(distdir)/_build mkdir $(distdir)/_inst chmod a-w $(distdir) dc_install_base=`$(am__cd) $(distdir)/_inst && pwd | sed -e 's,^[^:\\/]:[\\/],/,'` \ && dc_destdir="$${TMPDIR-/tmp}/am-dc-$$$$/" \ && cd $(distdir)/_build \ && ../configure --srcdir=.. --prefix="$$dc_install_base" \ $(DISTCHECK_CONFIGURE_FLAGS) \ && $(MAKE) $(AM_MAKEFLAGS) \ && $(MAKE) $(AM_MAKEFLAGS) dvi \ && $(MAKE) $(AM_MAKEFLAGS) check \ && $(MAKE) $(AM_MAKEFLAGS) install \ && $(MAKE) $(AM_MAKEFLAGS) installcheck \ && $(MAKE) $(AM_MAKEFLAGS) uninstall \ && $(MAKE) $(AM_MAKEFLAGS) distuninstallcheck_dir="$$dc_install_base" \ distuninstallcheck \ && chmod -R a-w "$$dc_install_base" \ && ({ \ (cd ../.. 
&& umask 077 && mkdir "$$dc_destdir") \ && $(MAKE) $(AM_MAKEFLAGS) DESTDIR="$$dc_destdir" install \ && $(MAKE) $(AM_MAKEFLAGS) DESTDIR="$$dc_destdir" uninstall \ && $(MAKE) $(AM_MAKEFLAGS) DESTDIR="$$dc_destdir" \ distuninstallcheck_dir="$$dc_destdir" distuninstallcheck; \ } || { rm -rf "$$dc_destdir"; exit 1; }) \ && rm -rf "$$dc_destdir" \ && $(MAKE) $(AM_MAKEFLAGS) dist \ && rm -rf $(DIST_ARCHIVES) \ && $(MAKE) $(AM_MAKEFLAGS) distcleancheck $(am__remove_distdir) @(echo "$(distdir) archives ready for distribution: "; \ list='$(DIST_ARCHIVES)'; for i in $$list; do echo $$i; done) | \ sed -e '1{h;s/./=/g;p;x;}' -e '$${p;x;}' distuninstallcheck: @cd $(distuninstallcheck_dir) \ && test `$(distuninstallcheck_listfiles) | wc -l` -le 1 \ || { echo "ERROR: files left after uninstall:" ; \ if test -n "$(DESTDIR)"; then \ echo " (check DESTDIR support)"; \ fi ; \ $(distuninstallcheck_listfiles) ; \ exit 1; } >&2 distcleancheck: distclean @if test '$(srcdir)' = . ; then \ echo "ERROR: distcleancheck can only run from a VPATH build" ; \ exit 1 ; \ fi @test `$(distcleancheck_listfiles) | wc -l` -eq 0 \ || { echo "ERROR: files left in build directory after distclean:" ; \ $(distcleancheck_listfiles) ; \ exit 1; } >&2 check-am: all-am check: check-recursive all-am: Makefile $(SCRIPTS) $(DATA) installdirs: installdirs-recursive installdirs-am: for dir in "$(DESTDIR)$(bindir)" "$(DESTDIR)$(docdir)" "$(DESTDIR)$(pkgconfigdir)"; do \ test -z "$$dir" || $(mkdir_p) "$$dir"; \ done install: install-recursive install-exec: install-exec-recursive install-data: install-data-recursive uninstall: uninstall-recursive install-am: all-am @$(MAKE) $(AM_MAKEFLAGS) install-exec-am install-data-am installcheck: installcheck-recursive install-strip: $(MAKE) $(AM_MAKEFLAGS) INSTALL_PROGRAM="$(INSTALL_STRIP_PROGRAM)" \ install_sh_PROGRAM="$(INSTALL_STRIP_PROGRAM)" INSTALL_STRIP_FLAG=-s \ `test -z '$(STRIP)' || \ echo "INSTALL_PROGRAM_ENV=STRIPPROG='$(STRIP)'"` install mostlyclean-generic: 
clean-generic: distclean-generic: -test -z "$(CONFIG_CLEAN_FILES)" || rm -f $(CONFIG_CLEAN_FILES) maintainer-clean-generic: @echo "This command is intended for maintainers to use" @echo "it deletes files that may require special tools to rebuild." clean: clean-recursive clean-am: clean-generic clean-libtool mostlyclean-am distclean: distclean-recursive -rm -f $(am__CONFIG_DISTCLEAN_FILES) -rm -f Makefile distclean-am: clean-am distclean-generic distclean-libtool \ distclean-tags dvi: dvi-recursive dvi-am: html: html-recursive info: info-recursive info-am: install-data-am: install-docDATA install-pkgconfigDATA install-exec-am: install-binSCRIPTS install-info: install-info-recursive install-man: installcheck-am: maintainer-clean: maintainer-clean-recursive -rm -f $(am__CONFIG_DISTCLEAN_FILES) -rm -rf $(top_srcdir)/autom4te.cache -rm -f Makefile maintainer-clean-am: distclean-am maintainer-clean-generic mostlyclean: mostlyclean-recursive mostlyclean-am: mostlyclean-generic mostlyclean-libtool pdf: pdf-recursive pdf-am: ps: ps-recursive ps-am: uninstall-am: uninstall-binSCRIPTS uninstall-docDATA uninstall-info-am \ uninstall-pkgconfigDATA uninstall-info: uninstall-info-recursive .PHONY: $(RECURSIVE_TARGETS) CTAGS GTAGS all all-am am--refresh check \ check-am clean clean-generic clean-libtool clean-recursive \ ctags ctags-recursive dist dist-all dist-bzip2 dist-gzip \ dist-shar dist-tarZ dist-zip distcheck distclean \ distclean-generic distclean-libtool distclean-recursive \ distclean-tags distcleancheck distdir distuninstallcheck dvi \ dvi-am html html-am info info-am install install-am \ install-binSCRIPTS install-data install-data-am \ install-docDATA install-exec install-exec-am install-info \ install-info-am install-man install-pkgconfigDATA \ install-strip installcheck installcheck-am installdirs \ installdirs-am maintainer-clean maintainer-clean-generic \ maintainer-clean-recursive mostlyclean mostlyclean-generic \ mostlyclean-libtool mostlyclean-recursive pdf 
pdf-am ps ps-am \ tags tags-recursive uninstall uninstall-am \ uninstall-binSCRIPTS uninstall-docDATA uninstall-info-am \ uninstall-pkgconfigDATA # These create README and INSTALL in the top level *source* # directory for the distribution. Created at "make" time. $(srcdir)/INSTALL: $(top_srcdir)/pod/INSTALL.pod -rm -f $(top_srcdir)/INSTALL -pod2text $(top_srcdir)/pod/INSTALL.pod > $(top_srcdir)/INSTALL $(srcdir)/README: $(top_srcdir)/pod/README.pod -rm -f $(top_srcdir)/README -pod2text $(top_srcdir)/pod/README.pod > $(top_srcdir)/README .PHONY: test test: check # Tell versions [3.59,3.63) of GNU make to not export all variables. # Otherwise a system limit (for SysV at least) may be exceeded. .NOEXPORT: swish-e-2.4.7/INSTALL0000664000077100017500000014761211166010455011073 00000000000000NAME INSTALL - Swish-e Installation Instructions OVERVIEW This document describes how to download, build, and install Swish-e from source. Also found below is a basic overview of using Swish-e to index documents, with pointers to other, more advanced examples. This document also provides instructions on how to get help installing and using Swish-e (and the important information you should provide when asking for help). Please read these instructions before requesting help on the Swish-e discussion list. See "QUESTIONS AND TROUBLESHOOTING". Although building from source is recommended, some OS distributions (e.g., Debian) provide pre-compiled binaries. Check with your distribution for available packages. Build from source, if your distribution does not offer the current version of Swish-e. Also, please read the Swish-e FAQ (SWISH-FAQ), as it answers many frequently-asked questions. Swish-e knows how to index HTML, XML, and plain text documents. Helper applications and other tools are used to convert documents such as PDF or MS Word into a format that Swish-e can index. These additional applications and tools (listed below) must be installed separately.
The process of converting documents is called "filtering". NOTE: Swish-e version 2.4.0 installs a lot more files when running "make install". Be aware that the Swish-e documentation may thus include errors about where files are located. Please notify the Swish-e discussion list of any documentation errors. Upgrading from previous versions of Swish-e If you are upgrading from a previous version of Swish-e, read the CHANGES page first. The Swish-e index format may have changed and existing indexes may not work with the newer version of Swish-e. If you have existing indexes, you may need to re-index your data before running the "make install" step described below. Swish-e may be run from the build directory after compiling, but before installation. Windows Users A Windows binary version is available as a separate download from the Swish-e site (http://swish-e.org). Many of the installation instructions below will not apply to Windows users; the Windows version is pre-compiled and includes libxml2, zlib, xpdf, and catdoc. A number of Perl modules may also be needed. These can be installed with ActiveState's PPM utility. libwww-perl - the LWP modules (for spidering) HTML-Tagset - used by web spider HTML-Parser - used by web spider MIME-Types - used for filtering documents when not spidering HTML-Template - formatting output from swish.cgi (optional) HTML-FillInForm (if HTML-Template is used) Building from CVS Please refer to the README.cvs file found in the documentation directory $prefix/share/doc/swish-e. SYSTEM REQUIREMENTS Swish-e makes use of a number of libraries and tools that are not distributed with Swish-e. Some libraries need to be installed before building Swish-e from source; other tools can be installed at any time. See below for details. Software Requirements Swish-e is written in C. It has been tested on a number of platforms, including Sun/Solaris, Dec Alpha, BSD, Linux, Mac OS X, and Open VMS.
The GNU C compiler (gcc) and GNU make are strongly recommended. Repeat: you will find life easier if you use the GNU tools. Optional but Recommended Packages Most of the packages listed below are available as easily installable packages. Check with your operating system vendor or install them from source. Most are very common packages that may already be installed on your computer. As noted below, some packages need to be installed before building Swish-e from source, while others may be added after Swish-e is installed. * Libxml2 libxml2 is very strongly recommended. It is used for parsing both HTML and XML files. Swish-e can be built and installed without libxml2, but the HTML parser that is built into Swish-e is not as accurate as libxml2. http://xmlsoft.org/ libxml2 must be installed before Swish-e is built, or it will not be used. If libxml2 is installed in a non-standard location (e.g., libxml2 is built with "--prefix $HOME/local"), make sure that you add the "bin" directory to your $PATH before building Swish-e. Swish-e's configure script uses a program created by libxml2 ("xml2-config") to find the location of libxml2. Use "which xml2-config" to verify that the program can be found where expected. * Zlib Compression The Zlib compression library is commonly installed on most systems and is recommended for use with Swish-e. Zlib is used for compressing text stored in the Swish-e index. http://www.gzip.org/zlib/ Zlib must be installed before building Swish-e. * Perl Modules Although Swish-e is a compiled C program, many support features use Perl. For example, both the web spiders and modules to help with filtering documents are written in Perl. The following Perl modules may be required. Check your current Perl installation, as many may already be installed. LWP URI HTML::Parser HTML::Tagset MIME::Types (optional) Note that installing "Bundle::LWP" with the CPAN module perl -MCPAN -e 'install Bundle::LWP' will install many of the above modules. 
If you wish to use "HTML-Template" with swish.cgi to generate output, install: HTML::Template HTML::FillInForm If you wish to use "Template-Toolkit" with "swish.cgi" to generate output, install: Template Questions about installing these modules may be sent to the Swish-e discussion list. The "search.cgi" example script requires both "Template-Toolkit" and "HTML::FillInForm". * Indexing PDF Documents Indexing PDF files requires the "xpdf" package. This is a common package, available with most operating systems and often provided as an add-on package. http://www.foolabs.com/xpdf/ Xpdf may be added after Swish-e is installed. * Indexing MS Word Documents Indexing MS Word documents requires the Catdoc program. http://www.wagner.pp.ru/~vitus/software/catdoc/ Catdoc may be added after Swish-e is installed. * Indexing MP3 ID3 Tags Indexing MP3 ID3 Tags requires the "MP3::Tag" Perl module. See http://search.cpan.org. "MP3::Tag" may be installed after Swish-e is installed. * Indexing MS Excel Files Indexing MS Excel files is supported by the following Perl modules, also available at http://search.cpan.org. Spreadsheet::ParseExcel HTML::Entities These Perl modules may be installed after Swish-e is installed. INSTALLATION Here are brief installation instructions that should work in most cases. Following this section are more detailed instructions and examples. Building Swish-e Download Swish-e using your favorite web browser or a utility such as "wget", "lynx", or "lwp-download". Unpack and build the distribution, using the following steps: Note: "swish-e-2.4.0" is used as an example. Download the most current available version and adjust the commands below! Also, if you are running Debian, see the notes below on building a ".deb" package from the Swish-e source package. Pay careful attention to the "prompt" character used on the following command lines. A "$" prompt indicates steps run as an unprivileged user. A "#" indicates steps run as the superuser (root). 
$ wget http://swish-e.org/Download/swish-e-2.4.0.tar.gz $ gzip -dc swish-e-2.4.0.tar.gz | tar xof - $ cd swish-e-2.4.0 (this directory will depend on the version of Swish-e) $ ./configure $ make $ make check ... ================== All 3 tests passed ================== $ su root (or use sudo) (enter password) # make install # exit $ swish-e -V SWISH-E 2.4.0 IMPORTANT: Once Swish-e is installed, do not run it as the superuser (root) -- root is only required during the installation step, when installing into system directories. Please do not break this rule. NOTE: If you are upgrading from an older version of Swish-e, be sure to review the CHANGES file. Old index files may not be compatible with newer versions of Swish-e. After building Swish-e (but before running "make install"), Swish-e can be run from the build directory: $ src/swish-e -V To minimize downtime, create new index files before running "make install", by using Swish-e from the build directory. Then, copy the index files to the live location and run "make install": $ src/swish-e -c /path/to/config -f index.new Keep in mind that the location you index from may affect the paths stored in the index file. Installing without root access Here's another installation example. This might be used if you do not have root access or you wish to install Swish-e someplace other than "/usr/local". This example also shows building Swish-e in a "build" directory that is separate from where the source files are located. This is the recommended way to build Swish-e, but it requires GNU Make. Without GNU Make, you will likely need to build from within the source directory, as shown in the previous example. $ tar zxof swish-e-2.4.0.tar.gz (GNU tar with "z" option) $ mkdir build $ cd build Note that the current directory is not where Swish-e was unpacked. Swish-e uses a configure script. configure has many options, but it uses reasonable and standard defaults.
Running $ ../swish-e-2.4.0/configure --help will display the options. Two options are of common interest: "--prefix" sets the top-level installation directory; "--disable-shared" will link Swish-e statically, which may be needed on some platforms (Solaris 2.6, perhaps). Platforms may require varying link instructions when libraries are installed in non-standard locations. Swish-e uses the GNU autoconf tools for building the package. autoconf is good at building and testing, but still requires you to provide information appropriate for your platform. This may mean reading the manual page for your compiler and linker to see how to specify non-standard file locations. For most Unix-type platforms, you can use "LDFLAGS" and "CPPFLAGS" environment variables to specify paths to "include" (header) files and to libraries that are not in standard locations. In this example, we do not have root access. We have installed libxml2 and libz in "$HOME/local". Swish-e will also be installed in "$HOME/local" (by using the "--prefix" setting). In this case, you would need to add "$HOME/local/bin" to the start of your shell's $PATH setting. This is required because libxml2 installs a program that is used when running the configure script. Before running configure, type: $ which xml2-config It should list "$HOME/local/bin/xml2-config". Now run configure (remember, we are in a separate "build" directory): $ ../swish-e-2.4.0/configure \ --prefix=$HOME/local \ CPPFLAGS=-I$HOME/local/include \ LDFLAGS="-R$HOME/local/lib -L$HOME/local/lib" $ make >/dev/null (redirect output to only see warnings and errors) $ make check ... ================== All 3 tests passed ================== $ make install $ $HOME/local/bin/swish-e -V SWISH-E 2.4.0 Note the use of double quotes in the "LDFLAGS" line above. This allows $HOME to be expanded within the text string. Run-time paths The "-R" option says to add a specified path (or paths) to those that are used to find shared libraries at run time. 
These paths are stored in the Swish-e binary. When Swish-e is run, it will look in these directories for shared libraries. Some platforms may not support the "-R" option. In this event, set the "LD_RUN_PATH" environment variable before running make. Some systems, such as Redhat, do not look in "/usr/local/lib" for libraries. In these cases, you can either use "-R", as above, when building Swish-e or add "/usr/local/lib" to "/etc/ld.so.conf" and run ldconfig as root. If all else fails, you may need to actually read the man pages for your platform. Building a Debian Package The Swish-e distribution includes the files required to build a Debian package. $ tar zxof swish-e-2.4.0.tar.gz (GNU tar with "z" option) $ cd swish-e-2.4.0 $ fakeroot debian/rules binary [lots of output] dpkg-deb: building package `swish-e' in `../swish-e_2.4.0-0_i386.deb'. $ su # dpkg -i ../swish-e_2.4.0-0_i386.deb What's installed Swish installs a number of files. By default, all files are installed below "/usr/local", but this can be changed by setting "--prefix" when running configure (as shown above). Individual paths may also be set. Run "configure --help" for details. $prefix/bin/swish-e The Swish-e binary program $prefix/share/doc/swish-e/ Full documentation and examples $prefix/lib/libswish-e The Swish-e C library $prefix/include/swish-e.h The library header file $prefix/man/man1/ Documentation as manual pages $prefix/lib/swish-e/ Helper programs (spider.pl, swishspider, swish.cgi) $prefix/lib/swish-e/perl/ Perl helper modules Note that the Perl modules are *not* installed in the system Perl library. Swish-e and the Perl scripts that require the modules know where to find the modules, but the perldoc program (used for reading documentation) does not. This can be corrected by adding "$prefix/lib/swish-e" and "$prefix/lib/swish-e/perl" to the "PERL5LIB" environment variable. Documentation Documentation can be found in the "$prefix/share/doc/swish-e" directory. 
Documentation is in html format at "$prefix/share/doc/swish-e/html" and can also be read on-line at the Swish-e web site: http://swish-e.org/ The Swish-e documentation as man(1) pages Running "make install" installs some of the Swish-e documentation as man pages. The following man pages are installed: SWISH-FAQ(1) SWISH-CONFIG(1) SWISH-RUN(1) SWISH-LIBRARY(1) The man pages are installed, by default, in the system man directory. This directory is determined when configure is run; it can be set by passing a directory name to configure. For example, ./configure --mandir=/usr/local/doc/man The man directory is specified relative to the "--prefix" setting. If you use "--prefix", you do not normally need to also specify "--mandir". Information on running configure can be found by typing: ./configure --help Join the Swish-e discussion list The final step, when installing Swish-e, is to join the Swish-e discussion list. The Swish-e discussion list is the place to ask questions about installing and using Swish-e, see or post bug fixes or security announcements, and offer help to others. Please do not contact the developers directly. The list is typically *very low traffic*, so it won't overload your inbox. Please take the time to subscribe. See http://Swish-e.org. If you are using Swish-e on a public site, please let the list know, so that your URL can be added to the list of sites that use Swish-e! Please review the next section before posting questions to the Swish-e list. QUESTIONS AND TROUBLESHOOTING Support for installation, configuration, and usage is available via the Swish-e discussion list. Visit http://swish-e.org for information. Do not contact developers directly for help -- always post your question to the list. It's very important to provide the right information when asking for help. Please search the Swish-e list archive before posting a question. Also, check the SWISH-FAQ to see if your question has already been asked and answered. 
Before posting, use the available tools to narrow down the problem. Swish-e has several switches (e.g., "-T", "-v", and "-k") that may help you resolve issues. These switches are described on the SWISH-RUN page. For example, if you cannot find a document by a keyword that you believe should be indexed, try indexing just that single file and use the "-T INDEXED_WORDS" option to see if the word is actually being indexed. First, try it without any changes to default settings: swish-e -i testdoc.html -T indexed_words | less If that works, add in your configuration file: swish-e -i testdoc.html -c swish.conf -T indexed_words | less If it still isn't working as you expect, try to reduce the test document to a very small example. This will be very helpful to your readers, when you are asking for help. Another useful trick is to use "-H9" when searching, to display full headers in search results. Look at the "Parsed Words" header to see what words Swish-e is searching for. When posting, please provide the following information: Use these guidelines when asking for help. The most important tip is to provide the least amount of information that can be used to reproduce your problem. Do not paraphrase output -- copy-and-paste -- but trim text that is not necessary. * The exact version of Swish-e that you are using. Running Swish-e with the "-V" switch will print the version number. Also, supply the output from "uname -a" or similar command that identifies the operating system you are running on. If you are running an old version of swish, be prepared for a response of "upgrade" to your question. * A summary of the problem. This should include the commands issued (e.g. for indexing or searching) and their output, along with an explanation of why you don't think it's working correctly. Please copy-and-paste the exact commands and their output, instead of retyping, to avoid errors. * Include a copy of the configuration file you are using, if any.
Swish-e has reasonable defaults, so in many cases you can run it without using a configuration file. But, if you need to use a configuration file, reduce it down to the absolute minimum number of commands that is required to demonstrate your problem. Again, copy-and-paste. * A small copy of a source document that demonstrates the problem. If you are having problems spidering a web server, use lwp-download or wget to copy the file locally, then make sure you can index the document using the file system method. This will help you determine if the problem is with spidering or indexing. If you expect help with spidering, don't post fake URLs, as it makes it impossible to test. If you don't want to expose your web page to the people on the Swish-e list, find some other site to test spidering on. If that works, but you still cannot spider your own site, you may need to request help from others. If so, you must post your real URL or make a test document available via some other source. * If you are having trouble building Swish-e, please copy-and-paste the output from make (or from "./configure", if that's where the problem is). The key is to provide enough information so that others may reproduce the problem. ADDITIONAL INSTALLATION OPTIONS These steps are not required for normal use of Swish-e. The SWISH::API Perl Module The Swish-e distribution includes a module that provides a Perl interface to the Swish-e C library. This module provides a way to search a Swish-e index without running the Swish-e program. Searching an index will be many times faster when running under a persistent environment such as Apache/mod_perl with the "SWISH::API" module. See the perl/README file for information on installing and using the "SWISH::API" Perl module. GENERAL CONFIGURATION AND USAGE This section should give you a basic overview of indexing and searching with Swish-e. 
Other examples can be found in the "conf" directory; these will step you through a number of different configurations. Also, please review the SWISH-FAQ. Swish-e is a command-line program. The program is controlled by passing switches on the command line. A configuration file may be used, but often is not required. Swish-e does not include a graphical user interface. Example CGI scripts are provided in the distribution, but they require additional setup to use. Introduction to Indexing and Searching Swish-e can index files that are located on the local file system. For example, running: swish-e -i /var/www/htdocs will index *all* files in the "/var/www/htdocs" directory. You may specify one or more files or directories with the "-i" option. By default, this will create an index called "index.swish-e" in the current directory. To search the resulting index for a given word, try: swish-e -w apache This will find the word "apache" in the body or title of the indexed documents. As mentioned above, Swish-e will index all files in a directory, unless instructed otherwise. So, if "/var/www/htdocs" contains non-HTML files, you will need a configuration file to limit the files that Swish-e indexes. Create a file called "swish.conf": # Example configuration file # Tell Swish-e what to index (same as -i switch above) IndexDir /var/www/htdocs # Only index HTML and text files IndexOnly .htm .html .txt # Tell Swish-e that .txt files are to use the text parser. IndexContents TXT* .txt # Otherwise, use the HTML parser DefaultContents HTML* # Ask libxml2 to report any parsing errors and warnings or # any UTF-8 to 8859-1 conversion errors ParserWarnLevel 9 After saving the configuration file, reindex: swish-e -c swish.conf The Swish-e configuration settings are described in the SWISH-CONFIG manual page. The order of statements in the configuration file is typically not important, although some statements depend on previously set statements. There are many possible settings. 
Good advice is to use as few settings as possible when first starting out with Swish-e. The runtime options (switches) are described in the SWISH-RUN manual page. You may also see a summary of options by running: swish-e -h Swish-e has two other methods for reading input files. One method uses a Perl helper script and the LWP Perl library to spider remote web sites: swish-e -S http -i http://localhost/index.html -v2 This will spider the web server running on the local host. The "-S" option defines the input source method to be "http", "-i" specifies the URL to spider, and "-v" sets the verbose level to two. There are a number of configuration options that are specific to the "-S" http input source. See SWISH-CONFIG. Note that only files of "Content-Type text/*" will be indexed. The "-S http" method is deprecated, however, in favor of a variation on the following input method. There is a general-purpose input method wherein Swish-e reads input from a program that produces documents in a special format. The program might read and format data stored in a database, or parse and format messages in a mailing list archive, or run a program that spiders web sites (like the previous method). The Swish-e distribution includes a spider program that uses this method of input. This spider program is much more configurable and feature-rich than the previous ("-S http") method. To duplicate the previous example, create a configuration file called "swish2.conf": # Example for spidering # Use the "spider.pl" program included with Swish-e IndexDir spider.pl # Define what site to index SwishProgParameters default http://localhost/index.html Then, create the index using the command: swish-e -S prog -c swish2.conf This says to use the "-S prog" input source method. Note that, in this case, the "IndexDir" setting does not specify a file or directory to index, but a program name to be run. 
This program, "spider.pl", does the work of fetching the documents from the web server and passing them to Swish-e for indexing. The "SwishProgParameters" option is a special feature that allows passing command-line parameters to the program specified with "IndexDir". In this case, we are passing the word "default" (which tells "spider.pl" to use default settings) and the URL to spider. Running a script under Windows requires specifying the interpreter (e.g., "perl.exe") and then using "SwishProgParameters" to specify the script and the script's parameters. See *Notes when using "-S prog" on MS Windows* on the SWISH-RUN page. The advantage of the "-S prog" method of spidering (over the previous "-S http" method) is that the Perl code is only compiled once instead of once for every document fetched from the web server. In addition, it is a much more advanced spider with many, many features. Still, as used here, "spider.pl" will automatically index PDF or MS Word documents if (when) Xpdf and Catdoc are installed. A special form of the "-S prog" input source method is: ./myprog --option | swish-e -S prog -i stdin -c config This allows running Swish-e from a program (instead of running the external program from Swish-e). So, this also can be done as: ./myprog --option > outfile swish-e -S prog -i stdin -c config < outfile or ./myprog --option > outfile cat outfile | swish-e -S prog -i stdin -c config One final note about the "-S prog" input source method. The program specified with "-i" or "IndexDir" needs to be an absolute path. The exception is when the program is installed in the "libexecdir" directory. Then, a plain program name may be specified (as in the example showing "spider.pl", above). All three input source methods are described in more detail on the SWISH-RUN page. Metanames and Properties There are two key Swish-e concepts that you need to be familiar with: Metanames and Properties. * Metanames Swish-e creates a reverse (i.e., inverted) index.
Just like an index in a book, you look up a word and it lists the pages (or documents) where that word can be found. Swish-e can create multiple index tables within the same index file. For example, you might want to create an index that only contains words in HTML titles, so that searches can be limited to title text. Or, you might have descriptive words that you would like to search, stored in a meta tag called "keywords".

Some database systems might call these different "fields" or "columns", but Swish-e calls them *MetaNames* (as a result of its first indexing HTML "meta" tags).

To find documents containing "foo" in their titles, you might run:

    swish-e -w swishtitle=foo

or, a more advanced example (quoted here to protect the parentheses from a Unix shell):

    swish-e -w 'swishtitle=(foo or bar) or swishdefault=(baz)'

The Metaname "swishdefault" is the name that is used by Swish-e if no other name is specified. The following two searches are thus equivalent:

    swish-e -w foo
    swish-e -w swishdefault=foo

When indexing HTML documents, Swish-e indexes words in the body and title under the Metaname "swishdefault".

* Properties

Swish-e's search result is a list of files -- actually, Swish-e uses file numbers internally. Data can be associated with each file number when indexing. For example, by default Swish-e associates the file's name, title, last modified date, and size with the file number. These items can be printed in search results.

In Swish-e, this associated data is called a file's *Properties*. Properties can be any data you wish to associate with a document -- in fact, the entire text of the document can be stored in the index. What data is stored as a Property is controlled by the *PropertyNames* (and other) configuration directives. What properties are printed with search results depends on the "-x" or "-p" switches. By default, Swish-e returns the rank, path/URL, title, and file size in bytes for each result.
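The relationship between Metanames and Properties can be pictured with a small Python model. This is only an illustrative sketch and not how Swish-e stores its index: the class ToyIndex and its methods are hypothetical names invented here. Each Metaname gets its own word table that maps a word to file numbers, and each file number carries a set of Properties that can be returned with a hit.

```python
from collections import defaultdict


class ToyIndex:
    """Toy model of MetaNames and Properties (not Swish-e's real format)."""

    def __init__(self):
        # One inverted table per metaname: metaname -> word -> file numbers.
        self.tables = defaultdict(lambda: defaultdict(set))
        # Properties stored per file number.
        self.properties = {}

    def add(self, path, fields, title=""):
        """Index one document; 'fields' maps a metaname to its text."""
        filenum = len(self.properties) + 1
        self.properties[filenum] = {"swishdocpath": path, "swishtitle": title}
        for metaname, text in fields.items():
            for word in text.lower().split():
                self.tables[metaname][word].add(filenum)
        return filenum

    def search(self, word, metaname="swishdefault"):
        """Return the stored path property of every file containing 'word'."""
        filenums = self.tables[metaname].get(word.lower(), set())
        return [self.properties[n]["swishdocpath"] for n in sorted(filenums)]


index = ToyIndex()
index.add(
    "./install.html",
    {
        "swishdefault": "Compiling and Installing Apache step by step",
        "swishtitle": "Compiling and Installing Apache",
    },
    title="Compiling and Installing Apache",
)
index.add(
    "./windows.html",
    {
        "swishdefault": "Using Apache with Microsoft Windows installing notes",
        "swishtitle": "Using Apache with Microsoft Windows",
    },
    title="Using Apache with Microsoft Windows",
)

print(index.search("installing"))                         # default metaname
print(index.search("installing", metaname="swishtitle"))  # titles only
```

Searching the default table finds both toy documents, while limiting the search to the "swishtitle" table finds only the one whose title contains the word. That is the same effect as the "swishtitle=foo" query shown above.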
Getting Started With Swish-e

Swish-e reads a configuration file (see SWISH-CONFIG) for directives that control whether and how Swish-e indexes files. Swish-e is also controlled by command-line arguments (see SWISH-RUN). Many of the command-line arguments have equivalent configuration directives (e.g., "-i" and "IndexDir").

Swish-e does not require a configuration file, but most people change its default behavior by placing settings in a configuration file.

To try the examples below, go to the "tests" subdirectory of the distribution. The tests will use the "*.html" files in this directory when creating the test index. You may wish to review these "*.html" files to get an idea of the various native file formats that Swish-e supports.

You may also use your own test documents. It's recommended to use small test documents when first using Swish-e.

Step 1: Create a Configuration File

The configuration file controls what Swish-e indexes and how. It consists of directives, comments, and blank lines, and it can have any name you like.

This example will work with the documents in the tests directory. You may wish to review the tests/test.config configuration file used for the "make test" tests.

For example, a simple configuration file (swish-e.conf):

    # Example Swish-e Configuration file

    # Define *what* to index
    # IndexDir can point to directories and/or files
    # Here it's pointing to the current directory
    # Swish-e will also recurse into sub-directories.
    IndexDir .

    # But only index the .html files
    IndexOnly .html

    # Show basic info while indexing
    IndexReport 1

And that's a simple configuration file. It says to index all the ".html" files in the current directory and sub-directories, if any, and provide some basic output while indexing.

As mentioned above, the complete list of all configuration file directives is detailed in SWISH-CONFIG.
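To illustrate the "directives, comments, and blank lines" structure, here is a minimal Python sketch that reads such a file into a list of directives. It is not Swish-e's own parser: the function name parse_config is hypothetical, and splitting arguments on whitespace is a simplifying assumption (quoted arguments and other rules described in SWISH-CONFIG are not handled).

```python
def parse_config(text):
    """Return (directive, [arguments]) pairs in file order.

    Simplifying assumption: a directive is the first word on a line and
    its arguments are the remaining whitespace-separated words.
    """
    directives = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        name, *args = line.split()
        directives.append((name, args))
    return directives


EXAMPLE = """\
# Example Swish-e Configuration file

# Define *what* to index
IndexDir .

# But only index the .html files
IndexOnly .html

# Show basic info while indexing
IndexReport 1
"""

for name, args in parse_config(EXAMPLE):
    print(name, args)
```

A helper like this can be handy for checking which directives a configuration file actually sets before running a long indexing job.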
Step 2: Index your Files

Run Swish-e, using the "-c" switch to specify the name of the configuration file.

    swish-e -c swish-e.conf
    Indexing Data Source: "File-System"
    Indexing "."
    Removing very common words...
    no words removed.
    Writing main index...
    Sorting words ...
    Sorting 55 words alphabetically
    Writing header ...
    Writing index entries ...
      Writing word text: Complete
      Writing word hash: Complete
      Writing word data: Complete
    55 unique words indexed.
    4 properties sorted.
    5 files indexed. 1252 total bytes. 140 total words.
    Elapsed time: 00:00:00 CPU time: 00:00:00
    Indexing done!

This created the index file "index.swish-e". This is the default index file name, unless the IndexFile directive is specified in the configuration file:

    IndexFile ./website.index

You may use the "-f" switch to specify an index file at indexing time. The "-f" option overrides any "IndexFile" setting that may be in the configuration file.

Step 3: Search

You specify your search terms with the "-w" switch. For example, to find the files that contain the word "sample", you would issue the command:

    swish-e -w sample

This example assumes that you are in the "tests" directory. Swish-e returns the following, in response to this command:

    swish-e -w sample
    # SWISH format: 2.4.0
    # Search words: sample
    # Number of hits: 2
    # Search time: 0.000 seconds
    # Run time: 0.005 seconds
    1000 ./test_xml.html "If you are seeing this, the METATAG XML search was successful!" 159
    1000 ./test.html "If you are seeing this, the test was successful!" 437
    .

So, the word "sample" was found in two documents. The first number shown is the relevance (or rank) of the search term, followed by the file containing the search term, the title of the document, and finally, the length of the document (in bytes).

The period ("."), sitting alone at the end, marks the end of the search results.
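The result layout described above (header lines starting with "#", one line per hit with rank, path, quoted title and size, then a lone period) is simple enough to read from a script. The following Python sketch parses the sample output shown in this step. The function name parse_results is hypothetical, and the parser assumes the default output format only; output customized with "-x" or "-p" would need a different pattern.

```python
import re

# One hit per line: rank, path, "title", size in bytes (default format).
HIT_PATTERN = re.compile(r'^(\d+)\s+(.+?)\s+"(.*)"\s+(\d+)$')


def parse_results(output):
    """Split default Swish-e search output into header fields and hits."""
    header, hits = {}, []
    for raw_line in output.splitlines():
        line = raw_line.strip()
        if line == ".":
            break  # a lone period marks the end of the results
        if line.startswith("#"):
            key, _, value = line[1:].partition(":")
            header[key.strip()] = value.strip()
            continue
        match = HIT_PATTERN.match(line)
        if match:
            rank, path, title, size = match.groups()
            hits.append(
                {"rank": int(rank), "path": path, "title": title, "size": int(size)}
            )
    return header, hits


SAMPLE = """\
# SWISH format: 2.4.0
# Search words: sample
# Number of hits: 2
# Search time: 0.000 seconds
# Run time: 0.005 seconds
1000 ./test_xml.html "If you are seeing this, the METATAG XML search was successful!" 159
1000 ./test.html "If you are seeing this, the test was successful!" 437
.
"""

header, hits = parse_results(SAMPLE)
print(header["Number of hits"])
for hit in hits:
    print(hit["rank"], hit["path"], hit["size"])
```

Lines that match neither a header nor a hit are silently ignored in this sketch; a real script would probably want to report them.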
Much more information may be retrieved while searching, by using the "-x" and "-H" switches (see SWISH-RUN) and by using Document Properties (see SWISH-CONFIG).

Phrase Searching

To search for a phrase in a document, use double-quotes to delimit your search terms. (The default phrase delimiter is set in "src/swish.h".) You must protect the quotes from the shell.

For example, under Unix:

    swish-e -w '"this is a phrase" or (this and that)'
    swish-e -w 'meta1=("this is a phrase") or (this and that)'

Or under the Windows "command.com" shell:

    swish-e -w \"this is a phrase\" or (this and that)

The phrase delimiter can be set with the "-P" switch.

Boolean Searching

You can use the Boolean operators and, or, or not in searching. Without these Boolean operators, Swish-e will assume you're anding the words together. Here are some examples:

    swish-e -w 'apples oranges'
    swish-e -w 'apples and oranges'    ( Same thing )

    swish-e -w 'apples or oranges'

    swish-e -w 'apples or oranges not juice' -f myIndex

The last command first retrieves the files that contain either of the words "apples" or "oranges"; then, among those, it selects the ones that do not contain the word "juice".

A few other examples to ponder:

    swish-e -w 'apples and oranges or pears'
    swish-e -w '(apples and oranges) or pears'    ( Same thing )
    swish-e -w 'apples and (oranges or pears)'    ( Not the same thing )

Swish-e processes the query left to right.

See SWISH-SEARCH for more information.

Context Searching

The "-t" option in the search command line allows you to search for words that exist only in specific HTML tags. This option takes a string of characters as its argument. Each character represents a different tag in which the word is searched; that is, you can use any combination of the following characters:

    H    search in all <HEAD> tags
    B    search in the <BODY> tags
    t    search in <TITLE> tags
    h    is <H1> to <H6> (header) tags
    e    is emphasized tags (this may be <B>, <I>, <EM>, or <STRONG>)
    c    is HTML comment tags (<!-- ... -->)

For example:

    # Find only documents with the word "linux" in the <TITLE> tags.
    swish-e -w linux -t t

    # Find the word "apple" in titles or comments
    swish-e -w apple -t tc

META Tags

As mentioned above, Metanames are a way to define "fields" in your documents. You can use the Metanames in your queries to limit the search to just the words contained in that META name of your document. For example, you might have a META-tagged field called "subjects" in your documents. This would let you search your documents for the word "foo", but only return documents where "foo" is within the "subjects" META tag.

Document *Properties* are somewhat related: Properties allow the content of a META tag in a source document to be stored within the index, and that text to be returned along with search results.

META tags can have two formats in your documents. In HTML format:

    <META NAME="keyName" CONTENT="some Content">

And in XML format:

    <keyName>
        Some Content
    </keyName>

If using libxml, you can optionally use a non-HTML tag as a metaname:

    <html>
    <body>
    Hello swish users!
    <keyName>
    this is meta data
    </keyName>.
    </body>

This, of course, is invalid HTML.

To continue with our sample "swish-e.conf" file, add the following lines:

    # Define META tags
    MetaNames meta1 meta2 meta3

Reindex to include the changes:

    swish-e -c swish-e.conf

Now search, but this time limit your search to META tag "meta1":

    swish-e -w 'meta1=metatest1'

Again, please see SWISH-RUN and SWISH-CONFIG for complete documentation of the various indexing and searching options.

Spidering and Searching with a Web form.

This example demonstrates how to spider a web site and set up the included CGI script to provide a web-based search page. This example uses Perl programs that are included in the Swish-e distribution: spider.pl will be used for reading files from the web server; swish.cgi will provide the web search form and display results.
As an example, we will index the Apache Web Server documentation, installed on the local computer at http://localhost/apache_docs/index.html.

1 Make a Working Directory

Create a directory to store the Swish-e configuration and the Swish-e index.

    ~$ mkdir web_index
    ~$ cd web_index/
    ~/web_index$

2 Create a Swish-e Configuration file

    ~/web_index$ cat swish.conf
    # Swish-e config to index the Apache documentation
    #
    # Use spider.pl for indexing (location of spider.pl set at installation time)
    IndexDir spider.pl

    # Use spider.pl's default configuration and specify the URL to spider
    SwishProgParameters default http://localhost/apache_docs/index.html

    # Allow extra searching by title, path
    Metanames swishtitle swishdocpath

    # Set StoreDescription for each parser
    # to display context with search results
    StoreDescription TXT* 10000
    StoreDescription HTML* <body> 10000

3 Generate the Index

Now, run Swish-e to create the index:

    ~/web_index$ swish-e -S prog -c swish.conf
    Indexing Data Source: "External-Program"
    Indexing "spider.pl"
    /usr/local/lib/swish-e/spider.pl: Reading parameters from 'default'

    Summary for: http://localhost/apache_docs/index.html
    Duplicates: 4,188 (349.0/sec)
    Off-site links: 276 (23.0/sec)
    Skipped: 1 (0.1/sec)
    Total Bytes: 2,090,125 (174177.1/sec)
    Total Docs: 147 (12.2/sec)
    Unique URLs: 149 (12.4/sec)
    Removing very common words...
    no words removed.
    Writing main index...
    Sorting words ...
    Sorting 7736 words alphabetically
    Writing header ...
    Writing index entries ...
      Writing word text: Complete
      Writing word hash: Complete
      Writing word data: Complete
    7736 unique words indexed.
    5 properties sorted.
    147 files indexed. 2090125 total bytes. 200783 total words.
    Elapsed time: 00:00:13 CPU time: 00:00:02
    Indexing done!

The above output is actually a mix of output from both Swish-e and "spider.pl". "spider.pl" reports the "Summary for: http://localhost/apache_docs/index.html".

Also note that Swish-e knows to find "spider.pl" at "/usr/local/lib/swish-e/spider.pl".
The script installation directory (called "libexecdir") is set at configure time. You can see your setting by running "swish-e -h":

    ~/web_index$ swish-e -h | grep libexecdir
    Scripts and Modules at: (libexecdir) = /usr/local/lib/swish-e

This directory will be needed in the next step, when setting up the CGI script.

Finally, verify that the index can be searched from the command line:

    ~/web_index$ swish-e -w installing -m3
    # SWISH format: 2.4.0
    # Search words: installing
    # Removed stopwords:
    # Number of hits: 17
    # Search time: 0.018 seconds
    # Run time: 0.050 seconds
    1000 http://localhost/apache_docs/install.html "Compiling and Installing Apache" 17960
    718 http://localhost/apache_docs/install-tpf.html "Installing Apache on TPF" 25734
    680 http://localhost/apache_docs/windows.html "Using Apache with Microsoft Windows" 27165
    .

Now, try limiting the search to the title:

    ~/web_index$ swish-e -w swishtitle=installing -m3
    # SWISH format: 2.3.5
    # Search words: swishtitle=installing
    # Removed stopwords:
    # Number of hits: 2
    # Search time: 0.018 seconds
    # Run time: 0.048 seconds
    1000 http://localhost/apache_docs/install-tpf.html "Installing Apache on TPF" 25734
    1000 http://localhost/apache_docs/install.html "Compiling and Installing Apache" 17960
    .

Note that the above can also be done using the "-t" option:

    ~/web_index$ swish-e -w installing -m3 -tH

4 Set up the CGI script

Swish-e does not include a web server. So, you must use your locally installed web server. Apache is highly recommended, of course.

Locate your web server's CGI directory. This may be a "cgi-bin" directory in your home directory or a central "cgi-bin" directory set up by the web server administrator. Once this is located, copy the "swish.cgi" script into the "cgi-bin" directory.

Where CGI scripts can be located depends completely on the web server that is being used and how it has been configured. See your web server's documentation or your site's administrator for additional information.
This example will use a site "cgi-bin" directory, located at "/usr/lib/cgi-bin".

Copy the "swish.cgi" script into the "cgi-bin" directory. Again, we will need the location of the "libexecdir" directory:

    ~/web_index$ swish-e -h | grep libexecdir
    Scripts and Modules at: (libexecdir) = /usr/local/lib/swish-e

    ~/web_index$ cd /usr/lib/cgi-bin
    /usr/lib/cgi-bin$ su
    Password:
    /usr/lib/cgi-bin# cp /usr/local/lib/swish-e/swish.cgi .

If your operating system supports symbolic links and your web server allows programs to be symbolic links, then you may wish to create a link to the "swish.cgi" program, instead.

    /usr/lib/cgi-bin# ln -s /usr/local/lib/swish-e/swish.cgi

We need to tell the "swish.cgi" script where to look for the index created in the previous step. It's also recommended to enter the path to the swish-e binary. Otherwise, the "swish.cgi" script will look for the binary in the "PATH", and that may change when running under the CGI environment.

Here's the configuration file:

    /usr/lib/cgi-bin# cat .swishcgi.conf
    return {
        title        => 'Search Apache Documentation',
        swish_binary => '/usr/local/bin/swish-e',
        swish_index  => '/home/moseley/web_index/index.swish-e',
    }

Now, test the script from the command line (as a normal user!):

    /usr/lib/cgi-bin# exit
    exit
    /usr/lib/cgi-bin$ ./swish.cgi | head
    Content-Type: text/html; charset=ISO-8859-1

    <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
    <html>
    <head>
    <title>
    Search Apache Documentation

Notice that the CGI script returns the HTTP header (Content-Type) and the body of the web page, just as a well-behaved CGI script should.

Now, test using the web server (this step depends on the location of your "cgi-bin" directory). This example uses the "GET" command that is part of the LWP Perl library, but any web browser can run this test.

    /usr/lib/cgi-bin$ GET http://localhost/cgi-bin/swish.cgi | head
    Search Apache Documentation

The script reports errors to stderr, so consult the web server's error log if problems occur. The message "Service currently unavailable", reported by running "swish.cgi", typically indicates a configuration error; the exact problem will be listed in the web server's error log.

Detailed instructions on using the "swish.cgi" script and debugging tips can be found by running:

    $ perldoc swish.cgi

while in the "cgi-bin" directory where "swish.cgi" was copied.

The spider program "spider.pl" also has a large number of configuration options. Documentation is also available in the directory "$prefix/share/doc/swish-e" or at http://swish-e.org.

Note: Also check out the "search.cgi" script, found at the same location as the "swish.cgi" script. This is more of a skeleton script, for those that want to create a custom search script.

Now you are ready to search.

Indexing Other Types of Documents - Filtering

Swish-e can only index HTML, XML, and text documents. In order to index other documents, such as PDF or MS Word documents, you must use a utility to convert or "filter" those documents.

How documents are filtered with Swish-e has changed over time. This has resulted in a bit of confusion. It's also a somewhat complex process, as different programs need to communicate with each other. You may wish to read the Swish-e FAQ question on filtering ("How do I filter documents?") before continuing here.

Filtering Overview

There are two ways to filter documents with Swish-e. Both are described in the SWISH-CONFIG man page. One uses the "FileFilter" directive; the other uses the "SWISH::Filter" Perl module.

The "FileFilter" directive is a general-purpose method of filtering. It allows running of an external program for each document processed (based on file extension), and requires one or more external programs. These programs open an input file, convert as needed, and write their output to standard output.
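The contract just described (open an input file, convert it, write the result to standard output) can be sketched in a few lines of Python. Everything here is hypothetical: the script name "myfilter.py", the convert function, and the trivial whitespace-collapsing "conversion" are placeholders for whatever real conversion a filter would perform. The sketch also assumes the file to convert arrives as the first command-line argument, by analogy with the '%p' placeholder used in the pdftotext example elsewhere in this document; check SWISH-CONFIG for the arguments Swish-e actually passes.

```python
import sys


def convert(raw_bytes):
    """Placeholder conversion: decode the bytes and collapse whitespace.

    A real filter would extract text from PDF, MS Word, and so on.
    """
    text = raw_bytes.decode("latin-1")
    return " ".join(text.split()) + "\n"


def main(argv):
    """Read the file named in argv[1] and write converted text to stdout."""
    with open(argv[1], "rb") as handle:
        sys.stdout.write(convert(handle.read()))
    return 0


# Only act when a file argument was supplied (assumed calling convention).
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv))
```

Because such a program is started once per document, any start-up cost (here, launching the Python interpreter) is paid for every file that needs filtering.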
Previous versions of Swish-e (before 2.4.0) used a collection of filter programs for converting files such as PDF or MS Word documents. The external programs call other programs to do the work of filtering (e.g. pdftotext to extract the contents from PDF files). Although these filter programs are still included with the Swish-e distribution as examples, it is recommended to use the "SWISH::Filter" method, instead.

One disadvantage of using "FileFilter" is that the filter program is run once for every document that needs to be filtered. This can slow down the indexing process substantially.

The "SWISH::Filter" Perl module works very much like the old system and uses the same helper programs. Conveniently, however, it provides a single interface for filtering all types of documents. The primary advantage of "SWISH::Filter" is that it is built into the program used for spidering web sites (spider.pl), so all that's required is installing the filter programs that do the actual work of filtering (e.g. catdoc, xpdf). (The Windows binary includes some of the filter programs.)

But, Swish-e will not use "SWISH::Filter" by default when using the file system method of indexing. To use "SWISH::Filter" when indexing by file system method (-S fs), you can use a "FileFilter" directive with the "swish_filter.pl" filter (which is just a program that uses "SWISH::Filter") or use the "-S prog" method of indexing and use the "DirTree.pl" program for fetching documents. "DirTree.pl" is included with the Swish-e distribution and is designed to work with "SWISH::Filter". Using DirTree.pl will likely be a faster way to index, since the "SWISH::Filter" set of modules does not need to be compiled for every document that needs to be filtered.

See the contents of "swish_filter.pl" and "DirTree.pl" for specifics on their use.

Filtering Examples

The "FileFilter" directive can be used in your config file to convert documents, based on their extensions.
This is the old way of filtering, but provides an easy way to add filters to Swish-e.

For example:

    FileFilter .pdf pdftotext "'%p' -"
    IndexContents TXT* .pdf

will cause all ".pdf" files to be filtered through the pdftotext program (part of the Xpdf package) and to parse the resulting output (from pdftotext) with the text ("TXT") parser.

The other way to filter documents is to use a "-S prog" program and convert the documents before passing them on to Swish-e.

For example, "spider.pl" makes use of the "SWISH::Filter" Perl module, included with the Swish-e distribution. "SWISH::Filter" is passed a document and the document's content type; it looks for modules and utilities to convert the document into one of the types that Swish-e can index.

Swish-e comes ready to index PDF, MS Word, MP3 ID3 tags, and MS Excel file types. But these filters need extra modules or tools to do the actual conversion.

For example, the Swish-e distribution includes a module called "SWISH::Filter::Pdf2HTML" that uses the pdftotext and pdfinfo utilities provided by the Xpdf package.

This means that if you are using "spider.pl" to spider your web site and you wish to index PDF documents, all that is needed is to install the Xpdf package and Swish-e (with the help of spider.pl) will begin indexing your PDF files.

Ok, so what does all that mean? For a very simple site, you should be able to run this:

    $ /usr/local/lib/swish-e/spider.pl default http://localhost/ | swish-e -S prog -i stdin

which is running the spider with default spider settings, indexing the Web server on localhost, and piping its output into Swish-e (using the default indexing settings). Documents will be filtered automatically, if you have the required helper applications installed.

Most people will not want to just use the default settings (for one thing, the spider will take a while because its default is to delay a few seconds between every request).
So, read the documentation for "spider.pl", to learn how to use a spider config file. Also read SWISH-CONFIG to learn about what configuration options can be used with Swish-e. The "SWISH::Filter" documentation provides more details on filtering and hints for debugging problems when filtering. Document Info $Id: INSTALL.pod 1978 2007-12-08 01:59:17Z karpet $ . swish-e-2.4.7/tests/0000777000077100017500000000000011166013172011252 500000000000000swish-e-2.4.7/tests/test_phrase.html0000775000077100017500000000023711166010112014373 00000000000000 If you are seeing this, the PHRASE search was successful! Once upon time there was three little pigs and the wolf swish-e-2.4.7/tests/test_xml.html0000775000077100017500000000023711166010112013711 00000000000000 If you are seeing this, the METATAG XML search was successful! This is metatest3 Just a sample swish-e-2.4.7/tests/Makefile.am0000664000077100017500000000063411166010112013216 00000000000000TESTS = check_index check_search check_metasearch check_fuzzy TESTS_ENVIRONMENT = top_builddir=$(top_builddir) EXTRA_DIST = \ test.config \ test.fuzzy.config \ test.html \ test.txt \ test.xml \ test_meta.html \ test_meta2.html \ test_phrase.html \ test_xml.html \ $(TESTS) common.sh DISTCLEANFILES = \ index.swish-e.prop \ index.swish-e .PHONEY: test test: check swish-e-2.4.7/tests/test.fuzzy.config0000664000077100017500000000064311166010112014516 00000000000000# Config file for indexing the test files IndexOnly .html .txt .xml MetaNames meta1 meta2 meta3 PropertyNames meta1 meta2 meta3 IndexComments yes DefaultContents TXT* IndexContents XML* .xml IndexContents HTML* .htm .html StoreDescription TXT* 20 StoreDescription HTML* 50 StoreDescription XML* # to test the RankScheme IgnoreTotalWordCountWhenRanking 0 # Fuzzy feature FuzzyIndexingMode Stemming_en1 swish-e-2.4.7/tests/check_metasearch0000775000077100017500000000057711166010112014367 00000000000000#!/bin/sh ## -*- sh -*- ## incomplete.test -- Test incomplete command handling # 
Common definitions if test -z "$srcdir"; then srcdir=`echo "$0" | sed 's,[^/]*$,,'` test "$srcdir" = "$0" && srcdir=. test -z "$srcdir" && srcdir=. test "${VERBOSE+set}" != set && VERBOSE=1 fi . $srcdir/common.sh # this is the test script $SWISH -w meta1=metatest1 | egrep '^[0-9]' swish-e-2.4.7/tests/test.txt0000664000077100017500000000006711166010112012702 00000000000000This is just a text file Line two Line three Line four swish-e-2.4.7/tests/test.config0000664000077100017500000000056211166010112013330 00000000000000# Config file for indexing the test files IndexOnly .html .txt .xml MetaNames meta1 meta2 meta3 PropertyNames meta1 meta2 meta3 IndexComments yes DefaultContents TXT* IndexContents XML* .xml IndexContents HTML* .htm .html StoreDescription TXT* 20 StoreDescription HTML* 50 StoreDescription XML* # to test the RankScheme IgnoreTotalWordCountWhenRanking 0swish-e-2.4.7/tests/test_meta.html0000775000077100017500000000035111166010112014034 00000000000000 If you are seeing this, the METATAG search 1 was successful! Bla, Bla swish-e-2.4.7/tests/Makefile.in0000664000077100017500000002741311166010112013233 00000000000000# Makefile.in generated by automake 1.9.6 from Makefile.am. # @configure_input@ # Copyright (C) 1994, 1995, 1996, 1997, 1998, 1999, 2000, 2001, 2002, # 2003, 2004, 2005 Free Software Foundation, Inc. # This Makefile.in is free software; the Free Software Foundation # gives unlimited permission to copy and/or distribute it, # with or without modifications, as long as this notice is preserved. # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY, to the extent permitted by law; without # even the implied warranty of MERCHANTABILITY or FITNESS FOR A # PARTICULAR PURPOSE. @SET_MAKE@ srcdir = @srcdir@ top_srcdir = @top_srcdir@ VPATH = @srcdir@ pkgdatadir = $(datadir)/@PACKAGE@ pkglibdir = $(libdir)/@PACKAGE@ pkgincludedir = $(includedir)/@PACKAGE@ top_builddir = .. 
am__cd = CDPATH="$${ZSH_VERSION+.}$(PATH_SEPARATOR)" && cd INSTALL = @INSTALL@ install_sh_DATA = $(install_sh) -c -m 644 install_sh_PROGRAM = $(install_sh) -c install_sh_SCRIPT = $(install_sh) -c INSTALL_HEADER = $(INSTALL_DATA) transform = $(program_transform_name) NORMAL_INSTALL = : PRE_INSTALL = : POST_INSTALL = : NORMAL_UNINSTALL = : PRE_UNINSTALL = : POST_UNINSTALL = : build_triplet = @build@ host_triplet = @host@ subdir = tests DIST_COMMON = $(srcdir)/Makefile.am $(srcdir)/Makefile.in ACLOCAL_M4 = $(top_srcdir)/aclocal.m4 am__aclocal_m4_deps = $(top_srcdir)/config/acinclude.m4 \ $(top_srcdir)/configure.in am__configure_deps = $(am__aclocal_m4_deps) $(CONFIGURE_DEPENDENCIES) \ $(ACLOCAL_M4) mkinstalldirs = $(SHELL) $(top_srcdir)/config/mkinstalldirs CONFIG_HEADER = $(top_builddir)/src/acconfig.h CONFIG_CLEAN_FILES = SOURCES = DIST_SOURCES = DISTFILES = $(DIST_COMMON) $(DIST_SOURCES) $(TEXINFOS) $(EXTRA_DIST) ACLOCAL = @ACLOCAL@ ALLOCA = @ALLOCA@ AMDEP_FALSE = @AMDEP_FALSE@ AMDEP_TRUE = @AMDEP_TRUE@ AMTAR = @AMTAR@ AR = @AR@ AS = @AS@ AUTOCONF = @AUTOCONF@ AUTOHEADER = @AUTOHEADER@ AUTOMAKE = @AUTOMAKE@ AWK = @AWK@ BTREE_OBJS = @BTREE_OBJS@ BUILDDOCS_FALSE = @BUILDDOCS_FALSE@ BUILDDOCS_TRUE = @BUILDDOCS_TRUE@ CC = @CC@ CCDEPMODE = @CCDEPMODE@ CFLAGS = @CFLAGS@ CPP = @CPP@ CPPFLAGS = @CPPFLAGS@ CXX = @CXX@ CXXCPP = @CXXCPP@ CXXDEPMODE = @CXXDEPMODE@ CXXFLAGS = @CXXFLAGS@ CYGPATH_W = @CYGPATH_W@ DEFS = @DEFS@ DEPDIR = @DEPDIR@ DLLTOOL = @DLLTOOL@ ECHO = @ECHO@ ECHO_C = @ECHO_C@ ECHO_N = @ECHO_N@ ECHO_T = @ECHO_T@ EGREP = @EGREP@ EXEEXT = @EXEEXT@ F77 = @F77@ FFLAGS = @FFLAGS@ INSTALLDOCS_FALSE = @INSTALLDOCS_FALSE@ INSTALLDOCS_TRUE = @INSTALLDOCS_TRUE@ INSTALL_DATA = @INSTALL_DATA@ INSTALL_PROGRAM = @INSTALL_PROGRAM@ INSTALL_SCRIPT = @INSTALL_SCRIPT@ INSTALL_STRIP_PROGRAM = @INSTALL_STRIP_PROGRAM@ LARGEFILES_MACROS = @LARGEFILES_MACROS@ LDFLAGS = @LDFLAGS@ LIBOBJS = @LIBOBJS@ LIBS = @LIBS@ LIBTOOL = @LIBTOOL@ LIBXML2_CFLAGS = @LIBXML2_CFLAGS@ LIBXML2_LIB = 
@LIBXML2_LIB@ LIBXML2_OBJS = @LIBXML2_OBJS@ LIBXML_REQUIRED_VERSION = @LIBXML_REQUIRED_VERSION@ LN_S = @LN_S@ LTLIBOBJS = @LTLIBOBJS@ MAINT = @MAINT@ MAINTAINER_MODE_FALSE = @MAINTAINER_MODE_FALSE@ MAINTAINER_MODE_TRUE = @MAINTAINER_MODE_TRUE@ MAKEINFO = @MAKEINFO@ OBJDUMP = @OBJDUMP@ OBJEXT = @OBJEXT@ PACKAGE = @PACKAGE@ PACKAGE_BUGREPORT = @PACKAGE_BUGREPORT@ PACKAGE_NAME = @PACKAGE_NAME@ PACKAGE_STRING = @PACKAGE_STRING@ PACKAGE_TARNAME = @PACKAGE_TARNAME@ PACKAGE_VERSION = @PACKAGE_VERSION@ PATH_SEPARATOR = @PATH_SEPARATOR@ PCRE_CFLAGS = @PCRE_CFLAGS@ PCRE_CONFIG = @PCRE_CONFIG@ PCRE_LIBS = @PCRE_LIBS@ PCRE_REQUIRED_VERSION = @PCRE_REQUIRED_VERSION@ PERL = @PERL@ POD2MAN = @POD2MAN@ RANLIB = @RANLIB@ SET_MAKE = @SET_MAKE@ SHELL = @SHELL@ STRIP = @STRIP@ SWISH_WEB = @SWISH_WEB@ VERSION = @VERSION@ XML2_CONFIG = @XML2_CONFIG@ Z_CFLAGS = @Z_CFLAGS@ Z_LIBS = @Z_LIBS@ ac_ct_AR = @ac_ct_AR@ ac_ct_AS = @ac_ct_AS@ ac_ct_CC = @ac_ct_CC@ ac_ct_CXX = @ac_ct_CXX@ ac_ct_DLLTOOL = @ac_ct_DLLTOOL@ ac_ct_F77 = @ac_ct_F77@ ac_ct_OBJDUMP = @ac_ct_OBJDUMP@ ac_ct_RANLIB = @ac_ct_RANLIB@ ac_ct_STRIP = @ac_ct_STRIP@ am__fastdepCC_FALSE = @am__fastdepCC_FALSE@ am__fastdepCC_TRUE = @am__fastdepCC_TRUE@ am__fastdepCXX_FALSE = @am__fastdepCXX_FALSE@ am__fastdepCXX_TRUE = @am__fastdepCXX_TRUE@ am__include = @am__include@ am__leading_dot = @am__leading_dot@ am__quote = @am__quote@ am__tar = @am__tar@ am__untar = @am__untar@ bindir = @bindir@ build = @build@ build_alias = @build_alias@ build_cpu = @build_cpu@ build_os = @build_os@ build_vendor = @build_vendor@ datadir = @datadir@ exec_prefix = @exec_prefix@ host = @host@ host_alias = @host_alias@ host_cpu = @host_cpu@ host_os = @host_os@ host_vendor = @host_vendor@ includedir = @includedir@ infodir = @infodir@ install_sh = @install_sh@ libdir = @libdir@ libexecdir = @libexecdir@ localstatedir = @localstatedir@ mandir = @mandir@ mkdir_p = @mkdir_p@ oldincludedir = @oldincludedir@ prefix = @prefix@ program_transform_name = 
@program_transform_name@ sbindir = @sbindir@ sharedstatedir = @sharedstatedir@ sysconfdir = @sysconfdir@ target_alias = @target_alias@ TESTS = check_index check_search check_metasearch check_fuzzy TESTS_ENVIRONMENT = top_builddir=$(top_builddir) EXTRA_DIST = \ test.config \ test.fuzzy.config \ test.html \ test.txt \ test.xml \ test_meta.html \ test_meta2.html \ test_phrase.html \ test_xml.html \ $(TESTS) common.sh DISTCLEANFILES = \ index.swish-e.prop \ index.swish-e all: all-am .SUFFIXES: $(srcdir)/Makefile.in: @MAINTAINER_MODE_TRUE@ $(srcdir)/Makefile.am $(am__configure_deps) @for dep in $?; do \ case '$(am__configure_deps)' in \ *$$dep*) \ cd $(top_builddir) && $(MAKE) $(AM_MAKEFLAGS) am--refresh \ && exit 0; \ exit 1;; \ esac; \ done; \ echo ' cd $(top_srcdir) && $(AUTOMAKE) --foreign tests/Makefile'; \ cd $(top_srcdir) && \ $(AUTOMAKE) --foreign tests/Makefile .PRECIOUS: Makefile Makefile: $(srcdir)/Makefile.in $(top_builddir)/config.status @case '$?' in \ *config.status*) \ cd $(top_builddir) && $(MAKE) $(AM_MAKEFLAGS) am--refresh;; \ *) \ echo ' cd $(top_builddir) && $(SHELL) ./config.status $(subdir)/$@ $(am__depfiles_maybe)'; \ cd $(top_builddir) && $(SHELL) ./config.status $(subdir)/$@ $(am__depfiles_maybe);; \ esac; $(top_builddir)/config.status: $(top_srcdir)/configure $(CONFIG_STATUS_DEPENDENCIES) cd $(top_builddir) && $(MAKE) $(AM_MAKEFLAGS) am--refresh $(top_srcdir)/configure: @MAINTAINER_MODE_TRUE@ $(am__configure_deps) cd $(top_builddir) && $(MAKE) $(AM_MAKEFLAGS) am--refresh $(ACLOCAL_M4): @MAINTAINER_MODE_TRUE@ $(am__aclocal_m4_deps) cd $(top_builddir) && $(MAKE) $(AM_MAKEFLAGS) am--refresh mostlyclean-libtool: -rm -f *.lo clean-libtool: -rm -rf .libs _libs distclean-libtool: -rm -f libtool uninstall-info-am: tags: TAGS TAGS: ctags: CTAGS CTAGS: check-TESTS: $(TESTS) @failed=0; all=0; xfail=0; xpass=0; skip=0; \ srcdir=$(srcdir); export srcdir; \ list='$(TESTS)'; \ if test -n "$$list"; then \ for tst in $$list; do \ if test -f ./$$tst; then 
dir=./; \ elif test -f $$tst; then dir=; \ else dir="$(srcdir)/"; fi; \ if $(TESTS_ENVIRONMENT) $${dir}$$tst; then \ all=`expr $$all + 1`; \ case " $(XFAIL_TESTS) " in \ *" $$tst "*) \ xpass=`expr $$xpass + 1`; \ failed=`expr $$failed + 1`; \ echo "XPASS: $$tst"; \ ;; \ *) \ echo "PASS: $$tst"; \ ;; \ esac; \ elif test $$? -ne 77; then \ all=`expr $$all + 1`; \ case " $(XFAIL_TESTS) " in \ *" $$tst "*) \ xfail=`expr $$xfail + 1`; \ echo "XFAIL: $$tst"; \ ;; \ *) \ failed=`expr $$failed + 1`; \ echo "FAIL: $$tst"; \ ;; \ esac; \ else \ skip=`expr $$skip + 1`; \ echo "SKIP: $$tst"; \ fi; \ done; \ if test "$$failed" -eq 0; then \ if test "$$xfail" -eq 0; then \ banner="All $$all tests passed"; \ else \ banner="All $$all tests behaved as expected ($$xfail expected failures)"; \ fi; \ else \ if test "$$xpass" -eq 0; then \ banner="$$failed of $$all tests failed"; \ else \ banner="$$failed of $$all tests did not behave as expected ($$xpass unexpected passes)"; \ fi; \ fi; \ dashes="$$banner"; \ skipped=""; \ if test "$$skip" -ne 0; then \ skipped="($$skip tests were not run)"; \ test `echo "$$skipped" | wc -c` -le `echo "$$banner" | wc -c` || \ dashes="$$skipped"; \ fi; \ report=""; \ if test "$$failed" -ne 0 && test -n "$(PACKAGE_BUGREPORT)"; then \ report="Please report to $(PACKAGE_BUGREPORT)"; \ test `echo "$$report" | wc -c` -le `echo "$$banner" | wc -c` || \ dashes="$$report"; \ fi; \ dashes=`echo "$$dashes" | sed s/./=/g`; \ echo "$$dashes"; \ echo "$$banner"; \ test -z "$$skipped" || echo "$$skipped"; \ test -z "$$report" || echo "$$report"; \ echo "$$dashes"; \ test "$$failed" -eq 0; \ else :; fi distdir: $(DISTFILES) @srcdirstrip=`echo "$(srcdir)" | sed 's|.|.|g'`; \ topsrcdirstrip=`echo "$(top_srcdir)" | sed 's|.|.|g'`; \ list='$(DISTFILES)'; for file in $$list; do \ case $$file in \ $(srcdir)/*) file=`echo "$$file" | sed "s|^$$srcdirstrip/||"`;; \ $(top_srcdir)/*) file=`echo "$$file" | sed "s|^$$topsrcdirstrip/|$(top_builddir)/|"`;; \ esac; \ if test -f 
$$file || test -d $$file; then d=.; else d=$(srcdir); fi; \ dir=`echo "$$file" | sed -e 's,/[^/]*$$,,'`; \ if test "$$dir" != "$$file" && test "$$dir" != "."; then \ dir="/$$dir"; \ $(mkdir_p) "$(distdir)$$dir"; \ else \ dir=''; \ fi; \ if test -d $$d/$$file; then \ if test -d $(srcdir)/$$file && test $$d != $(srcdir); then \ cp -pR $(srcdir)/$$file $(distdir)$$dir || exit 1; \ fi; \ cp -pR $$d/$$file $(distdir)$$dir || exit 1; \ else \ test -f $(distdir)/$$file \ || cp -p $$d/$$file $(distdir)/$$file \ || exit 1; \ fi; \ done check-am: all-am $(MAKE) $(AM_MAKEFLAGS) check-TESTS check: check-am all-am: Makefile installdirs: install: install-am install-exec: install-exec-am install-data: install-data-am uninstall: uninstall-am install-am: all-am @$(MAKE) $(AM_MAKEFLAGS) install-exec-am install-data-am installcheck: installcheck-am install-strip: $(MAKE) $(AM_MAKEFLAGS) INSTALL_PROGRAM="$(INSTALL_STRIP_PROGRAM)" \ install_sh_PROGRAM="$(INSTALL_STRIP_PROGRAM)" INSTALL_STRIP_FLAG=-s \ `test -z '$(STRIP)' || \ echo "INSTALL_PROGRAM_ENV=STRIPPROG='$(STRIP)'"` install mostlyclean-generic: clean-generic: distclean-generic: -test -z "$(CONFIG_CLEAN_FILES)" || rm -f $(CONFIG_CLEAN_FILES) -test -z "$(DISTCLEANFILES)" || rm -f $(DISTCLEANFILES) maintainer-clean-generic: @echo "This command is intended for maintainers to use" @echo "it deletes files that may require special tools to rebuild." 
clean: clean-am clean-am: clean-generic clean-libtool mostlyclean-am distclean: distclean-am -rm -f Makefile distclean-am: clean-am distclean-generic distclean-libtool dvi: dvi-am dvi-am: html: html-am info: info-am info-am: install-data-am: install-exec-am: install-info: install-info-am install-man: installcheck-am: maintainer-clean: maintainer-clean-am -rm -f Makefile maintainer-clean-am: distclean-am maintainer-clean-generic mostlyclean: mostlyclean-am mostlyclean-am: mostlyclean-generic mostlyclean-libtool pdf: pdf-am pdf-am: ps: ps-am ps-am: uninstall-am: uninstall-info-am .PHONY: all all-am check check-TESTS check-am clean clean-generic \ clean-libtool distclean distclean-generic distclean-libtool \ distdir dvi dvi-am html html-am info info-am install \ install-am install-data install-data-am install-exec \ install-exec-am install-info install-info-am install-man \ install-strip installcheck installcheck-am installdirs \ maintainer-clean maintainer-clean-generic mostlyclean \ mostlyclean-generic mostlyclean-libtool pdf pdf-am ps ps-am \ uninstall uninstall-am uninstall-info-am .PHONY: test test: check # Tell versions [3.59,3.63) of GNU make to not export all variables. # Otherwise a system limit (for SysV at least) may be exceeded. .NOEXPORT: swish-e-2.4.7/tests/common.sh0000664000077100017500000000151711166010112013007 00000000000000#!/bin/sh # From GNU Autoconf book # Make sure srcdir is an absolute path. Supply the variable # if it does not exist. We want to be able to run the tests # stand-alone!! # srcdir=${srcdir-.} if test ! -d $srcdir ; then echo "defs: installation error" 1>&2 exit 1 fi # IF the source directory is a Unix or a DOS root directory, ...
# case "$srcdir" in /* | [A-Za-z]:\\*) ;; *) srcdir=`\cd $srcdir && pwd` ;; esac case "$top_builddir" in /* | [A-Za-z]:\\*) ;; *) top_builddir=`\cd ${top_builddir-..} && pwd` ;; esac progname=`echo "$0" | sed 's,^.*/,,'` testname=`echo "$progname" | sed 's,-.*$,,'` # User can set VERBOSE to prevent output redirection case x$VERBOSE in xNO | xno | x0 | x) exec > /dev/null 2>&1 ;; esac echo "=== Running test $progname" SWISH="${top_builddir}/src/swish-e" swish-e-2.4.7/tests/test.html0000775000077100017500000000066511166010112013036 00000000000000 If you are seeing this, the test was successful! This is an initial paragraph...

This is a number one header.

This is a number two header.

This is a paragraph. I have typed some bold text, some italic text too. This is sample of entities: España This is an example and not a real doc swish-e-2.4.7/tests/check_search0000775000077100017500000000056311166010112013513 00000000000000#!/bin/sh ## -*- sh -*- ## incomplete.test -- Test incomplete command handling # Common definitions if test -z "$srcdir"; then srcdir=`echo "$0" | sed 's,[^/]*$,,'` test "$srcdir" = "$0" && srcdir=. test -z "$srcdir" && srcdir=. test "${VERBOSE+set}" != set && VERBOSE=1 fi . $srcdir/common.sh # this is the test script $SWISH -w test | egrep '^[0-9]' swish-e-2.4.7/tests/test_meta2.html0000775000077100017500000000041011166010112014112 00000000000000 If you are seeing this, the METATAG search 2 was successful! This is metatest2 Bla, bla This is is the DESCRIPTION of metatest2 Bla, Bla swish-e-2.4.7/tests/test.xml0000775000077100017500000000017611166010112012667 00000000000000 This is metatest3 Just a sample This is the DESCRIPTION of test.xml swish-e-2.4.7/tests/check_fuzzy0000775000077100017500000000062311166010112013432 00000000000000#!/bin/sh ## -*- sh -*- ## incomplete.test -- Test incomplete command handling # Common definitions if test -z "$srcdir"; then srcdir=`echo "$0" | sed 's,[^/]*$,,'` test "$srcdir" = "$0" && srcdir=. test -z "$srcdir" && srcdir=. test "${VERBOSE+set}" != set && VERBOSE=1 fi . $srcdir/common.sh # this is the test script $SWISH -c $srcdir/test.fuzzy.config -i $srcdir -T indexed_words swish-e-2.4.7/tests/check_index0000775000077100017500000000061511166010112013353 00000000000000#!/bin/sh ## -*- sh -*- ## incomplete.test -- Test incomplete command handling # Common definitions if test -z "$srcdir"; then srcdir=`echo "$0" | sed 's,[^/]*$,,'` test "$srcdir" = "$0" && srcdir=. test -z "$srcdir" && srcdir=. test "${VERBOSE+set}" != set && VERBOSE=1 fi . 
$srcdir/common.sh # this is the test script $SWISH -c $srcdir/test.config -i $srcdir -T indexed_words swish-e-2.4.7/swish-e.pc.in0000664000077100017500000000044011166010113012324 00000000000000prefix=@prefix@ exec_prefix=@exec_prefix@ libdir=@libdir@ includedir=@includedir@ Name: swish-e Version: @VERSION@ Description: SWISH-E - Simple Web Indexing System for Humans - Enhanced Requires: Libs: -L${libdir} -lswish-e @Z_LIBS@ @LIBS@ Cflags: @CFLAGS@ -I${includedir} @Z_CFLAGS@ swish-e-2.4.7/example/0000777000077100017500000000000011166013170011541 500000000000000swish-e-2.4.7/example/templates/0000777000077100017500000000000011166013170013537 500000000000000swish-e-2.4.7/example/templates/page_layout0000664000077100017500000000242011166010111015677 00000000000000 [% # This template defines the layout of the site %] [% config.title %]
[% PROCESS common_header %]

Interesting Stories

Some content here

Other News

Some news content here
[% content %]

$hidden
$advanced_form
EOF } #===================================================================== # This routine creates the results header display # and navigation bar # # # sub results_header { my $results = shift; my $config = $results->{config}; my $q = $results->{q}; my $swr = $results->header('removed stopwords'); my $stopwords = ''; if ( $swr && ref $swr eq 'ARRAY' ) { $stopwords = @$swr > 1 ? join( ', ', map { "$_" } @$swr ) . ' are very common words and were not included in your search' : join( ', ', map { "$_" } @$swr ) . ' is a very common word and was not included in your search'; } my $limits = ''; # Ok, this is ugly. if ( $results->{DateRanges_time_low} && $results->{DateRanges_time_high} ) { my $low = scalar localtime $results->{DateRanges_time_low}; my $high = scalar localtime $results->{DateRanges_time_high}; $limits = <  Results limited to dates $low to $high EOF } my $query_href = $results->{query_href}; my $query_simple = CGI::escapeHTML( $results->{query_simple} ); my $pages = $results->navigation('pages'); my $prev = $results->navigation('prev'); my $prev_count = $results->navigation('prev_count'); my $next = $results->navigation('next'); my $next_count = $results->navigation('next_count'); my $hits = $results->navigation('hits'); my $from = $results->navigation('from'); my $to = $results->navigation('to'); my $run_time = $results->navigation('run_time'); my $search_time = $results->navigation('search_time'); my $links = ''; $links .= ' Page:' . $pages if $pages; $links .= qq[ Previous $prev_count] if $prev_count; $links .= qq[ Next $next_count] if $next_count; # Save for the bottom of the screen. $results->{LINKS} = $links; $links = qq[$links] if $links; $query_simple = $query_simple ? " Results for $query_simple" : ''; $results->{links} = $links if $links; return < $query_simple   $from to $to of $hits results. 
Run time: $run_time | Search time: $search_time     $links $limits $stopwords EOF } #===================================================================== # This routine formats a single result # # sub show_result { my ($results, $this_result ) = @_; my $conf = $results->{conf}; my $DocTitle = $results->config('title_property') || 'swishtitle'; my $title = $this_result->{$DocTitle} || $this_result->{swishdocpath} || '?'; my $name_labels = $results->config('name_labels'); # The properties to display my $props = ''; my $display_props = $results->config('display_props'); if ( $display_props ) { $props = join "\n", '
', map ( { '' } @$display_props ), '
' . ( $name_labels->{$_} || $_ ) . ': ' . '' . ( defined $this_result->{$_} ? $this_result->{$_} : '' ) . '' . '
'; } my $description_prop = $results->config('description_prop'); my $description = ''; if ( $description_prop ) { $description = $this_result->{ $description_prop } || ''; } return <
$this_result->{swishreccount} $title -- rank: $this_result->{swishrank}
$description $props
EOF } #================================================================== # Form setup for sorts and metas # # This could be methods of $results object # (and then available for Template-Toolkit) # But that's too much HTML in the object, perhaps. # # #================================================================== sub get_meta_name_limits { my ( $results ) = @_; my $metanames = $results->config('metanames'); return '' unless $metanames; my $name_labels = $results->config('name_labels'); my $q = $results->CGI; return join "\n", 'Limit search to:', $q->radio_group( -name =>'metaname', -values => $metanames, -default=>$metanames->[0], -labels =>$name_labels ), '
'; } sub get_sort_select_list { my ( $results ) = @_; my $sort_metas = $results->config('sorts'); return '' unless $sort_metas; my $name_labels = $results->config('name_labels'); my $q = $results->CGI; return join "\n", 'Sort by:', $q->popup_menu( -name =>'sort', -values => $sort_metas, -default=>$sort_metas->[0], -labels =>$name_labels ), $q->checkbox( -name => 'reverse', -label => 'Reverse Sort' ); } sub get_index_select_list { my ( $results ) = @_; my $q = $results->CGI; my $indexes = $results->config('swish_index'); return '' unless ref $indexes eq 'ARRAY'; my $select_config = $results->config('select_indexes'); return '' unless $select_config && ref $select_config eq 'HASH'; # Should return a warning, as this might be a likely mistake # This jumps through hoops so that real index file name is not exposed return '' unless exists $select_config->{labels} && ref $select_config->{labels} eq 'ARRAY' && @$indexes == @{$select_config->{labels}}; my @labels = @{$select_config->{labels}}; my %map; for ( 0..$#labels ) { $map{$_} = $labels[$_]; } my $method = $select_config->{method} || 'checkbox_group'; my @cols = $select_config->{columns} ? ('-columns', $select_config->{columns}) : (); return join "\n", '
', ( $select_config->{description} || 'Select: '), $q->$method( -name => 'si', -values => [0..$#labels], -default=> 0, -labels => \%map, @cols ); } sub get_limit_select { my ( $results ) = @_; my $q = $results->CGI; my $limit = $results->config('select_by_meta'); return '' unless ref $limit eq 'HASH'; my $method = $limit->{method} || 'checkbox_group'; my @options = ( -name => 'sbm', -values => $limit->{values}, -labels => $limit->{labels} || {}, ); push @options, ( -columns=> $limit->{columns} ) if $limit->{columns}; return join "\n", '
', ( $limit->{description} || 'Select: '), $q->$method( @options ); } 1; swish-e-2.4.7/example/modules/SWISH/DefaultHighlight.pm0000664000077100017500000001456411166010111017600 00000000000000#======================================================================= # Default Highlighting Code # # Context highlighting & deals with stemming, but not phrases or stopwords # # $Id: DefaultHighlight.pm 1303 2003-07-23 00:45:16Z whmoseley $ #======================================================================= package SWISH::DefaultHighlight; use strict; sub new { my ( $class, $settings, $headers ) = @_; my $self = bless { settings=> $settings, headers => $headers, }, $class; if ( $self->header('stemming applied') =~ /^(?:1|yes)$/i ) { eval { require SWISH::Stemmer }; if ( $@ ) { warn('Stemmed index needs Stemmer.pm to highlight: ' . $@); } else { $self->{stemmer_function} = \&SWISH::Stemmer::SwishStem; } } $self->set_match_regexp; return $self; } sub header { my $self = shift; return '' unless ref $self->{headers} eq 'HASH'; return $self->{headers}{$_[0]} || ''; } #========================================================================== # Returns true IF prop was HTML escaped. sub highlight { my ( $self, $text_ref, $phrase_array, $prop_name ) = @_; my $wc_regexp = $self->{wc_regexp}; my $extract_regexp = $self->{extract_regexp}; my $match_regexp = $self->match_string( $phrase_array, $prop_name ); my $last = 0; my $settings = $self->{settings}; my $Show_Words = $settings->{show_words} || 10; my $Occurrences = $settings->{occurrences} || 5; my $Max_Words = $settings->{max_words} || 100; my $On = $settings->{highlight_on} || ''; my $Off = $settings->{highlight_off} || ''; my $on_flag = 'sw' . time . 'on'; my $off_flag = 'sw' . time . 'off'; my $stemmer_function = $self->{stemmer_function}; # Should really call unescapeHTML(), but then would need to escape from escaping. 
my @words = split /$wc_regexp/, $$text_ref; return unless @words; my @flags; $flags[$#words] = 0; # Extend array. my $occurrences = $Occurrences ; my $pos = 0; while ( $Show_Words && $pos <= $#words ) { # Check if the word is a swish word (ignoring begin and end chars) if ( $words[$pos] =~ /$extract_regexp/ ) { my ( $begin, $word, $end ) = ( $1, $2, $3 ); my $test = $stemmer_function ? $stemmer_function->($word) : lc $word; $test ||= lc $word; # Now check if word matches if ( $test =~ /$match_regexp/ ) { $words[$pos] = "$begin$on_flag$word$off_flag$end"; my $start = $pos - ($Show_Words-1)* 2; my $end = $pos + ($Show_Words-1)* 2; if ( $start < 0 ) { $end = $end - $start; $start = 0; } $end = $#words if $end > $#words; $flags[$_]++ for $start .. $end; # All done, and mark where to stop looking if ( $occurrences-- <= 0 ) { $last = $end; last; } } } $pos += 2; # Skip to next wordchar word } my $dotdotdot = ' ... '; my @output; my $printing; my $first = 1; my $some_printed; if ( $Show_Words && @words > 50 ) { # don't limit context if a small number of words for my $i ( 0 ..$#words ) { if ( $last && $i >= $last && $i < $#words ) { push @output, $dotdotdot; last; } if ( $flags[$i] ) { push @output, $dotdotdot if !$printing++ && !$first; push @output, $words[$i]; $some_printed++; } else { $printing = 0; } $first = 0; } } if ( !$some_printed ) { for my $i ( 0 .. $Max_Words ) { if ( $i > $#words ) { $printing++; last; } push @output, $words[$i]; } } push @output, $dotdotdot if !$printing; $$text_ref = join '', @output; my %entities = ( '&' => '&amp;', '>' => '&gt;', '<' => '&lt;', '"' => '&quot;', ); my %highlight = ( $on_flag => $On, $off_flag => $Off, ); $$text_ref =~ s/([&"<>])/$entities{$1}/ge; # " fix emacs $$text_ref =~ s/($on_flag|$off_flag)/$highlight{$1}/ge; return 1; # Means that prop was processed AND was html escaped.
} #============================================ # Returns compiled regular expressions for matching # sub match_string { my ($self, $phrases, $prop_name ) = @_; # Already cached? return $self->{cache}{$prop_name} if $self->{cache}{$prop_name}; my $wc = quotemeta $self->header('wordcharacters'); # Yuck! $wc .= 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'; # Warning: dependent on tolower used while indexing # Get all the unique words; my %words; for my $phrase ( @$phrases ) { $words{$_}++ for @$phrase; } my $match_string = join '|', map { substr( $_, -1, 1 ) eq '*' ? quotemeta( substr( $_, 0, -1) ) . "[$wc]*?" : quotemeta } keys %words; my $re = qr/^(?:$match_string)$/; $self->{cache}{$prop_name} = $re; return $re; } #============================================ # Returns compiled regular expressions for splitting the source text into "swish words" # # sub set_match_regexp { my $self = shift; my $ignoref = $self->header('ignorefirstchar'); my $ignorel = $self->header('ignorelastchar'); for ( $ignoref, $ignorel ) { if ( $_ ) { $_ = quotemeta; $_ = "([$_]*)"; } else { $_ = '()'; } } my $wc = quotemeta $self->header('wordcharacters'); $wc .= 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'; # Warning: dependent on tolower used while indexing # Now, wait a minute. Look at this more, as I'd hope that making a # qr// go out of scope would release the compiled pattern. # doesn't really matter, as $wc probably never changes $self->{wc_regexp} = qr/([^$wc]+)/; # regexp for splitting into swish-words $self->{extract_regexp} = qr/^$ignoref([$wc]+?)$ignorel$/i; # regexp for extracting out the words to compare } 1; swish-e-2.4.7/example/Makefile.in0000664000077100017500000003555111166010111013525 00000000000000# Makefile.in generated by automake 1.9.6 from Makefile.am. # @configure_input@ # Copyright (C) 1994, 1995, 1996, 1997, 1998, 1999, 2000, 2001, 2002, # 2003, 2004, 2005 Free Software Foundation, Inc. 
# This Makefile.in is free software; the Free Software Foundation # gives unlimited permission to copy and/or distribute it, # with or without modifications, as long as this notice is preserved. # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY, to the extent permitted by law; without # even the implied warranty of MERCHANTABILITY or FITNESS FOR A # PARTICULAR PURPOSE. @SET_MAKE@ srcdir = @srcdir@ top_srcdir = @top_srcdir@ VPATH = @srcdir@ pkgdatadir = $(datadir)/@PACKAGE@ pkglibdir = $(libdir)/@PACKAGE@ pkgincludedir = $(includedir)/@PACKAGE@ top_builddir = .. am__cd = CDPATH="$${ZSH_VERSION+.}$(PATH_SEPARATOR)" && cd INSTALL = @INSTALL@ install_sh_DATA = $(install_sh) -c -m 644 install_sh_PROGRAM = $(install_sh) -c install_sh_SCRIPT = $(install_sh) -c INSTALL_HEADER = $(INSTALL_DATA) transform = $(program_transform_name) NORMAL_INSTALL = : PRE_INSTALL = : POST_INSTALL = : NORMAL_UNINSTALL = : PRE_UNINSTALL = : POST_UNINSTALL = : build_triplet = @build@ host_triplet = @host@ subdir = example DIST_COMMON = README $(srcdir)/Makefile.am $(srcdir)/Makefile.in ACLOCAL_M4 = $(top_srcdir)/aclocal.m4 am__aclocal_m4_deps = $(top_srcdir)/config/acinclude.m4 \ $(top_srcdir)/configure.in am__configure_deps = $(am__aclocal_m4_deps) $(CONFIGURE_DEPENDENCIES) \ $(ACLOCAL_M4) mkinstalldirs = $(SHELL) $(top_srcdir)/config/mkinstalldirs CONFIG_HEADER = $(top_builddir)/src/acconfig.h CONFIG_CLEAN_FILES = am__installdirs = "$(DESTDIR)$(libexecdir)" \ "$(DESTDIR)$(perlmoduledir)" "$(DESTDIR)$(pkgdatadir)" \ "$(DESTDIR)$(templatedir)" libexecSCRIPT_INSTALL = $(INSTALL_SCRIPT) perlmoduleSCRIPT_INSTALL = $(INSTALL_SCRIPT) SCRIPTS = $(libexec_SCRIPTS) $(perlmodule_SCRIPTS) SOURCES = DIST_SOURCES = am__vpath_adj_setup = srcdirstrip=`echo "$(srcdir)" | sed 's|.|.|g'`; am__vpath_adj = case $$p in \ $(srcdir)/*) f=`echo "$$p" | sed "s|^$$srcdirstrip/||"`;; \ *) f=$$p;; \ esac; am__strip_dir = `echo $$p | sed -e 's|^.*/||'`; pkgdataDATA_INSTALL = 
$(INSTALL_DATA) templateDATA_INSTALL = $(INSTALL_DATA) DATA = $(pkgdata_DATA) $(template_DATA) DISTFILES = $(DIST_COMMON) $(DIST_SOURCES) $(TEXINFOS) $(EXTRA_DIST) ACLOCAL = @ACLOCAL@ ALLOCA = @ALLOCA@ AMDEP_FALSE = @AMDEP_FALSE@ AMDEP_TRUE = @AMDEP_TRUE@ AMTAR = @AMTAR@ AR = @AR@ AS = @AS@ AUTOCONF = @AUTOCONF@ AUTOHEADER = @AUTOHEADER@ AUTOMAKE = @AUTOMAKE@ AWK = @AWK@ BTREE_OBJS = @BTREE_OBJS@ BUILDDOCS_FALSE = @BUILDDOCS_FALSE@ BUILDDOCS_TRUE = @BUILDDOCS_TRUE@ CC = @CC@ CCDEPMODE = @CCDEPMODE@ CFLAGS = @CFLAGS@ CPP = @CPP@ CPPFLAGS = @CPPFLAGS@ CXX = @CXX@ CXXCPP = @CXXCPP@ CXXDEPMODE = @CXXDEPMODE@ CXXFLAGS = @CXXFLAGS@ CYGPATH_W = @CYGPATH_W@ DEFS = @DEFS@ DEPDIR = @DEPDIR@ DLLTOOL = @DLLTOOL@ ECHO = @ECHO@ ECHO_C = @ECHO_C@ ECHO_N = @ECHO_N@ ECHO_T = @ECHO_T@ EGREP = @EGREP@ EXEEXT = @EXEEXT@ F77 = @F77@ FFLAGS = @FFLAGS@ INSTALLDOCS_FALSE = @INSTALLDOCS_FALSE@ INSTALLDOCS_TRUE = @INSTALLDOCS_TRUE@ INSTALL_DATA = @INSTALL_DATA@ INSTALL_PROGRAM = @INSTALL_PROGRAM@ INSTALL_SCRIPT = @INSTALL_SCRIPT@ INSTALL_STRIP_PROGRAM = @INSTALL_STRIP_PROGRAM@ LARGEFILES_MACROS = @LARGEFILES_MACROS@ LDFLAGS = @LDFLAGS@ LIBOBJS = @LIBOBJS@ LIBS = @LIBS@ LIBTOOL = @LIBTOOL@ LIBXML2_CFLAGS = @LIBXML2_CFLAGS@ LIBXML2_LIB = @LIBXML2_LIB@ LIBXML2_OBJS = @LIBXML2_OBJS@ LIBXML_REQUIRED_VERSION = @LIBXML_REQUIRED_VERSION@ LN_S = @LN_S@ LTLIBOBJS = @LTLIBOBJS@ MAINT = @MAINT@ MAINTAINER_MODE_FALSE = @MAINTAINER_MODE_FALSE@ MAINTAINER_MODE_TRUE = @MAINTAINER_MODE_TRUE@ MAKEINFO = @MAKEINFO@ OBJDUMP = @OBJDUMP@ OBJEXT = @OBJEXT@ PACKAGE = @PACKAGE@ PACKAGE_BUGREPORT = @PACKAGE_BUGREPORT@ PACKAGE_NAME = @PACKAGE_NAME@ PACKAGE_STRING = @PACKAGE_STRING@ PACKAGE_TARNAME = @PACKAGE_TARNAME@ PACKAGE_VERSION = @PACKAGE_VERSION@ PATH_SEPARATOR = @PATH_SEPARATOR@ PCRE_CFLAGS = @PCRE_CFLAGS@ PCRE_CONFIG = @PCRE_CONFIG@ PCRE_LIBS = @PCRE_LIBS@ PCRE_REQUIRED_VERSION = @PCRE_REQUIRED_VERSION@ PERL = @PERL@ POD2MAN = @POD2MAN@ RANLIB = @RANLIB@ SET_MAKE = @SET_MAKE@ SHELL = @SHELL@ STRIP = @STRIP@ 
SWISH_WEB = @SWISH_WEB@ VERSION = @VERSION@ XML2_CONFIG = @XML2_CONFIG@ Z_CFLAGS = @Z_CFLAGS@ Z_LIBS = @Z_LIBS@ ac_ct_AR = @ac_ct_AR@ ac_ct_AS = @ac_ct_AS@ ac_ct_CC = @ac_ct_CC@ ac_ct_CXX = @ac_ct_CXX@ ac_ct_DLLTOOL = @ac_ct_DLLTOOL@ ac_ct_F77 = @ac_ct_F77@ ac_ct_OBJDUMP = @ac_ct_OBJDUMP@ ac_ct_RANLIB = @ac_ct_RANLIB@ ac_ct_STRIP = @ac_ct_STRIP@ am__fastdepCC_FALSE = @am__fastdepCC_FALSE@ am__fastdepCC_TRUE = @am__fastdepCC_TRUE@ am__fastdepCXX_FALSE = @am__fastdepCXX_FALSE@ am__fastdepCXX_TRUE = @am__fastdepCXX_TRUE@ am__include = @am__include@ am__leading_dot = @am__leading_dot@ am__quote = @am__quote@ am__tar = @am__tar@ am__untar = @am__untar@ bindir = @bindir@ build = @build@ build_alias = @build_alias@ build_cpu = @build_cpu@ build_os = @build_os@ build_vendor = @build_vendor@ datadir = @datadir@ exec_prefix = @exec_prefix@ host = @host@ host_alias = @host_alias@ host_cpu = @host_cpu@ host_os = @host_os@ host_vendor = @host_vendor@ includedir = @includedir@ infodir = @infodir@ install_sh = @install_sh@ libdir = @libdir@ libexecdir = @libexecdir@ localstatedir = @localstatedir@ mandir = @mandir@ mkdir_p = @mkdir_p@ oldincludedir = @oldincludedir@ prefix = @prefix@ program_transform_name = @program_transform_name@ sbindir = @sbindir@ sharedstatedir = @sharedstatedir@ sysconfdir = @sysconfdir@ target_alias = @target_alias@ perlmoduledir = $(libexecdir)/perl/SWISH libexec_SCRIPTS = swish.cgi search.cgi pkgdata_DATA = \ swish.tt \ swish.tmpl templatedir = $(pkgdatadir)/templates template_DATA = \ templates/search.tt \ templates/page_layout \ templates/common_header \ templates/common_footer \ templates/style.css \ templates/markup.css perlmodule_SCRIPTS = \ modules/SWISH/DateRanges.pm \ modules/SWISH/DefaultHighlight.pm \ modules/SWISH/PhraseHighlight.pm \ modules/SWISH/SimpleHighlight.pm \ modules/SWISH/TemplateDefault.pm \ modules/SWISH/TemplateDumper.pm \ modules/SWISH/TemplateFrame.pm \ modules/SWISH/TemplateHTMLTemplate.pm \ modules/SWISH/TemplateToolkit.pm \ 
modules/SWISH/ParseQuery.pm CLEANFILES = swish.cgi search.cgi EXTRA_DIST = \ README \ SWISH-Stemmer-0.05.tar.gz \ swish.cgi.in \ search.cgi.in \ swish.gif \ $(pkgdata_DATA) \ $(template_DATA) \ $(perlmodule_SCRIPTS) all: all-am .SUFFIXES: $(srcdir)/Makefile.in: @MAINTAINER_MODE_TRUE@ $(srcdir)/Makefile.am $(am__configure_deps) @for dep in $?; do \ case '$(am__configure_deps)' in \ *$$dep*) \ cd $(top_builddir) && $(MAKE) $(AM_MAKEFLAGS) am--refresh \ && exit 0; \ exit 1;; \ esac; \ done; \ echo ' cd $(top_srcdir) && $(AUTOMAKE) --foreign example/Makefile'; \ cd $(top_srcdir) && \ $(AUTOMAKE) --foreign example/Makefile .PRECIOUS: Makefile Makefile: $(srcdir)/Makefile.in $(top_builddir)/config.status @case '$?' in \ *config.status*) \ cd $(top_builddir) && $(MAKE) $(AM_MAKEFLAGS) am--refresh;; \ *) \ echo ' cd $(top_builddir) && $(SHELL) ./config.status $(subdir)/$@ $(am__depfiles_maybe)'; \ cd $(top_builddir) && $(SHELL) ./config.status $(subdir)/$@ $(am__depfiles_maybe);; \ esac; $(top_builddir)/config.status: $(top_srcdir)/configure $(CONFIG_STATUS_DEPENDENCIES) cd $(top_builddir) && $(MAKE) $(AM_MAKEFLAGS) am--refresh $(top_srcdir)/configure: @MAINTAINER_MODE_TRUE@ $(am__configure_deps) cd $(top_builddir) && $(MAKE) $(AM_MAKEFLAGS) am--refresh $(ACLOCAL_M4): @MAINTAINER_MODE_TRUE@ $(am__aclocal_m4_deps) cd $(top_builddir) && $(MAKE) $(AM_MAKEFLAGS) am--refresh install-libexecSCRIPTS: $(libexec_SCRIPTS) @$(NORMAL_INSTALL) test -z "$(libexecdir)" || $(mkdir_p) "$(DESTDIR)$(libexecdir)" @list='$(libexec_SCRIPTS)'; for p in $$list; do \ if test -f "$$p"; then d=; else d="$(srcdir)/"; fi; \ if test -f $$d$$p; then \ f=`echo "$$p" | sed 's|^.*/||;$(transform)'`; \ echo " $(libexecSCRIPT_INSTALL) '$$d$$p' '$(DESTDIR)$(libexecdir)/$$f'"; \ $(libexecSCRIPT_INSTALL) "$$d$$p" "$(DESTDIR)$(libexecdir)/$$f"; \ else :; fi; \ done uninstall-libexecSCRIPTS: @$(NORMAL_UNINSTALL) @list='$(libexec_SCRIPTS)'; for p in $$list; do \ f=`echo "$$p" | sed 's|^.*/||;$(transform)'`; \ echo 
" rm -f '$(DESTDIR)$(libexecdir)/$$f'"; \ rm -f "$(DESTDIR)$(libexecdir)/$$f"; \ done install-perlmoduleSCRIPTS: $(perlmodule_SCRIPTS) @$(NORMAL_INSTALL) test -z "$(perlmoduledir)" || $(mkdir_p) "$(DESTDIR)$(perlmoduledir)" @list='$(perlmodule_SCRIPTS)'; for p in $$list; do \ if test -f "$$p"; then d=; else d="$(srcdir)/"; fi; \ if test -f $$d$$p; then \ f=`echo "$$p" | sed 's|^.*/||;$(transform)'`; \ echo " $(perlmoduleSCRIPT_INSTALL) '$$d$$p' '$(DESTDIR)$(perlmoduledir)/$$f'"; \ $(perlmoduleSCRIPT_INSTALL) "$$d$$p" "$(DESTDIR)$(perlmoduledir)/$$f"; \ else :; fi; \ done uninstall-perlmoduleSCRIPTS: @$(NORMAL_UNINSTALL) @list='$(perlmodule_SCRIPTS)'; for p in $$list; do \ f=`echo "$$p" | sed 's|^.*/||;$(transform)'`; \ echo " rm -f '$(DESTDIR)$(perlmoduledir)/$$f'"; \ rm -f "$(DESTDIR)$(perlmoduledir)/$$f"; \ done mostlyclean-libtool: -rm -f *.lo clean-libtool: -rm -rf .libs _libs distclean-libtool: -rm -f libtool uninstall-info-am: install-pkgdataDATA: $(pkgdata_DATA) @$(NORMAL_INSTALL) test -z "$(pkgdatadir)" || $(mkdir_p) "$(DESTDIR)$(pkgdatadir)" @list='$(pkgdata_DATA)'; for p in $$list; do \ if test -f "$$p"; then d=; else d="$(srcdir)/"; fi; \ f=$(am__strip_dir) \ echo " $(pkgdataDATA_INSTALL) '$$d$$p' '$(DESTDIR)$(pkgdatadir)/$$f'"; \ $(pkgdataDATA_INSTALL) "$$d$$p" "$(DESTDIR)$(pkgdatadir)/$$f"; \ done uninstall-pkgdataDATA: @$(NORMAL_UNINSTALL) @list='$(pkgdata_DATA)'; for p in $$list; do \ f=$(am__strip_dir) \ echo " rm -f '$(DESTDIR)$(pkgdatadir)/$$f'"; \ rm -f "$(DESTDIR)$(pkgdatadir)/$$f"; \ done install-templateDATA: $(template_DATA) @$(NORMAL_INSTALL) test -z "$(templatedir)" || $(mkdir_p) "$(DESTDIR)$(templatedir)" @list='$(template_DATA)'; for p in $$list; do \ if test -f "$$p"; then d=; else d="$(srcdir)/"; fi; \ f=$(am__strip_dir) \ echo " $(templateDATA_INSTALL) '$$d$$p' '$(DESTDIR)$(templatedir)/$$f'"; \ $(templateDATA_INSTALL) "$$d$$p" "$(DESTDIR)$(templatedir)/$$f"; \ done uninstall-templateDATA: @$(NORMAL_UNINSTALL) @list='$(template_DATA)'; 
for p in $$list; do \ f=$(am__strip_dir) \ echo " rm -f '$(DESTDIR)$(templatedir)/$$f'"; \ rm -f "$(DESTDIR)$(templatedir)/$$f"; \ done tags: TAGS TAGS: ctags: CTAGS CTAGS: distdir: $(DISTFILES) $(mkdir_p) $(distdir)/modules/SWISH $(distdir)/templates @srcdirstrip=`echo "$(srcdir)" | sed 's|.|.|g'`; \ topsrcdirstrip=`echo "$(top_srcdir)" | sed 's|.|.|g'`; \ list='$(DISTFILES)'; for file in $$list; do \ case $$file in \ $(srcdir)/*) file=`echo "$$file" | sed "s|^$$srcdirstrip/||"`;; \ $(top_srcdir)/*) file=`echo "$$file" | sed "s|^$$topsrcdirstrip/|$(top_builddir)/|"`;; \ esac; \ if test -f $$file || test -d $$file; then d=.; else d=$(srcdir); fi; \ dir=`echo "$$file" | sed -e 's,/[^/]*$$,,'`; \ if test "$$dir" != "$$file" && test "$$dir" != "."; then \ dir="/$$dir"; \ $(mkdir_p) "$(distdir)$$dir"; \ else \ dir=''; \ fi; \ if test -d $$d/$$file; then \ if test -d $(srcdir)/$$file && test $$d != $(srcdir); then \ cp -pR $(srcdir)/$$file $(distdir)$$dir || exit 1; \ fi; \ cp -pR $$d/$$file $(distdir)$$dir || exit 1; \ else \ test -f $(distdir)/$$file \ || cp -p $$d/$$file $(distdir)/$$file \ || exit 1; \ fi; \ done check-am: all-am check: check-am all-am: Makefile $(SCRIPTS) $(DATA) installdirs: for dir in "$(DESTDIR)$(libexecdir)" "$(DESTDIR)$(perlmoduledir)" "$(DESTDIR)$(pkgdatadir)" "$(DESTDIR)$(templatedir)"; do \ test -z "$$dir" || $(mkdir_p) "$$dir"; \ done install: install-am install-exec: install-exec-am install-data: install-data-am uninstall: uninstall-am install-am: all-am @$(MAKE) $(AM_MAKEFLAGS) install-exec-am install-data-am installcheck: installcheck-am install-strip: $(MAKE) $(AM_MAKEFLAGS) INSTALL_PROGRAM="$(INSTALL_STRIP_PROGRAM)" \ install_sh_PROGRAM="$(INSTALL_STRIP_PROGRAM)" INSTALL_STRIP_FLAG=-s \ `test -z '$(STRIP)' || \ echo "INSTALL_PROGRAM_ENV=STRIPPROG='$(STRIP)'"` install mostlyclean-generic: clean-generic: -test -z "$(CLEANFILES)" || rm -f $(CLEANFILES) distclean-generic: -test -z "$(CONFIG_CLEAN_FILES)" || rm -f $(CONFIG_CLEAN_FILES) 
maintainer-clean-generic: @echo "This command is intended for maintainers to use" @echo "it deletes files that may require special tools to rebuild." clean: clean-am clean-am: clean-generic clean-libtool mostlyclean-am distclean: distclean-am -rm -f Makefile distclean-am: clean-am distclean-generic distclean-libtool dvi: dvi-am dvi-am: html: html-am info: info-am info-am: install-data-am: install-perlmoduleSCRIPTS install-pkgdataDATA \ install-templateDATA install-exec-am: install-libexecSCRIPTS install-info: install-info-am install-man: installcheck-am: maintainer-clean: maintainer-clean-am -rm -f Makefile maintainer-clean-am: distclean-am maintainer-clean-generic mostlyclean: mostlyclean-am mostlyclean-am: mostlyclean-generic mostlyclean-libtool pdf: pdf-am pdf-am: ps: ps-am ps-am: uninstall-am: uninstall-info-am uninstall-libexecSCRIPTS \ uninstall-perlmoduleSCRIPTS uninstall-pkgdataDATA \ uninstall-templateDATA .PHONY: all all-am check check-am clean clean-generic clean-libtool \ distclean distclean-generic distclean-libtool distdir dvi \ dvi-am html html-am info info-am install install-am \ install-data install-data-am install-exec install-exec-am \ install-info install-info-am install-libexecSCRIPTS \ install-man install-perlmoduleSCRIPTS install-pkgdataDATA \ install-strip install-templateDATA installcheck \ installcheck-am installdirs maintainer-clean \ maintainer-clean-generic mostlyclean mostlyclean-generic \ mostlyclean-libtool pdf pdf-am ps ps-am uninstall uninstall-am \ uninstall-info-am uninstall-libexecSCRIPTS \ uninstall-perlmoduleSCRIPTS uninstall-pkgdataDATA \ uninstall-templateDATA # This is done here to stay in the GNU coding standards # libexecdir can be modified at make time, so can't use # variable substitution at configure time swish.cgi: swish.cgi.in @rm -f swish.cgi @sed \ -e 's,@@perlmoduledir@@,$(libexecdir)/perl,' \ -e 's,@@swishbinary@@,$(bindir)/swish-e$(EXEEXT),' \ -e 's,@@perlbinary@@,$(PERL),' \ -e 's,@@pkgdatadir@@,$(pkgdatadir),' 
\ $(srcdir)/swish.cgi.in > swish.cgi search.cgi: search.cgi.in @rm -f search.cgi @sed \ -e 's,@@perlmoduledir@@,$(libexecdir)/perl,' \ -e 's,@@swishbinary@@,$(bindir)/swish-e$(EXEEXT),' \ -e 's,@@perlbinary@@,$(PERL),' \ -e 's,@@pkgdatadir@@,$(pkgdatadir),' \ -e 's,@@templatedir@@,$(templatedir),' \ $(srcdir)/search.cgi.in > search.cgi # Tell versions [3.59,3.63) of GNU make to not export all variables. # Otherwise a system limit (for SysV at least) may be exceeded. .NOEXPORT: swish-e-2.4.7/example/swish.cgi.in0000775000077100017500000032360611166010111013712 00000000000000#!@@perlbinary@@ -w package SwishSearch; use strict; # This is set to where Swish-e's "make install" installed the helper modules. use lib ( '@@perlmoduledir@@' ); my $DEFAULT_CONFIG_FILE = '.swishcgi.conf'; ################################################################################### # # If this text is displayed on your browser then your web server # is not configured to run .cgi programs. Contact your web server administrator. # # To display documentation for this program type "perldoc swish.cgi" # # swish.cgi $Revision: 1830 $ Copyright (C) 2001 Bill Moseley swishscript@hank.org # Example CGI program for searching with SWISH-E # # This example program will only run under an OS that supports fork(). # Under windows it uses a piped open which MAY NOT BE SECURE. # # # This program is free software; you can redistribute it and/or # modify it under the terms of the GNU General Public License # as published by the Free Software Foundation; either version # 2 of the License, or (at your option) any later version. # # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the # GNU General Public License for more details.
# # The above lines must remain at the top of this program # # $Id: swish.cgi.in 1830 2006-10-05 13:34:46Z karman $ # #################################################################################### # This is written this way so the script can be used as a CGI script or a mod_perl # module without any code changes. # use CGI (); # might not be needed if using Apache::Request #================================================================================= # CGI entry point # #================================================================================= use vars '$speedy_config'; # Global for caching in persistent environment such as SpeedyCGI # Run the script -- entry point if running as a CGI script unless ( $ENV{MOD_PERL} ) { if ( !$speedy_config ) { $speedy_config = default_config(); # Merge with disk config file. $speedy_config = merge_read_config( $speedy_config ); } process_request( $speedy_config ); } #================================================================================== # This sets the default configuration parameters # # Any configuration read from disk is merged with these settings. # # Only a few settings are actually required. Some reasonable defaults are used # for most. In fact, you can probably create a complete config as: # # return { # swish_binary => '/usr/local/bin/swish-e', # swish_index => '/usr/local/share/swish/index.swish-e', # title_property => 'swishtitle', # Not required, but recommended # }; # # But, that doesn't really show all the options. # # You can modify the options below, or you can use a config file. The config file # is .swishcgi.conf by default (read from the current directory) and must return # a hash reference.
For example, to create a config file that changes the default # title and index file name, plus uses Template::Toolkit to generate output # create a config file as: # # # Example config file -- returns a hash reference # return { # title => 'Search Our Site', # swish_index => 'index.web', # # template => { # package => 'SWISH::TemplateToolkit', # file => 'swish.tt', # options => { # INCLUDE_PATH => '/home/user/swish-e/example', # }, # }, # }; # # #----------------------------------------------------------------------------------- sub default_config { ##### Configuration Parameters ######### #---- This lists all the options, with many commented out --- # By default, this config is used -- see the process_request() call below. # You should adjust for your site, and how your swish index was created. ##>> ##>> Please don't post this entire section on the swish-e list if looking for help! ##>> ##>> Send a small example, without all the comments. #====================================================================== # *** NOTES **** # Items beginning with an "x" or "#" are commented out # the "x" form simply renames (hides) that setting. It's used # to make it easy to disable a mult-line configuation setting. # # If you do not understand a setting then best to leave the default. # # Please follow the documentation (perldoc swish.cgi) and set up # a test using the defaults before making changes. It's much easier # to modify a working example than to try to get a modified example to work... # # Again, this is a Perl hash structure. Commas are important. #====================================================================== return { title => 'Search our site', # Title of your choice. Displays on the search page swish_binary => '@@swishbinary@@', # Location of swish-e binary # By default, this script tries to read a config file. 
You should probably # comment this out if not used save a disk stat config_file => $DEFAULT_CONFIG_FILE, # Default config file # The location of your index file. Typically, this would not be in # your web tree. # If you have more than one index to search then specify an array # reference. e.g. swish_index =>[ qw( index1 index2 index3 )], swish_index => 'index.swish-e', # Location of your index file # See "select_indexes" below for how to # select more than one index. page_size => 15, # Number of results per page - default 15 # prepend this path to the filename (swishdocpath) returned by swish. This is used to # make the href link back to the original document. Comment out to disable. #prepend_path => 'http://localhost/mydocs', # This is the property that is used for the href link back to the original # document. It's "swishdocpath" by default #link_property => 'swishdocpath', ## Display properties ## # Everything swish records about a file is called a "property". These # next three settings tell the swish.cgi script which properties should be passed # to the templating coded for output generation. # First is the property name to use as the main link text to the indexed document. # Typically, this will be 'swishtitle' if have indexed html documents, # but you can specify any PropertyName defined in your document. # By default, swish will display the pathname for documents that do not # have a title. # In other words, this is used for the text of the links of the search results. # title_property title_property => 'swishtitle', # Swish has a configuration directive "StoreDescription" that will save part or # all of a document's contents in the index file. This can then be displayed # along with results. If you are indexing a lot of files this can use a lot of disk # space, so test carefully before indexing your entire site. # Building swish with zlib can greatly reduce the space used by StoreDescription. 
# # This settings tells this script to display this property as the description. # Normally, this should be 'swishdescription', but you can specify another property name. # There is no default. description_prop => 'swishdescription', # Property names listed here will be displayed in a table below each result # You may wish to modify this list if you are using document properties (PropertyNames) # in your swish-e index configuration # There is no default. display_props => [qw/swishlastmodified swishdocsize swishdocpath/], # Results can be be sorted by any of the properties listed here # They will be displayed in a drop-down list on the form. # You may modify this list if you are using document properties of your own creation # Swish uses the rank as the default sort sorts => [qw/swishrank swishlastmodified swishtitle swishdocpath/], # Secondary_sort is used to sort within a sort # You may enter a property name followed by a direction (asc|desc) secondary_sort => [qw/swishlastmodified desc/], # You can limit by MetaNames here. Names listed here will be displayed in # a line of radio buttons. # The default is to not allow any metaname selection. # To use this feature you must define MetaNames while indexing. # The special "swishdefault" says to search any text that was not indexed # as a specific metaname (e.g. typically the body of a HTML document and its title). # To see how this might work, add to your *swish-e* config file: # MetaNames swishtitle swishdocpath # reindex and try: metanames => [qw/ swishdefault swishtitle swishdocpath /], # Add "all" to this list to test the meta_groups feature described below # Another example: if you indexed an email archive # that defined the metanames subject name email (as in the swish-e discussion archive) # you might use: #metanames => [qw/body subject name email/], # Searching multiple meta names: # You can also group metanames into "meta-metanames". 
# Example: Say you defined metanames "author", "comment" and "keywords" # You want to allow searching "author", "comment" and the document body ("swishdefault") # But you would also like an "all" search that searches all metanames, including "keywords": # # metanames => [qw/swishdefault author comment all/], # # Now, the "all" metaname is not a real metaname. It must be expanded into its # individual metanames using meta_groups: # # "meta_groups" maps a fake metaname to a list of real metanames # # meta_groups => { # all => [qw/swishdefault author comment keywords / ], # }, # # swish.cgi will then take a query like # # all=(query words) # # and create the query # # swishdefault=(query words) OR author=(query words) OR comment=(query words) OR keywords=(query words) # # This is not ideal, but should work for most cases # (might fail under windows since the query is passed through the shell). # To enable this group add "all" to the list of metanames above meta_groups => { all => [qw/swishdefault swishtitle swishdocpath/], }, # Note that you can use other words than "all". The script just checks if a given metaname is # listed in "meta_groups" and expands as needed. # "name_labels" is used to map MetaNames and PropertyNames to user-friendly names # on the CGI form. name_labels => { swishdefault => 'Title & Body', swishtitle => 'Title', swishrank => 'Rank', swishlastmodified => 'Last Modified Date', swishdocpath => 'Document Path', swishdocsize => 'Document Size', all => 'All', # group of metanames subject => 'Message Subject', # other examples name => "Poster's Name", email => "Poster's Email", sent => 'Message Date', }, timeout => 10, # limit time used by swish when fetching results - DoS protection. # does not work under Windows max_query_length => 100, # limit length of query string. Swish also has a limit (default is 40) # You might want to set swish-e's limit higher, and use this to get a # somewhat more friendly message. 
max_chars => 500, # Limits the size of the description_prop if it is not highlighted # This structure defines term highlighting, and what type of highlighting to use # If you are using metanames in your searches and they map to properties that you # will display, you may need to adjust the "meta_to_prop_map". highlight => { # Pick highlighting module -- you must make sure the module can be found # The highlighting modules are in the example/modules directory by default # Ok speed, but doesn't handle phrases or stopwords # Deals with stemming, and shows words in context # Takes into consideration WordCharacters, IgnoreFirstChars and IgnoreLastChars. #package => 'SWISH::DefaultHighlight', # Somewhat slow, but deals with phases, stopwords, and stemming. # Takes into consideration WordCharacters, IgnoreFirstChars and IgnoreLastChars. package => 'SWISH::PhraseHighlight', # Faster: phrases without regard to wordcharacter settings # doesn't do context display, so must match in first X words, so may not even highlight # doesn't handle stemming or stopwords. #package => 'SWISH::SimpleHighlight', show_words => 10, # Number of "swish words" words to show around highlighted word max_words => 100, # If no words are found to highlighted then show this many words occurrences => 6, # Limit number of occurrences of highlighted words #highlight_on => '', # HTML highlighting codes #highlight_off => '', highlight_on => '', highlight_off => '', # This maps (real) search metatags to display properties. # e.g. if searching in "swishdefault" then highlight in the # swishtitle and swishdescription properties # Do not include "fake" metanames defined with meta_groups, just # list the real metanames used in your index, and the properties they # relate to. 
meta_to_prop_map => { swishdefault => [ qw/swishtitle swishdescription/ ], swishtitle => [ qw/swishtitle/ ], swishdocpath => [ qw/swishdocpath/ ], }, }, # If you specify more than one index file (as an array reference) you # can set this allow selection of which indexes to search. # The default is to search all indexes specified if this is not used. # When used, the first index is the default index. # You need to specify your indexes as an array reference: #swish_index => [ qw/ index.swish-e index.other index2.other index3.other index4.other / ], Xselect_indexes => { # pick radio_group, popup_menu, or checkbox_group method => 'checkbox_group', #method => 'radio_group', #method => 'popup_menu', columns => 3, # labels must match up one-to-one with elements in "swish_index" labels => [ 'Main Index', 'Other Index', qw/ two three four/ ], description => 'Select Site: ', # Optional - Set the default index if none is selected # This needs to be an index file name listed in swish_index # above, not a label default_index => '', }, # Similar to select_indexes, this adds a metaname search # based on a metaname. You can use any metaname, and this will # add an "AND" search to limit results to a subset of your records. # i.e. it adds something like 'site=(foo or bar or baz)' if foo, bar, and baz were selected. # This really just allows you to limit existing searches by a metaname, instead of # selecting a metaname (with metanames option above). # Swish-e's ExtractPath would work well with this. For example, # to allow limiting searches to specific sections of the apache docs use this # in your swish-e config file: # ExtractPath site regex !^/usr/local/apache/htdocs/manual/([^/]+)/.+$!$1! # ExtractPathDefault site other # which extracts the segment of the path after /manual/ and indexes that name # under the metaname "site". Then searches can be limited to files with that # path (e.g. 
query would be swishdefault=foo AND site=vhosts to limit searches # to the virtual host section). Xselect_by_meta => { #method => 'radio_group', # pick: radio_group, popup_menu, or checkbox_group method => 'checkbox_group', #method => 'popup_menu', columns => 3, metaname => 'site', # Can't be a metaname used elsewhere! values => [qw/misc mod vhosts other/], labels => { misc => 'General Apache docs', mod => 'Apache Modules', vhosts => 'Virtual hosts', }, description => 'Limit search to these areas: ', }, # The 'template' setting defines what generates the output # The default is "TemplateDefault" which is reasonably ugly, # but does not require installation of a separate templating system. # Note that some of the above options may not be available # for templating, as it's up to you to lay out the form # and swish-e results in your template. # TemplateDefault is the default xtemplate => { package => 'SWISH::TemplateDefault', }, xtemplate => { package => 'SWISH::TemplateDumper', }, xtemplate => { package => 'SWISH::TemplateToolkit', file => 'swish.tt', options => { INCLUDE_PATH => '@@pkgdatadir@@', #PRE_PROCESS => 'config', }, }, xtemplate => { package => 'SWISH::TemplateHTMLTemplate', options => { filename => 'swish.tmpl', path => '@@pkgdatadir@@', die_on_bad_params => 0, loop_context_vars => 1, cache => 1, }, }, # The "on_intranet" setting is just a flag that can be used to say you do # not have an external internet connection. It's here because the default # page generation includes links to images on swish-e.org and on www.w3.org. # If this is set to one then those images will not be shown. # (This only affects the default output module SWISH::TemplateDefault) on_intranet => 0, # Here you can hard-code debugging options. They will help you find # where you made your mistake ;) # Using all at once will generate a lot of messages to STDERR # Please see the documentation before using these.
# Typically, you will set these from the command line instead of in the configuration. # debug_options => 'basic, command, headers, output, summary, dump', # This defines the package object for reading CGI parameters # Defaults to CGI. Might be useful with mod_perl. # request_package => 'CGI', # request_package => 'Apache::Request', # use_library => 1, # set true and will use the SWISH::API module # will cache based on index files when running under mod_perl # Minor adjustment to page display. The page navigation normally looks like: # Page: 1 5 6 7 8 9 24 # where the first page and last page are always displayed. These can be disabled # by setting to true values ( 1 ) no_first_page_navigation => 0, no_last_page_navigation => 0, num_pages_to_show => 12, # number of pages to offer # Limit to date ranges # This adds in the date_range limiting options # You will need the DateRanges.pm module from the author to use that feature # Normally, you will want to limit by the last modified date, so specify # "swishlastmodified" as the property_name. If indexing a mail archive, and, for # example, you store the date (a unix timestamp) as "date" then specify # "date" as the property_name. date_ranges => { property_name => 'swishlastmodified', # property name to limit by # what you specify here depends on the DateRanges.pm module. time_periods => [ 'All', 'Today', 'Yesterday', #'Yesterday onward', 'This Week', 'Last Week', 'Last 90 Days', 'This Month', 'Last Month', #'Past', #'Future', #'Next 30 Days', ], line_break => 0, default => 'All', date_range => 1, }, # This is supposed to reduce the load on systems if hit with a large number # of requests. Although this will limit the number of swish-e processes run # it will not limit the number of CGI requests. I feel like a better solution # is to use mod_perl (with the SWISH::API module). # I also think that running /bin/ps for every request is not ideal. # This only works on unix-based systems when running the swish-e binary.
# It greps /swish-e/ from the output of ps and aborts if the count is > limit_procs # Set max number of swish-e binaries and ps command to run limit_procs => 0, # max number of swish processes to run (zero to not limit) ps_prog => '/bin/ps -Unobody -ocommand', # command to list number of swish binaries }; } #^^^^^^^^^^^^^^^^^^^^^^^^^ end of user config ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ #======================================================================================== #================================================================================= # mod_perl entry point # # As an example, you might use a PerlSetVar to point to paths to different # config files, and then cache the different configurations by path. # #================================================================================= my %cached_configs; sub handler { my $r = shift; if ( my $config_path = $r->dir_config( 'Swish_Conf_File' ) ) { # Already cached? # Note that this is cached for the life of the server -- must restart if you want to change config if ( $cached_configs{ $config_path } ) { process_request( $cached_configs{ $config_path } ); return Apache::Constants::OK(); } # Else, load config my $config = default_config(); $config->{config_file} = $config_path; # Merge with disk config file. $cached_configs{ $config_path } = merge_read_config( $config ); process_request( $cached_configs{ $config_path } ); return Apache::Constants::OK(); } # Otherwise, use hard-coded config my $config = default_config(); # Merge with disk config file. $config = merge_read_config( $config ); process_request( $config ); return Apache::Constants::OK(); } #============================================================================ # Read config settings from disk, and merge # Note, all errors are ignored since by default this script looks for a # config file.
# #============================================================================ sub merge_read_config { my $config = shift; set_default_debug_flags(); set_debug($config); # get from config or from %ENV return $config unless $config->{config_file}; my $return = do $config->{config_file}; # load the config file unless ( ref $return eq 'HASH' ) { # First, let's check for file not found for the default config, which we can ignore my $error = $@ || $!; if ( $config->{config_file} eq $DEFAULT_CONFIG_FILE && !-e $config->{config_file} ) { warn "Config file '$config->{config_file}': $!" if $config->{debug}; return $config; } die "Config file '$config->{config_file}': $error"; } if ( $config->{debug} || $return->{debug} ) { require Data::Dumper; print STDERR "\n---------- Read config parameters from '$config->{config_file}' ------\n", Data::Dumper::Dumper($return), "-------------------------\n"; } set_debug( $return ); # Merge settings return { %$config, %$return }; } #-------------------------------------------------------------------------------------------------- sub set_default_debug_flags { # Debug flags defined $SwishSearch::DEBUG_BASIC = 1; # Show command used to run swish $SwishSearch::DEBUG_COMMAND = 2; # Show command used to run swish $SwishSearch::DEBUG_HEADERS = 4; # Swish output headers $SwishSearch::DEBUG_OUTPUT = 8; # Swish output besides headers $SwishSearch::DEBUG_SUMMARY = 16; # Summary of results parsed $SwishSearch::DEBUG_RESULTS = 32; # Detail of results parsed $SwishSearch::DEBUG_DUMP_DATA = 64; # dump data that is sent to templating modules } #--------------------------------------------------------------------------------------------------- sub set_debug { my $conf = shift; $conf->{debug} = 0; my $debug_string = $ENV{SWISH_DEBUG} ||$conf->{debug_options}; return unless $debug_string; my %debug = ( basic => [$SwishSearch::DEBUG_BASIC, 'Basic debugging'], command => [$SwishSearch::DEBUG_COMMAND, 'Show command used to run swish'], headers => 
[$SwishSearch::DEBUG_HEADERS, 'Show headers returned from swish'], output => [$SwishSearch::DEBUG_OUTPUT, 'Show output from swish'], summary => [$SwishSearch::DEBUG_SUMMARY, 'Show summary of results'], results => [$SwishSearch::DEBUG_RESULTS, 'Show detail of results'], dump => [$SwishSearch::DEBUG_DUMP_DATA, 'Show all data available to templates'], ); $conf->{debug} = 1; my @debug_str; for ( split /\s*,\s*/, $debug_string ) { if ( exists $debug{ lc $_ } ) { push @debug_str, lc $_; $conf->{debug} |= $debug{ lc $_ }->[0]; next; } print STDERR "Unknown debug option '$_'. Must be one of:\n", join( "\n", map { sprintf(' %10s: %10s', $_, $debug{$_}->[1]) } sort { $debug{$a}->[0] <=> $debug{$b}->[0] }keys %debug), "\n\n"; exit; } print STDERR "Debug level set to: $conf->{debug} [", join( ', ', @debug_str), "]\n"; } #============================================================================ # # This is the main controller (entry point), where a config hash is passed in. # # Loads the request module (e.g. 
CGI.pm), and the output module # Also sets up debugging # #============================================================================ sub process_request { my $conf = shift; # configuration parameters # Limit number of requests - questionable value limit_swish( $conf->{limit_procs}, $conf->{ps_prog} ) if !$conf->{use_library} && $conf->{limit_procs} && $conf->{limit_procs} =~ /^\d+$/ && $conf->{ps_prog}; # Set default property used for the href link to the document $conf->{link_property} ||= 'swishdocpath'; # Use CGI.pm by default my $request_package = $conf->{request_package} || 'CGI'; load_module( $request_package ); my $request_object = $request_package->new; # load the templating module my $template = $conf->{template} || { package => 'SWISH::TemplateDefault' }; load_module( $template->{package} ); # Allow fixup within the config file if ( $conf->{request_fixup} && ref $conf->{request_fixup} eq 'CODE' ) { &{$conf->{request_fixup}}( $request_object, $conf ); } set_debug_input( $conf, $request_object ) if $conf->{debug} && !$ENV{GATEWAY_INTERFACE}; # Create search object and build a query based on CGI parameters my $search = SwishQuery->new( config => $conf, request => $request_object, ); # run the query (run if there's a query) $search->run_query; # currently, results is just the $search object if ( $search->hits ) { $search->set_navigation; # sets links } show_debug_output( $conf, $search ) if $conf->{debug}; $template->{package}->show_template( $template, $search ); } # For limiting number of swish-e binaries sub limit_swish { my ( $limit_procs, $ps_prog ) = @_; my $num_procs = scalar grep { /swish-e/ } `$ps_prog`; return if $num_procs <= $limit_procs; warn "swish.cgi - limited due to too many currently running swish-e binaries: $num_procs running is more than $limit_procs\n"; ## Abort print <<EOF;
Content-Type: text/html

<html><head><title>Too Many Requests</title></head>
<body>Too Many Requests -- Try back later</body></html>
EOF
exit; } #============================================================================ # # Loads a perl
module -- and shows a pretty web page to say the obvious # # #============================================================================ sub load_module { my $package = shift; $package =~ s[::][/]g; eval { require "$package.pm" }; if ( $@ ) { print <<EOF;
Content-Type: text/html

<html>
<head><title>Software Error</title></head>
<body><h2>Software Error</h2><p>Please check error log</p></body>
</html>
EOF
die "$0 $@\n"; } } #================================================================== # set debugging input # #================================================================== sub set_debug_input { my ( $conf, $request_object ) = @_; print STDERR 'Enter a query [all]: '; my $query = <STDIN>; $query =~ tr/\r//d; chomp $query; unless ( $query ) { print STDERR "Using 'not asdfghjklzxcv' to match all records\n"; $query = 'not asdfghjklzxcv'; } $request_object->param('query', $query ); print STDERR 'Enter max results to display [1]: '; my $max = <STDIN>; chomp $max; $max = 1 unless $max && $max =~/^\d+$/; $conf->{page_size} = $max; } #================================================================== # show debugging output # #================================================================== sub show_debug_output { my ( $conf, $results ) = @_; require Data::Dumper; if ( $results->hits ) { print STDERR "swish.cgi: returned a page of $results->{navigation}{showing} results of $results->{navigation}{hits} total hits\n"; } else { print STDERR "swish.cgi: no results\n"; } if ($conf->{debug} & $SwishSearch::DEBUG_HEADERS ) { print STDERR "\n------------- Index Headers ------------\n"; if ( $results->{_headers} ) { print STDERR Data::Dumper::Dumper( $results->{_headers} ); } else { print STDERR "No headers\n"; } print STDERR "--------------------------\n"; } if ( $conf->{debug} & $SwishSearch::DEBUG_DUMP_DATA ) { print STDERR "\n------------- Results structure passed to template ------------\n", Data::Dumper::Dumper( $results ), "--------------------------\n"; } elsif ( $conf->{debug} & $SwishSearch::DEBUG_SUMMARY ) { print STDERR "\n------------- Results summary ------------\n"; if ( $results->{hits} ) { print STDERR "$_->{swishrank} $_->{swishdocpath}\n" for @{ $results->{_results}}; } else { print STDERR "** NO RESULTS **\n"; } } elsif ( $conf->{debug} & $SwishSearch::DEBUG_RESULTS ) { print STDERR "\n------------- Results detail ------------\n"; if ( $results->{hits} ) { print
STDERR Data::Dumper::Dumper( $results->{_results} ); } else { print STDERR "** NO RESULTS **\n"; } print STDERR "--------------------------\n"; } } #================================================================================================== package SwishQuery; #================================================================================================== use Carp; # Or use this instead -- PLEASE see perldoc CGI::Carp for details # CGI::Carp doesn't help that much #use CGI::Carp; # qw(fatalsToBrowser); use SWISH::ParseQuery; #-------------------------------------------------------------------------------- # new() doesn't do much, just create the object #-------------------------------------------------------------------------------- sub new { my $class = shift; my %options = @_; my $conf = $options{config}; croak "Failed to set the swish index files in config setting 'swish_index'" unless $conf->{swish_index}; croak "Failed to specify 'swish_binary' in configuration" unless $conf->{swish_binary}; # initialize the request search hash my $sh = { prog => $conf->{swish_binary}, config => $conf, q => $options{request}, hits => 0, MOD_PERL => $ENV{MOD_PERL}, }; my $self = bless $sh, $class; # load highlight module, if requsted if ( my $highlight = $self->config('highlight') ) { $highlight->{package} ||= 'SWISH::DefaultHighlight'; SwishSearch::load_module( $highlight->{package} ); } # Fetch the swish-e query from the CGI parameters $self->set_query; return $self; } sub hits { shift->{hits} } sub config { my ($self, $setting, $value ) = @_; confess "Failed to pass 'config' a setting" unless $setting; my $cur = $self->{config}{$setting} if exists $self->{config}{$setting}; $self->{config}{$setting} = $value if $value; return $cur; } # Returns false if all of @values are not valid options - for checking # $config is what $self->config returns sub is_valid_config_option { my ( $self, $config, $err_msg, @values ) = @_; unless ( $config ) { $self->errstr( "No config 
option set: $err_msg" ); return; } # Allow multiple values. my @options = ref $config eq 'ARRAY' ? @$config : ( $config ); my %lookup = map { $_ => 1 } @options; for ( @values ) { unless ( exists $lookup{ $_ } ) { $self->errstr( $err_msg ); return; } } return 1; } sub header { my $self = shift; return unless ref $self->{_headers} eq 'HASH'; return $self->{_headers}{$_[0]} || ''; } # return a ref to an array sub results { my $self = shift; return $self->{_results} || undef; } sub navigation { my $self = shift; return unless ref $self->{navigation} eq 'HASH'; return exists $self->{navigation}{$_[0]} ? $self->{navigation}{$_[0]} : ''; } sub CGI { $_[0]->{q} }; sub swish_command { my ($self, $param_name, $value ) = @_; return $self->{swish_command} || {} unless $param_name; return $self->{swish_command}{$param_name} || '' unless $value; $self->{swish_command}{$param_name} = $value; } # For use when forking sub swish_command_array { my ($self ) = @_; my @params; my $swish_command = $self->swish_command; for ( keys %$swish_command ) { my $value = $swish_command->{$_}; if ( /^-/ ) { push @params, $_; push @params, ref $value eq 'ARRAY' ? @$value : $value; next; } # special cases if ( $_ eq 'limits' ) { push @params, '-L', $value->{prop}, $value->{low}, $value->{high}; next; } die "Unknown swish_command '$_' = '$value'"; } return @params; } sub errstr { my ($self, $value ) = @_; $self->{_errstr} = $value if $value; return $self->{_errstr} || ''; } #============================================================================== # Set query from the CGI parameters #------------------------------------------------------------------------------ sub set_query { my $self = shift; my $q = $self->{q}; # Sets the query string, and any -L limits. 
return unless $self->build_query; # Set the starting position (which is offset by one) my $start = $q->param('start') || 0; $start = 0 unless $start =~ /^\d+$/ && $start >= 0; $self->swish_command( '-b', $start+1 ); # Set the max hits my $page_size = $self->config('page_size') || 15; $self->swish_command( '-m', $page_size ); return unless $self->set_index_file; # Set the sort option, if any return unless $self->set_sort_order; return 1; } #============================================ # This returns "$self" just in case we want to seperate out into two objects later sub run_query { my $self = shift; my $q = $self->{q}; my $conf = $self->{config}; return $self unless $self->swish_command('-w'); my $time_out_str = 'Timed out'; my $timeout = $self->config('timeout') || 0; eval { local $SIG{ALRM} = sub { kill 'KILL', $self->{pid} if $self->{pid}; die $time_out_str . "\n"; }; alarm $timeout if $timeout && $^O !~ /Win32/i; $self->run_swish; alarm 0 unless $^O =~ /Win32/i; # catch zombies waitpid $self->{pid}, 0 if $self->{pid}; # for IPC::Open2 }; if ( $@ ) { warn "$0 aborted: $@"; # if $conf->{debug}; $self->errstr( $@ =~ /$time_out_str/ ? "Search timed out after $timeout seconds." : "Service currently unavailable" ); return $self; } } # Build href for repeated search via GET (forward, backward links) sub set_navigation { my $self = shift; my $q = $self->{q}; # Single string # default fields my @std_fields = qw/query metaname sort reverse/; # Extra fields could be added in the config file if ( my $extra = $self->config('extra_fields') ) { push @std_fields, @$extra; } my @query_string = map { "$_=" . $q->escape( $q->param($_) ) } grep { $q->param($_) } @std_fields; # Perhaps arrays for my $p ( qw/si sbm/ ) { my @settings = $q->param($p); next unless @settings; push @query_string, "$p=" . 
$q->escape( $_ ) for @settings; } if ( $self->config('date_ranges' ) ) { my $dr = SWISH::DateRanges::GetDateRangeArgs( $q ); push @query_string, $dr, if $dr; } $self->{query_href} = $q->script_name . '?' . join '&', @query_string; $self->{my_url} = $q->script_name; my $hits = $self->hits; my $start = $self->swish_command('-b') || 1; $start--; $self->{navigation} = { showing => $hits, from => $start + 1, to => $start + $hits, hits => $self->header('number of hits') || 0, run_time => $self->header('run time') || 'unknown', search_time => $self->header('search time') || 'unknown', }; $self->set_page ( $self->swish_command( '-m' ) ); return $self; } #============================================================ # Build a query string from swish # Just builds the -w string #------------------------------------------------------------ sub build_query { my $self = shift; my $q = $self->{q}; # set up the query string to pass to swish. my $query = $q->param('query') || ''; for ( $query ) { # trim the query string s/\s+$//; s/^\s+//; } $self->{query_simple} = $query; # without metaname $q->param('query', $query ); # clean up the query, if needed. # Read in the date limits, if any. 
This can create a new query, which is why it is here return unless $self->get_date_limits( \$query ); unless ( $query ) { $self->errstr('Please enter a query string') if $q->param('submit'); return; } if ( length( $query ) > $self->{config}{max_query_length} ) { $self->errstr('Please enter a shorter query'); return; } # Adjust the query string for metaname search # *Everything* is a metaname search # Might also like to allow searching more than one metaname at the same time my $metaname = $q->param('metaname') || 'swishdefault'; return unless $self->is_valid_config_option( $self->config('metanames') || 'swishdefault', 'Bad MetaName provided', $metaname ); # save the metaname so we know what field to highlight # Note that this might be a fake metaname $self->{metaname} = $metaname; # prepend metaname to query # expand query when using meta_groups my $meta_groups = $self->config('meta_groups'); if ( $meta_groups && $meta_groups->{$metaname} ) { $query = join ' OR ', map { "$_=($query)" } @{$meta_groups->{$metaname}}; # This is used to create a fake entry in the parsed query so highlighting # can find the query words $self->{real_metaname} = $meta_groups->{$metaname}[0]; } else { $query = $metaname . "=($query)"; } ## Look for a "limit" metaname -- perhaps used with ExtractPath # Here we don't worry about user supplied data my $limits = $self->config('select_by_meta'); my @limits = $q->param('sbm'); # Select By Metaname # Note that this could be messed up by ending the query in a NOT or OR # Should look into doing: # $query = "( $query ) AND " . $limits->{metaname} . '=(' . join( ' OR ', @limits ) . ')'; if ( @limits && ref $limits eq 'HASH' && $limits->{metaname} ) { $query .= ' and ' . $limits->{metaname} . '=(' . join( ' or ', @limits ) . 
                  ')';
    }

    $self->swish_command('-w', $query );

    return 1;
}

#========================================================================
# Get the index files from the form, or from the config settings
# Uses index numbers to hide path names
#------------------------------------------------------------------------

sub set_index_file {
    my $self = shift;

    my $q = $self->CGI;

    # Set the index file - first check for options
    my $si = $self->config('select_indexes');

    if ( $si && ref $self->config('swish_index') eq 'ARRAY' ) {

        my @choices = $q->param('si');

        if ( !@choices ) {
            if ( $si->{default_index} ) {
                $self->swish_command('-f', $si->{'default_index'});
                return 1;
            } else {
                $self->errstr('Please select a source to search');
                return;
            }
        }

        my @indexes = @{$self->config('swish_index')};

        my @selected_indexes = grep {/^\d+$/ && $_ >= 0 && $_ < @indexes } @choices;

        if ( !@selected_indexes ) {
            $self->errstr('Invalid source selected');
            return;   # false, so that the caller stops on this error
        }

        my %dups;
        my @idx = grep { !$dups{$_}++ } map { ref($_) ? @$_ : $_ } @indexes[ @selected_indexes ];
        $self->swish_command( '-f', \@idx );

    } else {
        $self->swish_command( '-f', $self->config('swish_index') );
    }

    return 1;
}

#================================================================================
# Parse out the date limits from the form or from GET request
#
#---------------------------------------------------------------------------------

sub get_date_limits {
    my ( $self, $query_ref ) = @_; # reference to query since may be modified

    my $conf = $self->{config};

    # Are date ranges enabled?
    return 1 unless $conf->{date_ranges};

    eval { require SWISH::DateRanges };
    if ( $@ ) {
        print STDERR "\n------ Can't use DateRanges feature ------------\n",
                     "\nScript will run, but you can't use the date range feature\n",
                     $@,
                     "\n--------------\n" if $conf->{debug};

        delete $conf->{date_ranges};
        return 1;
    }

    my $q = $self->{q};

    my %limits;

    unless ( SWISH::DateRanges::DateRangeParse( $q, \%limits ) ) {
        $self->errstr( $limits{dr_error} || 'Bad date range selection' );
        return;
    }

    # Store the values for later (for display on templates)
    $self->{DateRanges_time_low} = $limits{dr_time_low};
    $self->{DateRanges_time_high} = $limits{dr_time_high};

    # Allow searches just by date if not "All dates" search
    # $$$ should place some limits here, and provide a switch to disable
    # as it can bring up a lot of results.
    $$query_ref ||= 'not skaiqwdsikdeekk' if $limits{dr_time_high};

    # Now specify limits, if a range was specified
    my $limit_prop = $conf->{date_ranges}{property_name} || 'swishlastmodified';

    if ( $limits{dr_time_low} && $limits{dr_time_high} ) {
        my %limits = (
            prop => $limit_prop,
            low  => $limits{dr_time_low},
            high => $limits{dr_time_high},
        );
        $self->swish_command( 'limits', \%limits );
    }

    return 1;
}

#================================================================
# Set the sort order
# Just builds the -s string
#----------------------------------------------------------------

sub set_sort_order {
    my $self = shift;

    my $q = $self->{q};

    my $sorts_array = $self->config('sorts');
    my $sortby = $q->param('sort') || '';

    return 1 unless $sorts_array && $sortby;

    return unless $self->is_valid_config_option( $sorts_array, 'Invalid Sort Option Selected', $sortby );

    my $conf = $self->{config};

    # Now set sort option - if a valid option submitted (or you could let swish-e return the error).
    my $direction = $sortby eq 'swishrank'
        ? $q->param('reverse') ? 'asc' : 'desc'
        : $q->param('reverse') ?
'desc' : 'asc'; my @sort_params = ( $sortby, $direction ); if ( $conf->{secondary_sort} ) { my @secondary = ref $conf->{secondary_sort} ? @{ $conf->{secondary_sort} } : $conf->{secondary_sort}; push @sort_params, @secondary if $sortby ne $secondary[0]; } $self->swish_command( '-s', \@sort_params ); return 1; } #======================================================== # Sets prev and next page links. # Feel free to clean this code up! # # Pass: # $results - reference to a hash (for access to the headers returned by swish) # $q - CGI object # # Returns: # Sets entries in the $results hash # sub set_page { my ( $self, $Page_Size ) = @_; my $q = $self->{q}; my $config = $self->{config}; my $navigation = $self->{navigation}; my $start = $navigation->{from} - 1; # Current starting record index # Set start number for "prev page" and the number of hits on the prev page my $prev = $start - $Page_Size; $prev = 0 if $prev < 0; if ( $prev < $start ) { $navigation->{prev} = $prev; $navigation->{prev_count} = $start - $prev; } my $last = $navigation->{hits} - 1; # Set start number for "next page" and number of hits on the next page my $next = $start + $Page_Size; $next = $last if $next > $last; my $cur_end = $start + $self->{hits} - 1; if ( $next > $cur_end ) { $navigation->{next} = $next; $navigation->{next_count} = $next + $Page_Size > $last ? $last - $next + 1 : $Page_Size; } # Calculate pages ( is this -1 correct here? ) # Build an array of a range of page numbers. my $total_pages = int (($navigation->{hits} -1) / $Page_Size); # total pages for all results. 
    if ( $total_pages ) {

        my @pages = 0..$total_pages;

        my $show_pages = $config->{num_pages_to_show} || 12;

        # To make the number always work
        $show_pages-- unless $config->{no_first_page_navigation};
        $show_pages-- unless $config->{no_last_page_navigation};

        # If too many pages then limit
        if ( @pages > $show_pages ) {

            my $start_page = int ( $start / $Page_Size - $show_pages/2) ;
            $start_page = 0 if $start_page < 0;

            # if close to the end then move off center
            $start_page = $total_pages - $show_pages
                if $start_page + $show_pages - 1 > $total_pages;

            @pages = $start_page..$start_page + $show_pages - 1;

            # Add first and last pages, unless config says otherwise
            unshift @pages, 0
                unless $start_page == 0 || $config->{no_first_page_navigation};

            push @pages, $total_pages
                unless $start_page + $show_pages - 1 == $total_pages || $config->{no_last_page_navigation}
        }

        # Build "canned" pages HTML
        $navigation->{pages} =
            join ' ', map {
                my $page_start = $_ * $Page_Size;
                my $page = $_ + 1;
                $page_start == $start
                    ? $page
                    : qq[$page];
            } @pages;

        # Build just the raw data - an array of hashes
        # for custom page display with templates
        $navigation->{page_array} = [
            map {
                {
                    page_number => $_ + 1, # page number to display
                    page_start  => $_ * $Page_Size,
                    cur_page    => $_ * $Page_Size == $start, # flag
                }
            } @pages
        ];
    }
}

#==================================================
# Format and return the date range options in HTML
#
#--------------------------------------------------

sub get_date_ranges {
    my $self = shift;

    my $q = $self->{q};
    my $conf = $self->{config};

    return '' unless $conf->{date_ranges};

    # pass parameters, and a hash to store the returned values.
    my %fields;

    SWISH::DateRanges::DateRangeForm( $q, $conf->{date_ranges}, \%fields );

    # Set the layout:
    my $string = '
Limit to: ' . ( $fields{buttons} ? "$fields{buttons}
" : '' ) . ( $fields{date_range_button} || '' ) . ( $fields{date_range_low} ? " $fields{date_range_low} through $fields{date_range_high}" : '' ); return $string; } #============================================ # Run swish-e and gathers headers and results # Currently requires fork() to run. # # Pass: # $sh - an array with search parameters # # Returns: # a reference to a hash that contains the headers and results # or possibly a scalar with an error message. # sub run_swish { my $self = shift; my $results = $self->{results}; my $conf = $self->{config}; my $q = $self->{q}; my @properties; my %seen; # Gather up the properties we need in results for ( qw/ title_property description_prop display_props link_property/ ) { push @properties, ref $conf->{$_} ? @{$conf->{$_}} : $conf->{$_} if $conf->{$_} && !$seen{$_}++; } # Add in the default props that should be seen. for ( qw/swishrank/ ) { push @properties, $_ unless $seen{$_}; } # add in the default prop - a number must be first (this might be a duplicate in -x, oh well) unshift @properties, 'swishreccount'; $self->swish_command( -x => join( '\t', map { "<$_>" } @properties ) . '\n' ); $self->swish_command( -H => 9 ); if ( $conf->{debug} & $SwishSearch::DEBUG_COMMAND ) { require Data::Dumper; print STDERR "---- Swish parameters ----\n"; print STDERR Data::Dumper::Dumper($self->swish_command); print STDERR "\n-----------------------------------------------\n"; } # Use the swish-e library? return $self->run_library( @properties ) if $self->config('use_library'); my $fh = $^O =~ /Win32/i ? windows_fork( $conf, $self ) : real_fork( $conf, $self ); # read in from child my %stops_removed; my $unknown_output = ''; while (<$fh>) { chomp; print STDERR "$_\n" if $conf->{debug} & $SwishSearch::DEBUG_OUTPUT; tr/\r//d; # This will not work correctly with multiple indexes when different values are used. 
        if ( /^# ([^:]+):\s+(.+)$/ ) {
            my $h = lc $1;
            my $value = $2;
            $self->{_headers}{$h} = $value;

            push @{$self->{_headers}{'removed stopwords'}}, $value
                if $h eq 'removed stopword' && !$stops_removed{$value}++;

            next;
        }

        # return swish errors as a message to the script
        $self->errstr($1), return if /^err:\s*(.+)/;

        # Or, if you want to log the errors and just say "Service Unavailable" use this:
        #die "$1\n" if /^err:\s*(.+)/;

        # Found a result
        if ( /^\d/ ) {
            my %h;
            @h{@properties} = split /\t/;
            $self->add_result_to_list( \%h );
            next;

        } elsif ( /^\.$/ ) {
            last;

        } else {
            next if /^#/;
        }

        $unknown_output .= "'$_'\n";
    }

    die "Swish returned unknown output: $unknown_output\n" if $unknown_output;

    $self->{hits} = $self->{_results} ? @{$self->{_results}} : 0;
}

# Filters in place
sub html_escape {
    $_[0] = '' unless defined $_[0];
    for ($_[0]) {
        s/&/&amp;/g;
        s/</&lt;/g;
        s/>/&gt;/g;
        s/"/&quot;/g;
    }
}

#============================================================================
# Adds a result to the result list and highlight the search words
# This is a common source of bugs! The problem is that highlighting is done in this code.
# This is good, especially for the description because it is trimmed down as processing each
# result. Otherwise, would use a lot of memory. It's bad because the highlighting is
# creating html which really should be done in the template output code.
# What that means is the properties that are "searched" are run through the highlighting
# code (and thus HTML escaped) but other properties are not.
# If highlighting (and trimming) is to be kept here then either we need to
# html escape all display properties, or flag which ones are escaped.
# Since we know the ultimate output is HTML, the current method will be to escape here.

sub add_result_to_list {
    my ( $self, $props ) = @_;

    # Push the result onto the list
    push @{$self->{_results}}, $props;

    # We need to save the text of the link prop (almost always swishdocpath)
    # because all properties are escaped.
my $link_property = $self->config('link_property') || 'swishdocpath'; my $link_href = ( $self->config('prepend_path') || '' ) . $props->{$link_property}; # Replace spaces ***argh this is the wrong place to do this! *** # This doesn't really work -- file names could still have chars that need to be escaped. $link_href =~ s/\s/%20/g; # Returns hash of the properties that were highlighted my $highlighted = $self->highlight_props( $props ) || {}; my $trim_prop = $self->config('description_prop') || ''; $props->{$trim_prop} ||= '' if $trim_prop; # HTML escape all properties that were not highlighted for my $prop (keys %$props) { next if $highlighted->{$prop}; # not highlighted, so escape html_escape( $props->{$prop} ); if ( $prop eq $trim_prop ) { my $max = $self->config('max_chars') || 500; $props->{$trim_prop} = substr( $props->{$trim_prop}, 0, $max) . ' ...' if length $props->{$trim_prop} > $max; } } $props->{swishdocpath_href} = $link_href; # backwards compatible $props->{link_property} = $link_href; # backwards compatible } #======================================================================================= # This will call the highlighting module as needed. # The highlighting module MUST html escape the property. # returns a hash of properties highlighted sub highlight_props { my ( $self, $props ) = @_; # make sure we have the config we need. my $highlight_settings = $self->config('highlight') || return; my $meta_to_prop = $highlight_settings->{meta_to_prop_map} || return; # Initialize highlight module ( could probably do this once per instance ) # pass in the config highlight settings, and the swish-e headers as a hash. 
$self->{_highlight_object} ||= $highlight_settings->{package}->new( $highlight_settings, $self->{_headers} ); my $highlight_object = $self->{_highlight_object} || return; # parse the query on first result my $parsed_words = $self->header( 'parsed words' ) || die "Failed to find 'Parsed Words' in swish headers"; $self->{parsed_query} ||= ( parse_query( $parsed_words ) || return ); my %highlighted; # track which were highlighted to detect if need to trim the description # this is probably backwards -- might be better to loop through the %$props while ( my( $meta, $phrases ) = each %{$self->{parsed_query}} ) { next unless $meta_to_prop->{$meta}; # is it a prop defined to highlight? # loop through the properties for the metaname for ( @{ $meta_to_prop->{$meta} } ) { if ( $props->{$_} ) { $highlighted{$_}++ if $highlight_object->highlight( \$props->{$_}, $phrases, $_ ); } } } return \%highlighted; } #================================================================== # Run swish-e by using the SWISH::API module # my %cached_handles; sub run_library { my ( $self, @props ) = @_; SwishSearch::load_module( 'SWISH::API' ); my $indexes = $self->swish_command('-f'); print STDERR "swish.cgi: running library thus no 'output' available -- try 'summary'\n" if ($self->{config}{debug} || 0) & $SwishSearch::DEBUG_OUTPUT; eval { require Time::HiRes }; my $start_time = [Time::HiRes::gettimeofday()] unless $@; unless ( $cached_handles{$indexes} ) { my $swish = SWISH::API->new( ref $indexes ? 
join(' ', @$indexes) : $indexes ); if ( $swish->Error ) { $self->errstr( join ': ', $swish->ErrorString, $swish->LastErrorMsg ); delete $cached_handles{$indexes} if $swish->CriticalError; return; } # read headers (currently only reads one set) my %headers; my $index = ($swish->IndexNames)[0]; for ( $swish->HeaderNames ) { my @value = $swish->HeaderValue( $index, $_ ); my $x = @value; next unless @value; $headers{ lc($_) } = join ' ', @value; } $cached_handles{$indexes} = { swish => $swish, headers => \%headers, }; } my $swish = $cached_handles{$indexes}{swish}; my $headers = $cached_handles{$indexes}{headers}; $self->{_headers} = $headers; my $search = $swish->New_Search_Object; # probably could cache this, too if ( my $limits = $self->swish_command( 'limits' ) ) { $search->SetSearchLimit( @{$limits}{ qw/prop low high/ } ); } if ( $swish->Error ) { $self->errstr( join ': ', $swish->ErrorString, $swish->LastErrorMsg ); delete $cached_handles{$indexes} if $swish->CriticalError; return; } if ( my $sort = $self->swish_command('-s') ) { $search->SetSort( ref $sort ? join( ' ', @$sort) : $sort ); } my $search_time = [Time::HiRes::gettimeofday()] if $start_time; my $results = $search->Execute( $self->swish_command('-w') ); $headers->{'search time'} = sprintf('%0.3f seconds', Time::HiRes::tv_interval( $search_time, [Time::HiRes::gettimeofday()] )) if $start_time; if ( $swish->Error ) { $self->errstr( join ': ', $swish->ErrorString, $swish->LastErrorMsg ); delete $cached_handles{$indexes} if $swish->CriticalError; return; } # Add in results-related headers $headers->{'parsed words'} = join ' ', $results->ParsedWords( ($swish->IndexNames)[0] ); if ( ! 
$results->Hits ) { $self->errstr('no results'); return; } $headers->{'number of hits'} = $results->Hits; # Get stopwords removed from each index (really need to track headers per index to be correct) for my $index ( $swish->IndexNames ) { my @stopwords = $results->RemovedStopwords( $index ); push @{$headers->{'removed stopwords'}}, @stopwords if @stopwords; } # Now fetch properties $results->SeekResult( $self->swish_command( '-b' ) - 1 ); my $page_size = $self->swish_command( '-m' ); if ( $swish->Error ) { $self->errstr( join ': ', $swish->ErrorString, $swish->LastErrorMsg ); delete $cached_handles{$indexes} if $swish->CriticalError; return; } my $hit_count; while ( my $result = $results->NextResult ) { my %props; for my $prop ( @props ) { # Note, we use ResultPropertyStr instead since this is a general purpose # script (it converts dates to a string, for example). # $result->Property is a faster method and does not convert dates and numbers to strings. #my $value = $result->Property( $prop ); my $value = $result->ResultPropertyStr( $prop ); next unless $value; # ?? $props{$prop} = $value; } $hit_count++; $self->add_result_to_list( \%props ); last unless --$page_size; } $headers->{'run time'} = sprintf('%0.3f seconds', Time::HiRes::tv_interval( $start_time, [Time::HiRes::gettimeofday()] )) if $start_time; $self->{hits} = $hit_count; } #================================================================== # Run swish-e by forking # use Symbol; sub real_fork { my ( $conf, $self ) = @_; # Run swish my $fh = gensym; my $pid = open( $fh, '-|' ); die "Failed to fork: $!\n" unless defined $pid; if ( !$pid ) { # in child unless ( exec $self->{prog}, $self->swish_command_array ) { warn "Child process Failed to exec '$self->{prog}' Error: $!"; print "Failed to exec Swish"; # send this message to parent. 
            exit;
        }
    } else {
        $self->{pid} = $pid;
    }

    return $fh;
}

#=====================================================================================
# Windows work around
# from perldoc perlfork -- na, that doesn't work. Try IPC::Open2
#
sub windows_fork {
    my ( $conf, $self ) = @_;

    require IPC::Open2;
    my ( $rdrfh, $wtrfh );

    # Ok, I'll say it. Windows sucks.
    my @command = map { s/"/\\"/g; qq["$_"] } $self->{prog}, $self->swish_command_array;

    my $pid = IPC::Open2::open2($rdrfh, $wtrfh, @command );

    $self->{pid} = $pid;

    return $rdrfh;
}

1;

__END__

=head1 NAME

swish.cgi -- Example Perl script for searching with the SWISH-E search engine.

=head1 DESCRIPTION

C<swish.cgi> is a CGI script for searching with the SWISH-E search engine version 2.1-dev and above.
It returns results a page at a time, with matching words from the source document highlighted,
showing a few words of content on either side of the highlighted word.

The script is highly configurable. Features include searching multiple (or selectable) indexes,
limiting searches to a subset of documents, sorting by a number of different properties,
and limiting results to a date range.

On unix type systems the swish.cgi script is installed in the directory $prefix/lib/swish-e,
which is typically /usr/local/lib/swish-e. This can be overridden by the configure
options --prefix or --libexecdir.

The standard configuration (i.e. not using a config file) should work with most swish index files.
Customization of the parameters will be needed if you are indexing special meta data and want
to search and/or display the meta data. The configuration can be modified by editing this script
directly, or by using a configuration file (.swishcgi.conf by default). The script's configuration
file is described below.

You are strongly encouraged to get the default configuration working before making changes.
Most problems using this script are the result of configuration modifications.

The script is modular in design.
Both the highlighting code and output generation are handled by modules, which are included
in the distribution and installed in the $libexecdir/perl directory.
This allows for easy customization of the output without changing the main CGI script.

Included with the Swish-e distribution is a module to generate standard HTML output.
There are also modules and template examples to use with the popular Perl templating systems
HTML::Template and Template-Toolkit. This is very useful if your site already uses one of
these templating systems.

The HTML::Template and Template-Toolkit packages are not distributed with Swish-e.
They are available from the CPAN (http://search.cpan.org).

This script can also run basically unmodified as a mod_perl handler, providing much better
performance than running as a CGI script. Usage under mod_perl is described below.

Please read the rest of the documentation, including the C<DEBUGGING> section.

This script should work on Windows, but security may be an issue.

=head1 REQUIREMENTS

A reasonably current version of Perl. 5.00503 or above is recommended (anything older will
not be supported).

The Date::Calc module is required to use the date range feature of the script.
The Date::Calc module is also available from CPAN.

=head1 INSTALLATION

Here's an example installation session under Linux. It should be similar for other operating
systems.

For the sake of simplicity in this installation example all files are placed in web server space,
including files such as swish-e index and configuration files that would normally not be made
available via the web server. Access to these files should be limited once the script is running.
Either move the files to other locations (and adjust the script's configuration) or use features
of the web server to limit access (such as with F<.htaccess>).

Please get a simple installation working before modifying the configuration file.
Most problems reported for using this script have been due to improper configuration.

The script's default settings are set up for initial testing. By default the settings expect
to find most files and the swish-e binary in the same directory as the script.

For I<security> reasons, once you have tested the script you will want to change settings to
limit access to some of these files by the web server (either by moving them out of web space,
or using access control such as F<.htaccess>). An example of using F<.htaccess> on Apache is
given below.

It's expected that swish-e has already been unpacked, the swish-e binary has been compiled
from source, and "make install" has been run. If swish-e was installed from a vendor package
(such as from an RPM or Debian package) see that package's documentation for where files are
installed.

Example Installation:

=over 4

=item 1 Symlink or copy the swish.cgi.

Symlink (or copy if your platform or webserver does not allow symlinks) the swish.cgi script
from the installation directory to a local directory. Typically, this would be the cgi-bin
directory or a location where CGI scripts are located. In this example a new directory is
created and the script is symlinked.

    ~$ mkdir swishdir
    ~$ cd swishdir
    ~/swishdir$ ln -s /usr/local/lib/swish-e/swish.cgi

The installation directory is set at configure time with the --prefix or --libexecdir options,
but by default is in /usr/local/lib/swish-e.

=item 2 Create an index

Use an editor and create a simple configuration file for indexing your files. In this example
the Apache documentation is indexed. Finally, we run a simple query to test that the index
works correctly.

    ~/swishdir$ cat swish.conf
    IndexDir /usr/local/apache/htdocs
    IndexOnly .html .htm
    DefaultContents HTML*
    StoreDescription HTML* <body> 200000
    MetaNames swishdocpath swishtitle
    ReplaceRules remove /usr/local/apache/

If you do not have the Apache docs installed then pick another directory to index,
such as /usr/share/doc.

Create the index.

    ~/swishdir$ swish-e -c swish.conf
    Indexing Data Source: "File-System"
    Indexing "/usr/local/apache/htdocs"
    Removing very common words...
    no words removed.
    Writing main index...
    Sorting words ...
    Sorting 7005 words alphabetically
    Writing header ...
    Writing index entries ...
      Writing word text: Complete
      Writing word hash: Complete
      Writing word data: Complete
    7005 unique words indexed.
    5 properties sorted.
    124 files indexed. 1485844 total bytes. 171704 total words.
    Elapsed time: 00:00:02 CPU time: 00:00:02
    Indexing done!

Now, verify that the index can be searched:

    ~/swishdir$ swish-e -w install -m 1
    # SWISH format: 2.1-dev-25
    # Search words: install
    # Number of hits: 14
    # Search time: 0.001 seconds
    # Run time: 0.040 seconds
    1000 htdocs/manual/dso.html "Apache 1.3 Dynamic Shared Object (DSO) support" 17341
    .

Let's see what files we have in our directory now:

    ~/swishdir$ ls -1
    index.swish-e
    index.swish-e.prop
    swish.cgi
    swish.conf

=item 3 Test the CGI script

This is a simple step, but often overlooked. You should test from the command line instead
of jumping ahead and testing with the web server. See the C<DEBUGGING> section below for
more information.

    ~/swishdir$ ./swish.cgi | head
    Content-Type: text/html; charset=ISO-8859-1

    Search our site

The above shows that the script can be run directly, and generates a correct HTTP header
and HTML.

If you run the above and see something like this:

    ~/swishdir >./swish.cgi
    bash: ./swish.cgi: No such file or directory

then you probably need to edit the script to point to the correct location of your perl program.
Here's one way to find out where perl is located (again, on unix):

    ~/swishdir$ which perl
    /usr/local/bin/perl

    ~/swishdir$ /usr/local/bin/perl -v
    This is perl, v5.6.0 built for i586-linux
    ...

Good! We are using a reasonably current version of perl.

Now that we know perl is at F</usr/local/bin/perl> we can adjust the "shebang" line
in the perl script (i.e. the first line of the script):

    ~/swishdir$ pico swish.cgi
    (edit the #!
line)
    ~/swishdir$ head -1 swish.cgi
    #!/usr/local/bin/perl -w

=item 4 Test with the web server

How you do this is completely dependent on your web server, and you may need to talk to your
web server admin to get this working. Often files with the .cgi extension are automatically
set up to run as CGI scripts, but not always. In other words, this step is really up to you
to figure out!

This example shows creating a I<symbolic link> from the web server space to the directory
used above. This will only work if the web server is configured to follow symbolic links
(the default for Apache).

This operation requires root access:

    ~/swishdir$ su -c "ln -s $HOME/swishdir /usr/local/apache/htdocs/swishdir"
    Password: *********

If your account is on an ISP and your web directory is F<~/public_html> then you might just
move the entire directory:

    mv ~/swishdir ~/public_html

Now, let's make a real HTTP request:

    ~/swishdir$ GET http://localhost/swishdir/swish.cgi | head -3
    #!/usr/local/bin/perl -w
    package SwishSearch;
    use strict;

Oh, darn. It looks like Apache is not running the script and instead returning it as a
static page. Apache needs to be told that swish.cgi is a CGI script.

F<.htaccess> comes to the rescue:

    ~/swishdir$ cat .htaccess

    # Deny everything by default
    Deny From All

    # But allow just CGI script
    <Files swish.cgi>
        Options ExecCGI
        Allow From All
        SetHandler cgi-script
    </Files>

That "Deny From All" prevents access to all files (such as config and index files), and
access is allowed only to the F<swish.cgi> script.

Let's try the request one more time:

    ~/swishdir >GET http://localhost/swishdir/swish.cgi | head
    Search our site

That looks better! Now use your web browser to test.

Now, you may note that the links are not valid on the search results page. The swish config
file contained the line:

    ReplaceRules remove /usr/local/apache/

To make those links work (and assuming your web server will follow symbolic links):

    ~/swishdir$ ln -s /usr/local/apache/htdocs

BTW - "GET" used above is a program included with Perl's LWP library. If you do not have
this you might try something like:

    wget -O - http://localhost/swishdir/swish.cgi | head

and if nothing else, you can always telnet to the web server and make a basic request.

    ~/swishdir$ telnet localhost 80
    Trying 127.0.0.1...
    Connected to localhost.
    Escape character is '^]'.
    GET /swishdir/swish.cgi http/1.0

    HTTP/1.1 200 OK
    Date: Wed, 13 Feb 2002 20:14:31 GMT
    Server: Apache/1.3.20 (Unix) mod_perl/1.25_01
    Connection: close
    Content-Type: text/html; charset=ISO-8859-1

    Search our site

This may seem like a lot of work compared to using a browser, but browsers are a poor tool
for basic CGI debugging.

=back

If you have problems check the C<DEBUGGING> section below.

=head1 CONFIGURATION

If you want to change the location of the swish-e binary or the index file, use multiple
indexes, add additional metanames and properties, change the default highlighting behavior,
etc., you will need to adjust the script's configuration settings.

Again, please get a test setup working with the default parameters before making changes
to any configuration settings. Better to debug one thing at a time...

In general, you will need to adjust the script's settings to match the index file you are
searching. For example, if you are indexing a hypermail list archive you may want to make
the script use metanames/properties of Subject, Author, and Email address. Or you may wish
to provide a way to limit searches to subsets of documents (e.g. parts of your directory tree).

To make things somewhat "simple", the configuration parameters are included near the top of
the swish.cgi program.
That is the only place that the individual parameters are defined and explained, so you
will need to open up the swish.cgi script in an editor to view the options. Further questions
about individual settings should be referred to the swish-e discussion list.

The parameters are all part of a perl C<hash> structure, and the comments at the top of the
program should get you going. The perl hash structure may seem a bit confusing, but it makes
it easy to create nested and complex parameters. Syntax is important, so cut-n-paste should
be your best defense if you are not a perl programmer.

By the way, Perl has a number of quote operators. For example, to quote a string you might write:

    title => 'Search My Site',

Some options take more than one parameter, where each parameter must be quoted. For example:

    metanames => [ 'swishdefault', 'swishtitle', 'swishdocpath' ],

which assigns an array ( [...] ) of three strings to the "metanames" variable.
Lists of quoted strings are so common in perl that there's a special operator called "qw"
(quote word) to save typing all those quotes:

    metanames => [ qw/ swishdefault swishtitle swishdocpath / ],

or to use parentheses as the quote character (you can pick any):

    metanames => [ qw( swishdefault swishtitle swishdocpath ) ],

There are two options for changing the configuration settings from their default values:
one way is to edit the script directly, and the other is to use a separate configuration file.
In either case, the configuration settings are a basic perl hash reference.

Using a configuration file is described below, but contains the same hash structure.

There are many configuration settings, and some of them are commented out either by using
a "#" symbol, or by simply renaming the configuration directive (e.g. by adding an "x" to
the parameter name).

A very basic configuration setup might look like:

    return {
        title           => 'Search the Swish-e list',   # Title of your choice.
        swish_binary    => 'swish-e',                   # Location of swish-e binary
        swish_index     => 'index.swish-e',             # Location of your index file
    };

Or if searching more than one index:

    return {
        title           => 'Search the Swish-e list',
        swish_binary    => 'swish-e',
        swish_index     => ['index.swish-e', 'index2'],
    };

Both of these examples return a reference to a perl hash ( C<{ ... }> ).  In the second example,
the multiple index files are set as an array reference.

Note that in the examples above the path to the swish-e binary is relative to the current
directory.  If running under mod_perl you will need to use absolute paths.

The script can also use the SWISH::API perl module (included with the swish-e distribution in the
F<perl> directory) to access the swish-e index.  The C<use_library> option is used to enable the use
of the SWISH::API module:

    return {
        title           => 'Search the Swish-e list',
        swish_index     => ['index.swish-e', 'index2'],
        use_library     => 1,   # enable use of the SWISH::API module
    };

The module must be available via the @INC array, like all Perl modules.

Using the SWISH::API module avoids the need to fork and execute the swish-e program.  Under
mod_perl you may see a significant performance improvement when using the SWISH::API module.
Under normal CGI usage you will probably not see any speed improvements.

As mentioned above, configuration settings can be either set in the F<swish.cgi> script, or set in a
separate configuration file.  Settings in a configuration file will override the settings in the
script.

By default, the F<swish.cgi> script will attempt to read settings from the file F<.swishcgi.conf>.
For example, you might only wish to change the title used in the script.  Simply create a file
called F<.swishcgi.conf> in the same directory as the CGI script:

    > cat .swishcgi.conf
    # Example swish.cgi configuration script.
    return {
        title => 'Search Our Mailing List Archive',
    };

The settings you use will depend on the index you create with swish:

    return {
        title           => 'Search the Apache documentation',
        swish_binary    => 'swish-e',
        swish_index     => 'index.swish-e',
        metanames       => [qw/swishdefault swishdocpath swishtitle/],
        display_props   => [qw/swishtitle swishlastmodified swishdocsize swishdocpath/],
        title_property  => 'swishdocpath',
        prepend_path    => 'http://myhost/apachedocs',

        name_labels => {
            swishdefault        => 'Search All',
            swishtitle          => 'Title',
            swishrank           => 'Rank',
            swishlastmodified   => 'Last Modified Date',
            swishdocpath        => 'Document Path',
            swishdocsize        => 'Document Size',
        },
    };

The above configuration defines metanames to use on the form.  Searches can be limited to these
metanames.

"display_props" tells the script to display the property "swishlastmodified" (the last modified
date of the file), the document size, and path with the search results.

The parameter "name_labels" is a hash (reference) that is used to give friendly names to the
metanames.

Here's another example.  Say you want to search either (or both) the Apache 1.3 documentation and
the Apache 2.0 documentation, indexed separately.

    return {
        title       => 'Search the Apache Documentation',
        date_ranges => 0,
        swish_index => [ qw/ index.apache index.apache2 / ],
        select_indexes => {
            method      => 'checkbox_group',
            labels      => [ '1.3.23 docs', '2.0 docs' ],  # Must match up one-to-one to swish_index
            description => 'Select: ',
        },
    };

Now you can select either or both sets of documentation while searching.

All the possible settings are included in the default configuration located near the top of the
F<swish.cgi> script.  Open the F<swish.cgi> script with an editor to look at the various settings.
Contact the Swish-e Discussion list for help in configuring the script.

=head1 DEBUGGING

Most problems with using this script have been a result of improper configuration.  Please
get the script working with default settings before adjusting the configuration settings.
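Since a configuration file is just Perl code that returns a hash reference, one quick check is to load it outside of swish.cgi and confirm that it compiles and returns the expected structure.  The following is a minimal sketch of that idea; the helper name C<check_config> is hypothetical and is not part of swish.cgi, and the way swish.cgi itself loads the file may differ.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of swish.cgi): load a config file of the
# form "return { ... };" and confirm that it evaluates to a hash reference.
# Returns an error string, or undef if the file looks usable.
sub check_config {
    my ($file) = @_;

    return "cannot read '$file'" unless -r $file;

    # "do FILE" compiles and runs the file and returns its last value.
    # A leading "./" keeps "do" from searching @INC for relative names.
    my $path   = $file =~ m{^/} ? $file : "./$file";
    my $config = do $path;

    return "failed to compile '$file': $@" if $@;
    return "failed to run '$file': $!" unless defined $config;
    return "'$file' did not return a hash reference"
        unless ref $config eq 'HASH';

    return;    # no error
}

my $file  = shift(@ARGV) || '.swishcgi.conf';
my $error = check_config($file);
print $error ? "Problem: $error\n" : "$file looks OK\n";
```

Run it from the directory that holds the configuration file, optionally passing the file name as the first argument.  A syntax error in the file is reported here rather than in the web server's error log.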
The key to debugging CGI scripts is to run them from the command line, not with a browser. First, make sure the program compiles correctly: $ perl -c swish.cgi swish.cgi syntax OK Next, simply try running the program: $ ./swish.cgi | head Content-Type: text/html; charset=ISO-8859-1 Search our site Under Windows you will need to run the script as: C:\wwwroot\swishtest> perl swish.cgi Now, you know that the program compiles and will run from the command line. Next, try accessing the script from a web browser. If you see the contents of the CGI script instead of its output then your web server is not configured to run the script. With Apache look at settings like ScriptAlias, SetHandler, and Options. If an error is reported (such as Internal Server Error or Forbidden) you need to locate your web server's error_log file and carefully read what the problem is. Contact your web administrator for help locating the web server's error log. If you don't have access to the web server's error_log file, you can modify the script to report errors to the browser screen. Open the script and search for "CGI::Carp". (Author's suggestion is to debug from the command line -- adding the browser and web server into the equation only complicates debugging.) The script does offer some basic debugging options that allow debugging from the command line. The debugging options are enabled by setting an environment variable "SWISH_DEBUG". How that is set depends on your operating system and the shell you are using. These examples are using the "bash" shell syntax. Note: You can also use the "debug_options" configuration setting, but the recommended method is to set the environment variable. You can list the available debugging options like this: $ SWISH_DEBUG=help ./swish.cgi >outfile Unknown debug option 'help'. 
Must be one of: basic: Basic debugging command: Show command used to run swish headers: Show headers returned from swish output: Show output from swish summary: Show summary of results dump: Show all data available to templates Debugging options may be combined: $ SWISH_DEBUG=command,headers,summary ./swish.cgi >outfile You will be asked for an input query and the max number of results to return. You can use the defaults in most cases. It's a good idea to redirect output to a file. Any error messages are sent to stderr, so those will still be displayed (unless you redirect stderr, too). Here are some examples: ~/swishtest$ SWISH_DEBUG=basic ./swish.cgi >outfile Debug level set to: 1 Enter a query [all]: Using 'not asdfghjklzxcv' to match all records Enter max results to display [1]: ------ Can't use DateRanges feature ------------ Script will run, but you can't use the date range feature Can't locate Date/Calc.pm in @INC (@INC contains: modules /usr/local/lib/perl5/5.6.0/i586-linux /usr/local/lib/perl5/5.6.0 /usr/local/lib/perl5/site_perl/5.6.0/i586-linux /usr/local/lib/perl5/site_perl/5.6.0 /usr/local/lib/perl5/site_perl/5.005/i586-linux /usr/local/lib/perl5/site_perl/5.005 /usr/local/lib/perl5/site_perl .) at modules/DateRanges.pm line 107, line 2. BEGIN failed--compilation aborted at modules/DateRanges.pm line 107, line 2. Compilation failed in require at ./swish.cgi line 971, line 2. -------------- Can't exec "./swish-e": No such file or directory at ./swish.cgi line 1245, line 2. Child process Failed to exec './swish-e' Error: No such file or directory at ./swish.cgi line 1246, line 2. Failed to find any results The above indicates two problems. First problem is that the Date::Calc module is not installed. The Date::Calc module is needed to use the date limiting feature of the script. The second problem is a bit more serious. It's saying that the script can't find the swish-e binary file. In this example it's specified as being in the current directory. 
Either correct the path to the swish-e binary, or make a local copy or symlink to the swish-e binary. ~/swishtest$ cat .swishcgi.conf return { title => 'Search the Apache Documentation', swish_binary => '/usr/local/bin/swish-e', date_ranges => 0, }; Now, let's try again: ~/swishtest$ SWISH_DEBUG=basic ./swish.cgi >outfile Debug level set to: 1 ---------- Read config parameters from '.swishcgi.conf' ------ $VAR1 = { 'date_ranges' => 0, 'title' => 'Search the Apache Documentation' }; ------------------------- Enter a query [all]: Using 'not asdfghjklzxcv' to match all records Enter max results to display [1]: Found 1 results Can't locate SWISH::TemplateDefault.pm in @INC (@INC contains: modules /usr/local/lib/perl5/5.6.0/i586-linux /usr/local/lib/perl5/5.6.0 /usr/local/lib/perl5/site_perl/5.6.0/i586-linux /usr/local/lib/perl5/site_perl/5.6.0 /usr/local/lib/perl5/site_perl/5.005/i586-linux /usr/local/lib/perl5/site_perl/5.005 /usr/local/lib/perl5/site_perl .) at ./swish.cgi line 608. This means that the swish.cgi script could not locate a required module. To correct this locate where the SWISH::Template module is installed and add a "use lib" line to your configuration file (or to the swish.cgi script): ~/swishtest$ cat .swishcgi.conf use lib '/home/bill/local/lib/perl'; return { title => 'Search the Apache Documentation', date_ranges => 0, }; ~/swishtest$ SWISH_DEBUG=basic ./swish.cgi >outfile Debug level set to: 1 ---------- Read config parameters from '.swishcgi.conf' ------ $VAR1 = { 'date_ranges' => 0, 'title' => 'Search the Apache Documentation' }; ------------------------- Enter a query [all]: Using 'not asdfghjklzxcv' to match all records Enter max results to display [1]: Found 1 results That is much better! The "use lib" statement tells Perl where to look for modules by adding the path supplied to an array called @INC. Note that most modules are in the SWISH namespace. For example, the default output module is called SWISH::TemplateDefault. 
When Perl is looking for that module it is looking for the file F<SWISH/TemplateDefault.pm>.
If the "use lib" statement is set as:

    use lib '/home/bill/local/lib/perl';

then Perl will look (among other places) for the file

    /home/bill/local/lib/perl/SWISH/TemplateDefault.pm

when attempting to load the SWISH::TemplateDefault module.  Relative paths may also be used.

    use lib 'modules';

will cause Perl to look for the file:

    ./modules/SWISH/TemplateDefault.pm

relative to where the swish.cgi script is running.  (This is not true when running under mod_perl.)

Here's another common problem.  Everything checks out, but when you run the script you see the
message:

    Swish returned unknown output

Ok, let's find out what output it is returning:

    ~/swishtest$ SWISH_DEBUG=headers,output ./swish.cgi >outfile
    Debug level set to: 13

    ---------- Read config parameters from '.swishcgi.conf' ------
    $VAR1 = {
              'swish_binary' => '/usr/local/bin/swish-e',
              'date_ranges' => 0,
              'title' => 'Search the Apache Documentation'
            };
    -------------------------
    Enter a query [all]:
    Using 'not asdfghjklzxcv' to match all records
    Enter max results to display [1]:
      usage: swish [-i dir file ... ] [-S system] [-c file] [-f file] [-l] [-v (num)]
      ...
    version: 2.0
       docs: http://sunsite.berkeley.edu/SWISH-E/

    *** 9872 Failed to run swish: 'Swish returned unknown output' ***
    Failed to find any results

Oh, looks like /usr/local/bin/swish-e is version 2.0 of swish.  We need 2.1-dev and above!

=head1 Frequently Asked Questions

Here are some common questions and answers.

=head2 How do I change the way the output looks?

The script uses a module to generate output.  By default it uses the SWISH::TemplateDefault.pm
module.  The module used is selected in the swish.cgi configuration file.  Modules are located in
the example/modules/SWISH directory in the distribution, but are installed in the
$prefix/lib/swish-e/perl/SWISH/ directory.
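If you are not sure which copy of an output module Perl will actually load, you can turn the package name into the relative file name Perl looks for and check each @INC directory for it.  This is only a sketch; the helper names C<module_to_file> and C<find_module> are hypothetical and exist only for this illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: convert a package name into the relative file name
# that "require" looks for, e.g. SWISH::TemplateDefault becomes
# SWISH/TemplateDefault.pm
sub module_to_file {
    my ($package) = @_;
    ( my $file = $package ) =~ s{::}{/}g;
    return "$file.pm";
}

# Hypothetical helper: return every existing file under the given
# directories that would satisfy the package name, in search order.
sub find_module {
    my ( $package, @dirs ) = @_;
    my $file = module_to_file($package);

    return grep { -f $_ }
           map  { File::Spec->catfile( $_, $file ) }
           grep { !ref $_ } @dirs;      # skip code references in @INC
}

my $package = shift(@ARGV) || 'SWISH::TemplateDefault';
my @found   = find_module( $package, @INC );

print "$package is loaded from ", module_to_file($package), "\n";
print "  found: $_\n" for @found;
print "  not found in \@INC\n" unless @found;
```

To search an extra directory, run it as C<perl -I/home/bill/local/lib/perl script.pl>, which adds that path to @INC in the same way a "use lib" line does.  The first file listed is the one Perl would load.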
To make simple changes you can edit the installed SWISH::TemplateDefault module directly,
otherwise make a copy of the module and modify its package name.  For example, change directories
to the location of the installed module and copy the module to a new name:

    $ cp TemplateDefault.pm MyTemplateDefault.pm

Then at the top of the module adjust the "package" line to:

    package SWISH::MyTemplateDefault;

To use this module you need to adjust the configuration settings (either at the top of
F<swish.cgi> or in a configuration file):

    template => {
        package => 'SWISH::MyTemplateDefault',
    },

The module does not need to be in the SWISH namespace, and can be stored in any location as long
as the module can be found via the @INC array (i.e. modify the "use lib" statement in swish.cgi if
needed).

=head2 How do I use a templating system with swish.cgi?

In addition to the TemplateDefault.pm module, the swish-e distribution includes two other Perl
modules for generating output using the templating systems HTML::Template and Template-Toolkit.

Templating systems use template files to generate the HTML, and make maintaining the look of a
large (or small) site much easier.  HTML::Template and Template-Toolkit are separate packages and
can be downloaded from the CPAN.  See http://search.cpan.org.

Two basic templates are provided as examples for generating output using these templating systems.
The example templates are located in the F<example> directory.  The module
F<SWISH::TemplateHTMLTemplate> uses the file F<swish.tmpl> to generate its output, while the module
F<SWISH::TemplateToolkit> uses the F<swish.tt> file.  (Note: swish.tt was renamed from search.tt
Jun 03, 2004.)

To use either of these modules you will need to adjust the "template" configuration setting.
Examples for both templating systems are provided in the configuration settings near the top of
the F<swish.cgi> program.

Use of these modules is an advanced usage of F<swish.cgi>, and they are provided as examples only.
All of the output generation modules are passed a hash with the results from the search, plus
other data used to create the output page.  You can see this hash by using the debugging option
"dump" or by using the included SWISH::TemplateDumper module:

    ~/swishtest >cat .swishcgi.conf
    return {
        title       => 'Search the Apache Documentation',
        template => {
            package     => 'SWISH::TemplateDumper',
        },
    };

And run a query.  For example:

    http://localhost/swishtest/swish.cgi?query=install

=head2 Why are there three different highlighting modules?

There are three highlighting modules included with the swish-e distribution.
Each is a trade-off of speed vs. accuracy:

    SWISH::DefaultHighlight - reasonably fast, but does not highlight phrases
    SWISH::PhraseHighlight  - reasonably slow, but is reasonably accurate
    SWISH::SimpleHighlight  - fast, some phrases, but least accurate

Eh, the default is actually "PhraseHighlight".  Oh well.

All of the highlighting modules slow down the script.  Optimizations to these modules are welcome!

=head2 My ISP doesn't provide access to the web server logs

There are a number of options.  One way is to use the CGI::Carp module.  Search in the
swish.cgi script for:

    use Carp;
    # Or use this instead -- PLEASE see perldoc CGI::Carp for details
    # use CGI::Carp qw(fatalsToBrowser warningsToBrowser);

And change it to look like:

    #use Carp;
    # Or use this instead -- PLEASE see perldoc CGI::Carp for details
    use CGI::Carp qw(fatalsToBrowser warningsToBrowser);

This should be used only for debugging purposes; if used in production you may end up sending
quite ugly and confusing messages to your users' browsers.

=head2 Why does the output show (NULL)?

Swish-e displays (NULL) when attempting to display a property that does not exist in the index.
The most common reason for this message is that you did not use StoreDescription in your config
file while indexing.
    StoreDescription HTML* <body> 200000

That tells swish to store the first 200,000 characters of text extracted from the body of each
document parsed by the HTML parser.  The text is stored as property "swishdescription".

The index must be recreated after changing the swish-e configuration.

Running:

    ~/swishtest > ./swish-e -T index_metanames

will display the properties defined in your index file.

This can happen with other properties, too.  For example, this will happen when you are asking
for a property to display that is not defined in swish.

    ~/swishtest > ./swish-e -w install -m 1 -p foo
    # SWISH format: 2.1-dev-25
    # Search words: install
    err: Unknown Display property name "foo"
    .

    ~/swishtest > ./swish-e -w install -m 1 -x 'Property foo=<foo>\n'
    # SWISH format: 2.1-dev-25
    # Search words: install
    # Number of hits: 14
    # Search time: 0.000 seconds
    # Run time: 0.038 seconds
    Property foo=(NULL)
    .

To check that a property exists in your index you can run:

    ~/swishtest > ./swish-e -w not dkdk -T index_metanames | grep foo
            foo : id=10 type=70  META_PROP:STRING(case:ignore) *presorted*

Ok, in this case we see that "foo" is really defined as a property.  Now let's make sure
F<swish.cgi> is asking for "foo" (sorry for the long lines):

    ~/swishtest > SWISH_DEBUG=command ./swish.cgi > /dev/null
    Debug level set to: 3
    Enter a query [all]:
    Using 'not asdfghjklzxcv' to match all records
    Enter max results to display [1]:
    ---- Running swish with the following command and parameters ----
    ./swish-e  \
    -w  \
    'swishdefault=(not asdfghjklzxcv)'  \
    -b  \
    1  \
    -m  \
    1  \
    -f  \
    index.swish-e  \
    -s  \
    swishrank  \
    desc  \
    swishlastmodified  \
    desc  \
    -x  \
    '\t\t\t\t\t\t\t\t\n'  \
    -H  \
    9

If you look carefully you will see that the -x parameter has "fos" instead of "foo", so there's
our problem.

=head2 How do I use the SWISH::API perl module with swish.cgi?

Use the C<use_library> configuration directive:

    use_library => 1,

This will only provide improved performance when running under mod_perl or other persistent
environments.
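Because the configuration file is ordinary Perl, it can test whether SWISH::API is available before enabling it.  The following F<.swishcgi.conf> is a sketch of that idea, not a tested configuration: the variable C<$have_api> is an illustrative name, and the other settings are simply the ones from the basic example in the CONFIGURATION section.

```perl
# .swishcgi.conf -- illustrative sketch only
use strict;
use warnings;

# Try to load SWISH::API.  The eval traps the error raised when the
# module cannot be found via @INC, leaving $have_api set to 0.
my $have_api = eval { require SWISH::API; 1 } ? 1 : 0;

return {
    title        => 'Search the Swish-e list',
    swish_binary => 'swish-e',          # used when the swish-e binary is run
    swish_index  => 'index.swish-e',
    use_library  => $have_api,          # 1 enables use of the SWISH::API module
};
```

With this in place the same configuration file can be shared between a server that has the module installed and one that does not.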
=head2 Why does the "Run time" differ when using the SWISH::API module?

When using the SWISH::API module the run (and search) times are calculated within the script.
When using the swish-e binary the swish-e program reports the times.  The "Run time" may include
the time required to load and compile the SWISH::API module.

=head1 MOD_PERL

This script can be run under mod_perl (see http://perl.apache.org).
This will improve the response time of the script compared to running under CGI by loading the
swish.cgi script into the Apache web server.

You must have a mod_perl enabled Apache server to run this script under mod_perl.

Configuration is simple.  In your httpd.conf or your startup.pl file you need to load the script.
For example, in httpd.conf you can use a perl section:

    <Perl>
        use lib '/usr/local/apache/cgi-bin';  # location of the swish.cgi file
        use lib '/home/yourname/swish-e/example/modules'; # modules required by swish.cgi
        require "swish.cgi";
    </Perl>

Again, note that the paths used will depend on where you installed the script and the modules.
When running under mod_perl the swish.cgi script becomes a perl module, and therefore the script
does not need to be installed in the cgi-bin directory.  (But, you can actually use the same
script as both a CGI script and a mod_perl module at the same time, read from the same location.)

The above loads the script into mod_perl.  Then to configure the script to run add this to your
httpd.conf configuration file:

    <Location /search>
        PerlSetVar Swish_Conf_File /home/yourname/swish-e/myconfig.pl
        allow from all
        SetHandler perl-script
        PerlHandler SwishSearch
    </Location>

Note that you use the "Swish_Conf_File" setting in httpd.conf to tell the script which config
file to use.  This means you can use the same script (and loaded modules) for different search
sites (running on the same Apache server).  You can just specify different config files for each
Location and they can search different indexes and have a completely different look for each
site, but all share the same code.
B<Note> that the config files are cached in the swish.cgi script.  Changes to the config file
will require restarting the Apache server before they will be reloaded into the swish.cgi script.
This avoids calling stat() for every request.

Unlike CGI, mod_perl does not change the current directory to the location of the script, so your
settings for the swish binary and the path to your index files must be absolute paths (or
relative to the server root).

Using the SWISH::API module with mod_perl will provide the most performance improvements.
Use of the SWISH::API module can be enabled by the configuration setting C<use_library>:

    use_library => 1,

Without highlighting code enabled, using the SWISH::API module resulted in about 20 requests per
second, where running the swish-e binary slowed the script down to about 8 requests per second.

Note that the highlighting code is slow.  For the best search performance turn off highlighting.
In your config file you can add:

    highlighting => 0,  # disable highlighting

and the script will show the first 500 chars of the description (or whatever you set for
"max_chars").  Without highlighting one test was processing about 20 requests per second.  With
the "PhraseHighlight" module that dropped to a little better than two requests per second,
"DefaultHighlight" was about 2.3 requests per second, and "SimpleHighlight" was about 6 requests
per second.

Experiment with different highlighting options when testing performance.

Please post to the swish-e discussion list if you have any questions about running this
script under mod_perl.

Here are some general requests/second figures on an Athlon XP 1800+ with 1/2GB RAM, Linux 2.4.20.

                                  Highlighting Mode

                         None       Phrase      Default      Simple
    Using SWISH::API      45         1.5          2           12
    ----------------------------------------------------------------
    Using swish-e         12         1.3          1.8         7.5
    binary

As you can see the highlighting code is a limiting factor.

=head1 SpeedyCGI

SpeedyCGI (also called PersistentPerl) is another way to run Perl scripts persistently.
SpeedyCGI is good if you do not have mod_perl available or do not have root access.  SpeedyCGI
works on Unix systems by loading the script into a "back end" process and keeping it in memory
between requests.  New requests are passed to the back end process, which avoids the startup
time required by a Perl CGI script.

Install SpeedyCGI from http://daemoninc.com/ (your OS may provide a packaged version of
SpeedyCGI) and then change the first line of swish.cgi.  For example, if the speedy binary is
installed in /usr/bin/speedy, use the line:

    #! /usr/bin/speedy -w -- -t60

The -w option is passed to Perl, and all options following the double-dash are SpeedyCGI options.

Note that when using SpeedyCGI configuration data is cached in memory.  If you change the
swish.cgi configuration file (.swishcgi.conf) then touch the main swish.cgi script to force
reloading of configuration data.

=head1 Spidering

There are two ways to spider with swish-e.  One uses the "http" input method that uses code
that's part of swish.  The other way is to use the new "prog" method along with a perl helper
program called C<spider.pl>.

Here's an example of a configuration file for spidering with the "http" input method.  You can
see that the configuration is not much different than the file system input method.  (But, don't
use the http input method -- use the -S prog method shown below.)

    # Define what to index
    IndexDir http://www.myserver.name/index.html
    IndexOnly .html .htm

    IndexContents HTML* .html .htm
    DefaultContents HTML*
    StoreDescription HTML* <body> 200000
    MetaNames swishdocpath swishtitle

    # Define http method specific settings -- see swish-e documentation
    SpiderDirectory ../swish-e/src/
    Delay 0

You index with the command:

    swish-e -S http -c spider.conf

Note that this does take longer.  For example, spidering the Apache documentation on a local web
server with this method took over a minute, where indexing with the file system took less than
two seconds.  Using the "prog" method can speed this up.
Here's an example configuration file for using the "prog" input method:

    # Define the location of the spider helper program
    IndexDir ../swish-e/prog-bin/spider.pl

    # Tell the spider what to index.
    SwishProgParameters default http://www.myserver.name/index.html

    IndexContents HTML* .html .htm
    DefaultContents HTML*
    StoreDescription HTML* <body> 200000
    MetaNames swishdocpath swishtitle

Then to index you use the command:

    swish-e -c prog.conf -S prog -v 0

Spidering with this method took nine seconds.

=head1 Stemmed Indexes

Many people enable a feature of swish called word stemming to provide "fuzzy" search options to
their users.  The stemming code does not actually find the "stem" of a word; rather, it removes
and/or replaces common endings on words.  Stemming is far from perfect, and many words do not
stem as you might expect.  Plus, currently only English is supported.  But, it can be a helpful
tool for searching your site.  You may wish to create both a stemmed and non-stemmed index, and
provide a checkbox for selecting the index file.

To enable a stemmed index you simply add to your configuration file:

    UseStemming yes

If you want to use a stemmed index with this program and continue to highlight search terms you
will need to install a perl module that will stem words.  This section explains how to do this.

The perl module is included with the swish-e distribution.  It can be found in the examples
directory (where you found this file) and called something like:

    SWISH-Stemmer-0.05.tar.gz

The module should also be available on CPAN (http://search.cpan.org/).

Here's an example session for installing the module.  (There will be quite a bit of output
when running make.)

    % gzip -dc SWISH-Stemmer-0.05.tar.gz |tar xof -
    % cd SWISH-Stemmer-0.05
    % perl Makefile.PL
    or
    % perl Makefile.PL PREFIX=$HOME/perl_lib
    % make
    % make test

    (perhaps su root at this point if you did not use a PREFIX)
    % make install
    % cd ..
Use the B<PREFIX> if you do not have root access or you want to install the modules
in a local library.  If you do use a PREFIX setting, add a C<use lib> statement to the top of this
swish.cgi program.

For example:

    use lib qw(
        /home/bmoseley/perl_lib/lib/site_perl/5.6.0
        /home/bmoseley/perl_lib/lib/site_perl/5.6.0/i386-linux/
    );

Once the stemmer module is installed, and you are using a stemmed index, the C<swish.cgi> script
will automatically detect this and use the stemmer module.

=head1 DISCLAIMER

Please use this CGI script at your own risk.

This script has been tested and used without problem, but you should still be aware that
any code running on your server represents a risk.  If you have any concerns please carefully
review the code.

See http://www.w3.org/Security/Faq/www-security-faq.html

Security on Windows questionable.

=head1 SUPPORT

The SWISH-E discussion list is the place to ask for any help regarding SWISH-E or this example
script.  See http://swish-e.org.

Before posting please review:

    http://swish-e.org/2.2/docs/INSTALL.html#When_posting_please_provide_the_

Please do not contact the author or any of the swish-e developers directly.

=head1 LICENSE

swish.cgi $Revision: 1830 $ Copyright (C) 2001 Bill Moseley search@hank.org
Example CGI program for searching with SWISH-E

This program is free software; you can redistribute it and/or
modify it under the terms of the GNU General Public License
as published by the Free Software Foundation; either version
2 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU General Public License for more details.

=head1 AUTHOR

Bill Moseley

=cut

swish-e-2.4.7/example/swish.tmpl0000775000077100017500000001520311166010111013506 00000000000000
<TMPL_IF RESULTS>Results for: <TMPL_VAR QUERY_SIMPLE><TMPL_ELSE><TMPL_VAR TITLE></TMPL_IF>
Swish-e home page

Limit search to:
Sort by: Reverse Sort
 Results for   to of results. Run time: | Search time:    
  << [] >>  Page: Previous Next

-- rank:
...

:
bytes.


Pages:
Previous Next


Powered by Swish-e swish-e.org Powered by:

Valid HTML 4.01!

swish-e-2.4.7/prog-bin/0000777000077100017500000000000011166013170011623 500000000000000swish-e-2.4.7/prog-bin/index_hypermail.pl0000775000077100017500000002101611166010113015254 00000000000000# !/usr/bin/perl -w use strict; ## See documentation below. Script may require customization ## read documentation with "perldoc index_hypermail.pl" use File::Find; use Date::Parse; use HTML::TreeBuilder; use Data::Dumper; ## This is the string that is removed while indexing from email addresses ## as defined in the hypermailrc file. # #---------------------- config ----------------------------------------------------- my $dumb_spamblock = '(at)not-real.'; #------------------------------------------------------------------------------------ my $dir = shift || die "must specfy directory to search"; debug(@ARGV) if $dir eq 'debug'; # Do all the work find( { wanted => \&wanted }, $dir ); sub wanted { return if -d; # don't need to process directories return unless /^\d+\.html$/; # If you want it to parse using HTML::Parser use the first line # and comment out the second. 
But it's a LOT slower #output_file( $File::Find::name, parse_file($_) ); output_file( $File::Find::name, fast_parse($_) ); } sub output_file { my ( $file, $data ) = @_; local $SIG{__WARN__} = sub { "$file: @_" }; # Get last_mod date my $date = str2time( $data->{comments}{received} ); unless ( $date ) { warn "Failed to parse received date in $file\n"; $date = str2time( $data->{comments}{sent} ); unless ( $date ) { warn "Failed to parse any dates: skipping $file\n"; return; } } $data->{received} = $date; my $comments = $data->{comments}; $comments->{email} =~ s/\Q$dumb_spamblock/-blabla-/; my $metas = join "\n", map { qq[] } sort keys %{$data->{comments}}; my $title = $comments->{subject} || ''; my $html = < $title $metas $data->{body} EOF my $bytecount = length pack 'C0a*', $html; print <new; $tree->store_comments(1); # meta data is in the comments $tree->warn(1); $tree->parse_file( $file ); my %comments; # Extract out metadata for ( $tree->look_down( '_tag', '~comment' )) { my $comment = $_->attr("text"); $comments{$1} = $2 if $comment =~ /(\w+)="([^"]+)/; } $data{comments} = \%comments if %comments; # should die here if not. # Extract out the searchable content my $body = $tree->look_down('_tag', 'div', 'class', 'mail'); unless ( $body ) { warn "$file: failed to find
\n"; return; } # Remove some sub-nodes we don't care about $body->look_down('_tag', 'address', 'class', 'headers')->delete; $body->look_down('_tag', 'span', 'id', 'received')->delete; $data{body} = $body->as_HTML; $tree->delete; return \%data; } sub fast_parse { my $file = shift; local $_; unless ( open FH, "<$file" ) { warn "Failed to open '$file'. Error: $!"; return; } my %data; my %comments; # First parse out the comments while () { if ( my( $tag, $content) = /$/ ) { unless ( $content ) { warn "File '$file' tag '$tag' empty content\n"; next; } last if $tag eq 'body'; # no more comments in this section $comments{$tag} = $content; } } $data{comments} = \%comments; # Now grab the content my $end_str; # for skipping sections my $body = ''; while ( ) { # loo for ending tag, or maybe even the signature last if // || /^-- $/ || /^--$/ || /^(_|-){40,}\s*$/; # Look for ending tag for a skipped tag set if ( $end_str ) { $end_str = '' if /\Q$end_str/; next; } # These are sections to skip if ( /\Q
100000
    UndefinedMetaTags ignore

Copy index_hypermail.pl to the current directory.  Swish-e installs
index_hypermail.pl in the $prefix/share/doc/swish-e/examples/prog-bin directory,
where $prefix is typically "/usr/local" or simply "/usr" on some distributions.

    $ cp /usr/local/share/doc/swish-e/examples/prog-bin/index_hypermail.pl .

Then index the documents:

    $ swish-e -c swish.conf -S prog

Now create the search interface:

    $ cp /usr/local/lib/swish-e/swish.cgi .

    $ cat .swishcgi.conf
    $ENV{TZ} = 'UTC';  # display dates in UTC format

    return {
        title           => "Search the Foo List Archive",
        display_props   => [qw/ name email swishlastmodified /],
        sorts           => [qw/swishrank swishtitle email swishlastmodified/],
        metanames       => [qw/swishdefault swishtitle name email/],
        name_labels     => {
            swishrank           => 'Rank',
            swishtitle          => 'Subject Only',
            name                => "Poster's Name",
            email               => "Poster's Email",
            swishlastmodified   => 'Message Date',
            swishdefault        => 'Subject & Body',
        },

        highlight       => {
            package         => 'SWISH::PhraseHighlight',
            xhighlight_on   => '',
            xhighlight_off  => '',
            meta_to_prop_map => {   # this maps search metatags to display properties
                swishdefault    => [ qw/swishtitle swishdescription/ ],
                swishtitle      => [ qw/swishtitle/ ],
                email           => [ qw/email/ ],
                name            => [ qw/name/ ],
                swishdocpath    => [ qw/swishdocpath/ ],
            },
        },
    };

Set up the web server (OS/web server dependent):

    /var/www # ln -s /path/to/hypermail/search
    /var/www # ln -s /path/to/hypermail/archive

and maybe tell apache to run the script:

    $ cat .htaccess
    Deny from all
    Allow from all
    SetHandler cgi-script
    Options +ExecCGI

=head1 DESCRIPTION

This script is used to parse files produced by hypermail.  Last tested with
hypermail pre-2.1.9.

It scans the directory passed as the first parameter for files matching \d+\.html
and then extracts out the content, email, name and subject.  This is then passed
to swish-e for indexing.

The swish.cgi script is used for searching the resulting index.
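For reference, a program used with the "prog" input method writes each document to standard output as a small block of headers, a blank line, and then the content.  The sketch below shows one way to build such a record; the function name C<prog_record> and the sample path are hypothetical, and the script's own output code may differ in detail.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: wrap one document in the header block that swish-e
# reads from a "prog" program's standard output.
sub prog_record {
    my ( $path, $mtime, $html ) = @_;

    # Content-Length is a count of bytes, not characters.
    my $bytes = do { use bytes; length $html };

    return join '',
        "Path-Name: $path\n",
        "Content-Length: $bytes\n",
        "Last-Mtime: $mtime\n",        # seconds since the epoch
        "Document-Type: HTML*\n",      # parse the content with the HTML parser
        "\n",                          # blank line ends the headers
        $html;
}

# Example: emit a single (made-up) message as it would be sent to swish-e.
print prog_record( 'archive/0001.html', time, "<html><body>hello</body></html>\n" );
```

Swish-e reads exactly Content-Length bytes after the blank line, so the length must match the content that is actually printed.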
Configuration settings are stored in the .swishcgi.conf file located in the current directory. By default, swish.cgi expects the current working directory to be the location of the cgi script. On other web servers this may not be the case and you will need to edit swish.cgi to use absolute path names for .swishcgi.conf and the index files. =head1 USAGE See the SYNOPSIS above. If you do not use the directory structure above you may need to use ReplaceRules in the swish-e config file to adjust the paths stored in the swish-e index file. =head1 COPYRIGHT This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself. =head1 SEE ALSO Hypermail can be downloaded from: http://hypermail.org =head1 AUTHOR Bill Moseley moseley@hank.org. 2004 =head1 SUPPORT Please contact the Swish-e discussion email list for support with this module or with Swish-e. Please do not contact the developers directly. swish-e-2.4.7/prog-bin/Makefile.am0000664000077100017500000000214511166010113013571 00000000000000perlmoduledir = $(libexecdir)/perl exampledir = $(datadir)/doc/$(PACKAGE)/examples/prog-bin libexec_SCRIPTS = spider.pl DirTree.pl # These are really outdated perlmodule_SCRIPTS = \ doc2txt.pm \ pdf2html.pm \ pdf2xml.pm CLEANFILES = spider.pl DirTree.pl # This is done here to stay in the GNU coding standards # libexecdir can be modified at make time, so can't use # variable substitution at configure time spider.pl: spider.pl.in @rm -f spider.pl @sed \ -e 's,@@perlmoduledir@@,$(libexecdir)/perl,' \ -e 's,@@swishbindir@@,$(bindir),' \ -e 's,@@perlbinary@@,$(PERL),' \ $(srcdir)/spider.pl.in > spider.pl DirTree.pl: DirTree.pl.in @rm -f DirTree.pl @sed \ -e 's,@@perlmoduledir@@,$(libexecdir)/perl,' \ -e 's,@@swishbindir@@,$(bindir),' \ -e 's,@@perlbinary@@,$(PERL),' \ $(srcdir)/DirTree.pl.in > DirTree.pl other_examples = \ README \ file.pl \ SwishSpiderConfig.pl \ MySQL.pl \ index_hypermail.pl \ pdf2xml.pm \ pdf2html.pm \ doc2txt.pm example_DATA =
$(other_examples) EXTRA_DIST = \ spider.pl.in \ DirTree.pl.in \ $(other_examples) swish-e-2.4.7/prog-bin/doc2txt.pm0000775000077100017500000000462511166010113013472 00000000000000package doc2txt; use strict; =pod =head1 NAME doc2txt - swish-e sample module to convert MS Word docs to text =head1 SYNOPSIS use doc2txt; my $doc_record_ref = doc2txt( $doc_file_name ); # or by passing content in a scalar reference my $doc_text_ref = doc2txt( \$doc_content ); =head1 DESCRIPTION Sample module for use with other swish-e 'prog' document source programs. Pass either a file name, or a scalar reference. The difference is that when you pass a reference to a scalar, only the content is returned. When you pass a file name an entire record is returned ready to be fed to swish -- this includes the headers required by swish for indexing. =head1 REQUIREMENTS Uses the catdoc program. http://www.fe.msk.ru/~vitus/catdoc/ You may need to adjust the parameters used to call catdoc. You will also need the module File::Temp available from CPAN if passing content to this module (instead of a file name). I'm not thrilled about how that currently works... =head1 AUTHOR Bill Moseley =cut use Symbol; use vars qw( @ISA @EXPORT $VERSION ); # $Id: doc2txt.pm 1279 2003-06-12 04:00:45Z whmoseley $ $VERSION = sprintf '%d.%02d', q$Revision: 1279 $ =~ /: (\d+)\.(\d+)/; require Exporter; @ISA = qw(Exporter); @EXPORT = qw(doc2txt); my @InfoTags = qw/Title Subject Author CreationDate Creator Producer ModDate Keywords/; my $catdoc = 'catdoc -a'; # how catdoc is called. Rainer uses catdoc -s8859-1 -d8859-1 sub doc2txt { my $file_or_content = shift; my $file = ref $file_or_content ?
create_temp_file( $file_or_content ) : $file_or_content; # This doesn't work my $path = $file; for ( $path ) { s/"/\\"/g; $path = qq["$path"]; } my $content = `$catdoc $path`; return \$content if ref $file_or_content; # otherwise build the headers my $mtime = (stat $file )[9]; my $size = length $content; my $ret = < 1 ); print $fh $$scalar_ref or die $!; close $fh or die "Failed to close '$file_name' $!"; return $file_name; } swish-e-2.4.7/prog-bin/DirTree.pl.in0000775000077100017500000002437711166010113014053 00000000000000#!@@perlbinary@@ -w use strict; ## Run this program with the -man option for documentation ## # This is set to where Swish-e's "make install" installed the helper modules. use lib ( '@@perlmoduledir@@' ); use File::Find; # for recursing a directory tree use Getopt::Long; use Pod::Usage; #--------------- User Configuration Section ------------------------ # File extensions that indicate text files even though SWISH::Filter thinks # they might be binary based on mime.types my @not_binary_extensions = qw/ .pl .pm .c .conf rc /; # Subroutines to validate files and directories. # Return true if file is ok to process or false to skip the file/directory # The Default for these functions is to return true and process all # files and directories. 
sub check_path { my $path = shift; return 1; # return true to process this file } sub check_dir { my $dir = shift; return 1; # return true to process this directory } #-------------------- End User Config ------------------------------------ my $extensions = join '|', map { quotemeta } @not_binary_extensions; my $textre = qr/($extensions)$/; my %options; GetOptions( \%options, 'verbose!', 'debug!', 'symlinks!', 'path', 'man', 'no_skip', ) || pod2usage(2); pod2usage( -verbose => 2 ) if $options{man}; if ( $options{path} ) { print '@@perlmoduledir@@',"\n"; exit; } pod2usage("Must supply at least one directory") unless @ARGV; $ENV{FILTER_DEBUG} = 1 if $options{debug}; $options{verbose} = 1 if $options{debug}; # See perldoc File::Find for information on following symbolic links # and other important topics. use constant DEBUG => 0; # Try to load the filter module eval { require SWISH::Filter }; warn "Failed to load SWISH::Filter [$@]\n" if $@ && $options{debug}; my $filter = SWISH::Filter->new unless $@; find( { wanted => \&wanted, no_chdir => 1, # 5.6 feature follow => $options{symlinks}, }, @ARGV, ); sub wanted { my $path = $File::Find::name; if ( -d ) { #stat if ( !check_dir( $path ) ) { $File::Find::prune = 1; warn "Skipped dir [$path] by user function check_dir()\n" if $options{verbose}; } return; } if ( !-r _ ) { warn "$File::Find::name is not readable\n" if $options{verbose}; return; } my $mtime = (stat _ )[9]; if ( !check_path( $path ) ) { warn "Skipped path [$path] by user function check_path()\n" if $options{verbose}; return; } if ( $filter ) { my $doc = $filter->convert( document => $path, ); unless ( $doc ) { if ( $options{no_skip} ) { process_file( $path, $mtime ); return; } warn "Failed [$path] SWISH::Filter->convert failed.\n" if $options{verbose}; return; } if ( $doc->is_binary && $path !~ /$textre/ ) { # ignore "binary" files (not text/* mime type) warn "Skipping [$path] due to content type: " .
$doc->content_type .": may be binary\n" if $options{verbose}; return; } my $bytes = output_document( $path, $doc->fetch_doc, $mtime, $doc->swish_parser_type ); if ( $options{verbose} ) { print STDERR "Indexed [$path] ", ($doc->was_filtered ? "(Was filtered) " : "(Not filtered) "), $doc->content_type . " ", ($doc->swish_parser_type || '(parser unspecified)'), " ($bytes bytes)", "\n"; } return; } # Otherwise, fetch document manually process_file( $path, $mtime ); } sub process_file { my ( $path, $mtime ) = @_; unless ( open FH, '<', $path ) { warn "Failed to open '$path': $!\n"; return; } local $/ = undef; my $content = <FH>; close FH; my $bytes = output_document( $path, \$content, $mtime ); if ( $options{verbose} ) { print STDERR "Indexed [$path] (not processed with SWISH::Filter) ($bytes bytes)\n"; } } sub output_document { my ( $path, $content_ref, $mtime, $parser_type ) = @_; # Get the length of the content - have to worry about multi-byte content # ugly and maybe expensive, but perhaps more portable than "use bytes" my $bytecount = length pack 'C0a*', $$content_ref; my $header = "Path-Name: $path\nContent-Length: $bytecount\nLast-Mtime: $mtime\n"; $header .= "Document-Type: $parser_type\n" if $parser_type; print $header . "\n" . $$content_ref; return $bytecount; # callers report this as the number of bytes indexed } __END__ =head1 NAME DirTree.pl - program to fetch local documents for Swish-e =head1 SYNOPSIS DirTree.pl [options] directory | swish-e -S prog -i stdin Options: -verbose Display processing info -debug Enable debugging (including SWISH::Filter debugging) -man Display documentation -path Display location lib path set at installation -no_skip Process documents even if filtering fails -symlinks Follow symbolic links. Default is to NOT follow symlinks =head1 DESCRIPTION DirTree.pl is an example Perl script that can be used with Swish-e to fetch documents from the local file system. It works somewhat like Swish-e's default -S fs input method (reading from the file system).
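Each document is written to standard output as a few headers, a blank line, and then the content, which is what output_document() in this script does. The following is a minimal, stand-alone sketch of one such record. The helper name format_record() is hypothetical and used only for illustration, and it counts bytes with "use bytes" rather than the pack() approach used in the script.

```perl
use strict;
use warnings;

# Build one "prog" record: headers, a blank line, then the content.
# format_record() is a hypothetical helper, not part of DirTree.pl.
sub format_record {
    my ( $path, $content_ref, $mtime, $parser_type ) = @_;

    # Content-Length must be a byte count, not a character count.
    my $bytecount = do { use bytes; length $$content_ref };

    my $header = "Path-Name: $path\n"
               . "Content-Length: $bytecount\n"
               . "Last-Mtime: $mtime\n";

    # Document-Type is optional; it is only sent when a parser type is known.
    $header .= "Document-Type: $parser_type\n" if $parser_type;

    return $header . "\n" . $$content_ref;
}

# The path and content here are made-up sample values.
my $content = "<html><body>Hello</body></html>\n";
print format_record( 'docs/hello.html', \$content, time );
```

Piping the output of such a script to swish-e -S prog -i stdin, as in the SYNOPSIS, should index the single sample document.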
DirTree.pl will attempt to load the SWISH::Filter module for use in filtering documents (e.g. PDF or MS Word). DirTree.pl is a thin wrapper around Perl's File::Find module. Before modifying this script for your own use please read the documentation for File::Find: $ perldoc File::Find IMPORTANT: By default DirTree.pl will attempt to index all files in the directories and sub-directories supplied. It's expected that you will customize this script for your own needs. When using -S prog many of the features available to select or exclude files that can be specified in the swish-e config file will have no effect. It's expected that checks on files will be added to the DirTree.pl program. This is much more powerful and allows more control, but requires more work to set up. There are two skeleton functions at the top of DirTree.pl that can be modified for filtering what gets indexed: check_path() and check_dir(). Both are passed in the path or directory name as their only parameter. Return FALSE to skip the given path or directory. Here are two examples: # Skip all .wav files. sub check_path { my $path = shift; return if $path =~ /\.wav$/; # return false if ends in .wav return 1; # otherwise return true } # Skip all directories that start with a dot (hidden dirs) sub check_dir { my $dir = shift; return $dir !~ m[^\.]; # return true if does not start with a dot } Those are called for each file or directory processed. The File::Find module also provides a preprocess option where all the files and directories in a directory are passed in as a list to a subroutine. This list can be filtered and passed back to File::Find. This would be useful if, say, you wanted to skip a directory if a file "noindex" existed in the directory. See perldoc File::Find for details. =head2 Filtering Filtering is the process of converting a document that swish-e cannot index into a document that swish-e can index. The SWISH::Filter module is used for filtering documents.
SWISH::Filter is part of the swish-e distribution and was installed at the same time Swish-e was installed. SWISH::Filter uses "helper" programs to do the actual filtering. For example, to filter PDF files you would need to have the Xpdf package installed (included with the Windows version of Swish-e). When SWISH::Filter is first loaded it determines which filters are available. SWISH::Filter uses the MIME::Types module to convert a file name into a MIME type (e.g. .doc => application/msword) and that type is used to determine what filter to use, if any. Filters convert the document to a new MIME type (e.g. the MS Word filter might convert the document to text/html or text/plain). Binary Files After filtering, this program (DirTree.pl) then checks to see if the file is a binary file. This is a very simple test that simply looks for "text/" at the start of the MIME type. Clearly, this is incorrect for many MIME types. For example, if you were indexing Perl scripts of type "application/x-perl" this program would think the file was binary and not index it. At the top of the program is a list of file endings that tell DirTree.pl that they should be indexed even if their MIME type does not start with "text/". Another problem is that some files will not map to a MIME type. The best solution is to add the file ending and MIME type to your mime.types file. But, if you just want to index any file that does not have a MIME type use the -no_skip option. =head1 REQUIREMENTS To use the SWISH::Filter module you will need the helper applications installed. Check with your OS packages or Google for sources. PDF conversion requires the Xpdf package MS Word conversion requires the Catdoc package The Windows version of Swish-e includes Xpdf and Catdoc packages. For content type matching install the Perl MIME::Types module. =head1 OPTIONS A few options may be passed to DirTree.pl: =over 8 =item B<-verbose> Produces information about each file as it is processed.
=item B<-debug> Enables detailed debugging. SWISH::Filter debugging is also enabled. =item B<-no_skip> When set, documents that fail processing with SWISH::Filter will still be processed. Typically this means documents where a content-type could not be determined. Make sure you have the MIME::Types module installed. =item B<-symlinks> When specified will recurse into directories that are symbolic links. The default is to NOT recurse into symbolic links. This option sets the "follow" option in the File::Find module. =back =head1 BUGS May not work well on multi-byte input files. In order to work on Windows (where two chars are used to terminate lines) this program reads the ENTIRE file into memory so that an accurate byte count can be made. Therefore, it's probably a good idea not to index files that are too big. =head1 SUPPORT Contact the Swish-e discussion list. See: http://swish-e.org swish-e-2.4.7/prog-bin/README0000664000077100017500000000347111166010113012420 00000000000000These are example scripts that you can use with the "prog" document source feature of Swish-e. The "prog" document source feature of Swish-e allows you to index any type of document, provided you can convert the document into a format that Swish-e can parse (text, html, or xml). spider.pl Working example of a web spider. This program is a full-featured spider that is fully customizable through its configuration file. Note: spider.pl is installed in the scripts directory. Running swish-e -h will display the scripts directory. SwishSpiderConfig.pl Example configuration file for the spider.pl program file.pl A very simple example of a program that feeds documents to swish. Its purpose is to demonstrate how to write a program for use with Swish-e's "prog" input method. DirTree.pl A slightly more advanced example that reads a directory tree and indexes a few file types. Uses the SWISH::Filter module to convert files such as PDF.
Its purpose is to demonstrate how to write a program for use with Swish-e's "prog" input method. MySQL.pl Another simple example that shows how to index data stored in a MySQL database. Instructions are included on how to configure the swish.cgi program. index_hypermail.pl An example program for indexing mailing list archives that are created with the popular Hypermail program. pdf2xml.pm and pdf2html.pm Perl modules to convert pdf to xml or html documents for indexing. Requires the pdftotext program. Type perldoc pdf2xml.pm or perldoc pdf2html.pm from the prog-bin directory for documentation. doc2txt.pm Perl module to convert MS Word documents to text. Requires the catdoc program. Type perldoc doc2txt.pm from the prog-bin directory for documentation. Note: The modules to convert PDF and MS Word documents are outdated. See SWISH::Filter for more information. swish-e-2.4.7/prog-bin/pdf2xml.pm0000775000077100017500000000720711166010113013456 00000000000000package pdf2xml; use strict; =pod =head1 NAME pdf2xml - swish-e sample module to convert pdf to xml =head1 SYNOPSIS use pdf2xml; my $xml_record_ref = pdf2xml( $pdf_file_name ); # or by passing content in a scalar reference my $xml_text_ref = pdf2xml( \$pdf_content ); =head1 DESCRIPTION Sample module for use with other swish-e 'prog' document source programs. Pass either a file name, or a scalar reference. The difference is that when you pass a reference to a scalar, only the content is returned. When you pass a file name an entire record is returned ready to be fed to swish -- this includes the headers required by swish for indexing. The plan is to find a library that will do this to avoid forking an external program. =head1 REQUIREMENTS Uses the xpdf package that includes the pdftotext conversion program. This is available from http://www.foolabs.com/xpdf/xpdf.html. You will also need the module File::Temp (and its dependencies) available from CPAN if passing content to this module (instead of a file name).
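When content is passed by reference, it first has to be written to a temporary file so that the external pdftotext program can read it; this is why File::Temp is needed. A minimal sketch of that step follows, assuming File::Temp is installed. The helper name write_temp() and the sample content are hypothetical and used only for illustration.

```perl
use strict;
use warnings;
use File::Temp ();

# Write the referenced content to a temporary file and return its name.
# write_temp() is a hypothetical helper used only in this sketch.
sub write_temp {
    my $scalar_ref = shift;

    # UNLINK => 1 asks File::Temp to remove the file when the program exits.
    my ( $fh, $file_name ) = File::Temp::tempfile( UNLINK => 1 );

    binmode $fh;    # PDF data is binary
    print {$fh} $$scalar_ref or die "Failed to write '$file_name': $!";
    close $fh or die "Failed to close '$file_name': $!";

    return $file_name;
}

# Made-up sample bytes standing in for real PDF content.
my $pdf_content = "%PDF-1.4 sample bytes";
my $tmp = write_temp( \$pdf_content );
print "wrote ", -s $tmp, " bytes to $tmp\n";
```

The returned file name can then be handed to an external converter in the same way a regular file name would be.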
=head1 AUTHOR Bill Moseley =cut use Symbol; use vars qw( @ISA @EXPORT $VERSION ); # $Id: pdf2xml.pm 1279 2003-06-12 04:00:45Z whmoseley $ $VERSION = sprintf '%d.%02d', q$Revision: 1279 $ =~ /: (\d+)\.(\d+)/; require Exporter; @ISA = qw(Exporter); @EXPORT = qw(pdf2xml); my @InfoTags = qw/Title Subject Author CreationDate Creator Producer ModDate Keywords/; sub pdf2xml { my $file_or_content = shift; my $file = ref $file_or_content ? create_temp_file( $file_or_content ) : $file_or_content; my $headers = get_pdf_headers( $file ) || ''; my $content_ref = get_pdf_content_ref( $file ); my $txt = < $headers $$content_ref EOF return \$txt if ref $file_or_content; my $mtime = (stat $file )[9]; my $size = length $txt; my $ret = <) { if ( /^\s*([^:]+):\s+(.+)$/ ) { my ( $metaname, $value ) = ( lc( $1 ), escapeXML( $2 ) ); $metaname =~ tr/ /_/; $metadata{$metaname} = $value; } } close $sym or die "$0: Failed close on pipe to pdfinfo for $file: $?"; return join "\n", map { "<$_>$metadata{$_}</$_>" } sort keys %metadata; } sub get_pdf_content_ref { my $file = shift; # This doesn't work my $path = $file; for ( $path ) { s/"/\\"/g; $path = qq["$path"]; } my $sym = gensym; open $sym, "pdftotext $path - |" or die "$0: failed to run pdftotext: $!"; local $/ = undef; my $content = escapeXML(<$sym>); close $sym or die "$0: Failed close on pipe to pdftotext for $file: $?"; return \$content; } # How are URLs printed with pdftotext? sub escapeXML { my $str = shift; for ( $str ) { s/&/&amp;/go; s/"/&quot;/go; s/</&lt;/go; s/>/&gt;/go; } return $str; } # This is the portable way to do this, I suppose. # Otherwise, just create a file in the local directory.
sub create_temp_file { my $scalar_ref = shift; require "File/Temp.pm"; my ( $fh, $file_name ) = File::Temp::tempfile( UNLINK => 1 ); print $fh $$scalar_ref or die $!; close $fh or die "Failed to close '$file_name' $!"; return $file_name; } swish-e-2.4.7/prog-bin/pdf2html.pm0000775000077100017500000001077711166010113013630 00000000000000package pdf2html; use strict; =pod =head1 NAME pdf2html - swish-e sample module to convert pdf to html =head1 SYNOPSIS use pdf2html; my $html_record_ref = pdf2html( $pdf_file_name, 'title' ); # or by passing content in a scalar reference my $html_text_ref = pdf2html( \$pdf_content, 'title' ); =head1 DESCRIPTION Sample module for use with other swish-e 'prog' document source programs. Pass either a file name, or a scalar reference. The difference is that when you pass a reference to a scalar, only the content is returned. When you pass a file name an entire record is returned ready to be fed to swish -- this includes the headers required by swish for indexing. The second optional parameter is the extracted PDF info tag to use as the HTML title. The plan is to find a library that will do this to avoid forking an external program. =head1 REQUIREMENTS Uses the xpdf package that includes the pdftotext conversion program. This is available from http://www.foolabs.com/xpdf/xpdf.html. You will also need the module File::Temp (and its dependencies) available from CPAN if passing content to this module (instead of a file name). =head1 AUTHOR Bill Moseley =cut use Symbol; use vars qw( @ISA @EXPORT $VERSION ); # $Id: pdf2html.pm 1279 2003-06-12 04:00:45Z whmoseley $ $VERSION = sprintf '%d.%02d', q$Revision: 1279 $ =~ /: (\d+)\.(\d+)/; require Exporter; @ISA = qw(Exporter); @EXPORT = qw(pdf2html); if ( $0 eq 'pdf2html.pm' ) { my $file = shift || die "Usage: perl pdf2html.pm file.pdf [title tag]\n"; my $title = shift; print ${pdf2html( $file, $title )}; } sub pdf2html { my $file_or_content = shift; my $title_tag = shift; my $file = ref $file_or_content ?
create_temp_file( $file_or_content ) : $file_or_content; my $metadata = get_pdf_headers( $file ); my $headers = format_metadata( $metadata ); if ( $title_tag && exists $metadata->{ $title_tag } ) { my $title = escapeXML( $metadata->{ $title_tag } ); $headers = "$title\n" . $headers } # Check for encrypted content my $content_ref; # patch provided by Martial Chartoire if ( $metadata->{encrypted} && $metadata->{encrypted} =~ /yes\.*\scopy:no\s\.*/i ) { $content_ref = \''; } else { $content_ref = get_pdf_content_ref( $file ); } my $txt = < $headers
$$content_ref
EOF if ( ref $file_or_content ) { # unlink $file; return \$txt; } my $mtime = (stat $file )[9]; my $size = length $txt; my $ret = <) { if ( /^\s*([^:]+):\s+(.+)$/ ) { my ( $metaname, $value ) = ( lc( $1 ), $2 ); $metaname =~ tr/ /_/; $metadata{$metaname} = $value; } } close $sym or warn "$0: Failed close on pipe to pdfinfo for $file: $?"; return \%metadata; } sub format_metadata { my $metadata = shift; my $metas = join "\n", map { qq['; } sort keys %$metadata; return $metas; } sub get_pdf_content_ref { my $file = shift; # This doesn't work my $path = $file; for ( $path ) { s/"/\\"/g; $path = qq["$path"]; } my $sym = gensym; open $sym, "pdftotext $path - |" or die "$0: failed to run pdftotext: $!"; local $/ = undef; my $content = escapeXML(<$sym>); close $sym or warn "$0: Failed close on pipe to pdftotext for $file: $?"; return \$content; } # How are URLs printed with pdftotext? sub escapeXML { my $str = shift; for ( $str ) { s/&/&amp;/go; s/"/&quot;/go; s/</&lt;/go; s/>/&gt;/go; } return $str; } # This is the portable way to do this, I suppose. # Otherwise, just create a file in the local directory. sub create_temp_file { my $scalar_ref = shift; require "File/Temp.pm"; my ( $fh, $file_name ) = File::Temp::tempfile(); print $fh $$scalar_ref or die $!; close $fh or die "Failed to close '$file_name' $!"; return $file_name; } 1; swish-e-2.4.7/prog-bin/spider.pl.in0000775000077100017500000026531311166010113014000 00000000000000#!@@perlbinary@@ -w use strict; # This is set to where Swish-e's "make install" installed the helper modules.
use lib ( '@@perlmoduledir@@' ); # $Id: spider.pl.in 1900 2007-02-07 17:28:56Z moseley $ # # "prog" document source for spidering web servers # # For documentation, type: # # perldoc spider.pl # # Copyright (C) 2001-2003 Bill Moseley swishscript@hank.org # # This program is free software; you can redistribute it and/or # modify it under the terms of the GNU General Public License # as published by the Free Software Foundation; either version # 2 of the License, or (at your option) any later version. # # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the # GNU General Public License for more details. # # The above lines must remain at the top of this program #---------------------------------------------------------------------------------- $HTTP::URI_CLASS = "URI"; # prevent loading default URI::URL # so we don't store long list of base items # and eat up memory with >= URI 1.13 use LWP::RobotUA; use HTML::LinkExtor; use HTML::Tagset; use vars '$VERSION'; $VERSION = sprintf '%d.%02d', q$Revision: 1900 $ =~ /: (\d+)\.(\d+)/; use vars '$bit'; use constant DEBUG_ERRORS => $bit = 1; # program errors use constant DEBUG_URL => $bit <<= 1; # print out every URL processes use constant DEBUG_HEADERS => $bit <<= 1; # prints the response headers use constant DEBUG_FAILED => $bit <<= 1; # failed to return a 200 use constant DEBUG_SKIPPED => $bit <<= 1; # didn't index for some reason use constant DEBUG_INFO => $bit <<= 1; # more verbose use constant DEBUG_LINKS => $bit <<= 1; # prints links as they are extracted use constant DEBUG_REDIRECT => $bit <<= 1; # prints links that are redirected use constant MAX_REDIRECTS => 20; # keep from redirecting forever my %DEBUG_MAP = ( errors => DEBUG_ERRORS, url => DEBUG_URL, headers => DEBUG_HEADERS, failed => DEBUG_FAILED, skipped => DEBUG_SKIPPED, info => DEBUG_INFO, links => DEBUG_LINKS, redirect => 
DEBUG_REDIRECT, ); # Valid config file options my @config_options = qw( agent base_url credentials credential_timeout debug delay_min (deprecated) delay_sec email filter_content get_password ignore_robots_file keep_alive link_tags max_depth max_files max_indexed max_size max_time max_wait_time quiet remove_leading_dots same_hosts skip spider_done test_response test_url use_cookies use_default_config use_head_requests use_md5 validate_links filter_object output_function ); my %valid_config_options = map { $_ => 1 } @config_options; use constant MAX_SIZE => 5_000_000; # Max size of document to fetch use constant MAX_WAIT_TIME => 30; # request time. #Can't locate object method "host" via package "URI::mailto" at ../prog-bin/spider.pl line 473. #sub URI::mailto::host { return '' }; # This is not the right way to do this. sub UNIVERSAL::host { '' }; sub UNIVERSAL::port { '' }; sub UNIVERSAL::host_port { '' }; sub UNIVERSAL::userinfo { '' }; #----------------------------------------------------------------------- use vars '@servers'; my $config = shift || 'SwishSpiderConfig.pl'; if ( lc( $config ) eq 'default' ) { @servers = default_urls(); } else { do $config or die "Failed to read $0 configuration parameters '$config' $! $@"; die "$0: config file '$config' failed to set \@servers array\n" unless @servers; die "$0: config file '$config' did not set \@servers array to contain a hash\n" unless ref $servers[0] eq 'HASH'; # Check config options for my $server ( @servers ) { for ( keys %$server ) { warn "$0: ** Warning: config option [$_] is unknown. Perhaps misspelled?\n" unless $valid_config_options{$_} } } } print STDERR "$0: Reading parameters from '$config'\n" unless $ENV{SPIDER_QUIET}; my $abort; local $SIG{HUP} = sub { warn "Caught SIGHUP\n"; $abort++ } unless $^O =~ /Win32/i; my %visited; # global -- I suppose would be smarter to localize it per server. 
my %validated; my %bad_links; for my $s ( @servers ) { if ( !$s->{base_url} ) { die "You must specify 'base_url' in your spider config settings\n"; } # Merge in default config? $s = { %{ default_config() }, %$s } if $s->{use_default_config}; # Now, process each URL listed my @urls = ref $s->{base_url} eq 'ARRAY' ? @{$s->{base_url}} :( $s->{base_url}); for my $url ( @urls ) { # purge config options -- used when base_url is an array $valid_config_options{$_} || delete $s->{$_} for keys %$s; $s->{base_url} = $url; process_server( $s ); } } if ( %bad_links ) { print STDERR "\nBad Links:\n\n"; foreach my $page ( sort keys %bad_links ) { print STDERR "On page: $page\n"; printf(STDERR " %-40s %s\n", $_, $validated{$_} ) for @{$bad_links{$page}}; print STDERR "\n"; } } #================================================================================== # process_server() # # This processes a single server config (part of @servers) # It validates and cleans up the config and then starts spidering # for each URL listed in base_url # #---------------------------------------------------------------------------------- sub process_server { my $server = shift; # set defaults # Set debug options. $server->{debug} = defined $ENV{SPIDER_DEBUG} ? $ENV{SPIDER_DEBUG} : ($server->{debug} || 0); # Convert to number if ( $server->{debug} !~ /^\d+$/ ) { my $debug = 0; $debug |= (exists $DEBUG_MAP{lc $_} ? $DEBUG_MAP{lc $_} : die "Bad debug setting passed in " . (defined $ENV{SPIDER_DEBUG} ? 'SPIDER_DEBUG environment' : q['debug' config option]) . " '$_'\nOptions are: " . join( ', ', sort keys %DEBUG_MAP) ."\n") for split /\s*,\s*/, $server->{debug}; $server->{debug} = $debug; } $server->{quiet} ||= $ENV{SPIDER_QUIET} || 0; # Lame Microsoft $URI::ABS_REMOTE_LEADING_DOTS = $server->{remove_leading_dots} ? 
1 : 0; $server->{max_size} = MAX_SIZE unless defined $server->{max_size}; die "max_size parameter '$server->{max_size}' must be a number\n" unless $server->{max_size} =~ /^\d+$/; $server->{max_wait_time} ||= MAX_WAIT_TIME; die "max_wait_time parameter '$server->{max_wait_time}' must be a number\n" if $server->{max_wait_time} !~ /^\d+$/; # Can be zero or undef or a number. $server->{credential_timeout} = 30 unless exists $server->{credential_timeout}; die "credential_timeout '$server->{credential_timeout}' must be a number\n" if defined $server->{credential_timeout} && $server->{credential_timeout} !~ /^\d+$/; $server->{link_tags} = ['a'] unless ref $server->{link_tags} eq 'ARRAY'; $server->{link_tags_lookup} = { map { lc, 1 } @{$server->{link_tags}} }; die "max_depth parameter '$server->{max_depth}' must be a number\n" if defined $server->{max_depth} && $server->{max_depth} !~ /^\d+/; for ( qw/ test_url test_response filter_content/ ) { next unless $server->{$_}; $server->{$_} = [ $server->{$_} ] unless ref $server->{$_} eq 'ARRAY'; my $n; for my $sub ( @{$server->{$_}} ) { $n++; die "Entry number $n in $_ is not a code reference\n" unless ref $sub eq 'CODE'; } } my $start = time; if ( $server->{skip} ) { print STDERR "Skipping Server Config: $server->{base_url}\n" unless $server->{quiet}; return; } require "HTTP/Cookies.pm" if $server->{use_cookies}; require "Digest/MD5.pm" if $server->{use_md5}; # set starting URL, and remove any specified fragment my $uri = URI->new( $server->{base_url} ); $uri->fragment(undef); if ( $uri->userinfo ) { die "Can't specify parameter 'credentials' because base_url defines them\n" if $server->{credentials}; $server->{credentials} = $uri->userinfo; $uri->userinfo( undef ); } print STDERR "\n -- Starting to spider: $uri --\n" if $server->{debug}; # set the starting server name (including port) -- will only spider on server:port # All URLs will end up with this host:port $server->{authority} = $uri->canonical->authority; # All URLs 
must match this scheme ( Jan 22, 2002 - spot by Darryl Friesen ) $server->{scheme} = $uri->scheme; # Now, set the OK host:port names $server->{same} = [ $uri->canonical->authority || '' ]; push @{$server->{same}}, @{$server->{same_hosts}} if ref $server->{same_hosts}; $server->{same_host_lookup} = { map { $_, 1 } @{$server->{same}} }; # set time to end $server->{max_time} = $server->{max_time} * 60 + time if $server->{max_time}; # set default agent for log files $server->{agent} ||= 'swish-e http://swish-e.org/'; # get a user agent object my $ua; # set the delay unless ( defined $server->{delay_sec} ) { if ( defined $server->{delay_min} && $server->{delay_min} =~ /^\d+\.?\d*$/ ) { # change if ever move to Time::HiRes $server->{delay_sec} = int ($server->{delay_min} * 60); } $server->{delay_sec} = 5 unless defined $server->{delay_sec}; } $server->{delay_sec} = 5 unless $server->{delay_sec} =~ /^\d+$/; if ( $server->{ignore_robots_file} ) { $ua = LWP::UserAgent->new; return unless $ua; $ua->agent( $server->{agent} ); $ua->from( $server->{email} ); } else { $ua = LWP::RobotUA->new( $server->{agent}, $server->{email} ); return unless $ua; $ua->delay( 0 ); # handle delay locally. } # If ignore robots files also ignore meta ignore # comment out so can find http-equiv charset # $ua->parse_head( 0 ) if $server->{ignore_robots_file} || $server->{ignore_robots_headers}; # Set the timeout - used to only for windows and used alarm, but this # did not always works correctly. Hopefully $ua->timeout works better in # current versions of LWP (before DNS could block forever) $ua->timeout( $server->{max_wait_time} ); $server->{ua} = $ua; # save it for fun. # $ua->parse_head(0); # Don't parse the content $ua->cookie_jar( HTTP::Cookies->new ) if $server->{use_cookies}; if ( $server->{keep_alive} ) { if ( $ua->can( 'conn_cache' ) ) { my $keep_alive = $server->{keep_alive} =~ /^\d+$/ ? 
$server->{keep_alive} : 1; $ua->conn_cache( { total_capacity => $keep_alive } ); } else { delete $server->{keep_alive}; warn "Can't use keep-alive: conn_cache method not available\n"; } } # Disable HEAD requests if there's no reason to use them # Keep_alives is questionable because even without keep alives # it might be faster to do a HEAD than a partial GET. if ( $server->{use_head_requests} && ( !$server->{keep_alive} || !( $server->{test_response} || $server->{max_size} ) ) ) { warn qq[Option "use_head_requests" was disabled.\nNeed keep_alive and either test_response or max_size options\n]; delete $server->{use_head_requests}; } # uri, parent, depth eval { spider( $server, $uri ) }; print STDERR $@ if $@; # provide a way to call a function in the config file when all done check_user_function( 'spider_done', undef, $server ); delete $server->{ua}; # Free up LWP to avoid CLOSE_WAITs hanging around when using a lot of @servers. return if $server->{quiet}; $start = time - $start; $start++ unless $start; my $max_width = 0; my $max_num = 0; for ( keys %{$server->{counts}} ) { $max_width = length if length > $max_width; my $val = commify( $server->{counts}{$_} ); $max_num = length $val if length $val > $max_num; } print STDERR "\nSummary for: $server->{base_url}\n"; for ( sort keys %{$server->{counts}} ) { printf STDERR "%${max_width}s: %${max_num}s (%0.1f/sec)\n", $_, commify( $server->{counts}{$_} ), $server->{counts}{$_}/$start; } } #----------------------------------------------------------------------- # Deal with Basic Authen # Thanks Gisle! sub get_basic_credentials { my($uri, $server, $realm ) = @_; # Exists but undefined means don't ask. return if exists $server->{credential_timeout} && !defined $server->{credential_timeout}; # Exists but undefined means don't ask.
    my $netloc = $uri->canonical->host_port;

    my ($user, $password);

    eval {
        local $SIG{ALRM} = sub { die "timed out\n" };

        # a zero timeout means don't time out
        alarm( $server->{credential_timeout} ) unless $^O =~ /Win32/i;

        if ( $uri->userinfo ) {
            print STDERR "\nSorry: invalid username/password\n";
            $uri->userinfo( undef );
        }

        print STDERR "Need Authentication for $uri at realm '$realm'\n(<Enter> skips)\nUsername: ";
        $user = <STDIN>;
        chomp($user) if $user;
        die "No Username specified\n" unless length $user;

        alarm( $server->{credential_timeout} ) unless $^O =~ /Win32/i;

        print STDERR "Password: ";
        system("stty -echo");
        $password = <STDIN>;
        system("stty echo");
        print STDERR "\n";  # because we disabled echo
        chomp($password);

        alarm( 0 ) unless $^O =~ /Win32/i;
    };

    alarm( 0 ) unless $^O =~ /Win32/i;

    return if $@;

    return join ':', $user, $password;
}

#----------- Non recursive spidering ---------------------------
# Had problems with some versions of LWP where memory was not freed
# after the URI objects went out of scope, so instead just maintain
# a list of URI.
# Should move this to a DBM or database.
sub spider { my ( $server, $uri ) = @_; # Validate the first link, just in case return unless check_link( $uri, $server, '', '(Base URL)' ); my @link_array = [ $uri, '', 0 ]; while ( @link_array ) { die $server->{abort} if $abort || $server->{abort}; my ( $uri, $parent, $depth ) = @{shift @link_array}; delay_request( $server ); # Delete any per-request data delete $server->{_request}; my $new_links = process_link( $server, $uri->clone, $parent, $depth ); push @link_array, map { [ $_, $uri, $depth+1 ] } @$new_links if $new_links; } } #---------- Delay a request based on the delay time ------------- sub delay_request { my ( $server ) = @_; # Here's a place to log the type of connection if ( $server->{keep_alive_connection} ) { $server->{counts}{'Connection: Keep-Alive'}++; # no delay on keep-alives return; } $server->{counts}{'Connection: Close'}++; # return if no delay or first request return if !$server->{delay_sec} || !$server->{last_response_time}; my $wait = $server->{delay_sec} - ( time - $server->{last_response_time} ); return unless $wait > 0; print STDERR "sleeping $wait seconds\n" if $server->{debug} & DEBUG_URL; sleep( $wait ); } #================================================================================ # process_link() - process a link from the list # # Can be called recursively (for auth and redirects) # # This does most of the work. 
# Pass in: # $server -- config hash, plus ugly scratch pad memory # $uri -- uri to fetch and extract links from # $parent -- parent uri for better messages # $depth -- for controlling how deep to go into a site, whatever that means # # Returns: # undef or an array ref of links to add to the list # # Makes request, tests response, logs, parsers and extracts links # Very ugly as this is some of the oldest code # #--------------------------------------------------------------------------------- sub process_link { my ( $server, $uri, $parent, $depth ) = @_; $server->{counts}{'Unique URLs'}++; die "$0: Max files Reached\n" if $server->{max_files} && $server->{counts}{'Unique URLs'} > $server->{max_files}; die "$0: Time Limit Exceeded\n" if $server->{max_time} && $server->{max_time} < time; # clean up some per-request crap. # Really should just subclass the response object! $server->{no_contents} = 0; $server->{no_index} = 0; $server->{no_spider} = 0; # Make request object for this URI my $request = HTTP::Request->new('GET', $uri ); ## HTTP::Message uses Compress::Zlib, and Gisle responded Jan 8, 07 that it's safe to test my @encodings; eval { require Compress::Zlib }; push @encodings, qw/gzip x-gzip deflate/ unless $@; eval { require Compress::Bzip2 }; push @encodings, 'x-bzip2' unless $@; $request->header('Accept-encoding', join ', ', @encodings ) if @encodings; $request->header('Referer', $parent ) if $parent; # Set basic auth if defined - use URI specific first, then credentials # this doesn't track what should have authorization my $last_auth; if ( $server->{last_auth} ) { my $path = $uri->path; $path =~ s!/[^/]*$!!; $last_auth = $server->{last_auth}{auth} if $server->{last_auth}{path} eq $path; } if ( my ( $user, $pass ) = split /:/, ( $last_auth || $uri->userinfo || $server->{credentials} || '' ) ) { $request->authorization_basic( $user, $pass ); } my $response; delete $server->{response_checked}; # to keep from checking more than once if ( 
$server->{use_head_requests} ) {
        $request->method('HEAD');

        # This is ugly in what it can return. It can be recursive.
        $response = make_request( $request, $server, $uri, $parent, $depth );

        return $response if !$response || ref $response eq 'ARRAY';  # returns undef or an array ref if done

        # otherwise, we have a response object.

        $request->method('GET');
    }

    # Now make GET request
    $response = make_request( $request, $server, $uri, $parent, $depth );

    return $response if !$response || ref $response eq 'ARRAY';  # returns undef or an array ref

    # Now we have a $response object with content
    return process_content( $response, $server, $uri, $parent, $depth );
}

#===================================================================================
# make_request --
#
# This only can deal with things that happen in a HEAD request.
# Well, unless test for the method
#
# Hacked up function to make either a HEAD or GET request and test the response
# Returns one of three things:
#   undef - stop processing and return
#   an array ref - a list of URLs extracted (via recursive call)
#   a HTTP::Response object
#
#
# Yes it's a mess -- got pulled out of other code when adding HEAD requests
#-----------------------------------------------------------------------------------
sub make_request {
    my ( $request, $server, $uri, $parent, $depth ) = @_;

    my $response;
    my $response_aborted_msg;
    my $killed_connection;

    my $ua = $server->{ua};

    if ( $request->method eq 'GET' ) {

        # When making a GET request this gets called for every chunk returned
        # from the webserver (well, from the OS). No idea how big it will be.
# my $total_length = 0; my $callback = sub { my ( $content, $response ) = @_; # First time, check response - this can die() check_response( $response, $server, $uri ) unless $server->{response_checked}++; # In case didn't return a content-length header $total_length += length $content; check_too_big( $response, $server, $total_length ) if $server->{max_size}; $response->add_content( $content ); }; ## Make Request ## # Used to wrap in an eval and use alarm on non-win32 to fix broken $ua->timeout $response = $ua->simple_request( $request, $callback, 4096 ); # Check for callback death: # If the LWP callback aborts if ( $response->header('client-aborted') ) { $response_aborted_msg = $response->header('X-Died') || 'unknown'; $killed_connection++; # so we will delay } } else { # Make a HEAD request $response = $ua->simple_request( $request ); # check_response - user callback can call die() so wrap in eval block eval { check_response( $response, $server, $uri ) unless $server->{response_checked}++; }; $response_aborted_msg = $@ if $@; } # save the request completion time for delay between requests $server->{last_response_time} = time; # Ok, did the request abort for some reason? (response checker called die() ) if ( $response_aborted_msg ) { # Log unless it's the callback (because the callback already logged it) if ( $response_aborted_msg !~ /test_response/ ) { $server->{counts}{Skipped}++; # Not really sure why request aborted. Let's try and make the error message # a bit cleaner. print STDERR "Request for '$uri' aborted because: '$response_aborted_msg'\n" if $server->{debug}&DEBUG_SKIPPED; } # Aborting in the callback breaks the connection (so tested on Apache) # even if all the data was transmitted. # Might be smart to flag to abort but wait until the next chunk # to really abort. That might make so the connection would not get killed. delete $server->{keep_alive_connection} if $killed_connection; return; } # Look for connection. 
Assume it's a keep-alive unless we get a Connection: close # header. Some server errors (on Apache) will close the connection, but they # report it. # Have to assume the connection is open (without asking LWP) since the first # connection we normally do not see (robots.txt) and then following keep-alive # connections do not have Connection: header. my $connection = $response->header('Connection') || 'Keep-alive'; # assume keep-alive $server->{keep_alive_connection} = !$killed_connection && $server->{keep_alive} && $connection !~ /close/i; # Did a callback return abort? return if $server->{abort}; # Clean up the URI so passwords don't leak $response->request->uri->userinfo( undef ) if $response->request; $uri->userinfo( undef ); # A little debugging print STDERR "\nvvvvvvvvvvvvvvvv HEADERS for $uri vvvvvvvvvvvvvvvvvvvvv\n\n---- Request ------\n", $response->request->as_string, "\n---- Response ---\nStatus: ", $response->status_line,"\n", $response->headers->as_string, "\n^^^^^^^^^^^^^^^ END HEADERS ^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n" if $server->{debug} & DEBUG_HEADERS; # Deal with failed responses return failed_response( $response, $server, $uri, $parent, $depth ) unless $response->is_success; # Don't log HEAD requests return $request if $request->method eq 'HEAD'; # Log if requested log_response( $response, $server, $uri, $parent, $depth ) if $server->{debug} & DEBUG_URL; # Check for meta refresh # requires that $ua->parse_head() is enabled (the default) return redirect_response( $response, $server, $uri, $parent, $depth, $1, 'meta refresh' ) if $response->header('refresh') && $response->header('refresh') =~ /URL\s*=\s*(.+)/; return $response; } #=================================================================== # check_response -- after resonse comes back from server # # Failure here should die() because check_user_function can die() # #------------------------------------------------------------------- sub check_response { my ( $response, $server, $uri ) = @_; 
return unless $response->is_success; # 2xx response. # Cache user/pass if entered from the keyboard or callback function (as indicated by the realm) # do here so we know it is correct if ( $server->{cur_realm} && $uri->userinfo ) { my $key = $uri->canonical->host_port . ':' . $server->{cur_realm}; $server->{auth_cache}{$key} = $uri->userinfo; # not too sure of the best logic here my $path = $uri->path; $path =~ s!/[^/]*$!!; $server->{last_auth} = { path => $path, auth => $uri->userinfo, }; } # check for document too big. check_too_big( $response, $server ) if $server->{max_size}; die "test_response" if !check_user_function( 'test_response', $uri, $server, $response ); } #===================================================================== # check_too_big -- see if document is too big # Die if it is too big. #-------------------------------------------------------------------- sub check_too_big { my ( $response, $server, $length ) = @_; $length ||= $response->content_length || 0; return unless $length && $length =~ /^\d+$/; die "Document exceeded $server->{max_size} bytes (Content-Length: $length) Method: " . $response->request->method . "\n" if $length > $server->{max_size}; } #========================================================================= # failed_response -- deal with a non 2xx response # #------------------------------------------------------------------------ sub failed_response { my ( $response, $server, $uri, $parent, $depth ) = @_; my $links; # Do we need to authorize? if ( $response->code == 401 ) { # This will log the error $links = authorize( $response, $server, $uri, $parent, $depth ); return $links if ref $links or !$links; } # Are we rejected because of robots.txt? 
    if ( $response->status_line =~ 'robots.txt' ) {
        print STDERR "-Skipped $depth $uri: ", $response->status_line,"\n" if $server->{debug}&DEBUG_SKIPPED;
        $server->{counts}{'robots.txt'}++;
        return;
    }

    # Look for redirect
    return redirect_response( $response, $server, $uri, $parent, $depth )
        if $response->is_redirect;

    # Report bad links (excluding those skipped by robots.txt)
    # Not so sure about this being here for these links...
    validate_link( $server, $uri, $parent, $response )
        if $server->{validate_links};

    # Otherwise, log if needed and then return.
    log_response( $response, $server, $uri, $parent, $depth )
        if $server->{debug} & DEBUG_FAILED;

    return;
}

#=============================================================================
# redirect_response -- deal with a 3xx redirect
#
# Returns link to follow
#
#----------------------------------------------------------------------------
sub redirect_response {
    my ( $response, $server, $uri, $parent, $depth, $location, $description ) = @_;

    $location ||= $response->header('location');
    unless ( $location ) {
        print STDERR "Warning: $uri returned a redirect without a Location: header\n";
        return;
    }

    $description ||= 'Location';

    # This should NOT be needed, but some servers are broken
    # and don't return absolute links.
    # and this may even break things
    my $u = URI->new_abs( $location, $response->base );

    if ( $u->canonical eq $uri->canonical ) {
        print STDERR "Warning: $uri redirects to itself!\n";
        return;
    }

    # make sure it's ok:
    return unless check_link( $u, $server, $response->base, '(redirect)', $description );

    # make recursive request
    # This will not happen because the check_link records that the link has been seen.
    # But leave here just in case

    if ( $server->{_request}{redirects}++ > MAX_REDIRECTS ) {
        warn "Exceeded redirect limit: perhaps a redirect loop: $uri on parent page: $parent\n";
        return;
    }

    print STDERR "--Redirect: $description $uri -> $u.
Parent: $parent\n" if $server->{debug} & DEBUG_REDIRECT; $server->{counts}{"$description Redirects"}++; my $links = process_link( $server, $u, $parent, $depth ); $server->{_request}{redirects}-- if $server->{_request}{redirects}; return $links; } #================================================================================= # Do we need to authorize? If so, ask for password and request again. # First we try using any cached value # Then we try using the get_password callback # Then we ask. sub authorize { my ( $response, $server, $uri, $parent, $depth ) = @_; delete $server->{last_auth}; # since we know that doesn't work if ( $response->header('WWW-Authenticate') && $response->header('WWW-Authenticate') =~ /realm="([^"]+)"/i ) { my $realm = $1; my $user_pass; # Do we have a cached user/pass for this realm? unless ( $server->{_request}{auth}{$uri}++ ) { # only each URI only once my $key = $uri->canonical->host_port . ':' . $realm; if ( $user_pass = $server->{auth_cache}{$key} ) { # If we didn't just try it, try again unless( $uri->userinfo && $user_pass eq $uri->userinfo ) { # add the user/pass to the URI $uri->userinfo( $user_pass ); return process_link( $server, $uri, $parent, $depth ); } } } # now check for a callback password (if $user_pass not set) unless ( $user_pass || $server->{_request}{auth}{callback}++ ) { # Check for a callback function $user_pass = $server->{get_password}->( $uri, $server, $response, $realm ) if ref $server->{get_password} eq 'CODE'; } # otherwise, prompt (over and over) if ( !$user_pass ) { $user_pass = get_basic_credentials( $uri, $server, $realm ); } if ( $user_pass ) { $uri->userinfo( $user_pass ); $server->{cur_realm} = $realm; # save so we can cache if it's valid my $links = process_link( $server, $uri, $parent, $depth ); delete $server->{cur_realm}; return $links; } } log_response( $response, $server, $uri, $parent, $depth ) if $server->{debug} & DEBUG_FAILED; return; # Give up } 
#================================================================================== # Log a response sub log_response { my ( $response, $server, $uri, $parent, $depth ) = @_; # Log the response print STDERR '>> ', join( ' ', ( $response->is_success ? '+Fetched' : '-Failed' ), $depth, "Cnt: $server->{counts}{'Unique URLs'}", $response->request->method, " $uri ", ( $response->status_line || $response->status || 'unknown status' ), ( $response->content_type || 'Unknown content type'), ( $response->content_length || '???' ), "parent:$parent", "depth:$depth", ),"\n"; } #=================================================================================================== # Calls a user-defined function # #--------------------------------------------------------------------------------------------------- sub check_user_function { my ( $fn, $uri, $server ) = ( shift, shift, shift ); return 1 unless $server->{$fn}; my $tests = ref $server->{$fn} eq 'ARRAY' ? $server->{$fn} : [ $server->{$fn} ]; my $cnt; for my $sub ( @$tests ) { $cnt++; print STDERR "?Testing '$fn' user supplied function #$cnt '$uri'\n" if $server->{debug} & DEBUG_INFO; my $ret; eval { $ret = $sub->( $uri, $server, @_ ) }; if ( $@ ) { print STDERR "-Skipped $uri due to '$fn' user supplied function #$cnt death '$@'\n" if $server->{debug} & DEBUG_SKIPPED; $server->{counts}{Skipped}++; return; } next if $ret; print STDERR "-Skipped $uri due to '$fn' user supplied function #$cnt\n" if $server->{debug} & DEBUG_SKIPPED; $server->{counts}{Skipped}++; return; } print STDERR "+Passed all $cnt tests for '$fn' user supplied function\n" if $server->{debug} & DEBUG_INFO; return 1; } #============================================================================== # process_content -- deals with a response object. 
Kinda # # returns an array ref of new links to follow # #----------------------------------------------------------------------------- sub process_content { my ( $response, $server, $uri, $parent, $depth ) = @_; # Check for meta robots tag # -- should probably be done in request sub to avoid fetching docs that are not needed # -- also, this will not not work with compression $$$ check this unless ( $server->{ignore_robots_file} || $server->{ignore_robots_headers} ) { if ( my $directives = $response->header('X-Meta-ROBOTS') ) { my %settings = map { lc $_, 1 } split /\s*,\s*/, $directives; $server->{no_contents}++ if exists $settings{nocontents}; # an extension for swish $server->{no_index}++ if exists $settings{noindex}; $server->{no_spider}++ if exists $settings{nofollow}; } } # make sure content is unique - probably better to chunk into an MD5 object above if ( $server->{use_md5} ) { my $digest = $response->header('Content-MD5') || Digest::MD5::md5($response->content); if ( $visited{ $digest } ) { print STDERR "-Skipped $uri has same digest as $visited{ $digest }\n" if $server->{debug} & DEBUG_SKIPPED; $server->{counts}{Skipped}++; $server->{counts}{'MD5 Duplicates'}++; return; } $visited{ $digest } = $uri; } my $content = $response->decoded_content; unless ( $content ) { my $empty = ''; output_content( $server, \$empty, $uri, $response ) unless $server->{no_index}; return; } # Extract out links (if not too deep) my $links_extracted = extract_links( $server, \$content, $response ) unless defined $server->{max_depth} && $depth >= $server->{max_depth}; # Index the file if ( $server->{no_index} ) { $server->{counts}{Skipped}++; print STDERR "-Skipped indexing $uri some callback set 'no_index' flag\n" if $server->{debug}&DEBUG_SKIPPED; } else { return $links_extracted unless check_user_function( 'filter_content', $uri, $server, $response, \$content ); output_content( $server, \$content, $uri, $response ) unless $server->{no_index}; } return $links_extracted; } 
#============================================================================================== # Extract links from a text/html page # # Call with: # $server - server object # $content - ref to content # $response - response object # #---------------------------------------------------------------------------------------------- sub extract_links { my ( $server, $content, $response ) = @_; return unless $response->header('content-type') && $response->header('content-type') =~ m[^text/html]; # allow skipping. if ( $server->{no_spider} ) { print STDERR '-Links not extracted: ', $response->request->uri->canonical, " some callback set 'no_spider' flag\n" if $server->{debug}&DEBUG_SKIPPED; return; } $server->{Spidered}++; my @links; my $base = $response->base; $visited{ $base }++; # $$$ come back and fix this (see 4/20/03 lwp post) print STDERR "\nExtracting links from ", $response->request->uri, ":\n" if $server->{debug} & DEBUG_LINKS; my $p = HTML::LinkExtor->new; $p->parse( $$content ); my %skipped_tags; for ( $p->links ) { my ( $tag, %attr ) = @$_; # which tags to use ( not reported in debug ) my $attr = join ' ', map { qq[$_="$attr{$_}"] } keys %attr; print STDERR "\nLooking at extracted tag '<$tag $attr>'\n" if $server->{debug} & DEBUG_LINKS; unless ( $server->{link_tags_lookup}{$tag} ) { # each tag is reported only once per page print STDERR " <$tag> skipped because not one of (", join( ',', @{$server->{link_tags}} ), ")\n" if $server->{debug} & DEBUG_LINKS && !$skipped_tags{$tag}++; if ( $server->{validate_links} && $tag eq 'img' && $attr{src} ) { my $img = URI->new_abs( $attr{src}, $base ); validate_link( $server, $img, $base ); } next; } # Grab which attribute(s) which might contain links for this tag my $links = $HTML::Tagset::linkElements{$tag}; $links = [$links] unless ref $links; my $found; # Now, check each attribut to see if a link exists for my $attribute ( @$links ) { if ( $attr{ $attribute } ) { # ok tag # Create a URI object my $u = URI->new_abs( 
$attr{$attribute},$base ); next unless check_link( $u, $server, $base, $tag, $attribute ); push @links, $u; print STDERR qq[ $attribute="$u" Added to list of links to follow\n] if $server->{debug} & DEBUG_LINKS; $found++; } } if ( !$found && $server->{debug} & DEBUG_LINKS ) { print STDERR " tag did not include any links to follow or is a duplicate\n"; } } print STDERR "! Found ", scalar @links, " links in ", $response->base, "\n\n" if $server->{debug} & DEBUG_INFO; return \@links; } #============================================================================= # This function check's if a link should be added to the list to spider # # Pass: # $u - URI object # $server - the server hash # $base - the base or parent of the link # # Returns true if a valid link # # Calls the user function "test_url". Link rewriting before spider # can be done here. # #------------------------------------------------------------------------------ sub check_link { my ( $u, $server, $base, $tag, $attribute ) = @_; $tag ||= ''; $attribute ||= ''; # Kill the fragment $u->fragment( undef ); # Here we make sure we are looking at a link pointing to the correct (or equivalent) host unless ( $server->{scheme} eq $u->scheme && $server->{same_host_lookup}{$u->canonical->authority||''} ) { print STDERR qq[ ?? 
<$tag $attribute="$u"> skipped because different host\n] if $server->{debug} & DEBUG_LINKS; $server->{counts}{'Off-site links'}++; validate_link( $server, $u, $base ) if $server->{validate_links}; return; } $u->host_port( $server->{authority} ); # Force all the same host name # Allow rejection of this URL by user function return unless check_user_function( 'test_url', $u, $server ); # Don't add the link if already seen - these are so common that we don't report # Might be better to do something like $visited{ $u->path } or $visited{$u->host_port}{$u->path}; if ( $visited{ $u->canonical }++ ) { #$server->{counts}{Skipped}++; $server->{counts}{Duplicates}++; # Just so it's reported for all pages if ( $server->{validate_links} && $validated{$u->canonical} ) { push @{$bad_links{ $base->canonical }}, $u->canonical; } return; } return 1; } #============================================================================= # This function is used to validate links that are off-site. # # It's just a very basic link check routine that lets you validate the # off-site links at the same time as indexing. Just because we can. # #------------------------------------------------------------------------------ sub validate_link { my ($server, $uri, $base, $response ) = @_; $base = URI->new( $base ) unless ref $base; $uri = URI->new_abs($uri, $base) unless ref $uri; # Already checked? if ( exists $validated{ $uri->canonical } ) { # Add it to the list of bad links on that page if it's a bad link. push @{$bad_links{ $base->canonical }}, $uri->canonical if $validated{ $uri->canonical }; return; } $validated{ $uri->canonical } = 0; # mark as checked and ok. unless ( $response ) { my $ua = LWP::UserAgent->new(timeout => $server->{max_wait_time} ); my $request = HTTP::Request->new('HEAD', $uri->canonical ); $response = $ua->simple_request( $request ); } return if $response->is_success; my $error = $response->status_line || $response->status || 'unknown status'; $error .= ' ' . 
URI->new_abs( $response->header('location'), $response->base )->canonical if $response->is_redirect && $response->header('location'); $validated{ $uri->canonical } = $error; push @{$bad_links{ $base->canonical }}, $uri->canonical; } #=================================================================================== # output_content -- formats content for swish-e # #----------------------------------------------------------------------------------- sub output_content { my ( $server, $content, $uri, $response ) = @_; $server->{indexed}++; unless ( length $$content ) { print STDERR "Warning: document '", $response->request->uri, "' has no content\n"; $$content = ' '; } ## Now, either need to re-encode into the original charset, # or remove any charset from tags and then return utf8. # HTTP::Message uses a different method to extract out the charset, # but should result in the same value. for ( $response->header('content-type') ){ $server->{charset} = $1 if /\bcharset=([^;]+)/; } # Re-encode the data for outside of Perl eval { # Need to only require Encode here? $$content = Encode::encode( $server->{charset}, $$content ) if $server->{charset}; }; if ( $@ ) { print STDERR "Warning: document '", $response->request->uri, "' could not be encoded to charset '$server->{charset}'\n"; delete $server->{charset}; } $server->{counts}{'Total Bytes'} += length $$content; $server->{counts}{'Total Docs'}++; # ugly and maybe expensive, but perhaps more portable than "use bytes" my $bytecount = length pack 'C0a*', $$content; # Decode the URL my $path = $uri; $path =~ s/%([0-9a-fA-F]{2})/chr hex($1)/ge; # For Josh if ( my $fn = $server->{output_function} ) { eval { $fn->( $server, $content, $uri, $response, $bytecount, $path); }; die "output_function died for $uri: $@\n" if $@; die "$0: Max indexed files Reached\n" if $server->{max_indexed} && $server->{counts}{'Total Docs'} >= $server->{max_indexed}; return; } my $headers = join "\n", 'Path-Name: ' . $path, 'Content-Length: ' . 
$bytecount, ''; $headers .= 'Charset: ' . delete( $server->{charset}) . "\n" if $server->{charset}; $headers .= 'Last-Mtime: ' . $response->last_modified . "\n" if $response->last_modified; # Set the parser type if specified by filtering if ( my $type = delete $server->{parser_type} ) { $headers .= "Document-Type: $type\n"; } elsif ( $response->content_type =~ m!^text/(html|xml|plain)! ) { $type = $1 eq 'plain' ? 'txt' : $1; $headers .= "Document-Type: $type*\n"; } $headers .= "No-Contents: 1\n" if $server->{no_contents}; print "$headers\n$$content"; die "$0: Max indexed files Reached\n" if $server->{max_indexed} && $server->{counts}{'Total Docs'} >= $server->{max_indexed}; } sub commify { local $_ = shift; 1 while s/^([-+]?\d+)(\d{3})/$1,$2/; return $_; } sub default_urls { my $validate = 0; if ( @ARGV && $ARGV[0] eq 'validate' ) { shift @ARGV; $validate = 1; } die "$0: Must list URLs when using 'default'\n" unless @ARGV; my $config = default_config(); $config->{base_url} = [ @ARGV ]; $config->{validate}++ if $validate; return $config; } # Returns a default config hash sub default_config { ## See if we have any filters my ($filter_sub, $response_sub, $filter); eval { ($filter_sub, $response_sub, $filter) = swish_filter() }; if ( $@ ) { warn "Failed to find the SWISH::Filter module. 
Only processing text/* content.\n$@\n"; $response_sub = sub { my $content_type = $_[2]->content_type; return $content_type =~ m!^text/!; } } return { email => 'swish@user.failed.to.set.email.invalid', link_tags => [qw/ a frame /], keep_alive => 1, test_url => sub { $_[0]->path !~ /\.(?:gif|jpeg|png)$/i }, test_response => $response_sub, use_head_requests => 1, # Due to the response sub filter_content => $filter_sub, filter_object => $filter, }; } #================================================================================= # swish_filter # returns a subroutine for filtering with SWISH::Filter -- for use in config files # #--------------------------------------------------------------------------------- sub swish_filter { require SWISH::Filter; my $filter = SWISH::Filter->new; # closure my $filter_sub = sub { my ( $uri, $server, $response, $content_ref ) = @_; my $content_type = $response->content_type; # Ignore text/* content type -- no need to filter if ( $content_type =~ m!^text/! ) { $server->{counts}{$content_type}++; return 1; } my $doc = $filter->convert( document => $content_ref, name => $response->base, content_type => $content_type, ); return 1 unless $doc; # so just proceed as if not using filter if ( $doc->is_binary ) { # ignore "binary" files (not text/* mime type) die "Skipping " . $response->base . " due to content type: " . $doc->content_type ." may be binary\n"; } # nicer to use **char... $$content_ref = ${$doc->fetch_doc}; # let's see if we can set the parser. $server->{parser_type} = $doc->swish_parser_type || ''; $server->{counts}{"$content_type->" . 
$doc->content_type}++;

        return 1;
    };

    # This is used in HEAD request to test the content type ahead of time
    my $response_sub = sub {
        my ( $uri, $server, $response, $content_ref ) = @_;

        my $content_type = $response->content_type;
        return 1 if $content_type =~ m!^text/!;  # allow all text (assume don't want to filter)
        return $filter->can_filter( $content_type );
    };

    return ( $filter_sub, $response_sub, $filter );
}

__END__

=head1 NAME

spider.pl - Example Perl program to spider web servers

=head1 SYNOPSIS

    spider.pl [<spider config file>] [<URL> ...]

    # Spider using some common defaults and capture the output
    # into a file

    ./spider.pl default http://myserver.com/ > output.txt

    # or using a config file

    spider.config:
        @servers = (
            {
                base_url    => 'http://myserver.com/',
                email       => 'me@myself.com',
                # other spider settings described below
            },
        );

    ./spider.pl spider.config > output.txt

    # or using the default config file SwishSpiderConfig.pl
    ./spider.pl > output.txt

    # using with swish-e

    ./spider.pl spider.config | swish-e -c swish.config -S prog -i stdin

    # or in two steps
    ./spider.pl spider.config > output.txt
    swish-e -c swish.config -S prog -i stdin < output.txt

    # or with compression
    ./spider.pl spider.config | gzip > output.gz
    gzip -dc output.gz | swish-e -c swish.config -S prog -i stdin

    # or having swish-e call the spider directly using the
    # spider config file SwishSpiderConfig.pl:
    swish-e -c swish.config -S prog -i spider.pl

    # or the above but passing a parameter to the spider:
    echo "SwishProgParameters spider.config" >> swish.config
    echo "IndexDir spider.pl" >> swish.config
    swish-e -c swish.config -S prog

Note: When running on some versions of Windows (e.g. Win ME and Win 98 SE)
you may need to tell Perl to run the spider directly:

    perl spider.pl | swish-e -S prog -c swish.conf -i stdin

This pipes the output of the spider directly into swish.

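As a rough sketch only (it is not a file shipped with the distribution), a
F<spider.config> that combines several of the settings read by the code above
might look like the following. The setting names are the hash keys the spider
consults; every value is a placeholder to adjust for your own site.

```perl
# spider.config -- illustrative sketch; all values are placeholders.
@servers = (
    {
        base_url   => 'http://myserver.com/',
        email      => 'me@myself.com',

        delay_sec  => 5,          # seconds to wait between requests
        max_time   => 10,         # give up after this many minutes
        max_files  => 500,        # give up after this many unique URLs
        max_size   => 1_000_000,  # skip documents larger than this many bytes
        keep_alive => 1,          # try to reuse the connection
        use_md5    => 1,          # skip documents whose content was already seen

        # Called with ( $uri, $server ); return false to skip the URL.
        test_url   => sub { $_[0]->path !~ /\.(?:gif|jpeg|png)$/i },

        # Called with ( $uri, $server, $response ); return false to skip
        # the document once its headers are known.
        test_response => sub { $_[2]->content_type =~ m!^text/! },
    },
);
```

The two callbacks mirror the ones used by the built-in "default" configuration
when SWISH::Filter is not available: URLs are rejected by extension before
they are fetched, and responses are rejected by content type.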
=head1 DESCRIPTION

F<spider.pl> is a program for fetching documents from a web server,
and outputs the documents to STDOUT in a special format designed
to be read by Swish-e.

The spider can index non-text documents such as PDF and MS Word by use of
filter (helper) programs. These programs are not part of the Swish-e
distribution and must be installed separately. See the section on filtering
below.

A configuration file is normally used to control what documents are fetched from
the web server(s). The configuration file and its options are described below.
There is also a "default" config suitable for spidering.

The spider is designed to spider web pages and fetch documents from one
host at a time -- offsite links are not followed. But, you can configure
the spider to spider multiple sites in a single run.

F<spider.pl> is distributed with Swish-e and is installed in the swish-e library
directory at installation time. This directory (libexecdir) can be seen by running
the command:

    swish-e -h

Typically on unix-type systems the spider is installed at:

    /usr/local/lib/swish-e/spider.pl

This spider stores all links in memory while processing and does not do parallel requests.

=head2 Running the spider

The output from F<spider.pl> can be captured to a temporary file which is then
fed into swish-e:

    ./spider.pl > docs.txt
    swish-e -c config -S prog -i stdin < docs.txt

or the output can be passed to swish-e via a pipe:

    ./spider.pl | swish-e -c config -S prog -i stdin

or swish-e can run the spider directly:

    swish-e -c config -S prog -i spider.pl

One advantage of having Swish-e run F<spider.pl> is that Swish-e knows
where to locate the program (based on libexecdir compiled into swish-e).

When running the spider I<without> any parameters it looks for a configuration file
called F<SwishSpiderConfig.pl> in the current directory. The spider will abort
with an error if this file is not found.

A configuration file can be specified as the first parameter to the spider:

    ./spider.pl spider.config > output.txt

If running the spider via Swish-e (i.e.
Swish-e runs the spider) then use the Swish-e config option L to specify the config file: In swish.config: # Use spider.pl as the external program: IndexDir spider.pl # And pass the name of the spider config file to the spider: SwishProgParameters spider.config And then run Swish-e like this: swish-e -c swish.config -S prog Finally, by using the special word "default" on the command line the spider will use a default configuration that is useful for indexing most sites. It's a good way to get started with the spider: ./spider.pl default http://my_server.com/index.html > output.txt There's no "best" way to run the spider. I like to capture to a file and then feed that into Swish-e. The spider does require Perl's LWP library and a few other reasonably common modules. Most well maintained systems should have these modules installed. See L below for more information. It's a good idea to check that you are running a current version of these modules. Note: the "prog" document source in Swish-e bypasses many Swish-e configuration settings. For example, you cannot use the L directive with the "prog" document source. This is by design to limit the overhead when using an external program for providing documents to swish; after all, with "prog", if you don't want to index a file, then don't give it to swish to index in the first place. So, for spidering, if you do not wish to index images, for example, you will need to either filter by the URL or by the content-type returned from the web server. See L below for more information. =head2 Robots Exclusion Rules and being nice By default, this script will not spider files blocked by F. In addition, the script will check for Emeta name="robots"..E tags, which allow finer control over what files are indexed and/or spidered. See http://www.robotstxt.org/wc/exclusion.html for details. This spider provides an extension to the EmetaE tag exclusion, by adding a B attribute. 
This attribute turns on the C setting, which asks swish-e to only index the document's title (or file name if no title is found). For example: says to just index the document's title, but don't index its contents, and don't follow any links within the document. Granted, it's unlikely that this feature will ever be used... If you are indexing your own site, and know what you are doing, you can disable robot exclusion by the C configuration parameter, described below. This disables both F and the meta tag parsing. You may disable just the meta tag parsing by using C. This script only spiders one file at a time, so load on the web server is not that great. And with libwww-perl-5.53_91 HTTP/1.1 keep alive requests can reduce the load on the server even more (and potentially reduce spidering time considerably). Still, discuss spidering with a site's administrator before beginning. Use the C to adjust how fast the spider fetches documents. Consider running a second web server with a limited number of children if you really want to fine tune the resources used by spidering. =head2 Duplicate Documents The spider program keeps track of URLs visited, so a document is only indexed one time. The Digest::MD5 module can be used to create a "fingerprint" of every page indexed and this fingerprint is used in a hash to find duplicate pages. For example, MD5 will prevent indexing these as two different documents: http://localhost/path/to/some/index.html http://localhost/path/to/some/ But note that this may have side effects you don't want. If you want this file indexed under this URL: http://localhost/important.html But the spider happens to find the exact content in this file first: http://localhost/developement/test/todo/maybeimportant.html Then only that URL will be indexed. =head2 Broken relative links Sometimes web page authors use too many C segments in relative URLs which reference documents above the document root. 
Some web servers such as Apache will return a 400 Bad Request when requesting a document above the root. Other web servers such as Microsoft IIS/5.0 will try to "correct" these errors. This correction will lead to loops when spidering. The spider can fix these above-root links by placing the following in your spider config: remove_leading_dots => 1, It is not on by default so that the spider can report the broken links (as 400 errors on sane webservers). =head2 Compression If the Perl module Compress::Zlib is installed the spider will send the Accept-Encoding: gzip x-gzip header and uncompress the document if the server returns the header Content-Encoding: gzip Content-Encoding: x-gzip If the Perl distribution IO-Compress-Zlib is installed the spider will use this module to uncompress "gzip" (x-gzip) and also "deflate" compressed documents. The "compress" method is not supported. See RFC 2616 section 3.5 for more information. MD5 checksums are done on the compressed data. MD5 may slow down indexing a tiny bit, so test with and without if speed is an issue (which it probably isn't since you are spidering in the first place). This feature will also use more memory. =head1 REQUIREMENTS Perl 5 (hopefully at least 5.00503) or later. You must have the LWP Bundle on your computer. Load the LWP::Bundle via the CPAN.pm shell, or download libwww-perl-x.xx from CPAN (or via ActiveState's ppm utility). Also required is the HTML-Parser-x.xx bundle of modules also from CPAN (and from ActiveState for Windows). http://search.cpan.org/search?dist=libwww-perl http://search.cpan.org/search?dist=HTML-Parser You will also need Digest::MD5 if you wish to use the MD5 feature. HTML::Tagset is also required. Other modules may be required (for example, the pod2xml.pm module has its own requirements -- see perldoc pod2xml for info). The spider.pl script, like everyone else, expects perl to live in /usr/local/bin. 
If this is not the case then either add a symlink at /usr/local/bin/perl to point to where perl is installed or modify the shebang (#!) line at the top of the spider.pl program. Note that the libwww-perl package does not support SSL (Secure Sockets Layer) (https) by default. See F included in the libwww-perl package for information on installing SSL support. =head1 CONFIGURATION FILE The spider configuration file is read by the script as Perl code. This makes the configuration a bit more complex than simple text config files, but allows the spider to be configured programmatically. For example, the config file can contain logic for testing URLs against regular expressions or even against a database lookup while running. The configuration file sets an array called C<@servers>. This array can contain one or more hash structures of parameters. Each hash structure is a configuration for a single server. Here's an example: my %main_site = ( base_url => 'http://example.com', same_hosts => 'www.example.com', email => 'admin@example.com', ); my %news_site = ( base_url => 'http://news.example.com', email => 'admin@example.com', ); @servers = ( \%main_site, \%news_site ); 1; The above defines two Perl hashes (%main_site and %news_site) and then places a *reference* (the backslash before the name of the hash) to each of those hashes in the @servers array. The "1;" at the end is required at the end of the file (Perl must see a true value at the end of the file). The C is the first parameter passed to the spider script. ./spider.pl F If you do not specify a config file then the spider will look for the file F in the current directory. The Swish-e distribution includes a F file with a few example configurations. This example file is installed in the F documentation directory (on unix often this is /usr/local/share/swish-e/prog-bin). When the special config file name "default" is used: SwishProgParameters default http://www.mysite/index.html [] [...] 
Then a default set of parameters is used with the spider. This is a good way to start using the spider before attempting to create a configuration file. The default settings skip any urls that look like images (well, .gif .jpeg .png), and attempt to filter PDF and MS Word documents IF you have the required filter programs installed (which are not part of the Swish-e distribution). The spider will follow "a" and "frame" type of links only. Note that if you do use a spider configuration file that the default configuration will NOT be used (unless you set the "use_default_config" option in your config file). =head1 CONFIGURATION OPTIONS This describes the required and optional keys in the server configuration hash, in random order... =over 4 =item base_url This required setting is the starting URL for spidering. This sets the first URL the spider will fetch. It does NOT limit spidering to URLs at or below the level of the directory specified in this setting. For that feature you need to use the C callback function. Typically, you will just list one URL for the base_url. You may specify more than one URL as a reference to a list and each will be spidered: base_url => [qw! http://swish-e.org/ http://othersite.org/other/index.html !], but each site will use the same config options. If you want to index two separate sites you will likely want to add an additional configuration to the @servers array. You may specify a username and password: base_url => 'http://user:pass@swish-e.org/index.html', If a URL is protected by Basic Authentication you will be prompted for a username and password. The parameter C controls how long to wait for user entry before skipping the current URL. See also C below. =item same_hosts This optional key sets equivalent B name(s) for the site you are spidering. 
For example, if your site is C but also can be reached by C (with or without C) and also C then: Example: $serverA{base_url} = 'http://www.mysite.edu/index.html'; $serverA{same_hosts} = ['mysite.edu', 'web.mysite.edu']; Now, if a link is found while spidering of: http://web.mysite.edu/path/to/file.html it will be considered on the same site, and will actually be spidered and indexed as: http://www.mysite.edu/path/to/file.html Note: This should probably be called B because it compares the URI C against the list of host names in C. So, if you specify a port in the base URL you will want to specify the port in the list of hosts in C: my %serverA = ( base_url => 'http://mytest.site.invalid:4444/', same_hosts => [ qw/www.mytest.site.invalid:4444/ ], email => 'my@email.address', ); =item email This required key sets the email address for the spider. Set this to your email address. =item agent This optional key sets the name of the spider. =item link_tags This optional tag is a reference to an array of tags. Only links found in these tags will be extracted. The default is to only extract links from EaE tags. For example, to extract links from C tags and from C tags: my %serverA = ( base_url => 'http://mytest.site.invalid:4444/', same_hosts => [ qw/www.mytest.site.invalid:4444/ ], email => 'my@email.address', link_tags => [qw/ a frame /], ); =item use_default_config This option is new for Swish-e 2.4.3. The spider has a hard-coded default configuration that's available when the spider is run with the configuration file listed as "default": ./spider.pl default This default configuration skips urls that match the regular expression: /\.(?:gif|jpeg|png)$/i and the spider will attempt to use the SWISH::Filter module for filtering non-text documents. (You still need to install programs to do the actual filtering, though). 
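As a quick check of what that expression actually skips, here is a minimal sketch that runs it against a few made-up paths. The paths and the helper name is_skipped() are assumptions for illustration only and are not part of the spider. Note that the pattern matches ".jpeg" but not ".jpg", so a config that builds on the default may want to extend it.

```perl
use strict;
use warnings;

# The skip pattern quoted above for the "default" configuration.
my $skip = qr/\.(?:gif|jpeg|png)$/i;

# Hypothetical paths, for illustration only.
my @paths = qw(
    /index.html
    /img/logo.GIF
    /photos/cat.jpeg
    /photos/dog.jpg
    /a.png.html
);

# Returns 1 when a path would be skipped by the default pattern, else 0.
sub is_skipped {
    my ($path) = @_;
    return $path =~ $skip ? 1 : 0;
}

for my $path (@paths) {
    printf "%-18s %s\n", $path, is_skipped($path) ? 'skipped' : 'fetched';
}
```

The pattern is anchored at the end of the path, so only the final extension counts; "/a.png.html" is still fetched.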
Here's the basic config for the "default" mode: @servers = ( { email => 'swish@user.failed.to.set.email.invalid', link_tags => [qw/ a frame /], keep_alive => 1, test_url => sub { $_[0]->path !~ /\.(?:gif|jpeg|png)$/i }, test_response => $response_sub, use_head_requests => 1, # Due to the response sub filter_content => $filter_sub, } ); The filter_content callback will be used if SWISH::Filter was loaded and ready to use. This doesn't mean that filtering will work automatically -- you will likely need to install additional programs for filtering (like Xpdf or Catdoc). The test_response callback will be set to test if a given content type can be filtered by SWISH::Filter (if SWISH::Filter was loaded), otherwise, it will check for content-type of text/* -- any text type of document. Normally, if you specify your own config file: ./spider.pl my_own_spider.config then you must set up those features available in the default setting in your own config file. But, if you wish to build upon the "default" config file then set this option. For example, to use the default config but specify your own email address: @servers = ( { email => 'my@email.address', use_default_config => 1, delay_sec => 0, }, ); 1; What this does is "merge" your config file with the default config file. =item delay_sec This optional key sets the delay in seconds to wait between requests. See the LWP::RobotUA man page for more information. The default is 5 seconds. Set to zero for no delay. When using the keep_alive feature (recommended) the delay will be used only where the previous request returned a "Connection: closed" header. =item delay_min (deprecated) Set the delay to wait between requests in minutes. If both delay_sec and delay_min are defined, delay_sec will be used. =item max_wait_time This setting is the number of seconds to wait for data to be returned from the request. Data is returned in chunks to the spider, and the timer is reset each time a new chunk is reported. 
Therefore, documents (requests) that take longer than this setting should not be aborted as long as some data is received every max_wait_time seconds. The default is 30 seconds. NOTE: This option has no effect on Windows. =item max_time This optional key will set the max minutes to spider. Spidering for this host will stop after C minutes, and move on to the next server, if any. The default is to not limit by time. =item max_files This optional key sets the max number of files to spider before aborting. The default is to not limit by number of files. This is the number of requests made to the remote server, not the total number of files to index (see C). This count is displayed at the end of indexing as C. This feature can (and perhaps should) be used when spidering a web site where dynamic content may generate unique URLs to prevent run-away spidering. =item max_indexed This optional key sets the max number of files that will be indexed. The default is to not limit. This is the number of files sent to swish for indexing (and is reported by C when spidering ends). =item max_size This optional key sets the max size of a file read from the web server. This B to 5,000,000 bytes. If the size is exceeded the resource is skipped and a message is written to STDERR if the DEBUG_SKIPPED debug flag is set. Set max_size to zero for unlimited size. If the server returns a Content-Length header then that will be used. Otherwise, the document will be checked for size limitation as it arrives. That's a good reason to have your server send Content-Length headers. See also C below. =item keep_alive This optional parameter will enable keep alive requests. This can dramatically speed up spidering and reduce the load on server being spidered. The default is to not use keep alives, although enabling it will probably be the right thing to do. 
To get the most out of keep alives, you may want to set up your web server to allow a lot of requests per single connection (i.e. MaxKeepAliveRequests on Apache). Apache's default is 100, which should be good. When a connection is not closed the spider does not wait the "delay_sec" time when making the next request. In other words, there is no delay in requesting documents while the connection is open. Note: try to filter as many documents as possible B making the request to the server. In other words, use C to look for files ending in C<.html> instead of using C to look for a content type of C if possible. Do note that aborting a request from C will break the current keep alive connection. Note: you must have at least libwww-perl-5.53_90 installed to use this feature. =item use_head_requests This option is new as of swish-e 2.4.3 and can affect the speed of spidering and the load on the web server. To understand this you will likely need to read about the L below -- specifically about the C callback function. This option is also only used when C is also enabled (although it could be debated that it's useful without keep alives). This option tells the spider to use http HEAD requests before each request. Normally, the spider simply does a GET request and after receiving the first chunk of data back from the web server calls the C callback function (if one is defined in your config file). The C callback function is a good place to test the content-type header returned from the server and reject types that you do not want to index. Now, *if* you are using the C feature then rejecting a document will often (always?) break the keep alive connection. So, what the C option does is issue a HEAD request for every document, checks for a Content-Length header (to check if the document is larger than C), and then calls your C callback function. If your callback function returns true then a GET request is used to fetch the document. 
The idea is that by using HEAD requests instead of GET requests a false return from your C callback function (i.e. rejecting the document) will not break the keep alive connection. Now, don't get too excited about this. Before using this think about the ratio of rejected documents to accepted documents. If you reject no documents then using this feature will double the number of requests to the web server -- which will also double the number of connections to the web server. But, if you reject a large percentage of documents then this feature will help maximize the number of keep alive requests to the server (i.e. reduce the number of separate connections needed). There's also another problem with using HEAD requests. Some broken servers may not respond correctly to HEAD requests (some issue a 500 error), but respond fine to a normal GET request. This is something to watch out for. Finally, if you do not have a C callback AND C is set to zero then setting C will have no effect. And, with all other factors involved you might find this option has no effect at all. =item skip This optional key can be used to skip the current server. Its only purpose is to make it easy to disable a specific server hash in a configuration file. =item debug Set this item to a comma-separated list of debugging options. Options are currently: errors, failed, headers, info, links, redirect, skipped, url Here are basically the levels: errors => general program errors (not used at this time) url => prints out every URL processed headers => prints the response headers failed => failed to return a 200 skipped => didn't index for some reason info => a little more verbose links => prints links as they are extracted redirect => prints out redirected URLs Debugging can also be set by an environment variable SPIDER_DEBUG when running F. You can specify any of the above debugging options, separated by a comma. For example with Bourne type shell: SPIDER_DEBUG=url,links spider.pl [....] 
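A comma-separated list like this has to be turned into a single number at some point. The following is a minimal sketch of how such a list could be mapped onto bit flags. The flag values and the helper name debug_mask() are assumptions made for illustration; they are not the spider's actual constants or code.

```perl
use strict;
use warnings;

# Hypothetical bit values, one per debug option listed above.
# The real spider defines its own DEBUG_* constants.
my %DEBUG_BIT = (
    errors   => 1,
    url      => 2,
    headers  => 4,
    failed   => 8,
    skipped  => 16,
    info     => 32,
    links    => 64,
    redirect => 128,
);

# Convert a string such as "url,links" into an or'ed bitmask.
# Empty entries are ignored; unknown words are fatal.
sub debug_mask {
    my ($list) = @_;
    my $mask = 0;
    for my $word ( split /,/, lc( defined $list ? $list : '' ) ) {
        $word =~ s/^\s+|\s+$//g;    # trim surrounding whitespace
        next unless length $word;
        die "Unknown debug option '$word'\n" unless exists $DEBUG_BIT{$word};
        $mask |= $DEBUG_BIT{$word};
    }
    return $mask;
}

print debug_mask('url,links'), "\n";
```

A bitmask keeps each check cheap at run time (a single bitwise "and"), which is presumably why the string is converted once rather than parsed on every message.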
Before Swish-e 2.4.3 you had to use the internal debugging constants or'ed together like so: debug => DEBUG_URL | DEBUG_FAILED | DEBUG_SKIPPED, You can still do this, but the string version is easier. In fact, if you want to turn on debugging dynamically (for example in a test_url() callback function) then you currently *must* use the DEBUG_* constants. The string is converted to a number only at the start of spidering -- after that the C parameter is converted to a number. =item quiet If this is true then normal, non-error messages will be suppressed. Quiet mode can also be set by setting the environment variable SPIDER_QUIET to any true value. SPIDER_QUIET=1 =item max_depth The C parameter can be used to limit how deeply to recurse a web site. The depth is just a count of levels of web pages descended, and not related to the number of path elements in a URL. A max_depth of zero says to only spider the page listed as the C. A max_depth of one will spider the C page, plus all links on that page, and no more. The default is to spider all pages. =item ignore_robots_file If this is set to true then the robots.txt file will not be checked when spidering this server. Don't use this option unless you know what you are doing. =item use_cookies If this is set then a "cookie jar" will be maintained while spidering. Some (poorly written ;) sites require cookies to be enabled on clients. This requires the HTTP::Cookies module. =item use_md5 If this setting is true, then an MD5 digest "fingerprint" will be made from the content of every spidered document. This digest number will be used as a hash key to prevent indexing the same content more than once. This is helpful if different URLs generate the same content. An obvious example is that these two documents will only be indexed one time: http://localhost/path/to/index.html http://localhost/path/to/ This option requires the Digest::MD5 module. Spidering with this option might be a tiny bit slower. =item validate_links Just a hack. 
If you set this true the spider will do HEAD requests on all links (e.g. off-site links), just to make sure that all your links work. =item credentials You may specify a username and password to be used automatically when spidering: credentials => 'username:password', A username and password supplied in a URL will override this setting. This username and password will be used for every request. See also the C callback function below. C, if defined, will be called when a page requires authorization. =item credential_timeout Sets the number of seconds to wait for user input when prompted for a username or password. The default is 30 seconds. Set this to zero to wait forever. Probably not a good idea. Set to undef to disable asking for a password. credential_timeout => undef, =item remove_leading_dots Removes leading dots from URLs that might reference documents above the document root. The default is to not remove the dots. =back =head1 CALLBACK FUNCTIONS Callback functions can be defined in your parameter hash. These optional settings are I subroutines that are called while processing URLs. A little perl discussion is in order: In perl, a scalar variable can contain a reference to a subroutine. The config example above shows that the configuration parameters are stored in a perl I. 
my %serverA = ( base_url => 'http://mytest.site.invalid:4444/', same_hosts => [ qw/www.mytest.site.invalid:4444/ ], email => 'my@email.address', link_tags => [qw/ a frame /], ); There are two ways to add a reference to a subroutine to this hash: sub foo { return 1; } my %serverA = ( base_url => 'http://mytest.site.invalid:4444/', same_hosts => [ qw/www.mytest.site.invalid:4444/ ], email => 'my@email.address', link_tags => [qw/ a frame /], test_url => \&foo, # a reference to a named subroutine ); Or the subroutine can be coded right in place: my %serverA = ( base_url => 'http://mytest.site.invalid:4444/', same_hosts => [ qw/www.mytest.site.invalid:4444/ ], email => 'my@email.address', link_tags => [qw/ a frame /], test_url => sub { return 1; }, ); The above example is not very useful as it just creates a user callback function that always returns a true value (the number 1). But, it's just an example. The function calls are wrapped in an eval, so calling die (or doing something that dies) will just cause that URL to be skipped. If you really want to stop processing you need to set $server-E{abort} in your subroutine (or send a kill -HUP to the spider). The first two parameters passed are a URI object (to have access to the current URL), and a reference to the current server hash. The C hash is just a global hash for holding data, and useful for setting flags as described below. Other parameters may be also passed in depending on the callback function, as described below. In perl parameters are passed in an array called "@_". The first element (first parameter) of that array is $_[0], and the second is $_[1], and so on. Depending on how complicated your function is you may wish to shift your parameters off of the @_ list to make working with them easier. See the examples below. To make use of these routines you need to understand when they are called, and what changes you can make in your routines. 
Each routine deals with a given step, and returning false from your routine will stop processing for the current URL. =over 4 =item test_url C allows you to skip processing of urls based on the url before the request to the server is made. This function is called for the C links (links you define in the spider configuration file) and for every link extracted from a fetched web page. This function is a good place to skip links that you are not interested in following. For example, if you know there's no point in requesting images then you can exclude them like: test_url => sub { my $uri = shift; return 0 if $uri->path =~ /\.(gif|jpeg|png)$/; return 1; }, Or to write it another way: test_url => sub { $_[0]->path !~ /\.(gif|jpeg|png)$/ }, Another feature would be if you were using a web server where path names are NOT case sensitive (e.g. Windows). You can normalize all links in this situation using something like test_url => sub { my $uri = shift; return 0 if $uri->path =~ /\.(gif|jpeg|png)$/; $uri->path( lc $uri->path ); # make all path names lowercase return 1; }, The important thing about C (compared to the other callback functions) is that it is called while I links, not while actually fetching that page from the web server. Returning false from C simply says to not add the URL to the list of links to spider. You may set a flag in the server hash (second parameter) to tell the spider to abort processing. test_url => sub { my $server = $_[1]; $server->{abort}++ if $_[0]->path =~ /foo\.html/; return 1; }, You cannot use the server flags: no_contents no_index no_spider This is discussed below. =item test_response This function allows you to filter based on the response from the remote server (such as by content-type). Web servers use a Content-Type: header to define the type of data returned from the server. On a web server you could have a .jpeg file be a web page -- file extensions may not always indicate the type of the file. 
If you enable C then this function is called after the spider makes a HEAD request. Otherwise, this function is called while the web page is being fetched from the remote server, typically after just enough data has been returned to read the response from the web server. The test_response callback function is called with the following parameters: ( $uri, $server, $response, $content_chunk ) The $response variable is a HTTP::Response object and provides methods of examining the server's response. The $content_chunk is the first chunk of data returned from the server (if not a HEAD request). When not using C the spider requests a document in "chunks" of 4096 bytes. 4096 is only a suggestion of how many bytes to return in each chunk. The C routine is called when the first chunk is received only. This allows ignoring (aborting) reading of a very large file, for example, without having to read the entire file. Although not much use, a reference to this chunk is passed as the fourth parameter. If you are spidering a site with many different types of content that you do not wish to index (and cannot use a test_url callback to determine what docs to skip) then you will see better performance using both the C and C features. (Aborting a GET request kills the keep-alive session.) 
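Put together, a server entry that rejects unwanted content types during the HEAD request might look like the sketch below. The host name and email address are placeholders, "our" is used only so that this standalone sketch compiles under "use strict", and the small Local::FakeResponse package is a hypothetical stand-in that exists only so the callback can be tried without a network. In a real run the spider passes an HTTP::Response object as the third parameter.

```perl
use strict;
use warnings;

our @servers = (
    {
        base_url          => 'http://www.example.invalid/',    # placeholder
        email             => 'admin@example.invalid',          # placeholder
        keep_alive        => 1,
        use_head_requests => 1,

        # Accept only text/* content types; anything else is rejected.
        test_response => sub {
            my ( $uri, $server, $response ) = @_;
            return $response->content_type =~ m!^text/! ? 1 : 0;
        },
    },
);

# Stand-in for HTTP::Response, for trying the callback offline only.
package Local::FakeResponse;

sub new {
    my ( $class, $type ) = @_;
    return bless { type => $type }, $class;
}

sub content_type { return $_[0]{type} }

package main;

# Exercise the callback with two made-up content types.
for my $type (qw( text/html image/png )) {
    my $response = Local::FakeResponse->new($type);
    my $ok = $servers[0]{test_response}->( undef, $servers[0], $response );
    print "$type: ", ( $ok ? 'fetch' : 'reject' ), "\n";
}

1;
```

Whether this pays off depends on how many documents the callback rejects, since every accepted document costs an extra HEAD request.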
For example, to only index true HTML (text/html) pages: test_response => sub { my $content_type = $_[2]->content_type; return $content_type =~ m!text/html!; }, You can also set flags in the server hash (the second parameter) to control indexing: no_contents -- index only the title (or file name), and not the contents no_index -- do not index this file, but continue to spider if HTML no_spider -- index, but do not spider this file for links to follow abort -- stop spidering any more files For example, to avoid indexing the contents of "private.html", yet still follow any links in that file: test_response => sub { my $server = $_[1]; $server->{no_index}++ if $_[0]->path =~ /private\.html$/; return 1; }, Note: Do not modify the URI object in this call back function. =item filter_content This callback function is called right before sending the content to swish. Like the other callback functions, returning false will cause the URL to be skipped. Setting the C server flag and returning false will abort spidering. You can also set the C flag. This callback function is passed four parameters. The URI object, server hash, the HTTP::Response object, and a reference to the content. You can modify the content as needed. For example you might not like upper case: filter_content => sub { my $content_ref = $_[3]; $$content_ref = lc $$content_ref; return 1; }, A more reasonable example would be converting PDF or MS Word documents for parsing by swish. Examples of this are provided in the F directory of the swish-e distribution. You may also modify the URI object to change the path name passed to swish for indexing. filter_content => sub { my $uri = $_[0]; $uri->host('www.other.host') ; return 1; }, Swish-e's ReplaceRules feature can also be used for modifying the path name indexed. Note: Swish-e now includes a method of filtering based on the SWISH::Filter Perl modules. See the SwishSpiderConfig.pl file for an example of how to use SWISH::Filter in a filter_content callback function. 
If you use the "default" configuration (i.e. pass "default" as the first parameter to the spider) then SWISH::Filter is used automatically. This only adds code for calling the programs to filter your content -- you still need to install applications that do the hard work (like xpdf for pdf conversion and catdoc for MS Word conversion). The function included in the F for calling SWISH::Filter when using the "default" config can also be used in your config file. There's a function called swish_filter() that returns a list of two subroutines. So in your config you could do: my ($filter_sub, $response_sub ) = swish_filter(); @servers = ( { test_response => $response_sub, filter_content => $filter_sub, [...], } ); The $response_sub is not required, but is useful if using HEAD requests (C): It tests the content type from the server to see if there are any filters that can handle the document. The $filter_sub does all the work of filtering a document. Make sense? If not, then that's what the Swish-e list is for. =item spider_done This callback is called after processing a server (after each server listed in the @servers array if more than one). This allows your config file to do any cleanup work after processing. For example, if you were keeping counts during, say, a test_response() callback function you could use the spider_done() callback to print the results. =item output_function If defined, this callback function is called instead of printing the content and header to STDOUT. This can be used if you want to store the output of the spider before indexing. 
The output_function is called with the following parameters: ($server, $content, $uri, $response, $bytecount, $path); Here is an example that simply shows two of the params passed: output_function => sub { my ($server, $content, $uri, $response, $bytecount, $path) = @_; print STDERR "passed: uri $uri, bytecount $bytecount...\n"; # no output to STDOUT for swish-e } You can do almost the same thing with a filter_content callback. =item get_password This callback is called when a HTTP password is needed (i.e. after the server returns a 401 error). The function can test the URI and Realm and then return a username and password separated by a colon: get_password => sub { my ( $uri, $server, $response, $realm ) = @_; if ( $uri->path =~ m!^/path/to/protected! && $realm eq 'private' ) { return 'joe:secret931password'; } return; # sorry, I don't know the password. }, Use the C setting if you know the username and password and they will be the same for every request. That is, for a site-wide password. =back Note that you can create your own counters to display in the summary list when spidering is finished by adding a value to the hash pointed to by C<$server-E{counts}>. test_url => sub { my $server = $_[1]; $server->{no_index}++ if $_[0]->path =~ /private\.html$/; $server->{counts}{'Private Files'}++; return 1; }, Each callback function B return true to continue processing the URL. Returning false will cause processing of I URL to be skipped. =head2 More on setting flags Swish (not this spider) has a configuration directive C that will instruct swish to index only the title (or file name), and not the contents. This is often used when indexing binary files such as image files, but can also be used with html files to index only the document titles. As shown above, you can turn this feature on for specific documents by setting a flag in the server hash passed into the C or C subroutines. 
For example, in your configuration file you might have the C callback set as: test_response => sub { my ( $uri, $server, $response ) = @_; # tell swish not to index the contents if this is of type image $server->{no_contents} = $response->content_type =~ m[^image/]; return 1; # ok to index and spider this document } The entire contents of the resource are still read from the web server, and passed on to swish, but swish will also be passed a C header which tells swish to enable the NoContents feature for this document only. Note: Swish will index the path name only when C is set, unless the document's type (as set by the swish configuration settings C or C) is HTML I a title is found in the html document. Note: In most cases you probably would not want to send a large binary file to swish, just to be ignored. Therefore, it would be smart to use a C callback routine to replace the contents with a single character (you cannot use the empty string at this time). A similar flag may be set to prevent indexing a document at all, but still allow spidering. In general, if you want to completely skip spidering a file you return false from one of the callback routines (C, C, or C). Returning false from any of those three callbacks will stop processing of that file, and the file will B be spidered. But there may be some cases where you still want to spider (extract links) yet not index the file. An example might be where you wish to index only PDF files, but you still need to spider all HTML files to find the links to the PDF files. $server{test_response} = sub { my ( $uri, $server, $response ) = @_; $server->{no_index} = $response->content_type ne 'application/pdf'; return 1; # ok to spider, but don't index } So, the difference between C and C is that C will still index the file name, just not the contents. C will still spider the file (if it's C) but the file will not be processed by swish at all. B If C is set in a C callback function then the document I.
That is, your C callback function will not be called. The C flag can be set to avoid spidering an HTML file. The file will still be indexed unless C is also set. But if you do not want to index and spider, then simply return false from one of the three callback functions. =head1 SIGNALS Sending a SIGHUP to the running spider will cause it to stop spidering. This is a good way to abort spidering, but let swish index the documents retrieved so far. =head1 CHANGES List of some of the changes =head2 Thu Sep 30 2004 - changes for Swish-e 2.4.3 Code reorganization and a few new features. Updated docs a little tiny bit. Introduced a few spelling mistakes. =over 4 =item Config option: use_default_config It used to be that you could run the spider like: spider.pl default and the spider would use its own internal config. But if you used your own config file then the defaults were not used. This option allows you to merge your config with the default config. Makes making small changes to the default easy. =item Config option: use_head_requests Tells the spider to make a HEAD request before GET'ing the document from the web server. Useful if you use keep_alive and have a test_response() callback that rejects many documents (which breaks the connection). =item Config option: spider_done Callback to tell you (or tell your config as it may be) that the spider is done. Useful if you need to do some extra processing when done spidering -- like record counts to a file. =item Config option: get_password This callback is called when a document returns a 401 error needing a username and password. Useful if spidering a site protected with multiple passwords. =item Config option: output_function If defined spider.pl calls this instead of sending output to STDOUT. =item Config option: debug Now you can use the words instead of or'ing the DEBUG_* constants together. =back =head1 TODO Add a "get_document" callback that is called right before making the "GET" request.
This would make it easier to use cached documents. You can do that now in a test_url callback or in a test_response when using HEAD requests. Save state of the spider on SIGHUP so spidering could be restored at a later date. =head1 COPYRIGHT Copyright 2001 Bill Moseley This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself. =head1 SUPPORT Send all questions to the SWISH-E discussion list. See http://swish-e.org/ =cut swish-e-2.4.7/prog-bin/SwishSpiderConfig.pl0000775000077100017500000002142011166010113015464 00000000000000=pod =head1 NAME SwishSpiderConfig.pl - Sample swish-e spider configuration =head1 DESCRIPTION This is a sample configuration file for the spider.pl program provided with the swish-e distribution. A spider.pl configuration file is not required as spider.pl has reasonable defaults. In fact, it's recommended that you only use a spider.pl configuration file *after* successfully indexing with spider.pl's default settings. To use the default settings run the spider using the special magical word "default" as the first parameter: spider.pl default [...] If no parameters are passed to spider.pl then spider.pl will look for a file called F in the current directory. A spider.pl config file is useful when you need to change the default behavior of the way spider.pl operates. For example, you may wish to index just part of your site, or tell the spider that example.com, www.example.com and web.example.com are all the same site. The configuration file is actually Perl code. This makes it possible to do reasonably complicated things directly within the config file. For example, parse HTML content into sections and index each section as a separate "document" allowing searches to be targeted. The spider.pl config file must set an array called "@servers". The "@servers" array holds one or more descriptions of a server to index.
In other words, you may define multiple configurations to index different servers (or different parts of the same server) and group them together in the @servers array. Each server description is contained in a single Perl hash. For example, to index two sites define two Perl hashes: my %main_site = ( base_url => 'http://example.com', same_hosts => [ 'www.example.com' ], email => 'admin@example.com', ); my %news_site = ( base_url => 'http://news.example.com', email => 'admin@example.com', ); @servers = ( \%main_site, \%news_site ); 1; The above defines two Perl hashes (%main_site and %news_site) and then places a *reference* (the backslash before the name of the hash) to each of those hashes in the @servers array. The "1;" is required at the end of the file (Perl must see a true value at the end of the file). Let's start out with a simple example. As of Swish-e 2.4.3 there's a new option that allows you to merge your config file with the default config file used when you specify "default" as the first parameter to F. So, say you only wanted to change the limit on the number of files indexed. @servers = ( { use_default_config => 1, # same as using 'default' max_files => 100, }, ); 1; That last number one is important, by the way. It keeps Perl happy. Below are two example configurations, but included in the same @servers array (as anonymous Perl hashes). They both have the skip flag set which disables their use (this is just an example after all). The first is a simple example of a few parameters, and shows the use of a "test_url" function to limit what files are fetched from the server (in this example only .html files are fetched). The second example is slightly more complex and makes use of the SWISH::Filter module to filter documents (such as PDF and MS Word). Note: The examples below are outside "pod" documentation -- if you are reading this with the "perldoc" command you will not see the examples below.
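For readers viewing this with perldoc, here is a minimal sketch of the kind of configuration described above: a single server entry that treats three host names as the same site and uses a "test_url" callback to fetch only .htm and .html files. The host names, email address and file limit are placeholders chosen for illustration, not values from a real site, and the sketch assumes the array-reference form of same_hosts.

```perl
# Hypothetical config sketch; every value below is a placeholder.
@servers = (
    {
        base_url   => 'http://example.com/',
        # treat these host names as the same site as base_url
        same_hosts => [ qw( www.example.com web.example.com ) ],
        email      => 'admin@example.com',
        # return true only for URLs whose path ends in .htm or .html
        test_url   => sub { $_[0]->path =~ /\.html?$/ },
        max_files  => 100,   # stop after 100 unique URLs
    },
);

1;
```

As with the other examples, point base_url at your own server and spider only with permission.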
=cut # @servers is a list of hashes -- so you can spider more than one site # in one run (or different parts of the same tree) # The main program expects to use this array (@SwishSpiderConfig::servers). ### Please do not spider these examples -- spider your own servers, with permission #### #============================================================================= # This is a simple example, that includes a few limits # Only files ending in .html will be spidered (probably a bit too restrictive) @servers = ({ skip => 1, # skip spidering this server base_url => 'http://www.swish-e.org/index.html', same_hosts => [ qw/swish-e.org/ ], agent => 'swish-e spider http://swish-e.org/', email => 'swish@domain.invalid', # limit to only .html files test_url => sub { $_[0]->path =~ /\.html?$/ }, delay_sec => 2, # Delay in seconds between requests max_time => 10, # Max time to spider in minutes max_files => 100, # Max Unique URLs to spider max_indexed => 20, # Max number of files to send to swish for indexing keep_alive => 1, # enable keep-alive requests } ); 1; #============================================================================= # This example just shows more settings, and makes use of the SWISH::Filter # module for converting documents. Some sites require cookies, so this # config enables spider.pl's use of cookies, and also enables MD5 # checksums to catch duplicate pages (i.e. if / and /index.html point # to the same page). # This example also only indexes the "docs" sub-tree of the swish-e # site by checking the path of the URLs # Let spider.pl set up SWISH::Filter # These will be used in the config below my ($filter_sub, $response_sub) = swish_filter(); @servers = ({ skip => 0, # Flag to disable spidering this host.
base_url => 'http://swish-e.org/current/docs/', same_hosts => [ qw/www.swish-e.org/ ], agent => 'swish-e spider http://swish-e.org/', email => 'swish@domain.invalid', keep_alive => 1, # Try to keep the connection open max_time => 10, # Max time to spider in minutes max_files => 20, # Max files to spider delay_sec => 2, # Delay in seconds between requests ignore_robots_file => 0, # Don't set that to one, unless you are sure. use_cookies => 1, # True will keep cookie jar # Some sites require cookies # Requires HTTP::Cookies use_md5 => 1, # If true, this will use the Digest::MD5 # module to create checksums on content # This will very likely catch files # with different URLs that are the same # content. Will trap / and /index.html, # for example. # This will generate A LOT of debugging information to STDOUT debug => DEBUG_URL | DEBUG_SKIPPED | DEBUG_HEADERS, # Here are hooks to callback routines to validate urls and responses # Probably a good idea to use them so you don't try to index # Binary data. Look at content-type headers! test_url => \&test_url, test_response => $response_sub, filter_content => $filter_sub, } ); 1; #---------------------- Public Functions ------------------------------ # Here are some examples of callback functions # # # Use these to adjust skip/ignore based on filename/content-type # Or to filter content (pdf -> text, for example) # # Remember to include the code references in the config as above. # #---------------------------------------------------------------------- # This subroutine lets you check a URL before requesting the # document from the server # return false to skip the link sub test_url { my ( $uri, $server ) = @_; # return 1; # Ok to index/spider # return 0; # No, don't index or spider; # ignore any common image files return if $uri->path =~ /\.(gif|jpg|jpeg|png)$/; # make sure that the path is limited to the docs path return $uri->path =~ m[^/current/docs/]; } ## Here's an example of a "test_response" callback.
You would # add it to your config like: # # test_response => \&test_response_sub, # # This routine is called when the *first* block of data comes back # from the server. If you return false no more content will be read # from the server. $response is a HTTP::Response object. # It's useful for checking the content type of documents. # # For example, say we have a lot of audio files linked on our site that we # do not want to index. But we also have a lot of image files for which we want # to index the path name only. sub test_response_sub { my ( $uri, $server, $response ) = @_; return if $response->content_type =~ m[^audio/]; # In this example set the "no_contents" flag for image files $server->{no_contents}++ if $response->content_type =~ m[^image/]; return 1; # ok to index and spider } # Don't forget to return a true value at the end... 1; swish-e-2.4.7/prog-bin/Makefile.in0000664000077100017500000003261611166010113013610 00000000000000# Makefile.in generated by automake 1.9.6 from Makefile.am. # @configure_input@ # Copyright (C) 1994, 1995, 1996, 1997, 1998, 1999, 2000, 2001, 2002, # 2003, 2004, 2005 Free Software Foundation, Inc. # This Makefile.in is free software; the Free Software Foundation # gives unlimited permission to copy and/or distribute it, # with or without modifications, as long as this notice is preserved. # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY, to the extent permitted by law; without # even the implied warranty of MERCHANTABILITY or FITNESS FOR A # PARTICULAR PURPOSE. @SET_MAKE@ srcdir = @srcdir@ top_srcdir = @top_srcdir@ VPATH = @srcdir@ pkgdatadir = $(datadir)/@PACKAGE@ pkglibdir = $(libdir)/@PACKAGE@ pkgincludedir = $(includedir)/@PACKAGE@ top_builddir = ..
am__cd = CDPATH="$${ZSH_VERSION+.}$(PATH_SEPARATOR)" && cd INSTALL = @INSTALL@ install_sh_DATA = $(install_sh) -c -m 644 install_sh_PROGRAM = $(install_sh) -c install_sh_SCRIPT = $(install_sh) -c INSTALL_HEADER = $(INSTALL_DATA) transform = $(program_transform_name) NORMAL_INSTALL = : PRE_INSTALL = : POST_INSTALL = : NORMAL_UNINSTALL = : PRE_UNINSTALL = : POST_UNINSTALL = : build_triplet = @build@ host_triplet = @host@ subdir = prog-bin DIST_COMMON = README $(srcdir)/Makefile.am $(srcdir)/Makefile.in ACLOCAL_M4 = $(top_srcdir)/aclocal.m4 am__aclocal_m4_deps = $(top_srcdir)/config/acinclude.m4 \ $(top_srcdir)/configure.in am__configure_deps = $(am__aclocal_m4_deps) $(CONFIGURE_DEPENDENCIES) \ $(ACLOCAL_M4) mkinstalldirs = $(SHELL) $(top_srcdir)/config/mkinstalldirs CONFIG_HEADER = $(top_builddir)/src/acconfig.h CONFIG_CLEAN_FILES = am__installdirs = "$(DESTDIR)$(libexecdir)" \ "$(DESTDIR)$(perlmoduledir)" "$(DESTDIR)$(exampledir)" libexecSCRIPT_INSTALL = $(INSTALL_SCRIPT) perlmoduleSCRIPT_INSTALL = $(INSTALL_SCRIPT) SCRIPTS = $(libexec_SCRIPTS) $(perlmodule_SCRIPTS) SOURCES = DIST_SOURCES = am__vpath_adj_setup = srcdirstrip=`echo "$(srcdir)" | sed 's|.|.|g'`; am__vpath_adj = case $$p in \ $(srcdir)/*) f=`echo "$$p" | sed "s|^$$srcdirstrip/||"`;; \ *) f=$$p;; \ esac; am__strip_dir = `echo $$p | sed -e 's|^.*/||'`; exampleDATA_INSTALL = $(INSTALL_DATA) DATA = $(example_DATA) DISTFILES = $(DIST_COMMON) $(DIST_SOURCES) $(TEXINFOS) $(EXTRA_DIST) ACLOCAL = @ACLOCAL@ ALLOCA = @ALLOCA@ AMDEP_FALSE = @AMDEP_FALSE@ AMDEP_TRUE = @AMDEP_TRUE@ AMTAR = @AMTAR@ AR = @AR@ AS = @AS@ AUTOCONF = @AUTOCONF@ AUTOHEADER = @AUTOHEADER@ AUTOMAKE = @AUTOMAKE@ AWK = @AWK@ BTREE_OBJS = @BTREE_OBJS@ BUILDDOCS_FALSE = @BUILDDOCS_FALSE@ BUILDDOCS_TRUE = @BUILDDOCS_TRUE@ CC = @CC@ CCDEPMODE = @CCDEPMODE@ CFLAGS = @CFLAGS@ CPP = @CPP@ CPPFLAGS = @CPPFLAGS@ CXX = @CXX@ CXXCPP = @CXXCPP@ CXXDEPMODE = @CXXDEPMODE@ CXXFLAGS = @CXXFLAGS@ CYGPATH_W = @CYGPATH_W@ DEFS = @DEFS@ DEPDIR = @DEPDIR@ DLLTOOL = 
@DLLTOOL@ ECHO = @ECHO@ ECHO_C = @ECHO_C@ ECHO_N = @ECHO_N@ ECHO_T = @ECHO_T@ EGREP = @EGREP@ EXEEXT = @EXEEXT@ F77 = @F77@ FFLAGS = @FFLAGS@ INSTALLDOCS_FALSE = @INSTALLDOCS_FALSE@ INSTALLDOCS_TRUE = @INSTALLDOCS_TRUE@ INSTALL_DATA = @INSTALL_DATA@ INSTALL_PROGRAM = @INSTALL_PROGRAM@ INSTALL_SCRIPT = @INSTALL_SCRIPT@ INSTALL_STRIP_PROGRAM = @INSTALL_STRIP_PROGRAM@ LARGEFILES_MACROS = @LARGEFILES_MACROS@ LDFLAGS = @LDFLAGS@ LIBOBJS = @LIBOBJS@ LIBS = @LIBS@ LIBTOOL = @LIBTOOL@ LIBXML2_CFLAGS = @LIBXML2_CFLAGS@ LIBXML2_LIB = @LIBXML2_LIB@ LIBXML2_OBJS = @LIBXML2_OBJS@ LIBXML_REQUIRED_VERSION = @LIBXML_REQUIRED_VERSION@ LN_S = @LN_S@ LTLIBOBJS = @LTLIBOBJS@ MAINT = @MAINT@ MAINTAINER_MODE_FALSE = @MAINTAINER_MODE_FALSE@ MAINTAINER_MODE_TRUE = @MAINTAINER_MODE_TRUE@ MAKEINFO = @MAKEINFO@ OBJDUMP = @OBJDUMP@ OBJEXT = @OBJEXT@ PACKAGE = @PACKAGE@ PACKAGE_BUGREPORT = @PACKAGE_BUGREPORT@ PACKAGE_NAME = @PACKAGE_NAME@ PACKAGE_STRING = @PACKAGE_STRING@ PACKAGE_TARNAME = @PACKAGE_TARNAME@ PACKAGE_VERSION = @PACKAGE_VERSION@ PATH_SEPARATOR = @PATH_SEPARATOR@ PCRE_CFLAGS = @PCRE_CFLAGS@ PCRE_CONFIG = @PCRE_CONFIG@ PCRE_LIBS = @PCRE_LIBS@ PCRE_REQUIRED_VERSION = @PCRE_REQUIRED_VERSION@ PERL = @PERL@ POD2MAN = @POD2MAN@ RANLIB = @RANLIB@ SET_MAKE = @SET_MAKE@ SHELL = @SHELL@ STRIP = @STRIP@ SWISH_WEB = @SWISH_WEB@ VERSION = @VERSION@ XML2_CONFIG = @XML2_CONFIG@ Z_CFLAGS = @Z_CFLAGS@ Z_LIBS = @Z_LIBS@ ac_ct_AR = @ac_ct_AR@ ac_ct_AS = @ac_ct_AS@ ac_ct_CC = @ac_ct_CC@ ac_ct_CXX = @ac_ct_CXX@ ac_ct_DLLTOOL = @ac_ct_DLLTOOL@ ac_ct_F77 = @ac_ct_F77@ ac_ct_OBJDUMP = @ac_ct_OBJDUMP@ ac_ct_RANLIB = @ac_ct_RANLIB@ ac_ct_STRIP = @ac_ct_STRIP@ am__fastdepCC_FALSE = @am__fastdepCC_FALSE@ am__fastdepCC_TRUE = @am__fastdepCC_TRUE@ am__fastdepCXX_FALSE = @am__fastdepCXX_FALSE@ am__fastdepCXX_TRUE = @am__fastdepCXX_TRUE@ am__include = @am__include@ am__leading_dot = @am__leading_dot@ am__quote = @am__quote@ am__tar = @am__tar@ am__untar = @am__untar@ bindir = @bindir@ build = @build@ build_alias 
= @build_alias@ build_cpu = @build_cpu@ build_os = @build_os@ build_vendor = @build_vendor@ datadir = @datadir@ exec_prefix = @exec_prefix@ host = @host@ host_alias = @host_alias@ host_cpu = @host_cpu@ host_os = @host_os@ host_vendor = @host_vendor@ includedir = @includedir@ infodir = @infodir@ install_sh = @install_sh@ libdir = @libdir@ libexecdir = @libexecdir@ localstatedir = @localstatedir@ mandir = @mandir@ mkdir_p = @mkdir_p@ oldincludedir = @oldincludedir@ prefix = @prefix@ program_transform_name = @program_transform_name@ sbindir = @sbindir@ sharedstatedir = @sharedstatedir@ sysconfdir = @sysconfdir@ target_alias = @target_alias@ perlmoduledir = $(libexecdir)/perl exampledir = $(datadir)/doc/$(PACKAGE)/examples/prog-bin libexec_SCRIPTS = spider.pl DirTree.pl # These are really out dated perlmodule_SCRIPTS = \ doc2txt.pm \ pdf2html.pm \ pdf2xml.pm CLEANFILES = spider.pl DirTree.pl other_examples = \ README \ file.pl \ SwishSpiderConfig.pl \ MySQL.pl \ index_hypermail.pl \ pdf2xml.pm \ pdf2html.pm \ doc2txt.pm example_DATA = $(other_examples) EXTRA_DIST = \ spider.pl.in \ DirTree.pl.in \ $(other_examples) all: all-am .SUFFIXES: $(srcdir)/Makefile.in: @MAINTAINER_MODE_TRUE@ $(srcdir)/Makefile.am $(am__configure_deps) @for dep in $?; do \ case '$(am__configure_deps)' in \ *$$dep*) \ cd $(top_builddir) && $(MAKE) $(AM_MAKEFLAGS) am--refresh \ && exit 0; \ exit 1;; \ esac; \ done; \ echo ' cd $(top_srcdir) && $(AUTOMAKE) --foreign prog-bin/Makefile'; \ cd $(top_srcdir) && \ $(AUTOMAKE) --foreign prog-bin/Makefile .PRECIOUS: Makefile Makefile: $(srcdir)/Makefile.in $(top_builddir)/config.status @case '$?' 
in \ *config.status*) \ cd $(top_builddir) && $(MAKE) $(AM_MAKEFLAGS) am--refresh;; \ *) \ echo ' cd $(top_builddir) && $(SHELL) ./config.status $(subdir)/$@ $(am__depfiles_maybe)'; \ cd $(top_builddir) && $(SHELL) ./config.status $(subdir)/$@ $(am__depfiles_maybe);; \ esac; $(top_builddir)/config.status: $(top_srcdir)/configure $(CONFIG_STATUS_DEPENDENCIES) cd $(top_builddir) && $(MAKE) $(AM_MAKEFLAGS) am--refresh $(top_srcdir)/configure: @MAINTAINER_MODE_TRUE@ $(am__configure_deps) cd $(top_builddir) && $(MAKE) $(AM_MAKEFLAGS) am--refresh $(ACLOCAL_M4): @MAINTAINER_MODE_TRUE@ $(am__aclocal_m4_deps) cd $(top_builddir) && $(MAKE) $(AM_MAKEFLAGS) am--refresh install-libexecSCRIPTS: $(libexec_SCRIPTS) @$(NORMAL_INSTALL) test -z "$(libexecdir)" || $(mkdir_p) "$(DESTDIR)$(libexecdir)" @list='$(libexec_SCRIPTS)'; for p in $$list; do \ if test -f "$$p"; then d=; else d="$(srcdir)/"; fi; \ if test -f $$d$$p; then \ f=`echo "$$p" | sed 's|^.*/||;$(transform)'`; \ echo " $(libexecSCRIPT_INSTALL) '$$d$$p' '$(DESTDIR)$(libexecdir)/$$f'"; \ $(libexecSCRIPT_INSTALL) "$$d$$p" "$(DESTDIR)$(libexecdir)/$$f"; \ else :; fi; \ done uninstall-libexecSCRIPTS: @$(NORMAL_UNINSTALL) @list='$(libexec_SCRIPTS)'; for p in $$list; do \ f=`echo "$$p" | sed 's|^.*/||;$(transform)'`; \ echo " rm -f '$(DESTDIR)$(libexecdir)/$$f'"; \ rm -f "$(DESTDIR)$(libexecdir)/$$f"; \ done install-perlmoduleSCRIPTS: $(perlmodule_SCRIPTS) @$(NORMAL_INSTALL) test -z "$(perlmoduledir)" || $(mkdir_p) "$(DESTDIR)$(perlmoduledir)" @list='$(perlmodule_SCRIPTS)'; for p in $$list; do \ if test -f "$$p"; then d=; else d="$(srcdir)/"; fi; \ if test -f $$d$$p; then \ f=`echo "$$p" | sed 's|^.*/||;$(transform)'`; \ echo " $(perlmoduleSCRIPT_INSTALL) '$$d$$p' '$(DESTDIR)$(perlmoduledir)/$$f'"; \ $(perlmoduleSCRIPT_INSTALL) "$$d$$p" "$(DESTDIR)$(perlmoduledir)/$$f"; \ else :; fi; \ done uninstall-perlmoduleSCRIPTS: @$(NORMAL_UNINSTALL) @list='$(perlmodule_SCRIPTS)'; for p in $$list; do \ f=`echo "$$p" | sed 
's|^.*/||;$(transform)'`; \ echo " rm -f '$(DESTDIR)$(perlmoduledir)/$$f'"; \ rm -f "$(DESTDIR)$(perlmoduledir)/$$f"; \ done mostlyclean-libtool: -rm -f *.lo clean-libtool: -rm -rf .libs _libs distclean-libtool: -rm -f libtool uninstall-info-am: install-exampleDATA: $(example_DATA) @$(NORMAL_INSTALL) test -z "$(exampledir)" || $(mkdir_p) "$(DESTDIR)$(exampledir)" @list='$(example_DATA)'; for p in $$list; do \ if test -f "$$p"; then d=; else d="$(srcdir)/"; fi; \ f=$(am__strip_dir) \ echo " $(exampleDATA_INSTALL) '$$d$$p' '$(DESTDIR)$(exampledir)/$$f'"; \ $(exampleDATA_INSTALL) "$$d$$p" "$(DESTDIR)$(exampledir)/$$f"; \ done uninstall-exampleDATA: @$(NORMAL_UNINSTALL) @list='$(example_DATA)'; for p in $$list; do \ f=$(am__strip_dir) \ echo " rm -f '$(DESTDIR)$(exampledir)/$$f'"; \ rm -f "$(DESTDIR)$(exampledir)/$$f"; \ done tags: TAGS TAGS: ctags: CTAGS CTAGS: distdir: $(DISTFILES) @srcdirstrip=`echo "$(srcdir)" | sed 's|.|.|g'`; \ topsrcdirstrip=`echo "$(top_srcdir)" | sed 's|.|.|g'`; \ list='$(DISTFILES)'; for file in $$list; do \ case $$file in \ $(srcdir)/*) file=`echo "$$file" | sed "s|^$$srcdirstrip/||"`;; \ $(top_srcdir)/*) file=`echo "$$file" | sed "s|^$$topsrcdirstrip/|$(top_builddir)/|"`;; \ esac; \ if test -f $$file || test -d $$file; then d=.; else d=$(srcdir); fi; \ dir=`echo "$$file" | sed -e 's,/[^/]*$$,,'`; \ if test "$$dir" != "$$file" && test "$$dir" != "."; then \ dir="/$$dir"; \ $(mkdir_p) "$(distdir)$$dir"; \ else \ dir=''; \ fi; \ if test -d $$d/$$file; then \ if test -d $(srcdir)/$$file && test $$d != $(srcdir); then \ cp -pR $(srcdir)/$$file $(distdir)$$dir || exit 1; \ fi; \ cp -pR $$d/$$file $(distdir)$$dir || exit 1; \ else \ test -f $(distdir)/$$file \ || cp -p $$d/$$file $(distdir)/$$file \ || exit 1; \ fi; \ done check-am: all-am check: check-am all-am: Makefile $(SCRIPTS) $(DATA) installdirs: for dir in "$(DESTDIR)$(libexecdir)" "$(DESTDIR)$(perlmoduledir)" "$(DESTDIR)$(exampledir)"; do \ test -z "$$dir" || $(mkdir_p) "$$dir"; \ done 
install: install-am install-exec: install-exec-am install-data: install-data-am uninstall: uninstall-am install-am: all-am @$(MAKE) $(AM_MAKEFLAGS) install-exec-am install-data-am installcheck: installcheck-am install-strip: $(MAKE) $(AM_MAKEFLAGS) INSTALL_PROGRAM="$(INSTALL_STRIP_PROGRAM)" \ install_sh_PROGRAM="$(INSTALL_STRIP_PROGRAM)" INSTALL_STRIP_FLAG=-s \ `test -z '$(STRIP)' || \ echo "INSTALL_PROGRAM_ENV=STRIPPROG='$(STRIP)'"` install mostlyclean-generic: clean-generic: -test -z "$(CLEANFILES)" || rm -f $(CLEANFILES) distclean-generic: -test -z "$(CONFIG_CLEAN_FILES)" || rm -f $(CONFIG_CLEAN_FILES) maintainer-clean-generic: @echo "This command is intended for maintainers to use" @echo "it deletes files that may require special tools to rebuild." clean: clean-am clean-am: clean-generic clean-libtool mostlyclean-am distclean: distclean-am -rm -f Makefile distclean-am: clean-am distclean-generic distclean-libtool dvi: dvi-am dvi-am: html: html-am info: info-am info-am: install-data-am: install-exampleDATA install-perlmoduleSCRIPTS install-exec-am: install-libexecSCRIPTS install-info: install-info-am install-man: installcheck-am: maintainer-clean: maintainer-clean-am -rm -f Makefile maintainer-clean-am: distclean-am maintainer-clean-generic mostlyclean: mostlyclean-am mostlyclean-am: mostlyclean-generic mostlyclean-libtool pdf: pdf-am pdf-am: ps: ps-am ps-am: uninstall-am: uninstall-exampleDATA uninstall-info-am \ uninstall-libexecSCRIPTS uninstall-perlmoduleSCRIPTS .PHONY: all all-am check check-am clean clean-generic clean-libtool \ distclean distclean-generic distclean-libtool distdir dvi \ dvi-am html html-am info info-am install install-am \ install-data install-data-am install-exampleDATA install-exec \ install-exec-am install-info install-info-am \ install-libexecSCRIPTS install-man install-perlmoduleSCRIPTS \ install-strip installcheck installcheck-am installdirs \ maintainer-clean maintainer-clean-generic mostlyclean \ mostlyclean-generic 
mostlyclean-libtool pdf pdf-am ps ps-am \ uninstall uninstall-am uninstall-exampleDATA uninstall-info-am \ uninstall-libexecSCRIPTS uninstall-perlmoduleSCRIPTS # This is done here to stay in the GNU coding standards # libexecdir can be modified at make time, so can't use # variable substitution at configure time spider.pl: spider.pl.in @rm -f spider.pl @sed \ -e 's,@@perlmoduledir@@,$(libexecdir)/perl,' \ -e 's,@@swishbindir@@,$(bindir),' \ -e 's,@@perlbinary@@,$(PERL),' \ $(srcdir)/spider.pl.in > spider.pl DirTree.pl: DirTree.pl.in @rm -f DirTree.pl @sed \ -e 's,@@perlmoduledir@@,$(libexecdir)/perl,' \ -e 's,@@swishbindir@@,$(bindir),' \ -e 's,@@perlbinary@@,$(PERL),' \ $(srcdir)/DirTree.pl.in > DirTree.pl # Tell versions [3.59,3.63) of GNU make to not export all variables. # Otherwise a system limit (for SysV at least) may be exceeded. .NOEXPORT: swish-e-2.4.7/prog-bin/MySQL.pl0000775000077100017500000000261711166010113013046 00000000000000#!/usr/bin/perl -w use strict; =pod This is an example program to index data stored in a MySQL database. In this example, a table is read that contains "minutes" from some organization's meetings. The text of the minutes was stored in a blob, compressed with zlib. This script reads the record from MySQL, uncompresses the text, and formats for swish-e. The example below uses HTML, but you could also format as XML, or even plain text. =cut use DBI; use Compress::Zlib; use Time::Local; my %database = ( dsn => 'dbi:mysql:test', user => 'user', password => 'pass', ); my $dbh = DBI->connect( @database{qw/ dsn user password /}, { RaiseError => 1 } ); my $sth = $dbh->prepare("select id, date, minutes from meetings"); $sth->execute(); while ( my( $id, $date, $minutes) = $sth->fetchrow_array ) { my $uncompressed = uncompress( $minutes ); my $unix_date = unixtime( $date ); my $content = < Minutes for meeting on date $date $uncompressed EOF my $length = length $content; print <; }